Paper Reading: Self Adversarial Training for Human Pose Estimation

Note: this post is only meant for personal digestion and interpretation. It is incomplete and may mislead readers.


  • Generator
    • Fully convolutional network with residual blocks and a conv-deconv architecture (Hourglass)
  • Discriminator
    • Same as Generator, except
      • takes the RGB image and the heatmaps as input
      • outputs heatmaps that are used to distinguish real from fake
    • Distinguishes real from fake by how well it reconstructs the input heatmaps
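The discriminator's input can be sketched as a channel-wise concatenation of the image and the heatmaps. The shapes and the joint count `M` below are assumptions for illustration, not values from the paper:

```python
import numpy as np

# Hypothetical shapes (not from the paper): the discriminator input is the
# RGB image X concatenated with the M joint heatmaps along the channel axis.
M, H, W = 16, 256, 256
X        = np.zeros((3, H, W))   # RGB image, 3 channels
heatmaps = np.zeros((M, H, W))   # real or generated heatmaps, one per joint
d_input  = np.concatenate([X, heatmaps], axis=0)  # shape (3 + M, H, W)
```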

Generator Loss

Loss = Adversarial Loss + MSE Loss ( Generated - GT )

$$
\begin{aligned}
\mathcal{L}_{\mathrm{MSE}} &= \sum_{i=1}^{N} \sum_{j=1}^{M} \left( C_{ij} - \hat{C}_{ij} \right)^2 \\
\mathcal{L}_{\mathrm{adv}} &= \sum_{j=1}^{M} \left( \hat{C}_j - D(\hat{C}_j, X) \right)^2 \\
\mathcal{L}_{G} &= \mathcal{L}_{\mathrm{MSE}} + \lambda_G \mathcal{L}_{\mathrm{adv}}
\end{aligned}
$$
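A minimal NumPy sketch of the generator loss above. The shapes, the random stand-in for the discriminator output, and the value of lambda_G are assumptions, not the paper's settings:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical shapes: M joint heatmaps of size H x W for a single sample (N = 1).
M, H, W = 16, 64, 64
C          = rng.random((M, H, W))  # ground-truth heatmaps C_j
C_hat      = rng.random((M, H, W))  # generated heatmaps
D_of_C_hat = rng.random((M, H, W))  # stand-in for the discriminator output D(C_hat, X)

lambda_g = 0.01  # assumed weight; the paper may use a different value

l_mse = np.sum((C - C_hat) ** 2)           # L_MSE (single-sample case)
l_adv = np.sum((C_hat - D_of_C_hat) ** 2)  # L_adv
l_g   = l_mse + lambda_g * l_adv           # L_G
```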


Discriminator Loss

The discriminator reconstructs a new set of heatmaps. The quality of the reconstruction is measured by how similar it is to the input heatmaps (the same notion as an autoencoder). The loss is the error between the input heatmaps and the reconstructed heatmaps.


$$
\begin{aligned}
\mathcal{L}_{\mathrm{real}} &= \sum_{j=1}^{M} \left( C_j - D(C_j, X) \right)^2 \\
\mathcal{L}_{\mathrm{fake}} &= \sum_{j=1}^{M} \left( \hat{C}_j - D(\hat{C}_j, X) \right)^2 \\
\mathcal{L}_D &= \mathcal{L}_{\mathrm{real}} - k_t \mathcal{L}_{\mathrm{fake}}
\end{aligned}
$$
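The discriminator loss can be sketched the same way. Here k_t starts at 0, and all shapes and the random stand-ins for the reconstructions are assumptions:

```python
import numpy as np

rng = np.random.default_rng(1)
M, H, W = 16, 64, 64
C      = rng.random((M, H, W))  # real (ground-truth) heatmaps
C_hat  = rng.random((M, H, W))  # generated heatmaps
D_real = rng.random((M, H, W))  # stand-in for D(C, X):     reconstruction of real heatmaps
D_fake = rng.random((M, H, W))  # stand-in for D(C_hat, X): reconstruction of generated heatmaps

k_t = 0.0  # balancing weight, updated every training step (BEGAN-style)

l_real = np.sum((C - D_real) ** 2)
l_fake = np.sum((C_hat - D_fake) ** 2)
l_d    = l_real - k_t * l_fake  # with k_t = 0, L_D reduces to L_real
```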

Minimize $\mathcal{L}_D$

  • Minimize the error between the GT heatmaps and their reconstruction
  • Maximize the error between the generated heatmaps and their reconstruction
  • The per-pixel reconstruction error indicates how plausible the confidence at that pixel is
  • It offers detailed ‘comments’ on the input heatmaps and suggests which parts of the heatmaps do not yield a real pose

$$
k_{t+1} = k_t + \lambda_k \left( \gamma \mathcal{L}_{\mathrm{real}} - \mathcal{L}_{\mathrm{fake}} \right)
$$

When the generator gets better than the discriminator, i.e., L_fake falls below γ L_real, the generated heatmaps are realistic enough to fool the discriminator. Hence k_t increases, making the L_fake term more dominant, so the discriminator is trained harder on recognizing the generated heatmaps.
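A numeric sketch of this update with toy values; lambda_k, gamma, and the loss values below are assumptions, not the paper's hyperparameters:

```python
# BEGAN-style update of the balancing weight k_t.
lambda_k, gamma = 0.001, 0.5   # assumed hyperparameters
k_t = 0.0
l_real, l_fake = 4.0, 1.0      # toy losses with l_fake < gamma * l_real: generator is "winning"

k_next = k_t + lambda_k * (gamma * l_real - l_fake)
# gamma * l_real - l_fake = 2.0 - 1.0 = 1.0 > 0, so k grows and the
# discriminator focuses more on recognizing the generated heatmaps
```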

Adversarial Training


  • Few original points
  • Replicates the description of contributions made by others
  • Overly detailed description of trivial points that are the same as in other experiments

Author: Texot
Reprint policy: Unless otherwise stated, all articles in this blog follow the CC BY 4.0 reprint policy. If reproduced, please indicate the source: Texot!