Note: this post is only meant for personal digestion and interpretation. It is incomplete and may mislead readers.
Title: Jointly Optimize Data Augmentation and Network Training: Adversarial Data Augmentation in Human Pose Estimation
This work uses an adversarial network to improve the data augmentation process during training. It introduces how the data augmentation network works and how it is jointly trained with a pose estimator.
Abstract
By jointly training the data augmentation network and the target network, the augmentation network learns to generate “hard” examples that exploit the weaknesses of the target network
Introduction

Annotated data is costly

Natural images follow a long-tail distribution [1,2]
[1] X. Zhu, D. Anguelov, and D. Ramanan. Capturing long-tail distributions of object subcategories. In CVPR, 2014.
[2] Z. Tang, Y. Zhang, Z. Li, and H. Lu. Face clustering in videos with proportion prior. In IJCAI, 2015.
Problems with random augmentation:

It does not consider individual differences between training samples

It does not match the training status of the network (the sampling distribution is static)

A Gaussian distribution does not address the long-tail issue
Proposed

Adversarial learning

Augmentation network as generator
 creates “hard” augs to make the pose estimator fail

Pose network as discriminator

Generate adversarial data augs online

conditioned on

input images

training status



Transformations:

A straightforward design is problematic in terms of convergence
 i.e., generating adversarial pixels or deformations

Instead, sample scaling, rotating, occluding, and so on to create new data points


A reward-and-penalty policy addresses the issue of missing supervision (the augmentation operations provide no direct GT for the augmentation network)

The augmentation network takes the hierarchical features of the pose network as input instead of a raw image, as in the sketch below
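A minimal PyTorch-style sketch of this design; the module sizes, names, and pooling strategy are my assumptions, not the paper's exact architecture:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class AugNet(nn.Module):
    """Sketch: predict augmentation distributions from pose-net features."""
    def __init__(self, feat_channels, m_scale_bins=5, n_rot_bins=9):
        super().__init__()
        in_dim = sum(feat_channels)  # total channels of the hierarchical feature maps
        self.scale_head = nn.Linear(in_dim, m_scale_bins)
        self.rot_head = nn.Linear(in_dim, n_rot_bins)

    def forward(self, feats):
        # Global-average-pool each hierarchical feature map, then concatenate,
        # so the input is the pose network's features rather than a raw image.
        pooled = [F.adaptive_avg_pool2d(f, 1).flatten(1) for f in feats]
        h = torch.cat(pooled, dim=1)
        # Distributions over scaling and rotating bins (see ASR below).
        return F.softmax(self.scale_head(h), dim=1), F.softmax(self.rot_head(h), dim=1)
```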
Adversarial Data Aug
Augmentation network $G( \cdot \mid \theta_G )$
 generates hard augs
Pose network $D(\cdot \mid \theta_D)$

learns from adversarial augmentations

evaluates the quality of generations
Generating path
$G$ outputs adversarial aug result $\tau_a (\cdot)$
A random aug result $\tau_r (\cdot)$ is also produced; it serves later as the reference in the reward-and-penalty policy
Maximizing the pose network's loss on the adversarially augmented data:
$$\max_{\theta_G} \; \mathcal{L}\big( D(\tau_a(\mathbf{x}) \mid \theta_D),\ \tau_a(\mathbf{y}) \big), \quad \tau_a \sim G(\mathbf{x}, \theta_D) \tag{1}$$
$\tau_a \sim G(\mathbf{x}, \theta_D)$ means the generation of $G$ is conditioned both on the image $\mathbf{x}$ and the current status of the target network $D$
$\mathcal{L}(\cdot, \cdot)$ is a predefined loss function and $\mathbf{y}$ is the annotation
Discrimination path
$D$ plays two roles:

it evaluates generation quality, as indicated in Eq. 1

it learns from adversarial generations
Minimizing the same loss with respect to the pose network's parameters:
$$\min_{\theta_D} \; \mathcal{L}\big( D(\tau_a(\mathbf{x}) \mid \theta_D),\ \tau_a(\mathbf{y}) \big) \tag{2}$$
Joint training
The augmentation operations are not differentiable, so no gradient flows from $D$ back to $G$
A reward-and-penalty policy is proposed to create online GT for $G$ (detailed below)
Adversarial Human Pose Estimation
Adversarial Scaling and Rotating (ASR)
Divide the scaling range into $m$ bins and the rotating range into $n$ bins; each bin corresponds to a small bounded Gaussian
The aug net predicts distributions over the bins
Augmentation parameters are sampled from these distributions (see the sketch below)
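A minimal sketch of the two-stage sampling; the parameter ranges and the choice of sigma (a quarter of the bin width) are illustrative assumptions:

```python
import numpy as np

def sample_param(probs, lo, hi):
    """Pick a bin from the predicted distribution, then draw the parameter
    from a small bounded Gaussian centered on that bin (sigma assumed)."""
    n_bins = len(probs)
    bin_width = (hi - lo) / n_bins
    k = np.random.choice(n_bins, p=probs)           # sample a bin index
    center = lo + (k + 0.5) * bin_width
    val = np.random.normal(center, bin_width / 4)   # small Gaussian in the bin
    # Bound the sample so it stays inside its bin.
    return float(np.clip(val, center - bin_width / 2, center + bin_width / 2))

# Illustrative ranges (not the paper's): scale in [0.7, 1.3] over m = 5 bins,
# rotation in [-60, 60] degrees over n = 9 bins, uniform distributions.
scale = sample_param(np.full(5, 0.2), 0.7, 1.3)
angle = sample_param(np.full(9, 1.0 / 9.0), -60.0, 60.0)
```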
ASR pretraining

For every training image, sample $m \times n$ augs in total, each drawn from one pair of (scaling, rotating) Gaussians

Feed each aug into the pose network to calculate the loss (representing the difficulty of this aug)

Accumulate these losses into the corresponding bins (resulting in $m \times n$ sums)

Normalize the sums to generate two probability vectors, $P^s \in \mathbb{R}^m$ and $P^r \in \mathbb{R}^n$, which approximate the GT of the scaling and rotating distributions

Given the approximate GT distributions $P^s \in \mathbb{R}^m$ and $P^r \in \mathbb{R}^n$, use a KL-divergence loss, $\mathrm{KL}(P \,\|\, \tilde{P}) = \sum_i P_i \log (P_i / \tilde{P}_i)$, to pretrain the aug net
($\tilde{P}^{(\cdot)}_i$ is the distribution predicted by the aug net)
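A sketch of this target construction; `eval_loss` is a hypothetical callable standing in for "augment with a sample from the (i, j) pair of Gaussians, run the pose network, return the loss":

```python
import numpy as np

def asr_groundtruth(eval_loss, m=5, n=9):
    """Build approximate GT distributions for one training image by
    accumulating per-bin losses and normalizing (eval_loss is hypothetical)."""
    loss_grid = np.array([[eval_loss(i, j) for j in range(n)] for i in range(m)])
    p_s = loss_grid.sum(axis=1) / loss_grid.sum()   # scaling GT, P^s in R^m
    p_r = loss_grid.sum(axis=0) / loss_grid.sum()   # rotating GT, P^r in R^n
    return p_s, p_r

def kl_loss(p, p_tilde, eps=1e-8):
    """KL(P || P~) between the GT and the aug net's predicted distribution."""
    return float(np.sum(p * np.log((p + eps) / (p_tilde + eps))))
```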
Advantages of predicting distributions instead of direct augs:

introduces uncertainty, avoiding e.g. constantly generating upside-down augs

helps to address the missing-GT issue during joint training
Adversarial Hierarchical Occluding (AHO)

Occluding part of the image (features) encourages the network to learn strong references between visible and invisible joints.

It is more effective to occlude features than pixels

Generate a mask ($4 \times 4$, then scaled up to the resolutions of the feature maps) indicating the occluded part, as in the sketch below
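A sketch of turning the predicted occluding distribution into a feature mask; sampling the occluded cells without replacement, and the number of cells, are my assumptions:

```python
import torch
import torch.nn.functional as F

def aho_mask(p_occ, feat_h, feat_w, n_cells=3):
    """p_occ: (16,) predicted occluding distribution over the 4x4 grid.
    Zero out n_cells sampled cells, then upsample to the feature size."""
    idx = torch.multinomial(p_occ, n_cells, replacement=False)  # pick cells
    mask = torch.ones(16)
    mask[idx] = 0.0
    mask = mask.view(1, 1, 4, 4)
    return F.interpolate(mask, size=(feat_h, feat_w), mode='nearest')

# Occlude features rather than pixels:
# feats = feats * aho_mask(p_occ, feats.shape[2], feats.shape[3])
```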
AHO pretraining
The aug net predicts an occluding distribution instead of a mask
To create GT of occluding distrib.:

Vote each joint into one of the $w \times h$ ($4 \times 4$) grid cells, indicating the cell's importance

Count and normalize to generate $P^O \in \mathbb{R}^{w\times h}$ (a sketch follows)
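A sketch of the voting step, assuming joint coordinates are normalized to [0, 1]:

```python
import numpy as np

def aho_groundtruth(joints, w=4, h=4):
    """joints: (num_joints, 2) array of (x, y) in [0, 1].
    Vote each joint into a grid cell, count, and normalize to get P^O."""
    counts = np.zeros((h, w))
    for x, y in joints:
        col = min(int(x * w), w - 1)   # clamp x == 1.0 into the last column
        row = min(int(y * h), h - 1)
        counts[row, col] += 1
    return (counts / counts.sum()).reshape(-1)   # P^O in R^{w*h}
```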
Pretrain: presumably with the same KL-divergence loss as in ASR pretraining, using $P^O$ as the GT
Joint Training of Two Networks
Reward and penalty
Update the predicted distributions according to the current status of the target network

difficulty is evaluated by comparing the adversarial aug with a reference (the random aug $\tau_r$)

if the adversarial aug is harder (higher loss)

rewarding, by increasing the probability of the corresponding bin or cell


Otherwise

penalizing, by decreasing the probability (see the sketch after this list)
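A sketch of one reward-and-penalty step; the fixed step size `delta` and the renormalization are my assumptions about how the probabilities get adjusted:

```python
import numpy as np

def reward_penalty_update(p, bin_idx, loss_adv, loss_ref, delta=0.05):
    """Online GT for G: compare the adversarial aug's loss against the
    random-aug reference; raise the sampled bin/cell if it was harder,
    lower it otherwise, then renormalize into a distribution."""
    p = p.copy()
    if loss_adv > loss_ref:                       # harder than random: reward
        p[bin_idx] += delta
    else:                                         # easier than random: penalize
        p[bin_idx] = max(p[bin_idx] - delta, 0.0)
    return p / p.sum()
```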

Equally split every mini-batch into three shares (see the snippet after this list):

random

ASR

AHO
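A trivial sketch of the batch composition, assuming the batch size is divisible by three:

```python
def split_batch(batch):
    """Equally split a mini-batch into random / ASR / AHO shares."""
    k = len(batch) // 3
    return batch[:k], batch[k:2 * k], batch[2 * k:3 * k]
```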