# Paper Reading: Learning Feature Pyramids for Human Pose Estimation

Note: this post is only meant for personal digestion and interpretation. It is incomplete and may mislead readers.

## Intuition

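Body parts appear at very different scales because of inter-personal body shape variations and foreshortening, so the features used for pose estimation should be robust to scale.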

## Contribution

## Framework

• Stacked Hourglass Network as the basic network structure
• Building block is Pyramid Residual Module (PRM)

### Hourglass

• A residual unit in the original hourglass captures features at only one scale
• PRM captures multi-scale features

### Pyramid Residual Module (PRM)

### Generation of input feature pyramids

Fractional max-pooling is used to approximate the smoothing and subsampling process used in generating traditional image pyramids.

### Discussions

• PRM can be used as a basic building block for various CNN architectures (Stacked Hourglass, Wide Residual Nets, ResNeXt)
• Variants
  • PRM-A: separate input feature maps, fractional max-pooling, convolution, upsampling, summation
  • PRM-B: shared input feature maps, fractional max-pooling, convolution, upsampling, summation
  • PRM-C: shared input feature maps, fractional max-pooling, convolution, upsampling, concatenation
  • PRM-D: shared input feature maps, dilated convolution, summation
• Weight sharing
  • Weights of f_c(.) are shared across the different levels of the pyramid
• Complexity
  • 128-d features within the residual unit instead of the original 256-d
  • Fewer feature channels for branches with smaller scales
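
The "shared input, pooling, convolution, upsampling, summation" recipe of PRM-B can be sketched on a toy 1-D signal. This is only an illustration under several assumptions, not the paper's implementation: the signal is 1-D instead of a 2-D feature map, a deterministic windowed max-pool stands in for fractional max-pooling, upsampling is nearest-neighbour, and all function names (`max_pool_1d`, `conv3`, `upsample_nearest`, `prm_b`) and the 3-tap kernels are hypothetical.

```python
def max_pool_1d(x, ratio):
    """Downsample x to about len(x) * ratio samples, taking the max of each window.
    Deterministic stand-in for fractional max-pooling (an assumption of this sketch)."""
    n_out = max(1, round(len(x) * ratio))
    out = []
    for i in range(n_out):
        start = i * len(x) // n_out
        end = max(start + 1, (i + 1) * len(x) // n_out)
        out.append(max(x[start:end]))
    return out


def conv3(x, kernel):
    """Zero-padded 3-tap convolution (cross-correlation) that keeps the length of x."""
    padded = [0.0] + list(x) + [0.0]
    return [sum(k * padded[i + j] for j, k in enumerate(kernel))
            for i in range(len(x))]


def upsample_nearest(x, n_out):
    """Nearest-neighbour upsampling of x to n_out samples."""
    return [x[i * len(x) // n_out] for i in range(n_out)]


def prm_b(x, ratios, kernels):
    """Identity path plus the sum of all pyramid branches, each fed the same input x."""
    out = list(x)
    for ratio, kernel in zip(ratios, kernels):
        small = max_pool_1d(x, ratio)           # build one pyramid level
        feat = conv3(small, kernel)             # per-level transform f_c
        up = upsample_nearest(feat, len(x))     # back to the input resolution
        out = [o + u for o, u in zip(out, up)]  # summation across levels
    return out
```

Passing the same kernel for every level, for example `prm_b(x, ratios=[1.0, 0.5], kernels=[[0, 1, 0]] * 2)`, mimics the weight-sharing option; replacing the summation with concatenation would give a PRM-C-style variant.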

## Training

Loss:

$\mathcal{L} = \frac12 \sum^N_{n=1} \sum^K_{k=1} \left\Vert \mathbf{S}_k - \mathbf{\hat{S}}_k \right\Vert^2$
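
As a rough sketch of how this loss could be evaluated, the snippet below sums the squared differences between target and predicted score maps over all samples and joints. The nested-list layout `[sample][joint][row][col]` and the name `heatmap_l2_loss` are assumptions made for illustration only.

```python
def heatmap_l2_loss(targets, predictions):
    """Half the summed squared error between target and predicted score maps.
    Both arguments are nested lists indexed [sample][joint][row][col]."""
    total = 0.0
    for sample_t, sample_p in zip(targets, predictions):   # n = 1..N
        for map_t, map_p in zip(sample_t, sample_p):       # k = 1..K
            for row_t, row_p in zip(map_t, map_p):
                for t, p in zip(row_t, row_p):
                    total += (t - p) ** 2                   # squared L2 norm, term by term
    return 0.5 * total
```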

Inference:

$\mathbf{\hat{z}}_k = \argmax_\mathbf{p} \mathbf{\hat{S}}_k (\mathbf{p}), \quad k=1,\dots,K$
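
The inference rule simply picks, for each joint, the location with the highest predicted score. A minimal sketch, assuming each score map is a list of rows and locations are reported as `(x, y)` = (column, row); the name `predict_joints` is hypothetical:

```python
def predict_joints(score_maps):
    """Return one (x, y) location per joint: the argmax of that joint's score map."""
    joints = []
    for heatmap in score_maps:  # k = 1..K
        best = max(
            ((x, y) for y, row in enumerate(heatmap) for x in range(len(row))),
            key=lambda p: heatmap[p[1]][p[0]],
        )
        joints.append(best)
    return joints
```

If several locations share the maximum score, this sketch returns the first one in row-major order.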

### Initialization Multi-Branch Networks

### Forward propagation

$\begin{aligned} \mathbf{y}^{(l)} &= \mathbf{W}^{(l)} \sum^{C_i^{(l)}}_{c=1} \mathbf{x}_c^{(l)} + \mathbf{b}^{(l)} \\ \mathbf{x}^{(l+1)} &= f \left(\mathbf{y}^{(l)} \right) \end{aligned}$
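
To make the forward rule concrete, here is a small sketch of one layer with several input branches: the branch inputs are summed element-wise, multiplied by the weight matrix, shifted by the bias, and passed through the non-linearity. Using ReLU for f, plain Python lists for vectors and matrices, and the name `multi_branch_forward` are all assumptions of this sketch.

```python
def relu(v):
    """Element-wise ReLU, used here as an example choice for f."""
    return [max(0.0, a) for a in v]


def multi_branch_forward(branch_inputs, weight, bias, activation=relu):
    """Compute x^(l+1) = f(W * sum_c x_c + b) for one layer with C_i input branches."""
    # Element-wise sum over the input branches x_1 .. x_{C_i}
    summed = [sum(values) for values in zip(*branch_inputs)]
    # y = W * summed + b, one output unit per weight row
    y = [sum(w * s for w, s in zip(row, summed)) + b
         for row, b in zip(weight, bias)]
    return activation(y)
```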

### Backward propagation

### Combined

### Output Variance Accumulation

#### Residual unit

#### Hourglass residual unit

#### Solution

Reprint policy: Unless otherwise stated, all articles on this blog are licensed under CC BY 4.0. If you reproduce this post, please credit the source: Texot.