Paper Reading: Learning Feature Pyramids for Human Pose Estimation


Note: this post is only meant for personal digestion and interpretation. It is incomplete and may mislead readers.

Intuition

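The scale of body parts varies from image to image because of inter-personal body shape variations and foreshortening, so the pose estimator needs features that are robust to scale.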

Contribution

Framework

  • Stacked Hourglass Network as the basic network structure
  • Building block is Pyramid Residual Module (PRM)

Hourglass

  • Within each resolution of the hourglass, a plain residual unit captures features at only one scale
  • PRM captures multi-scale features inside a single building block

Pyramid Residual Module (PRM)

Generation of input feature pyramids

Fractional max-pooling is used to approximate the smoothing and subsampling steps that generate traditional image pyramids.
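To make the idea of a non-integer pooling ratio concrete, here is a minimal 1-D sketch. It is a deterministic simplification, not the paper's implementation: the names `pool_edges` and `fractional_max_pool_1d` and the fixed offset `u = 1/2` are assumptions chosen for illustration, whereas fractional max-pooling proper draws its pooling regions randomly or pseudo-randomly.

```python
from fractions import Fraction
import math


def pool_edges(n_in, n_out, u=Fraction(1, 2)):
    """Window boundaries that shrink n_in samples to n_out (n_out <= n_in).

    Consecutive edges differ by floor(n_in/n_out) or ceil(n_in/n_out),
    so the windows tile the input without gaps or overlaps.
    """
    alpha = Fraction(n_in, n_out)          # pooling ratio, may be non-integer
    offset = math.floor(alpha * u)         # shift so the first edge is 0
    return [math.floor(alpha * (i + u)) - offset for i in range(n_out + 1)]


def fractional_max_pool_1d(x, n_out):
    """Max-pool a list x down to n_out values using the edges above."""
    edges = pool_edges(len(x), n_out)
    return [max(x[a:b]) for a, b in zip(edges, edges[1:])]


# Ratio 6/4 = 1.5: windows of size 2, 1, 2, 1.
pooled = fractional_max_pool_1d([1, 5, 2, 8, 3, 7], 4)   # -> [5, 2, 8, 7]
```

Exact fractions are used instead of floats so that the last edge always lands on `n_in`.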

Discussions

  • PRM can be used as a basic building block for various CNN architectures (Stacked Hourglass, Wide Residual Nets, ResNeXt)
  • Variants
    • Separate input feature maps, fractional max-pooling, convolution, upsampling, summation (PRM-A)
    • Shared input feature maps, fractional max-pooling, convolution, upsampling, summation (PRM-B)
    • Shared input feature maps, fractional max-pooling, convolution, upsampling, concatenation (PRM-C)
    • Shared input feature maps, dilated convolution, summation (PRM-D)
  • Weight sharing
    • Sharing the weights of f_c(·) across different levels of the pyramid
  • Complexity
    • 128-d features within the residual unit instead of the original 256-d
    • Fewer feature channels for branches with smaller scales
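The difference between the shared-input variants that merge by summation and by concatenation can be sketched on a toy 1-D signal. This is only an illustration under stated assumptions: `f` is a placeholder for the per-level convolution, the helper names are hypothetical, deterministic pooling windows stand in for fractional max-pooling, and the identity shortcut in the summation case is assumed from the residual design.

```python
from fractions import Fraction
import math


def downsample(x, n_out):
    """Max-pool x to n_out values with (possibly) non-integer ratio."""
    alpha, u = Fraction(len(x), n_out), Fraction(1, 2)
    off = math.floor(alpha * u)
    edges = [math.floor(alpha * (i + u)) - off for i in range(n_out + 1)]
    return [max(x[a:b]) for a, b in zip(edges, edges[1:])]


def upsample(x, n_out):
    """Nearest-neighbour upsampling back to n_out values."""
    return [x[i * len(x) // n_out] for i in range(n_out)]


def prm_toy(x, ratios, f, merge="sum"):
    """Shared input -> one branch per scale ratio -> merge.

    Each branch: downsample, apply f (placeholder for a convolution),
    upsample to the input length.
    """
    branches = []
    for r in ratios:
        n = max(1, round(len(x) * r))
        branches.append(upsample([f(v) for v in downsample(x, n)], len(x)))
    if merge == "sum":
        pyramid = [sum(vals) for vals in zip(*branches)]
        return [a + b for a, b in zip(x, pyramid)]   # assumed identity shortcut
    return branches                                   # concatenation: keep channels


x = [1, 5, 2, 8]
summed = prm_toy(x, ratios=[1.0, 0.5], f=lambda v: v, merge="sum")
stacked = prm_toy(x, ratios=[1.0, 0.5], f=lambda v: v, merge="concat")
```

With summation the output keeps the input's shape, while concatenation grows the channel count with the number of pyramid levels, so a later layer would have to reduce it again.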

Training

Loss:

$$
\mathcal{L} = \frac{1}{2} \sum_{n=1}^{N} \sum_{k=1}^{K} \left\Vert \mathbf{S}_k - \mathbf{\hat{S}}_k \right\Vert^2
$$
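As a rough illustration of this loss, the sketch below sums the squared differences between ground-truth and predicted score maps over all N samples and K joints. The function name `heatmap_l2_loss` and the nested-list layout `[n][k][y][x]` are assumptions made for the example, not something the paper prescribes.

```python
def heatmap_l2_loss(pred, target):
    """Half the summed squared error between score maps.

    pred, target: nested lists indexed [sample][joint][row][col].
    """
    total = 0.0
    for sample_pred, sample_target in zip(pred, target):
        for map_pred, map_target in zip(sample_pred, sample_target):
            for row_pred, row_target in zip(map_pred, map_target):
                for p, t in zip(row_pred, row_target):
                    total += (t - p) ** 2
    return 0.5 * total


# One sample, one joint, 2x2 maps: the peak is predicted in the wrong cell.
pred = [[[[0.0, 1.0], [0.0, 0.0]]]]
target = [[[[0.0, 0.0], [0.0, 1.0]]]]
loss = heatmap_l2_loss(pred, target)   # -> 1.0
```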

Inference:

$$
\mathbf{\hat{z}}_k = \arg\max_{\mathbf{p}} \mathbf{\hat{S}}_k(\mathbf{p}), \quad k = 1, \dots, K
$$
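The inference rule picks, for each joint, the location with the highest predicted score. A minimal sketch, assuming each score map is a list of rows and returning positions as `(x, y)`; the name `decode_joints` is hypothetical:

```python
def decode_joints(heatmaps):
    """Return one (x, y) location per joint: the argmax of each score map.

    heatmaps: K score maps, each a list of rows (H x W nested lists).
    """
    joints = []
    for score_map in heatmaps:
        best_value, best_xy = float("-inf"), None
        for y, row in enumerate(score_map):
            for x, value in enumerate(row):
                if value > best_value:   # strict '>' keeps the first maximum on ties
                    best_value, best_xy = value, (x, y)
        joints.append(best_xy)
    return joints


heatmaps = [
    [[0.1, 0.9], [0.2, 0.3]],   # joint 0 peaks at row 0, col 1
    [[0.0, 0.0], [0.5, 0.1]],   # joint 1 peaks at row 1, col 0
]
joints = decode_joints(heatmaps)   # -> [(1, 0), (0, 1)]
```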

Initialization of Multi-Branch Networks

Forward propagation

$$
\begin{aligned}
\mathbf{y}^{(l)} &= \mathbf{W}^{(l)} \sum_{c=1}^{C_i^{(l)}} \mathbf{x}_c^{(l)} + \mathbf{b}^{(l)} \\
\mathbf{x}^{(l+1)} &= f\left(\mathbf{y}^{(l)}\right)
\end{aligned}
$$
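Because the input branches are summed before the weights are applied, the number of input branches affects the variance of the output. The simulation below illustrates that effect for a single output unit. It is a sketch under simplifying assumptions that are mine, not the paper's: independent zero-mean Gaussian inputs with unit variance, independent zero-mean Gaussian weights, zero bias, and no activation function. The name `output_variance` is hypothetical.

```python
import random


def output_variance(num_branches, fan_in, weight_var, trials=4000, seed=0):
    """Empirical variance of y = sum_j w_j * (sum_c x_{c,j}), with b = 0."""
    rng = random.Random(seed)
    w_std = weight_var ** 0.5
    samples = []
    for _ in range(trials):
        y = 0.0
        for _ in range(fan_in):
            # Sum the same input position over all branches, then weight it.
            summed_input = sum(rng.gauss(0.0, 1.0) for _ in range(num_branches))
            y += rng.gauss(0.0, w_std) * summed_input
        samples.append(y)
    mean = sum(samples) / trials
    return sum((s - mean) ** 2 for s in samples) / trials


C, n = 4, 16   # assumed number of input branches and inputs per branch
naive = output_variance(C, n, weight_var=1.0 / n)          # ignores the branches
scaled = output_variance(C, n, weight_var=1.0 / (C * n))   # accounts for them
```

Under these assumptions the expected variance is `C * n * weight_var`, so `naive` comes out near 4 and `scaled` near 1. This shows only why the branch count has to enter the weight variance; it does not reproduce the initialization derived in the paper.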

Thus

Backward propagation

Thus

Combined

Output Variance Accumulation

Residual unit

Hourglass residual unit

Solution


Author: Texot
Reprint policy: Unless otherwise stated, all articles on this blog are licensed under CC BY 4.0. If you reproduce this article, please credit the source: Texot.