Paper Reading: Human Pose Regression by Combining Indirect Part Detection and Contextual Information


Note: this post is only meant for personal digestion and interpretation. It is incomplete and may mislead readers.

Diogo C. Luvizon, Hedi Tabia, David Picard

ETIS Lab., UMR 8051, Université Paris Seine, Université Cergy-Pontoise, ENSEA, CNRS.

Soft-argmax

$$\Phi(\mathbf{h}_{i,j}) = \frac{e^{\mathbf{h}_{i,j}}}{\sum^W_{k=1} \sum^H_{l=1} e^{\mathbf{h}_{k,l}}}$$

$$\Psi_d(\mathbf{h}) = \sum^W_{i=1} \sum^H_{j=1} \mathbf{W}_{i,j,d} \, \Phi(\mathbf{h}_{i,j})$$

$$\mathbf{W}_{i,j,x} = \frac{i}{W}, \quad \mathbf{W}_{i,j,y} = \frac{j}{H}$$

Predicted joint location from a detection heat map (or a context heat map):

$$\mathbf{y} = (\Psi_x(\mathbf{h}), \Psi_y(\mathbf{h}))^T$$
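To check my reading of the soft-argmax definitions, here is a minimal NumPy sketch for a single heat map. It is an illustration, not the authors' code. The function name `soft_argmax`, the `(H, W)` array layout with rows as y and columns as x, and the max-subtraction for numerical stability are all my own assumptions.

```python
import numpy as np

def soft_argmax(h):
    """Soft-argmax of one heat map h with shape (H, W).

    Assumes rows index j (the y axis) and columns index i (the x axis).
    Returns (x, y) in normalized coordinates, using the weights
    W_{i,j,x} = i / W and W_{i,j,y} = j / H with 1-based i and j.
    """
    H, W = h.shape
    # Spatial softmax Phi; subtracting the max does not change the result,
    # it only avoids overflow in exp (an implementation choice of this sketch).
    e = np.exp(h - h.max())
    phi = e / e.sum()
    # Coordinate weights along each axis.
    wx = np.arange(1, W + 1) / W
    wy = np.arange(1, H + 1) / H
    # Psi_x and Psi_y: expected coordinate under the softmax distribution.
    x = (phi.sum(axis=0) * wx).sum()
    y = (phi.sum(axis=1) * wy).sum()
    return np.array([x, y])
```

For a 4x8 map with one very sharp peak at row 2, column 5 (0-based), this returns approximately (6/8, 3/4) = (0.75, 0.75), which is the argmax position expressed in the normalized 1-based coordinates above.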

Joint Probability

The probability that joint n is present is estimated by a sigmoid activation applied to the global max-pooling of heat map $\mathbf{h}_n$.
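A minimal sketch of this step, assuming a single heat map stored as a NumPy array. The name `joint_probability` is hypothetical, chosen only for this note.

```python
import numpy as np

def joint_probability(h):
    """Sigmoid of the global maximum of one heat map h (any 2-D shape)."""
    # Global max-pooling: keep only the strongest response in the map.
    m = h.max()
    # Sigmoid squashes that response into (0, 1).
    return 1.0 / (1.0 + np.exp(-m))
```

An all-zero heat map gives 0.5, and a map with a strong positive peak gives a value close to 1.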

Detection and context aggregation

$$\mathbf{y}_n = \alpha \mathbf{y}_n^d + (1-\alpha) \frac{\sum^{N_c}_{i=1} \mathbf{p}^c_{i,n} \mathbf{y}^c_{i,n}}{\sum^{N_c}_{i=1} \mathbf{p}^c_{i,n}}$$

where $\mathbf{y}_n^d = \textit{Soft-argmax}(\mathbf{h}_n^d)$ and $\mathbf{y}_{i,n}^c = \textit{Soft-argmax}(\mathbf{h}_{i,n}^c)$
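The aggregation is a convex combination of the detection prediction and a probability-weighted mean of the context predictions. A small sketch for one joint n follows. The function name `aggregate`, the array shapes, and the example values are assumptions made for illustration only.

```python
import numpy as np

def aggregate(y_d, y_c, p_c, alpha):
    """Combine detection and context predictions for one joint.

    y_d:   (2,)      location from the detection heat map
    y_c:   (N_c, 2)  locations from the N_c context heat maps
    p_c:   (N_c,)    probabilities of the context heat maps
    alpha: scalar weight given to the detection prediction
    """
    y_d = np.asarray(y_d, dtype=float)
    y_c = np.asarray(y_c, dtype=float)
    p_c = np.asarray(p_c, dtype=float)
    # Probability-weighted average of the context predictions.
    context = (p_c[:, None] * y_c).sum(axis=0) / p_c.sum()
    return alpha * y_d + (1.0 - alpha) * context
```

For example, with `y_d = [0.4, 0.4]`, `y_c = [[0.2, 0.4], [0.6, 0.8]]`, `p_c = [0.25, 0.75]` and an arbitrary `alpha = 0.5`, the weighted context location is (0.5, 0.7) and the aggregated result is (0.45, 0.55).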

Training

Joint Regression Loss

$$L_{\mathbf{y}} = \frac{1}{N_J} \sum^{N_J}_{n=1} \lVert \mathbf{y}_n - \hat{\mathbf{y}}_n \rVert_1 + \lVert \mathbf{y}_n - \hat{\mathbf{y}}_n \rVert^2_2$$
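A short sketch of this loss in NumPy, assuming both norms sit inside the sum over joints and that coordinates are stored as `(N_J, 2)` arrays. The function and argument names are hypothetical.

```python
import numpy as np

def joint_regression_loss(y, y_hat):
    """Mean over joints of the L1 norm plus the squared L2 norm of the error.

    y, y_hat: arrays of shape (N_J, 2) holding joint coordinates.
    """
    diff = np.asarray(y, dtype=float) - np.asarray(y_hat, dtype=float)
    l1 = np.abs(diff).sum(axis=1)        # ||y_n - y_hat_n||_1 per joint
    l2_sq = (diff ** 2).sum(axis=1)      # ||y_n - y_hat_n||_2^2 per joint
    return (l1 + l2_sq).mean()           # average over the N_J joints
```

With two joints, one predicted exactly and one off by 0.5 along x, the per-joint terms are 0 and 0.5 + 0.25, so the loss is 0.375.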

Joint Probability Estimation

$$L_{\mathbf{p}} = \frac{1}{N_J} \sum^{N_J}_{n=1} \left[ (\mathbf{p}_n - 1) \log (1-\hat{\mathbf{p}}_n) - \mathbf{p}_n \log \hat{\mathbf{p}}_n \right]$$
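This has the form of binary cross-entropy averaged over joints. Below is a minimal sketch, assuming p_n is the ground-truth label and p̂_n the prediction. The `eps` clipping is my own addition to keep the logarithms finite and is not part of the formula.

```python
import numpy as np

def joint_probability_loss(p, p_hat, eps=1e-7):
    """Binary cross-entropy between labels p and predictions p_hat.

    p, p_hat: arrays of shape (N_J,). eps is a hypothetical safety margin
    that keeps log() away from 0; it is not in the written formula.
    """
    p = np.asarray(p, dtype=float)
    p_hat = np.clip(np.asarray(p_hat, dtype=float), eps, 1.0 - eps)
    # (p - 1) * log(1 - p_hat) - p * log(p_hat), term by term as written.
    terms = (p - 1.0) * np.log(1.0 - p_hat) - p * np.log(p_hat)
    return terms.mean()
```

With labels `[1, 0]` and predictions `[0.5, 0.5]`, every term equals log 2, so the loss is about 0.693.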


Author: Texot
Reprint policy: Unless otherwise stated, all articles on this blog are licensed under CC BY 4.0. If you reproduce this article, please credit the source: Texot.