Note: this post is only meant for personal digestion and interpretation. It is incomplete and may mislead readers.
- Predict the location and scale of the person box (Faster R-CNN detector)
- Estimate keypoints
- Detecting keypoints: ResNet
- To combine the outputs, introduce an aggregation procedure that yields highly localized predictions
- Use keypoint-based NMS instead of cruder box-level NMS
- Use keypoint-based confidence score estimation instead of box-level scoring
- Top-down approach
- Faster R-CNN on top of a ResNet-101 CNN, as in J. Huang et al.
- ResNet: predict activation heatmaps and offsets for each keypoint, similar to L. Pishchulin et al. and E. Insafutdinov et al., then combine the two outputs using a novel form of heatmap-offset aggregation
- Avoid duplicate pose detections by keypoint-based NMS
- Propose a keypoint-based confidence score estimator rather than using Faster R-CNN box scores
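A minimal sketch of how the top-down stages listed above connect. The detector, pose estimator and NMS routine are passed in as hypothetical callables (`detect_people`, `estimate_pose` and `nms` are illustrative names, not the paper's code); the point is only the data flow, and that the pose stage, not the box detector, supplies the final score.

```python
from typing import Callable, List, Sequence, Tuple

Box = Tuple[float, float, float, float]   # (x0, y0, x1, y1), hypothetical box format
Pose = List[Tuple[float, float]]          # one (x, y) per keypoint

def top_down_pose(
    image,
    detect_people: Callable[[object], Sequence[Box]],
    estimate_pose: Callable[[object, Box], Tuple[Pose, float]],
    nms: Callable[[List[Pose], List[float]], List[int]],
) -> List[Tuple[Pose, float]]:
    """Stage 1: person boxes. Stage 2: per-box pose plus keypoint-based score. Then pose-level NMS."""
    poses, scores = [], []
    for box in detect_people(image):
        # The detector's box confidence is deliberately ignored here;
        # the pose stage returns its own keypoint-based score.
        pose, score = estimate_pose(image, box)
        poses.append(pose)
        scores.append(score)
    keep = nms(poses, scores)              # indices of poses that survive suppression
    return [(poses[i], scores[i]) for i in keep]
```

Injecting the stages as callables keeps the sketch independent of any particular detector or network.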
J. Huang, V. Rathod, C. Sun, M. Zhu, A. Korattikara, A. Fathi, I. Fischer, Z. Wojna, Y. Song, S. Guadarrama, et al. Speed/accuracy trade-offs for modern convolutional object detectors. arXiv:1611.10012, 2016.
L. Pishchulin, E. Insafutdinov, S. Tang, B. Andres, M. Andriluka, P. Gehler, and B. Schiele. DeepCut: Joint subset partition and labeling for multi person pose estimation. In CVPR, 2016.
E. Insafutdinov, L. Pishchulin, B. Andres, M. Andriluka, and B. Schiele. DeeperCut: A deeper, stronger, and faster multi-person pose estimation model. In ECCV, 2016.
This part of the paper is comprehensive; a great overview.
A. Bulat and G. Tzimiropoulos. Human pose estimation via convolutional part heatmap regression. In ECCV, 2016.
V. Belagiannis and A. Zisserman. Recurrent human pose estimation. arXiv, 2016.
- Infer part relationships
- Infer pairwise joint locations
G. Gkioxari, A. Toshev, and N. Jaitly. Chained predictions using convolutional neural networks. In ECCV, 2016.
- Inspired by work on sequence-to-sequence models
- Parts are predicted sequentially rather than independently
- Each part is conditioned on the previously predicted parts
Person Box Detection
Person Pose Estimation
- Keypoint Disk Heatmap
- Offset Field
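A NumPy sketch of the two training targets, using our own notation: for keypoint k at position l_k, the disk heatmap is 1 at grid points x_i with ||x_i - l_k|| <= R and 0 elsewhere, and the offset field stores the vector l_k - x_i pointing from each grid point to the keypoint. Grid coordinates (rather than image pixels or output stride) are an assumption made to keep the example small.

```python
import numpy as np

def make_targets(keypoints, height, width, radius):
    """Build disk-heatmap and offset-field targets on an H x W grid.

    keypoints: (K, 2) array of (x, y) keypoint positions in grid coordinates.
    Returns heat (K, H, W): 1 inside each keypoint's disk of radius R, else 0,
    and offsets (K, H, W, 2): the vector l_k - x_i at every grid point
    (only meaningful inside the disk).
    """
    kps = np.asarray(keypoints, dtype=float)
    ys, xs = np.mgrid[0:height, 0:width]
    grid = np.stack([xs, ys], axis=-1).astype(float)   # (H, W, 2) in (x, y) order
    offsets = kps[:, None, None, :] - grid[None]       # (K, H, W, 2)
    dist = np.linalg.norm(offsets, axis=-1)            # (K, H, W)
    heat = (dist <= radius).astype(float)
    return heat, offsets
```

For a keypoint at (x, y) = (2, 3) with R = 1, `heat[0]` marks the keypoint cell and its four direct neighbours.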
Keypoint Disk Heatmap
Hough voting: each point j in the image crop grid casts a vote with its estimate for the position of every keypoint, with the vote weighted by the probability that point j lies in the disk of influence of the corresponding keypoint
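A NumPy sketch of the voting step for a single keypoint. Assumptions (our notation and our reading of the aggregation, not the paper's code): `heat` holds the disk probabilities h_k(x_j) and `offsets` the predicted vectors F_k(x_j). Each grid point j votes at x_j + F_k(x_j) with weight h_k(x_j) / (πR²), and the vote is spread over the four nearest cells with bilinear weights. Treat this as an approximation of the method rather than a reference implementation.

```python
import numpy as np

def hough_vote(heat, offsets, radius):
    """Aggregate one keypoint's heatmap and offset field into a score map.

    heat: (H, W) disk probabilities; offsets: (H, W, 2) vectors as (dx, dy).
    Votes landing outside the crop are dropped.
    """
    H, W = heat.shape
    out = np.zeros((H, W))
    ys, xs = np.mgrid[0:H, 0:W]
    tx = xs + offsets[..., 0]                 # voted x position of each grid point
    ty = ys + offsets[..., 1]                 # voted y position of each grid point
    w = heat / (np.pi * radius ** 2)          # vote weight
    x0 = np.floor(tx).astype(int)
    y0 = np.floor(ty).astype(int)
    fx, fy = tx - x0, ty - y0                 # fractional parts for bilinear weights
    for dx, dy, bw in ((0, 0, (1 - fx) * (1 - fy)), (1, 0, fx * (1 - fy)),
                       (0, 1, (1 - fx) * fy), (1, 1, fx * fy)):
        cx, cy = x0 + dx, y0 + dy
        ok = (cx >= 0) & (cx < W) & (cy >= 0) & (cy < H)
        # np.add.at accumulates correctly when several votes hit the same cell.
        np.add.at(out, (cy[ok], cx[ok]), (w * bw)[ok])
    return out
```

The keypoint estimate is then the peak of the score map: `np.unravel_index(np.argmax(score), score.shape)` returns `(row, col)`, i.e. `(y, x)`. Using `np.add.at` rather than `out[cy, cx] += ...` matters, because plain fancy-index assignment would keep only one of several votes that land on the same cell.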
OKS-Based Non-Maximum Suppression
For person detections, use OKS (object keypoint similarity) between the predicted poses instead of box IoU to eliminate duplicate detections
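A sketch of greedy NMS with OKS as the overlap measure. Assumptions made for illustration: `area` plays the role of s² (object scale squared), a single falloff constant `kappa` is shared by all keypoints (the COCO benchmark uses per-keypoint constants), every keypoint counts as visible, and the 0.5 suppression threshold is arbitrary.

```python
import numpy as np

def oks(pose_a, pose_b, area, kappa=0.1):
    """Object keypoint similarity between two (K, 2) poses: mean of exp(-d^2 / (2 s^2 kappa^2))."""
    d2 = np.sum((np.asarray(pose_a, float) - np.asarray(pose_b, float)) ** 2, axis=-1)
    return float(np.mean(np.exp(-d2 / (2.0 * area * kappa ** 2))))

def oks_nms(poses, scores, areas, thresh=0.5):
    """Greedy NMS: keep the best-scoring pose, drop the rest whose OKS with it exceeds thresh."""
    order = list(np.argsort(scores)[::-1])     # indices by descending score
    keep = []
    while order:
        i = order.pop(0)
        keep.append(int(i))
        order = [j for j in order if oks(poses[i], poses[j], areas[i]) <= thresh]
    return keep
```

Two near-identical poses have OKS close to 1 and the lower-scoring one is suppressed, while two people whose boxes overlap but whose keypoints differ have low OKS and both survive. That second case is where box IoU would wrongly suppress a detection.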
Heatmap loss: sum of logistic losses, computed separately for each position and keypoint
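A NumPy sketch of that heatmap loss, treating each (keypoint, position) pair as an independent binary classification against the 0/1 disk target. It uses the standard numerically stable form of logistic loss on logits. Summing without normalisation follows the note above. Any loss weighting used in the paper is not modelled here.

```python
import numpy as np

def heatmap_loss(logits, targets):
    """Sum of per-position, per-keypoint logistic (binary cross-entropy) losses.

    logits, targets: (K, H, W); targets are the 0/1 disk maps.
    Stable form: max(z, 0) - z * t + log(1 + exp(-|z|)).
    """
    z = np.asarray(logits, float)
    t = np.asarray(targets, float)
    return float(np.sum(np.maximum(z, 0) - z * t + np.log1p(np.exp(-np.abs(z)))))
```

With all-zero logits every term equals log 2, so a (2, 3, 3) map gives 18 log 2 regardless of the targets.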
Offset loss: H is the Huber robust loss, applied to the offset prediction error
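A NumPy sketch of the offset loss. This is our reading of it: H is applied to the norm of the error between the predicted offset F_k(x_i) and the target l_k - x_i, only at positions inside keypoint k's disk. The Huber threshold `delta = 1.0` is an assumption.

```python
import numpy as np

def huber(x, delta=1.0):
    """Huber penalty: 0.5 * x^2 for |x| <= delta, linear delta * (|x| - 0.5 * delta) beyond."""
    a = np.abs(x)
    return np.where(a <= delta, 0.5 * a ** 2, delta * (a - 0.5 * delta))

def offset_loss(pred, target, disk, delta=1.0):
    """Sum of H(||F_k(x_i) - (l_k - x_i)||) over positions inside each keypoint's disk.

    pred, target: (K, H, W, 2) offset vectors; disk: (K, H, W) 0/1 mask.
    """
    err = np.linalg.norm(np.asarray(pred, float) - np.asarray(target, float), axis=-1)
    return float(np.sum(huber(err, delta) * disk))
```

The linear tail is what makes the loss robust: a single badly predicted offset of length 5 costs 4.5 rather than 12.5 under a squared loss.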
At test time, do not use the person detector's (bounding-box) confidence as the pose score; use a keypoint-based confidence score instead
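A NumPy sketch of one simple keypoint-based score: the mean, over keypoints, of each keypoint map's peak activation. We believe this matches the paper's estimator, but treat the exact formula as an assumption. The input maps could be the raw heatmaps or the aggregated Hough score maps.

```python
import numpy as np

def keypoint_score(maps):
    """Pose confidence from keypoint evidence alone: mean over keypoints of each map's maximum.

    maps: (K, H, W) per-keypoint activation maps for one detected person.
    The person detector's box score is not used.
    """
    m = np.asarray(maps, float)
    return float(m.reshape(m.shape[0], -1).max(axis=1).mean())
```

A person whose keypoints are only weakly localized gets a low score even if the box detector was confident, which is the motivation for not reusing the box score.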