Note: this post is only meant for personal digestion and interpretation. It is incomplete and may mislead readers.
Network

Spatio-temporal inference layer
Modeling



Inference



Solved via generalized distance transform
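For my own reference, here is a minimal sketch of the one-dimensional generalized distance transform (the lower-envelope-of-parabolas algorithm of Felzenszwalb and Huttenlocher). It assumes a pairwise cost of exactly (p - q)^2 with unit weight and a min-sum formulation. The inference layer in the paper may use a max-sum form, learned quadratic weights, and 2D heatmaps (handled by running the 1D pass over rows and then columns), so treat the function name `distance_transform_1d` and the setup as illustrative assumptions rather than the paper's implementation.

```python
import math

def distance_transform_1d(f):
    """Return (d, arg) where d[p] = min_q f[q] + (p - q)**2 and arg[p] is the minimizing q.

    Assumes every f[q] is finite. Runs in O(n) instead of the O(n^2) brute force.
    """
    n = len(f)
    if n == 0:
        return [], []
    v = [0] * n            # grid positions of the parabolas in the lower envelope
    z = [0.0] * (n + 1)    # boundaries between consecutive envelope parabolas
    k = 0                  # index of the rightmost parabola in the envelope
    z[0], z[1] = -math.inf, math.inf
    for q in range(1, n):
        while True:
            p = v[k]
            # Intersection of the parabola rooted at q with the one rooted at p.
            s = ((f[q] + q * q) - (f[p] + p * p)) / (2 * q - 2 * p)
            if s <= z[k]:
                k -= 1     # parabola p is hidden by q, drop it and retry
            else:
                break
        k += 1
        v[k] = q
        z[k], z[k + 1] = s, math.inf
    d, arg = [0.0] * n, [0] * n
    k = 0
    for p in range(n):
        while z[k + 1] < p:
            k += 1         # advance to the envelope segment that contains p
        d[p] = f[v[k]] + (p - v[k]) ** 2
        arg[p] = v[k]
    return d, arg

def brute_force(f):
    """O(n^2) reference used only to check the fast version."""
    return [min(f[q] + (p - q) ** 2 for q in range(len(f))) for p in range(len(f))]

unary = [4.0, 1.0, 7.0, 0.5, 9.0, 3.0]   # made-up unary costs, not data from the paper
d, arg = distance_transform_1d(unary)
print(d, arg)
print(brute_force(unary))
```

In a message-passing setting, `unary` would play the role of one part's cost map plus incoming messages, `d` is the message sent to the neighbouring part, and `arg` records the best location of the sending part for each location of the receiver.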
Learning
- Training fully convolutional layers
- Joint training with flow warping and inference layers
Result Analysis
- (A fact about the body, not a model feature) Elbows and wrists are the most flexible joints of the body. This flexibility yields configurations with very large variation, and these joints are also prone to being occluded by other body parts.
- Note that with spatial inference only, shoulder predictions can be negatively influenced by the messages they send to or receive from the elbows.