Note: this post is only meant for personal digestion and interpretation. It is incomplete and may mislead readers.

## Network

**Spatio-temporal inference layer**

### Modeling

### Inference

Solved via the generalized distance transform.
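The note only names the technique, so here is a minimal sketch of the classic squared-distance version of the generalized distance transform, using Felzenszwalb and Huttenlocher's lower-envelope-of-parabolas algorithm, in Python/NumPy. It assumes a unit-weight quadratic deformation cost and finite inputs. The paper's actual deformation weights and anchor offsets are not reproduced, and the helper names `dt1d` and `dt2d` are hypothetical.

```python
import numpy as np

INF = float("inf")


def dt1d(f):
    """1D squared-distance transform: d[p] = min_q ((p - q)**2 + f[q]).

    Runs in O(n) by keeping the lower envelope of the parabolas rooted at each q.
    Assumes every f[q] is finite.
    """
    n = len(f)
    d = np.empty(n)
    v = np.zeros(n, dtype=int)  # positions of the parabolas forming the lower envelope
    z = np.empty(n + 1)         # boundaries between consecutive envelope parabolas
    k = 0
    z[0], z[1] = -INF, INF
    for q in range(1, n):
        # Intersection of the parabola rooted at q with the rightmost envelope parabola.
        s = ((f[q] + q * q) - (f[v[k]] + v[k] * v[k])) / (2 * q - 2 * v[k])
        while s <= z[k]:
            # The parabola at v[k] is hidden by the new one; drop it and retry.
            k -= 1
            s = ((f[q] + q * q) - (f[v[k]] + v[k] * v[k])) / (2 * q - 2 * v[k])
        k += 1
        v[k] = q
        z[k] = s
        z[k + 1] = INF
    k = 0
    for p in range(n):
        # Advance to the envelope parabola that covers position p.
        while z[k + 1] < p:
            k += 1
        d[p] = (p - v[k]) ** 2 + f[v[k]]
    return d


def dt2d(f):
    """Separable 2D transform: min over (y', x') of (y - y')**2 + (x - x')**2 + f[y', x']."""
    tmp = np.apply_along_axis(dt1d, 0, np.asarray(f, dtype=float))  # along columns
    return np.apply_along_axis(dt1d, 1, tmp)                        # then along rows
```

For a score map `s` where higher is better, the max-form message $\max_q \left( s(q) - \left\Vert p - q \right\Vert^2 \right)$ can be obtained as `-dt2d(-s)`, which is how a distance transform typically plugs into max-product inference over a tree of joints.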

## Learning

- Training fully convolutional layers
- Joint training with flow warping and inference layers

### Training fully convolutional layers

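$f = \sum_{i=1}^K \sum_p \left\Vert b^i (p) - b_{*}^i (p) \right\Vert^2$

where $b^i(p)$ is the predicted belief map of joint $i$ at location $p$ and $b_*^i(p)$ is the corresponding ideal (ground-truth) belief map.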
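A minimal NumPy sketch of this squared loss, assuming the $K$ belief maps are stored as an array of shape `(K, H, W)`. How the ideal maps $b_*^i$ are built is an assumption here: `gaussian_belief_maps` is a hypothetical helper that places one Gaussian peak of width `sigma` at each ground-truth joint `(x, y)`.

```python
import numpy as np


def gaussian_belief_maps(joints, height, width, sigma=2.0):
    """Hypothetical ideal belief maps: one Gaussian peak per joint at (x, y), shape (K, H, W)."""
    ys, xs = np.mgrid[0:height, 0:width]
    maps = [np.exp(-((xs - x) ** 2 + (ys - y) ** 2) / (2 * sigma ** 2)) for x, y in joints]
    return np.stack(maps)


def belief_map_l2_loss(b, b_star):
    """f = sum_i sum_p ||b^i(p) - b_*^i(p)||^2, summed over all K maps and all pixels p."""
    return float(np.sum((np.asarray(b) - np.asarray(b_star)) ** 2))
```

Summing rather than averaging matches the formula as written; a mean would only rescale the gradient.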

### Joint training with flow warping and inference layers

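$f = \sum_{i=1}^K \sum_p \max \left( 0, 1-b^i (p) \cdot I^i (p) \right)$

where $I^i(p) \in \{-1, +1\}$ labels whether location $p$ belongs to joint $i$, so this is a per-pixel hinge loss on the belief maps.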
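A minimal NumPy sketch of the per-pixel hinge loss, assuming belief maps and labels share the shape `(K, H, W)` and labels take values in $\{-1, +1\}$. How the labels are produced is an assumption: `disk_labels` is a hypothetical helper that marks pixels within `radius` of the ground-truth joint as $+1$ and everything else as $-1$.

```python
import numpy as np


def disk_labels(joints, height, width, radius=3.0):
    """Hypothetical labels I^i(p): +1 within `radius` of joint (x, y), -1 elsewhere."""
    ys, xs = np.mgrid[0:height, 0:width]
    labels = [np.where((xs - x) ** 2 + (ys - y) ** 2 <= radius ** 2, 1.0, -1.0) for x, y in joints]
    return np.stack(labels)


def belief_map_hinge_loss(b, labels):
    """f = sum_i sum_p max(0, 1 - b^i(p) * I^i(p)); pixels with margin >= 1 contribute 0."""
    return float(np.sum(np.maximum(0.0, 1.0 - np.asarray(b) * np.asarray(labels))))
```

Unlike the squared loss, the hinge loss stops penalizing a pixel once its score is on the correct side of the margin, so confident predictions are not pulled toward a fixed target value.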

## Result Analysis

- (Fact, not a model feature) Elbows and wrists are the most flexible joints of the human body. This flexibility yields configurations with very large variation, and these joints are also prone to being occluded by other body parts.
- Predictions for shoulders can be negatively influenced by sending messages to, or receiving messages from, the elbows when inference is purely spatial.