Note: this post is only meant for personal digestion and interpretation. It is incomplete and may mislead readers.

This work introduces a relation module, built on the attention mechanism (scaled dot-product attention), to model the location relations between objects.

Han Hu, Jiayuan Gu, Zheng Zhang, Jifeng Dai, Yichen Wei

# Scaled Dot-Product Attention

From [1], the output of scaled dot-product attention for a query $q$ against keys $K$ and values $V$ (with key dimension $d_k$) is

$$v^{out} = \mathrm{softmax}\!\left(\frac{q K^t}{\sqrt{d_k}}\right) V$$

[1] A. Vaswani, N. Shazeer, N. Parmar, J. Uszkoreit, L. Jones, A. N. Gomez, L. Kaiser, and I. Polosukhin. Attention is all you need. arXiv preprint arXiv:1706.03762, 2017.
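As a quick illustration, here is a minimal NumPy sketch of scaled dot-product attention (function and variable names are my own, not from the paper):

```python
import numpy as np

def scaled_dot_product_attention(Q, K, V):
    """softmax(Q K^T / sqrt(d_k)) V, as in [1].

    Q: (n_q, d_k) queries, K: (n_k, d_k) keys, V: (n_k, d_v) values.
    """
    d_k = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)                # (n_q, n_k) similarity logits
    scores -= scores.max(axis=-1, keepdims=True)   # subtract max for numerical stability
    weights = np.exp(scores)
    weights /= weights.sum(axis=-1, keepdims=True) # row-wise softmax
    return weights @ V                             # convex combination of value rows
```

Each output row is a convex combination of the value rows, weighted by query-key similarity.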

# Relation Module

Given an input set of $N$ objects $\{(\mathbf{f}_A^n, \mathbf{f}_G^n)\}^N_{n=1}$:

*geometric feature*: $\mathbf{f}_G$ (4-dimensional object bounding box)

*appearance feature*: $\mathbf{f}_A$ (task dependent)

**Relation Module**:

The relation feature $\mathbf{f}_R(n)$ of the whole object set with respect to the $n^{th}$ object is

$$\mathbf{f}_R(n) = \sum_m \omega^{mn} \cdot (W_V \cdot \mathbf{f}_A^m)$$

where the relation weight $\omega^{mn}$ indicates the impact of object $m$ on object $n$:

$$\omega^{mn} = \frac{\omega_G^{mn} \cdot \exp(\omega_A^{mn})}{\sum_k \omega_G^{kn} \cdot \exp(\omega_A^{kn})}$$

($\omega_G^{mn}$ is extra compared to the attention model in [1]), where the appearance weight is

$$\omega_A^{mn} = \frac{\mathrm{dot}(W_K \mathbf{f}_A^m,\, W_Q \mathbf{f}_A^n)}{\sqrt{d_k}}$$

and the geometry weight is

$$\omega_G^{mn} = \max\{0,\, W_G \cdot \mathcal{E}_G(\mathbf{f}_G^m, \mathbf{f}_G^n)\}$$

where $\mathcal{E}_G$ embeds the 4-dimensional relative geometry of the two boxes into a high-dimensional representation, and the zero-trimming (ReLU) restricts relations to objects with certain geometric relationships.

# Duplicate removal network

Duplicate removal is a two-class **classification** problem. For each ground truth object, only one detected object matched to it is classified as *correct*; all other detections matched to it are classified as *duplicate*. This classification is performed **via a network**, which outputs a binary classification probability $s_1 \in [0, 1]$ (1 for *correct* and 0 for *duplicate*). The multiplication of the two scores, $s_0 s_1$, is the **final** classification score.
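The label-assignment rule above (one *correct* detection per ground truth) can be sketched as follows; the IoU threshold, corner-format boxes, and all names here are my assumptions, and the score used for picking the top match is the detection score:

```python
import numpy as np

def iou(box, boxes):
    """IoU of one box against an array of boxes; boxes are (x1, y1, x2, y2)."""
    x1 = np.maximum(box[0], boxes[:, 0])
    y1 = np.maximum(box[1], boxes[:, 1])
    x2 = np.minimum(box[2], boxes[:, 2])
    y2 = np.minimum(box[3], boxes[:, 3])
    inter = np.clip(x2 - x1, 0, None) * np.clip(y2 - y1, 0, None)
    area_a = (box[2] - box[0]) * (box[3] - box[1])
    area_b = (boxes[:, 2] - boxes[:, 0]) * (boxes[:, 3] - boxes[:, 1])
    return inter / (area_a + area_b - inter)

def duplicate_labels(det_boxes, det_scores, gt_boxes, thr=0.5):
    """Label 1 (*correct*) only the top-scoring detection matched to each
    ground truth (IoU >= thr); every other matched detection is 0 (*duplicate*)."""
    labels = np.zeros(len(det_boxes), dtype=int)
    for gt in gt_boxes:
        matched = np.where(iou(gt, det_boxes) >= thr)[0]
        if matched.size:
            labels[matched[np.argmax(det_scores[matched])]] = 1
    return labels
```

Unmatched detections also receive label 0, so the positive class is sparse by construction.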