# Paper Reading: Relation Networks for Object Detection

Note: this post is only meant for personal digestion and interpretation. It is incomplete and may mislead readers.

This work introduces a relation module based on the attention mechanism (Scaled Dot-Product Attention) to model the appearance and geometric relations between objects.

Authors: Han Hu, Jiayuan Gu, Zheng Zhang, Jifeng Dai, Yichen Wei

# Scaled Dot-Product Attention

Given a query $\mathbf{q}$, keys $K$ (of dimension $d_k$), and values $V$, the attention output is

$$v^{out} = \operatorname{softmax}\!\left(\frac{\mathbf{q}K^T}{\sqrt{d_k}}\right) V$$

from

[1] A. Vaswani, N. Shazeer, N. Parmar, J. Uszkoreit, L. Jones, A. N. Gomez, L. Kaiser, and I. Polosukhin. Attention is all you need. arXiv preprint arXiv:1706.03762, 2017.
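A minimal NumPy sketch of scaled dot-product attention as defined in [1]; the function name and shapes are my own choices:

```python
import numpy as np

def scaled_dot_product_attention(Q, K, V):
    """Attention(Q, K, V) = softmax(Q K^T / sqrt(d_k)) V.
    Q: (n_q, d_k) queries, K: (n_k, d_k) keys, V: (n_k, d_v) values."""
    d_k = K.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)               # (n_q, n_k)
    scores -= scores.max(axis=-1, keepdims=True)  # numerical stability
    weights = np.exp(scores)
    weights /= weights.sum(axis=-1, keepdims=True)
    return weights @ V                            # (n_q, d_v)
```

Each output row is a convex combination of the value rows, with weights given by how well the query matches each key.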

# Relation Module

Given input set of $N$ objects $\{(\mathbf{f}_A^n, \mathbf{f}_G^n)\}^N_{n=1}$

geometric feature: $\mathbf{f}_G$ (4-dimensional object bounding box)

appearance feature: $\mathbf{f}_A$ (task dependent)

Relation Module:

The relation feature $\mathbf{f}_R(n)$ of the whole object set with respect to the $n^{th}$ object is

$$\mathbf{f}_R(n) = \sum_m w^{mn} \cdot (W_V \cdot \mathbf{f}_A^m)$$

where the relation weight $w^{mn}$ indicates the impact of object $m$ on object $n$:

$$w^{mn} = \frac{w_G^{mn} \cdot \exp(w_A^{mn})}{\sum_k w_G^{kn} \cdot \exp(w_A^{kn})}$$

($w_G^{mn}$ is extra compared to the attention model in [1])

where the appearance weight is a scaled dot product of projected appearance features:

$$w_A^{mn} = \frac{\operatorname{dot}(W_K \mathbf{f}_A^m,\, W_Q \mathbf{f}_A^n)}{\sqrt{d_k}}$$

and where the geometry weight is computed from the relative geometry of the two boxes:

$$w_G^{mn} = \max\{0,\, W_G \cdot \mathcal{E}_G(\mathbf{f}_G^m, \mathbf{f}_G^n)\}$$

where $\mathcal{E}_G$ embeds the translation- and scale-invariant relative geometry features $\left(\log\frac{|x_m - x_n|}{w_m},\, \log\frac{|y_m - y_n|}{h_m},\, \log\frac{w_n}{w_m},\, \log\frac{h_n}{h_m}\right)$ into a high-dimensional representation, as in [1].

# Duplicate removal network

Duplicate removal is a two-class classification problem. For each ground-truth object, only one detected object matched to it is classified as correct; the others matched to it are classified as duplicate. This classification is performed by a network that outputs a binary classification probability $s_1 \in [0, 1]$ (1 for correct, 0 for duplicate). The product of the two scores, $s_0 s_1$, is the final classification score.
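A toy illustration of the score multiplication, with made-up values: even a high classification score $s_0$ is suppressed when the removal network judges the detection a duplicate ($s_1$ near 0):

```python
import numpy as np

# Three detections matched to the same ground-truth object (assumed values).
s0 = np.array([0.90, 0.85, 0.30])  # detector classification scores
s1 = np.array([0.95, 0.10, 0.80])  # removal network: 1 ~ correct, 0 ~ duplicate

final = s0 * s1                    # final classification scores
best = int(np.argmax(final))       # only the first detection survives
```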
