Paper Reading: Relation Networks for Object Detection

Note: this post is only meant for personal digestion and interpretation. It is incomplete and may mislead readers.

This work introduces an object relation module, built on the attention mechanism (Scaled Dot-Product Attention), to model the relations between objects, including their relative locations.

Han Hu, Jiayuan Gu, Zheng Zhang, Jifeng Dai, Yichen Wei

Scaled Dot-Product Attention
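
As a reminder, the scaled dot-product attention of [1] takes a query $q$, keys $K$ and values $V$, with key dimension $d_k$, and outputs

$$v^{out} = \mathrm{softmax}\left(\frac{q K^T}{\sqrt{d_k}}\right) V$$

A minimal NumPy sketch of that formula is given here. The function name and the array shapes are chosen for illustration only and are not code from the paper.

```python
import numpy as np


def scaled_dot_product_attention(q, K, V):
    """Attention output for one query (d_k,) or a batch of queries (n_q, d_k).

    K: (n, d_k) keys, V: (n, d_v) values.
    """
    d_k = K.shape[-1]
    scores = q @ K.T / np.sqrt(d_k)
    # Subtracting the maximum leaves the softmax unchanged but avoids overflow.
    scores = scores - scores.max(axis=-1, keepdims=True)
    weights = np.exp(scores)
    weights = weights / weights.sum(axis=-1, keepdims=True)
    return weights @ V
```

With an all-zero query every key gets the same score, so the output reduces to the plain average of the values.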


[1] A. Vaswani, N. Shazeer, N. Parmar, J. Uszkoreit, L. Jones, A. N. Gomez, L. Kaiser, and I. Polosukhin. Attention is all you need. arXiv preprint arXiv:1706.03762, 2017.

Relation Module

The input is a set of $N$ objects $\{(\mathbf{f}_A^n, \mathbf{f}_G^n)\}_{n=1}^N$, each with:

geometric feature: $\mathbf{f}_G$ (4-dimensional object bounding box)

appearance feature: $\mathbf{f}_A$ (task dependent)

Relation Module:


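The relation feature $\mathbf{f}_R(n)$ of the whole object set with respect to the $n^{th}$ object is

$$\mathbf{f}_R(n) = \sum_m w^{mn} \cdot (W_V \cdot \mathbf{f}_A^m), \qquad w^{mn} = \frac{w_G^{mn} \cdot \exp(w_A^{mn})}{\sum_k w_G^{kn} \cdot \exp(w_A^{kn})}$$

where $w_A^{mn} = \dfrac{\mathrm{dot}(W_K \mathbf{f}_A^m, W_Q \mathbf{f}_A^n)}{\sqrt{d_k}}$ is the appearance weight and $w_G^{mn} = \max\{0, W_G \cdot \mathcal{E}_G(\mathbf{f}_G^m, \mathbf{f}_G^n)\}$ is the geometry weight.

(The geometry weight $w_G^{mn}$ is the extra term compared to the attention model in [1].)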
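
To make the role of the geometry weight concrete, here is a rough NumPy sketch of a single relation module: the appearance logits are the usual scaled dot products, and each one is multiplied by a non-negative geometry weight before normalising over the object set. Several things here are assumptions of the sketch rather than the paper's code: the function names, the `(cx, cy, w, h)` box layout, the `eps` clamp, and the 4-vector `w_g`. In particular, the paper first lifts the relative geometry to a higher dimension with sine/cosine embeddings, which this sketch skips. The log-ratio form of the relative geometry follows my reading of the paper.

```python
import numpy as np


def relative_geometry(boxes, eps=1e-3):
    """Pairwise 4-d relative geometry; entry [n, m] describes box m relative to box n.

    boxes: (N, 4) array of (cx, cy, w, h). The eps clamp is an assumption of
    this sketch to avoid log(0) when two centres coincide (e.g. m == n).
    """
    cx, cy, w, h = boxes.T
    dx = np.log(np.maximum(np.abs(cx[None, :] - cx[:, None]) / w[None, :], eps))
    dy = np.log(np.maximum(np.abs(cy[None, :] - cy[:, None]) / h[None, :], eps))
    dw = np.log(w[:, None] / w[None, :])
    dh = np.log(h[:, None] / h[None, :])
    return np.stack([dx, dy, dw, dh], axis=-1)  # shape (N, N, 4)


def geometry_weights(boxes, w_g):
    """Geometry weight [n, m] = max(0, w_g . rel[n, m]).

    w_g is a hypothetical (4,) weight vector applied directly to the relative
    geometry (no sine/cosine embedding, unlike the paper).
    """
    return np.maximum(relative_geometry(boxes) @ w_g, 0.0)


def relation_features(f_a, geo, W_q, W_k, W_v):
    """Relation features for all N objects at once.

    f_a: (N, d_a) appearance features, geo: (N, N) non-negative geometry
    weights, W_q / W_k: (d_k, d_a), W_v: (d_v, d_a).
    Returns (f_r, weights) with shapes (N, d_v) and (N, N).
    """
    d_k = W_k.shape[0]
    q, k, v = f_a @ W_q.T, f_a @ W_k.T, f_a @ W_v.T
    w_a = (q @ k.T) / np.sqrt(d_k)  # appearance logits, indexed [n, m]
    # geo * exp(w_a) == exp(w_a + log(geo)); log(0) = -inf drops that pair.
    with np.errstate(divide="ignore"):
        logits = w_a + np.log(geo)
    row_max = logits.max(axis=1, keepdims=True)
    row_max = np.where(np.isfinite(row_max), row_max, 0.0)
    unnorm = np.exp(logits - row_max)
    denom = unnorm.sum(axis=1, keepdims=True)
    # Rows whose geometry weights are all zero get all-zero relation weights.
    weights = np.divide(unnorm, denom, out=np.zeros_like(unnorm), where=denom > 0)
    return weights @ v, weights
```

When every geometry weight equals 1 the weights reduce to an ordinary softmax over the appearance logits, which is exactly the attention model of [1]. A pair with zero geometry weight contributes nothing, however similar the two appearances are.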



Duplicate removal network

Duplicate removal is a two-class classification problem. For each ground truth object, only one detected object matched to it is classified as correct; the others matched to it are classified as duplicates. This classification is performed by a network that outputs a binary classification probability $s_1 \in [0, 1]$ (1 for correct and 0 for duplicate). The product $s_0 s_1$, where $s_0$ is the detector's original classification score, is the final classification score.
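
The labelling rule and the final score can be sketched in a few lines of NumPy. This is only an illustration under stated assumptions: detections are taken to be already matched to ground truth (the hypothetical `matched_gt` array holds a ground truth index per detection, or -1 for no match), the highest-scoring match is taken to be the "correct" one, and unmatched detections are labelled 0. The network that predicts $s_1$ is not modelled here.

```python
import numpy as np


def duplicate_labels(matched_gt, s0):
    """Training targets for s1: 1 = correct, 0 = duplicate.

    matched_gt: ground truth index per detection, -1 if unmatched (assumed
    to be computed beforehand). s0: original classification scores.
    Assumption: among detections sharing a ground truth, the one with the
    highest s0 is the correct one; unmatched detections get label 0.
    """
    matched_gt = np.asarray(matched_gt)
    s0 = np.asarray(s0, dtype=float)
    labels = np.zeros(len(s0), dtype=int)
    for gt in np.unique(matched_gt[matched_gt >= 0]):
        idx = np.flatnonzero(matched_gt == gt)
        labels[idx[np.argmax(s0[idx])]] = 1
    return labels


def final_scores(s0, s1):
    """Final classification score: the product s0 * s1."""
    return np.asarray(s0, dtype=float) * np.asarray(s1, dtype=float)
```

Because the final score is a product, a detection keeps a high score only if both the detector ($s_0$) and the duplicate classifier ($s_1$) are confident in it.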

Author: Texot
Reprint policy: Unless otherwise stated, all articles on this blog are licensed under CC BY 4.0. If you reproduce this article, please credit the source: Texot.