# Paper Reading: Relation Networks for Object Detection

Note: this post is only meant for personal digestion and interpretation. It is incomplete and may mislead readers.

This work introduces a relation module based on the attention mechanism (Scaled Dot-Product Attention) to model the appearance and geometric relations between objects.

Authors: Han Hu, Jiayuan Gu, Zheng Zhang, Jifeng Dai, Yichen Wei

# Scaled Dot-Product Attention

Given a query $\mathbf{q}$, keys $K$ (of dimension $d_k$), and values $V$, the attention output is

$$v^{out} = \operatorname{softmax}\!\left(\frac{\mathbf{q}K^T}{\sqrt{d_k}}\right) V$$

from

[1] A. Vaswani, N. Shazeer, N. Parmar, J. Uszkoreit, L. Jones, A. N. Gomez, L. Kaiser, and I. Polosukhin. Attention is all you need. arXiv preprint arXiv:1706.03762, 2017.
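A minimal NumPy sketch of scaled dot-product attention as defined in [1]; the function name and shapes are my own choices:

```python
import numpy as np

def scaled_dot_product_attention(Q, K, V):
    """Attention(Q, K, V) = softmax(Q K^T / sqrt(d_k)) V.
    Q: (n_q, d_k) queries, K: (n_k, d_k) keys, V: (n_k, d_v) values."""
    d_k = K.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)               # (n_q, n_k)
    scores -= scores.max(axis=-1, keepdims=True)  # numerical stability
    weights = np.exp(scores)
    weights /= weights.sum(axis=-1, keepdims=True)
    return weights @ V                            # (n_q, d_v)
```

Each output row is a convex combination of the value rows, with weights given by how well the query matches each key.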

# Relation Module

Given input set of $N$ objects $\{(\mathbf{f}_A^n, \mathbf{f}_G^n)\}^N_{n=1}$

geometric feature: $\mathbf{f}_G$ (4-dimensional object bounding box)

appearance feature: $\mathbf{f}_A$ (task dependent)

Relation Module:

The relation feature $\mathbf{f}_R(n)$ of the whole object set with respect to the $n^{th}$ object is

$$\mathbf{f}_R(n) = \sum_m w^{mn} \cdot (W_V \cdot \mathbf{f}_A^m)$$

where the relation weight $w^{mn}$ indicates the impact of object $m$ on object $n$:

$$w^{mn} = \frac{w_G^{mn} \cdot \exp(w_A^{mn})}{\sum_k w_G^{kn} \cdot \exp(w_A^{kn})}$$

($w_G^{mn}$ is extra compared to the attention model in [1])

where the appearance weight is a scaled dot product of projected appearance features:

$$w_A^{mn} = \frac{\operatorname{dot}(W_K \mathbf{f}_A^m,\, W_Q \mathbf{f}_A^n)}{\sqrt{d_k}}$$

and where the geometry weight is computed from the relative geometry of the two boxes:

$$w_G^{mn} = \max\{0,\, W_G \cdot \mathcal{E}_G(\mathbf{f}_G^m, \mathbf{f}_G^n)\}$$

where $\mathcal{E}_G$ embeds the translation- and scale-invariant relative geometry features $\left(\log\frac{|x_m - x_n|}{w_m},\, \log\frac{|y_m - y_n|}{h_m},\, \log\frac{w_n}{w_m},\, \log\frac{h_n}{h_m}\right)$ into a high-dimensional representation, as in [1].

# Duplicate removal network

Duplicate removal is a two-class classification problem. For each ground-truth object, only one detected object matched to it is classified as correct; the others matched to it are classified as duplicate. This classification is performed by a network that outputs a binary classification probability $s_1 \in [0, 1]$ (1 for correct, 0 for duplicate). The product of the two scores, $s_0 s_1$, is the final classification score.
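A toy illustration of the score multiplication, with made-up values: even a high classification score $s_0$ is suppressed when the removal network judges the detection a duplicate ($s_1$ near 0):

```python
import numpy as np

# Three detections matched to the same ground-truth object (assumed values).
s0 = np.array([0.90, 0.85, 0.30])  # detector classification scores
s1 = np.array([0.95, 0.10, 0.80])  # removal network: 1 ~ correct, 0 ~ duplicate

final = s0 * s1                    # final classification scores
best = int(np.argmax(final))       # only the first detection survives
```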
