(Top) Most object detection datasets have exhaustive annotations for foreground objects (positives), so unlabeled regions can safely be treated as background (negatives) during training. (Bottom) Sparsely annotated object detection datasets have missing annotations. As a result, unannotated foreground regions (shown in red) are treated as negatives during training, which degrades the performance of the classifier.
Training with sparse annotations is known to reduce the performance of object detectors. Previous methods have focused on proxies for the missing ground-truth annotations, in the form of pseudo-labels for unlabeled boxes. We observe that existing methods suffer at higher levels of sparsity in the data because the pseudo-labels are noisy. To prevent this, we propose an end-to-end system that learns to separate the proposals into labeled and unlabeled regions using pseudo-positive mining. While the labeled regions are processed as usual, self-supervised learning is used to process the unlabeled regions, thereby avoiding the negative effects of noisy pseudo-labels. This approach has several advantages, such as improved robustness to higher sparsity compared to existing methods. We conduct exhaustive experiments on five splits of the PASCAL-VOC and COCO datasets, achieving state-of-the-art performance. We also unify the various splits used across the literature for this task and present a standardized benchmark. On average, we improve by 2.6, 3.9 and 9.6 mAP over previous state-of-the-art methods on three splits of increasing sparsity on COCO.
Illustration of SparseDet for sparsely annotated object detection. SparseDet consists of a backbone network that extracts features from the original and augmented views of an image. The common RPN (C-RPN) concatenates these features to generate a set of region proposals. A region proposal can belong to one of three groups: 1) labeled regions, 2) unlabeled foreground regions, or 3) background regions. For a given set of ground-truth annotations, the first group, i.e. the labeled regions, can be identified automatically. The problem then becomes one of identifying the second group, i.e. the unlabeled foreground regions, and separating it from the background regions. Given all the region proposals, a pseudo-positive mining (PPM) step identifies the unlabeled regions and segregates them from the background regions. The labeled and unlabeled regions are trained using supervised and self-supervised losses, respectively.
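The three-way grouping described above can be sketched in a few lines of Python. This is only an illustrative sketch, not the SparseDet implementation: the function names, the IoU threshold of 0.5, and the use of a per-proposal score with a 0.9 threshold as the mining criterion are all assumptions made for the example, since the text above does not specify how PPM decides which proposals are unlabeled foreground.

```python
from typing import Dict, List, Sequence, Tuple

Box = Tuple[float, float, float, float]  # (x1, y1, x2, y2)


def iou(a: Box, b: Box) -> float:
    """Intersection over union of two axis-aligned boxes."""
    iw = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    ih = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = iw * ih
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def partition_proposals(
    proposals: Sequence[Box],
    scores: Sequence[float],
    gt_boxes: Sequence[Box],
    iou_thresh: float = 0.5,    # assumed value, not taken from the paper
    score_thresh: float = 0.9,  # assumed mining criterion, not taken from the paper
) -> Dict[str, List[int]]:
    """Split proposal indices into labeled, unlabeled and background groups."""
    groups: Dict[str, List[int]] = {"labeled": [], "unlabeled": [], "background": []}
    for idx, (box, score) in enumerate(zip(proposals, scores)):
        # Best overlap with any available ground-truth box (0.0 if there are none).
        best = max((iou(box, gt) for gt in gt_boxes), default=0.0)
        if best >= iou_thresh:
            groups["labeled"].append(idx)      # matched to an annotation
        elif score >= score_thresh:
            groups["unlabeled"].append(idx)    # confident but unannotated
        else:
            groups["background"].append(idx)
    return groups


proposals = [(0, 0, 10, 10), (50, 50, 60, 60), (100, 100, 110, 110)]
scores = [0.95, 0.97, 0.10]
gt_boxes = [(0, 0, 10, 10)]
groups = partition_proposals(proposals, scores, gt_boxes)
```

In this toy example the first proposal overlaps the only annotation, the second is confident but unannotated, and the third has a low score. In a training loop, the first group would feed the supervised loss and the second the self-supervised loss.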
Unlabeled regions identified by PPM. The red boxes correspond to the available ground truth. Class-agnostic NMS was applied to the identified regions, and the result is shown in white.
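For readers unfamiliar with the term, class-agnostic NMS suppresses overlapping boxes regardless of their predicted class. The following is a minimal greedy sketch in plain Python. The function names and the IoU threshold of 0.5 are assumptions for illustration, and the code is not taken from the SparseDet codebase.

```python
from typing import List, Sequence, Tuple

Box = Tuple[float, float, float, float]  # (x1, y1, x2, y2)


def iou(a: Box, b: Box) -> float:
    """Intersection over union of two axis-aligned boxes."""
    iw = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    ih = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = iw * ih
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def class_agnostic_nms(
    boxes: Sequence[Box],
    scores: Sequence[float],
    iou_thresh: float = 0.5,  # assumed value for illustration
) -> List[int]:
    """Greedy NMS that ignores class labels; returns indices of kept boxes."""
    # Visit boxes from highest to lowest score.
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    keep: List[int] = []
    for i in order:
        # Keep a box only if it does not overlap any already-kept box too much.
        if all(iou(boxes[i], boxes[k]) <= iou_thresh for k in keep):
            keep.append(i)
    return keep


boxes = [(0, 0, 10, 10), (1, 1, 11, 11), (50, 50, 60, 60)]
scores = [0.9, 0.8, 0.7]
kept = class_agnostic_nms(boxes, scores)
```

Here the second box overlaps the first with an IoU of about 0.68, so it is suppressed, while the distant third box survives.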
Results comparing the output of a model trained using available ground truths (top) to a model trained using our approach (bottom). Predictions with a class confidence score greater than 0.9 are shown. Red: Person, Cyan: Dog, Purple: Horse, Yellow: Clock, Green: Stop sign, Blue: Parking meter, Violet: Giraffe, Orange: Potted plant, Black: Surfboard, Dark green: Boat.
S. Suri*, S. Rambhatla*, R. Chellappa, A. Shrivastava. SparseDet: Improving Sparsely Annotated Object Detection with Pseudo-positive Mining (Paper | Supplementary | arXiv) |