|
Reviews For Paper
Paper ID |
2709 |
Title |
Soft-NMS -- Improving Object Detection With One Line of Code |
Masked Reviewer ID:
|
Assigned_Reviewer_1
|
Review:
|
|
Question | |
Paper Summary. Please summarize in your own words what the paper is about. |
The paper proposes soft-NMS as a replacement for NMS, a standard step in current object detection systems. The method is quite simple: instead of completely removing overlapping boxes, their scores should be modulated (reduced) as a function of the overlap. This ensures that nearby objects can still be detected. I like the simplicity of the paper and feel that the community will quickly adopt it. However, I have a few concerns (see later).
|
Paper Strengths. Positive aspects of the paper. Be sure to comment on the paper's novelty, technical correctness, clarity and experimental evaluation. Notice that different papers may need different levels of evaluation: e.g., a theoretical paper vs. an application paper |
- Well written
- Simple, easy to implement
- Widely applicable
- Good quantitative results
- Good ablation analysis
|
Paper Weaknesses. Discuss the negative aspects: lack of novelty or clarity, technical errors, insufficient experimental evaluation, etc. Please justify your comments in great detail. If you think the paper is not novel, explain why and provide evidences |
Overall, I like the simplicity and wide applicability of the paper. I do have some concerns/comments listed below.
- There is a trade-off between FP (including double counting) and FN, which NMS deals with. Can the authors analyze where exactly this method helps/hurts and by how much?
- Can the authors comment on whether soft-NMS worsens the double-counting problem, which is penalized in standard mAP metrics? To avoid this problem, NMS removes overlapping boxes from consideration altogether; with soft-NMS, such boxes can only be suppressed with aggressive thresholds or an aggressive drop in scores.
- The relevant trick from "bells-and-whistles" [C] is iterative bounding-box regression, NMS and weighted voting (introduced in [D], studied in [C, E]). It would be interesting to see whether soft-NMS is complementary to this.
- Most "bells-and-whistles" (L52, [C]) do not require re-training (multi-scale training is the only one that does). Given this, I would recommend redoing the first paragraph of the introduction, where adoption is the only argument.
Related work:
- I agree with L249-252. However, it is important to describe why certain methods [A, B] cannot be applied to generic object detection. The authors provide a description in L197-205, but are there minor changes to [A, B] that would make them suitable for generic object detection?
Experimental details:
- L395: What is the detection threshold used in the paper (is it different from standard settings)? What is the relationship between that threshold and final performance after soft-NMS? Given that soft-NMS doesn't discard any boxes, it might be useful to study this.
- L469-73: How does the runtime change when a lower threshold is used? Often in practice, practitioners use [0.005, 10e-4] as detection thresholds. Given that soft-NMS doesn't discard any boxes, it is important to study how having more boxes impacts the runtime.
- Please add AR numbers in Table 1. Recall is as important to study for this paper.
- L427-28: Why the mismatch in training sets?
Minor details:
- L141: "..it removes it.." -> re-phrase.
- In Table 1, replace "coco-minival" with "trainval35k" ("-" can be confusing).
- In all equations, can the LHS be replaced with f(s_i)?
- In the background section, cite which object detectors are being described (Faster R-CNN, R-FCN).
- Replace \mathcal{M} for a box with b^m to make the notation consistent?
[A]. Spatial Semantic Regularisation for Large Scale Object Detection
[B]. Non-maximum suppression for object detection by passing messages between windows
[C]. OHEM: Training Region-based Object Detectors with Online Hard Example Mining
[D]. MR-CNN: Object detection via a multi-region & semantic segmentation-aware CNN model
[E]. ION: Inside-Outside Net: Detecting Objects in Context with Skip Pooling and Recurrent Neural Networks
|
Preliminary Rating |
Weak Accept
|
Preliminary Evaluation. Please indicate to the AC, your fellow reviewers, and the authors your current opinion on the paper. Please summarize the key things you would like the authors to include in their rebuttals to facilitate your decision making. |
Overall, I like the simplicity and wide applicability of the paper. One concern might be that the paper is too simple for a conference publication. More analysis might help make the paper stronger (see questions above).
|
Confidence. Write "Very Confident" to stress you are absolutely sure about your conclusions (e.g., you are an expert working in the area), "Confident" to stress you are mostly sure about your conclusions (e.g., you are not an expert but are knowledgeable). "Not Confident" in all the other cases. |
Confident
|
Final Recommendation. After reading the author's rebuttal and the discussion, please explain your final recommendation. Your explanation will be of highest importance for making acceptance decisions and for deciding between posters and orals |
Given the applicability, simplicity and rigorous study, I'm inclined towards seeing it as a poster in the conference. Please add the suggestions from the reviewers to the paper.
|
Final rating. After reading the author's rebuttal, please rate the paper according to the following choices |
Poster
|
Masked Reviewer ID:
|
Assigned_Reviewer_2
|
Review:
|
|
Question | |
Paper Summary. Please summarize in your own words what the paper is about. |
The paper proposes a simple method to improve Non-maximum suppression.
When doing greedy NMS, the standard approach is to sort the candidate bounding boxes by score. At each iteration, the best one is selected. To avoid multiple detections of the same object, all of the other bounding boxes are compared to the selected one, and if the IoU is higher than a certain threshold, those candidate bounding boxes are pruned.
This paper proposes a soft-NMS that instead reduces the score of the other bounding boxes by multiplying their score by a function of the IoU.
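A minimal sketch of the two procedures summarized above, for illustration only. It is not the authors' implementation: the function names, the `(x1, y1, x2, y2)` box format, the linear decay `1 - IoU`, and the `score_thresh` cut-off are assumptions chosen to keep the example short.

```python
def iou(a, b):
    """Intersection over union of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def greedy_nms(boxes, scores, iou_thresh=0.5, soft=False, score_thresh=0.001):
    """Return (box_index, final_score) pairs in the order they are selected.

    soft=False: boxes overlapping the selected box by more than iou_thresh
                are pruned (standard greedy NMS).
    soft=True:  the same boxes are kept, but their score is multiplied by
                (1 - IoU); score_thresh (hypothetical default) drops boxes
                whose decayed score becomes negligible.
    """
    remaining = [(s, i) for i, s in enumerate(scores)]
    kept = []
    while remaining:
        best = max(remaining)          # highest-scoring remaining box
        remaining.remove(best)
        best_score, best_idx = best
        kept.append((best_idx, best_score))
        updated = []
        for s, i in remaining:
            overlap = iou(boxes[best_idx], boxes[i])
            if overlap > iou_thresh:
                if not soft:
                    continue           # hard NMS: discard the box
                s *= 1.0 - overlap     # soft NMS: decay the score
            if s >= score_thresh:
                updated.append((s, i))
        remaining = updated
    return kept


# Two heavily overlapping boxes and one isolated box (made-up values).
boxes = [(0, 0, 10, 10), (1, 0, 11, 10), (20, 20, 30, 30)]
scores = [0.9, 0.8, 0.7]
hard = greedy_nms(boxes, scores)
soft = greedy_nms(boxes, scores, soft=True)
```

With these made-up inputs, the hard variant drops the second box entirely, while the soft variant keeps it with a reduced score so that it ranks below the isolated box.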
|
Paper Strengths. Positive aspects of the paper. Be sure to comment on the paper's novelty, technical correctness, clarity and experimental evaluation. Notice that different papers may need different levels of evaluation: e.g., a theoretical paper vs. an application paper |
The paper reads very well and provides a good description of related work and background, motivating the problem. Even outside of the contribution of this paper, I would recommend it to people getting started with object detection, as it provides a clear description of the part of the pipeline it deals with.
The changes proposed to traditional NMS are simple so I can definitely see them being used widely in practice as it is essentially a free win.
Experiments are extensive and fair. The fact that they use off-the-shelf, pre-trained models makes me confident in the result, and the improvements are fairly consistent. In addition, a lot of analysis is done on the robustness of the method to hyperparameters and on understanding where the benefits of the method are obtained.
|
Paper Weaknesses. Discuss the negative aspects: lack of novelty or clarity, technical errors, insufficient experimental evaluation, etc. Please justify your comments in great detail. If you think the paper is not novel, explain why and provide evidences |
The only thing that could be said against the paper is that the changes proposed are pretty minor, but I would count this as an advantage, as it makes the method much more likely to be used in practice.
Some tiny details:
Small correction: In Figure 2, shouldn't the comparison between the IoU and the threshold be reversed? You prune bounding boxes when they have too high of an overlap with the picked bounding box.
Citation:
* Feature pyramid networks for object detection is a CVPR paper; you should adjust the citation.
* The format of the citations is a bit inconsistent. This is an extremely small detail.
|
Preliminary Rating |
Strong Accept
|
Preliminary Evaluation. Please indicate to the AC, your fellow reviewers, and the authors your current opinion on the paper. Please summarize the key things you would like the authors to include in their rebuttals to facilitate your decision making. |
This paper does focus on a niche but crucial aspect of the detection pipeline. It provides a method that is simple enough that it will get adoption and delivers consistent improvements. The method is well analyzed and the exposition is extremely clear.
Will this completely revolutionize Computer Vision? Probably not, but it will clearly be beneficial for people in the field to be aware of this paper.
|
Confidence. Write "Very Confident" to stress you are absolutely sure about your conclusions (e.g., you are an expert working in the area), "Confident" to stress you are mostly sure about your conclusions (e.g., you are not an expert but are knowledgeable). "Not Confident" in all the other cases. |
Confident
|
Final Recommendation. After reading the author's rebuttal and the discussion, please explain your final recommendation. Your explanation will be of highest importance for making acceptance decisions and for deciding between posters and orals |
Even after discussion with the other reviewers, I maintain my opinion that this paper should be accepted. Improvements are consistent and the analysis is extensive. The fact that the changes are small shouldn't detract from the fact that they are helpful, and that the community would benefit from being aware of them.
|
Final rating. After reading the author's rebuttal, please rate the paper according to the following choices |
Poster
|
Masked Reviewer ID:
|
Assigned_Reviewer_3
|
Review:
|
|
Question | |
Paper Summary. Please summarize in your own words what the paper is about. |
The paper introduces generalizations of NMS where, instead of setting the suppressed box probability to zero, it modifies it by a function of the IoU between the high-confidence box and the lower-confidence box. They show improvements on COCO and VOC using Faster R-CNN and R-FCN.
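As a worked form of this summary, the rescoring can be written as below. The notation is assumed for illustration rather than copied from the submission: $\mathcal{M}$ is the selected high-confidence box, $b_i$ a lower-confidence box with score $s_i$, $N_t$ the overlap threshold, and $\sigma$ a width parameter. Standard NMS is the special case that sets the score to zero; a linear and a Gaussian decay are two possible soft alternatives.

```latex
% Standard NMS: hard suppression above the overlap threshold
s_i \leftarrow
\begin{cases}
  s_i, & \mathrm{IoU}(\mathcal{M}, b_i) < N_t \\
  0,   & \mathrm{IoU}(\mathcal{M}, b_i) \ge N_t
\end{cases}

% Linear soft variant: decay proportional to the overlap
s_i \leftarrow
\begin{cases}
  s_i, & \mathrm{IoU}(\mathcal{M}, b_i) < N_t \\
  s_i \,\bigl(1 - \mathrm{IoU}(\mathcal{M}, b_i)\bigr), & \mathrm{IoU}(\mathcal{M}, b_i) \ge N_t
\end{cases}

% Gaussian soft variant: continuous decay, no threshold discontinuity
s_i \leftarrow s_i \, \exp\!\left(-\frac{\mathrm{IoU}(\mathcal{M}, b_i)^2}{\sigma}\right)
```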
|
Paper Strengths. Positive aspects of the paper. Be sure to comment on the paper's novelty, technical correctness, clarity and experimental evaluation. Notice that different papers may need different levels of evaluation: e.g., a theoretical paper vs. an application paper |
There are a large number of experiments across datasets and two detection methods, as well as a good sensitivity analysis. The paper tries to address a significant problem in detection, namely that detection systems struggle with multiple objects that have a high degree of overlap, even when the objects are obvious to a human observer.
|
Paper Weaknesses. Discuss the negative aspects: lack of novelty or clarity, technical errors, insufficient experimental evaluation, etc. Please justify your comments in great detail. If you think the paper is not novel, explain why and provide evidences |
The method provides only a small increase in mAP, perhaps equivalent to doing some parameter tuning on a model. Other papers have addressed adjustments to NMS in the past, and instead of taking a more theoretical or data-driven approach, this paper just employs a simple, hand-tuned heuristic.
|
Preliminary Rating |
Weak Reject
|
Preliminary Evaluation. Please indicate to the AC, your fellow reviewers, and the authors your current opinion on the paper. Please summarize the key things you would like the authors to include in their rebuttals to facilitate your decision making. |
While the technique may prove to be useful, it doesn't seem to merit a full paper at a top conference. It would be better in a larger research project that tries to address the problem of detecting overlapping objects in object detection. A method that could reliably detect overlapping objects with high confidence but still not produce duplicate detections would be very interesting and important work in the field.
I tried implementing the technique for detections in YOLOv2 and found it hurt on the VOC metric and helped by a small amount on the COCO metric on the COCO dataset. Perhaps it works better on region-proposal-based methods since they tend to have more duplicate detections? Either way, I don't think every small tweak that gives a 1% boost in mAP deserves a paper.
Normal NMS
Average Precision (AP) @[ IoU=0.50:0.95 | area= all | maxDets=100 ] = 0.254
Average Precision (AP) @[ IoU=0.50 | area= all | maxDets=100 ] = 0.481
Average Precision (AP) @[ IoU=0.75 | area= all | maxDets=100 ] = 0.245
Linear Soft NMS
Average Precision (AP) @[ IoU=0.50:0.95 | area= all | maxDets=100 ] = 0.259
Average Precision (AP) @[ IoU=0.50 | area= all | maxDets=100 ] = 0.479
Average Precision (AP) @[ IoU=0.75 | area= all | maxDets=100 ] = 0.256
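To make the comparison between the two printouts above easier to read, the following sketch subtracts the reported values metric by metric. The dictionary keys and the helper name `ap_deltas` are illustrative labels, not part of any evaluation tool; the numbers are copied from the results above.

```python
# AP values copied from the two evaluation printouts above.
normal_nms = {"AP@[0.50:0.95]": 0.254, "AP@0.50": 0.481, "AP@0.75": 0.245}
linear_soft_nms = {"AP@[0.50:0.95]": 0.259, "AP@0.50": 0.479, "AP@0.75": 0.256}


def ap_deltas(baseline, variant):
    """Per-metric difference (variant minus baseline), rounded to 3 decimals."""
    return {k: round(variant[k] - baseline[k], 3) for k in baseline}


deltas = ap_deltas(normal_nms, linear_soft_nms)
for metric, delta in deltas.items():
    print(f"{metric}: {delta:+.3f}")
```

The differences are small in every case: AP at IoU=0.50 drops slightly, while the averaged metric and AP at IoU=0.75 rise.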
|
Confidence. Write "Very Confident" to stress you are absolutely sure about your conclusions (e.g., you are an expert working in the area), "Confident" to stress you are mostly sure about your conclusions (e.g., you are not an expert but are knowledgeable). "Not Confident" in all the other cases. |
Very Confident
|
| |