The NSF Quality of Life Technology (QoLT) ERC investigates
technologies to transform the lives of people with reduced functional
capabilities. One important QoLT research thrust is to develop tools
to understand users’ environments from visual input, including,
as the most basic module, the ability to recognize common objects in
the environment that are typically used in activities of daily living
(ADL), e.g. bottles, chairs, and trash cans. However, existing object
recognition systems (mostly based on SIFT) generally perform well only
on objects that are highly textured (e.g. book covers, paintings) or
have distinctive colors that do not otherwise appear in the
environment. Since most common household objects fall into neither
category, current techniques have proved highly problematic for this
task. Working closely with QoLT investigators, we propose to develop
an app that can detect and recognize common household objects (without
strong texture or color cues) in real time.
We envision using this app in the context of QoLT in a few different
modes. For example, in a user-supervised mode, the user would select
images of the objects to learn the individual models used for
detection. The app would pair with the sensor systems developed in
QoLT, in particular wearable cameras that observe the environment from
the user’s point of view. This capability will mesh well with QoLT's
planned concept for assistive systems (e.g., “where is object X?”) as
well as our concept for enhancing the user’s vision.

Underlying Technology
Our app showcases technology based on our recently published work
[1][2]. This conceptually simple but powerful method combines the
effectiveness of a discriminative object detector with the explicit
correspondence offered by a nearest-neighbor approach. The method
trains a separate linear SVM classifier for every exemplar in the
training set, so each Exemplar-SVM is defined by a single positive
instance and millions of negatives. While each detector is quite
specific to its exemplar, we empirically observe that an ensemble of
such Exemplar-SVMs offers surprisingly good generalization.
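To make the formulation concrete, here is a minimal C++ sketch of training a single Exemplar-SVM by stochastic subgradient descent on the weighted hinge loss of [1]. The released code uses an off-the-shelf SVM solver rather than this hand-rolled loop, and feature extraction (HOG in [1]) is assumed to happen upstream; the learning rate, epoch count, and data layout are illustrative assumptions.

```cpp
// Minimal sketch: train one Exemplar-SVM (a single positive exemplar
// against many mined negatives) with stochastic subgradient descent on
//   ||w||^2 + C1*h(w'x_pos + b) + C2 * sum_i h(-(w'x_i + b)),
// where h(s) = max(0, 1 - s) is the hinge loss. C1/C2 follow [1];
// everything else here is an illustrative placeholder.
#include <vector>

static double dot(const std::vector<double>& a, const std::vector<double>& b) {
    double s = 0.0;
    for (size_t i = 0; i < a.size(); ++i) s += a[i] * b[i];
    return s;
}

struct ExemplarSVM {
    std::vector<double> w;  // one weight per feature dimension
    double b = 0.0;         // bias
};

// pos: feature vector of the single exemplar; negs: mined negative windows.
ExemplarSVM trainExemplarSVM(const std::vector<double>& pos,
                             const std::vector<std::vector<double>>& negs,
                             double C1 = 0.5, double C2 = 0.01,
                             int epochs = 50, double lr = 1e-3) {
    ExemplarSVM m;
    m.w.assign(pos.size(), 0.0);
    for (int t = 0; t < epochs; ++t) {
        // Positive term: the high cost C1 keeps the lone positive scoring >= 1.
        if (dot(m.w, pos) + m.b < 1.0) {
            for (size_t i = 0; i < pos.size(); ++i) m.w[i] += lr * C1 * pos[i];
            m.b += lr * C1;
        }
        // Negative terms: every negative window should score <= -1.
        for (const std::vector<double>& x : negs) {
            if (dot(m.w, x) + m.b > -1.0) {
                for (size_t i = 0; i < x.size(); ++i) m.w[i] -= lr * C2 * x[i];
                m.b -= lr * C2;
            }
        }
        // Gradient step for the regularizer ||w||^2 shrinks w toward zero.
        for (double& wi : m.w) wi *= (1.0 - 2.0 * lr);
    }
    return m;
}
```

At test time each exemplar's w scores candidate windows independently; because every detector carries its own exemplar, a detection comes with an explicit correspondence back to a single training instance.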
For this project, we developed real-time implementations of the method
in both C/C++ and Matlab; both interface with a wearable egocentric
imaging system and can detect tens of object categories in real time.
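As a rough illustration of how the real-time pipeline uses the ensemble, the sketch below scores every candidate window in a frame against a bank of Exemplar-SVMs grouped by object category and keeps the top-scoring window per category. The window features, bank layout, and acceptance threshold are hypothetical placeholders, not the QoLT implementation.

```cpp
// Sketch of a per-frame detection loop over a bank of Exemplar-SVMs.
// Window features (the sliding-window pipeline of the real system)
// are assumed to be computed upstream; names and thresholds are illustrative.
#include <map>
#include <string>
#include <vector>

struct ExemplarSVM { std::vector<double> w; double b; };  // as in the sketch above

static double dot(const std::vector<double>& a, const std::vector<double>& b) {
    double s = 0.0;
    for (size_t i = 0; i < a.size(); ++i) s += a[i] * b[i];
    return s;
}

struct Detection { std::string category; double score; size_t windowIdx; };

// windows: one feature vector per candidate window in the current frame.
// bank: category name -> ensemble of Exemplar-SVMs for that category.
std::vector<Detection> detectFrame(
        const std::vector<std::vector<double>>& windows,
        const std::map<std::string, std::vector<ExemplarSVM>>& bank) {
    std::vector<Detection> results;
    for (const auto& entry : bank) {
        Detection best{entry.first, -1e30, 0};
        for (size_t wi = 0; wi < windows.size(); ++wi) {
            for (const ExemplarSVM& e : entry.second) {
                double s = dot(e.w, windows[wi]) + e.b;  // linear SVM score
                if (s > best.score) { best.score = s; best.windowIdx = wi; }
            }
        }
        if (best.score > 0.0)  // illustrative acceptance threshold
            results.push_back(best);
    }
    return results;
}
```

A full pipeline would also calibrate per-exemplar scores (as in [1]) and apply non-maximum suppression across overlapping windows; the sketch omits both for brevity.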
Associated Projects
Data-driven Visual Similarity for Cross-domain Image Matching [2]
We released the code for the underlying technology (exemplar-based
object detection [1] and cross-domain image matching [2]) in July 2011
and January 2012, respectively. The system was then implemented on an
egocentric wearable camera (hardware provided by QoLT). After initial
proof-of-concept testing, the real-time C/C++ implementation was
developed and released internally to QoLT; we plan to release it
publicly once QoLT completes its testing of the code in real scenarios.
Demo videos of our system in various scenarios are shown below:
Generic Scenario (2x)(Legacy Version)
Office Scenario (1x)
Office Scenario (1x)
Kitchen Scenario (1x)
A live demo of our system, implemented on a wearable egocentric sensor system, was shown at
a QoLT Coordination Meeting on August 29, 2011, and to the Advisory Board during the annual NSF Site Visit on April 24-26, 2012.
Software Release
Source code for the basic infrastructure used in this work (Exemplar-SVM
infrastructure for large-scale training on a cluster, fast detection,
etc.) is available for download.
You can also navigate directly to the Exemplar-SVM GitHub project page, which has download instructions, a wiki, and additional starter guides.
The C/C++ version will be released soon, after internal testing by QoLT.
References:
[1] Tomasz Malisiewicz, Abhinav Gupta, Alexei A. Efros. Ensemble of
Exemplar-SVMs for Object Detection and Beyond. In ICCV, 2011.
[2] Abhinav Shrivastava, Tomasz Malisiewicz, Abhinav Gupta, Alexei A. Efros. Data-driven Visual Similarity for Cross-domain Image Matching. ACM Transactions on Graphics (SIGGRAPH Asia), 2011.