PhD Proposal: Context for Object Localization and Its Application to Accessibility Assessment Problems

Talk by Jin Sun
Time: 02.03.2017, 13:00 to 14:30
Location: AVW 4172

Computer vision plays an increasingly important role in real-world problems. With enormous amounts of visual data, powerful computational resources, and steadily improving learning algorithms, it is now practical to develop algorithms that can substantially improve quality of life for society.

We are interested in object localization in complex scenes, one of the fundamental problems in computer vision: given an image of a dynamic environment, provide spatial information about objects of interest. Object localization algorithms can be applied to accessibility assessment problems, where the goal is to find features of built environments, such as city streets, sidewalks, and businesses, that are inconvenient for people with disabilities. Google Street View (GSV) images have been shown to be a reliable source for evaluating the accessibility of an area. An important object type for this task is the sidewalk curb ramp, which allows people in wheelchairs to cross streets easily.

Unlike objects in standard computer vision datasets (e.g., cars, dogs, and airplanes), a curb ramp is easily confused with other structures in a scene (e.g., plain curbs). Moreover, its location is strongly shaped by its environment: a curb ramp usually appears at the edge of a sidewalk, at the corner of an intersection, and alongside crosswalks. To model the interactions between curb ramps and their environments, we need a flexible context model.

In this research proposal, we study how to encode context into object localization pipelines, with the goal of solving the curb ramp detection problem. In the first preliminary study, we present Tohme, a system that combines an existing object detection algorithm, a collection of simple context cues, a dynamic task allocation scheduler, and crowdsourcing techniques to detect curb ramps in GSV images from four city areas: Washington DC, Los Angeles, Baltimore, and Saskatoon. Our experiments show that Tohme matches the performance of a purely manual labeling approach while reducing time cost by 13%. In the second preliminary study, we demonstrate a convolutional neural network that learns a standalone context model for finding missing objects. Trained with a Siamese network strategy and built with fully convolutional structures, the context network efficiently produces a probabilistic map of where objects should be in an image, even when no object instances are present (a minimal sketch of this idea appears below). Our experiments show promising results in using the context network to find missing curb ramps at city intersections. In future work, we plan to explore 1) learning efficient visual-spatial representations to model context and object parts; 2) loss functions better suited to localization tasks; and 3) accessibility assessment problems beyond curb ramp detection.
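To make the context-network idea concrete, the following is a minimal illustrative sketch in PyTorch, not the system presented in the talk: the architecture, channel widths, and margin loss below are assumptions chosen for demonstration. It shows the two properties named above: a fully convolutional network that outputs a per-location probability map at any input size, and a Siamese-style training step in which shared weights score patches with and without the object.

# Illustrative sketch (assumptions, not the authors' implementation):
# a small fully convolutional context network mapping an RGB image to a
# per-location probability map of where an object (e.g., a curb ramp)
# should appear.
import torch
import torch.nn as nn

class ContextNet(nn.Module):
    def __init__(self):
        super().__init__()
        # Fully convolutional: no fully connected layers, so the network
        # accepts arbitrary input sizes and outputs a spatial map.
        self.features = nn.Sequential(
            nn.Conv2d(3, 32, 3, padding=1), nn.ReLU(inplace=True),
            nn.MaxPool2d(2),
            nn.Conv2d(32, 64, 3, padding=1), nn.ReLU(inplace=True),
            nn.MaxPool2d(2),
            nn.Conv2d(64, 128, 3, padding=1), nn.ReLU(inplace=True),
        )
        # A 1x1 convolution acts as a per-location classifier.
        self.classifier = nn.Conv2d(128, 1, 1)

    def forward(self, x):
        # Probability at each spatial location that an object belongs
        # there, even if no object instance is visible in the input.
        return torch.sigmoid(self.classifier(self.features(x)))

def siamese_step(net, pos_patch, neg_patch, margin=1.0):
    # Siamese-style training step (sketch): the same weights score a
    # patch where the object is present and one where it is absent; a
    # margin loss pushes the two scores apart.
    p = net(pos_patch).mean()  # mean map score, object present
    n = net(neg_patch).mean()  # mean map score, object absent
    return torch.clamp(margin - (p - n), min=0.0)

net = ContextNet()
heatmap = net(torch.randn(1, 3, 256, 256))
print(heatmap.shape)  # torch.Size([1, 1, 64, 64]): map at 1/4 resolution

Because the network is fully convolutional, the same weights trained on patches can be applied to full GSV panoramas to produce a dense heatmap in a single forward pass, which is what makes the probabilistic map efficient to compute.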

Examining Committee:

Chair: Dr. David Jacobs

Dept rep: Dr. Jon Froehlich

Member: Dr. Larry Davis