General Information
Review session for the final will be at 3pm on Wed., May 13, in our usual classroom.
Review sheet for the final: here.
Midterm review will be 1:00-2:30 on Monday, 3/9, in AV Williams Room 3165.
Practice Midterm: here.
Due on 3/11: a one-paragraph description of your proposed class project. Include a description of the algorithms you will implement and the data set you will use.
One of the most basic problems in vision is to recognize that a particular object or event we've never seen before belongs to a particular class of objects or events. To do this we must have a rich notion of an object class, one that captures what its members have in common. For example, chairs vary tremendously in their shape and material properties; how do we look at a chair we've never seen before and identify it as a chair? Accounting for this variation in recognition is largely an unsolved problem. In this course we will survey a number of approaches to representing and recognizing objects. We will draw inspiration from work in philosophy, psychology, linguistics, and mathematics. However, our primary focus will be more concrete: learning the algorithms and analytic tools that have been applied to visual object classification.
The class will alternate between lectures teaching the basic mathematical and algorithmic techniques of these methods, and discussion of vision research papers that apply these techniques. It will be essential for students to have a solid understanding of basic topics in math, such as linear algebra, probability and statistics, and calculus. It will also be useful to have some knowledge of computer vision, image processing, functional analysis, stochastic processes, or geometry. In general, the more math a student knows, the easier the course will be.
Here is my current plan for the workload of the class.
1) Reports. There are 11 classes scheduled in which we will discuss research papers. Prior to each of these classes, students must turn in one page discussing a preassigned question concerning the reading. Late reports will not be accepted, since the goal of these reports is to get you to think about the papers before we discuss them. However, students need not turn in a report for a class in which they are giving a presentation, and each student may also skip one report. Consequently, each student will be required to complete this assignment for 9 classes. 15% of grade
2) Presentation. Students will give group presentations later in the semester, in which they synthesize material from a number of papers. We will settle on the exact format of these presentations in class. 15% of grade
3) Midterm and Final. These will be based on material from the lectures. 50% of grade
4) Project. Students will implement some recognition algorithms and test them on an appropriate data set. Each student must implement and compare at least two algorithms. These will generally be algorithms studied in class, but students are free to implement other algorithms, or to devise their own. Each student should discuss their proposed work with me. 20% of grade
5) Class Participation. Everyone should read papers before class and contribute to discussion of them. Extra credit.
Note: visitors or auditors are welcome. However, if you are attending a class in which we will discuss papers, you should complete a report on one of these papers (see requirement 1).
None of this schedule is written in stone. Feel free to suggest other papers or topics you'd like to discuss.
Date | Topic | Background
Concepts. The Stanford Encyclopedia of Philosophy. L. Wittgenstein, Philosophical Investigations, sections 65-78.
Search in pose space (gradient descent, Hough transform, chamfer matching, ...), search in correspondence space (interpretation trees, alignment), and their relationship.
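For concreteness, here is a minimal sketch (in Python, using NumPy and SciPy) of one of the pose-space methods named above, chamfer matching with a brute-force scan over image translations. The function and variable names are mine, not code from any of the readings.

    import numpy as np
    from scipy.ndimage import distance_transform_edt

    def chamfer_search(image_edges, template_edges, step=4):
        """Slide a template edge map over an image edge map (both boolean 2D
        arrays) and return the translation with the lowest mean chamfer score."""
        # Distance from every pixel to the nearest image edge pixel (computed once).
        dist = distance_transform_edt(~image_edges)
        ty, tx = np.nonzero(template_edges)          # template edge coordinates
        H, W = image_edges.shape
        best_score, best_pose = np.inf, None
        for dy in range(0, H - ty.max(), step):      # coarse scan over translations,
            for dx in range(0, W - tx.max(), step):  # i.e., a 2D slice of pose space
                score = dist[ty + dy, tx + dx].mean()
                if score < best_score:
                    best_score, best_pose = score, (dy, dx)
        return best_pose, best_score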
Y. Lamdan, J. T. Schwartz, and H. J. Wolfson. Affine invariant model-based object recognition. IEEE Journal of Robotics and Automation, 6:578--589, 1990.
I. Weiss. Geometric Invariants and Object Recognition. Intl. J. Computer Vision, 10:207--231, 1993.
J. Burns, R. Weiss, and E. Riseman, ``The Non-Existence of General-Case View-Invariants,'' in Geometric Invariance for Computer Vision, edited by J. Mundy and A. Zisserman, MIT Press, 1992.
Appendix: Projective Geometry for Machine Vision, in Geometric Invariance for Computer Vision, edited by J. Mundy and A. Zisserman, MIT Press, 1992.
Shimon Ullman and Ronen Basri, Recognition by Linear Combinations of Models, IEEE Transactions on Pattern Analysis and Machine Intelligence, 13(10):992-1006, 1991. Available at: http://www.wisdom.weizmann.ac.il/~ronen/publications.html
Question: Both of these methods are demonstrated on faces. What other classes of objects (if any) are they appropriate to? For example, could they be used effectively to identify animal species? Chairs? Motorcycles? Justify your answer.
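As a rough sketch of the linear-combinations idea: given feature correspondences between a novel image and a few stored views of a candidate object, ask how well the novel measurements are spanned by the stored views. The names below are placeholders of mine, and a real recognizer would add the constraints on the coefficients discussed in the paper.

    import numpy as np

    def residual_to_model(model_views, novel_view):
        """model_views: (d, k) matrix whose k columns are feature-coordinate
        vectors of stored views of one object; novel_view: length-d vector of
        the same features measured in a new image. Returns the error of the
        best linear combination of the stored views."""
        coeffs, *_ = np.linalg.lstsq(model_views, novel_view, rcond=None)
        return np.linalg.norm(model_views @ coeffs - novel_view)

    # Recognition sketch: compute the residual for each candidate object's
    # model views and accept the object with the smallest (sufficiently small)
    # residual.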
Moghaddam, Jebara and Pentland, “Bayesian Face Recognition.” MERL TR 2000-42.
Discriminant analysis of principal components for face recognition.
T.F. Cootes and C.J. Taylor, "Statistical models of appearance for medical image analysis and computer vision", Proc. SPIE Medical Imaging 2001.
Lohmann, G.P. 1983. Eigenshape analysis of microfossils: a general morphometric procedure for describing changes in shape. Mathematical Geology 15:659-672.
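Several of these readings build on the same first step: projecting images (or shape outlines) onto a low-dimensional linear subspace learned with PCA. Here is a bare-bones version of that step, with hypothetical variable names; the papers differ mainly in what they do with the resulting coefficients (probabilistic matching, discriminant analysis, shape statistics).

    import numpy as np

    def pca_basis(train_images, k=20):
        """train_images: (n, d) matrix, one flattened image per row.
        Returns the mean image and the top-k principal directions."""
        mean = train_images.mean(axis=0)
        # Rows of Vt are principal directions of the centered data.
        _, _, Vt = np.linalg.svd(train_images - mean, full_matrices=False)
        return mean, Vt[:k]

    def coefficients(image, mean, basis):
        """Coordinates of a flattened image in the learned subspace."""
        return basis @ (image - mean)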
``Lambertian Reflectance and Linear Subspaces,'' IEEE Trans. on Pattern Analysis and Machine Intelligence, 25(2):218-233, (2003). R. Basri and D. Jacobs. Available at: http://www.wisdom.weizmann.ac.il/~ronen/publications.html
``In Search of Illumination Invariants,'' IEEE Conference on Computer Vision and Pattern Recognition, pp. 254-261, (June 2000). H. Chen, P. Belhumeur, and D. Jacobs.
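One quick way to probe the low-dimensional illumination subspace claim empirically, assuming you have a set of registered images of a single object under different distant lightings (the function name is mine, not from the papers):

    import numpy as np

    def subspace_energy(images, k=9):
        """images: (n, d) matrix of flattened images of one object under n
        different illuminations. Returns the fraction of the total energy
        captured by the best rank-k linear subspace (via the SVD)."""
        s = np.linalg.svd(images, compute_uv=False)
        return (s[:k] ** 2).sum() / (s ** 2).sum()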
Shape and Shape Theory, by D. G. Kendall, D. Barden, T. K. Carne, and H. Le.
Metamorphoses Through Lie Group Action, by Trouvé and Younes, Foundations of Computational Mathematics, 2004.
Discussion: Statistics in the tangent space
Veeraraghavan
Durrleman, Pennec, Trouvé, Thompson and Ayache. Inferring Brain Variability from Diffeomorphic Deformations of Currents: an Integrative Approach.
Question: Both these papers do shape analysis in the tangent space to an image manifold. What are the advantages and limitations of such an approach?
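As a concrete (and much simpler) example of working in a tangent space, here is a Procrustes-style sketch for 2D landmark shapes. It uses a single alignment pass rather than full generalized Procrustes analysis, and the function names are mine; the papers above work with richer representations, but the pattern of linearizing near a reference and then applying ordinary statistics is the same.

    import numpy as np

    def preshape(P):
        """Center a (k, 2) landmark configuration and scale it to unit norm."""
        P = P - P.mean(axis=0)
        return P / np.linalg.norm(P)

    def align(P, ref):
        """Rotate a preshape onto a reference preshape (orthogonal Procrustes)."""
        U, _, Vt = np.linalg.svd(P.T @ ref)
        return P @ (U @ Vt)

    def tangent_coordinates(shapes):
        """Align shapes to a common reference and return centered, flattened
        vectors; ordinary linear statistics (PCA, regression) can then be
        applied in this tangent space."""
        pre = [preshape(P) for P in shapes]
        ref = preshape(np.mean(pre, axis=0))   # crude reference; GPA would iterate
        aligned = np.array([align(P, ref).ravel() for P in pre])
        return aligned - aligned.mean(axis=0)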
Statistical Shape Analysis, by Dryden and Mardia.
Geometric Morphometrics: Ten Years of Progress Following the ‘Revolution’, by Dean C. Adams, F. James Rohlf, and Dennis E. Slice.
Discussion: Modeling deformations
D'Arcy Thompson, On Growth and Form.
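Thompson's transformation grids are often implemented today with a thin-plate-spline warp between corresponding landmarks. A minimal sketch using SciPy follows; the function name is mine, and this is just one standard computational stand-in for the kind of smooth deformation the discussion will cover.

    import numpy as np
    from scipy.interpolate import RBFInterpolator

    def deformation(source_landmarks, target_landmarks):
        """Return a smooth map of the plane that carries the (k, 2) source
        landmarks onto the (k, 2) target landmarks (a thin-plate-spline warp)."""
        return RBFInterpolator(source_landmarks, target_landmarks,
                               kernel='thin_plate_spline')

    # Example: warp = deformation(src, dst); warp(grid_points) deforms an
    # (m, 2) array of grid points, in the spirit of Thompson's transformed
    # coordinate grids.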
Joshua B. Tenenbaum, Vin de Silva, John C. Langford, ``A Global Geometric Framework for Nonlinear Dimensionality Reduction,'' Science.
Sam T. Roweis and Lawrence K. Saul, ``Nonlinear Dimensionality Reduction by Locally Linear Embedding,'' Science.
Weinberger and Saul. Unsupervised Learning of Image Manifolds by Semidefinite Programming.
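A compact sketch of the Isomap recipe from the Tenenbaum et al. paper (nearest-neighbor graph, graph shortest paths, then classical MDS). It assumes the neighborhood graph is connected, and the implementation details are my own simplification rather than the authors' code.

    import numpy as np
    from scipy.spatial.distance import cdist
    from scipy.sparse.csgraph import shortest_path

    def isomap(X, n_neighbors=10, n_components=2):
        """X: (n, d) data matrix. Returns an (n, n_components) embedding."""
        n = len(X)
        D = cdist(X, X)                                # Euclidean distances
        G = np.full((n, n), np.inf)                    # inf marks "no edge"
        nbrs = np.argsort(D, axis=1)[:, 1:n_neighbors + 1]
        rows = np.repeat(np.arange(n), n_neighbors)
        G[rows, nbrs.ravel()] = D[rows, nbrs.ravel()]  # keep k-NN edges only
        geo = shortest_path(G, directed=False)         # geodesic estimates
        # Classical MDS on the squared geodesic distances.
        J = np.eye(n) - np.ones((n, n)) / n
        B = -0.5 * J @ (geo ** 2) @ J
        vals, vecs = np.linalg.eigh(B)
        top = np.argsort(vals)[::-1][:n_components]
        return vecs[:, top] * np.sqrt(np.maximum(vals[top], 0))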
Discussion: Using features
Scalable Recognition with a Vocabulary Tree, by D. Nister and H. Stewenius, CVPR 2006. [pdf]
V. Ferrari, L. Fevrier, F. Jurie, and C. Schmid. Groups of Adjacent Contour Segments for Object Detection.
E. Nowak, F. Jurie, and B. Triggs. Sampling Strategies for Bag of Features Image Classification. [pdf]
Question: Based on these three papers, what do you think are the strengths and limitations of Bag of Features approaches to recognition? How much further can this approach be pushed?
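For concreteness, here is a minimal flat bag-of-features pipeline. The papers above go further (a hierarchical vocabulary tree, careful sampling strategies); the function names and the use of SciPy's k-means are my own simplification.

    import numpy as np
    from scipy.cluster.vq import kmeans2, vq

    def build_vocabulary(all_descriptors, k=1000):
        """Cluster local descriptors (e.g., SIFT) pooled from the training
        images into k visual words. all_descriptors: (n, d) array."""
        words, _ = kmeans2(all_descriptors.astype(float), k, minit='points')
        return words

    def bag_of_words(image_descriptors, words):
        """L1-normalized histogram of visual-word occurrences for one image."""
        labels, _ = vq(image_descriptors.astype(float), words)
        hist = np.bincount(labels, minlength=len(words)).astype(float)
        return hist / max(hist.sum(), 1.0)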
Crandall, Felzenszwalb and Huttenlocher. Object Recognition by Combining Appearance and Geometry.
Question: Both of the approaches described in these papers rely on making assumptions about conditional independence. For what types of recognition problems do you think these assumptions will be appropriate? When will they be inappropriate?
``Support-Vector Networks,'' Machine Learning 20, 273--297, 1995, Cortes and Vapnik.
Pattern Classification, Duda, Hart and Stork.
Additive Logistic Regression: a Statistical View of Boosting.
Kumar, Belhumeur and Nayar. FaceTracer: a search engine for large collections of images with faces.
O. Boiman, E. Shechtman and M. Irani. In Defense of Nearest-Neighbor Based Image Classification.
Question: Most machine learning techniques are not developed specifically for computer vision. What are the main challenges that you see addressed in these papers when it comes to adapting these learning methods for computer vision applications?
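A small sketch of how such off-the-shelf learners are typically wired to fixed-length image features (for example, the bag-of-words histograms sketched above). This uses scikit-learn's linear SVM plus a plain nearest-neighbor baseline; the function names are mine, and neither is the specific method of any reading.

    import numpy as np
    from sklearn.svm import LinearSVC

    def svm_classify(train_features, train_labels, test_features):
        """Train a linear SVM on fixed-length image feature vectors."""
        clf = LinearSVC(C=1.0)          # C would normally be chosen by validation
        clf.fit(train_features, train_labels)
        return clf.predict(test_features)

    def nearest_neighbor_classify(train_features, train_labels, test_features):
        """Nearest-neighbor baseline: label each test vector with the label of
        its closest training vector (Euclidean distance)."""
        d2 = ((test_features[:, None, :] - train_features[None, :, :]) ** 2).sum(-1)
        return np.asarray(train_labels)[d2.argmin(axis=1)]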
Discussion: Parts
Y. Jin and S. Geman. Context and hierarchy in a probabilistic image model.
P. Felzenszwalb, D. McAllester and D. Ramanan. A Discriminatively Trained, Multiscale, Deformable Part Model.
Y. Amit and A. Trouvé. POP: Patchwork of parts models for object recognition.
Question: Which of these methods offers the most promise for handling objects with parts? How do you think this compares with other methods for handling parts that we've discussed in class?
Perhaps the most popular current approach to classification involves using local descriptors, such as SIFT. In order to advance the state of the art in this area, it is most important that we: 1) develop better descriptors (Mohammed Eslami, Marco Adelfio, John Karvounis); 2) develop better ways of using these descriptors (Anne Jorstad, Nitesh Shroff, Joao Soares).