General Information
One of the most basic problems in vision is to use images to recognize that a particular object or event that we’ve never seen before belongs to a particular class of objects or events. To do this we must have a rich notion of what a class of objects is, one that captures what its members have in common. For example, chairs vary tremendously in their shape and material properties. How do we look at a chair we’ve never seen before and identify it as a chair? Accounting for this variation in recognition is largely an unsolved problem. In this course we will survey a number of approaches to representing and recognizing objects. We will draw inspiration from work in philosophy, psychology, linguistics, and mathematics. However, our primary focus will be more concrete: learning the algorithms and analytic tools that have been applied in visual object classification.
First, we will study approaches based on the idea that objects can be described by a set of necessary and sufficient image properties. This has taken the form of invariant representations. We will study the advantages and limitations of geometric and photometric invariants. Second, we will consider classification approaches based on powerful similarity measures. These include methods of shape comparison, deformable template matching, and lighting-insensitive matching. Third, we will consider methods that attempt to represent the images of classes of objects using subspaces. This includes approaches based on PCA, linear combinations, and manifold representations of classes. Fourth, we will look at approaches to building generative models of classes, including the use of hidden Markov models and pattern theory. Finally, we will consider the idea of building classifiers directly, without explicit representations of the class, using methods such as support vector machines, Winnow, and naïve Bayes.
The class will alternate between lectures teaching the basic mathematical and algorithmic techniques of these methods, and student-led discussion of vision research papers that apply these techniques. It will be essential for students to have a solid understanding of basic topics in math, such as linear algebra, probability and statistics, and calculus. It will also be useful to have some knowledge of computer vision, image processing, functional analysis, stochastic processes, or geometry. In general, the more math a student knows, the easier the course will be.
Here is my current plan for the workload of the class. This may change prior to the first day of class.
1) Reports. There are 15 classes scheduled for the presentation of papers. Prior to each of these classes, students must turn in a one-page summary and critique of one of the papers to be discussed. Late reports will not be accepted, since the goal of these reports is to get you to think about papers before we discuss them. However, each student need only turn in reports for 12 of these classes. 20% of grade
2) Presentation. Students will be assigned in pairs to present (usually) two papers in one class. This will be a substantial part of the grade. Presentations should be well prepared. If enrollment is low enough, students may be expected to do this twice. 20% of grade
3) Midterm and Final. These will be based on material from the lectures. 40% of grade
4) Project. Each student will choose one of the following. 20% of grade
a) Write a detailed paper, approximately five pages in length, proposing research that extends or adapts one of the approaches discussed in class. You may choose to base this on the papers you have presented.
b) Programming project: implement a technique discussed in class and apply it to some real data. This is not meant to be a research project, but something closer to an extended problem set.
The schedule below is probably overly ambitious, so expect that we won't get to a couple of these classes. Pairs of students will lead the discussion of papers, as indicated. There will probably not be enough students to lead all these classes, so I will lead discussion for any extras. Classes October 14 and 16 will have to be rescheduled, due to the International Conference on Computer Vision.
Class | Presenters | Topic | Background Reading |
1. 9/2 | Jacobs | Introduction (view as web page). | |
2. 9/4 | Jacobs | Paper Presentation: (view as web page).
Students must review (a) or (b). a. Women, Fire and Dangerous Things by Lakoff, Chapters 1 and 2. On reserve. b. S. Laurence and E. Margolis, ``Concepts and Cognitive Science'', in Concepts edited by E. Margolis and S. Laurence, MIT Press, 1999. On reserve. c. L. Wittgenstein, Philosophical Investigations, sections 65-78. On reserve. |
|
3. 9/9 | Jacobs | Lecture: Affine and projective geometry and invariants. |
Introduction to Projective Geometry, C.R. Wylie, McGraw-Hill Book Co., 1970.
Y. Lamdan, J. T. Schwartz, and H. J. Wolfson. Affine invariant model-based object recognition. IEEE Journal of Robotics and Automation, 6:578--589, 1990.
I. Weiss. Geometric Invariants and Object Recognition. Intl. J. Computer Vision, 10:207--231, 1993.
J. Burns, R. Weiss, and E. Riseman, ``The Non-Existence of General-Case View-Invariants'', in Geometric Invariance for Computer Vision, edited by J. Mundy and A. Zisserman, MIT Press, 1992.
Moses, Y. and Ullman, S. (1992). ``Limitations of non model-based recognition schemes''. In Sandini, G., editor, Proc. 2nd European Conf. on Computer Vision, Lecture Notes in Computer Science, volume 588, pages 820--828. Springer Verlag.
J. Mundy and A. Zisserman, Appendix – Projective Geometry for Machine Vision, in Geometric Invariance for Computer Vision, edited by J. Mundy and A. Zisserman, MIT Press, 1992.
H. Chen, P. Belhumeur, and D. Jacobs. ``In Search of Illumination Invariants,'' IEEE Conference on Computer Vision and Pattern Recognition, pp. 254--261, June 2000. |
4. 9/11 | Jacobs | Lecture: Geometric invariance (conclusion) and photometric invariance. (view as web page). | |
5. 9/16 | Jacobs | Presentation, David Jacobs: Classification with Invariants
b. Biederman, I. (1987). Recognition--by--components: A theory of human image understanding. Psychological Review, 94(2):115--147. On reserve. |
Jepson, A., W. Richards, and D. Knill, Modal structure and reliable inference, in ``Perception as Bayesian Inference,'' eds. D. Knill and W. Richards, Cambridge Univ. Press, 1996, pp. 63-92.
Biederman has many papers on this topic, including:
Biederman, I. and Gerhardstein, P. (1993). Recognizing depth-rotated objects: Evidence and conditions for three-dimensional viewpoint invariance. Journal of Experimental Psychology: Human Perception and Performance, 19, 1162-1182.
Biederman, I. and Gerhardstein, P. (1995). Viewpoint-dependent mechanisms in visual object recognition: Reply to Tarr and Bulthoff. Journal of Experimental Psychology: Human Perception and Performance, 21(6), 1506-1514.
D. Jacobs. ``What Makes Viewpoint Invariant Properties Perceptually Salient,'' Journal of the Optical Society of America A, in press. |
6. 9/18 | Rao | Class canceled so students may attend a lecture by C. R. Rao: Has Statistics a Future? If so, in what form?, 3:30-4:30, 1524 Van Munching Hall, the Howard Frank Auditorium. | |
7. 9/23 | Jacobs | Lecture: Linear subspaces – geometry & PCA | Duda, Hart and Stork, pp. 114-117. On reserve in library.
Shimon Ullman and Ronen Basri, Recognition by Linear Combinations of Models, IEEE Transactions on Pattern Analysis and Machine Intelligence, 13(10): 992-1006, 1991. Available at: http://www.wisdom.weizmann.ac.il/~ronen/publications.html
D. Jacobs, "Matching 3-D Models to 2-D Images," International Journal of Computer Vision, 21(1/2):123--153, January 1997. On reserve.
Turk, M. & Pentland, A. (1991). Eigenfaces for recognition. Journal of Cognitive Neuroscience, 3, 71-86.
|
8. 9/25 | Jacobs | Lecture: Linear subspaces – photometry |
Shashua. On photometric issues to feature-based object recognition. Int. J. Computer Vision, 21:99--122, 1997.
R. Basri and D. Jacobs. ``Lambertian Reflectance and Linear Subspaces,'' IEEE Trans. on Pattern Analysis and Machine Intelligence, 25(2):218-233, 2003. Available at: http://www.wisdom.weizmann.ac.il/~ronen/publications.html
|
9. 9/30 | Nanda, Sen | Presentations: Linear Representations of Classes.
T.F. Cootes and C.J. Taylor, "Statistical models of appearance for medical image analysis and computer vision", Proc. SPIE Medical Imaging 2001. Presentation.
``Face Recognition Based on 3D Shape Estimation from Single Images'', by Blanz and Vetter, CGF-TR 2, October 2002, University of Freiburg. There is a version in PAMI 2003 available online; I'll add a link. Presentation. (view as web page). |
``Statistical models of appearance for computer vision'' by Cootes and Taylor.
Lohmann, G.P. 1983. Eigenshape analysis of microfossils: a general morphometric procedure for describing changes in shape. Mathematical Geology 15:659-672. |
10. 9/30, 4:45-6:00 "Optional" | Jacobs | Presentation: Prototypes and Natural Categories. (view as web page).
Posner, M.I., & Keele, S.W. (1968). On the genesis of abstract ideas. Journal of Experimental Psychology, 77, 353-363.
E. Rosch, C. Mervis, W. Gray, D. Johnson, and P. Boyes-Braem, ``Basic Objects in Natural Categories'', Cognitive Psychology, 8:382--439. On reserve. |
S. Ullman, High-level Vision, MIT Press, 1996, Chapter 6. Ronen Basri, Recognition by Prototypes, International Journal of Computer Vision, 19(2): 147-168, 1996.
|
11. 10/2 | Cuntoor, Raykor | Presentations: Non-linear subspaces
S. Edelman, ``Representation is Representation of Similarities'', Behavioral and Brain Sciences. Presentation (view as web page).
Joshua B. Tenenbaum, Vin de Silva, John C. Langford, ``A Global Geometric Framework for Nonlinear Dimensionality Reduction'', Science.
Sam T. Roweis, Lawrence K. Saul, Nonlinear Dimensionality Reduction by Locally Linear Embedding, Science. Presentation (view as web page). |
H. Sebastian Seung and Daniel D. Lee, The Manifold Ways of Perception, Science.
H. Murase and S.K. Nayar. Visual Learning and Recognition of 3D Objects from Appearance. International Journal of Computer Vision, vol. 14, no. 1, Jan 1995, pp 5-24.
Ronen Basri, Dan Roth, and David Jacobs, Clustering Appearances of 3D Objects, IEEE Conf. on Computer Vision and Pattern Recognition (CVPR), Santa Barbara: 414-420, 1998.
Cutzu, F., and S. Edelman, Representation of object similarity in human vision: psychophysics and a computational model, Vision Research 38:2227-2257, 1998. |
12. 10/7 | Eaton, Mativo | Presentations: The psychology of similarity and view-based recognition.
E. Goldmeier, Similarity in Visually Perceived Forms, International Universities Press, Psychological Issues, Volume VIII, Number 1, Monograph 29. Chapters: 1-5. On reserve.
H. Bulthoff, S. Edelman, and M. Tarr, ``How Are Three-Dimensional Objects Represented in the Brain?'' MIT AI Memo #1479. |
Check out Mike Tarr's class page for many relevant references.
Poggio, T. and Edelman, S., A network that learns to recognize three-dimensional objects. Nature, 343:263-266. |
13. 10/9 | Jacobs | Presentations, mixed with lecture: Shape and Nature. Students must review (a). (a) D'Arcy Thompson, On Growth and Form, Dover Books, 1992, Chapters 1 and 17. On reserve. (b) A Guide to Tree Identification (just skim this). On reserve.
|
Morphometric Tools for Landmark Data, by Bookstein.
D. G. Kendall. A survey of the statistical theory of shape. Statistical Science, 4(2):87-120, 1989.
Shape and Shape Theory, by Kendall, Barden, Carne and Le.
Statistical Shape Analysis, by I. L. Dryden and Kanti V. Mardia. |
10/14 | No Class, ICCV | ||
10/16 | No Class, ICCV | ||
14. 10/21 | Jacobs | Lecture: Morphometrics. (view as web page). | See previous class. |
15. 10/23 | Yi (?), Tahmoush | Presentations: Morphometrics and Recognition. Serge Belongie, Jitendra Malik and Jan Puzicha, Shape Matching and Object Recognition Using Shape Contexts, PAMI, 24(4):509-522, April 2002.
|
|
16. 10/28 | Jacobs | Lecture: Fourier descriptors and wavelets. (view as web page) | A Wavelet Tour of Signal Processing, by Mallat.
“Introduction and Overview of Fourier Descriptors”, by Lestrel. |
17. 10/30 | Aggarwal, Nath | Presentations: Wavelet-based texture classification.
J. S. De Bonet and P. Viola, Texture Recognition Using a Non-parametric Multi-Scale Statistical Model, CVPR ’98. Available at: http://www.debonet.com/Research/Publications/ Presentation (view as web page).
J. Portilla and E. P. Simoncelli. A Parametric Texture Model based on Joint Statistics of Complex Wavelet Coefficients. Int'l Journal of Computer Vision, 40(1):49-71, October 2000. Presentation |
|
18. 11/4 | Ran, Jacobs | Presentations: Wavelet-based representations for object classification. M. Lades, J.C. Vorbruggen, J. Buhmann, J. Lange, C. von der Malsburg, R.P. Wurtz, W. Konen. Distortion Invariant Object Recognition in the Dynamic Link Architecture. IEEE Transactions on Computers 1992, 42(3):300-311. Available at: http://citeseer.nj.nec.com/lades93distortion.html |
C. Schmid & R. Mohr (1997). Local Grayvalue Invariants for Image Retrieval, IEEE Trans. on Pattern Analysis and Machine Intelligence, 19(5), 530-535. |
19. 11/6 | Jacobs | Lecture: Hidden Markov models (view as web page). | Rabiner, "A Tutorial on Hidden Markov Models and Selected Applications in Speech Recognition."
|
20. 11/11 | Ho, Lee | Presentations: HMMs for classification
J. Yamato, J. Ohya, and K. Ishii, “Recognizing Human Action in Time-Sequential Images Using Hidden Markov Model,” CVPR ’92, pages 379-385.
J. Li, A. Najmi, and R. Gray, ``Image Classification by a Two Dimensional Hidden Markov Model,'' IEEE Transactions on Signal Processing, February 2000. Presentation as pdf. |
|
21. 11/13 | Jacobs | Lecture: Generative models of objects. Markov Random Fields. Gibbs Energy. Gibbs sampling. | S. Geman and D. Geman. "Stochastic relaxation, Gibbs distributions, and the Bayesian restoration of images", IEEE Transactions on Pattern Analysis and Machine Intelligence, 6:721--741, 1984.
U. Grenander, Y. Chow, and D. M. Keenan. "Hands: A Pattern Theoretic Study of Biological Shapes", Springer Verlag, New York, 1991. |
22. 11/18 | Jacobs | Lecture: Skeletons and Parts. (See previous lecture notes) | |
23. 11/20 | Mihalcik, Tran | Presentations: Parts
K. Siddiqi, A. Shokoufandeh, S. J. Dickinson & S. W. Zucker. Shock Graphs and Shape Matching. International Journal of Computer Vision, 35(1), 13-32, 1999. Presentation (view as web page).
Pedro F. Felzenszwalb and Daniel P. Huttenlocher. |
|
24. 11/25 | Ling, Gordon | Presentations:
S. Geman, D. Potter, and Z. Chi. Composition systems. Quarterly of Applied Mathematics, LX, 2002, 707-736. Presentation
S. Zhu, Embedding Gestalt Laws in Markov Random Fields -- A theory for shape modeling and perceptual organization. |
|
25 & 26. 12/2 & 12/4 | Jacobs | Lecture: Linear separators, naive Bayes, perceptrons, SVMs, boosting, Winnow. (as web page). | C. Burges, A Tutorial on Support Vector Machines for Pattern Recognition.
``Learning Quickly When Irrelevant Attributes Abound: A New Pattern Classification, Duda, Hart and Stork. |
27. 12/11 | Fails, Shirdhonkar | Presentations: Linear Classifiers
P. Viola and M. Jones. Robust real-time object detection. Technical Report 2001/01, Compaq CRL, February 2001.
H. Schneiderman and T. Kanade, "Object Detection Using the Statistics of Parts", International Journal of Computer Vision. Presentation (view as web page). |
|
12/16 | Review Session. A.V. Williams, 4424, 3:30-5:00 |