PhD Proposal: Efficient Inference Methods for Structured Prediction by Imitation Learning

Talk
He He
Time: 03.27.2015 11:00 to 12:30
Location: AVW 4424

Structured prediction problems are ubiquitous in natural language processing. The standard approach is to assign a score to every possible structure in the search space and find the highest-scoring one (known as decoding or inference), typically by dynamic programming or integer linear programming. Such algorithms tend to be slow for two reasons. First, the score is computed from a myriad of features describing the structure, and feature computation can take up to 80% of the total computation time. Second, searching for the global optimum in a combinatorial space is generally hard; even a polynomial-time algorithm may be unsatisfactory in real applications, especially when the input is large (e.g., parsing long sentences) or global features are used (e.g., higher-order models).
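As a minimal illustration of dynamic-programming decoding (not the proposal's own code), the sketch below implements Viterbi decoding for a first-order sequence-labeling model; the emission and transition scores are assumed to be given. Even this simple decoder does work quadratic in the number of labels at every position, which hints at why richer, higher-order models become expensive.

```python
import numpy as np

def viterbi_decode(emissions, transitions):
    """Find the highest-scoring label sequence under a first-order model.

    emissions:   (n_words, n_labels) local score of each label at each position
    transitions: (n_labels, n_labels) score of moving from label i to label j
    """
    n, k = emissions.shape
    score = emissions[0].copy()            # best score ending in each label
    back = np.zeros((n, k), dtype=int)     # backpointers for recovery
    for t in range(1, n):
        # cand[i, j] = best score ending in i at t-1, then taking label j at t
        cand = score[:, None] + transitions + emissions[t][None, :]
        back[t] = cand.argmax(axis=0)
        score = cand.max(axis=0)
    # follow backpointers from the best final label
    labels = [int(score.argmax())]
    for t in range(n - 1, 0, -1):
        labels.append(int(back[t, labels[-1]]))
    return labels[::-1]
```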
Our primary goal is to speed up structured prediction at test time by learning good search strategies from oracles at training time. We propose statistical models that search the space sequentially, make decisions adaptively, and find good solutions quickly. We tackle both aspects of structured prediction: feature computation and decoding.
We start with dynamic feature selection in a simpler, non-structured setting: binary/multiclass classification. The model starts with no features and sequentially selects the most informative feature(s) given the current prediction and the features' costs; a sketch of this idea appears below. We then extend the model to a structured setting coupled with inference: dependency parsing. By leveraging structural information, not only can we use fewer features, but we can also prune the search space aggressively. For decoding, we present efficient prioritization and pruning strategies for branch-and-bound search, the default method in most integer linear programming solvers. Furthermore, we discuss practical issues that arise when applying imitation learning to atypical imitation problems. We show that when the oracle's actions are too good for the learner to imitate reliably, it is preferable to learn from a coach that demonstrates actions that are both good and reachable, rather than from an oracle that demonstrates the best action regardless of the learner's ability.
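Returning to the first step above, here is a minimal sketch of sequential feature acquisition for a linear multiclass model. The informativeness-per-cost ordering and the margin-based stopping rule are stand-in heuristics for illustration only; in the proposed approach the acquisition decision is a learned policy trained by imitation, not this fixed rule.

```python
import numpy as np

def dynamic_feature_predict(x, W, costs, margin=1.0):
    """Sequential feature acquisition for linear multiclass prediction (a sketch).

    x:     (d,) full feature vector; an entry counts only once it is "bought"
    W:     (k, d) weight matrix for k classes
    costs: (d,) acquisition cost of each feature
    Start with no features, acquire them one at a time, and stop as soon as
    the prediction margin between the top two classes exceeds a threshold.
    """
    d = x.shape[0]
    mask = np.zeros(d)                                   # features bought so far
    order = np.argsort(-(np.abs(W).max(axis=0) / costs)) # informativeness / cost
    spent = 0.0
    for j in order:
        scores = W @ (x * mask)
        top2 = np.sort(scores)[-2:]
        if top2[1] - top2[0] >= margin:
            break                        # confident enough: stop paying for features
        mask[j] = 1.0
        spent += costs[j]
    scores = W @ (x * mask)
    return int(np.argmax(scores)), spent
```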
More broadly, in addition to learning a speed-accuracy tradeoff, our model is effective in settings where inputs arrive sequentially and predictions or actions must be made from incomplete inputs. For example, in simultaneous machine translation, translation must be produced while the source sentence is still being revealed. We design actions based on linguistic knowledge of human simultaneous interpretation and show how to combine them with our sequential decision-making framework.
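A minimal sketch of that decision loop follows, assuming a black-box policy and prefix translator (both hypothetical names here). The system reads source words one at a time and either waits or commits a translation of the prefix seen so far; for simplicity it also assumes that translating a longer prefix extends the previously emitted target words.

```python
def simultaneous_translate(source_stream, policy, translate_prefix):
    """Sequential decision loop for simultaneous translation (a sketch).

    source_stream:    yields source words as they are revealed
    policy:           maps (source prefix, emitted target) to "WAIT"/"COMMIT";
                      in the proposal this policy is learned by imitation
    translate_prefix: any black-box system that translates a source prefix
    """
    seen, emitted = [], []
    for word in source_stream:
        seen.append(word)
        if policy(seen, emitted) == "COMMIT":
            hypothesis = translate_prefix(seen)
            new_words = hypothesis[len(emitted):]  # emit only the new part
            emitted.extend(new_words)
            yield from new_words
    # end of sentence: flush whatever the full translation adds
    final = translate_prefix(seen)
    yield from final[len(emitted):]
```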
Examining Committee:
Committee Chair: Dr. Hal Daumé III
Dean's Representative: Dr. Ramani Duraiswami
Committee Member: Dr. Jordan Boyd-Graber