PhD Proposal: Temporal Context Modeling for Text Streams

Talk

Jinfeng Rao

Time: 05.01.2017 10:00 to 11:30

Location: AVW 3258

URL: https://talks.cs.umd.edu/talks/1772

There is increasing recognition in the information retrieval community that time plays an essential role in many search-related tasks. The temporal dynamics of user behavior, document content, and document structure have been studied in a variety of IR problems, including query log analysis, temporal indexing, time-aware retrieval and ranking, and event detection. Modeling temporal dynamics is becoming even more important with the emergence of social media platforms like Twitter and Facebook.

This proposal explores temporal models on evolving streams of text and the role that such models can play in improving information access. I consider three cases: a stream of social media posts by many users for tweet search, a stream of queries by an individual user for voice search, and a stream of responses between two agents (proposed work). My work proposes to explore the relationship between temporal models and context models: for tweet search, the evolution of an event serves as the context for clustering relevant tweets; for voice search, a user's query history provides the context for understanding their true information need.

First, I work on the tweet search problem by modeling the temporal contexts of the underlying collection. My intuition is that an information need on Twitter usually correlates with a breaking news event, so tweets posted around the time of that event are more likely to be relevant. That is, relevant tweets tend to cluster together in time. Such temporal clustering can be identified by modeling the distributions of term statistics in the underlying collection, and then exploited as a temporal relevance signal to boost search effectiveness.

Second, I model the sequential dependencies in a stream of queries in the voice search setting, where users interact with a voice-enabled remote controller through voice requests to search for TV programs (e.g., Game of Thrones). My work is motivated by a case study of 1.9M voice queries collected from the XFINITY entertainment platform, which showed that such queries are short, even shorter than comparable voice queries in other domains, and therefore offer fewer opportunities for deciphering user intent.

Additionally, the many ways to search for a program and the underlying speech recognition errors exacerbate ambiguity. In such scenarios, looking at a user's successive queries can provide important clues to the user's ultimate intent. I therefore introduce the voice search session to capture the contextual dependencies in query sequences.

The key to both problems is an effective technique for sequence modeling. For tweet search, how the temporal distribution of events is represented is crucial for reliably estimating temporal relevance scores; for voice search, modeling both the sequence of elements within a query and the temporal contexts across queries is essential to accurate predictions.

To this end, I first explored a technique for tweet search that estimates the temporal distributions of events from the distributions of collection term frequencies. Given a query, I adopt kernel density estimation to estimate the collection frequency distributions of its query terms. The temporal distribution of events can then be approximated by applying a nonlinear regression approach over the multiple term distributions, from which I can estimate the temporal relevance score for each tweet given its timestamp. Experiments on two Twitter collections demonstrated that this technique contributes an additional temporal relevance signal that improves tweet search.
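The kernel density estimation step described above can be illustrated with a minimal sketch. It is not the implementation used in this work: the Gaussian kernel, the fixed bandwidth, the function names, and the toy timestamps are all assumptions made for illustration, and the nonlinear regression over term distributions is replaced here by a plain average of the per-term densities.

```python
import math


def gaussian_kde(samples, bandwidth):
    """Return a function t -> estimated density at t, using Gaussian kernels.

    `samples` are timestamps at which a term occurred in the collection.
    """
    n = len(samples)
    norm = 1.0 / (n * bandwidth * math.sqrt(2.0 * math.pi))

    def density(t):
        return norm * sum(
            math.exp(-0.5 * ((t - s) / bandwidth) ** 2) for s in samples
        )

    return density


def temporal_score(tweet_time, term_timestamps, bandwidth=1.0):
    """Score a tweet timestamp by averaging per-term densities.

    Assumption: a plain average stands in for the nonlinear regression
    mentioned in the abstract, whose exact form is not given there.
    """
    densities = [
        gaussian_kde(ts, bandwidth)(tweet_time)
        for ts in term_timestamps.values()
        if ts
    ]
    return sum(densities) / len(densities) if densities else 0.0


# Hypothetical occurrence times (in hours) for two query terms.
term_timestamps = {
    "earthquake": [10.0, 10.5, 11.0, 11.2, 30.0],
    "japan": [9.8, 10.4, 10.9, 50.0],
}

near = temporal_score(10.6, term_timestamps)  # inside the burst of activity
far = temporal_score(40.0, term_timestamps)   # far from any burst
print(near, far)
```

In this toy setup, a tweet posted during the burst of term activity receives a higher score than one posted far from it, which is the kind of signal that could be combined with a text-based relevance score.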

Second, I proposed a probabilistic framework in which recurrent and feedforward neural networks are organized hierarchically to model the temporal contexts across queries in voice search. I verified this approach through experiments on large-scale real-world datasets, which showed it to be significantly more effective than approaches that use no temporal context, such as SVM Rank, LSTM, and the currently deployed Xfinity system.

As the final component of my PhD, I propose to develop an end-to-end chat robot that can automatically generate conversations between two agents without any human intervention. I aim to explore temporal context modeling techniques in a sequential conversation setting in order to construct a fluent, succinct, and meaningful dialog system.

Examining Committee:

Chair: Dr. Jimmy Lin

Dept rep: Dr. Amol Deshpande

Member: Dr. Marine Carpuat
