The MauveDB project is motivated by the tremendous increase in
the number of distributed measurement infrastructures such as
wireless sensor networks that continuously generate invaluable
data about our everyday world. However, the potential of this
data has been hard to realize mainly because of the typically
incomplete, imprecise, erroneous, and uncertain nature of the
data generated. The MauveDB project aims to develop abstractions
that make it easy for users and application developers to
continuously apply statistical modeling tools to streaming
sensor data. Such statistical
models can be used for data cleaning, prediction, interpolation, anomaly
detection and for inferring hidden variables from the data, thus addressing
many of the challenges in managing sensor data.
MauveDB supports a new abstraction called "model-based views" to achieve the
above goal. A model-based view is analogous to a traditional database view
in that it can be used to present a consistent "view" of the underlying data
to the user. However, as opposed to a traditional database view, a model-based view
is defined using a statistical model instead of an SQL query. This not only
significantly enriches the user interaction with the sensed data, but also
results in more efficient processing of data.
The publications below provide a brief overview of the MauveDB project and the underlying technology:
- Declarative Analysis of Noisy Information Networks;
Walaa Eldin Moustafa, Galileo Namata, Amol Deshpande, Lise Getoor;
ICDE Workshop on Graph Data Management: Techniques and Applications (GDM 2011).
There is a growing interest in methods for analyzing data describing networks of
all types, including information, biological, physical, and social networks.
Typically the data describing these networks is observational, and thus noisy
and incomplete; it is often at the wrong level of fidelity and abstraction for
meaningful data analysis. This has resulted in a growing body of work on
extracting, cleaning, and annotating network data. Unfortunately, much of this
work is ad hoc and domain-specific. In this paper, we present the architecture
of a data management system that enables efficient, declarative analysis of
large-scale information networks. We identify a set of primitives to support the
extraction and inference of a network from observational data, and describe a
framework that enables a network analyst to easily implement and combine new
extraction and analysis techniques, and efficiently apply them to large
observation networks. The key insight behind our approach is to "decouple", to
the extent possible, (a) the operations that require traversing the graph
structure (typically the computationally expensive step), from (b) the
operations that do the modification and update of the extracted network. We
present an analysis language based on "Datalog", and show how to use it to
cleanly achieve such decoupling. We briefly describe our prototype system that
supports these abstractions. We include a preliminary performance evaluation of
the system and show that our approach scales well and can efficiently handle a
wide spectrum of data cleaning operations on network data.
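The decoupling idea can be illustrated with a toy entity-resolution pass (the names and the similarity predicate are hypothetical, and the real system expresses this declaratively in its Datalog-based language rather than in imperative code): phase (a) only traverses the graph to propose updates, and phase (b) only applies them.

```python
# Toy sketch of decoupling (a) graph traversal from (b) graph mutation.
# All names are hypothetical illustrations, not the system's primitives.

def propose_merges(adj, similar):
    # Phase (a): traversal only -- no mutation. Propose merging node
    # pairs that share a neighbor and look similar (entity resolution).
    proposals = set()
    for neighbors in adj.values():
        ns = sorted(neighbors)
        for i in range(len(ns)):
            for j in range(i + 1, len(ns)):
                if similar(ns[i], ns[j]):
                    proposals.add((ns[i], ns[j]))
    return proposals

def apply_merges(adj, proposals):
    # Phase (b): mutation only -- merge node b into node a.
    for a, b in proposals:
        if b not in adj or a not in adj:
            continue
        adj[a] |= adj[b] - {a, b}
        del adj[b]
        for nbrs in adj.values():
            if b in nbrs:
                nbrs.discard(b)
                nbrs.add(a)
        adj[a].discard(a)

# "J. Smith" and "John Smith" co-cite the same paper -> merge them.
adj = {'paper1': {'J. Smith', 'John Smith'},
       'J. Smith': {'paper1'}, 'John Smith': {'paper1'}}
same_surname = lambda a, b: a.split()[-1] == b.split()[-1]
apply_merges(adj, propose_merges(adj, same_surname))
print(sorted(adj))  # ['J. Smith', 'paper1']
```

Because the proposals are computed before any mutation, the expensive traversal never has to interleave with (or be invalidated by) updates to the extracted network.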
- Lineage Processing on Correlated Probabilistic Databases;
Bhargav Kanagal, Amol Deshpande;
SIGMOD 2010.
[pdf]
In this paper, we address the problem of scalably evaluating conjunctive queries
over correlated probabilistic databases containing tuple or attribute
uncertainties. Like previous work, we adopt a two-phase approach where we first
compute "lineages" of the output tuples, and then compute the probabilities of
the lineage formulas. However, unlike previous work, we allow for arbitrary and
complex correlations to be present in the data, captured via a forest of
"junction trees". We observe that evaluating even read-once (tree structured)
lineages (e.g., those generated by "hierarchical" conjunctive queries),
polynomially computable over tuple independent probabilistic databases, is
#P-complete for lightly correlated probabilistic databases like "Markov
sequences". We characterize the complexity of exact computation of the
probability of the lineage formula on a correlated database using a parameter
called "lwidth" (analogous to the notion of "treewidth"). For lineages that
result in low lwidth, we compute exact probabilities using a novel message
passing algorithm, and for lineages that induce large lwidths, we develop
approximate Monte Carlo algorithms to estimate the result probabilities. We
scale our algorithms to very large correlated probabilistic databases using the
previously proposed INDSEP data structure. To mitigate the complexity of lineage
evaluation, we develop optimization techniques to process a batch of lineages by
sharing computation across formulas, and to exploit any independence
relationships that may exist in the data. Our experimental study illustrates the
benefits of using our algorithms for processing lineage formulas over correlated
probabilistic databases.
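To see why the correlated case is hard, it helps to contrast it with the easy baseline: over a tuple-independent database, the probability of a read-once lineage formula factors directly over its subformulas. A minimal sketch (tuple names illustrative):

```python
# Baseline sketch: probability of a *read-once* lineage formula over
# INDEPENDENT tuples. (The paper's contribution is the much harder
# correlated case, handled via junction trees; this only shows why the
# independent setting is polynomially computable.)

def prob(formula, p):
    """formula: a tuple id (str), or a nested ('and'|'or', child, ...) tuple.
    p: dict mapping tuple id -> existence probability.
    Each tuple id must appear at most once (the read-once property)."""
    if isinstance(formula, str):
        return p[formula]
    op, *kids = formula
    if op == 'and':          # independent AND: product of probabilities
        result = 1.0
        for k in kids:
            result *= prob(k, p)
        return result
    if op == 'or':           # independent OR: 1 - product of complements
        result = 1.0
        for k in kids:
            result *= 1.0 - prob(k, p)
        return 1.0 - result
    raise ValueError(op)

# lineage of one output tuple: (t1 AND t2) OR t3
lineage = ('or', ('and', 't1', 't2'), 't3')
print(round(prob(lineage, {'t1': 0.9, 't2': 0.5, 't3': 0.2}), 2))  # 0.56
```

Once tuples are correlated (e.g., via a Markov sequence), the factorization above is no longer valid, which is exactly where the #P-hardness cited in the abstract comes from.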
- Indexing Correlated Probabilistic Databases;
Bhargav Kanagal, Amol Deshpande;
SIGMOD 2009.
[pdf]
With large amounts of correlated probabilistic data being generated in a wide
range of application domains including sensor networks, information extraction,
event detection etc., effectively managing and querying them has become an
important research challenge. While there is an exhaustive body of literature on
querying independent probabilistic data, supporting efficient queries over
large-scale, correlated databases remains a challenge. In this paper, we develop
efficient data structures and indexes for supporting inference and decision
support queries over such databases. Our proposed
hierarchical data structure is suitable both for in-memory and disk-resident
databases. We represent the correlations in the probabilistic database using
a "junction tree" over the tuple-existence or attribute-value random
variables,
and use "tree partitioning" techniques to build an index structure over it.
We show how to efficiently answer inference and aggregation queries using such
an index, resulting in orders of magnitude performance benefits in most cases.
In addition, we develop novel algorithms for efficiently keeping the index
structure up-to-date as changes (inserts, updates) are made to the probabilistic database.
We present a comprehensive experimental study illustrating the benefits of
our approach to query processing in probabilistic databases.
- Efficient Query Evaluation over Temporally Correlated Probabilistic Streams;
Bhargav Kanagal, Amol Deshpande;
ICDE 2009 (short paper).
[pdf]
Many real world applications such as sensor networks and other monitoring
applications naturally generate probabilistic streams that are highly correlated
in both time and space. Query processing over such streaming data must be
cognizant of these correlations, since they can significantly alter the final
query results. Several prior works have suggested approaches to handling
correlations in probabilistic databases. However, those approaches are either
unable to represent the types of correlations that probabilistic streams
exhibit, or cannot be applied directly to our problem because of their
complexity. In this paper, we develop a system for managing and querying such
streams by exploiting the fact that most real-world probabilistic streams exhibit
highly structured Markovian correlations. Our approach is based on the
previously proposed framework of viewing probabilistic query evaluation as
inference over graphical models; we show how to efficiently construct graphical
models for the common stream processing operators, and how to efficiently
perform inference over them in an incremental fashion. Our extensive
experimental evaluation illustrates the advantages of exploiting the structured
nature of correlations in probabilistic streams.
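As a minimal illustration of the incremental inference this enables (a toy two-state hidden Markov model with made-up numbers, not the paper's operator framework), each new uncertain reading updates the belief over the hidden state in constant time via the classic forward/filtering recursion:

```python
# Hedged sketch: incremental inference over a Markovian stream for a
# two-state hidden Markov model. All probabilities are illustrative.

STATES = ['ok', 'faulty']
TRANS = {('ok', 'ok'): 0.95, ('ok', 'faulty'): 0.05,
         ('faulty', 'ok'): 0.10, ('faulty', 'faulty'): 0.90}
EMIT = {('ok', 'low'): 0.8, ('ok', 'high'): 0.2,
        ('faulty', 'low'): 0.3, ('faulty', 'high'): 0.7}

def step(belief, obs):
    # One incremental update: predict through the transition model,
    # then condition on the new observation and renormalize.
    new = {}
    for s in STATES:
        predicted = sum(belief[r] * TRANS[(r, s)] for r in STATES)
        new[s] = predicted * EMIT[(s, obs)]
    z = sum(new.values())
    return {s: v / z for s, v in new.items()}

belief = {'ok': 0.9, 'faulty': 0.1}
for obs in ['low', 'high', 'high']:   # stream of noisy readings
    belief = step(belief, obs)
print({s: round(v, 3) for s, v in belief.items()})
```

The point of the sketch is the update pattern: the belief state is the only thing carried between arrivals, so the cost per reading does not grow with the length of the stream.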
- Online Filtering, Smoothing and Probabilistic Modeling of Streaming data;
Bhargav Kanagal, Amol Deshpande;
ICDE 2008.
[pdf] (Extended version)
In this paper, we address the problem of extending a relational database system
to facilitate efficient real-time application of dynamic probabilistic models to
streaming data. We use the recently proposed abstraction of model-based views
for this purpose, by allowing users to declaratively specify the model to be
applied, and by presenting the output of the models to the user as a
probabilistic database view. We support declarative querying over such views
using an extended version of SQL that allows for querying probabilistic data.
Underneath we use particle filters, a class of sequential Monte Carlo algorithms
commonly used to implement dynamic probabilistic models, to represent the
present and historical states of the model as sets of weighted samples
(particles) that are kept up-to-date as new readings arrive. We develop novel
techniques to convert the queries on the model-based view directly into queries
over particle tables, enabling highly efficient query processing. Finally, we
present an experimental evaluation demonstrating the benefits of our approach.
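The particle representation can be sketched as follows (a toy bootstrap particle filter; all model parameters and names are illustrative, not the system's actual interface): the model's current state is a set of samples kept up-to-date as readings arrive, and queries over the view reduce to queries over the particles.

```python
# Hedged sketch of a bootstrap particle filter backing a model-based view.
# Parameters (prior, walk noise, observation noise) are made up.
import math
import random

random.seed(0)
N = 1000
particles = [random.gauss(20.0, 2.0) for _ in range(N)]  # prior over temperature

def observe(reading, noise_sd=1.0):
    """One filter step: random-walk transition, weight each particle by
    the likelihood of the noisy reading, then resample."""
    global particles
    moved = [p + random.gauss(0.0, 0.5) for p in particles]
    weights = [math.exp(-((reading - p) ** 2) / (2 * noise_sd ** 2))
               for p in moved]
    # Resampling is effectively a weighted SELECT over the particle table.
    particles = random.choices(moved, weights=weights, k=N)

def query_mean():
    # e.g. SELECT AVG(temp) FROM view  ->  an average over the particle table
    return sum(particles) / len(particles)

for reading in [22.1, 22.4, 22.0]:
    observe(reading)
print(round(query_mean(), 1))  # posterior mean near the readings
```

Because the particles are ordinary tuples, aggregate queries on the view become plain aggregates over the particle table, which is what makes the direct query-translation approach efficient.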
- Model-based Querying in Sensor Networks;
Amol Deshpande, Carlos Guestrin, Samuel Madden;
Chapter in Encyclopedia of Database Systems, Ling Liu and M. Tamer Ozsu, eds., 2009.
[pdf]
- MauveDB: Supporting Model-based User Views in Database Systems;
Amol Deshpande, Sam Madden;
SIGMOD 2006.
[pdf] [talk]
Real-world data --- especially when generated by distributed measurement
infrastructures such as sensor networks --- tends to be incomplete, imprecise,
and erroneous, making it impossible to present it to users or feed it directly
into applications. The traditional approach to dealing with this problem is to
first process the data using statistical or probabilistic "models" that can
provide more robust interpretations of the data. Current database systems,
however, do not provide adequate support for applying models to such data,
especially when those models need to be frequently updated as new data arrives
in the system. Hence, most scientists and engineers, who depend on models for
managing their data, do not use database systems for archival or querying at
all; at best, databases serve as a persistent raw data store.
In this
paper we define a new abstraction called "model-based views" and present the
architecture of "MauveDB", the system we are building to support such views.
Just as traditional database views provide logical data independence,
model-based views provide independence from the details of the underlying data
generating mechanism and hide the irregularities of the data by using models to
present a consistent view to the users. MauveDB supports a declarative language
for defining model-based views, allows declarative querying over such views
using SQL, and supports several different materialization strategies and
techniques to efficiently maintain them in the face of frequent updates. We have
implemented a prototype system that currently supports views based on regression
and interpolation, in the Apache Derby open source DBMS, and we present results
that show the utility and performance benefits that can be obtained by
supporting several different types of model-based views in a database system.
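A regression-based view of the kind described above can be sketched as follows (a toy, self-contained illustration; the class and method names are hypothetical, not MauveDB's actual interface): raw, noisy readings are fit with a simple linear model, and queries are answered against the model rather than the raw tuples.

```python
# Hypothetical sketch of a model-based view: noisy (time, value) readings
# are fit by least-squares regression, and queries read off the model's
# prediction instead of the raw data.

class RegressionView:
    """Presents noisy sensor readings as a smooth, gap-free view."""

    def __init__(self):
        self.times, self.values = [], []
        self.slope, self.intercept = 0.0, 0.0

    def insert(self, t, v):
        # New readings trigger a refit (incremental, in a real system).
        self.times.append(t)
        self.values.append(v)
        self._refit()

    def _refit(self):
        n = len(self.times)
        mt = sum(self.times) / n
        mv = sum(self.values) / n
        sxx = sum((t - mt) ** 2 for t in self.times)
        sxy = sum((t - mt) * (v - mv)
                  for t, v in zip(self.times, self.values))
        self.slope = sxy / sxx if sxx else 0.0
        self.intercept = mv - self.slope * mt

    def query(self, t):
        # The view answers at ANY time t, including gaps in the raw data.
        return self.intercept + self.slope * t

view = RegressionView()
for t, v in [(0, 0.1), (1, 1.9), (2, 4.2), (4, 8.1)]:  # reading at t=3 missing
    view.insert(t, v)
print(round(view.query(3), 2))  # interpolated estimate, ~6.1
```

The same pattern generalizes to the interpolation-based views mentioned above: only the fitted model changes, not the view interface.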
- Model-based Approximate Querying in Sensor Networks;
Amol Deshpande, Carlos Guestrin, Sam Madden, Joseph M. Hellerstein, Wei Hong;
International Journal on Very Large Data Bases (VLDB Journal), 2005.
[pdf]
Declarative queries are proving to be an attractive paradigm for interacting
with networks of wireless sensors. The metaphor that "the sensornet is a
database" is problematic, however, because sensors do not exhaustively represent
the data in the real world. In order to map the raw sensor readings onto
physical reality, a "model" of that reality is required to complement the
readings. In this article, we enrich interactive sensor querying with
statistical modeling techniques. We demonstrate that such models can help
provide answers that are both more meaningful, and, by introducing
approximations with probabilistic confidences, significantly more efficient to
compute in both time and energy. Utilizing the combination of a model and live
data acquisition raises the challenging optimization problem of selecting the
best sensor readings to acquire, balancing the increase in the confidence of our
answer against the communication
and data acquisition costs in the network. We describe an exponential time algorithm for finding the optimal solution
to this optimization problem, and a polynomial-time heuristic for identifying solutions that perform well in practice.
We evaluate our approach on several real-world sensor-network data sets, taking into account the real measured data and
communication quality, demonstrating that our model-based approach provides a high-fidelity representation of the real
phenomena and leads to significant performance gains versus traditional data acquisition techniques.
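The flavor of the acquisition trade-off can be sketched with a deliberately simplified greedy heuristic (all numbers and names are illustrative; the paper's algorithms plan over a full probabilistic model, not this toy variance bookkeeping): pick the sensor that buys the most uncertainty reduction per unit cost until the answer is confident enough.

```python
# Toy sketch: greedy benefit-per-cost sensor selection. In this simplified
# model, observing a sensor drops its variance to zero; real planning
# conditions a joint probabilistic model on each observation.

def plan_acquisition(variance, cost, target_total_variance):
    """variance: per-sensor prior variance; cost: per-sensor acquisition
    cost (communication + sensing). Returns acquisition order and spend."""
    remaining = dict(variance)
    acquired, total_cost = [], 0.0
    while remaining and sum(remaining.values()) > target_total_variance:
        # Greedy choice: largest variance reduction per unit cost.
        best = max(remaining, key=lambda s: remaining[s] / cost[s])
        acquired.append(best)
        total_cost += cost[best]
        del remaining[best]
    return acquired, total_cost

variance = {'s1': 4.0, 's2': 1.0, 's3': 9.0}
cost = {'s1': 2.0, 's2': 1.0, 's3': 10.0}   # s3 is far away: expensive to read
order, spent = plan_acquisition(variance, cost, target_total_variance=9.5)
print(order, spent)  # ['s1', 's2'] 3.0 -- the expensive sensor is avoided
```

Even this toy version shows the key effect described above: with a confidence target in hand, cheap informative sensors are read first and expensive ones may never be contacted at all.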
- Using Probabilistic Models for Data Management in Acquisitional Environments;
Amol Deshpande, Carlos Guestrin, Sam Madden;
CIDR 2005.
[pdf]
Traditional database systems, particularly those focused on capturing
and managing data from the real world, are poorly equipped to deal
with the noise, loss, and uncertainty in data. We discuss a suite of
techniques based on probabilistic models that are designed to allow
databases to tolerate noise and loss. These techniques are based on
exploiting correlations to predict missing values and identify outliers.
Interestingly, correlations also provide a way to give
approximate answers to users at a significantly lower cost and enable
a range of new types of queries over the correlation structure itself.
We illustrate a host of applications for our new techniques and
queries, ranging from sensor networks to network monitoring to data
stream management. We also present a unified architecture for
integrating such models into database systems, focusing in particular
on "acquisitional systems" where the cost of capturing data (e.g.,
from sensors) is itself a significant part of the query processing
cost.
- Model-Driven Data Acquisition in Sensor Networks;
Amol Deshpande, Carlos Guestrin, Sam Madden, Joseph M. Hellerstein, Wei Hong;
VLDB 2004.
[pdf] Recipient of the best paper award.
Declarative queries are proving to be an attractive paradigm for interacting
with networks of wireless sensors. The metaphor that "the sensornet is a
database" is problematic, however, because sensors do not exhaustively represent
the data in the real world. In order to map the raw sensor readings onto
physical reality, a "model" of that reality is required to complement the
readings. In this paper, we enrich interactive sensor querying with statistical
modeling techniques. We demonstrate that such models can help provide
answers that are both more meaningful, and, by introducing approximations
with probabilistic confidences,
significantly more efficient to compute in both time and energy. Utilizing the combination of
a model and live data acquisition raises the challenging optimization
problem of selecting the best sensor readings to acquire, balancing the increase
in the confidence of our answer against the communication and data acquisition costs in the network.
We describe an exponential
time algorithm for finding the optimal solution to this optimization problem, and a
polynomial-time heuristic for identifying solutions that
perform well in practice. We evaluate our approach on several
real-world sensor-network data sets, taking into account the real measured data and communication quality, demonstrating that our model-based approach
provides a high-fidelity representation of the real phenomena and leads to significant performance gains versus traditional data acquisition
techniques.
- Efficient Stepwise Selection in Decomposable Models;
Amol Deshpande, Minos Garofalakis, Mike Jordan;
UAI 2001.
[pdf] [talk]
In this paper, we present an efficient algorithm for performing stepwise
selection in the class of decomposable models. We focus on the forward selection
procedure, but we also discuss how backward selection and the combination of the
... [more]
In this paper, we present an efficient algorithm for performing stepwise
selection in the class of decomposable models. We focus on the forward selection
procedure, but we also discuss how backward selection and the combination of the
two can be performed efficiently. The main contributions of this paper are (1) a
simple characterization for the edges that can be added to a decomposable model
while retaining its decomposability and (2) an efficient algorithm for
enumerating all such edges for a given decomposable model in O(n^2) time, where
n is the number of variables in the model. We also analyze the complexity of the
overall stepwise selection procedure, which includes the complexity of
enumerating eligible edges as well as the complexity of deciding how to
"progress". We use the KL divergence of the model from the saturated model as
our metric, but the results we present here extend to many other metrics as well.
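The outer loop of such a forward selection procedure can be sketched generically (the eligibility test and the score function below are pluggable stand-ins; the paper's actual contribution, the O(n^2) enumeration of decomposability-preserving edges, is not reproduced here):

```python
# Generic forward stepwise selection skeleton. `eligible` would encode the
# decomposability-preserving check and `score` the KL-divergence reduction;
# both are hypothetical stand-ins in this sketch.

def forward_select(variables, eligible, score, max_edges):
    """Greedily add the best-scoring eligible edge until none improves."""
    edges = set()
    while len(edges) < max_edges:
        candidates = [(u, v) for i, u in enumerate(variables)
                      for v in variables[i + 1:]
                      if (u, v) not in edges and eligible(edges, u, v)]
        if not candidates:
            break
        best = max(candidates, key=lambda e: score(edges, *e))
        if score(edges, *best) <= 0:   # no KL reduction left: stop
            break
        edges.add(best)
    return edges

# Toy run with fixed per-edge gains standing in for KL reductions.
gains = {('a', 'b'): 0.5, ('a', 'c'): 0.2, ('b', 'c'): -0.1}
result = forward_select(['a', 'b', 'c'],
                        eligible=lambda E, u, v: True,
                        score=lambda E, u, v: gains[(u, v)],
                        max_edges=3)
print(sorted(result))  # only the positive-gain edges are added
```

The efficiency result in the abstract concerns exactly the `candidates` step: naively re-testing decomposability per edge is expensive, whereas the paper's characterization enumerates all eligible edges in O(n^2) time.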