Remotely-sensed data acquired from satellite-based sensors is widely used in
geographical, meteorological and environmental studies. A typical analysis processes
satellite data covering a period of ten days to a year and generates one or more raster
images of the area
under study. The output images are usually significantly smaller than the input data. For
example, a 10-day full-globe analysis over coarse-grained satellite data (4km per pixel)
processes approximately 4GB of data to generate a 228MB multi-band image. This data
reduction is achieved by composition of information corresponding to different days.
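
As an illustration of how compositing across days shrinks the data, the following minimal Python sketch collapses several daily grids into one image by keeping, for each pixel, the largest value observed on any day. The maximum-value rule, the function name composite_max, and the tiny 2x2 grids are assumptions chosen for illustration; the text above does not specify which composition criterion is used.

```python
from typing import List, Optional

# One band of one day's data; None marks a pixel with no observation that day.
Grid = List[List[Optional[float]]]

def composite_max(daily_grids: List[Grid]) -> Grid:
    """Collapse daily grids into one grid, keeping the largest value seen per pixel.

    The maximum-value rule is an assumed criterion, used only for illustration.
    """
    rows, cols = len(daily_grids[0]), len(daily_grids[0][0])
    out: Grid = [[None] * cols for _ in range(rows)]
    for grid in daily_grids:
        for r in range(rows):
            for c in range(cols):
                v = grid[r][c]
                # Keep v if it is the first or the largest observation so far.
                if v is not None and (out[r][c] is None or v > out[r][c]):
                    out[r][c] = v
    return out

# Three hypothetical days over a 2x2 region.
days = [
    [[0.1, None], [0.4, 0.2]],
    [[0.3, 0.5], [None, 0.1]],
    [[0.2, 0.6], [0.3, None]],
]
composite = composite_max(days)
print(composite)
```

Three input grids become one output grid of the same shape, so the output size depends on the area under study rather than on the number of days processed.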
Several database systems have been designed to handle the output raster images
generated from analyzing raw satellite data and provide powerful query operations
including various forms of spatial joins. However, they are not suitable for managing and
processing the raw satellite data. Titan is a parallel shared-nothing database designed to
support management and efficient data processing over remote-sensing data. It uses data
declustering and placement techniques to fully exploit all the I/O bandwidth provided by a
suitably configured disk farm. The system provides low-latency retrieval of very large
volumes of spatio-temporal data from secondary storage for efficient data processing. A
simplified R-tree is used to efficiently identify the subset of data that corresponds to
the region and time periods of interest. Furthermore, Titan integrates data processing and
retrieval so that processing can be performed efficiently on the same machine that
stores the satellite data. As a result, only output images are communicated to the
clients, and not the input data, which is often much larger than the output images. Titan
coordinates the operations for data processing and retrieval so that the retrieved
satellite data is processed in a pipelined fashion and I/O, communication and computation
can be fully overlapped to reduce latency.
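
The pipelined overlap of retrieval and processing described above can be sketched, under simplifying assumptions, as two threads connected by a bounded queue: one stage stands in for disk reads, the other reduces each block as soon as it arrives. The names read_blocks, process_blocks and run_pipeline, the block payloads, and the use of Python threads are all hypothetical; this is not Titan's implementation, only a small model of the idea.

```python
import queue
import threading

def read_blocks(block_ids, out_q):
    """Stage 1: stand-in for disk retrieval; emits one block at a time."""
    for bid in block_ids:
        out_q.put((bid, [bid] * 4))  # hypothetical block payload
    out_q.put(None)  # sentinel: no more blocks

def process_blocks(in_q, out_q):
    """Stage 2: reduce each block as soon as it arrives from stage 1."""
    while True:
        item = in_q.get()
        if item is None:
            out_q.put(None)  # pass the sentinel downstream
            break
        bid, payload = item
        out_q.put((bid, sum(payload)))  # placeholder for real processing

def run_pipeline(block_ids):
    raw_q = queue.Queue(maxsize=2)  # bounded, so reads stay just ahead of processing
    done_q = queue.Queue()
    reader = threading.Thread(target=read_blocks, args=(block_ids, raw_q))
    worker = threading.Thread(target=process_blocks, args=(raw_q, done_q))
    reader.start()
    worker.start()
    results = []
    while True:
        item = done_q.get()
        if item is None:
            break
        results.append(item)
    reader.join()
    worker.join()
    return results

pipeline_results = run_pipeline([1, 2, 3])
print(pipeline_results)
```

The bounded queue lets the reader fetch the next block while the worker is still processing the previous one, without buffering the whole input in memory.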
Titan is currently operational on the Maryland SP-2, and contains about 24GB of
data from the Advanced Very High Resolution Radiometer (AVHRR) on the NOAA-7 satellite.
Experimental results have shown that Titan provides good performance for global queries,
and interactive response times for local queries. We are currently adapting Titan
into T2, our generalized infrastructure for building customized parallel
database systems, which enables integration of storage, retrieval and processing of
multi-dimensional datasets to support a wide range of data-intensive applications.