One of the biggest challenges for efficient data access over the Internet and
other high-end wide-area networks is effective adaptation to resource fluctuations.
Several emerging high-end applications are likely to benefit from an
ability to adapt to changes in the availability of network and computational resources. These
applications fall into two roughly defined categories: metacomputing applications and
customized multicast (customcast) applications.
Metacomputing applications often synthesize information obtained from
multiple sources, generating output products by combining the data.
Data sources can consist of archived information,
information generated by physical simulations, information from scientific instruments
such as light or electron microscopes, or data from sensors based on land, aircraft, or
satellites.
We anticipate that the process of data product generation will require moving
large amounts of data through variable-quality network links. A common metacomputing
scenario organizes the computation as a pipeline consisting of a sequence of programs. For
instance, in defense and disaster response scenarios, there is a need to generate data
products that combine information from satellite, aircraft, and ground-based sensors.
Combining data often carries significant computational costs.
Resource-aware scheduling may be needed to deal with changes in network
characteristics, changes in the computational demands associated with data combination, or
changes in the availability of computational resources.
Customized multicast (customcast) applications involve customized propagation of
data obtained from a single source. An example of customcast arises in an application
under development by our group in the Department of Pathology at Johns Hopkins Medical
School. This application, the Virtual Microscope, tries to achieve a realistic digital
emulation of a high-power light microscope. An important design goal is to achieve
interactive response while the dataset is simultaneously explored by multiple users. The
utility of customcast, in this scenario, is indicated by the spatial and temporal locality
in the request patterns. While individual datasets are quite large (10-100 Gbytes per
specimen), most users usually access only those portions of the dataset that contain
features of medical interest. Moreover, due to psychophysical limitations of human data
processing, users frequently dwell on portions of the dataset that are of particular
interest.
Mobile programs can move an active thread of control from one site to another
during execution. This flexibility has many potential advantages. For example, a program
that searches distributed data repositories can improve its performance by migrating to
the repositories and performing the search on-site instead of fetching all the data to its
current location. Similarly, an Internet video-conferencing application can minimize
overall response time by positioning its server based on the location of its users.
Applications running on mobile platforms can react to a drop in network bandwidth by
moving network-intensive computations to a proxy host on the static network. The primary
advantage of mobility in these scenarios is that it can be used as a tool to adapt to
variations in the operating environment. Applications can use online information about
their operating environment and knowledge of their own resource requirements to make
judicious decisions about placement of computation and data.
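
The placement decision described above, for the distributed search example, can be sketched as a simple cost comparison. This is only an illustrative sketch and not part of Sumatra: the class `PlacementSketch`, its methods, and the cost model are hypothetical. The model assumes a single link with a known bandwidth and latency estimate, and it ignores differences in processor speed and load between the two sites.

```java
// Hypothetical illustration; none of these names come from Sumatra.
final class PlacementSketch {

    /** Estimated seconds to fetch all of the data to the current site. */
    static double fetchSeconds(long dataBytes, double bandwidthBytesPerSec, double latencySec) {
        return latencySec + dataBytes / bandwidthBytesPerSec;
    }

    /**
     * Estimated seconds to move the program state to the data site and
     * send only the result back (two one-way trips over the same link).
     */
    static double migrateSeconds(long stateBytes, long resultBytes,
                                 double bandwidthBytesPerSec, double latencySec) {
        return 2 * latencySec + (stateBytes + resultBytes) / bandwidthBytesPerSec;
    }

    /** Migrate only when moving the computation is cheaper than moving the data. */
    static boolean shouldMigrate(long dataBytes, long stateBytes, long resultBytes,
                                 double bandwidthBytesPerSec, double latencySec) {
        return migrateSeconds(stateBytes, resultBytes, bandwidthBytesPerSec, latencySec)
                < fetchSeconds(dataBytes, bandwidthBytesPerSec, latencySec);
    }

    public static void main(String[] args) {
        double bandwidth = 1_000_000.0; // assumed estimate: 1 Mbyte/s
        double latency = 0.05;          // assumed estimate: 50 ms one way

        // Large repository (100 Mbytes), small program state and result.
        System.out.println(shouldMigrate(100_000_000L, 50_000L, 10_000L, bandwidth, latency));

        // Tiny repository (20 Kbytes): fetching the data is cheaper.
        System.out.println(shouldMigrate(20_000L, 50_000L, 10_000L, bandwidth, latency));
    }
}
```

In a resource-aware program, the bandwidth and latency arguments would come from online monitoring rather than constants, so the same comparison can give different answers as network conditions change.
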
To investigate resource-aware computing, we have designed
and implemented Sumatra, an extension of Java that supports resource-aware mobile
programs. We also describe the design and implementation of a distributed resource monitor
that provides the information required by Sumatra programs.