Hotspots are a major obstacle to
achieving scalability in the Internet;
they
are usually caused by either high
demand for some data or high demand for a certain service.
At the application layer, hotspot problems have traditionally
been dealt with using some combination of
increasing capacity,
spreading the load over time and/or space,
and changing the workload.
Some examples of these are
data replication (web caching, ftp mirroring),
data replacement (multi-resolution images, video),
service replication (DNS lookup, Network Time Protocol),
and server push (news or software distribution).
These classes of solutions have been studied
in the context of applications using the following types of communication:
(a) one-to-many (data travels primarily from a server to multiple clients,
e.g., web download, software distribution, video-on-demand);
(b) many-to-many (data travels between multiple clients, through
either a centralized or a distributed server, e.g., chat rooms,
video conferencing);
and
(c) one-to-one (data travels between two clients, e.g., e-mail, e-talk).
However, to the best of our knowledge, there is no existing work,
except ours
[D21,
D24,
D25],
on making applications using
many-to-one communication scalable and efficient;
existing solutions, such as web-based uploads, simply use
many independent one-to-one transfers.
Many-to-one communication underlies an important class of applications,
namely upload applications, such as
submission of income tax forms to the IRS,
conference paper submission, proposal submission through the
NSF FastLane, homework and project submissions in
distance education, Internet-based storage, and many more.
The main focus of our work is scalable
infrastructure design for wide-area upload applications.
Traditional solutions aimed at downloads
are data replication (e.g., caching) and data replacement.
Clearly, these techniques are not applicable to uploads, since
each client uploads distinct data.
In
[D21],
we proposed Bistro,
a framework for building
scalable wide-area upload applications, which
employs intermediaries, termed bistros,
to improve the efficiency and scalability of uploads.
We observed that the existence of hotspots in many upload applications
is due to approaching deadlines and long transfer times
(although here we focus on
uploads with deadlines, our framework can provide a scalable
solution to other upload applications as well).
We also observed that what many upload applications
actually require is an assurance
that specific data was submitted before a specific time;
the transfer of the data itself needs to be done
in a timely fashion, but does not have to occur by that deadline
(since the data is often not consumed by the server immediately
upon receipt).
Thus, our approach is to break the original deadline-driven upload problem into
the following pieces:
(a) a real-time timestamp subproblem,
where we ensure that the data is timestamped
and that the data cannot be subsequently tampered with;
(b) a low latency commit subproblem,
where the data goes ``somewhere'' (to an intermediary)
and the user is assured that the
data is safely and securely ``on its way'' to the server; and
(c) a timely data transfer subproblem,
which can be carefully planned (and coordinated with other uploads)
and results in data delivery to the original destination.
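As a rough illustration of how the three pieces fit together, the following Python sketch separates timestamping, commit, and transfer. It is not the protocol of [D24]: the class and method names (`Bistro`, `DestinationServer`, `timestamp`, `commit`, `pull`), the use of a SHA-256 digest as the timestamped item, and the choice to have the destination server issue the timestamp are all assumptions made for this example, and confidentiality (e.g., encryption of the data held by the bistro) is omitted.

```python
import hashlib
from dataclasses import dataclass


@dataclass(frozen=True)
class Receipt:
    """Small record kept per submission (hypothetical structure)."""
    client_id: str
    digest: str       # hex SHA-256 of the submitted data
    timestamp: float  # time at which the digest was received


class Bistro:
    """Hypothetical intermediary that holds data until the server pulls it."""

    def __init__(self):
        self._store = {}

    def commit(self, client_id, data):
        # (b) low latency commit: the data goes "somewhere" close to the client.
        self._store[client_id] = data

    def fetch(self, client_id):
        return self._store[client_id]


class DestinationServer:
    def __init__(self, deadline):
        self.deadline = deadline
        self.receipts = {}  # client_id -> Receipt
        self.received = {}  # client_id -> bytes

    def timestamp(self, client_id, digest, now):
        # (a) timestamp: only a short digest has to arrive before the deadline.
        if now > self.deadline:
            raise ValueError("deadline has passed")
        receipt = Receipt(client_id, digest, now)
        self.receipts[client_id] = receipt
        return receipt

    def pull(self, bistro, client_id):
        # (c) timely transfer: may happen after the deadline; the digest
        # recorded in step (a) exposes any later modification of the data.
        data = bistro.fetch(client_id)
        expected = self.receipts[client_id].digest
        if hashlib.sha256(data).hexdigest() != expected:
            raise ValueError("data does not match the timestamped digest")
        self.received[client_id] = data
        return data


if __name__ == "__main__":
    server = DestinationServer(deadline=100.0)
    bistro = Bistro()
    paper = b"conference paper"
    server.timestamp("alice", hashlib.sha256(paper).hexdigest(), now=99.0)
    bistro.commit("alice", paper)
    server.pull(bistro, "alice")  # can run well after the deadline
```

In this sketch only the short digest competes for the server's attention near the deadline; the bulk data moves whenever the server chooses to pull it.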
With this decomposition, we replace a traditionally
synchronized client-push solution with a
non-synchronized solution that uses some combination of
client-push and server-pull approaches. Consequently,
we eliminate the hotspots
by spreading most of the demand on the server over time.
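The effect of spreading demand over time can be seen in a toy calculation. The sketch below is only an illustration under made-up assumptions (12 uploads of equal length, discrete time slots, a server that pulls one transfer at a time); the function name `peak_load` and all numbers are hypothetical and are not taken from the performance study in [D25].

```python
from collections import Counter


def peak_load(start_slots, duration):
    """Return the largest number of transfers active in any one time slot."""
    active = Counter()
    for start in start_slots:
        for slot in range(start, start + duration):
            active[slot] += 1
    return max(active.values())


# Hypothetical workload: 12 uploads, each taking 2 slots, deadline at slot 60.
NUM_UPLOADS, DURATION, DEADLINE = 12, 2, 60

# Synchronized client-push: every client starts just in time for the deadline.
push_starts = [DEADLINE - DURATION] * NUM_UPLOADS

# Server-pull: the server fetches committed data one transfer at a time.
pull_starts = [DEADLINE + i * DURATION for i in range(NUM_UPLOADS)]

if __name__ == "__main__":
    print("peak load, client-push:", peak_load(push_starts, DURATION))
    print("peak load, server-pull:", peak_load(pull_starts, DURATION))
```

Under these assumptions the push schedule puts all 12 transfers on the server at once, while the pull schedule never has more than one active transfer; the total amount of data moved is the same in both cases.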
Bistro's ability to share an infrastructure,
such as an infrastructure of proxies, among
a variety of wide-area applications has clear advantages over
more traditional solutions.
In
[D25],
we conducted a performance study that
demonstrated the potential performance gains of the Bistro framework
and provided insight into the general upload problem.
Moreover, Bistro does not rely on the existence of a
private infrastructure; however, it does not preclude one either.
Since confidentiality of data and other security
issues are especially important in upload applications,
and even more so in our solution, which introduces untrusted (public)
intermediaries (i.e., bistros), we also developed
[D24]
a secure data transfer protocol within the Bistro framework,
which not only ensures the privacy and integrity of the data
but also takes scalability considerations into account.