Frederic T. Chong, Shamik D. Sharma, Eric Brewer, Joel Saltz
Published in: Parallel Processing Letters, Volume 5, Number 4, pages 671-683, 1995.
University of Maryland Technical Report: CS-TR-3266, March 1994.
We achieve these speedups with methods that are not matrix-specific and apply to any DAG. We compare a range of run-time preprocessed and dynamic approaches on matrices from the Harwell-Boeing benchmark set. Although precomputed data distributions and execution schedules produce the best performance, we find it challenging to keep their cost low enough to make them worthwhile for small, fine-grained problems.
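The abstract does not describe how its precomputed schedules are built. As a hedged illustration of what a run-time preprocessed execution schedule for a general DAG can look like, the sketch below uses level (wavefront) scheduling, a standard technique chosen here for illustration and not necessarily the authors' method. It groups tasks so that every task in level k depends only on tasks in earlier levels, then assigns tasks within each level to processors cyclically as a simple static data distribution. The function names `level_schedule` and `assign_cyclic` are hypothetical.

```python
from collections import defaultdict, deque


def level_schedule(num_nodes, edges):
    """Group DAG nodes into levels (hypothetical helper).

    edges is a list of (u, v) pairs meaning v depends on u.
    Every node in level k depends only on nodes in levels < k.
    """
    pending = [0] * num_nodes          # number of unfinished predecessors
    succs = defaultdict(list)
    for u, v in edges:
        succs[u].append(v)
        pending[v] += 1

    level = [0] * num_nodes
    ready = deque(i for i in range(num_nodes) if pending[i] == 0)
    visited = 0
    while ready:                        # Kahn-style topological traversal
        u = ready.popleft()
        visited += 1
        for v in succs[u]:
            # v can run only after its deepest predecessor
            level[v] = max(level[v], level[u] + 1)
            pending[v] -= 1
            if pending[v] == 0:
                ready.append(v)
    if visited != num_nodes:
        raise ValueError("graph contains a cycle; not a DAG")

    levels = defaultdict(list)
    for node in range(num_nodes):
        levels[level[node]].append(node)
    return [levels[k] for k in sorted(levels)]


def assign_cyclic(schedule, num_procs):
    """Map each node to a processor, round-robin within each level (hypothetical)."""
    owner = {}
    for tasks in schedule:
        for i, node in enumerate(tasks):
            owner[node] = i % num_procs
    return owner
```

For example, with the dependencies of a small triangular solve, `level_schedule(4, [(0, 2), (1, 2), (2, 3), (0, 3)])` returns `[[0, 1], [2], [3]]`: tasks 0 and 1 can run in parallel, followed by 2 and then 3. The preprocessing pass itself is linear in the number of nodes and edges, but on small, fine-grained problems even that cost can be comparable to the solve it schedules.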
Additionally, we find that a policy of frequent network polling can reduce communication overhead by a factor of three compared with the standard CM-5 policies. We present a detailed study of run-time overheads and show that send and receive processor overheads still dominate these applications on the CM-5. We conclude that these applications would benefit greatly from architectural support for low-overhead communication.