CMSC 714 (Fall 2008)
Introduction
9/2 Parallel Computing and Parallel Computers
Lecture Notes
9/4 Applications of Parallel Computing
Lecture Notes
Programming Models
9/9 Expressing Parallelism (Explicit Control)
"The PVM Concurrent Computing System: Evolution, Experiences, and Trends", (PDF)
J. J. Dongarra, S. W. Otto, M. Snir, and D. Walker, "A message passing standard for MPP and workstations," CACM, 39(7), 1996, pp. 84-90. (PDF)
9/11 Introduction to Debugging Parallel Programs
Lecture Notes
9/16 Expressing Parallelism (Implicit Control)
William W. Carlson et al., “Introduction to UPC and Language Specification”, CCS-TR-99-157, (PDF)
L. Dagum and R. Menon, "OpenMP: An Industry-Standard API for Shared-Memory Programming," IEEE Computational Science & Engineering, 5(1), 1998, pp. 46-55. (PDF)
9/18 Expressing Parallelism (Hybrids)
Kathy Yelick et al., "Titanium: A High Performance Java Dialect", Concurrency: Practice & Experience, 10(11-13), 1998. (PDF)
Steve W. Bova et al., "Parallel Programming with Message Passing and Directives", Computing in Science & Engineering, 3(5), 2001, pp. 22-37, (PDF)
9/23 Expressing Parallelism (Frameworks)
S. Balay, W. D. Gropp, L. C. McInnes, and B. F. Smith. Efficient management of parallelism in object oriented numerical software libraries. In E. Arge, A. M. Bruaset, and H. P. Langtangen, editors, Modern Software Tools in Scientific Computing, pages 163–202. Birkhäuser Press, 1997. (PDF)
T. Goodale, G. Allen, G. Lanfermann, J. Massó, T. Radke, E. Seidel, and J. Shalf. The Cactus Framework and Toolkit: Design and Applications. In Vector and Parallel Processing - VECPAR ’2002, 5th International Conference. Springer, 2003. (PDF)
Architectures
9/25 Shared Memory
Laudon, J., Lenoski, D., “The SGI Origin: a ccNUMA highly scalable server”, ISCA '97, pp. 241-51, May 1997 (PDF)
Alan E. Charlesworth, “The Sun Fireplane System Interconnect”, Proceedings of SC’01, Nov. 2001. (PDF)
9/30 Message Passing and Communication
Robert M. Metcalfe, David R. Boggs, “Ethernet: distributed packet switching for local computer networks”, Communications of the ACM, v.19 n.7, p.395-404, July 1976 (PDF)
Fabrizio Petrini, Wu-chun Feng, Adolfy Hoisie, Salvador Coll, Eitan Frachtenberg, “The Quadrics Network: High-Performance Clustering Technology,” IEEE Micro Jan-Feb 2002, pp. 46-57. (PDF)
10/2 Vectors and Threading
Gail Alverson, Preston Briggs, Susan Coatney, Simon Kahan, Richard Korry, “Tera hardware-software cooperation”, SC’97, Nov. 1997, (PDF)
T. H. Dunigan, Jr., J. S. Vetter, J. B. White III, P. H. Worley, "Performance Evaluation of the Cray X1 Distributed Shared-Memory Architecture", IEEE Micro, 25(1), Jan. 2005. (PDF)
10/7 Not Vector, Not Commodity
A. Gara et al., “Overview of the Blue Gene/L system architecture”, IBM Journal of Research and Development, 49(2/3) Fall 2005, (PDF)
A. E. Eichenberger et al., “Using advanced compiler technology to exploit the performance of the Cell Broadband Engine™ architecture”, IBM Systems Journal, 45(1), Jan. 2006, (PDF)
10/9 Computational Grids & Clouds
I. Foster and C. Kesselman, "Computational Grids", Chapter 2 of The Grid: Blueprint for a New Computing Infrastructure, Morgan Kaufmann, 1999. (PDF)
Jeffrey Dean and Sanjay Ghemawat, “MapReduce: Simplified Data Processing on Large Clusters”, Proceedings of OSDI’04, pp. 137–150 (PDF)
Tools
10/14 Event Ordering (changed to 11/4)
L. Lamport, "Time, Clocks, and the Ordering of Events in a Distributed System," CACM, 21(7), 1978, pp. 558-564 (PDF).
S. Savage, M. Burrows, G. Nelson, P. Sobalvarro, and T. Anderson, "Eraser: A Dynamic Data Race Detector for Multi-Threaded Programs," Proceedings of the 16th Symposium on Operating Systems Principles (PDF).
10/16 Performance Metrics
A. J. Goldberg and J. L. Hennessy, "Performance Debugging Shared Memory Multiprocessor Programs with MTOOL", Supercomputing '91, Nov. 18-22, 1991.
J. K. Hollingsworth, "Critical Path Profiling of Message Passing and Shared-memory Programs," IEEE Transactions on Parallel and Distributed Systems, 9(10), 1998, pp. 1029-1040. (PDF).
10/21 Data Collection and Instrumentation
Nicholas Nethercote and Julian Seward. “Valgrind: A Framework for Heavyweight Dynamic Binary Instrumentation”, PLDI 2007, June 2007 (PDF)
B. R. Buck and J. K. Hollingsworth, “An API for Runtime Code Patching,” Journal of High Performance Computing Applications, 14(4) (Winter 2000), pp. 317-329. (PDF)
10/23 Scheduling - Short Term
John K. Ousterhout, "Scheduling Techniques for Concurrent Systems", International Conference on Distributed Computing Systems, 1982, pp. 22-30. (PDF).
A. C. Dusseau, R. H. Arpaci, D. E. Culler, "Effective Distributed Scheduling of Parallel Workloads", ACM SIGMETRICS Conference on Measurement and Modeling of Computer Systems, May 1996.
10/28 Performance Tools
B. P. Miller, M. D. Callaghan, J. M. Cargille, J. K. Hollingsworth, R. B. Irvin, K. L. Karavanic, K. Kunchithapadam, and T. Newhall, "The Paradyn Parallel Performance Measurement Tools", IEEE Computer, Nov. 1995. 28(11), pp. 37-46 (PDF).
S. Shende and A. D. Malony, "The TAU Parallel Performance System," International Journal of High Performance Computing Applications, SAGE Publications, 20(2):287-331, Summer 2006 (PDF).
10/30 Computational Steering
W. Gu, G. Eisenhauer, E. Kraemer, K. Schwan, J. Stasko, J. Vetter, and N. Mallavurupu, "Falcon: On-line Monitoring and Steering of Large-Scale Parallel Programs," Frontiers '95, Feb. 6-9, 1995.
R. L. Ribler, J. S. Vetter, H. Simitci, and D. A. Reed, "Autopilot: Adaptive Control of Distributed Applications," High Performance Distributed Computing,
11/4 Cache Tools (changed to 10/14)
John Mellor-Crummey, David Whalley, Ken Kennedy, “Improving Memory Hierarchy Performance for Irregular Applications Using Data and Computation Reorderings,” International Journal of Parallel Programming, 29(3), June 2001. (PDF)
Margaret Martonosi, Anoop Gupta, Thomas Anderson, “MemSpy: analyzing memory system bottlenecks in programs”, SIGMETRICS 92, (PDF)
11/6 Runtime Parallelism
S. J. Fink, S. R. Kohn, and S. B. Baden, “Efficient Run-time Support for Irregular Block-Structured Applications”, Journal of Parallel and Distributed Computing, 50(1), 1998. (PDF)
G. Agrawal, A. Sussman, and J. Saltz, “An Integrated Runtime and Compile-time Approach for Parallelizing Structured and Block Structured Applications”, IEEE Transactions on Parallel and Distributed Systems, 6(7), 1995. (PDF)
Systems Issues
11/11 Scheduling – Batch Queues
D. G. Feitelson and A. Mu'alem Weil, "Utilization and Predictability in Scheduling the IBM SP2 with Backfilling," 12th Intl. Parallel Processing Symposium, April 1998.
J. Weinberg, A. Snavely, “Symbiotic Space-Sharing on SDSC's DataStar System”, 12th Workshop on Job Scheduling Strategies for Parallel Processing, in conjunction with SIGMETRICS 2006, Saint-Malo, France (PDF)
11/13 Midterm (moved to 11/20)
11/18 Finding Idle Resources
M. Litzkow, M. Livny, and M. Mutka, "Condor - A Hunter of Idle Workstations," International Conference on Distributed Computing Systems. June 1988, pp. 104-111. (PDF).
David P. Anderson, Carl Christensen and Bruce Allen, "Designing a Runtime System for Volunteer Computing", In Proceedings of SC'06, November 2006. (PDF).
11/20 Work in Progress session (moved to 11/13)
11/25 Parallel I/O
Terry Jones, Alice Koniges and R. Kim Yates, “Performance of the IBM General Parallel File System,” 14th International Parallel and Distributed Processing Symposium (IPDPS'00), (PDF)
A. Acharya, M. Uysal, and J. Saltz, "Active Disks: Programming Model, Algorithms and Evaluation," Eighth International Conference on Architectural Support for Programming Languages and Operating Systems, Oct. 1998.
11/27 Thanksgiving
12/2 Performance Prediction
M. E. Crovella, Thomas J. LeBlanc, "Parallel Performance Prediction Using Lost Cycles", Proceedings of Supercomputing '94, 1994. (PDF)
L. Carrington, M. Laurenzano, A. Snavely, R. Campbell, L. Davis, “How Well Can Simple Metrics Represent the Performance of HPC Applications?”, Proceedings of SC’05, Nov. 2005, (PDF)
12/4 Gordon Bell Winners
G. Alvarez, M. S. Summers, D. E. Maxwell, M. Eisenbach, J. S. Meredith, J. M. Larkin, J. Levesque, T. A. Maier, P. R. C. Kent, E. F. D'Azevedo, and T. C. Schulthess, "New algorithm to enable 400+ TFlop/s sustained performance in simulations of disorder effects in high-Tc superconductors", SC'08 (PDF).
Lin-Wang Wang, Byounghak Lee, Hongzhang Shan, Zhengji Zhao, Juan Meza, Erich Strohmaier, David H. Bailey, "Linearly Scaling 3D Fragment Method for Large-Scale Electronic Structure Calculations", SC'08 (PDF).
12/9 Project Presentations
12/11 Project Presentations