Note: for each class (after the intro material), 4 students will be responsible for emailing me (als@cs.umd.edu) with ~4 discussion questions on the reading(s) for that day by 6 PM the day before the class, and for being prepared to ask those questions and help explain the paper to the rest of the class.
8/28 Parallel Computing and Parallel Computers
8/30 Applications of Parallel Computing
J. J. Dongarra, S. W. Otto, M. Snir, and D. Walker, "A message passing standard for MPP and workstations," Communications of the ACM, 39(7), 1996, pp. 84-90. [PDF]
9/13 Expressing Parallelism (Implicit Control) - Tauqir Abdullah, Omid Aramoon, Sigurthor Bjorgvinsson, Abhishek Chakraborty
William W. Carlson et al., "Introduction to UPC and Language Specification," CCS-TR-99-157. [PDF]
L. Dagum and R. Menon, "OpenMP: An Industry-Standard API for Shared-Memory Programming," IEEE Computational Science & Engineering, 5(1), 1998. [PDF]
B. R. de Supinski et al., "The Ongoing Evolution of OpenMP," Proceedings of the IEEE, pre-publication, 2018. [PDF]
9/18 Expressing Parallelism (Hybrids) - Taeyoung An, Gregory Davis, Daniel Gerzhoy, Seyhan Gul
Steve W. Bova et al., "Parallel Programming with Message Passing and Directives," Computing in Science & Engineering, 3(5), 2001. [PDF]
Brent Leback, Michael Wolfe, and Douglas Miles, "The PGI Fortran and C99 OpenACC Compilers," In Proceedings of Cray User Group (CUG) meeting, 2012. [PDF]
9/20 Expressing Parallelism (Frameworks) - Shilei Han, Ananth Hari, Gregory Harris, Katura Harvey
S. Balay, W. D. Gropp, L. C. McInnes, and B. F. Smith, "Efficient Management of Parallelism in Object Oriented Numerical Software Libraries," In E. Arge, A. M. Bruaset, and H. P. Langtangen, editors, Modern Software Tools in Scientific Computing, pp. 163-202, Birkhäuser Press, 1997. [PDF]
T. Goodale, G. Allen, G. Lanfermann, J. Massó, T. Radke, E. Seidel, and J. Shalf, "The Cactus Framework and Toolkit: Design and Applications," In Proceedings of Vector and Parallel Processing - VECPAR 2002, Springer, 2003. [PDF]
9/25 Shared Memory - Charles Hastings, Kesha Hietala, Tao Hu, Luyi Kang
9/27 Message Passing and Communication - Koyu Kawasaki, Yunchuan Li, Ravi Lumba, Deshvir Malik
Robert M. Metcalfe and David R. Boggs, "Ethernet: distributed packet switching for local computer networks," Communications of the ACM, 19(7), 1976. [PDF]
Mellanox Technologies white paper, "Introduction to InfiniBand." [PDF]
10/2 Custom Machines - Yi Mao, Christopher Maxey, Xiaoxu Meng, Ankit Mondal, Ameya Patil
S.R. Alam, J.A. Kuehn, R.F. Barrett, J.M. Larkin, M.R. Fahey, R. Sankaran, P.H. Worley, "Cray XT4: An Early Evaluation for Petascale Scientific Simulation", In Proceedings of SC'07, Nov. 2007. [PDF]
A. Gara et al., "Overview of the Blue Gene/L system architecture," IBM Journal of Research and Development, 49(2/3), Fall 2005. [PDF]
10/4 GPUs - Mrinalgouda Patil, Alexander Reustle, Thomas Rolinger, Peter Salvesen, Avi Schwarzschild
NVIDIA white paper, "NVIDIA Tesla P100, featuring Pascal GP100 GPU." [PDF] (read through page 30)
"Debunking the 100X GPU vs. CPU myth: an evaluation of throughput computing on CPU and GPU", In Proceedings of 2010 International Symposium on Computer Architecture (ISCA), May 2010. [PDF]
10/9 Computational Grids - Brendan Sheehy, Yu Shen, Joanna Shoemaker, Devesh Singh, Jiahao Su
I. Foster and C. Kesselman, "Computational Grids", Chapter 2 of The Grid: Blueprint for a New Computing Infrastructure, Morgan Kaufmann, 1999. [PDF]
A. Chervenak, I. Foster, C. Kesselman, C. Salisbury, S. Tuecke, "The Data Grid: Towards an Architecture for the Distributed Management and Analysis of Large Scientific Datasets", Journal of Network and Computer Applications, 23:187-200, 2001. [PDF]
10/11 Clouds - Qingyang Tan, Xiangxue Zhao, Tauqir Abdullah, Taeyoung An, Omid Aramoon
Jeffrey Dean and Sanjay Ghemawat, "MapReduce: Simplified Data Processing on Large Clusters," In Proceedings of OSDI'04, pp. 137-150. [PDF]
Michael Stonebraker, Daniel Abadi, David J. DeWitt, Sam Madden, Erik Paulson, Andrew Pavlo, Alexander Rasin, "MapReduce and Parallel DBMSs: Friends or Foes?", Communications of the ACM, 53(1), Jan. 2010, pp. 64-71. [PDF]
10/16 Clouds, cont. - Sigurthor Bjorgvinsson, Abhishek Chakraborty, Gregory Davis, Daniel Gerzhoy, Seyhan Gul
M. Zaharia, R. Xin, P. Wendell, T. Das, M. Armbrust, A. Dave, X. Meng, J. Rosen, S. Venkataraman, M. Franklin, A. Ghodsi, J. Gonzalez, S. Shenker, and I. Stoica, "Apache Spark: A Unified Engine for Big Data Processing," Communications of the ACM, 59(11), Nov. 2016. [PDF]
B. Hindman, A. Konwinski, M. Zaharia, A. Ghodsi, A. D. Joseph, R. Katz, S. Shenker, and I. Stoica, "Mesos: A Platform for Fine-Grained Resource Sharing in the Data Center," In Proceedings of 8th USENIX Symposium on Networked Systems Design and Implementation (NSDI), USENIX, March 2011. [PDF]
10/18 Event Ordering and Race Detection - Shilei Han, Ananth Hari, Gregory Harris, Katura Harvey, Charles Hastings
L. Lamport, "Time, Clocks, and the Ordering of Events in a Distributed System", Communications of the ACM, 21(7), 1978, pp. 558-564. [PDF]
S. Savage, M. Burrows, G. Nelson, P. Sobalvarro, and T. Anderson, "Eraser: A Dynamic Data Race Detector for Multi-Threaded Programs", In Proceedings of the 16th Symposium on Operating Systems Principles, ACM Press, Oct. 1997. [PDF]
10/23 Data Collection and Instrumentation - Kesha Hietala, Tao Hu, Luyi Kang, Koyu Kawasaki, Yunchuan Li
Nicholas Nethercote and Julian Seward, "Valgrind: A Framework for Heavyweight Dynamic Binary Instrumentation", In Proceedings of the 2007 ACM/SIGPLAN Conference on Programming Language Design and Implementation (PLDI), June 2007. [PDF]
B. R. Buck and J. K. Hollingsworth, "An API for Runtime Code Patching," International Journal of High Performance Computing Applications, 14(4), Winter 2000, pp. 317-329. [PDF]
10/25 No Class - work on research projects!
10/30 Cache Tools - Ravi Lumba, Deshvir Malik, Yi Mao, Christopher Maxey, Xiaoxu Meng
J. Mellor-Crummey, D. Whalley, and K. Kennedy, "Improving Memory Hierarchy Performance for Irregular Applications Using Data and Computation Reorderings", International Journal of Parallel Programming, 29(3), June 2001. [PDF]
Margaret Martonosi, Anoop Gupta, Thomas Anderson, "MemSpy: analyzing memory system bottlenecks in programs", ACM SIGMETRICS Performance Evaluation Review, 20(1), 1992. [PDF]
11/1 Runtime Parallelization - Ameya Patil, Mrinalgouda Patil, Alexander Reustle, Thomas Rolinger, Peter Salvesen
J. Nieplocha et al., "Advances, Applications and Performance of the Global Arrays Shared Memory Programming Toolkit," International Journal of High Performance Computing Applications, 20(6), 2006. [PDF]
G. Agrawal, A. Sussman, and J. Saltz, "An Integrated Runtime and Compile-time Approach for Parallelizing Structured and Block Structured Applications", IEEE Transactions on Parallel and Distributed Systems, 6(7), 1995. [PDF]
11/6 Autotuning - Brendan Sheehy, Yu Shen, Joanna Shoemaker, Devesh Singh, Qingyang Tan
P. Balaprakash et al., "Autotuning in High-Performance Computing Applications," Proceedings of the IEEE, 106(11), November 2018. [PDF]
R. Clint Whaley, Antoine Petitet, and Jack J. Dongarra, "Automated empirical optimizations of software and the ATLAS project," Parallel Computing, 27(1), 2001. [PDF]
11/8 Finding Idle Cycles - Tauqir Abdullah, Taeyoung An, Omid Aramoon, Sigurthor Bjorgvinsson, Abhishek Chakraborty
M. Litzkow, M. Livny, and M. Mutka, "Condor - A Hunter of Idle Workstations", In Proceedings of International Conference on Distributed Computing Systems, June 1988, pp. 104-111. [PDF]
D. Thain, T. Tannenbaum, and M. Livny, "Distributed Computing in Practice: The Condor Experience," Concurrency and Computation: Practice and Experience, Vol. 17, Nos. 2-4, 2005. [PDF]
David P. Anderson, Carl Christensen and Bruce Allen, "Designing a Runtime System for Volunteer Computing", In Proceedings of SC'06, November 2006. [PDF]
11/13 Midterm Exam
11/15 Scheduling - Batch Queues - Gregory Davis, Daniel Gerzhoy, Seyhan Gul, Ananth Hari, Gregory Harris, Katura Harvey, Charles Hastings, Kesha Hietala, Tao Hu
D. G. Feitelson and A. M. Weil, "Utilization and Predictability in Scheduling the IBM SP2 with Backfilling," 12th International Parallel Processing Symposium, April 1998. Use this extended version. [PDF]
J. Weinberg and A. Snavely, "Symbiotic Space-Sharing on SDSC's DataStar System", 12th Workshop on Job Scheduling Strategies for Parallel Processing, 2006. [PDF]
11/20 Parallel I/O - Luyi Kang, Koyu Kawasaki, Yunchuan Li, Ravi Lumba, Deshvir Malik, Yi Mao, Christopher Maxey, Xiaoxu Meng, Ameya Patil
Terry Jones, Alice Koniges, and R. Kim Yates, "Performance of the IBM General Parallel File System", In Proceedings of 14th International Parallel and Distributed Processing Symposium (IPDPS'00), April 2000. [PDF]
A. Acharya, M. Uysal, and J. Saltz, "Active Disks: Programming Model, Algorithms and Evaluation", In Proceedings of Eighth International Conference on Architectural Support for Programming Languages and Operating Systems, October 1998. [PDF]
11/22 Thanksgiving
11/27 Applications - Mrinalgouda Patil, Alexander Reustle, Thomas Rolinger, Peter Salvesen, Brendan Sheehy, Yu Shen, Joanna Shoemaker, Devesh Singh, Qingyang Tan
U. Catalyurek, M. Beynon, C. Chang, T. Kurc, A. Sussman, and J. Saltz, "The Virtual Microscope", IEEE Transactions on Information Technology in Biomedicine, Vol. 7, No. 4, 2003. [PDF]
David E. Shaw et al., "Millisecond-scale molecular dynamics simulations on Anton," In Proceedings of SC09, November 2009. [PDF]
11/29 Project Demos
12/4 Project Demos
12/6 Project Demos and SC17 Gordon Bell award finalist
Haohuan Fu et al., "Redesigning CAM-SE for Peta-Scale Climate Modeling Performance and Ultra-High Resolution on Sunway TaihuLight," In Proceedings of SC17, November 2017. [PDF]