Note: for each class (after the intro material), 4 students will be responsible for emailing me (asussman@umd.edu) with ~4 discussion questions on the reading(s) for that day by 6PM the day before the class, and should be prepared to ask those questions and help explain the paper to the rest of the class.
1/28 Parallel Computing and Parallel Computers
1/30 Applications of Parallel Computing
2/4 Expressing Parallelism (Implicit Control)
L. Dagum and R. Menon, "OpenMP: An Industry-Standard API for Shared-Memory Programming," IEEE Computational Science & Engineering, 5(1), 1998. [PDF]
B.R. de Supinski et al., "The Ongoing Evolution of OpenMP," Proceedings of the IEEE, 106(11), 2018. [PDF]
2/6-2/11 Expressing Parallelism (Implicit Control, cont.) - A. Borhani, C. Charles, A. Coppens, S. Cui
B.L. Chamberlain, Chapel chapter in Programming Models for Parallel Computing, edited by Pavan Balaji, MIT Press, 2015. [PDF]
J. Bezanson, A. Edelman, S. Karpinski, and V.B. Shah, "Julia: A Fresh Approach to Numerical Computing", SIAM Review, Vol. 59, No. 1, 2017. [PDF]
J. J. Dongarra, S. W. Otto, M. Snir, and D. Walker, "A message passing standard for MPP and workstations," Communications of the ACM, 39(7), 1996, pp. 84-90. [PDF]
R. Thakur and W. Gropp, "Open Issues in MPI Implementation," In: L. Choi, Y. Paek, S. Cho (eds), Advances in Computer Systems Architecture. ACSAC 2007, 2007. [PDF]
R. Thakur, R. Rabenseifner, and W. Gropp, "Optimization of Collective Communication Operations in MPICH," The International Journal of High Performance Computing Applications, 19(1), 2005. [PDF]
T. Saif and M. Parashar, "Understanding the Behavior and Performance of Non-blocking Communications in MPI," In: Danelutto, M., Vanneschi, M., Laforenza, D. (eds), Euro-Par 2004 Parallel Processing Conference, 2004. [PDF]
2/20-25 Expressing Parallelism (Hybrids and Frameworks) - N. Kadawedduwa, J. Kim, J. Liu, Z. Lu
Steve W. Bova et al., "Parallel Programming with Message Passing and Directives", Computing in Science & Engineering, 3(5), 2001. [PDF]
S. Balay, W. D. Gropp, L. C. McInnes, and B. F. Smith, "Efficient Management of Parallelism in Object Oriented Numerical Software Libraries", In E. Arge, A. M. Bruaset, and H. P. Langtangen, editors, Modern Software Tools in Scientific Computing, Birkhäuser Press, 1997. [PDF]
2/27 CUDA and GPUs
No readings assigned, slides only
3/4 Profiling Programs - R. Misra, A. Namjoo, Q. Nguyen, R. Parsons
S.L. Graham, P.B. Kessler, and M.K. McKusick, "gprof: a Call Graph Execution Profiler," Proceedings of the SIGPLAN '82 Symposium on Compiler Construction, ACM SIGPLAN Notices, Vol. 17, No. 6, 1982. [PDF]
L. Adhianto, S. Banerjee, M. Fagan, M. Krentel, G. Marin, J. Mellor-Crummey, and N.R. Tallent, "HPCTOOLKIT: tools for performance analysis of optimized parallel programs," Concurrency and Computation: Practice and Experience, Vol. 22, 2010. [PDF]
3/6 Scientific Workflows - C. Hutton, X. Qi, M. Sukanya, X. Tian
E. Deelman, K. Vahi, et al., "Pegasus, a workflow management system for science automation," Future Generation Computer Systems, Vol. 46, May 2015. [PDF]
J.M. Wozniak, T.G. Armstrong, M. Wilde, D. Katz, E. Lusk, and I.T. Foster, "Swift/T: Large-scale application composition via distributed-memory data flow processing," Proceedings of 13th IEEE/ACM International Symposium on Cluster, Cloud and Grid Computing (CCGrid), May 2013. [PDF]
3/11 Shared Memory - J. Umeike, R. Vishnoi, F. H.-Y. Yeh, A. Borhani
J. Laudon and D. Lenoski, "The SGI Origin: a ccNUMA highly scalable server," In Proceedings of 1997 International Symposium on Computer Architecture (ISCA '97), May 1997. [PDF]
SGI, "Technical Advances in the SGI® UV Architecture™," SGI White paper, 2012. [PDF]
3/13 Custom Machines - C. Charles, A. Coppens, S. Cui, J. Dai
S.R. Alam, J.A. Kuehn, R.F. Barrett, J.M. Larkin, M.R. Fahey, R. Sankaran, P.H. Worley, "Cray XT4: An Early Evaluation for Petascale Scientific Simulation", In Proceedings of SC'07, Nov. 2007. [PDF]
A. Gara et al., "Overview of the Blue Gene/L system architecture", IBM Journal of Research and Development, 49(2/3), Fall 2005. [PDF]
3/25 GPUs - T. Dao, A. Frolov, L. Hough, C. Hutton
W.J. Dally, S.W. Keckler, D.B. Kirk, "Evolution of the Graphics Processing Unit (GPU)," IEEE Micro, 41(6), 2021. [PDF]
V.W. Lee et al., "Debunking the 100X GPU vs. CPU myth: an evaluation of throughput computing on CPU and GPU", In Proceedings of 2010 International Symposium on Computer Architecture (ISCA), May 2010. [PDF]
3/27 High Performance Networks - N. Kadawedduwa, J. Kim, J. Liu, R. Misra
Mellanox Technologies white paper, "Introduction to InfiniBand," 2003. [PDF]
R. Alverson, D. Roweth, L. Kaplan, "The Gemini System Interconnect," In Proceedings of the 18th Symposium on High Performance Interconnects, Aug. 2010. [PDF]
4/1 Clouds - Z. Lu, A. Namjoo, Q. Nguyen, R. Parsons
Jeffrey Dean and Sanjay Ghemawat, "MapReduce: Simplified Data Processing on Large Clusters", In Proceedings of OSDI'04, pp. 137-150. [PDF]
Michael Stonebraker, Daniel Abadi, David J. DeWitt, Sam Madden, Erik Paulson, Andrew Pavlo, Alexander Rasin, "MapReduce and Parallel DBMSs: Friends or Foes?", Communications of the ACM, 53(1), Jan. 2010, pp. 64-71. [PDF]
4/3 Clouds, cont. - X. Qi, M. Sukanya, X. Tian, J. Umeike
M. Zaharia, R. Xin, P. Wendell, T. Das, M. Armbrust, A. Dave, X. Meng, J. Rosen, S. Venkataraman, M. Franklin, A. Ghodsi, J. Gonzalez, S. Shenker, and I. Stoica, "Apache Spark: A Unified Engine for Big Data Processing," Communications of the ACM, 59(11), Nov. 2016. [PDF]
B. Hindman, A. Konwinski, M. Zaharia, A. Ghodsi, A.D. Joseph, R. Katz, S. Shenker and I. Stoica, "Mesos: A Platform for Fine-Grained Resource Sharing in the Data Center", In Proceedings of 8th USENIX Symposium on Networked Systems Design and Implementation (NSDI), USENIX, March 2011. [PDF]
4/8 Event Ordering and Race Detection -
L. Lamport, "Time, Clocks, and the Ordering of Events in a Distributed System", Communications of the ACM, 21(7), 1978, pp. 558-564. [PDF]
C. von Praun, "Race Detection Techniques", In D. Padua (ed.) Encyclopedia of Parallel Computing, Springer, 2011, pp. 1697-1706. [PDF]
4/10 Data Collection and Instrumentation -
Nicholas Nethercote and Julian Seward, "Valgrind: A Framework for Heavyweight Dynamic Binary Instrumentation", In Proceedings of the 2007 ACM/SIGPLAN Conference on Programming Language Design and Implementation (PLDI), June 2007. [PDF]
B.R. Buck and J.K. Hollingsworth, "An API for Runtime Code Patching," International Journal of High Performance Computing Applications, 14(4), Winter 2000, pp. 317-329. [PDF]