Department of Computer Science  
University of Maryland  
College Park, Maryland 20742
These publications may be viewed as either PostScript or PDF files.

Compiler and Runtime Support for Programming in Adaptive Parallel Environments

Guy Edjlali, Gagan Agrawal, Alan Sussman, Jim Humphries, and Joel Saltz  

Scientific Programming, January 1997

University of Maryland Technical Report CS-TR-3510 and UMIACS-TR-95-83

For better utilization of computing resources, it is important to consider parallel programming environments in which the number of available processors varies at runtime. In this paper, we discuss runtime support for data parallel programming in such an adaptive environment. Executing programs in an adaptive environment requires redistributing data when the number of processors changes, and also requires determining new loop bounds and communication patterns for the new set of processors. We have developed a runtime library to provide this support. We discuss how the runtime library can be used by compilers of HPF-like languages to generate code for an adaptive environment. We present performance results for a Navier-Stokes solver on a multigrid template, run on a network of workstations and an IBM SP-2. Our experiments show that if the number of processors does not vary frequently, the cost of data redistribution is not significant when compared to the time required for the actual computation. Overall, our work establishes the feasibility of compiling HPF for a network of non-dedicated workstations, which are likely to be an important resource for parallel programming in the future.
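
The redistribution step the abstract refers to can be pictured with a minimal sketch, written here in Python rather than the library's actual HPF/Fortran setting; block_bounds and redistribute are hypothetical names, and a real implementation would communicate moved elements between processors rather than re-slice a global array:

    # Minimal sketch (not the paper's runtime library): repartition a
    # block-distributed 1-D array when the processor count changes, and
    # recompute each processor's local loop bounds.

    def block_bounds(n, nprocs, rank):
        """Half-open index range [lo, hi) of rank's block of n elements."""
        base, extra = divmod(n, nprocs)
        lo = rank * base + min(rank, extra)
        hi = lo + base + (1 if rank < extra else 0)
        return lo, hi

    def redistribute(global_data, new_nprocs):
        """New local blocks after the processor count changes; a real
        library would move only the elements that change owners."""
        n = len(global_data)
        return [global_data[slice(*block_bounds(n, new_nprocs, r))]
                for r in range(new_nprocs)]

    data = list(range(10))
    print(redistribute(data, 3))      # [[0, 1, 2, 3], [4, 5, 6], [7, 8, 9]]
    print(block_bounds(10, 3, 0))     # new loop bounds on processor 0: (0, 4)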

Interprocedural Compilation of Irregular Applications for Distributed Memory Machines

Gagan Agrawal and Joel Saltz

Proceedings of Supercomputing '95, San Diego, California, December 3-8, 1995

Department of Computer Science Technical Report CS-TR-3447

Data parallel languages like High Performance Fortran (HPF) are emerging as the architecture-independent mode of programming distributed memory parallel machines. In this paper, we present the interprocedural optimizations required for compiling applications that have irregular data access patterns, when coded in such data parallel languages. We have developed an Interprocedural Partial Redundancy Elimination (IPRE) algorithm for optimized placement of the runtime preprocessing routines and collective communication routines inserted for managing communication in such codes. We also present three new interprocedural optimizations: placement of scatter routines, deletion of data structures, and use of coalescing and incremental routines. We then describe how program slicing can be used for further applying IPRE in more complex scenarios. We have done a preliminary implementation of the schemes presented here using the Fortran D implementation system as the necessary infrastructure. We present experimental results from two codes compiled using our system to demonstrate the efficacy of the presented schemes.

Interprocedural Data Flow Based Optimizations for Compiling Irregular Applications

Gagan Agrawal and Joel Saltz

Proceedings of Languages and Compilers for Parallel Computing, August 1995

Runtime Support and Compilation Methods for User-Specified Data Distributions

Ravi Ponnusamy, Joel Saltz, Alok Choudhary, Yuan-Shin Hwang, and Geoffrey Fox

IEEE Transactions on Parallel and Distributed Systems, pages 815-831, Volume 6, Number 8, August 1995

University of Maryland Technical Report CS-TR-3194 and UMIACS-TR-93-135    

This paper describes two new ideas by which an HPF compiler can deal with irregular computations effectively. The first mechanism invokes a user-specified mapping procedure via a set of compiler directives. The directives allow the use of program arrays to describe graph connectivity, spatial location of array elements, and computational load. The second mechanism is a simple conservative method that in many cases enables a compiler to recognize that it is possible to reuse previously computed information from inspectors (e.g., communication schedules, loop iteration partitions, and information that associates off-processor data copies with on-processor buffer locations). We present performance results for these mechanisms from a Fortran 90D compiler implementation.
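
The schedule-reuse idea lends itself to a small illustration. The sketch below is hypothetical Python, not the Fortran 90D runtime's interface; the compiler's actual test is static, so the dirty flag here merely stands in for "the indirection array may have been modified since the inspector last ran":

    class IndirectionArray:
        """Pairs an indirection array with a cached communication
        schedule that is rebuilt only after a (possible) write."""
        def __init__(self, values):
            self.values = list(values)
            self.dirty = True              # schedule must be (re)built
            self.schedule = None

        def write(self, i, v):
            self.values[i] = v
            self.dirty = True              # any write invalidates the schedule

        def get_schedule(self, nprocs):
            if self.dirty:                 # inspector runs only when needed
                self.schedule = [(x, x % nprocs) for x in self.values]
                self.dirty = False
            return self.schedule

    ia = IndirectionArray([4, 1, 7, 2])
    s1 = ia.get_schedule(4)
    s2 = ia.get_schedule(4)                # reused: no second inspector pass
    assert s1 is s2
    ia.write(0, 5)                         # modification forces a rebuild
    assert ia.get_schedule(4) is not s1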

An Integrated Runtime and Compile-Time Approach for Parallelizing Structured and Block Structured Applications

Gagan Agrawal, Alan Sussman, and Joel Saltz

IEEE Transactions on Parallel and Distributed Systems, pages 747-754, Volume 6, Number 7, July 1995

Scientific and engineering applications often involve structured meshes. These meshes may be nested (for multigrid codes) and/or irregularly coupled (called multiblock or irregularly coupled regular mesh problems). In this paper, we present a combined runtime and compile-time approach for parallelizing these applications on distributed memory parallel machines in an efficient and machine-independent fashion. We have designed and implemented a runtime library which can be used to port these applications to distributed memory machines. The library is currently implemented on several different systems. Since the design of the library is machine independent, it can easily be ported to other distributed memory machines and environments which support message passing. To further ease the task of application programmers, we have developed methods for integrating this runtime library with compilers for HPF-like parallel programming languages. We discuss how we have integrated this runtime library with the Fortran 90D compiler being developed at Syracuse University. We present experimental results to demonstrate the efficacy of our approach. We have experimented with a multiblock Navier-Stokes solver template and a multigrid code. Our experimental results show that our primitives have low runtime communication overheads. Further, the compiler-parallelized codes perform within 20% of the codes parallelized by manually inserting calls to the runtime library.
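
As a concrete (and deliberately tiny) picture of the kind of communication such a library handles, here is a hypothetical Python sketch of a one-cell ghost exchange between two adjacent 1-D blocks; it is not the library's actual interface:

    def exchange_ghosts(left, right):
        """Each block keeps one ghost cell at each end (indices 0 and -1).
        Fill the shared boundary from the neighbor's interior."""
        left[-1] = right[1]    # left's right ghost <- right's first interior cell
        right[0] = left[-2]    # right's left ghost <- left's last interior cell

    left  = [0.0, 1.0, 2.0, 3.0, 0.0]      # interior is indices 1..3
    right = [0.0, 4.0, 5.0, 6.0, 0.0]
    exchange_ghosts(left, right)
    print(left)     # [0.0, 1.0, 2.0, 3.0, 4.0]
    print(right)    # [3.0, 4.0, 5.0, 6.0, 0.0]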

Interprocedural Partial Redundancy Elimination and Its Application to Distributed Memory Compilation

Gagan Agrawal, Joel Saltz, and Raja Das

Conference on Programming Language Design and Implementation '95, pages 258-269, La Jolla, California, June 1995

Partial Redundancy Elimination (PRE) is a general scheme for suppressing partial redundancies that encompasses traditional optimizations like loop-invariant code motion and redundant code elimination. In this paper, we address the problem of performing this optimization interprocedurally. We use interprocedural partial redundancy elimination for the placement of communication and communication preprocessing statements while compiling for distributed memory parallel machines.
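
In the distributed memory setting, the redundancy in question is typically a communication preprocessing call whose inputs do not change across iterations. A minimal before/after sketch, with hypothetical Python stand-ins for the inspector and the parallel step:

    def build_schedule(ia):                # stand-in inspector; expensive
        return [(i, i) for i in ia]

    def step(schedule, x):                 # stand-in for one parallel time step
        return [x[i] + 1 for i, _ in schedule]

    def before(ia, x, nsteps):
        for _ in range(nsteps):
            sched = build_schedule(ia)     # partially redundant: ia is loop-invariant
            x = step(sched, x)
        return x

    def after(ia, x, nsteps):
        sched = build_schedule(ia)         # hoisted: computed once, reused each step
        for _ in range(nsteps):
            x = step(sched, x)
        return x

    assert before([0, 1, 2], [0, 0, 0], 5) == after([0, 1, 2], [0, 0, 0], 5)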

Distributed Memory Compiler Design for Sparse Problems

J. Wu, Raja Das, Joel Saltz, H. Berryman, and S. Hiranandani

IEEE Transactions on Computers, pages 737-753, Volume 44, Number 6, June 1995

This paper addresses the issue of compiling concurrent loop nests in the presence of complicated array references and irregularly distributed arrays. Arrays accessed within loops may contain accesses that make it impossible to precisely determine the reference pattern at compile time. This paper proposes a runtime support mechanism that is used effectively by a compiler to generate efficient code in these situations. The compiler accepts as input a Fortran 77 program enhanced with specifications for distributing data, and outputs a message passing program that runs on the nodes of a distributed memory machine. The runtime support for the compiler consists of a library of primitives designed to support irregular patterns of distributed array accesses and irregularly distributed array partitions. A variety of performance results on the Intel iPSC/860 are presented.
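
The central pattern behind such runtime support is the inspector/executor pairing. The following is an illustrative Python sketch under simplified assumptions (one processor's view, a dictionary standing in for the gathered off-processor data); it is not the actual primitive interface:

    def inspector(indices, my_lo, my_hi):
        """One pass over the indirection array, collecting the
        off-processor references that need a gather."""
        return sorted({i for i in indices if not (my_lo <= i < my_hi)})

    def executor(x_local, ghost, indices, my_lo, my_hi):
        """Compute y(i) = x(ia(i)) from local data plus gathered copies."""
        return [x_local[i - my_lo] if my_lo <= i < my_hi else ghost[i]
                for i in indices]

    # One processor owns global x[0..3]; the loop references ia = [1, 5, 2].
    x_local = [10, 11, 12, 13]
    ia = [1, 5, 2]
    remote = inspector(ia, 0, 4)               # -> [5]
    ghost = {i: 100 + i for i in remote}       # stand-in for the actual gather
    print(executor(x_local, ghost, ia, 0, 4))  # [11, 105, 12]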

Supporting Irregular Distributions in FORTRAN 90D/HPF Compilers

Ravi Ponnusamy, Yuan-Shin Hwang, Joel Saltz, Alok Choudhary, and Geoffrey Fox

IEEE Parallel & Distributed Technology, Spring 1995 (revised version) 

University of Maryland Technical Report CS-TR-3268 and UMIACS-TR-94-57

Interprocedural Communication Optimizations for Distributed Memory Compilation

Gagan Agrawal and Joel Saltz

Proceedings of Languages and Compilers for Parallel Computing, pages 283-299, 1994

Value-Based Distributions in Fortran D: A Preliminary Report

Reinhard v. Hanxleden, Ken Kennedy, and Joel Saltz

Journal of Programming Languages - Special Issue on Compiling and Run-Time Issues for Distributed Address Space Machines, Volume 2, Number 3, December 1993

CRPC Technical Report CRPC-TR93365-S, 1993

Compiling irregular applications written in a data-parallel, High Performance Fortran-like language presents a challenging problem of growing importance. One principal difficulty with irregular problems is the general lack of access locality when distributing data and computation naively, based on simple functions of array and iteration indices. To address this problem, we extend classical, index-based data distributions to value-based distributions. This type of distribution allows the programmer to conveniently express, for example, the locality characteristics of an underlying physical problem. This gives the compiler an opportunity to improve both inter- and intra-processor locality, resulting in better performance and scalability, even when this locality is not inherent in the original data structures and access patterns. This paper reports on the experience gained from implementing value-based distributions in a Fortran 77D compiler prototype. We propose a natural extension of the index-based distributions already present in Fortran D and HPF. This paper addresses the compilation issues involved and makes a quantitative comparison of index-based and value-based distributions for a Molecular Dynamics application.
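
The contrast with index-based distributions can be made concrete with a hypothetical sketch (plain Python, not the Fortran 77D syntax): elements are assigned to processors by binning a coordinate value, so spatially close particles land on the same processor:

    def value_based_map(coords, nprocs, lo, hi):
        """Map each element to a processor by binning its coordinate value."""
        width = (hi - lo) / nprocs
        return [min(int((c - lo) / width), nprocs - 1) for c in coords]

    # Molecular-dynamics style example: particle x-coordinates.
    x = [0.1, 9.7, 0.3, 5.2, 9.9, 4.8]
    print(value_based_map(x, 2, 0.0, 10.0))   # [0, 1, 0, 1, 1, 0]
    # An index-based block distribution of the same six elements gives
    # [0, 0, 0, 1, 1, 1], scattering spatial neighbors across processors.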

Compiler and Runtime Support for Structured and Block Structured Applications

Gagan Agrawal, Alan Sussman, and Joel Saltz

Proceedings of Supercomputing '93, pages 578-587, Portland, Oregon, November 15-19, 1993

IEEE Computer Society Press

Scientific and engineering applications often involve structured meshes. These meshes may be nested (for multigrid or adaptive codes) and/or irregularly coupled (called Irregularly Coupled Regular Meshes). We have designed and implemented a runtime library for parallelizing this general class of applications on distributed memory parallel machines in an efficient and machine-independent manner. In this paper we present how this runtime library can be integrated with compilers for High Performance Fortran (HPF) style parallel programming languages. We discuss how we have integrated this runtime library with the Fortran 90D compiler being developed at Syracuse University, and provide experimental data on a block structured Navier-Stokes solver template and a small multigrid example parallelized using this compiler and run on an Intel iPSC/860. We show that the compiler-parallelized code performs within 20% of the code parallelized by inserting calls to the runtime library manually.
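
The bookkeeping underneath such a library can be suggested with a small sketch (illustrative Python, hypothetical names): the data one block must send another is the intersection of the sender's owned regular section with the receiver's requested ghost section:

    def intersect(a, b):
        """Intersect two regular sections, each a list of per-dimension
        half-open (lo, hi) ranges; None if they are disjoint."""
        out = []
        for (alo, ahi), (blo, bhi) in zip(a, b):
            lo, hi = max(alo, blo), min(ahi, bhi)
            if lo >= hi:
                return None
            out.append((lo, hi))
        return out

    owned     = [(0, 8), (0, 8)]        # sender owns rows/cols 0..7
    requested = [(6, 10), (2, 4)]       # receiver's ghost region
    print(intersect(owned, requested))  # [(6, 8), (2, 4)] -> section to send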

Runtime-Compilation Techniques for Data Partitioning and Communication Schedule Reuse

Ravi Ponnusamy, Joel Saltz, and Alok Choudhary

Proceedings of Supercomputing '93, pages 361-370, Portland, Oregon, November 15-19, 1993

IEEE Computer Society Press

In this paper, we describe two new ideas by which an HPF compiler can deal with irregular computations effectively. The first mechanism invokes a user-specified mapping procedure via a set of compiler directives. The directives allow the user to use program arrays to describe graph connectivity, spatial location of array elements, and computational load. The second is a simple conservative method that in many cases enables a compiler to recognize that it is possible to reuse previously computed results from inspectors (e.g., communication schedules, loop iteration partitions, and information that associates off-processor data copies with on-processor buffer locations). We present performance results for these mechanisms from a Fortran 90D compiler implementation.

Slicing Analysis and Indirect Access to Distributed Arrays

Raja Das, Joel Saltz, and Reinhard von Hanxleden

Proceedings of the 6th Workshop on Languages and Compilers for Parallel Computing, pages 152-168, Portland, Oregon, August 12-14, 1993

An increasing fraction of the applications targeted by parallel computers makes heavy use of indirection arrays for indexing data arrays. Such irregular access patterns make it difficult for a compiler to generate efficient parallel code. A limitation of existing techniques addressing this problem is that they are only applicable for a single level of indirection. However, many codes using sparse data structures access their data through multiple levels of indirection. This paper presents a method for transforming programs using multiple levels of indirection into programs with at most one level of indirection, thereby broadening the range of applications that a compiler can parallelize efficiently. A central concept of our algorithm is to perform program slicing on the subscript expressions of the indirect array accesses. Such slices peel off the levels of indirection, one by one, and create opportunities for aggregated data prefetching in between. A slice graph eliminates redundant preprocessing and gives an ordering in which to compute the slices. We present our work in the context of High Performance Fortran; an implementation in the Fortran D prototype compiler is in progress.
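
The transformation itself is easy to state in miniature. A hypothetical Python sketch (the paper's setting is HPF/Fortran D): a two-level access x[ia[ib[i]]] is split into two single-level phases, each of which can be preceded by one aggregated gather:

    x  = [10, 20, 30, 40, 50]
    ia = [4, 0, 3, 1, 2]
    ib = [2, 2, 0]

    # Original loop: two levels of indirection in one access.
    orig = [x[ia[ib[i]]] for i in range(len(ib))]

    # Transformed: a slice peels one level into a temporary, leaving at
    # most one level of indirection per loop.
    t   = [ia[ib[i]] for i in range(len(ib))]   # phase 1: gather ia[ib[*]]
    new = [x[t[i]]   for i in range(len(ib))]   # phase 2: gather x[t[*]]

    assert orig == new                          # [40, 40, 50]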

Compiler Analysis for Irregular Problems in Fortran D

Reinhard von Hanxleden, Ken Kennedy, Charles Koelbel, Raja Das, and Joel Saltz

Proceedings of the 5th Workshop on Languages and Compilers for Parallel Computing, pages 97-111, New Haven, Connecticut, August 3-5, 1992

We have developed a dataflow framework which provides a basis for rigorously defining strategies to make use of runtime preprocessing methods for distributed memory multiprocessors. In many programs, several loops access the same off-processor memory locations. Our runtime support gives us a mechanism for tracking and reusing copies of off-processor data. A key aspect of our compiler analysis strategy is to determine when it is safe to reuse copies of off-processor data. Another crucial function of the compiler analysis is to identify situations which allow runtime preprocessing overheads to be amortized. This dataflow analysis will make it possible to effectively use the results of interprocedural analysis in our efforts to reduce interprocessor communication and the need for runtime preprocessing.
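
The reuse question the analysis answers can be mimicked at runtime with a hypothetical Python sketch (the paper establishes the property at compile time): copies gathered for one loop stay usable for later loops only while no write to the source array intervenes:

    class GhostCopies:
        """Cached copies of off-processor data, valid until a write."""
        def __init__(self):
            self.copies = {}               # global index -> cached value
            self.valid = False

        def gather(self, src, indices):   # communication for the first loop
            self.copies = {i: src[i] for i in indices}
            self.valid = True

        def invalidate(self):             # call at any (possible) write to src
            self.valid = False

    x = [1, 2, 3, 4]
    g = GhostCopies()
    g.gather(x, [0, 3])
    assert g.valid                         # a second loop may reuse the copies
    x[3] = 99; g.invalidate()
    assert not g.valid                     # a third loop must re-gather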


Compilers and Runtime Software for Scalable Processors

Edited by: J. Saltz and P. Mehrotra

Published by Elsevier, The Netherlands, 1992

Runtime Compilation for Multiprocessors

Concurrency: Practice and Experience, pages 573-592, Volume 3, Number 3, 1991
