Compilers Publications

Publications on Compilers

Compiler and Runtime Support for Programming in Adaptive Parallel Environments
Published as: University of Maryland Technical Report CS-TR-3510 and UMIACS-TR-95-83

Submitted for Journal Publication
Guy Edjlali, Gagan Agrawal, Alan Sussman, Jim Humphries, Joel Saltz
For better utilization of computing resources, it is important to consider parallel programming environments in which the number of available processors varies at runtime. In this paper, we discuss runtime support for data parallel programming in such an adaptive environment. Executing programs in an adaptive environment requires redistributing data when the number of processors changes, and also requires determining new loop bounds and communication patterns for the new set of processors. We have developed a runtime library to provide this support. We discuss how the runtime library can be used by compilers of HPF-like languages to generate code for an adaptive environment. We present performance results for a Navier-Stokes solver on a multigrid template run on a network of workstations and an IBM SP-2. Our experiments show that if the number of processors is not varied frequently, the cost of data redistribution is not significant when compared to the time required for the actual computation. Overall, our work establishes the feasibility of compiling HPF for a network of non-dedicated workstations, which are likely to be an important resource for parallel programming in the future.
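The redistribution step described in this abstract can be illustrated with a minimal sketch. It assumes a one-dimensional block distribution; the function names `block_bounds` and `redistribution_plan` are hypothetical and are not taken from the runtime library in the paper. It recomputes per-processor loop bounds for a new processor count and lists which index ranges have to move.

```python
def block_bounds(n, nprocs, rank):
    """Half-open range [lo, hi) of global indices owned by `rank`
    under a block distribution of n elements over nprocs processors."""
    base, extra = divmod(n, nprocs)
    lo = rank * base + min(rank, extra)
    hi = lo + base + (1 if rank < extra else 0)
    return lo, hi


def redistribution_plan(n, old_procs, new_procs):
    """List (src, dst, lo, hi) transfers needed when the processor
    count changes from old_procs to new_procs (illustrative only)."""
    plan = []
    for src in range(old_procs):
        s_lo, s_hi = block_bounds(n, old_procs, src)
        for dst in range(new_procs):
            d_lo, d_hi = block_bounds(n, new_procs, dst)
            lo, hi = max(s_lo, d_lo), min(s_hi, d_hi)
            # Only overlapping ranges that change owner need communication.
            if lo < hi and src != dst:
                plan.append((src, dst, lo, hi))
    return plan


if __name__ == "__main__":
    # Shrinking from 4 processors to 2 for a 10-element array.
    for transfer in redistribution_plan(10, 4, 2):
        print(transfer)
```

In this sketch the new loop bounds on each processor are simply the result of `block_bounds` for the new processor count; a real runtime library would also rebuild communication patterns, which is not shown here.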

Interprocedural Partial Redundancy Elimination and Its Application to Distributed Memory Compilation
Published in: Conference on Programming Language Design and Implementation '95
pg: 258-269, June 1995, La Jolla, California
Gagan Agrawal, Joel Saltz, Raja Das
Partial Redundancy Elimination (PRE) is a general scheme for suppressing partial redundancies which encompasses traditional optimizations like loop invariant code motion and redundant code elimination. In this paper, we address the problem of performing this optimization interprocedurally. We use interprocedural partial redundancy elimination for placement of communication and communication preprocessing statements while compiling for distributed memory parallel machines.

Interprocedural Compilation of Irregular Applications for Distributed Memory Machines
Published in: SuperComputing '95
Department of Computer Science Technical Reports CR-TR- 3447
Gagan Agrawal, Joel Saltz
Data parallel languages like High Performance Fortran (HPF) are emerging as the architecture independent mode of programming distributed memory parallel machines. In this paper, we present the interprocedural optimizations required for compiling applications that have irregular data access patterns, when coded in such data parallel languages. We have developed an Interprocedural Partial Redundancy Elimination (IPRE) algorithm for optimized placement of runtime preprocessing routines and collective communication routines inserted for managing communication in such codes. We also present three new interprocedural optimizations: placement of scatter routines, deletion of data structures and use of coalescing and incremental routines. We then describe how program slicing can be used for further applying IPRE in more complex scenarios. We have done a preliminary implementation of the schemes presented here using the Fortran D implementation system as the necessary infrastructure. We present experimental results from two codes compiled using our system to demonstrate the efficacy of the presented schemes.

Interprocedural Communication Optimizations for Distributed Memory Compilation.

Published in: Proceedings of Languages and Compilers for Parallel Computing 1994
pg: 283-299
Gagan Agrawal, Joel Saltz

Interprocedural Data Flow Based Optimizations for Compiling Irregular Applications

Published in: Proceedings of Languages and Compilers for Parallel Computing
August 1995
Gagan Agrawal, Joel Saltz

Distributed Memory Compiler Design for Sparse Problems
Published in: IEEE Transactions on Computers, vol. 44, no. 6, 1995
J. Wu, Raja Das, Joel Saltz, H. Berryman, S. Hiranandani
This paper addresses the issue of compiling concurrent loop nests in the presence of complicated array references and irregularly distributed arrays. Arrays accessed within loops may contain accesses that make it impossible to precisely determine the reference pattern at compile time. This paper proposes a runtime support mechanism that is used effectively by a compiler to generate efficient code in these situations. The compiler accepts as input a Fortran 77 program enhanced with specifications for distributing data, and outputs a message passing program that runs on the nodes of a distributed memory machine. The runtime support for the compiler consists of a library of primitives designed to support irregular patterns of distributed array accesses and irregularly distributed array partitions. A variety of performance results on the Intel iPSC/860 are presented.

Compiler Analysis for Irregular Problems in Fortran D
Published in:
Proceedings of 5th Workshop on Languages and Compilers for Parallel Computing
New Haven Connecticut, August, 1992.

Reinhard von Hanxleden, Ken Kennedy, Charles Koelbel, Raja Das, Joel Saltz.
We developed a dataflow framework which provides a basis for rigorously defining strategies to make use of runtime preprocessing methods for distributed memory multiprocessors. In many programs, several loops access the same off-processor memory locations. Our runtime support gives us a mechanism for tracking and reusing copies of off-processor data. A key aspect of our compiler analysis strategy is to determine when it is safe to reuse copies of off-processor data. Another crucial function of the compiler analysis is to identify situations which allow runtime preprocessing overheads to be amortized. This dataflow analysis will make it possible to effectively use the results of interprocedural analysis in our efforts to reduce interprocessor communication and the need for runtime preprocessing.

Slicing Analysis and Indirect Access to Distributed Arrays
Published in: Proceedings of the 6th Workshop on Languages and Compilers for Parallelism, August 1993, pg: 152-168.

Raja Das, Joel Saltz, Reinhard von Hanxleden.
An increasing fraction of the applications targeted by parallel computers makes heavy use of indirection arrays for indexing data arrays. Such irregular access patterns make it difficult for a compiler to generate efficient parallel code. A limitation of existing techniques addressing this problem is that they are only applicable for a single level of indirection. However, many codes using sparse data structures access their data through multiple levels of indirection. This paper presents a method for transforming programs using multiple levels of indirection into programs with at most one level of indirection, thereby broadening the range of applications that a compiler can parallelize efficiently. A central concept of our algorithm is to perform program slicing on the subscript expressions of the indirect array accesses. Such slices peel off the levels of indirection, one by one, and create opportunities for aggregated data prefetching in between. A slice graph eliminates redundant preprocessing and gives an ordering in which to compute the slices. We present our work in the context of High Performance Fortran; an implementation in the Fortran D prototype compiler is in progress.
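The idea of reducing two levels of indirection to one can be sketched as follows. This is only an illustration of the transformation's effect on a single access pattern, not the slicing algorithm from the paper; the names `flatten_indirection` and `gather` are hypothetical.

```python
def flatten_indirection(ia, ib):
    """Precompute t[i] = ia[ib[i]] so that x[ia[ib[i]]] becomes x[t[i]].
    This plays the role of one 'slice' evaluated before the main loop."""
    return [ia[j] for j in ib]


def gather(x, idx):
    """Single-level indirect access: fetch x[idx[i]] for every i."""
    return [x[k] for k in idx]


if __name__ == "__main__":
    x = [10, 20, 30, 40]
    ia = [3, 0, 2]
    ib = [2, 2, 1, 0]

    # Original form: two levels of indirection inside the loop.
    direct = [x[ia[ib[i]]] for i in range(len(ib))]

    # Transformed form: one preprocessing pass, then one level of indirection.
    t = flatten_indirection(ia, ib)
    transformed = gather(x, t)

    print(direct == transformed)  # prints True
```

Once the access is in single-level form, all indices in `t` are known before the loop runs, which is what would allow off-processor elements to be fetched together in a distributed setting.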

Runtime Support and Compilation Methods for User-Specified Data Distributions
Accepted for publication:
IEEE Transactions on Parallel and Distributed Systems
June, 1995
Published as: University of Maryland Technical Report CS-TR-3194 and UMIACS-TR-93-135

Ravi Ponnusamy, Joel Saltz, Alok Choudhary, Yuan-Shin Hwang, Geoffrey Fox.
This paper describes two new ideas by which an HPF compiler can deal with irregular computations effectively. The first mechanism invokes a user specified mapping procedure via a set of compiler directives. The directives allow use of program arrays to describe graph connectivity, spatial location of array elements and computational load. The second mechanism is a simple conservative method that in many cases enables a compiler to recognize that it is possible to reuse previously computed information from inspectors (e.g. communication schedules, loop iteration partitions, information that associates off-processor data copies with on-processor buffer locations). We present performance results for these mechanisms from a Fortran 90D compiler implementation.
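A conservative reuse check of the kind mentioned in this abstract might look like the sketch below. The abstract does not give the mechanism, so the modification counters and the `ScheduleCache` class are assumptions made for illustration: a schedule is rebuilt whenever any indirection array it depends on may have been written since the schedule was computed.

```python
class ScheduleCache:
    """Illustrative cache for inspector results (hypothetical design)."""

    def __init__(self):
        self._stamp = {}   # array name -> modification counter
        self._cache = {}   # loop id -> (stamps seen at build time, schedule)

    def mark_modified(self, name):
        """Record that an array may have been written (conservative)."""
        self._stamp[name] = self._stamp.get(name, 0) + 1

    def get_schedule(self, loop_id, index_arrays, build):
        """Return a cached schedule if none of index_arrays changed,
        otherwise call build() (the inspector) and cache its result."""
        seen = tuple(self._stamp.get(n, 0) for n in index_arrays)
        hit = self._cache.get(loop_id)
        if hit is not None and hit[0] == seen:
            return hit[1]
        schedule = build()
        self._cache[loop_id] = (seen, schedule)
        return schedule


if __name__ == "__main__":
    calls = []

    def inspector():
        calls.append(1)
        return {"send": [0, 2], "recv": [1]}

    cache = ScheduleCache()
    cache.get_schedule("loop1", ["edges"], inspector)
    cache.get_schedule("loop1", ["edges"], inspector)   # reused
    cache.mark_modified("edges")
    cache.get_schedule("loop1", ["edges"], inspector)   # rebuilt
    print(len(calls))  # prints 2
```

The check is conservative in the sense that any possible write invalidates the schedule, even if the array contents end up unchanged.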

Value-Based Distributions in Fortran D: A Preliminary Report
Accepted for publication (invited submission):
Journal of Programming Languages - Special Issue on Compiling and Run-Time Issues for Distributed Address Space Machines in Dec 1993.

Published: CRPC 1993, no. CRPC-TR93365-S

Reinhard v. Hanxleden, Ken Kennedy, Joel Saltz.
Compiling irregular applications written in a data-parallel, High Performance Fortran-like language presents a challenging problem of growing importance. One principal difficulty with irregular problems is the general lack of access locality when distributing data and computation naively, based on simple functions of array and iteration indices. To address this problem, we extend classical, index-based data distributions to value-based distributions. This type of distribution allows the programmer to conveniently express, for example, the locality characteristics of an underlying physical problem. This gives the compiler an opportunity to improve both inter- and intra-processor locality, resulting in better performance and scalability, even when this locality is not inherent in the original data structures and access patterns. This paper reports on the experience gained from implementing value-based distributions in a Fortran 77D compiler prototype. We propose a natural extension of index-based distributions as already present in Fortran D and HPF. This paper addresses the compilation issues involved and makes a quantitative comparison of index-based and value-based distributions for a Molecular Dynamics application.

An Integrated Runtime and Compile-Time Approach for Parallelizing Structured and Block Structured Applications
Published in: IEEE Transactions on Parallel and Distributed Systems July 1995.

Gagan Agrawal, Alan Sussman, Joel Saltz.
Scientific and engineering applications often involve structured meshes. These meshes may be nested (for multigrid codes) and/or irregularly coupled (called multiblock or irregularly coupled regular mesh problems). In this paper, we present a combined runtime and compile-time approach for parallelizing these applications on distributed memory parallel machines in an efficient and machine-independent fashion. We have designed and implemented a runtime library which can be used to port these applications to distributed memory machines. The library is currently implemented on several different systems. Since the design of the library is machine independent, it can be easily ported to other distributed memory machines and environments which support message passing. To further ease the task of application programmers, we have developed methods for integrating this runtime library with compilers for HPF-like parallel programming languages. We discuss how we have integrated this runtime library with the Fortran 90D compiler being developed at Syracuse University. We present experimental results to demonstrate the efficacy of our approach. We have experimented with a multiblock Navier-Stokes solver template and a multigrid code. Our experimental results show that our primitives have low runtime communication overheads. Further, the compiler parallelized codes perform within 20% of the code parallelized by manually inserting calls to the runtime library.

Compiler and Runtime Support for Structured and Block Structured Applications.
Published in: Proceedings of Supercomputing '93
IEEE Computer Society Press, Pg: 578-587

Gagan Agrawal, Alan Sussman, Joel Saltz.
Scientific and engineering applications often involve structured meshes. These meshes may be nested (for multigrid or adaptive codes) and/or irregularly coupled (called Irregularly Coupled Regular Meshes). We have designed and implemented a runtime library for parallelizing this general class of applications on distributed memory parallel machines in an efficient and machine independent manner. In this paper we present how this runtime library can be integrated with compilers for High Performance Fortran (HPF) style parallel programming languages. We discuss how we have integrated this runtime library with the Fortran 90D compiler being developed at Syracuse University and provide experimental data on a block structured Navier-Stokes solver template and a small multigrid example parallelized using this compiler and run on an Intel iPSC/860. We show that the compiler parallelized code performs within 20% of the code parallelized by inserting calls to the runtime library manually.

Runtime-Compilation Techniques for Data Partitioning and Communication Schedule Reuse
Published in: Proceedings of Supercomputing '93
Portland, Oregon, IEEE Computer Society Press, pg: 361-370
Ravi Ponnusamy, Joel Saltz, Alok Choudhary.
In this paper, we describe two new ideas by which an HPF compiler can deal with irregular computations effectively. The first mechanism invokes a user-specified mapping procedure via a set of compiler directives. The directives allow the user to use program arrays to describe graph connectivity, spatial location of array elements and computational load. The second is a simple conservative method that in many cases enables a compiler to recognize that it is possible to reuse previously computed results from inspectors (e.g. communication schedules, loop iteration partitions, information that associates off-processor data copies with on-processor buffer locations). We present performance results for these mechanisms from a Fortran 90D compiler implementation.

Supporting Irregular Distributions in FORTRAN 90D/HPF Compilers
Accepted for publication in: IEEE Parallel and Distributed Technology
Ravi Ponnusamy, Yuan-Shin Hwang, Joel Saltz, Alok Choudhary, and Geoffrey Fox

Compilers and Runtime Software for Scalable Processors
Edited by: Joel Saltz, P. Mehrotra
Published by Elsevier, The Netherlands, 1992.

Runtime Compilation for Multiprocessors
Published in: Concurrency Practice and Experience
Volume 3:3, pg: 573-592 1991
pg: 258-269 June 1995, La Jolla California Gagan Agrawal, Joel Saltz, Raja Das Partial Redundancy Elimination (PRE) is a general scheme for suppressing partial redundancies which encompasses traditional optimizations like loop invariant code motion and redundant code elimination. In this paper, we address the problem of performing this optimization interprocedurally. We use interprocedural partial redundancy elimination for placement of communication and communication preprocessing statements while compiling for distributed memory parallel machines.
Interprocedural Compilation of Irregular Applications for Distributed Memory Machines
Published in: SuperComputing '95
Department of Computer Science Technical Reports CR-TR- 3447
Gagan Agrawal, Joel Saltz
Data parallel languages like High Performance Fortran (HPF) are emerging as the architecture independent mode of programming distributed memory parallel machines. In this paper, we present the interprocedural optimizations required for compiling applications that have irregular data access patterns, when coded in such data parallel languages. We have developed an Interprocedural Partial Redundancy Elimination (PRE) algorithm for optimized placement of runtime preprocessing routine and collective communication routines inserted for managing communication in such codes. We also present three new interprocedural optimizations: placement of scatter routines, deletion of data structures and use of coalescing and incremental routines. We then describe how program slicing can be used for further applying IPRE in more complex scenarios. We have done a preliminary implementation of the schemes presented here using the Fortran D implemenation system as the nessecary infrastracture. We present experimental results from two codes compiled using our system to demonstrate the efficacy of the presented schemes.

Interprocedural Communication Optimizations for Distributed Memory Compilation.

Published in: Procedings of Languages and Compilers for Parallel Computing 1994
pg: 283-299
Gagan Agrawal, Joel Saltz

Interprocedural Data Flow Based Optimizations for Compiling Irregular Applications

Published in: Procedings of Languages and Compilers for Parallel Computing
August 1995
Gagan Agrawal, Joel Saltz

Distributed Memory Compiler Design for Sparse Problems
Published in:IEEE Transactions on Computers vol. 44, no.6, 1995
J. Wu, Raja Das, Joel Saltz, H. Berryman, S. Hiranandani
This paper addresses the issue of compiling concurrent loop nests in the presence of complicated array references and irregularly distributed arrays. Arrays accessed within loops may contain accessed that make it impossible to precisely determine the reference pattern at compile time. This paper proposes a run time support mechanism that is used effectively by a compiler to generate efficient codes in these situations. The compiler accepts as input a Fortran 77 program enhanced with specifications for distributing data, and outputs a message passing program that runs on the nodes of a distributed memory machine. The runtime support for the compiler consists of a library of primitives designed to support irregular patterns of distributed array accesses and irregularly distributed array partitions. A variety of performance results on the Intel iPSC/860 are presented.

Compiler Analysis for Irregular Problems in Fortran D
Published in:
Proceedings of 5th Workshop on Languages and Compilers for Parallel Computing
New Haven Connecticut, August, 1992.

Reinhard von Hanxleden, Ken Kennedy, Charles Koelbel, Raja Das, Joel Saltz.
We developed a dataflow framework which provides a basis for rigorously defining strategies to make use of runtime preprocessing methods for distributed memory multiprocessors. In many programs, several loops access the same off-processor memory locations. Our runtime support gives us a mechanism for tracking and reusing copies of off-processor data. A key aspect of our compiler analysis strategy is to determine when it is safe to reuse copies of off-processor data. Another crucial function of the compiler analysis is to identify situations which allow runtime preprocessing overheads to be amortized. This dataflow analysis will make it possible to effectively use the results of interprocedural analysis in our efforts to reduce interprocessor communication and the need for runtime preprocessing.

Slicing Analysis and Indirect Access to Distributed Arrays
Published in: Proceedings of the 6th Workshop on Languages and Compilers for Parallelism, August 1993, pg: 152-168.

Raja Das, Joel Saltz, Reinhard von Hanxleden.
An increasing fraction of the applications targeted by parallel computers makes heavy use of indirection arrays for indexing data arrays. Such irregular access patterns make it difficult for a compiler to generate efficient parallel code. A limitation of existing techniques addressing this problem is that they are only applicable for a single level of indirection. However, many codes using sparse data structures access their data through multiple levels of indirection. This paper presents a method for transforming programs using multiple levels of indirection into programs with at most one level of indirection, thereby broadening the range of applications that a compiler can parallelize efficiently. A central concept of our algorithm is to perform program slicing on the subscript expressions of the indirect array accesses. Such slices peel off the levels of indirection, one by one, and create opportunities for aggregated data prefetching in between. A slice graph eliminates redundant preprocessing and gives an ordering in which to compute the slices. We present our work in the context of High Performance Fortran; an implementation in the Fortran D prototype compiler is in progress.
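As a rough illustration of the transformation described in this abstract (a sketch only, not the paper's algorithm), the Python fragment below turns a doubly indirect access x[ia[ib[i]]] into a preprocessing step that evaluates the inner subscript once, followed by a loop with a single level of indirection. The names x, ia, ib, and flatten_indirection are hypothetical and chosen for the example.

```python
def two_level_sum(x, ia, ib):
    """Original loop: two levels of indirection, x[ia[ib[i]]]."""
    return sum(x[ia[ib[i]]] for i in range(len(ib)))


def flatten_indirection(ia, ib):
    """Preprocessing slice (hypothetical helper): evaluate the inner
    subscript ia[ib[i]] once, giving a single-level indirection array."""
    return [ia[j] for j in ib]


def one_level_sum(x, idx):
    """Transformed loop: a single level of indirection, x[idx[i]]."""
    return sum(x[k] for k in idx)


# Small made-up data set for the example.
x = [10.0, 20.0, 30.0, 40.0]
ia = [3, 0, 2, 1]
ib = [1, 1, 2, 0]

idx = flatten_indirection(ia, ib)
```

Once idx is available, all elements of x that the loop needs are known before the loop runs, which is the property that allows the data to be fetched in one aggregated step.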

Runtime Support and Compilation Methods for User-Specified Data Distributions
Accepted for publication:
IEEE Transactions on Parallel and Distributed Systems
June, 1995
Published as: University of Maryland Technical Report CS-TR-3194 and UMIACS-TR-93-135

Ravi Ponnusamy, Joel Saltz, Alok Choudhary, Yuan-Shin Hwang, Geoffrey Fox.
This paper describes two new ideas by which an HPF compiler can deal with irregular computations effectively. The first mechanism invokes a user specified mapping procedure via a set of compiler directives. The directives allow use of program arrays to describe graph connectivity, spatial location of array elements and computational load. The second mechanism is a simple conservative method that in many cases enables a compiler to recognize that it is possible to reuse previously computed information from inspectors (e.g. communication schedules, loop iteration partitions, information that associates off-processor data copies with on-processor buffer locations). We present performance results for these mechanisms from a Fortran 90D compiler implementation.
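The second mechanism can be pictured with a small Python sketch. It is an assumption-laden illustration, not the paper's implementation: an indirection array carries a write counter, and the inspector is rerun only when that counter has changed since the schedule was built. TrackedArray, ScheduleCache, and the bounds local_lo and local_hi are hypothetical names.

```python
class TrackedArray:
    """Indirection array with a counter bumped on every write (hypothetical)."""

    def __init__(self, values):
        self.values = list(values)
        self.version = 0

    def write(self, i, v):
        self.values[i] = v
        # Conservative: any write invalidates schedules, even one that
        # stores the value that was already there.
        self.version += 1


class ScheduleCache:
    """Reuses an inspector result while the indirection array is unmodified.

    Assumes the local index range [local_lo, local_hi) stays fixed."""

    def __init__(self):
        self.schedule = None
        self.seen_version = None
        self.inspector_runs = 0

    def get(self, index_array, local_lo, local_hi):
        if self.schedule is None or self.seen_version != index_array.version:
            # Inspector: collect the distinct off-processor indices referenced.
            self.schedule = sorted(
                {g for g in index_array.values if not (local_lo <= g < local_hi)}
            )
            self.seen_version = index_array.version
            self.inspector_runs += 1
        return self.schedule


ia = TrackedArray([0, 5, 2, 7, 5])
cache = ScheduleCache()
first = list(cache.get(ia, 0, 4))
second = list(cache.get(ia, 0, 4))   # reused, inspector not rerun
runs_before_write = cache.inspector_runs
ia.write(0, 9)
third = list(cache.get(ia, 0, 4))    # array changed, inspector rerun
```

The check is cheap and never reuses a stale schedule, at the cost of occasionally rebuilding one that would still have been valid.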

Value-Based Distributions in Fortran D: A Preliminary Report
Accepted for publication (invited submission):
Journal of Programming Languages - Special Issue on Compiling and Run-Time Issues for Distributed Address Space Machines in Dec 1993.

Published: CRPC 1993, no. CRPC-TR93365-S

Reinhard v. Hanxleden, Ken Kennedy, Joel Saltz.
Compiling irregular applications written in a data-parallel, High Performance Fortran-like language presents a challenging problem of growing importance. One principal difficulty with irregular problems is the general lack of access locality when distributing data and computation naively, based on simple functions of array and iteration indices. To address this problem, we extend classical, index-based data distributions to value-based distributions. This type of distribution allows the programmer to conveniently express, for example, the locality characteristics of an underlying physical problem. This gives the compiler an opportunity to improve both inter- and intra-processor locality, resulting in better performance and scalability, even when this locality is not inherent in the original data structures and access patterns. This paper reports on the experience gained from implementing value-based distributions in a Fortran 77D compiler prototype. We propose a natural extension of index-based distributions as already present in Fortran D and HPF. This paper addresses the compilation issues involved and makes a quantitative comparison of index and value-based distributions for a Molecular Dynamics application.

An Integrated Runtime and Compile-Time Approach for Parallelizing Structured and Block Structured Applications
Published in: IEEE Transactions on Parallel and Distributed Systems July 1995.

Gagan Agrawal, Alan Sussman, Joel Saltz.
Scientific and engineering applications often involve structured meshes. These meshes may be nested (for multigrid codes) and/or irregularly coupled (called multiblock or irregularly coupled regular mesh problems). In this paper, we present a combined runtime and compile-time approach for parallelizing these applications on distributed memory parallel machines in an efficient and machine-independent fashion. We have designed and implemented a runtime library which can be used to port these applications to distributed memory machines. The library is currently implemented on several different systems. Since the design of the library is machine independent, it can be easily ported to other distributed memory machines and environments which support message passing. To further ease the task of application programmers, we have developed methods for integrating this runtime library with compilers for HPF-like parallel programming languages. We discuss how we have integrated this runtime library with the Fortran 90D compiler being developed at Syracuse University. We present experimental results to demonstrate the efficacy of our approach. We have experimented with a multiblock Navier-Stokes solver template and a multigrid code. Our experimental results show that our primitives have low runtime communication overheads. Further, the compiler-parallelized codes perform within 20% of the code parallelized by manually inserting calls to the runtime library.

Compiler and Runtime Support for Structured and Block Structured Applications.
Published in: Proceedings of Supercomputing '93
IEEE Computer Society Press, Pg: 578-587

Gagan Agrawal, Alan Sussman, Joel Saltz.
Scientific and engineering applications often involve structured meshes. These meshes may be nested (for multigrid or adaptive codes) and/or irregularly coupled (called Irregularly Coupled Regular Meshes). We have designed and implemented a runtime library for parallelizing this general class of applications on distributed memory parallel machines in an efficient and machine independent manner. In this paper we present how this runtime library can be integrated with compilers for High Performance Fortran (HPF) style parallel programming languages. We discuss how we have integrated this runtime library with the Fortran 90D compiler being developed at Syracuse University and provide experimental data on a block structured Navier-Stokes solver template and a small multigrid example parallelized using this compiler and run on an Intel iPSC/860. We show that the compiler parallelized code performs within 20% of the code parallelized by inserting calls to the runtime library manually.

Runtime-Compilation Techniques for Data Partitioning and Communication Schedule Reuse
Published in: Proceedings of Supercomputing '93
Portland Oregon, IEEE Computer Society Press, pg: 361-370
Ravi Ponnusamy, Joel Saltz, Alok Choudhary.
In this paper, we describe two new ideas by which an HPF compiler can deal with irregular computations effectively. The first mechanism invokes a user-specified mapping procedure via a set of compiler directives. The directives allow the user to use program arrays to describe graph connectivity, spatial location of array elements and computational load. The second is a simple conservative method that in many cases enables a compiler to recognize that it is possible to reuse previously computed results from inspectors (e.g. communication schedules, loop iteration partitions, information that associates off-processor data copies with on-processor buffer locations). We present performance results for these mechanisms from a Fortran 90D compiler implementation.

Supporting Irregular Distributions in FORTRAN 90D/HPF Compilers
Accepted for publication in: IEEE Parallel and Distributed Technology
Ravi Ponnusamy, Yuan-Shin Hwang, Joel Saltz, Alok Choudhary, and Geoffrey Fox

Compilers and Runtime Software for Scalable Processors
Edited by: Joel Saltz, P. Mehrotra,
Published by Elsevier, The Netherlands, 1992.

Runtime Compilation for Multiprocessors
Published in: Concurrency Practice and Experience
Volume 3:3, pg: 573-592 1991
Published in: Procedings of Languages and Compilers for Parallel Computing
August 1995 Gagan Agrawal, Joel Saltz
Distributed Memory Compiler Design for Sparse Problems
Published in:IEEE Transactions on Computers vol. 44, no.6, 1995
J. Wu, Raja Das, Joel Saltz, H. Berryman, S. Hiranandani
This paper addresses the issue of compiling concurrent loop nests in the presence of complicated array references and irregularly distributed arrays. Arrays accessed within loops may contain accessed that make it impossible to precisely determine the reference pattern at compile time. This paper proposes a run time support mechanism that is used effectively by a compiler to generate efficient codes in these situations. The compiler accepts as input a Fortran 77 program enhanced with specifications for distributing data, and outputs a message passing program that runs on the nodes of a distributed memory machine. The runtime support for the compiler consists of a library of primitives designed to support irregular patterns of distributed array accesses and irregularly distributed array partitions. A variety of performance results on the Intel iPSC/860 are presented.

Compiler Analysis for Irregular Problems in Fortran D
Published in: Proceedings of the 5th Workshop on Languages and Compilers for Parallel Computing
New Haven, Connecticut, August 1992.

Reinhard von Hanxleden, Ken Kennedy, Charles Koelbel, Raja Das, Joel Saltz.
We developed a dataflow framework which provides a basis for rigorously defining strategies to make use of runtime preprocessing methods for distributed memory multiprocessors. In many programs, several loops access the same off-processor memory locations. Our runtime support gives us a mechanism for tracking and reusing copies of off-processor data. A key aspect of our compiler analysis strategy is to determine when it is safe to reuse copies of off-processor data. Another crucial function of the compiler analysis is to identify situations which allow runtime preprocessing overheads to be amortized. This dataflow analysis will make it possible to effectively use the results of interprocedural analysis in our efforts to reduce interprocessor communication and the need for runtime preprocessing.

Slicing Analysis and Indirect Access to Distributed Arrays
Published in: Proceedings of the 6th Workshop on Languages and Compilers for Parallel Computing, August 1993, pg: 152-168.

Raja Das, Joel Saltz, Reinhard von Hanxleden.
An increasing fraction of the applications targeted by parallel computers makes heavy use of indirection arrays for indexing data arrays. Such irregular access patterns make it difficult for a compiler to generate efficient parallel code. A limitation of existing techniques addressing this problem is that they are only applicable for a single level of indirection. However, many codes using sparse data structures access their data through multiple levels of indirection. This paper presents a method for transforming programs using multiple levels of indirection into programs with at most one level of indirection, thereby broadening the range of applications that a compiler can parallelize efficiently. A central concept of our algorithm is to perform program slicing on the subscript expressions of the indirect array accesses. Such slices peel off the levels of indirection, one by one, and create opportunities for aggregated data prefetching in between. A slice graph eliminates redundant preprocessing and gives an ordering in which to compute the slices. We present our work in the context of High Performance Fortran; an implementation in the Fortran D prototype compiler is in progress.

Runtime Support and Compilation Methods for User-Specified Data Distributions
Accepted for publication:
IEEE Transactions on Parallel and Distributed Systems
June, 1995
Published as: University of Maryland Technical Report CS-TR-3194 and UMIACS-TR-93-135

Ravi Ponnusamy, Joel Saltz, Alok Choudhary, Yuan-Shin Hwang, Geoffrey Fox.
This paper describes two new ideas by which an HPF compiler can deal with irregular computations effectively. The first mechanism invokes a user-specified mapping procedure via a set of compiler directives. The directives allow use of program arrays to describe graph connectivity, spatial location of array elements, and computational load. The second mechanism is a simple conservative method that in many cases enables a compiler to recognize that it is possible to reuse previously computed information from inspectors (e.g., communication schedules, loop iteration partitions, and information that associates off-processor data copies with on-processor buffer locations). We present performance results for these mechanisms from a Fortran 90D compiler implementation.

Value-Based Distributions in Fortran D: A Preliminary Report
Accepted for publication (invited submission):
Journal of Programming Languages - Special Issue on Compiling and Run-Time Issues for Distributed Address Space Machines in Dec 1993.

Published: CRPC 1993, no. CRPC-TR93365-S

Reinhard v. Hanxleden, Ken Kennedy, Joel Saltz.
Compiling irregular applications written in a data-parallel, High Performance Fortran-like language presents a challenging problem of growing importance. One principal difficulty with irregular problems is the general lack of access locality when distributing data and computation naively, based on simple functions of array and iteration indices. To address this problem, we extend classical, index-based data distributions to value-based distributions. This type of distribution allows the programmer to conveniently express, for example, the locality characteristics of an underlying physical problem. This gives the compiler an opportunity to improve both inter- and intra-processor locality, resulting in better performance and scalability, even when this locality is not inherent in the original data structures and access patterns. This paper reports on the experience gained from implementing value-based distributions in a Fortran 77D compiler prototype. We propose a natural extension of index-based distributions as already present in Fortran D and HPF. This paper addresses the compilation issues involved and makes a quantitative comparison of index-based and value-based distributions for a Molecular Dynamics application.

An Integrated Runtime and Compile-Time Approach for Parallelizing Structured and Block Structured Applications
Published in: IEEE Transactions on Parallel and Distributed Systems July 1995.

Gagan Agrawal, Alan Sussman, Joel Saltz.
Scientific and engineering applications often involve structured meshes. These meshes may be nested (for multigrid codes) and/or irregularly coupled (called multiblock or irregularly coupled regular mesh problems). In this paper, we present a combined runtime and compile-time approach for parallelizing these applications on distributed memory parallel machines in an efficient and machine-independent fashion. We have designed and implemented a runtime library which can be used to port these applications to distributed memory machines. The library is currently implemented on several different systems. Since the design of the library is machine independent, it can be easily ported to other distributed memory machines and environments which support message passing. To further ease the task of application programmers, we have developed methods for integrating this runtime library with compilers for HPF-like parallel programming languages. We discuss how we have integrated this runtime library with the Fortran 90D compiler being developed at Syracuse University. We present experimental results to demonstrate the efficacy of our approach. We have experimented with a multiblock Navier-Stokes solver template and a multigrid code. Our experimental results show that our primitives have low runtime communication overheads. Further, the compiler-parallelized codes perform within 20% of the code parallelized by manually inserting calls to the runtime library.

Compiler and Runtime Support for Structured and Block Structured Applications
Published in: Proceedings of Supercomputing '93
IEEE Computer Society Press, pg: 578-587

Gagan Agrawal, Alan Sussman, Joel Saltz.
Scientific and engineering applications often involve structured meshes. These meshes may be nested (for multigrid or adaptive codes) and/or irregularly coupled (called Irregularly Coupled Regular Meshes). We have designed and implemented a runtime library for parallelizing this general class of applications on distributed memory parallel machines in an efficient and machine-independent manner. In this paper, we present how this runtime library can be integrated with compilers for High Performance Fortran (HPF) style parallel programming languages. We discuss how we have integrated this runtime library with the Fortran 90D compiler being developed at Syracuse University and provide experimental data on a block structured Navier-Stokes solver template and a small multigrid example parallelized using this compiler and run on an Intel iPSC/860. We show that the compiler-parallelized code performs within 20% of the code parallelized by inserting calls to the runtime library manually.

Runtime-Compilation Techniques for Data Partitioning and Communication Schedule Reuse
Published in: Proceedings of Supercomputing '93
Portland, Oregon, IEEE Computer Society Press, pg: 361-370
Ravi Ponnusamy, Joel Saltz, Alok Choudhary.
Runtime Support and Compilation Methods for User-Specified Data Distributions
Accepted for publication:
IEEE Transactions on Parallel and Distributed Systems
June, 1995
Published as: University of Maryland Technical Report CS-TR-3194 and UMIACS-TR-93-135

Ravi Ponnusamy, Joel Saltz, Alok Choudhary, Yuan-Shin Hwang, Geoffrey Fox.
This paper describes two new ideas by which an HPF compiler can deal with irregular computations effectively. The first mechanism invokes a user specified mapping procedure via a set of compiler directives. The directives allow use of program arrays to describe graph connectivity, spatial location of array elements and computational load. The second mechanism is a simple conservative method that in many cases enables a compiler to recognize that it is possible to reuse previously computed information from inspectors (e.g. communication schedules, loop iteration partitions, information that associates off-processor data copies with on-processor buffer locations). We present performance results for these mechanisms from a Fortran 90D compiler implementation.

Value-Based Distributions in Fortran D: A Preliminary Report
Accepted for publication (invited submission):
Journal of Programming Languages - Special Issue on Compiling and Run-Time Issues for Distributed Address Space Machines in Dec 1993.

Published: CRPC 1993, no. CRPC-TR93365-S

Reinhard v. Hanxleden, Ken Kennedy, Joel Saltz.
Compiling irregular applications written in a data-parallel, High Performance Fortran-like language presents a challenging problem of growing importance. One principal difficulty with irregular problems is the general lack of access locality when distributing data and computation naively, based on simple fuctions of array and iteration indices. To address this problem, we extend classical, index-based data distributions to value-based distributions. This type of distribution allows the programmer toconveniently express for example the locality characteristics of an underlying physical problem. This gives the compiler an oppurtunity to improve both inter- and intra- processor locality, resulting in better performance and scalability, even when this locality is not inherent in the original data structures and access patterns. This paper reports on the experience gained from implementing value-based distributions in a Fortran 77D compiler prototype. We propose a natural extension of index-based distributions as already present in Fortran D and HPF. This paper addresses the compilation issues involved and makes a quantitative comparison of index and value-based distributions for a Molecular Dynamics application.

An Integrated Runtime and Compile-Time Approach for Parallelizing Structured and Block Structured Applications
Published in: IEEE Transactions on Parallel and Distributed Systems July 1995.

Gagan Agrawal, Alan Sussman, Joel Saltz.
Scientific and engineering applications often involve structured meshes. These meshes may be nested (for multigrid codes) and/or irregularly coupled (called multiblock or irregularly coupled regular mesh problems). In this paper, we present a combined runtime and compile-time approach for parallelizing these applications on distributed memory parallel machines in an efficient and machine-independent fashion. We have designed and implemented a runtime library which can be used to port these applications to distributed memory machines. The library is currently implemented on several different systems. Since the design of the library is machine independent, it can be easily ported to other distributed memory machines and environments which support message passing. To further ease the task of application programmers, we have developed methods for integrating this runtime library with compilers for HPF-like parallel programming languages. We discuss how we have integrated this runtime library with the Fortran 90D compiler being developed at Syracuse University. We present experimental results to demonstrate the efficacy of our approach. We have experimented with a multiblock Navier-Stokes solver template and a multigrid code. Our experimental results show that our primitives have low runtime communication overheads. Further, the compiler-parallelized codes perform within 20% of the code parallelized by manually inserting calls to the runtime library.

Compiler and Runtime Support for Structured and Block Structured Applications.
Published in: Proceedings of Supercomputing '93
IEEE Computer Society Press, Pg: 578-587

Gagan Agrawal, Alan Sussman, Joel Saltz.
Scientific and engineering applications often involve structured meshes. These meshes may be nested (for multigrid or adaptive codes) and/or irregularly coupled (called Irregularly Coupled Regular Meshes). We have designed and implemented a runtime library for parallelizing this general class of applications on distributed memory parallel machines in an efficient and machine-independent manner. In this paper we present how this runtime library can be integrated with compilers for High Performance Fortran (HPF) style parallel programming languages. We discuss how we have integrated this runtime library with the Fortran 90D compiler being developed at Syracuse University and provide experimental data on a block structured Navier-Stokes solver template and a small multigrid example parallelized using this compiler and run on an Intel iPSC/860. We show that the compiler-parallelized code performs within 20% of the code parallelized by inserting calls to the runtime library manually.

Runtime-Compilation Techniques for Data Partitioning and Communication Schedule Reuse
Published in: Proceedings of Supercomputing '93
Portland Oregon, IEEE Computer Society Press, pg: 361-370
Ravi Ponnusamy, Joel Saltz, Alok Choudhary.
In this paper, we describe two new ideas by which an HPF compiler can deal with irregular computations effectively. The first mechanism invokes a user-specified mapping procedure via a set of compiler directives. The directives allow the user to use program arrays to describe graph connectivity, spatial location of array elements and computational load. The second is a simple conservative method that in many cases enables a compiler to recognize that it is possible to reuse previously computed results from inspectors (e.g. communication schedules, loop iteration partitions, information that associates off-processor data copies with on-processor buffer locations). We present performance results for these mechanisms from a Fortran 90D compiler implementation.
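One way to picture a conservative reuse check for inspector results is sketched below in Python. It is an illustration only, not the method or runtime library described in the paper: the version counter, the class names `TrackedArray` and `ScheduleCache`, and the `owner_of` mapping are all hypothetical.

```python
class TrackedArray:
    """Indirection array that bumps a version counter on every write."""
    def __init__(self, values):
        self._values = list(values)
        self.version = 0

    def __getitem__(self, i):
        return self._values[i]

    def __setitem__(self, i, v):
        self._values[i] = v
        self.version += 1  # conservative: any write invalidates cached results

    def __len__(self):
        return len(self._values)


class ScheduleCache:
    """Reruns the (stand-in) inspector only if the indirection array changed."""
    def __init__(self):
        self._schedule = None
        self._built_at = None
        self.inspector_runs = 0

    def get(self, ind, owner_of, my_rank):
        if self._schedule is None or self._built_at != ind.version:
            self._schedule = self._inspect(ind, owner_of, my_rank)
            self._built_at = ind.version
        return self._schedule

    def _inspect(self, ind, owner_of, my_rank):
        self.inspector_runs += 1
        # Stand-in for a communication schedule: the sorted, de-duplicated
        # global indices of off-processor elements that must be fetched.
        return sorted({ind[k] for k in range(len(ind))
                       if owner_of(ind[k]) != my_rank})


# Hypothetical layout: 2 processors, 4 elements each.
owner_of = lambda g: g // 4
ind = TrackedArray([0, 5, 2, 7, 5])
cache = ScheduleCache()

s1 = cache.get(ind, owner_of, 0)  # inspector runs
s2 = cache.get(ind, owner_of, 0)  # unchanged array: schedule reused
ind[2] = 6                        # a write forces a rebuild on next use
s3 = cache.get(ind, owner_of, 0)
```

The check is conservative in the sense that a write that stores the same value still triggers a rebuild; it may rerun the inspector unnecessarily but never reuses a stale schedule.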

Supporting Irregular Distributions in FORTRAN 90D/HPF Compilers
Accepted for publication in: IEEE Parallel and Distributed Technology
Ravi Ponnusamy, Yuan-Shin Hwang, Joel Saltz, Alok Choudhary, and Geoffrey Fox

Compilers and Runtime Software for Scalable Processors
Edited by: Joel Saltz, P. Mehrotra
Published by Elsevier, The Netherlands, 1992.

Runtime Compilation for Multiprocessors
Published in: Concurrency Practice and Experience
Volume 3:3, pg: 573-592 1991
