Compiler and Runtime
Support for Programming in Adaptive Parallel Environments
Guy Edjlali, Gagan Agrawal, Alan Sussman, Jim Humphries, and Joel Saltz
Scientific Programming, January 1997
University of Maryland Technical Report CS-TR-3510 and UMIACS-TR-95-83
For better utilization of computing resources, it is important to consider parallel
programming environments in which the number of available processors varies at runtime. In
this paper, we discuss runtime support for data parallel programming in such an adaptive
environment. Executing programs in an adaptive environment requires redistributing data
when the number of processors changes, and also requires determining new loop bounds and
communication patterns for the new set of processors. We have developed a runtime library
to provide this support. We discuss how the runtime library can be used by compilers of
HPF-like languages to generate code for an adaptive environment. We present performance
results for a Navier-Stokes solver on a multigrid template, run on a network of workstations
and an IBM SP-2. Our experiments show that if the number of processors does not vary
frequently, the cost of data redistribution is not significant compared to the time
required for the actual computation. Overall, our work establishes the feasibility of
compiling HPF for a network of non-dedicated workstations, which is likely to be an
important resource for parallel programming in the future.
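The redistribution step described above can be pictured with the following sketch (ours, not the paper's library; names like block_bounds and redistribution_plan are hypothetical): when the processor count changes, new 1-D block-distribution bounds are computed, and each old owner learns which index range it must send to each new owner.

    def block_bounds(n, p):
        """Index bounds of a 1-D block distribution of n elements over p processors."""
        base, extra = divmod(n, p)
        bounds, lo = [], 0
        for rank in range(p):
            hi = lo + base + (1 if rank < extra else 0)
            bounds.append((lo, hi))
            lo = hi
        return bounds

    def redistribution_plan(n, p_old, p_new):
        """For each (old_rank, new_rank) pair, the index range old_rank must send."""
        old_b, new_b = block_bounds(n, p_old), block_bounds(n, p_new)
        plan = {}
        for r_old, (lo1, hi1) in enumerate(old_b):
            for r_new, (lo2, hi2) in enumerate(new_b):
                lo, hi = max(lo1, lo2), min(hi1, hi2)
                if lo < hi:
                    plan[(r_old, r_new)] = (lo, hi)
        return plan

    # Shrinking from 4 to 3 processors for 10 elements:
    # {(0, 0): (0, 3), (1, 0): (3, 4), (1, 1): (4, 6), (2, 1): (6, 7), ...}
    print(redistribution_plan(10, 4, 3))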
Interprocedural
Compilation of Irregular Applications for Distributed Memory Machines
Gagan Agrawal and Joel Saltz
Proceedings of Supercomputing 95, San Diego, California, December 3-8,
1995
Department of Computer Science Technical Report CS-TR-3447
Data parallel languages like High Performance Fortran (HPF) are emerging as the
architecture independent mode of programming distributed memory parallel machines. In this
paper, we present the interprocedural optimizations required for compiling applications
that have irregular data access patterns, when coded in such data parallel languages. We
have developed an Interprocedural Partial Redundancy Elimination (IPRE) algorithm for
optimized placement of runtime preprocessing routines and collective communication routines
inserted for managing communication in such codes. We also present three new
interprocedural optimizations: placement of scatter routines, deletion of data structures
and use of coalescing and incremental routines. We then describe how program slicing can
be used to apply IPRE in more complex scenarios. We have done a preliminary
implementation of the schemes presented here using the Fortran D implementation system as
the necessary infrastructure. We present experimental results from two codes compiled
using our system to demonstrate the efficacy of the presented schemes.
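As a concrete picture of the placement problem (our Python sketch, not the compiler's output; build_schedule and gather stand in for the runtime preprocessing and collective communication routines), IPRE discovers that the inspector call inside the loop is partially redundant and hoists it:

    def sweep_naive(x, indir, steps, build_schedule, gather):
        # Before IPRE: the schedule is rebuilt on every iteration even
        # though the indirection array never changes inside the loop.
        for _ in range(steps):
            sched = build_schedule(indir)
            gather(x, sched)

    def sweep_ipre(x, indir, steps, build_schedule, gather):
        # After IPRE: the partially redundant preprocessing is hoisted;
        # only the communication remains in the loop.
        sched = build_schedule(indir)
        for _ in range(steps):
            gather(x, sched)

The interprocedural version of the analysis makes the same move when the inspector call is buried inside a procedure invoked from the loop.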
Interprocedural Data Flow Based Optimizations for Compiling
Irregular Applications
Gagan Agrawal and Joel Saltz
Proceedings of Languages and Compilers for Parallel Computing, August
1995
Runtime
Support and Compilation Methods for User-Specified Data Distributions
Ravi Ponnusamy, Joel Saltz, Alok Choudhary, Yuan-Shin Hwang, and Geoffrey Fox
IEEE Transactions on Parallel and Distributed
Systems, pages 815-831, Volume 6, Number 8, August 1995
University of Maryland Technical Report CS-TR-3194 and UMIACS-TR-93-135
This paper describes two new ideas by which an HPF compiler can deal with irregular
computations effectively. The first mechanism invokes a user-specified mapping procedure
via a set of compiler directives. The directives allow the use of program arrays to describe
graph connectivity, spatial location of array elements and computational load. The second
mechanism is a simple conservative method that in many cases enables a compiler to
recognize that it is possible to reuse previously computed information from inspectors
(e.g. communication schedules, loop iteration partitions, information that associates
off-processor data copies with on-processor buffer locations). We present performance
results for these mechanisms from a Fortran 90D compiler implementation.
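The second mechanism can be pictured as a small cache (a sketch under our own naming, not the Fortran 90D runtime's interface): inspector results are reused until the compiler can no longer prove the indirection data unchanged, modeled here with an explicit version counter.

    class ScheduleCache:
        def __init__(self, inspector):
            self.inspector = inspector   # builds a schedule from an indirection array
            self.version = None
            self.schedule = None

        def get(self, indir, version):
            # Rerun the inspector only when the indirection array may have changed.
            if version != self.version:
                self.schedule = self.inspector(indir)
                self.version = version
            return self.schedule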
An
Integrated Runtime and Compile-Time Approach for Parallelizing Structured and Block
Structured Applications
Gagan Agrawal, Alan Sussman, and Joel Saltz
IEEE Transactions on Parallel and Distributed Systems, pages 747-754, Volume 6, Number 7, July
1995
Scientific and engineering applications often involve structured meshes. These meshes
may be nested (for multigrid codes) and/or irregularly coupled (called multiblock or
irregularly coupled regular mesh problems). In this paper, we present a combined runtime
and compile-time approach for parallelizing these applications on distributed memory
parallel machines in an efficient and machine-independent fashion. We have designed and
implemented a runtime library which can be used to port these applications to distributed
memory machines. The library is currently implemented on several different systems. Since
the design of the library is machine independent, it can be easily ported to other
distributed memory machines and environments which support message passing. To further
ease the task of application programmers, we have developed methods for integrating this
runtime library with compilers for HPF-like parallel programming languages. We discuss how
we have integrated this runtime library with the Fortran 90D compiler being developed at
Syracuse University. We present experimental results to demonstrate the efficacy of our
approach. We have experimented with a multiblock Navier-Stokes solver template and a
multigrid code. Our experimental results show that our primitives have low runtime
communication overheads. Further, the compiler-parallelized codes perform within 20% of
the codes parallelized by manually inserting calls to the runtime library.
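A flavor of what such a library computes, in a deliberately simplified 1-D Python sketch (hypothetical names; the actual library handles multidimensional regular sections): the portion of a neighboring block's interior that fills this block's ghost cells.

    def ghost_fill_sections(block_lo, block_hi, ghost_width, nbr_lo, nbr_hi):
        """1-D index ranges of the neighbor's interior that land in our ghost layer."""
        glo, ghi = block_lo - ghost_width, block_hi + ghost_width
        lo, hi = max(glo, nbr_lo), min(ghi, nbr_hi)
        sections = []
        if lo < block_lo:                       # ghost cells below our interior
            sections.append((lo, min(hi, block_lo)))
        if hi > block_hi:                       # ghost cells above our interior
            sections.append((max(lo, block_hi), hi))
        return sections

    # Block [10, 20) with 2 ghost cells, neighbor [0, 10): fetch [8, 10).
    print(ghost_fill_sections(10, 20, 2, 0, 10))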
Interprocedural
Partial Redundancy Elimination and Its Application to Distributed Memory Compilation
Gagan Agrawal, Joel Saltz, and Raja Das
Conference on Programming Language Design and Implementation '95,
pages 258-269, La Jolla, California, June 1995
Partial Redundancy Elimination (PRE) is a general scheme for suppressing partial
redundancies which encompasses traditional optimizations like loop invariant code motion
and redundant code elimination. In this paper, we address the problem of performing this
optimization interprocedurally. We use interprocedural partial redundancy elimination for
placement of communication and communication preprocessing statements while compiling for
distributed memory parallel machines.
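A minimal illustration of the intraprocedural case (our example, not drawn from the paper):

    def before(a, b, cond, use):
        if cond:
            use(a + b)        # a + b is computed on this path...
        use(a + b)            # ...so this evaluation is partially redundant

    def after(a, b, cond, use):
        t = a + b             # PRE: computed exactly once on every path
        if cond:
            use(t)
        use(t)

The paper lifts this style of placement across procedure boundaries, treating communication and preprocessing calls as the candidate expressions.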
Distributed
Memory Compiler Design for Sparse Problems
J. Wu, Raja Das, Joel Saltz, H. Berryman, and S. Hiranandani
IEEE Transactions on Computers, pages 737-753, Volume 44, Number 6,
June 1995
This paper addresses the issue of compiling concurrent loop nests in the presence of
complicated array references and irregularly distributed arrays. Arrays accessed within
loops may contain accesses that make it impossible to precisely determine the reference
pattern at compile time. This paper proposes a run time support mechanism that is used
effectively by a compiler to generate efficient codes in these situations. The compiler
accepts as input a Fortran 77 program enhanced with specifications for distributing data,
and outputs a message passing program that runs on the nodes of a distributed memory
machine. The runtime support for the compiler consists of a library of primitives designed
to support irregular patterns of distributed array accesses and irregularly distributed
array partitions. A variety of performance results on the Intel iPSC/860 are presented.
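The inspector/executor pattern at the heart of this runtime support can be sketched as follows (illustrative Python with assumed helper callables, not the actual library primitives): one pass over the indirection array determines which references are off-processor, so a single aggregated gather can precede the loop.

    def inspector(indir, is_local):
        # Deduplicate the off-processor global indices; 'fetch' drives one
        # aggregated gather, 'slot' maps each index to its buffer position.
        fetch, slot = [], {}
        for g in indir:
            if not is_local(g) and g not in slot:
                slot[g] = len(fetch)
                fetch.append(g)
        return fetch, slot

    def executor(indir, is_local, local_val, buf, slot, body):
        # Run the loop; off-processor reads come from the prefetched buffer.
        for i, g in enumerate(indir):
            v = local_val(g) if is_local(g) else buf[slot[g]]
            body(i, v)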
Supporting
Irregular Distributions in FORTRAN 90D/HPF Compilers
Ravi Ponnusamy, Yuan-Shin Hwang, Joel Saltz, Alok Choudhary, and Geoffrey Fox
IEEE Parallel & Distributed Technology, Spring 1995 (revised version)
University of Maryland Technical Reports CS-TR-3268 and UMIACS-TR-94-57
Interprocedural Communication Optimizations for Distributed
Memory Compilation
Gagan Agrawal and Joel Saltz
Proceedings of Languages and Compilers for Parallel Computing, pages
283-299, 1994
Value-Based
Distributions in Fortran D: A Preliminary Report
Reinhard v. Hanxleden, Ken Kennedy, and Joel Saltz
Journal of Programming Languages - Special Issue on Compiling and Run-Time
Issues for Distributed Address Space Machines, Volume 2, Number 3, December 1993
CRPC Technical Report CRPC-TR93365-S, 1993
Compiling irregular applications written in a data-parallel, High Performance
Fortran-like language presents a challenging problem of growing importance. One principal
difficulty with irregular problems is the general lack of access locality when
distributing data and computation naively, based on simple functions of array and
iteration indices. To address this problem, we extend classical, index-based data
distributions to value-based distributions. This type of distribution allows the
programmer to conveniently express, for example, the locality characteristics of an
underlying physical problem. This gives the compiler an opportunity to improve both inter-
and intra-processor locality, resulting in better performance and scalability, even when
this locality is not inherent in the original data structures and access patterns. This
paper reports on the experience gained from implementing value-based distributions in a
Fortran 77D compiler prototype. We propose a natural extension of index-based
distributions as already present in Fortran D and HPF. This paper addresses the
compilation issues involved and makes a quantitative comparison of index and value-based
distributions for a Molecular Dynamics application.
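The contrast between the two distribution styles can be reduced to their owner functions (a toy Python sketch with made-up names): BLOCK maps an element by its index, while a value-based mapping bins it by a data value such as a particle coordinate, so spatially close particles land on the same processor.

    def block_owner(i, n, p):
        # Index-based BLOCK: element i of an n-element array on p processors.
        return i * p // n

    def value_owner(coord, xmin, xmax, p):
        # Value-based: owner chosen from the element's coordinate.
        frac = (coord - xmin) / (xmax - xmin)
        return min(int(frac * p), p - 1)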
Compiler
and Runtime Support for Structured and Block Structured Applications
Gagan Agrawal, Alan Sussman, and Joel Saltz
Proceedings of Supercomputing '93, pages 578-587, Portland Oregon, November
15-19, 1993
IEEE Computer Society Press
Scientific and engineering applications often involve structured meshes. These meshes
may be nested (for multigrid or adaptive codes) and/or irregularly coupled (called
Irregularly Coupled Regular Meshes). We have designed and implemented a runtime library
for parallelizing this general class of applications on distributed memory parallel
machines in an efficient and machine independent manner. In this paper we present how this
runtime library can be integrated with compilers for High Performance Fortran (HPF) style
parallel programming languages. We discuss how we have integrated this runtime library
with the Fortran 90D compiler being developed at Syracuse University and provide
experimental data on a block structured Navier-Stokes solver template and a small
multigrid example parallelized using this compiler and run on an Intel iPSC/860. We show
that the compiler-parallelized code performs within 20% of the code parallelized by
manually inserting calls to the runtime library.
Runtime-Compilation
Techniques for Data Partitioning and Communication Schedule Reuse
Ravi Ponnusamy, Joel Saltz, and Alok Choudhary
Proceedings Supercomputing '93, pages 361-370, Portland, Oregon, November
15-19, 1993
IEEE Computer Society Press
In this paper, we describe two new ideas by which an HPF compiler can deal with irregular
computations effectively. The first mechanism invokes a user-specified mapping procedure
via a set of compiler directives. The directives allow the user to use program arrays to
describe graph connectivity, spatial location of array elements and computational load.
The second is a simple conservative method that in many cases enables a compiler to
recognize that it is possible to reuse previously computed results from inspectors (e.g.
communication schedules, loop iteration partitions, information that associates
off-processor data copies with on-processor buffer locations). We present performance
results for these mechanisms from a Fortran 90D compiler implementation.
Slicing
Analysis and Indirect Access to Distributed Arrays
Raja Das, Joel Saltz, and Reinhard von Hanxleden
Proceedings of the 6th Workshop on Languages and Compilers for Parallel
Computing, pages 152-168, Portland Oregon, August 12-14, 1993
An increasing fraction of the applications targeted by parallel computers makes heavy
use of indirection arrays for indexing data arrays. Such irregular access patterns make it
difficult for a compiler to generate efficient parallel code. A limitation of existing
techniques addressing this problem is that they are only applicable for a single level of
indirection. However, many codes using sparse data structures access their data through
multiple levels of indirection. This paper presents a method for transforming programs
using multiple levels of indirection into programs with at most one level of indirection,
thereby broadening the range of applications that a compiler can parallelize efficiently.
A central concept of our algorithm is to perform program slicing on the subscript
expressions of the indirect array accesses. Such slices peel off the levels of
indirection, one by one, and create opportunities for aggregated data prefetching in
between. A slice graph eliminates redundant preprocessing and gives an ordering in which
to compute the slices. We present our work in the context of High Performance Fortran; an
implementation in the Fortran D prototype compiler is in progress.
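The transformation itself is easy to state in miniature (our Python rendering of the idea, not compiler output): slicing on the subscript expression splits a two-level access into two phases, each with a single level of indirection that can be prefetched in one aggregated step.

    def two_level(x, ia, ib):
        # Before: x[ia[ib[i]]] has two levels of indirection.
        return [x[ia[ib[i]]] for i in range(len(ib))]

    def peeled(x, ia, ib):
        # Slice 1: peel the inner level; one indirection (ia via ib).
        t = [ia[ib[i]] for i in range(len(ib))]
        # Slice 2: the remaining access has one indirection (x via t).
        return [x[t[i]] for i in range(len(t))]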
Compiler
Analysis for Irregular Problems in Fortran D
Reinhard von Hanxleden, Ken Kennedy, Charles Koelbel, Raja Das, and Joel Saltz.
Proceedings of 5th Workshop on Languages and Compilers for Parallel Computing, pages
97-111, New Haven, Connecticut, August 3-5, 1992
We have developed a dataflow framework which provides a basis for rigorously defining
strategies to make use of runtime preprocessing methods for distributed memory
multiprocessors. In many programs, several loops access the same off-processor memory
locations. Our runtime support gives us a mechanism for tracking and reusing copies of
off-processor data. A key aspect of our compiler analysis strategy is to determine when it
is safe to reuse copies of off-processor data. Another crucial function of the compiler
analysis is to identify situations which allow runtime preprocessing overheads to be
amortized. This dataflow analysis will make it possible to effectively use the results of
interprocedural analysis in our efforts to reduce interprocessor communication and the
need for runtime preprocessing.
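The safety condition can be modeled compactly (a hypothetical sketch; the 'may_write' sets stand in for the compiler's interprocedural summary, not any real framework API): a loop may reuse copies fetched by an earlier loop only if no intervening statement may write the underlying array.

    def reusable_copies(loops):
        """loops: (reads, may_write) sets of array names, in program order."""
        valid = set()                  # arrays whose off-processor copies are current
        decisions = []
        for reads, may_write in loops:
            decisions.append({a: (a in valid) for a in reads})
            valid |= reads             # copies are fresh after this loop's gather
            valid -= may_write         # writes invalidate existing copies
        return decisions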
Compilers and Runtime Software for Scalable Processors
Edited by: J. Saltz and P. Mehrotra
Published by Elsevier, The Netherlands, 1992
Runtime Compilation for Multiprocessors
Concurrency: Practice and Experience, pages 573-592, Volume 3, Number 3, 1991