Robert Bennet, Kelvin Bryant, Alan Sussman, Raja Das, Joel Saltz.
There has been a great deal of recent interest in paralell I/O. This paper discussed issues in the design and implementation of a portable I/O library designed to optimize the performance of multiprocessor architechtures that include multiple disks or disk arrays. The major emphasis of the paper is on optimizations that are made possible by the use of collective I/O, so that I/O requests for multiple processors can be combined to improve performance. Performance measurements from benchmarking our implementation of an I/O library that currently performs collective local optimizations , called Jovian, on three application templates are also presented.
Rahul Parulekar, Larry Davis, Rama Chellappa, J. Saltz, Alan Sussman, John Townshend.
We present the overall goals of our research program on the aplication of high performance computiong to remote sensing applications, specifically applications in land cover dynamics. This involves developing scalable and portable programs for a variety of image and map data processing applications, eventually integrated with new models for parallel I/O of large scale images and maps. After an overview of the multiblock PARTI run time support system, we explain extensions made to that system to support image processing applications, and then present an example involving multiresolution image processing. Results of running the parallel code on both a TMC CM5 and an Intel Paragon are discussed.
Chialin Chang, Alan Sussman, Joel Saltz.
Traditionally, applications executed on distributed memory architectures in single-program multiple-data (SPMD) mode use distributed (multi-dimensional) data arrays. Good performance has been achieved by applying runtime techniques to such applications executing in a loosely synchronous manner. However, many applications utilize language constructs such as pointers to synthesize dynamic complex data structures, such as linked lists, trees and graphs, with elements consisting of complex composite data types. Existing runtime systems that rely on global indicies cannot be used for these applications, as no global names or indicies are imposed upon the elements of these data structures.
A portable object-oriented runtime library is presented to support applications that use dynamic distributed data structures, including both arrays and pointer-based data structures. In particular, CHAOS++ deals with complex data types and pointer-based data structures by providing mobile objects and globally addressable objects . Preprocessing techniques are used to analyze communication patterns, and data exchange primitives are provided to carry out efficient data transfer. Performance results for applications taken from three distinct classes are also included to demonstrate the wide applicability of the runtime library.
R. Das, Y. Hwang, M. Uysal, J. Saltz, A. Sussman.
This paper describes a number of optimizations that can be used to support the efficient execution of irregular problems on distributed memory parallel machines. We describe software primitives that (1) coordinate interprocessor data movement, (2) manage the storage of, and access to, copies of off-processor data, (3) minimize interprocessor communication requirements and (4) support a shared name space. The performance of the primitives is characterized by examination of kernels from real applications and from a full implementation of a large unstructured adaptive application (the molecular dynamics code CHARMM).
A. Sussman, J. Saltz, R. Das, S. Gupta, D. Mavriplis, R. Ponnusamy.
This paper describes a set of primitives (PARTI) developed to efficiently execute unstructured and block structured problems on distributed memory parallel machines. We present experimental data from a 3-D unstructured Euler solver run on the Intel Touchstone Delta to demonstrate the usefulness of out methods.
Gagan Agrawal, Alan Sussman, Joel Saltz.
Scientific and engineering applications often involve structured meshes. These meshes may be nested (for multigrid or adaptive codes) and/or irregularly coupled (called Irregularly Coupled Regular Meshes). We have designed and implemented a runtime library for parallelizing this general class of applications on distributed memory parallel machines in an efficient and machine independent manner. One important communication primitive supported by this library is the regular ssection move, which copies a regular section from a distributed array to another distributed array, potentially involving changes in offset, stride and rotation of dimensions. In this paper we discuss the regular section analysis which is required for efficiently generating schedules for this kind of communication. We discuss the details of the analysis required when the distributions of arrays may be block or cyclic.
Gagan Agrawal, Alan Sussman, Joel Saltz.
Scientific and engineering applications often involve structured meshes. These meshes may be nested (for multigrid or adaptive codes) and/or irregularly coupled (called Irregularly Coupled Regular Meshes). We have designed and implemented a runtime library for parallelizing this general class of applications on distributed memory parallel machines in an efficient and machine independent manner. In this paper we present how this runtime library can be integrated with compilers for High Performance Fortran (HPF) style parallel programming languages. We discuss how we have integrated this runtime library with the Fortran 90D compiler being developed at Syracuse University and provide experimental data on a block structured Navier-Stokes solver template and a small multigrid example parallelized using this compiler and run on an Intel iPSC/860. We show that the compiler parallelized code performs within 20% of the code parallelized by inserting calls to the runtime library manually.
Gagan Agrawal, Alan Sussman, Joel Saltz.
Scientific and engineering applications often involve structured meshes. These meshes may be nested (for multigrid codes) and/or irregularly coupled (called multiblock or irregularly coupled regular mesh problems). In this paper, we present a combined runtime and compile-time approach for parallelizing these applications on distributed memory parallel machines in an efficient and machine-independent fashion. We have designed and implemented a runtime library which can be used to port these applications on distributed memory machines. The library is currently implemented on several different systems. Since the design of the library is machine independent, it can be easily ported on other distributed memory machines and environments which support message passing. To further ease the task of application programmers, we have developed methods for integrating this runtime library with compilers for HPF-like parallel programming languages. We discuss how we have integrated this runtime library with the Fortran 90D compiler being developed at Syracuse University. We present experimental results to demonstrate the efficacy of our approach. We have experimented with a multiblock Navier-Stokes solver template and a multigrid code. Our experimental results show that our primitives have low runtime communication overheads. Further, the compiler parallelized codes perform within 20% of the code parallelized by manually inserting calls to the runtime library.
Alan Sussman, Gagan Agrawal, Joel Saltz.
There exists a large class of scientific applications that are composed of irregularly coupled regular mesh (ICRM) computations. These problems are often referred to as block structured or multiblock, problems and include the Block Structured Navier-Stokes solver developed as NASA Langley called TLNS3D.
Primitives are presented that are designed to help users to efficiently program such problems on distributed memory machines. These primitives are also designed for use by compilers for distributed memory multiprocessors. Communications patterns are captured at runtime, and the appropriate send and received messages are automatically generated. The primitives are also useful for parallelizing regular computations, since block structured computations also require all the runtime support necessary for regular computations.