- Runtime library
- For parallelizing multiple structured grid (e.g. multiblock and multigrid) applications
written in Fortran and C
- Target Systems
- Any distributed memory system that supports message passing
(currently implemented on Intel iPSC/860 and Paragon, IBM SP1/2, TMC CM5, network of
workstations via PVM)
- Implementation
- C + message passing
- Functionality
- The Multiblock Parti library is used to produce an SPMD parallel program, and provides routines
that allow an application programmer or a compiler to
- Lay out distributed data in a flexible way, to enable good load balancing and minimize
interprocessor communication,
- Give high level specifications for performing data movement, and
- Distribute the computation across the processors.
Two types of communication are required in multiple structured grid applications.
Inter-block communication is required because of boundary conditions between blocks (in
multiblock codes) and restrictions and prolongations between grids at different levels of
resolution (in multigrid codes). Since the data that needs to be communicated is always a
regular section of an array, this can be handled by primitives for regular section move. A
regular section move copies a regular section of one distributed array into a regular
section of another distributed array, potentially involving changes of offset, stride and
index permutation. Intra-block communication is required because of partitioning of blocks
or grids across processors. The data access pattern in the computation within a block or
grid is regular. This implies that the interaction between grid points is restricted to
nearby neighbors. The interpolation required during the prolongation step in multigrid
codes also involves interaction among the neighboring array elements. Such communication
is handled by allocation of extra space at the beginning and end of each array dimension
on each processor. These extra elements are called overlap, or ghost,
cells. Depending upon the data access pattern in a loop, the required data is copied from
other processors and is stored in the overlap cells.
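The effect of a regular section move can be illustrated with a single-process sketch. The function name and signature below are illustrative only (not the library's API); the real primitive performs the same copy between distributed arrays using message passing:

```c
#include <assert.h>

/* Illustrative single-process sketch (not the library API): copy a 2-D
 * regular section of src into dst, allowing different offsets and
 * strides on each side. A regular section move performs the same kind
 * of copy between distributed arrays via message passing. */
static void section_copy_2d(const int *src, int src_cols,
                            int sr, int sc, int srs, int scs,
                            int *dst, int dst_cols,
                            int dr, int dc, int drs, int dcs,
                            int nrows, int ncols)
{
    for (int i = 0; i < nrows; i++)
        for (int j = 0; j < ncols; j++)
            dst[(dr + i * drs) * dst_cols + (dc + j * dcs)] =
                src[(sr + i * srs) * src_cols + (sc + j * scs)];
}
```

An index permutation would additionally swap which source dimension feeds which destination dimension; it is omitted here for brevity.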
In the runtime system, communication is performed in two phases. First, a subroutine is
called to build a communication schedule that describes the required data motion,
and then another subroutine is called to perform the data motion (sends and receives on a
distributed memory parallel machine) using a previously built schedule. Such an
arrangement allows a schedule to be reused multiple times in an iterative algorithm.
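The two-phase pattern can be sketched as follows. The names and the schedule representation here are ours, chosen for illustration; in the real library a schedule encodes the sends and receives between processors rather than local index pairs:

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical sketch of the two-phase pattern: a schedule records the
 * required data motion once, as a list of source/destination index
 * pairs (the library instead records interprocessor sends/receives). */
typedef struct {
    int n;
    int *src_idx;
    int *dst_idx;
} Schedule;

/* Phase 1: build a schedule that copies a[i] into b[i + shift]. */
static Schedule *build_shift_sched(int n, int shift)
{
    Schedule *s = malloc(sizeof *s);
    s->n = n - shift;
    s->src_idx = malloc((size_t)s->n * sizeof(int));
    s->dst_idx = malloc((size_t)s->n * sizeof(int));
    for (int i = 0; i < s->n; i++) {
        s->src_idx[i] = i;
        s->dst_idx[i] = i + shift;
    }
    return s;
}

/* Phase 2: perform the data motion described by a previously built
 * schedule. In an iterative solver this runs every iteration, while
 * the schedule itself is built only once. */
static void data_move(const Schedule *s, const double *src, double *dst)
{
    for (int i = 0; i < s->n; i++)
        dst[s->dst_idx[i]] = src[s->src_idx[i]];
}
```

Separating the two phases amortizes the cost of computing the communication pattern over all iterations that reuse it.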
- Library Interface
- The library provides routines for using the distributed array descriptors for address
translation and for interprocessor communication (regular section moves and filling ghost
cells) using communication schedules.
- Distributed Array Descriptors
- The library defines a descriptor in each processor that both describes the global
structure of a distributed array and caches information about the portion of the
array local to a processor. The descriptor employs a Fortran D style decomposition, which
is similar to an HPF template. The definition of the data structure in C is as follows:
typedef struct dArray_rec {
    int nDims;
    int *ghostCells;    /* number of internal ghost cells in each dim */
    int *dimVecG;       /* total size of each dim */
    int *dimVecL;       /* local size of each dim for central pieces */
    int *dimVecL_L;     /* local size of each dim for left-most piece */
    int *dimVecL_R;     /* local size of each dim for right-most piece */
    int *g_index_low;   /* lower global index on my processor */
    int *g_index_hi;    /* upper global index on my processor */
    int *local_size;    /* local size on my processor */
    int *decompDim;     /* dim of decomp to which each dim is aligned;
                         * defines how the array is aligned to the decomp;
                         * used with decomp to initialize decompPosn
                         * and dimDist */
    int *decompPosn;    /* coordinate position of the processor in the
                         * multi-dimensional decomposition space to
                         * which it is bound */
    char *dimDist;      /* type of distribution in each dim */
    struct decomp_rec *decomp;  /* decomposition to which processor bound */
} DARRAY;

/* this is the structure of a decomposition (in HPF, a template) */
typedef struct decomp_rec {
    int nDims, nProcs, baseProc;
    int *dimVec;    /* size of decomposition in each dim */
    int *dimProc;   /* num processors allocated to each dim */
    char *dimDist;  /* type of distribution in each dim */
} DECOMP;
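The per-processor fields g_index_low, g_index_hi, and local_size can be illustrated for a block distribution. The code below is an assumed-semantics sketch, not the library's actual initialization code: processor p owns a contiguous slice of a dimension, with ghost cells added only on sides that face another processor:

```c
#include <assert.h>

/* Sketch (assumed semantics) of filling one dimension's descriptor
 * fields for a BLOCK distribution: processor p of nProcs owns a
 * contiguous slice of a dimension of global size gsize, with ghost
 * cells allocated only on sides adjacent to another processor. */
static void block_dist_1d(int gsize, int nProcs, int p, int ghost,
                          int *g_index_low, int *g_index_hi, int *local_size)
{
    int base = gsize / nProcs, rem = gsize % nProcs;
    /* the first rem processors get one extra element each */
    *g_index_low = p * base + (p < rem ? p : rem);
    *g_index_hi  = *g_index_low + base + (p < rem ? 1 : 0) - 1;
    int left  = (p > 0)          ? ghost : 0;  /* ghost cells on the left  */
    int right = (p < nProcs - 1) ? ghost : 0;  /* ghost cells on the right */
    *local_size = (*g_index_hi - *g_index_low + 1) + left + right;
}
```

This also shows why the descriptor distinguishes left-most, central, and right-most pieces (dimVecL_L, dimVecL, dimVecL_R): the boundary processors lack ghost cells on one side, so their local sizes differ from the interior ones.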
- Status
- Completed.
- Availability
- The library is available at Maryland via ftp; no
restrictions.
- Reusability
- Much of the code for doing regular section analysis and building communication schedules
for regular data distributions has been reused in other libraries (e.g. Jovian parallel
I/O library).
- Documentation
- The documentation is available as part of the library distribution, or directly via ftp.
Maintained by Alan Sussman