Two types of communication are required in multiple structured grid applications. Inter-block communication is required because of boundary conditions between blocks (in multiblock codes) and restrictions and prolongations between grids at different levels of resolution (in multigrid codes). Since the data that needs to be communicated is always a regular section of an array, this can be handled by primitives for regular section move. A regular section move copies a regular section of one distributed array into regular section of another distributed array, potentially involving changes of offset, stride and index permutation. Intra-block communication is required because of partitioning of blocks or grids across processors. The data access pattern in the computation within a block or grid is regular. This implies that the interaction between grid points is restricted to nearby neighbors. The interpolation required during the prolongation step in multigrid codes also involves interaction among the neighboring array elements. Such communication is handled by allocation of extra space at the beginning and end of each array dimension on each processor. These extra elements are called overlap , or ghost , cells. Depending upon the data access pattern in a loop, the required data is copied from other processors and is stored in the overlap cells.
In the runtime system, communication is performed in two phases. First, a subroutine is called to build a communication schedule that describes the required data motion, and then another subroutine is called to perform the data motion (sends and receives on a distributed memory parallel machine) using a previously built schedule. Such an arrangement allows a schedule to be be used multiple times in an iterative algorithm.
typedef struct dArray_rec { int nDims; int *ghostCells; /* number of internal ghost cells in each dim */ int *dimVecG; /* total size of each dim */ int *dimVecL; /* local size of each dim for central pieces*/ int *dimVecL_L; /* local size of each dim for left most piece */ int *dimVecL_R; /* local size of each dim for right most piece */ int *g_index_low; /* lower global index on my processor */ int *g_index_hi; /* upper global index on my processor */ int *local_size; /* Local size on my processor */ int *decompDim; /* dim of decomp to which each dim aligned * defines how array aligned to decomp * used with decomp to initialize decompPosn * and dimDist */ int *decompPosn; /* coordinate position of processor in the decomposition to which it's bound */ /* in the multi-dimensional decomposition space */ char *dimDist; /* type of distribution in each dim */ struct decomp_rec *decomp; /* decomposition to which processor bound */ } DARRAY; /* this is the structure of a decomposition (in HPF, a template) */ typedef struct decomp_rec { int nDims, nProcs, baseProc; int *dimVec; /* size of decomposition in each dim */ int *dimProc; /* num processors allocated to each dim */ char *dimDist; /* type of distribution in each dim */ } DECOMP;