Raja Das, Joel Saltz, Reinhard v. Hanxleden.
An increasing fraction of the applications targeted by parallel computers makes heavy use of indirection arrays for indexing data arrays. Such irregular access patterns make it difficult for a compiler to generate efficient parallel code. A limitation of existing techniques addressing this problem is that they are only applicable to a single level of indirection. However, many codes using sparse data structures access their data through multiple levels of indirection.
This paper presents a method for transforming programs using multiple levels of indirection into programs with at most one level of indirection, thereby broadening the range of applications that a compiler can parallelize efficiently. A central concept of our algorithm is to perform program slicing on the subscript expressions of the indirect array accesses. Such slices peel off the levels of indirection, one by one, and create opportunities for aggregated data prefetching in between. A slice graph eliminates redundant preprocessing and gives an ordering in which to compute the slices. We present our work in the context of High Performance Fortran; an implementation in the Fortran D prototype compiler is in progress.
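The effect of peeling off one level of indirection can be illustrated with a small sequential sketch. This is not the paper's algorithm or the Fortran D implementation; the function names `peel_indirection` and `gather` and the sample arrays are hypothetical, chosen only to show how a doubly indirect access `x[ia[ib[i]]]` can be rewritten as a precomputed index array followed by a singly indirect access.

```python
def peel_indirection(ia, ib):
    """Hypothetical 'slice' for the subscript ia[ib[i]].

    Evaluates the inner subscript once, ahead of the loop, and returns
    an index array t with t[i] == ia[ib[i]].
    """
    return [ia[j] for j in ib]


def gather(x, idx):
    """Fetch all needed elements of x in one pass (stands in for an
    aggregated prefetch; no real communication happens here)."""
    return [x[k] for k in idx]


x = [10.0, 20.0, 30.0, 40.0]
ia = [3, 0, 2, 1]
ib = [1, 1, 2, 0]

# Original form: two levels of indirection inside the loop body.
y_direct = [x[ia[ib[i]]] for i in range(len(ib))]

# Transformed form: the slice computes t first, then the loop
# accesses x through a single level of indirection, x[t[i]].
t = peel_indirection(ia, ib)
y_flat = gather(x, t)

print(t)       # [0, 0, 2, 3]
print(y_flat)  # [10.0, 10.0, 30.0, 40.0]
```

Because `t` is known before the loop runs, all elements of `x` that the loop touches can be collected in a single aggregated step, which is the opportunity the slices are meant to expose.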
R. v. Hanxleden, K. Kennedy, C. Koelbel, R. Das, J. Saltz.
We developed a dataflow framework which provides a basis for rigorously defining strategies to make use of runtime preprocessing methods for distributed memory multiprocessors.
In many programs, several loops access the same off-processor memory locations. Our runtime support gives us a mechanism for tracking and reusing copies of off-processor data. A key aspect of our compiler analysis strategy is to determine when it is safe to reuse copies of off-processor data. Another crucial function of the compiler analysis is to identify situations which allow runtime preprocessing overheads to be amortized. This dataflow analysis will make it possible to effectively use the results of interprocedural analysis in our efforts to reduce interprocessor communication and the need for runtime preprocessing.
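The idea of tracking and reusing copies of off-processor data can be sketched as a small single-process simulation. This is an illustration under stated assumptions, not the authors' runtime support: the class `GhostCache`, its methods, and the `remote` dictionary standing in for data owned by other processors are all hypothetical.

```python
class GhostCache:
    """Hypothetical tracker for local copies of off-processor elements."""

    def __init__(self, remote):
        self.remote = remote  # stands in for data owned by other processors
        self.copies = {}      # local copies fetched so far
        self.fetches = 0      # number of simulated communication steps

    def gather(self, indices):
        """Return values for indices, fetching only elements not yet copied."""
        missing = [k for k in sorted(set(indices)) if k not in self.copies]
        if missing:
            self.fetches += 1  # one aggregated message for all missing elements
            for k in missing:
                self.copies[k] = self.remote[k]
        return [self.copies[k] for k in indices]

    def invalidate(self, indices):
        """Drop copies that are no longer safe to reuse (owner modified them)."""
        for k in indices:
            self.copies.pop(k, None)


remote = {5: 1.5, 7: 2.5, 9: 3.5}
cache = GhostCache(remote)

first = cache.gather([5, 7, 5])   # first loop: one fetch
second = cache.gather([7, 5])     # second loop: copies reused, no fetch
fetches_after_reuse = cache.fetches

remote[7] = 9.0                   # the owner updates element 7
cache.invalidate([7])             # reuse of the old copy is no longer safe
third = cache.gather([7, 9])      # refetches 7 and fetches 9 in one step
```

In this sketch the caller decides when to invalidate; in the setting described above, that decision is what the compiler's dataflow analysis is meant to establish.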