R. P. Nance, R. G. Wilmoth, B. Moon, H. A. Hassan, J. Saltz
AIAA Journal of Thermophysics and Heat Transfer, Pages 471-477, Volume
9, Number 3, July 1995
University of Maryland Technical Report: CS-TR-3425 and UMIACS-TR-95-25
This paper describes a parallel implementation of the direct simulation Monte Carlo
method. Runtime library support is used for scheduling and execution of communication
between nodes, and domain decomposition is performed dynamically to maintain a favorable
load balance. Performance tests are conducted using the code to evaluate various remapping
and remapping-interval policies, and it is shown that a one-dimensional
chain-partitioning method works best for the problems considered. The parallel code is then
used to simulate the Mach 20 nitrogen flow over a finite-thickness flat plate. It will be
shown that the parallel algorithm produces results which are very similar to previous DSMC
results, despite the increased resolution available. However, it yields
significantly faster execution times than the scalar code, as well as very good
load-balance and scalability characteristics.
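As an illustration of the one-dimensional chain partitioning mentioned above, the following is a minimal sketch, not the paper's implementation. It assumes the workload is given as a non-empty list of per-cell weights (for example, particle counts along one axis) and uses a simple prefix-sum heuristic; the function name chain_partition and the sample weights are hypothetical.

```python
from bisect import bisect_left
from itertools import accumulate


def chain_partition(weights, num_parts):
    """Split a 1-D chain of cell weights into contiguous blocks.

    Returns one half-open (start, end) index range per part.
    Heuristic (an assumption, not the paper's algorithm): place the
    k-th cut where the running weight first reaches k * total / num_parts.
    """
    prefix = list(accumulate(weights))  # prefix[i] = weight of cells 0..i
    total = prefix[-1]
    cuts = [0]
    for k in range(1, num_parts):
        target = total * k / num_parts
        # First cell whose running weight reaches the target; cut after it.
        cut = bisect_left(prefix, target) + 1
        cut = max(cut, cuts[-1])            # keep cuts non-decreasing
        cuts.append(min(cut, len(weights)))
    cuts.append(len(weights))
    return [(cuts[i], cuts[i + 1]) for i in range(num_parts)]


# Hypothetical per-cell particle counts along one axis.
weights = [4, 1, 1, 2, 2, 2, 4]
parts = chain_partition(weights, 4)
loads = [sum(weights[a:b]) for a, b in parts]
print(parts, loads)
```

Because each processor receives a contiguous slab, remapping after particles move only requires recomputing the cut positions from the new weights.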
B. Moon, M. Uysal, J. Saltz
Proceedings of the Ninth International Parallel Processing Symposium,
Pages 812-819, April 1995
Current research in parallel programming is focused on closing the gap between globally
indexed algorithms and the separate address spaces of processors on distributed memory
multicomputers. A set of index translation schemes has been implemented as part of the
CHAOS runtime support library, so that the library functions can be used for implementing
a global index space across a collection of separate local index spaces. These schemes
include two software-cached translation schemes aimed at adaptive irregular problems as
well as a distributed translation table technique for statically irregular problems. To
evaluate and demonstrate the efficiency of the software-cached translation schemes,
experiments have been performed with an adaptively irregular loop kernel and a
full-fledged 3D DSMC code from NASA Langley on the Intel Paragon and Cray T3D. This paper
also discusses and analyzes the operational conditions under which each scheme can produce
optimal performance.
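The idea of translating a global index into an (owner, local offset) pair through a distributed table with a software cache in front of it can be sketched as follows. This is a single-process toy model under stated assumptions, not the CHAOS API: the class names TranslationTable and CachedTranslator, the block distribution of the table, and the remote-lookup counter are all hypothetical.

```python
class TranslationTable:
    """Toy translation table, simulated in one process.

    Assumption: the table is block-distributed, so the entry for
    global index g is stored on processor g // block. A lookup from
    any other processor would need a message, so it is counted.
    """

    def __init__(self, owner_of, num_procs):
        # owner_of[g] = processor that owns global element g
        self.block = -(-len(owner_of) // num_procs)  # ceiling division
        counts = [0] * num_procs
        self.entries = []
        for proc in owner_of:
            self.entries.append((proc, counts[proc]))  # (owner, local offset)
            counts[proc] += 1
        self.remote_lookups = 0

    def lookup(self, g, asking_proc):
        if g // self.block != asking_proc:
            self.remote_lookups += 1  # stands in for a network round trip
        return self.entries[g]


class CachedTranslator:
    """Per-processor software cache in front of the table."""

    def __init__(self, table, proc):
        self.table, self.proc, self.cache = table, proc, {}

    def dereference(self, g):
        if g not in self.cache:
            self.cache[g] = self.table.lookup(g, self.proc)
        return self.cache[g]

    def invalidate(self):
        # Call after a remapping step changes data ownership.
        self.cache.clear()


table = TranslationTable(owner_of=[0, 1, 0, 1, 1, 0], num_procs=2)
translator = CachedTranslator(table, proc=0)
print(translator.dereference(4), translator.dereference(4))
print(table.remote_lookups)
```

In this model, repeated references to the same global index cost one table lookup; how often the cache must be invalidated depends on how quickly the access pattern changes.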
Bongki Moon and Joel Saltz.
Proceedings of the Scalable High Performance Computing Conference 1994,
Pages 176-183, May 1994
In highly adaptive irregular problems such as many Particle-In-Cell (PIC) codes and Direct
Simulation Monte Carlo (DSMC) codes, data access patterns may vary from time step to time
step. This fluctuation may hinder efficient utilization of distributed memory parallel
computers because of the resulting overhead for data redistribution and dynamic load
balancing. It may also hinder efficient utilization of runtime pre-processing because the
pre-processing requirements are sensitive to perturbations in the data access patterns. To
efficiently parallelize such adaptive irregular problems on distributed memory parallel
computers, several issues such as effective methods for domain partitioning, efficient
index dereferencing and fast data transportation must be addressed. This paper presents
efficient runtime support methods for such problems. These new runtime support primitives
have recently been implemented and added to the CHAOS library. A new domain partitioning
algorithm is introduced. A simple one-dimensional domain partitioning method is implemented
and compared with unstructured mesh partitioners such as recursive coordinate bisection
and recursive inertial bisection. A remapping decision policy has been investigated for
dynamic load balancing on 3-dimensional DSMC codes. Performance results are presented.
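For readers unfamiliar with recursive coordinate bisection, one of the partitioners used for comparison above, the following is a minimal sketch of the general technique, not the code evaluated in the paper. It assumes unweighted points, at least as many points as parts, and a cut along the axis of largest extent; the function name rcb and the sample points are hypothetical.

```python
def rcb(points, num_parts):
    """Recursive coordinate bisection on unweighted points.

    points: list of equal-length coordinate tuples.
    Returns a list of num_parts lists of point indices.
    Assumes len(points) >= num_parts so no part is empty.
    """
    dims = range(len(points[0]))

    def extent(ids, d):
        values = [points[i][d] for i in ids]
        return max(values) - min(values)

    def split(ids, parts):
        if parts == 1:
            return [ids]
        # Cut perpendicular to the axis along which the points spread most.
        axis = max(dims, key=lambda d: extent(ids, d))
        ordered = sorted(ids, key=lambda i: points[i][axis])
        left_parts = parts // 2
        # Give the left side a share proportional to its number of parts.
        k = len(ordered) * left_parts // parts
        return (split(ordered[:k], left_parts)
                + split(ordered[k:], parts - left_parts))

    return split(list(range(len(points))), num_parts)


# Hypothetical 2-D cell centres.
points = [(0, 0), (1, 0), (10, 0), (11, 0),
          (0, 1), (1, 1), (10, 1), (11, 1)]
print(rcb(points, 4))
```

Unlike a one-dimensional chain partition, this approach re-sorts the points at every level of recursion, which is part of the cost that matters when the partition must be recomputed frequently.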