Examples of distributed vector usage
Vector 0: Simple vector initialization
This example shows several basic functionalities of the distributed vector vector_dist.
The distributed vector is a set of particles in an N-dimensional space.
In this example it is shown how to:
- Initialize the library
- Create a Box that defines the domain
- An array that defines the boundary conditions
- A Ghost object that will define the extension of the ghost part in physical units
The source code of the example Vector/0_simple/main.cpp. The full doxygen documentation Vector_0_simple.
See also our video lectures dedicated to this topic Video 1, Video 2
Example 1: Vector Ghost layer

This example shows the properties of ghost_get and ghost_put, the functions
that synchronize the ghost layers of a distributed vector vector_dist.
In this example it is shown how to:
- Iterate vector_dist via getDomainIterator
- Redistribute the particles in vector_dist according to the underlying domain decomposition via map
- Synchronize the ghost layers in the standard way
- NO_POSITION, KEEP_PROPERTIES and SKIP_LABELLING options of the ghost_get function
- Propagate the data from ghost to non-ghost particles via ghost_put
The source code of the example Vector/1_ghost_get_put/main.cpp. The full doxygen documentation Vector_1_ghost_get.
Example 2: Cell-lists and Verlet-lists
This example shows how to construct and use cell-lists and Verlet-lists,
the fast neighbor lists of a distributed vector vector_dist.
Key points:
- How to utilize the grid iterator getGridIterator to create a grid-like particle domain
- Two principal types of fast neighbor lists: cell-list getCellList and Verlet-list getVerlet for a distributed vector vector_dist
- CELL_MEMFAST, CELL_MEMBAL and CELL_MEMMW variations of the cell-list, with different memory requirements and computation costs
- Iterating through the neighboring particles via getNNIterator of cell-list and Verlet-list
The source code of the example Vector/1_celllist/main.cpp. The full doxygen documentation Vector_1_celllist.
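The idea behind a cell-list can be sketched without the library. The following is a minimal 1D illustration in plain C++, not the OpenFPM implementation: the helper names pairs_with_cell_list and count_pairs_brute_force are hypothetical, the particles are assumed to lie in [0, length), and neither ghost layers nor periodic boundaries are handled.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <utility>
#include <vector>

// Hypothetical helper, not part of OpenFPM: bins particles on [0, length) into
// cells at least r_cut wide and returns all pairs (i, j), i < j, with
// |x_i - x_j| <= r_cut.
std::vector<std::pair<std::size_t, std::size_t>>
pairs_with_cell_list(const std::vector<double>& x, double length, double r_cut)
{
    assert(length > 0.0 && r_cut > 0.0);
    // Number of cells, chosen so that every cell is at least r_cut wide.
    std::size_t n_cells = static_cast<std::size_t>(length / r_cut);
    if (n_cells == 0) n_cells = 1;
    const double cell_size = length / static_cast<double>(n_cells);

    // cells[c] stores the indices of the particles located in cell c.
    std::vector<std::vector<std::size_t>> cells(n_cells);
    for (std::size_t i = 0; i < x.size(); ++i) {
        assert(x[i] >= 0.0 && x[i] < length);
        std::size_t c = static_cast<std::size_t>(x[i] / cell_size);
        if (c >= n_cells) c = n_cells - 1;  // guard against round-off at the upper edge
        cells[c].push_back(i);
    }

    std::vector<std::pair<std::size_t, std::size_t>> pairs;
    for (std::size_t c = 0; c < n_cells; ++c) {
        // Only the cell itself (d = 0) and its right neighbor (d = 1) are
        // visited, so every pair is generated exactly once.
        for (std::size_t d = 0; d <= 1 && c + d < n_cells; ++d) {
            for (std::size_t i : cells[c]) {
                for (std::size_t j : cells[c + d]) {
                    if (d == 0 && j <= i) continue;  // skip self and duplicates
                    if (std::fabs(x[i] - x[j]) <= r_cut)
                        pairs.emplace_back(i < j ? i : j, i < j ? j : i);
                }
            }
        }
    }
    return pairs;
}

// Reference O(N^2) search used to check the cell-list result.
std::size_t count_pairs_brute_force(const std::vector<double>& x, double r_cut)
{
    std::size_t n = 0;
    for (std::size_t i = 0; i < x.size(); ++i)
        for (std::size_t j = i + 1; j < x.size(); ++j)
            if (std::fabs(x[i] - x[j]) <= r_cut) ++n;
    return n;
}
```

Because the cells are at least r_cut wide, two particles closer than r_cut are always in the same cell or in adjacent cells, which is why the search can skip all other cells. A Verlet-list stores the result of such a search per particle so that it can be reused.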
Example 3: GPU vector

This example shows how to create a vector data structure with vector_dist_gpu in order to access a vector_dist-like data structure from GPU-accelerated computing code.
Key points:
- How to convert the source code from using vector_dist to vector_dist_gpu and how it influences the memory layout of the data structure
- Offloading particle position hostToDevicePos and particle property hostToDeviceProp data from CPU to GPU
- Launching a CUDA-like kernel with CUDA_LAUNCH and automatic subdivision of a computation loop into workgroups/threads via getDomainIteratorGPU or manually specifying the number of workgroups and the number of threads in a workgroup
- Passing the data-structures to a CUDA-like kernel code via toKernel
- How to use map with the option RUN_DEVICE to redistribute the particles directly on GPU, and ghost_get with RUN_DEVICE option to fill ghost particles directly on GPU
- How to detect and utilize RDMA on GPU to get the support of CUDA-aware MPI implementation to work directly with device pointers in communication subroutines
The source code of the example Vector/1_gpu_first_step/main.cpp. The full doxygen documentation Vector_1_gpu_first_step.
Example 4: HDF5 Save and load
This example shows how to save and load a vector to/from the parallel file format HDF5.
Key points:
- How to save the position/property information of the particles vector_dist into an .hdf5 file via save
- How to load the position/property information of the particles vector_dist from an .hdf5 file via load
The source code of the example Vector/1_HDF5_save_load/main.cpp. The full doxygen documentation Vector_1_HDF5.
Example 5: Vector expressions
This example shows how to use vector expressions to apply mathematical operations and functions on particles.
The example also shows how to create a point-wise applicable function
where $A_q$ is the property $A$ of particle $q$, $x_p, x_q$ are positions of particles $p, q$ correspondingly.
Key points:
- Setting an alias for particle properties via getV of particle_dist to be used within an expression
- Composing expressions with scalar particle properties
- Composing expressions with vector particle properties. The expressions are 1) applied point-wise; 2) used to create a component-wise multiplication via *; 3) scalar product via pmul; 4) compute a norm norm; 5) perform square root operation sqrt
- Converting Point object into an expression getVExpr to be used with vector expressions
- Utilizing operator= and the function assign to assign singular or multiple particle properties per iteration through particles
- Constructing expressions with applyKernel_in and applyKernel_in_gen to create kernel functions called at particle locations for all the neighboring particles, e.g. as in SPH
The source code of the example Vector/2_expressions/main.cpp. The full doxygen documentation Vector_2_expression.
Example 6: Molecular Dynamics with Lennard-Jones potential (Cell-List)
This example shows a simple Lennard-Jones molecular dynamics simulation in a stable regime.
The particles interact with the interaction potential
$A_q$ is the property $A$ of particle $q$, $x_p, x_q$ are positions of particles $p, q$ correspondingly, $\sigma$ is a free parameter, $r$ is the distance between the particles.
Key points:
- Reusing memory allocated with getCellList for the subsequent iterations via updateCellList
- Utilizing CELL_MEMBAL with getCellList to minimize memory footprint
- Performing 10000 time steps using symplectic Verlet integrator
- Producing a time-total energy 2D plot with GoogleChart
The source code of the example Vector/3_molecular_dynamic/main.cpp. The full doxygen documentation Vector_3_md_dyn.
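For orientation, the pair interaction can be written down in a few lines. The sketch below assumes the standard Lennard-Jones form in reduced units, $V(r) = 4\left[(\sigma/r)^{12} - (\sigma/r)^{6}\right]$, i.e. with the energy scale set to 1; this form and the function names lj_energy and lj_force are assumptions made for illustration and are not taken from the example source.

```cpp
#include <cassert>
#include <cmath>

// Lennard-Jones pair energy in reduced units (assumed form, energy scale = 1):
// V(r) = 4 [ (sigma/r)^12 - (sigma/r)^6 ]
double lj_energy(double r, double sigma)
{
    assert(r > 0.0);
    const double s6 = std::pow(sigma / r, 6);
    return 4.0 * (s6 * s6 - s6);
}

// Radial force F(r) = -dV/dr = 24 [ 2 (sigma/r)^12 - (sigma/r)^6 ] / r.
// Positive values are repulsive, negative values attractive.
double lj_force(double r, double sigma)
{
    assert(r > 0.0);
    const double s6 = std::pow(sigma / r, 6);
    return 24.0 * (2.0 * s6 * s6 - s6) / r;
}
```

With this form the energy crosses zero at $r = \sigma$ and has its minimum of $-1$ at $r = 2^{1/6}\sigma$, where the force vanishes. In a simulation the two functions would be evaluated only for neighbors within the cut-off radius returned by the cell-list.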
Example 7: Molecular Dynamics with Lennard-Jones potential (Verlet-List) [1/3]
The physical model in the example is identical to Molecular Dynamics with Lennard-Jones potential (Cell-List). Please refer to it for further details.
Key points:
- Due to the computational cost of updating the Verlet-list, the cutoff distance r_cut + skin is used, such that the Verlet-list has to be updated only once in 10 iterations via updateVerlet
- As Verlet-lists are constructed based on local particle ids, which would be invalidated by map or ghost_get, map is called every 10 time steps, and ghost_get is used with the SKIP_LABELLING option to keep the old indices in every iteration
The source code of the example Vector/3_molecular_dynamic/main_vl.cpp. The full doxygen documentation Vector_3_md_vl.
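Why a skin allows the list to be reused can be illustrated with a small standalone sketch. The example above rebuilds the list at a fixed interval of 10 iterations; the code below instead uses the displacement criterion (rebuild once some particle has moved more than half the skin), which is the condition under which a list built with r_cut + skin is guaranteed to still contain every pair within r_cut. All function names are hypothetical, the setting is 1D, and the list is built by brute force for brevity.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical helper: for every particle i, lists all j != i with
// |x_i - x_j| <= radius (1D, brute force).
std::vector<std::vector<std::size_t>>
build_verlet_list(const std::vector<double>& x, double radius)
{
    std::vector<std::vector<std::size_t>> list(x.size());
    for (std::size_t i = 0; i < x.size(); ++i)
        for (std::size_t j = 0; j < x.size(); ++j)
            if (i != j && std::fabs(x[i] - x[j]) <= radius)
                list[i].push_back(j);
    return list;
}

// Largest distance travelled by any particle since the list was built.
double max_displacement(const std::vector<double>& x_at_build,
                        const std::vector<double>& x_now)
{
    assert(x_at_build.size() == x_now.size());
    double d_max = 0.0;
    for (std::size_t i = 0; i < x_now.size(); ++i) {
        const double d = std::fabs(x_now[i] - x_at_build[i]);
        if (d > d_max) d_max = d;
    }
    return d_max;
}

// A list built with r_cut + skin stays complete as long as no particle has
// moved more than skin / 2: two particles can then approach each other by at
// most skin, so a pair outside r_cut + skin cannot get inside r_cut.
bool verlet_list_is_valid(double max_disp, double skin)
{
    return max_disp <= 0.5 * skin;
}
```

A larger skin means fewer rebuilds but longer neighbor lists, so the value is a trade-off that depends on how fast the particles move.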
Example 7: Molecular Dynamics with Lennard-Jones potential (Symmetric Verlet-List) [2/3]
This example is an extension to Molecular Dynamics with Lennard-Jones potential (Verlet-List). It shows how better performance can be achieved for symmetric interaction models with symmetric Verlet-list compared to the standard Verlet-list.
Key points:
- Computing the interaction for particles p, q only once
- Propagate the data from potentially ghost particles q to non-ghost particles in their corresponding domains via ghost_put with the operation add_
- Changing the prefactor in the subroutine of calculating the total energy as every pair of particles is visited once (as compared to two times before)
- Updating Verlet-list once in 10 iterations via updateVerlet with 'VL_SYMMETRIC' flag
The source code of the example Vector/5_molecular_dynamic_sym/main.cpp. The full doxygen documentation Vector_5_md_vl_sym.
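The change of the energy prefactor and the two-sided force update can be checked on a toy system. The sketch below is plain C++ in 1D with a stand-in pair energy $1/r$; the function names are hypothetical and particle positions are assumed to be distinct. In the distributed setting the update of particle q may hit a ghost particle, which is the contribution that ghost_put with add_ sends back to the owning processor.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Any symmetric pair energy works here; 1/r is an arbitrary stand-in.
double pair_energy(double r)
{
    assert(r > 0.0);
    return 1.0 / r;
}

// Full neighbor loop: every pair (p, q) is visited twice, so the sum is halved.
double total_energy_full(const std::vector<double>& x)
{
    double e = 0.0;
    for (std::size_t p = 0; p < x.size(); ++p)
        for (std::size_t q = 0; q < x.size(); ++q)
            if (p != q) e += pair_energy(std::fabs(x[p] - x[q]));
    return 0.5 * e;
}

// Symmetric loop: every pair is visited once (q > p), so the prefactor is 1.
double total_energy_symmetric(const std::vector<double>& x)
{
    double e = 0.0;
    for (std::size_t p = 0; p < x.size(); ++p)
        for (std::size_t q = p + 1; q < x.size(); ++q)
            e += pair_energy(std::fabs(x[p] - x[q]));
    return e;
}

// Symmetric force accumulation: one evaluation per pair updates both particles.
std::vector<double> forces_symmetric(const std::vector<double>& x)
{
    std::vector<double> f(x.size(), 0.0);
    for (std::size_t p = 0; p < x.size(); ++p) {
        for (std::size_t q = p + 1; q < x.size(); ++q) {
            const double dx = x[p] - x[q];
            const double r = std::fabs(dx);
            assert(r > 0.0);
            const double f_pq = dx / (r * r * r);  // force on p due to q for V = 1/r
            f[p] += f_pq;  // contribution to the particle that owns the loop
            f[q] -= f_pq;  // equal and opposite contribution to the neighbor
        }
    }
    return f;
}
```

Both energy functions return the same value, and the forces produced by the symmetric loop sum to zero, as expected for pairwise equal and opposite contributions.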
Example 7: Molecular Dynamics with Lennard-Jones potential (Symmetric CRS Verlet-List) [3/3]
This example is an extension to Molecular Dynamics with Lennard-Jones potential (Verlet-List) and Molecular Dynamics with Lennard-Jones potential (Symmetric Verlet-List). It shows how better performance can be achieved for symmetric interaction models with symmetric Verlet-list compared to the standard Verlet-list.
Key points:
- Computing the interaction for particles p, q only once
- Propagate the data from potentially ghost particles q to non-ghost particles in their corresponding domains via ghost_put with the operation add_
- Changing the prefactor in the subroutine of calculating the total energy as every pair of particles is visited once (as compared to two times before)
- Updating Verlet-list once in 10 iterations via updateVerlet with 'VL_SYMMETRIC' flag
The source code of the example Vector/5_molecular_dynamic_sym/main.cpp. The full doxygen documentation Vector_5_md_vl_sym.
Example 8: Molecular Dynamics with Lennard-Jones potential (GPU)
The physical model in the example is identical to Molecular Dynamics with Lennard-Jones potential (Cell-List) and Molecular Dynamics with Lennard-Jones potential (Verlet-List). Please refer to those for further details.
Key points:
- To get the particle index inside a CUDA-like kernel, the GET_PARTICLE macro is used to avoid overflow in the construction blockIdx.x * blockDim.x + threadIdx.x
- A primitive reduction function reduce_local with the operation _add_ is used to get the total energy by summing energies of all particles.
The source code of the example Vector/3_molecular_dynamic_gpu/main_vl.cpp. The full doxygen documentation Vector_3_md_dyn_gpu.
Example 9: Molecular Dynamics with Lennard-Jones potential (GPU optimized)
The physical model in the example is identical to Molecular Dynamics with Lennard-Jones potential (Cell-List), Molecular Dynamics with Lennard-Jones potential (Verlet-List) and is based on Molecular Dynamics with Lennard-Jones potential (GPU). Please refer to those for further details.
Key points:
- To achieve coalesced memory access on GPU and to reduce cache load the particle indices are stored in cell-list in a sorted manner, i.e. particles with neighboring indices are located in the same cell. This is achieved by assigning new particle indices and storing them temporarily in vector_dist by passing the parameter CL_GPU_REORDER to the method getCellListGPU of vector_dist. By default the method copies particle positions and no properties to the reordered vector. To copy properties as well they are passed as a template parameter <...> of the method getCellListGPU.
- The cell-list built on top of the reordered version of vector_dist uses get_sort instead of get to get a neighbor particle index when iterating with the cell-list neighborhood iterator getNNIteratorBox
- The sorted version of vector_dist has to be reordered to the original order once the processing is done via restoreOrder of vector_dist. By default the method copies particle positions and no properties to the original unordered vector. To copy properties as well they are passed as a template parameter <...> of the method restoreOrder.
The source code of the example Vector/3_molecular_dynamic_gpu_opt/main_vl.cpp. The full doxygen documentation Vector_3_md_dyn_gpu_opt.
Example 10: Molecular Dynamics with Lennard-Jones potential (Particle reordering)
The physical model in the example is identical to Molecular Dynamics with Lennard-Jones potential (Cell-List), Molecular Dynamics with Lennard-Jones potential (Verlet-List). The example shows how reordering the data can significantly reduce the computational running time.
Key points:
- The particles inside vector_dist are reordered via reorder following a Hilbert curve of order m (here m=5) passing through the cells of $2^m \times 2^m \times 2^m$ (here, in 3D) cell-list
- It is shown that the frequency of reordering depends on the mobility of particles
- The wall-clock time of the function calc_force is measured with the object timer via start and stop
The source code of the example Vector/4_reorder/main_data_ord.cpp. The full doxygen documentation Vector_4_reo.
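The effect of a Hilbert-curve ordering is easiest to see in 2D. The sketch below uses the well-known iterative conversion from cell coordinates to the index along the curve and then sorts particles by the index of their cell. It is an illustration only: the example itself works in 3D, and the names hilbert_index and hilbert_order are hypothetical, not OpenFPM functions.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <numeric>
#include <utility>
#include <vector>

// Index of cell (x, y) along a 2D Hilbert curve covering an n x n grid,
// where n is a power of two (n = 2^m for a curve of order m).
unsigned hilbert_index(unsigned n, unsigned x, unsigned y)
{
    assert(n > 0 && (n & (n - 1)) == 0 && x < n && y < n);
    unsigned d = 0;
    for (unsigned s = n / 2; s > 0; s /= 2) {
        const unsigned rx = (x & s) ? 1u : 0u;
        const unsigned ry = (y & s) ? 1u : 0u;
        d += s * s * ((3u * rx) ^ ry);
        // Rotate/flip the quadrant so that the sub-curve has the standard orientation.
        if (ry == 0) {
            if (rx == 1) {
                x = n - 1 - x;
                y = n - 1 - y;
            }
            std::swap(x, y);
        }
    }
    return d;
}

// Returns the permutation that sorts particles by the Hilbert index of the
// cell they are located in; cells[i] is the (x, y) cell of particle i.
std::vector<std::size_t>
hilbert_order(const std::vector<std::pair<unsigned, unsigned>>& cells, unsigned n)
{
    std::vector<std::size_t> perm(cells.size());
    std::iota(perm.begin(), perm.end(), std::size_t{0});
    std::stable_sort(perm.begin(), perm.end(), [&](std::size_t a, std::size_t b) {
        return hilbert_index(n, cells[a].first, cells[a].second) <
               hilbert_index(n, cells[b].first, cells[b].second);
    });
    return perm;
}
```

Consecutive indices along the curve always belong to adjacent cells, so particles that are close in memory after the reordering are also close in space, which improves cache reuse during the neighbor search.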
Example 11: Molecular Dynamics with Lennard-Jones potential (Cell-list reordering)
The physical model in the example is identical to Molecular Dynamics with Lennard-Jones potential (Cell-List), Molecular Dynamics with Lennard-Jones potential (Verlet-List). The example shows how reordering the data can significantly reduce the computational running time.
Key points:
- The cell-list cells are iterated following a Hilbert curve instead of a normal left-to-right bottom-to-top cell iteration (in 2D). The function getCellList_hilb of vector_dist is used instead of getCellList
- It is shown that for static or slowly moving particles a speedup of up to 10% could be achieved
The source code of the example Vector/4_reorder/main_comp_ord.cpp. The full doxygen documentation Vector_4_comp_reo.
Example 12: Complex properties [1/2]
This example shows how to use complex properties in the distributed vector vector_dist.
Key points:
- Creating a distributed vector with particle properties: scalar, vector float[3], Point, list of float openfpm::vector<float>, list of custom structures openfpm::vector<A> (where A is a user-defined type with no pointers), vector of vectors openfpm::vector<openfpm::vector<float>>
- Redistribute the particles in vector_dist according to the underlying domain decomposition. Communicate only the selected particle properties via map_list (instead of communicating all of them via map)
- Synchronize the ghost layers only for the selected particle properties via ghost_get
The source code of the example Vector/4_complex_prop/main.cpp. The full doxygen documentation Vector_4_complex_prop.
Example 13: Complex properties [2/2]
This example shows how to use complex properties in the distributed vector vector_dist.
Key points:
- Creating a distributed vector with particle properties: scalar, vector float[3], Point, list of float openfpm::vector<float>, list of custom structures openfpm::vector<A> (where A is a user-defined type with memory pointers inside), vector of vectors openfpm::vector<openfpm::vector<float>>
- Making the user-defined type serializable by vector_dist via:
  - packRequest method to indicate how many bytes are needed to serialize the structure
  - pack method to serialize the data-structure via methods allocate, getPointer of ExtPreAlloc and method pack of Packer
  - unpack method to deserialize the data-structure via method getPointerOffset of ExtPreAlloc and method unpack of Unpacker
  - noPointers method to inform the serialization system that the object has pointers
- Defining a constructor, destructor and operator= to avoid memory leaks
The source code of the example Vector/4_complex_prop/main.cpp. The full doxygen documentation Vector_4_complex_prop_ser.
Example 14: Multiphase Cell-lists and Verlet-lists
This example is an extension to Example 2: Cell-lists and Verlet-lists. It shows how to use multi-phase cell-lists and Verlet-lists using multiple instances of vector_dist.
Key points:
- All the phases have to use the same domain decomposition, which is achieved by passing the decomposition of the first phase to the constructor of vector_dist of all the other phases.
- The domains have to be iterated individually via getDomainIterator, the particles redistributed via map, the ghost layers synchronized via ghost_get for all the phases vector_dist.
- Constructing Verlet-lists for two phases (ph0, ph1) with createVerlet, where for one phase ph0 the neighboring particles of ph1 are assigned in the Verlet-list. Cell-list of ph1 has to be passed to createVerlet
- Constructing Verlet-lists for multiple phases (ph0, ph1, ph2...) with createVerletM, where for one phase ph0 the neighboring particles of ph1, ph2... are assigned in the Verlet-list. Cell-list containing all of ph1, ph2... created with createCellListM has to be passed to createVerletM
- Iterating over the neighboring particles of a multiphase Verlet-list with getNNIterator with get being substituted by getP (particle phase) and getV (particle id)
- Extending example of the symmetric interaction for multiphase cell-lists and Verlet-lists via createCellListSymM, createVerletSymM
The source code of the example Vector/4_multiphase_celllist_verlet/main.cpp. The full doxygen documentation Vector_4_mp_cl.
Example 16: Validation and debugging
This example shows how the flexibility of the library can be used to perform complex tasks for validation and debugging.
Key points:
- To get unique global id's of the particles the function accum of vector_dist is used, which returns prefix sum of local domain sizes $j<i$ for the logical processor $i$ out of $N$ total processors
- Propagate the data from potentially ghost particles q to non-ghost particles in their corresponding domains via ghost_put with the operation merge_, that merges two openfpm::vector (ghost and non-ghost)
The source code of the example Vector/6_complex_usage/main.cpp. The full doxygen documentation Vector_6_complex_usage.
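How a prefix sum of local domain sizes yields unique global ids can be shown with a serial sketch. The vector local_sizes stands in for the number of particles owned by each processor; in a parallel run every processor would only know its own entry and obtain its offset through communication. The function names id_offsets and global_id are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Exclusive prefix sum: offsets[i] is the sum of local_sizes[j] for all j < i.
std::vector<std::size_t> id_offsets(const std::vector<std::size_t>& local_sizes)
{
    std::vector<std::size_t> offsets(local_sizes.size(), 0);
    for (std::size_t i = 1; i < local_sizes.size(); ++i)
        offsets[i] = offsets[i - 1] + local_sizes[i - 1];
    return offsets;
}

// Global id of the particle with local index local_id on processor rank.
std::size_t global_id(const std::vector<std::size_t>& offsets,
                      std::size_t rank, std::size_t local_id)
{
    assert(rank < offsets.size());
    return offsets[rank] + local_id;
}
```

For local sizes 3, 5 and 2 the offsets are 0, 3 and 8, so the ids on the three processors cover 0..2, 3..7 and 8..9 without gaps or collisions.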
Example 17: Smoothed Particle Hydrodynamics (SPH) formulation on CPU [1/2]
This example shows the classical SPH dam break simulation with load balancing and dynamic load balancing. The example has been adapted from DualSPHysics. Please refer to the website of DualSPHysics and to the paper of Monaghan, 1992 for more details.

Formulation
The SPH formulation used in this example code follows these equations
with the viscosity term
and the constants defined as
The cubic kernel $W_{ab}$ defined as
its gradient $ \nabla W_{ab} $.
While the particle kernel support is given by
where $dp$ is the particle spacing. Please refer to the work of Monaghan, 1992 for more details on the variables and constants used.
The simulation uses an additional Tensile term to avoid the tensile instability. Please refer to Monaghan, 1999 for more details on this scheme.
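As a point of reference for the kernel mentioned above, the following sketch implements the cubic spline kernel in the form commonly used in the SPH literature, in 3D with $q = r/h$ and a support radius of $2h$. The normalisation $1/(\pi h^3)$, the smoothing length name h and the function names are assumptions made for illustration and may differ from the constants used in the example source.

```cpp
#include <cassert>
#include <cmath>

// Cubic spline kernel in 3D (assumed standard form), q = r / h:
//   W = a_d (1 - 1.5 q^2 + 0.75 q^3)   for 0 <= q < 1
//   W = a_d 0.25 (2 - q)^3             for 1 <= q < 2
//   W = 0                              otherwise
double cubic_kernel(double r, double h)
{
    assert(r >= 0.0 && h > 0.0);
    const double pi = std::acos(-1.0);
    const double a_d = 1.0 / (pi * h * h * h);  // 3D normalisation
    const double q = r / h;
    if (q < 1.0) return a_d * (1.0 - 1.5 * q * q + 0.75 * q * q * q);
    if (q < 2.0) {
        const double t = 2.0 - q;
        return a_d * 0.25 * t * t * t;
    }
    return 0.0;
}

// Radial derivative dW/dr. The gradient with respect to x_a is
// (dW/dr) * (x_a - x_b) / r for r > 0.
double cubic_kernel_derivative(double r, double h)
{
    assert(r >= 0.0 && h > 0.0);
    const double pi = std::acos(-1.0);
    const double c = 1.0 / (pi * h * h * h * h);  // a_d / h from the chain rule
    const double q = r / h;
    if (q < 1.0) return c * (-3.0 * q + 2.25 * q * q);
    if (q < 2.0) {
        const double t = 2.0 - q;
        return -c * 0.75 * t * t;
    }
    return 0.0;
}
```

The two branches agree in value and slope at $q = 1$ and both vanish at $q = 2$, so only neighbors within $2h$ contribute to the sums, which is what makes a cell-list with that cut-off sufficient.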
Time-stepping
Dynamic time stepping is calculated in accordance with Monaghan, 1992
where
The governing equations are written as
The Verlet time-stepping scheme Verlet, 1967 is used
Due to the integration over a staggered time interval, the equations of density and velocity are decoupled, which may lead to divergence of the integrated values. See DualSPHysics formulation.
Load Balancing

In order to reach an optimal utilization of the available computational resources we distribute the particles to reach a balanced simulation. To do this we set weights for each sub-sub-domain, decompose the space and distribute the particles accordingly.
The weights are set according to:
where $N_{fluid}$ is the number of fluid particles in a sub-sub-domain and $ N_{boundary} $ is the number of boundary particles.
Implicitly the communication cost is given by $ \frac{V_{ghost}}{V_{sub-sub}} t_s $, while the migration cost is given by $ v_{sub-sub} $. In general $ t_s $ is the number of ghost_get calls between two rebalance calls.
Dynamic load balancing. Theory 1
Dynamic load balancing. Theory 2
Dynamic load balancing. Practice 1
Dynamic load balancing. Practice 2
Simulation results
Simulation video 1
Simulation video 2
Simulation dynamic load balancing video 1
Simulation dynamic load balancing video 2
Simulation contour perspective 1
Simulation contour perspective 2
Simulation contour perspective 3
Key points:
- Load balancing and dynamic load balancing indicate the possibility of the system to re-adapt the domain decomposition to keep all the processors under load and reduce idle time
- Cell-list is used to iterate neighboring particles when computing derivatives
- Domain decomposition could use a user-provided cost function on sub-sub-domains later for them to be assigned to sub-domains (usually equal to the number of processors) via addComputationCosts of vector_dist
- The object DEC_GRAN(512) passed to the constructor of vector_dist is related to the Load-Balancing decomposition granularity. It indicates that the space must be decomposed in at least $ N_{subsub} $ sub-sub-domains for $ N_p $ processors
- Method DrawBox of the class DrawParticles returns an iterator that can be used to create particles on a Cartesian grid with a given spacing (grid boundaries should be inside the simulation domain).
- After filling the computational cost the domain stored in vector_dist is decomposed via getDecomposition().decompose() (i.e. every sub-sub-domain is assigned to a processor) and subsequently the particles are redistributed to the corresponding processors via map.
The source code of the example Vector/7_SPH_dlb/main.cpp. The full doxygen documentation Vector_7_sph_dlb.
Example 17: Smoothed Particle Hydrodynamics (SPH) formulation on CPU: optimized [2/2]
The physical model in the example is identical to Example 17: Smoothed Particle Hydrodynamics (SPH) formulation on CPU.
Key points:
- Verlet-list is used instead of Cell list to iterate neighboring particles when computing derivatives. The Verlet-list is reconstructed on maximum particle displacement reaching the half skin size. Symmetric interaction reduces the computation complexity by half. Ghost particles are used to store symmetric interaction force and density increments. The increments are added to the corresponding non-ghost particles via ghost_put
- vector_dist is constructed with the option BIND_DEC_TO_GHOST. It binds the domain decomposition to be multiple of the ghost size required by the symmetric interaction
- Refine the domain decomposition instead of decomposing the domain from scratch via getDecomposition().redecompose(...) of vector_dist. Available only for ParMetis decomposition.
The source code of the example Vector/7_SPH_dlb_opt/main.cpp. The full doxygen documentation Vector_7_sph_dlb_opt.
Example 18: Smoothed Particle Hydrodynamics (SPH) formulation on GPU [1/3]
The physical model in the example is identical to Example 17: Smoothed Particle Hydrodynamics (SPH) formulation on CPU with the computation-heavy subroutines being executed on GPU.
Simulation results
Simulation video 1
Simulation video 2
Simulation video 3
Key points:
- Derivative approximation scheme (SPH), particle force calculation, time integration schemes (Euler, Verlet time integration) and pressure sensor readings implemented on GPU.
- A primitive reduction function reduce_local with the operation _add_ is used to get the total energy by summing energies of all particles.
- Particles exceeding the domain boundaries are removed with the GPU subroutine remove_marked<prp>, where prp is the property of vector_dist set to 1 for particles to be removed, and to 0 otherwise.
The source code of the example Vector/7_SPH_dlb_gpu/main.cu. The full doxygen documentation Vector_7_sph_dlb_gpu.
Example 18: Smoothed Particle Hydrodynamics (SPH) formulation on GPU: optimized [2/3]
The physical model in the example is identical to Example 18: Smoothed Particle Hydrodynamics (SPH) formulation on GPU with the computation-heavy subroutines being executed on GPU optimized for improved coalesced memory access.
Key points:
- To achieve coalesced memory access on GPU and to reduce cache load the particle indices are stored in cell-list in a sorted manner, i.e. particles with neighboring indices are located in the same cell. This is achieved by assigning new particle indices and storing them temporarily in vector_dist by passing the parameter CL_GPU_REORDER to the method getCellListGPU of vector_dist. By default the method copies particle positions and no properties to the reordered vector. To copy properties as well they are passed as a template parameter <...> of the method getCellListGPU.
- The cell-list built on top of the reordered version of vector_dist uses get_sort instead of get to get a neighbor particle index when iterating with the cell-list neighborhood iterator getNNIteratorBox
- The sorted version of vector_dist has to be reordered to the original order once the processing is done via restoreOrder of vector_dist. By default the method copies particle positions and no properties to the original unordered vector. To copy properties as well they are passed as a template parameter <...> of the method restoreOrder.
The source code of the example Vector/7_SPH_dlb_gpu_opt/main.cu. The full doxygen documentation Vector_7_sph_dlb_gpu_opt.
Example 18: Smoothed Particle Hydrodynamics (SPH) formulation on GPU: optimized [3/3]
The physical model in the example is identical to Example 18: Smoothed Particle Hydrodynamics (SPH) formulation on GPU: optimized with the computation-heavy subroutines being executed on GPU optimized for improved coalesced memory access and particle force calculation performed in 2 steps.
Key points:
- The subroutine get_indexes_by_type is used to split the particles into 2 lists of fluid and boundary particle ids. Two sets of GPU kernels are devised to calculate forces and density change separately for these two types of particles.
The source code of the example Vector/7_SPH_dlb_gpu_more_opt/main.cu. The full doxygen documentation Vector_7_sph_dlb_gpu_opt.
Example 19: Discrete Element Method (DEM) simulation of the avalanche down the inclined plane
This example implements a Discrete Element Method (DEM) simulation using the Lorentz-force contact model.

A classical model for DEM simulations of spherical granular flows is the Silbert model, which includes a Hertzian contact force and an elastic deformation of the grains. Each particle has a radius $R$, mass $m$, polar moment of inertia $I$ and is represented by the location of its center of mass $r_{i}$.
When two particles $i$ and $j$ collide or are in contact, the elastic contact deformation is given by:
where $\vec{r_{ij}}$ is the distance vector connecting particle centers and $r_{ij} = {\lvert \vec{r}_{ij}\rvert}_2$ its magnitude. The normal and tangential components of the relative velocity at the point of contact are given by
where $\vec{n_{ij}}=\vec{r_{ij}}/r_{ij}$ is the normal unit vector in the direction of the distance vector, $\vec{\omega_i}$ is the angular velocity of a particle and $\vec{v_{ij}}=\vec{v_i}-\vec{v_j}$ the relative velocity between the two particles. The evolution of the elastic tangential displacement $\vec{u_{t_{ij}}}$ is integrated while two particles are in contact using:
where $\delta t$ is the time step size. The deformation of the contact points is stored for each particle, and for each new contact point the elastic tangential displacement is initialized with $\vec{u_{t_{ij}}} = 0$. Thus, for each pair of interacting particles the normal and tangential forces become:
where $k_{n,t}$ are the elastic constants in normal and tangential direction, respectively, and $\gamma_{n,t}$ the corresponding viscoelastic constants. The effective collision mass is given by $m_{\text{eff}}=\frac{m}{2}$. For each contact point in order to enforce Coulomb's law
the tangential force is bounded by the normal component force. In particular the elastic tangential displacement $\vec{u_{t_{ij}}}$ is adjusted with
This adjustment induces a truncation of the elastic displacement. The Coulomb condition is equivalent to the case where two spheres slip against each other without inducing additional deformations. Thus the deformation is truncated using:
Considering that each particle $i$ interacts with all the particles $j$ it is in touch with, the total resultant force on particle $i$ is then computed by summing the contributions of all particle pairs $(i,j)$. Considering that the grains are also under the effect of the gravitational field we obtain that the total force is given by
where $\vec{g}$ is the acceleration due to gravity. Because particles also have rotational degrees of freedom, the total torque on particle $i$ is calculated using
To determine the positions $\vec{r}_i$ and angular velocities $\vec{\omega}_i$ for each particle $i$ at time step $n+1$, we integrate the equations in time using a leap-frog scheme with time step given by
where $\vec{r}_i^{n},\vec{v}_i^{n},\vec{\omega}_i^{n}$ denote respectively the position, the speed and the rotational speed of the particle $i$ at time step $n$, and $\delta t$ the time step size.
Simulation results
Key points:
- Method DrawBox of the class DrawParticles returns an iterator that can be used to create particles on a Cartesian grid with a given spacing (grid boundaries should be inside the simulation domain).
- Domain decomposition uses a quadratic cost function assigned to sub-domains as a function of the number of sub-sub-domains via addComputationCosts of vector_dist
- Refine the domain decomposition instead of decomposing the domain from scratch via getDecomposition().redecompose(...) of vector_dist. Available only for ParMetis decomposition.
- Iterating through the neighboring particles via getNNIterator of Verlet-list.
- The method updateVerlet of vector_dist is used to update an existing Verlet-list after particles have changed their positions.
The source code of the example Vector/8_DEM/main.cpp. The full doxygen documentation Vector_8_DEM.
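The Coulomb bound on the tangential force described in the model can be sketched as a small standalone function. The friction coefficient mu is an assumed symbol, since the bound itself is not written out in the text, and the names Vec3, vec_norm and apply_coulomb_limit are hypothetical. The sketch only rescales the force; the matching truncation of the stored tangential displacement is left out.

```cpp
#include <array>
#include <cassert>
#include <cmath>

using Vec3 = std::array<double, 3>;

// Euclidean norm of a 3D vector.
double vec_norm(const Vec3& v)
{
    return std::sqrt(v[0] * v[0] + v[1] * v[1] + v[2] * v[2]);
}

// Enforces |F_t| <= mu * |F_n| (Coulomb criterion, mu is an assumed symbol).
// A force inside the bound is returned unchanged (sticking contact); a force
// outside the bound keeps its direction and is scaled down to the bound
// (sliding contact).
Vec3 apply_coulomb_limit(const Vec3& f_t, const Vec3& f_n, double mu)
{
    assert(mu >= 0.0);
    const double ft = vec_norm(f_t);
    const double limit = mu * vec_norm(f_n);
    if (ft <= limit || ft == 0.0) return f_t;
    const double scale = limit / ft;
    return Vec3{{f_t[0] * scale, f_t[1] * scale, f_t[2] * scale}};
}
```

For a tangential force of magnitude 5, a normal force of magnitude 10 and mu = 0.2, the result has magnitude 2 and the same direction as the input.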
Example 20: GPU CUDA interoperability
This example shows how to access and operate data arrays in GPU kernels via memory pointers obtained from distributed data-structures.
Key points:
- The concept of coalesced memory access is shown for scalar property, vector and tensor properties.
- The memory reallocation process and the concept of memory alignment are explained for the case of extending a vector.
- The method getDeviceBuffer<...>() of a serial property vector vector returned by getPropVector of parallel vector vector_dist is used to obtain an internal device pointer for the given property.
The source code of the example Vector/9_gpu_cuda_interop/main.cu. The full doxygen documentation 9_gpu_cuda_interop.