OpenFPM_pdata  3.0.0
Project that contains the implementation of distributed structures
cub::GridEvenShare< OffsetT > Struct Template Reference

GridEvenShare is a descriptor utility for distributing input among CUDA thread blocks in an "even-share" fashion. Each thread block gets roughly the same number of input tiles.

Detailed Description

template<typename OffsetT>
struct cub::GridEvenShare< OffsetT >

GridEvenShare is a descriptor utility for distributing input among CUDA thread blocks in an "even-share" fashion. Each thread block gets roughly the same number of input tiles.

Overview
Each thread block is assigned a consecutive sequence of input tiles. To help preserve alignment and eliminate the overhead of guarded loads for all but the last thread block, GridEvenShare assigns one of three different amounts of work to a given thread block: "big", "normal", or "last". The "big" workloads are one scheduling grain larger than "normal". The "last" work unit for the last thread block may be partially full if the input is not an even multiple of the scheduling grain size.
Before invoking a child grid, a parent thread will typically construct an instance of GridEvenShare. The instance can be passed to child thread blocks, which can initialize their per-thread-block offsets using BlockInit().
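The three workload sizes described above can be illustrated with a small host-side model. This is a sketch in plain C++ under stated assumptions, not the code in grid_even_share.cuh: the function name even_share_ranges is hypothetical, one scheduling grain is assumed to equal one tile, and the "big" shares are assumed to go to the lowest-numbered thread blocks.

```cpp
#include <algorithm>
#include <cstdint>
#include <utility>
#include <vector>

// Hypothetical host-side model of an even-share split; not CUB's code.
// Returns one [begin, end) item range per thread block.
std::vector<std::pair<std::int64_t, std::int64_t>>
even_share_ranges(std::int64_t num_items, int max_grid_size, int tile_items)
{
    std::vector<std::pair<std::int64_t, std::int64_t>> ranges;
    if (num_items <= 0 || max_grid_size <= 0 || tile_items <= 0)
        return ranges;  // nothing to distribute

    // Assumption: one tile is one scheduling grain; the final tile may be partial.
    const std::int64_t total_tiles = (num_items + tile_items - 1) / tile_items;
    const std::int64_t grid_size   = std::min<std::int64_t>(total_tiles, max_grid_size);

    const std::int64_t normal_tiles = total_tiles / grid_size;
    const std::int64_t big_shares   = total_tiles % grid_size;  // blocks with one extra tile

    const std::int64_t normal_items = normal_tiles * tile_items;
    const std::int64_t big_items    = normal_items + tile_items;

    std::int64_t begin = 0;
    for (std::int64_t block = 0; block < grid_size; ++block) {
        const std::int64_t share = (block < big_shares) ? big_items : normal_items;
        // The clamp only takes effect for the last block, whose final tile may be partial.
        const std::int64_t end = std::min(num_items, begin + share);
        ranges.emplace_back(begin, end);
        begin = end;
    }
    return ranges;
}
```

For example, even_share_ranges(1000, 3, 128) yields [0, 384), [384, 768) and [768, 1000): two "big" shares of three tiles each and a "last" share whose final tile is partial.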

Definition at line 74 of file grid_even_share.cuh.

Public Member Functions

__host__ __device__ __forceinline__ GridEvenShare ()
 Constructor.
 
__host__ __device__ __forceinline__ void DispatchInit (OffsetT num_items, int max_grid_size, int tile_items)
 Dispatch initializer. To be called prior to kernel launch.
 
template<int TILE_ITEMS>
__device__ __forceinline__ void BlockInit (int block_id, Int2Type< GRID_MAPPING_RAKE >)
 Initializes ranges for the specified thread block index. Specialized for a "raking" access pattern in which each thread block is assigned a consecutive sequence of input tiles.
 
template<int TILE_ITEMS>
__device__ __forceinline__ void BlockInit (int block_id, Int2Type< GRID_MAPPING_STRIP_MINE >)
 Block-initialization, specialized for a "strip mining" access pattern in which the input tiles assigned to each thread block are separated by a stride equal to the extent of the grid.
 
template<int TILE_ITEMS, GridMappingStrategy STRATEGY>
__device__ __forceinline__ void BlockInit ()
 Block-initialization for the calling thread block, using the access pattern selected by STRATEGY ("raking" or "strip mining").
 
template<int TILE_ITEMS>
__device__ __forceinline__ void BlockInit (OffsetT block_offset, OffsetT block_end)
 Block-initialization, specialized for a "raking" access pattern in which each thread block is assigned a consecutive sequence of input tiles.
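To make the difference between the two access patterns concrete, the sketch below lists the tile start offsets a single thread block would visit under each mapping. It is a host-side C++ model, not the library code: Mapping and tile_offsets are hypothetical names, the raking split assumes the extra tiles go to the lowest-numbered blocks, and the strip-mining stride is assumed to be grid_size * tile_items (the "extent of the grid" in items).

```cpp
#include <algorithm>
#include <cstdint>
#include <vector>

// Hypothetical model: the tile start offsets one block visits under each mapping.
enum class Mapping { Rake, StripMine };

std::vector<std::int64_t> tile_offsets(Mapping mapping, int block_id, int grid_size,
                                       std::int64_t num_items, int tile_items)
{
    const std::int64_t total_tiles = (num_items + tile_items - 1) / tile_items;
    std::int64_t offset = 0, end = 0, stride = 0;

    if (mapping == Mapping::Rake) {
        // Consecutive tiles: split the tile count as evenly as possible.
        const std::int64_t base  = total_tiles / grid_size;
        const std::int64_t extra = total_tiles % grid_size;  // first `extra` blocks get +1 tile
        const std::int64_t first_tile =
            block_id * base + std::min<std::int64_t>(block_id, extra);
        const std::int64_t tiles = base + (block_id < extra ? 1 : 0);
        offset = first_tile * tile_items;
        end    = std::min(num_items, offset + tiles * tile_items);
        stride = tile_items;
    } else {
        // Strip mining: start at this block's first tile, then hop over the whole grid.
        offset = static_cast<std::int64_t>(block_id) * tile_items;
        end    = num_items;
        stride = static_cast<std::int64_t>(grid_size) * tile_items;
    }

    std::vector<std::int64_t> offsets;
    for (std::int64_t o = offset; o < end; o += stride)
        offsets.push_back(o);
    return offsets;
}
```

With 1000 items, 128-item tiles and 3 thread blocks, block 0 visits tiles starting at 0, 128 and 256 when raking, and at 0, 384 and 768 when strip mining.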
 

Data Fields

OffsetT num_items
 Total number of input items.
 
int grid_size
 Grid size in thread blocks.
 
OffsetT block_offset
 Offset into input marking the beginning of the owning thread block's segment of input tiles.
 
OffsetT block_end
 Offset into input marking the end (one-past) of the owning thread block's segment of input tiles.
 
OffsetT block_stride
 Stride between input tiles.
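The three range fields are enough to drive a tile loop. The following host-side C++ sketch shows one plausible way a thread block could walk its tiles and guard only a tile that is cut short by block_end. BlockRange and consume_tiles are hypothetical names that merely mirror the public fields listed above; real device code would process each tile cooperatively across threads rather than in a serial inner loop.

```cpp
#include <algorithm>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical stand-in for the three public range fields of the descriptor.
struct BlockRange {
    std::int64_t block_offset;  // first item owned by this block
    std::int64_t block_end;     // one past the last item owned by this block
    std::int64_t block_stride;  // distance between the starts of successive tiles
};

// Sums the items one block would consume, tile by tile.
long long consume_tiles(const std::vector<int>& input, const BlockRange& r, int tile_items)
{
    long long sum = 0;
    for (std::int64_t tile = r.block_offset; tile < r.block_end; tile += r.block_stride) {
        // A tile is full unless block_end cuts it short; only then does the bound matter.
        const std::int64_t tile_end = std::min(tile + tile_items, r.block_end);
        for (std::int64_t i = tile; i < tile_end; ++i)
            sum += input[static_cast<std::size_t>(i)];
    }
    return sum;
}
```

A raking block uses block_stride equal to the tile size, so the loop covers a contiguous range; a strip-mining block uses a larger stride and skips the tiles owned by other blocks.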
 

Private Attributes

OffsetT total_tiles
 
int big_shares
 
OffsetT big_share_items
 
OffsetT normal_share_items
 
OffsetT normal_base_offset
 

Member Function Documentation

◆ BlockInit()

template<typename OffsetT>
template<int TILE_ITEMS>
__device__ __forceinline__ void cub::GridEvenShare< OffsetT >::BlockInit (OffsetT block_offset, OffsetT block_end)
inline

Block-initialization, specialized for a "raking" access pattern in which each thread block is assigned a consecutive sequence of input tiles.

Parameters
[in]  block_offset  Thread block begin offset (inclusive)
[in]  block_end     Thread block end offset (exclusive)

Definition at line 203 of file grid_even_share.cuh.

◆ DispatchInit()

template<typename OffsetT>
__host__ __device__ __forceinline__ void cub::GridEvenShare< OffsetT >::DispatchInit (OffsetT num_items, int max_grid_size, int tile_items)
inline

Dispatch initializer. To be called prior to kernel launch.

Parameters
num_items      Total number of input items
max_grid_size  Maximum grid size allowable (actual grid size may be less if not warranted by the number of input items)
tile_items     Number of data items per input tile
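The note on max_grid_size implies that the grid is capped by the number of tiles in the input. A minimal sketch of that relationship follows, assuming the cap is simply the tile count rounded up. The helper effective_grid_size is hypothetical and is not a member of GridEvenShare; returning 0 for empty input is a choice made for this sketch only.

```cpp
#include <algorithm>
#include <cstdint>

// Hypothetical helper: how many thread blocks are worth launching.
int effective_grid_size(std::int64_t num_items, int max_grid_size, int tile_items)
{
    if (num_items <= 0 || max_grid_size <= 0 || tile_items <= 0)
        return 0;  // sketch-only convention for empty or invalid input

    // Number of tiles, counting a trailing partial tile as one tile.
    const std::int64_t total_tiles = (num_items + tile_items - 1) / tile_items;

    // Never launch more blocks than there are tiles to hand out.
    return static_cast<int>(std::min<std::int64_t>(total_tiles, max_grid_size));
}
```

With 128-item tiles and a maximum of 64 blocks, 300 input items warrant only 3 blocks, while 1,000,000 items use all 64.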

Definition at line 122 of file grid_even_share.cuh.


The documentation for this struct was generated from the following file: