OpenFPM_pdata  3.0.0
Project that contains the implementation of distributed structures
cub::BlockRakingLayout< T, BLOCK_THREADS, PTX_ARCH > Struct Template Reference

BlockRakingLayout provides a conflict-free shared memory layout abstraction for 1D raking across thread block data. More...

Detailed Description

template<typename T, int BLOCK_THREADS, int PTX_ARCH = CUB_PTX_ARCH>
struct cub::BlockRakingLayout< T, BLOCK_THREADS, PTX_ARCH >

BlockRakingLayout provides a conflict-free shared memory layout abstraction for 1D raking across thread block data.

Overview
This type facilitates a shared memory usage pattern where a block of CUDA threads places elements into shared memory and then reduces the active parallelism to one "raking" warp of threads for serially aggregating consecutive sequences of shared items. Padding is inserted to eliminate bank conflicts (for most data types).
Template Parameters
T	The data type to be exchanged.
BLOCK_THREADS	The thread block size in threads.
PTX_ARCH	[optional] The PTX compute capability for which to specialize this collective (default: CUB_PTX_ARCH).

Definition at line 67 of file block_raking_layout.cuh.

Data Structures

struct  TempStorage
 Alias wrapper allowing storage to be unioned. More...
 

Public Types

enum  {
  SHARED_ELEMENTS = BLOCK_THREADS,
  MAX_RAKING_THREADS = CUB_MIN(BLOCK_THREADS, CUB_WARP_THREADS(PTX_ARCH)),
  SEGMENT_LENGTH = (SHARED_ELEMENTS + MAX_RAKING_THREADS - 1) / MAX_RAKING_THREADS,
  RAKING_THREADS = (SHARED_ELEMENTS + SEGMENT_LENGTH - 1) / SEGMENT_LENGTH,
  HAS_CONFLICTS = (CUB_SMEM_BANKS(PTX_ARCH) % SEGMENT_LENGTH == 0),
  CONFLICT_DEGREE,
  USE_SEGMENT_PADDING = ((SEGMENT_LENGTH & 1) == 0) && (SEGMENT_LENGTH > 2),
  GRID_ELEMENTS = RAKING_THREADS * (SEGMENT_LENGTH + USE_SEGMENT_PADDING),
  UNGUARDED = (SHARED_ELEMENTS % RAKING_THREADS == 0)
}
 

Public Member Functions

struct __align__(16) _TempStorage
 Shared memory storage type.
 

Static Public Member Functions

static __device__ __forceinline__ T * PlacementPtr (TempStorage &temp_storage, unsigned int linear_tid)
 Returns the location for the calling thread to place data into the grid.
 
static __device__ __forceinline__ T * RakingPtr (TempStorage &temp_storage, unsigned int linear_tid)
 Returns the location for the calling thread to begin sequential raking.
 

Member Enumeration Documentation

◆ anonymous enum

template<typename T , int BLOCK_THREADS, int PTX_ARCH = CUB_PTX_ARCH>
anonymous enum
Enumerator
SHARED_ELEMENTS 

The total number of elements that need to be cooperatively reduced.

MAX_RAKING_THREADS 

Maximum number of warp-synchronous raking threads.

SEGMENT_LENGTH 

Number of raking elements per warp-synchronous raking thread (rounded up)

RAKING_THREADS 

Never use a raking thread that will have no valid data (e.g., when BLOCK_THREADS is 62 and SEGMENT_LENGTH is 2, we should only use 31 raking threads)

HAS_CONFLICTS 

Whether we will have bank conflicts (strictly, conflicts occur whenever the GCD of the segment length and the number of banks is greater than 1)

CONFLICT_DEGREE 

Degree of bank conflicts (e.g., 4-way)

USE_SEGMENT_PADDING 

Pad each segment length with one element if segment length is not relatively prime to warp size and can't be optimized as a vector load.

GRID_ELEMENTS 

Total number of elements in the raking grid.

UNGUARDED 

Whether raking can skip bounds checking (true when the number of reduction elements is an exact multiple of the number of raking threads)

Definition at line 73 of file block_raking_layout.cuh.


The documentation for this struct was generated from the following file:
block_raking_layout.cuh