OpenFPM_pdata  3.0.0
Project that contains the implementation of distributed structures
cub::BlockRakingLayout< T, BLOCK_THREADS, PTX_ARCH > Struct Template Reference

BlockRakingLayout provides a conflict-free shared memory layout abstraction for 1D raking across thread block data. More...

Detailed Description

template<typename T, int BLOCK_THREADS, int PTX_ARCH = CUB_PTX_ARCH>
struct cub::BlockRakingLayout< T, BLOCK_THREADS, PTX_ARCH >

BlockRakingLayout provides a conflict-free shared memory layout abstraction for 1D raking across thread block data.

Overview
This type facilitates a shared memory usage pattern where a block of CUDA threads places elements into shared memory and then reduces the active parallelism to one "raking" warp of threads for serially aggregating consecutive sequences of shared items. Padding is inserted to eliminate bank conflicts (for most data types).
Template Parameters
T	The data type to be exchanged.
BLOCK_THREADS	The thread block size in threads.
PTX_ARCH	[optional] The PTX compute capability for which to specialize this collective (default: CUB_PTX_ARCH).

Definition at line 67 of file block_raking_layout.cuh.

Data Structures

struct  TempStorage
 Alias wrapper allowing storage to be unioned. More...
 

Public Types

enum  {
  SHARED_ELEMENTS = BLOCK_THREADS,
  MAX_RAKING_THREADS = CUB_MIN(BLOCK_THREADS, CUB_WARP_THREADS(PTX_ARCH)),
  SEGMENT_LENGTH = (SHARED_ELEMENTS + MAX_RAKING_THREADS - 1) / MAX_RAKING_THREADS,
  RAKING_THREADS = (SHARED_ELEMENTS + SEGMENT_LENGTH - 1) / SEGMENT_LENGTH,
  HAS_CONFLICTS = (CUB_SMEM_BANKS(PTX_ARCH) % SEGMENT_LENGTH == 0),
  CONFLICT_DEGREE,
  USE_SEGMENT_PADDING = ((SEGMENT_LENGTH & 1) == 0) && (SEGMENT_LENGTH > 2),
  GRID_ELEMENTS = RAKING_THREADS * (SEGMENT_LENGTH + USE_SEGMENT_PADDING),
  UNGUARDED = (SHARED_ELEMENTS % RAKING_THREADS == 0)
}
 

Public Member Functions

struct __align__(16) _TempStorage
 Shared memory storage type.
 

Static Public Member Functions

static __device__ __forceinline__ T * PlacementPtr (TempStorage &temp_storage, unsigned int linear_tid)
 Returns the location for the calling thread to place data into the grid.
 
static __device__ __forceinline__ T * RakingPtr (TempStorage &temp_storage, unsigned int linear_tid)
 Returns the location for the calling thread to begin sequential raking.
 

Member Enumeration Documentation

◆ anonymous enum

template<typename T , int BLOCK_THREADS, int PTX_ARCH = CUB_PTX_ARCH>
anonymous enum
Enumerator
SHARED_ELEMENTS 

The total number of elements that need to be cooperatively reduced.

MAX_RAKING_THREADS 

Maximum number of warp-synchronous raking threads.

SEGMENT_LENGTH 

Number of raking elements per warp-synchronous raking thread (rounded up)

RAKING_THREADS 

Never use a raking thread that will have no valid data (e.g., when BLOCK_THREADS is 62 and SEGMENT_LENGTH is 2, we should only use 31 raking threads)

HAS_CONFLICTS 

Whether we will have bank conflicts (strictly, conflicts occur whenever the GCD of the segment length and the number of banks is greater than 1)

CONFLICT_DEGREE 

Degree of bank conflicts (e.g., 4-way)

USE_SEGMENT_PADDING 

Pad each segment length with one element if segment length is not relatively prime to warp size and can't be optimized as a vector load.

GRID_ELEMENTS 

Total number of elements in the raking grid.

UNGUARDED 

Whether raking can skip bounds checking (true when the number of reduction elements is an exact multiple of the number of raking threads)

Definition at line 73 of file block_raking_layout.cuh.


The documentation for this struct was generated from the following file:
block_raking_layout.cuh