OpenFPM_pdata  4.1.0
Project that contains the implementation of distributed structures
cub::BlockReduceRakingCommutativeOnly< T, BLOCK_DIM_X, BLOCK_DIM_Y, BLOCK_DIM_Z, PTX_ARCH > Struct Template Reference

BlockReduceRakingCommutativeOnly provides raking-based methods of parallel reduction across a CUDA thread block. Does not support non-commutative reduction operators. Does not support block sizes that are not a multiple of the warp size.

Detailed Description

template<typename T, int BLOCK_DIM_X, int BLOCK_DIM_Y, int BLOCK_DIM_Z, int PTX_ARCH>
struct cub::BlockReduceRakingCommutativeOnly< T, BLOCK_DIM_X, BLOCK_DIM_Y, BLOCK_DIM_Z, PTX_ARCH >

BlockReduceRakingCommutativeOnly provides raking-based methods of parallel reduction across a CUDA thread block. Does not support non-commutative reduction operators. Does not support block sizes that are not a multiple of the warp size.

PTX_ARCH: The PTX compute capability for which to specialize this collective.

Definition at line 58 of file block_reduce_raking_commutative_only.cuh.
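The restriction to commutative operators can be illustrated with a small host-side sketch. It is not part of CUB: `fold_in_order` and the two checker functions are hypothetical helpers that fold the same items in two different visiting orders. A reduction that does not combine partials in thread order is only safe when the visiting order cannot change the result.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical helper: fold `items` left to right, visiting them in `order`.
template <typename T, typename Op>
T fold_in_order(const std::vector<T>& items, const std::vector<int>& order, Op op)
{
    assert(!order.empty());
    T acc = items[order[0]];
    for (std::size_t i = 1; i < order.size(); ++i)
        acc = op(acc, items[order[i]]);
    return acc;
}

// Addition is commutative: both visiting orders give the same total.
inline bool sum_is_order_independent()
{
    std::vector<int> v = {1, 2, 3, 4};
    auto add = [](int a, int b) { return a + b; };
    return fold_in_order(v, {0, 1, 2, 3}, add) == fold_in_order(v, {0, 2, 1, 3}, add);
}

// String concatenation is associative but not commutative:
// "abcd" and "acbd" differ, so the visiting order matters.
inline bool concat_is_order_independent()
{
    std::vector<std::string> v = {"a", "b", "c", "d"};
    auto cat = [](const std::string& a, const std::string& b) { return a + b; };
    return fold_in_order(v, {0, 1, 2, 3}, cat) == fold_in_order(v, {0, 2, 1, 3}, cat);
}
```

Operators such as addition, minimum, and maximum pass this kind of check; concatenation-like operators do not and need a reduction that preserves thread order.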

Data Structures

union  _TempStorage
 Shared memory storage layout type.
 
struct  TempStorage
 Alias wrapper allowing storage to be unioned.
 

Public Types

enum  { BLOCK_THREADS = BLOCK_DIM_X * BLOCK_DIM_Y * BLOCK_DIM_Z }
 Constants.
 
enum  {
  WARP_THREADS = CUB_WARP_THREADS(PTX_ARCH),
  USE_FALLBACK = ((BLOCK_THREADS % WARP_THREADS != 0) || (BLOCK_THREADS <= WARP_THREADS)),
  RAKING_THREADS = WARP_THREADS,
  SHARING_THREADS = CUB_MAX(1, BLOCK_THREADS - RAKING_THREADS),
  SEGMENT_LENGTH = SHARING_THREADS / WARP_THREADS
}
 Constants.
 
typedef BlockReduceRaking< T, BLOCK_DIM_X, BLOCK_DIM_Y, BLOCK_DIM_Z, PTX_ARCH > FallBack
 
typedef WarpReduce< T, RAKING_THREADS, PTX_ARCH > WarpReduce
 WarpReduce utility type.
 
typedef BlockRakingLayout< T, SHARING_THREADS, PTX_ARCH > BlockRakingLayout
 Layout type for padded thread block raking grid.
 

Public Member Functions

__device__ __forceinline__ BlockReduceRakingCommutativeOnly (TempStorage &temp_storage)
 Constructor.
 
template<bool FULL_TILE>
__device__ __forceinline__ T Sum (T partial, int num_valid)
 Computes a thread block-wide reduction using addition (+) as the reduction operator. The first num_valid threads each contribute one reduction partial. The return value is only valid for thread 0.
 
template<bool FULL_TILE, typename ReductionOp >
__device__ __forceinline__ T Reduce (T partial, int num_valid, ReductionOp reduction_op)
 Computes a thread block-wide reduction using the specified reduction operator. The first num_valid threads each contribute one reduction partial. The return value is only valid for thread 0.
 

Data Fields

_TempStorage & temp_storage
 
unsigned int linear_tid
 

Member Enumeration Documentation

◆ anonymous enum

template<typename T , int BLOCK_DIM_X, int BLOCK_DIM_Y, int BLOCK_DIM_Z, int PTX_ARCH>
anonymous enum

Constants.

Enumerator
BLOCK_THREADS 

The thread block size in threads.

Definition at line 61 of file block_reduce_raking_commutative_only.cuh.

◆ anonymous enum

template<typename T , int BLOCK_DIM_X, int BLOCK_DIM_Y, int BLOCK_DIM_Z, int PTX_ARCH>
anonymous enum

Constants.

Enumerator
WARP_THREADS 

Number of warp threads.

USE_FALLBACK 

Whether or not to use fall-back.

RAKING_THREADS 

Number of raking threads.

SHARING_THREADS 

Number of threads actually sharing items with the raking threads.

SEGMENT_LENGTH 

Number of raking elements per warp-synchronous raking thread.

Definition at line 71 of file block_reduce_raking_commutative_only.cuh.
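As a worked illustration of how these enumerators relate, the following host-side sketch mirrors the expressions from the enum listing as compile-time constants. `RakingConstants` is a hypothetical name, and the default of 32 for `WarpThreads` is an assumption standing in for `CUB_WARP_THREADS(PTX_ARCH)`.

```cpp
#include <cassert>

// Hypothetical host-side mirror of the enumerators documented above.
// WarpThreads = 32 is an assumption replacing CUB_WARP_THREADS(PTX_ARCH).
template <int BLOCK_DIM_X, int BLOCK_DIM_Y = 1, int BLOCK_DIM_Z = 1, int WarpThreads = 32>
struct RakingConstants
{
    // The thread block size in threads.
    static constexpr int BLOCK_THREADS = BLOCK_DIM_X * BLOCK_DIM_Y * BLOCK_DIM_Z;
    // Number of warp threads.
    static constexpr int WARP_THREADS = WarpThreads;
    // Fall back when the block is not a multiple of the warp size,
    // or when it is no larger than a single warp.
    static constexpr bool USE_FALLBACK =
        (BLOCK_THREADS % WARP_THREADS != 0) || (BLOCK_THREADS <= WARP_THREADS);
    // Number of raking threads (one warp).
    static constexpr int RAKING_THREADS = WARP_THREADS;
    // Same value as CUB_MAX(1, BLOCK_THREADS - RAKING_THREADS).
    static constexpr int SHARING_THREADS =
        (BLOCK_THREADS - RAKING_THREADS > 1) ? (BLOCK_THREADS - RAKING_THREADS) : 1;
    // Number of shared elements each raking thread folds in.
    static constexpr int SEGMENT_LENGTH = SHARING_THREADS / WARP_THREADS;
};

// A 128-thread block uses the raking path with three elements per raking thread.
static_assert(!RakingConstants<128>::USE_FALLBACK, "128 threads: raking path");
static_assert(RakingConstants<128>::SEGMENT_LENGTH == 3, "96 sharing threads / 32");
// Block sizes that are not a multiple of the warp size, or a single warp, fall back.
static_assert(RakingConstants<100>::USE_FALLBACK, "100 is not a multiple of 32");
static_assert(RakingConstants<32>::USE_FALLBACK, "a single warp falls back");
```

Under the 32-thread warp assumption, a 128-thread block gives SHARING_THREADS = 96 and SEGMENT_LENGTH = 3, while a 100-thread block sets USE_FALLBACK.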

Member Function Documentation

◆ Reduce()

template<typename T , int BLOCK_DIM_X, int BLOCK_DIM_Y, int BLOCK_DIM_Z, int PTX_ARCH>
template<bool FULL_TILE, typename ReductionOp >
__device__ __forceinline__ T cub::BlockReduceRakingCommutativeOnly< T, BLOCK_DIM_X, BLOCK_DIM_Y, BLOCK_DIM_Z, PTX_ARCH >::Reduce (T partial,
int  num_valid,
ReductionOp  reduction_op 
)
inline

Computes a thread block-wide reduction using the specified reduction operator. The first num_valid threads each contribute one reduction partial. The return value is only valid for thread 0.

Parameters
[in]  partial       Calling thread's input partial reductions
[in]  num_valid     Number of valid elements (may be less than BLOCK_THREADS)
[in]  reduction_op  Binary reduction operator

Definition at line 163 of file block_reduce_raking_commutative_only.cuh.
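To make the raking scheme concrete, here is a serial host-side model of a full-tile reduction. It is a sketch under stated assumptions, not the device implementation: `model_raking_reduce` and its `warp_threads` parameter are hypothetical, shared-memory padding and synchronization are ignored, and the assignment of one contiguous segment per raking thread is assumed. Only the non-fallback case is modeled, in which the block size is a multiple of the warp size and larger than one warp.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Serial host-side model of the full-tile raking path (not device code).
// `warp_threads` is an assumed stand-in for WARP_THREADS.
template <typename T, typename ReductionOp>
T model_raking_reduce(const std::vector<T>& partials,
                      ReductionOp reduction_op,
                      int warp_threads = 32)
{
    const int block_threads = static_cast<int>(partials.size());
    // Only the non-fallback case is modeled (see USE_FALLBACK).
    assert(block_threads % warp_threads == 0 && block_threads > warp_threads);

    const int raking_threads  = warp_threads;
    const int sharing_threads = block_threads - raking_threads;
    const int segment_length  = sharing_threads / warp_threads;

    // Step 1: threads outside the first warp share their partials in a grid.
    std::vector<T> grid(partials.begin() + raking_threads, partials.end());

    // Step 2: each raking thread folds its own partial with one grid segment.
    std::vector<T> raked(partials.begin(), partials.begin() + raking_threads);
    for (int t = 0; t < raking_threads; ++t)
        for (int i = 0; i < segment_length; ++i)
            raked[t] = reduction_op(raked[t], grid[t * segment_length + i]);

    // Step 3: the raking warp reduces its values; thread 0 holds the result.
    T result = raked[0];
    for (int t = 1; t < raking_threads; ++t)
        result = reduction_op(result, raked[t]);
    return result;
}
```

In this model, raking thread t combines its own partial with partials from threads far away in thread order, so the partials are not combined in thread order. That is consistent with the requirement that reduction_op be commutative.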

◆ Sum()

template<typename T , int BLOCK_DIM_X, int BLOCK_DIM_Y, int BLOCK_DIM_Z, int PTX_ARCH>
template<bool FULL_TILE>
__device__ __forceinline__ T cub::BlockReduceRakingCommutativeOnly< T, BLOCK_DIM_X, BLOCK_DIM_Y, BLOCK_DIM_Z, PTX_ARCH >::Sum (T partial,
int  num_valid 
)
inline

Computes a thread block-wide reduction using addition (+) as the reduction operator. The first num_valid threads each contribute one reduction partial. The return value is only valid for thread 0.

Parameters
[in]  partial    Calling thread's input partial reductions
[in]  num_valid  Number of valid elements (may be less than BLOCK_THREADS)

Definition at line 127 of file block_reduce_raking_commutative_only.cuh.


The documentation for this struct was generated from the following file: