BlockReduceRakingCommutativeOnly provides raking-based methods of parallel reduction across a CUDA thread block. It does not support non-commutative reduction operators, and it does not support block sizes that are not a multiple of the warp size.
PTX_ARCH: the PTX compute capability for which to specialize this collective.
Definition at line 58 of file block_reduce_raking_commutative_only.cuh.
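The restriction to commutative operators can be illustrated with a small host-side model. This is a sketch under stated assumptions, not CUB's code: it assumes that the first `warp` threads act as raking threads, that every other thread deposits its partial in shared memory, and that raking thread `r` folds in the `seg` consecutive deposited partials starting at slot `r * seg`. The function names `RakingOrderReduce` and `RankOrderReduce` are hypothetical.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical host-side model of a raking reduction (NOT CUB code).
// Assumptions: partials.size() is a multiple of `warp` and larger than `warp`.
// "Threads" [0, warp) are raking threads; threads [warp, size) only deposit.
template <typename T, typename Op>
T RakingOrderReduce(const std::vector<T>& partials, int warp, Op op) {
    const int block = static_cast<int>(partials.size());
    const int seg = (block - warp) / warp;  // plays the role of SEGMENT_LENGTH

    // Each raking thread starts from its own partial...
    std::vector<T> raked(partials.begin(), partials.begin() + warp);
    // ...and folds in its segment of deposited partials.
    for (int r = 0; r < warp; ++r)
        for (int i = 0; i < seg; ++i)
            raked[r] = op(raked[r], partials[warp + r * seg + i]);

    // Final combine across the raking threads (stands in for a warp reduction).
    T total = raked[0];
    for (int r = 1; r < warp; ++r) total = op(total, raked[r]);
    return total;
}

// Plain left-to-right fold in thread-rank order, for comparison.
template <typename T, typename Op>
T RankOrderReduce(const std::vector<T>& partials, Op op) {
    T total = partials[0];
    for (std::size_t i = 1; i < partials.size(); ++i) total = op(total, partials[i]);
    return total;
}
```

Under this model, with six partials and `warp = 2`, the operands are combined in the order 0, 2, 3, 1, 4, 5 rather than 0 through 5. Addition gives the same answer either way, while an order-sensitive operator such as string concatenation does not.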
Data Structures | |
union | _TempStorage |
Shared memory storage layout type. | |
struct | TempStorage |
Alias wrapper allowing storage to be unioned. | |
Public Types | |
enum | { BLOCK_THREADS = BLOCK_DIM_X * BLOCK_DIM_Y * BLOCK_DIM_Z } |
Constants. | |
enum | { WARP_THREADS = CUB_WARP_THREADS(PTX_ARCH) , USE_FALLBACK = ((BLOCK_THREADS % WARP_THREADS != 0) || (BLOCK_THREADS <= WARP_THREADS)) , RAKING_THREADS = WARP_THREADS , SHARING_THREADS = CUB_MAX(1, BLOCK_THREADS - RAKING_THREADS) , SEGMENT_LENGTH = SHARING_THREADS / WARP_THREADS } |
Constants. | |
typedef BlockReduceRaking< T, BLOCK_DIM_X, BLOCK_DIM_Y, BLOCK_DIM_Z, PTX_ARCH > | FallBack |
typedef WarpReduce< T, RAKING_THREADS, PTX_ARCH > | WarpReduce |
WarpReduce utility type. | |
typedef BlockRakingLayout< T, SHARING_THREADS, PTX_ARCH > | BlockRakingLayout |
Layout type for padded thread block raking grid. | |
Public Member Functions | |
__device__ __forceinline__ | BlockReduceRakingCommutativeOnly (TempStorage &temp_storage) |
Constructor. | |
template<bool FULL_TILE> | |
__device__ __forceinline__ T | Sum (T partial, int num_valid) |
Computes a thread block-wide reduction using addition (+) as the reduction operator. The first num_valid threads each contribute one reduction partial. The return value is only valid for thread 0. | |
template<bool FULL_TILE, typename ReductionOp > | |
__device__ __forceinline__ T | Reduce (T partial, int num_valid, ReductionOp reduction_op) |
Computes a thread block-wide reduction using the specified reduction operator. The first num_valid threads each contribute one reduction partial. The return value is only valid for thread 0. | |
Data Fields | |
_TempStorage & | temp_storage |
unsigned int | linear_tid |
typedef BlockRakingLayout<T, SHARING_THREADS, PTX_ARCH> cub::BlockReduceRakingCommutativeOnly< T, BLOCK_DIM_X, BLOCK_DIM_Y, BLOCK_DIM_Z, PTX_ARCH >::BlockRakingLayout |
Layout type for padded thread block raking grid.
Definition at line 93 of file block_reduce_raking_commutative_only.cuh.
typedef BlockReduceRaking<T, BLOCK_DIM_X, BLOCK_DIM_Y, BLOCK_DIM_Z, PTX_ARCH> cub::BlockReduceRakingCommutativeOnly< T, BLOCK_DIM_X, BLOCK_DIM_Y, BLOCK_DIM_Z, PTX_ARCH >::FallBack |
Definition at line 68 of file block_reduce_raking_commutative_only.cuh.
typedef WarpReduce<T, RAKING_THREADS, PTX_ARCH> cub::BlockReduceRakingCommutativeOnly< T, BLOCK_DIM_X, BLOCK_DIM_Y, BLOCK_DIM_Z, PTX_ARCH >::WarpReduce |
WarpReduce utility type.
Definition at line 90 of file block_reduce_raking_commutative_only.cuh.
anonymous enum |
Constants.
Enumerator | |
---|---|
BLOCK_THREADS | The thread block size in threads. |
Definition at line 61 of file block_reduce_raking_commutative_only.cuh.
anonymous enum |
Constants.
Definition at line 71 of file block_reduce_raking_commutative_only.cuh.
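To make the enumerator expressions from the summary concrete, the following host-side sketch evaluates them at compile time for a few block shapes. The struct name `RakingConstants` is hypothetical, and the `kWarpThreads = 32` default is an assumption standing in for `CUB_WARP_THREADS(PTX_ARCH)`; the expressions themselves are copied from the enum listed above.

```cpp
// Host-side mirror of the documented constants (illustrative, NOT CUB code).
// kWarpThreads = 32 is an assumed value for CUB_WARP_THREADS(PTX_ARCH).
template <int BLOCK_DIM_X, int BLOCK_DIM_Y = 1, int BLOCK_DIM_Z = 1, int kWarpThreads = 32>
struct RakingConstants {
    static constexpr int  BLOCK_THREADS   = BLOCK_DIM_X * BLOCK_DIM_Y * BLOCK_DIM_Z;
    static constexpr int  WARP_THREADS    = kWarpThreads;
    // Fall back when the block is not a whole number of warps, or is a single warp or less.
    static constexpr bool USE_FALLBACK    = (BLOCK_THREADS % WARP_THREADS != 0) ||
                                            (BLOCK_THREADS <= WARP_THREADS);
    static constexpr int  RAKING_THREADS  = WARP_THREADS;
    // CUB_MAX(1, BLOCK_THREADS - RAKING_THREADS), written as a ternary.
    static constexpr int  SHARING_THREADS = (BLOCK_THREADS - RAKING_THREADS > 1)
                                                ? (BLOCK_THREADS - RAKING_THREADS) : 1;
    static constexpr int  SEGMENT_LENGTH  = SHARING_THREADS / WARP_THREADS;
};

// 128 threads: four warps, so 96 sharing threads and 3 partials per raking thread.
static_assert(!RakingConstants<128>::USE_FALLBACK, "128 threads use the raking path");
static_assert(RakingConstants<128>::SHARING_THREADS == 96, "128 - 32");
static_assert(RakingConstants<128>::SEGMENT_LENGTH == 3, "96 / 32");

// 100 threads is not a multiple of 32, and 32 threads is a single warp: both fall back.
static_assert(RakingConstants<100>::USE_FALLBACK, "not a multiple of the warp size");
static_assert(RakingConstants<32>::USE_FALLBACK, "single-warp block");

// A 16 x 16 block has 256 threads: 224 sharing threads, 7 partials per raking thread.
static_assert(RakingConstants<16, 16>::SEGMENT_LENGTH == 7, "224 / 32");
```

When `USE_FALLBACK` is true, the `FallBack` typedef (`BlockReduceRaking`) is the type the summary lists for that case.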
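__device__ __forceinline__ cub::BlockReduceRakingCommutativeOnly< T, BLOCK_DIM_X, BLOCK_DIM_Y, BLOCK_DIM_Z, PTX_ARCH >::BlockReduceRakingCommutativeOnly (TempStorage &temp_storage) | inline |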
Constructor.
Definition at line 117 of file block_reduce_raking_commutative_only.cuh.
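template<bool FULL_TILE, typename ReductionOp > __device__ __forceinline__ T cub::BlockReduceRakingCommutativeOnly< T, BLOCK_DIM_X, BLOCK_DIM_Y, BLOCK_DIM_Z, PTX_ARCH >::Reduce (T partial, int num_valid, ReductionOp reduction_op) | inline |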
Computes a thread block-wide reduction using the specified reduction operator. The first num_valid threads each contribute one reduction partial. The return value is only valid for thread 0.
[in] | partial | Calling thread's input partial reduction |
[in] | num_valid | Number of valid elements (may be less than BLOCK_THREADS) |
[in] | reduction_op | Binary reduction operator |
Definition at line 163 of file block_reduce_raking_commutative_only.cuh.
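The documented contract (only the first num_valid threads contribute, and the result is meaningful only in thread 0) can be written down as a host-side reference that is handy for checking device output. This is a minimal sketch, not CUB's implementation; `ReferenceReduce` is a hypothetical helper name, and the precondition `1 <= num_valid <= partials.size()` is an assumption of the sketch.

```cpp
#include <vector>

// Host-side reference for the documented contract (illustrative, NOT CUB code).
// partials[i] models the `partial` argument passed by thread i.
// Assumed precondition: 1 <= num_valid <= partials.size().
template <typename T, typename ReductionOp>
T ReferenceReduce(const std::vector<T>& partials, int num_valid, ReductionOp reduction_op) {
    // Only the first num_valid partials take part; the rest are ignored.
    T result = partials[0];
    for (int i = 1; i < num_valid; ++i)
        result = reduction_op(result, partials[i]);
    // The returned value corresponds to what thread 0 would receive.
    return result;
}
```

Because the operator is required to be commutative, this left-to-right fold is a fair reference for operators such as addition or maximum. Passing an addition functor models `Sum`.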
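template<bool FULL_TILE> __device__ __forceinline__ T cub::BlockReduceRakingCommutativeOnly< T, BLOCK_DIM_X, BLOCK_DIM_Y, BLOCK_DIM_Z, PTX_ARCH >::Sum (T partial, int num_valid) | inline |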
Computes a thread block-wide reduction using addition (+) as the reduction operator. The first num_valid threads each contribute one reduction partial. The return value is only valid for thread 0.
[in] | partial | Calling thread's input partial reduction |
[in] | num_valid | Number of valid elements (may be less than BLOCK_THREADS) |
Definition at line 127 of file block_reduce_raking_commutative_only.cuh.
unsigned int cub::BlockReduceRakingCommutativeOnly< T, BLOCK_DIM_X, BLOCK_DIM_Y, BLOCK_DIM_Z, PTX_ARCH >::linear_tid |
Definition at line 113 of file block_reduce_raking_commutative_only.cuh.
_TempStorage& cub::BlockReduceRakingCommutativeOnly< T, BLOCK_DIM_X, BLOCK_DIM_Y, BLOCK_DIM_Z, PTX_ARCH >::temp_storage |
Definition at line 112 of file block_reduce_raking_commutative_only.cuh.