BlockReduceRakingCommutativeOnly provides raking-based methods of parallel reduction across a CUDA thread block. Does not support non-commutative reduction operators. Does not support block sizes that are not a multiple of the warp size.

Detailed Description

template<typename T, int BLOCK_DIM_X, int BLOCK_DIM_Y, int BLOCK_DIM_Z, int PTX_ARCH>
struct cub::BlockReduceRakingCommutativeOnly< T, BLOCK_DIM_X, BLOCK_DIM_Y, BLOCK_DIM_Z, PTX_ARCH >

BlockReduceRakingCommutativeOnly provides raking-based methods of parallel reduction across a CUDA thread block. It supports only commutative reduction operators, and it does not support block sizes that are not a multiple of the warp size.

PTX_ARCH: The PTX compute capability for which to specialize this collective.

Definition at line 58 of file block_reduce_raking_commutative_only.cuh.
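The restriction to commutative operators can be illustrated with a small host-side model of the raking pattern. This is a sketch, not the library's implementation: it assumes a 32-thread warp, a 128-thread block, an unpadded raking grid, and a simple in-order fold standing in for the final warp reduction. The names `raking_combination_order` and `simulated_block_sum` are hypothetical. In the model, the first warp keeps its partials in registers, every other thread places its partial in a shared grid, and each raking thread folds its own partial together with one segment of the grid.

```cpp
#include <cassert>
#include <vector>

// Assumed sizes for illustration only: 32-thread warps, 128-thread block.
constexpr int kWarpThreads    = 32;
constexpr int kBlockThreads   = 128;
constexpr int kRakingThreads  = kWarpThreads;
constexpr int kSharingThreads = kBlockThreads - kRakingThreads;    // 96
constexpr int kSegmentLength  = kSharingThreads / kWarpThreads;    // 3

// Hypothetical host-side model: returns thread ranks in the order in which
// their partials are combined under the assumptions stated above.
std::vector<int> raking_combination_order() {
    // Threads outside the first warp place their partial in the shared grid.
    // Here the grid stores the rank of the contributing thread.
    std::vector<int> grid(kSharingThreads);
    for (int tid = kRakingThreads; tid < kBlockThreads; ++tid) {
        grid[tid - kRakingThreads] = tid;
    }

    std::vector<int> order;
    for (int lane = 0; lane < kRakingThreads; ++lane) {
        order.push_back(lane);  // the raking thread's own partial comes first
        for (int i = 0; i < kSegmentLength; ++i) {
            order.push_back(grid[lane * kSegmentLength + i]);
        }
    }
    return order;
}

// Sums the partials in the modeled combination order.
int simulated_block_sum(const std::vector<int>& partials) {
    int total = 0;
    for (int tid : raking_combination_order()) {
        total += partials[tid];
    }
    return total;
}
```

Under these assumptions the combination order begins 0, 32, 33, 34, 1, 35, and so on, rather than following thread rank. Addition gives the same result either way, but an order-sensitive operator would not.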

Data Structures
union	_TempStorage
	Shared memory storage layout type.

struct	TempStorage
	Alias wrapper allowing storage to be unioned.

Public Types
enum	{ BLOCK_THREADS = BLOCK_DIM_X * BLOCK_DIM_Y * BLOCK_DIM_Z }
	Constants.

enum	{ WARP_THREADS = CUB_WARP_THREADS(PTX_ARCH) , USE_FALLBACK = ((BLOCK_THREADS % WARP_THREADS != 0) || (BLOCK_THREADS <= WARP_THREADS)) , RAKING_THREADS = WARP_THREADS , SHARING_THREADS = CUB_MAX(1, BLOCK_THREADS - RAKING_THREADS) , SEGMENT_LENGTH = SHARING_THREADS / WARP_THREADS }
	Constants.

typedef BlockReduceRaking< T, BLOCK_DIM_X, BLOCK_DIM_Y, BLOCK_DIM_Z, PTX_ARCH >	FallBack

typedef WarpReduce< T, RAKING_THREADS, PTX_ARCH >	WarpReduce
	WarpReduce utility type.

typedef BlockRakingLayout< T, SHARING_THREADS, PTX_ARCH >	BlockRakingLayout
	Layout type for padded thread block raking grid.

Public Member Functions
__device__ __forceinline__	BlockReduceRakingCommutativeOnly (TempStorage &temp_storage)
	Constructor.

template<bool FULL_TILE>
__device__ __forceinline__ T	Sum (T partial, int num_valid)
	Computes a thread block-wide reduction using addition (+) as the reduction operator. The first num_valid threads each contribute one reduction partial. The return value is only valid for thread₀.

template<bool FULL_TILE, typename ReductionOp >
__device__ __forceinline__ T	Reduce (T partial, int num_valid, ReductionOp reduction_op)
	Computes a thread block-wide reduction using the specified reduction operator. The first num_valid threads each contribute one reduction partial. The return value is only valid for thread₀.

Data Fields
_TempStorage &	temp_storage

unsigned int	linear_tid

Member Typedef Documentation

◆ BlockRakingLayout

template<typename T , int BLOCK_DIM_X, int BLOCK_DIM_Y, int BLOCK_DIM_Z, int PTX_ARCH>

typedef BlockRakingLayout<T, SHARING_THREADS, PTX_ARCH> cub::BlockReduceRakingCommutativeOnly< T, BLOCK_DIM_X, BLOCK_DIM_Y, BLOCK_DIM_Z, PTX_ARCH >::BlockRakingLayout

Layout type for padded thread block raking grid.

Definition at line 93 of file block_reduce_raking_commutative_only.cuh.

◆ FallBack

template<typename T , int BLOCK_DIM_X, int BLOCK_DIM_Y, int BLOCK_DIM_Z, int PTX_ARCH>

typedef BlockReduceRaking<T, BLOCK_DIM_X, BLOCK_DIM_Y, BLOCK_DIM_Z, PTX_ARCH> cub::BlockReduceRakingCommutativeOnly< T, BLOCK_DIM_X, BLOCK_DIM_Y, BLOCK_DIM_Z, PTX_ARCH >::FallBack

Definition at line 68 of file block_reduce_raking_commutative_only.cuh.

◆ WarpReduce

template<typename T , int BLOCK_DIM_X, int BLOCK_DIM_Y, int BLOCK_DIM_Z, int PTX_ARCH>

typedef WarpReduce<T, RAKING_THREADS, PTX_ARCH> cub::BlockReduceRakingCommutativeOnly< T, BLOCK_DIM_X, BLOCK_DIM_Y, BLOCK_DIM_Z, PTX_ARCH >::WarpReduce

WarpReduce utility type.

Definition at line 90 of file block_reduce_raking_commutative_only.cuh.

Member Enumeration Documentation

◆ anonymous enum

template<typename T , int BLOCK_DIM_X, int BLOCK_DIM_Y, int BLOCK_DIM_Z, int PTX_ARCH>

anonymous enum

Constants.

Enumerator
BLOCK_THREADS	The thread block size in threads.

Definition at line 61 of file block_reduce_raking_commutative_only.cuh.

◆ anonymous enum

template<typename T , int BLOCK_DIM_X, int BLOCK_DIM_Y, int BLOCK_DIM_Z, int PTX_ARCH>

anonymous enum

Constants.

Enumerator
WARP_THREADS	Number of warp threads.
USE_FALLBACK	Whether or not to use fall-back.
RAKING_THREADS	Number of raking threads.
SHARING_THREADS	Number of threads actually sharing items with the raking threads.
SEGMENT_LENGTH	Number of raking elements per warp synchronous raking thread.

Definition at line 71 of file block_reduce_raking_commutative_only.cuh.
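The enumerators above follow directly from the block dimensions. As a worked sketch, the host-side template below mirrors the expressions in the enum declaration. It assumes `CUB_WARP_THREADS(PTX_ARCH)` evaluates to 32; the names `RakingConstantsSketch` and `kAssumedWarpThreads` are hypothetical and not part of CUB.

```cpp
#include <cassert>

// Assumption: 32 threads per warp.
constexpr int kAssumedWarpThreads = 32;

// Hypothetical mirror of the documented enumerators, evaluated on the host.
template <int BLOCK_DIM_X, int BLOCK_DIM_Y = 1, int BLOCK_DIM_Z = 1>
struct RakingConstantsSketch {
    // The thread block size in threads.
    static constexpr int BLOCK_THREADS = BLOCK_DIM_X * BLOCK_DIM_Y * BLOCK_DIM_Z;
    // Number of warp threads.
    static constexpr int WARP_THREADS = kAssumedWarpThreads;
    // Fall back when the block is not a whole number of warps,
    // or when it is no larger than a single warp.
    static constexpr bool USE_FALLBACK =
        (BLOCK_THREADS % WARP_THREADS != 0) || (BLOCK_THREADS <= WARP_THREADS);
    // Number of raking threads (one warp).
    static constexpr int RAKING_THREADS = WARP_THREADS;
    // Threads that share items with the raking threads (at least 1).
    static constexpr int SHARING_THREADS =
        (BLOCK_THREADS - RAKING_THREADS > 1) ? (BLOCK_THREADS - RAKING_THREADS) : 1;
    // Shared items raked by each raking thread.
    static constexpr int SEGMENT_LENGTH = SHARING_THREADS / WARP_THREADS;
};

// A 128-thread block: 32 raking threads, 96 sharing threads, 3 items each.
static_assert(!RakingConstantsSketch<128>::USE_FALLBACK, "128 is a multiple of 32");
static_assert(RakingConstantsSketch<128>::SHARING_THREADS == 96, "128 - 32");
static_assert(RakingConstantsSketch<128>::SEGMENT_LENGTH == 3, "96 / 32");
```

With these expressions, a 100-thread block and a 32-thread block both set `USE_FALLBACK`, the first because 100 is not a multiple of 32 and the second because the block is no larger than one warp.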

Constructor & Destructor Documentation

◆ BlockReduceRakingCommutativeOnly()

template<typename T , int BLOCK_DIM_X, int BLOCK_DIM_Y, int BLOCK_DIM_Z, int PTX_ARCH>

__device__ __forceinline__ cub::BlockReduceRakingCommutativeOnly< T, BLOCK_DIM_X, BLOCK_DIM_Y, BLOCK_DIM_Z, PTX_ARCH >::BlockReduceRakingCommutativeOnly ( TempStorage & temp_storage )

inline

Constructor.

Definition at line 117 of file block_reduce_raking_commutative_only.cuh.

Member Function Documentation

◆ Reduce()

template<typename T , int BLOCK_DIM_X, int BLOCK_DIM_Y, int BLOCK_DIM_Z, int PTX_ARCH>

template<bool FULL_TILE, typename ReductionOp >

__device__ __forceinline__ T cub::BlockReduceRakingCommutativeOnly< T, BLOCK_DIM_X, BLOCK_DIM_Y, BLOCK_DIM_Z, PTX_ARCH >::Reduce	(	T	partial,
		int	num_valid,
		ReductionOp	reduction_op
	)

inline

Computes a thread block-wide reduction using the specified reduction operator. The first num_valid threads each contribute one reduction partial. The return value is only valid for thread₀.

Parameters

[in]	partial	Calling thread's input partial reduction
[in]	num_valid	Number of valid elements (may be less than BLOCK_THREADS)
[in]	reduction_op	Binary reduction operator

Definition at line 163 of file block_reduce_raking_commutative_only.cuh.
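The role of `num_valid` can be made concrete with a sequential reference model. This is a host-side sketch of the documented result only, not of the parallel algorithm: it treats element `tid` of a vector as the partial held by thread `tid` and combines the first `num_valid` of them. The function name `reference_block_reduce` is hypothetical, and the requirement that `num_valid` be at least 1 is an assumption of this model.

```cpp
#include <cassert>
#include <vector>

// Hypothetical sequential model of the value that thread 0 would receive:
// the reduction of the partials held by the first num_valid threads.
// Because the operator is required to be commutative, the left-to-right
// order used here is just one of the valid combination orders.
template <typename T, typename ReductionOp>
T reference_block_reduce(const std::vector<T>& partials,
                         int num_valid,
                         ReductionOp reduction_op) {
    // Model assumption: at least one valid partial, and no more than
    // the number of threads in the block.
    assert(num_valid >= 1);
    assert(num_valid <= static_cast<int>(partials.size()));

    T result = partials[0];
    for (int tid = 1; tid < num_valid; ++tid) {
        result = reduction_op(result, partials[tid]);
    }
    return result;
}
```

For example, with partials `{3, 9, 4, 7}` and a maximum operator, `num_valid = 3` ignores the last thread's value and the model returns 9; with an addition operator and `num_valid = 4` it returns 23.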

◆ Sum()

template<typename T , int BLOCK_DIM_X, int BLOCK_DIM_Y, int BLOCK_DIM_Z, int PTX_ARCH>

template<bool FULL_TILE>

__device__ __forceinline__ T cub::BlockReduceRakingCommutativeOnly< T, BLOCK_DIM_X, BLOCK_DIM_Y, BLOCK_DIM_Z, PTX_ARCH >::Sum	(	T	partial,
		int	num_valid
	)

inline

Computes a thread block-wide reduction using addition (+) as the reduction operator. The first num_valid threads each contribute one reduction partial. The return value is only valid for thread₀.

Parameters

[in]	partial	Calling thread's input partial reduction
[in]	num_valid	Number of valid elements (may be less than BLOCK_THREADS)

Definition at line 127 of file block_reduce_raking_commutative_only.cuh.

Field Documentation

◆ linear_tid

template<typename T , int BLOCK_DIM_X, int BLOCK_DIM_Y, int BLOCK_DIM_Z, int PTX_ARCH>

unsigned int cub::BlockReduceRakingCommutativeOnly< T, BLOCK_DIM_X, BLOCK_DIM_Y, BLOCK_DIM_Z, PTX_ARCH >::linear_tid

Definition at line 113 of file block_reduce_raking_commutative_only.cuh.

◆ temp_storage

template<typename T , int BLOCK_DIM_X, int BLOCK_DIM_Y, int BLOCK_DIM_Z, int PTX_ARCH>

_TempStorage& cub::BlockReduceRakingCommutativeOnly< T, BLOCK_DIM_X, BLOCK_DIM_Y, BLOCK_DIM_Z, PTX_ARCH >::temp_storage

Definition at line 112 of file block_reduce_raking_commutative_only.cuh.

The documentation for this struct was generated from the following file:

openfpm_data/src/util/cuda/cub_old/block/specializations/block_reduce_raking_commutative_only.cuh
