OpenFPM_pdata  4.1.0
Project that contains the implementation of distributed structures
 
cub::BlockShuffle< T, BLOCK_DIM_X, BLOCK_DIM_Y, BLOCK_DIM_Z, PTX_ARCH > Class Template Reference

The BlockShuffle class provides collective methods for shuffling data partitioned across a CUDA thread block.

Detailed Description

template<typename T, int BLOCK_DIM_X, int BLOCK_DIM_Y = 1, int BLOCK_DIM_Z = 1, int PTX_ARCH = CUB_PTX_ARCH>
class cub::BlockShuffle< T, BLOCK_DIM_X, BLOCK_DIM_Y, BLOCK_DIM_Z, PTX_ARCH >

The BlockShuffle class provides collective methods for shuffling data partitioned across a CUDA thread block.

Template Parameters
T  The data type to be exchanged.
BLOCK_DIM_X  The thread block length in threads along the X dimension
BLOCK_DIM_Y  [optional] The thread block length in threads along the Y dimension (default: 1)
BLOCK_DIM_Z  [optional] The thread block length in threads along the Z dimension (default: 1)
PTX_ARCH  [optional] The PTX compute capability for which to specialize this collective, formatted as per the __CUDA_ARCH__ macro (e.g., 350 for sm_35) (default: CUB_PTX_ARCH)
Overview
It is commonplace for blocks of threads to rearrange data items between threads. The BlockShuffle abstraction allows threads to efficiently shift items either (a) up to their successor or (b) down to their predecessor.

Definition at line 70 of file block_shuffle.cuh.

Data Structures

struct  _TempStorage
 Shared memory storage layout type (last element from each thread's input)
 
struct  TempStorage
 The operations exposed by BlockShuffle require a temporary memory allocation of this nested type for thread communication.
 

Public Member Functions

Collective constructors
__device__ __forceinline__ BlockShuffle ()
 Collective constructor using a private static allocation of shared memory as temporary storage.
 
__device__ __forceinline__ BlockShuffle (TempStorage &temp_storage)
 Collective constructor using the specified memory allocation as temporary storage.
 
Shuffle movement
__device__ __forceinline__ void Offset (T input, T &output, int distance=1)
 Each thread_i obtains the input provided by thread_(i+distance). The offset distance may be negative.
 
__device__ __forceinline__ void Rotate (T input, T &output, unsigned int distance=1)
 Each thread_i obtains the input provided by thread_((i+distance) % BLOCK_THREADS).
 
template<int ITEMS_PER_THREAD>
__device__ __forceinline__ void Up (T(&input)[ITEMS_PER_THREAD], T(&prev)[ITEMS_PER_THREAD])
 The thread block rotates its blocked arrangement of input items, shifting it up by one item.
 
template<int ITEMS_PER_THREAD>
__device__ __forceinline__ void Up (T(&input)[ITEMS_PER_THREAD], T(&prev)[ITEMS_PER_THREAD], T &block_suffix)
 The thread block rotates its blocked arrangement of input items, shifting it up by one item. All threads receive the input provided by thread_(BLOCK_THREADS-1).
 
template<int ITEMS_PER_THREAD>
__device__ __forceinline__ void Down (T(&input)[ITEMS_PER_THREAD], T(&prev)[ITEMS_PER_THREAD])
 The thread block rotates its blocked arrangement of input items, shifting it down by one item.
 
template<int ITEMS_PER_THREAD>
__device__ __forceinline__ void Down (T(&input)[ITEMS_PER_THREAD], T(&prev)[ITEMS_PER_THREAD], T &block_prefix)
 The thread block rotates its blocked arrangement of input items, shifting it down by one item. All threads receive input[0] provided by thread_0.
 

Private Types

enum  { BLOCK_THREADS = BLOCK_DIM_X * BLOCK_DIM_Y * BLOCK_DIM_Z , LOG_WARP_THREADS = CUB_LOG_WARP_THREADS(PTX_ARCH) , WARP_THREADS = 1 << LOG_WARP_THREADS , WARPS = (BLOCK_THREADS + WARP_THREADS - 1) / WARP_THREADS }
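The enumerators above are plain integer arithmetic. As a rough host-side sketch (not CUB code), the helper below reproduces them. The names `BlockShape` and `block_shape` are hypothetical, and the default `log_warp_threads = 5` (32-thread warps) is an assumption standing in for `CUB_LOG_WARP_THREADS(PTX_ARCH)`.

```cpp
#include <cassert>

// Hypothetical holder for the derived values (illustration only).
struct BlockShape {
    int block_threads;  // BLOCK_DIM_X * BLOCK_DIM_Y * BLOCK_DIM_Z
    int warp_threads;   // 1 << LOG_WARP_THREADS
    int warps;          // block_threads / warp_threads, rounded up
};

// log_warp_threads = 5 is an assumed value, not taken from the page.
BlockShape block_shape(int dim_x, int dim_y = 1, int dim_z = 1,
                       int log_warp_threads = 5)
{
    assert(dim_x > 0 && dim_y > 0 && dim_z > 0);
    BlockShape s;
    s.block_threads = dim_x * dim_y * dim_z;
    s.warp_threads  = 1 << log_warp_threads;
    s.warps = (s.block_threads + s.warp_threads - 1) / s.warp_threads;
    return s;
}
```

Under the 32-thread assumption, a 100-thread block rounds up to 4 warps, and an 8 x 4 x 2 block (64 threads) occupies 2 warps.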
 

Private Member Functions

__device__ __forceinline__ _TempStorage & PrivateStorage ()
 Internal storage allocator.
 

Private Attributes

_TempStorage & temp_storage
 Shared storage reference.
 
unsigned int linear_tid
 Linear thread-id.
 

Member Enumeration Documentation

◆ anonymous enum

template<typename T , int BLOCK_DIM_X, int BLOCK_DIM_Y = 1, int BLOCK_DIM_Z = 1, int PTX_ARCH = CUB_PTX_ARCH>
anonymous enum
private

Definition at line 78 of file block_shuffle.cuh.

Constructor & Destructor Documentation

◆ BlockShuffle() [1/2]

template<typename T , int BLOCK_DIM_X, int BLOCK_DIM_Y = 1, int BLOCK_DIM_Z = 1, int PTX_ARCH = CUB_PTX_ARCH>
__device__ __forceinline__ cub::BlockShuffle< T, BLOCK_DIM_X, BLOCK_DIM_Y, BLOCK_DIM_Z, PTX_ARCH >::BlockShuffle ( )
inline

Collective constructor using a private static allocation of shared memory as temporary storage.

Definition at line 140 of file block_shuffle.cuh.

◆ BlockShuffle() [2/2]

template<typename T , int BLOCK_DIM_X, int BLOCK_DIM_Y = 1, int BLOCK_DIM_Z = 1, int PTX_ARCH = CUB_PTX_ARCH>
__device__ __forceinline__ cub::BlockShuffle< T, BLOCK_DIM_X, BLOCK_DIM_Y, BLOCK_DIM_Z, PTX_ARCH >::BlockShuffle ( TempStorage & temp_storage )
inline

Collective constructor using the specified memory allocation as temporary storage.

Parameters
[in]  temp_storage  Reference to memory allocation having layout type TempStorage

Definition at line 150 of file block_shuffle.cuh.

Member Function Documentation

◆ Down() [1/2]

template<typename T , int BLOCK_DIM_X, int BLOCK_DIM_Y = 1, int BLOCK_DIM_Z = 1, int PTX_ARCH = CUB_PTX_ARCH>
template<int ITEMS_PER_THREAD>
__device__ __forceinline__ void cub::BlockShuffle< T, BLOCK_DIM_X, BLOCK_DIM_Y, BLOCK_DIM_Z, PTX_ARCH >::Down ( T(&)  input[ITEMS_PER_THREAD],
T(&)  prev[ITEMS_PER_THREAD] 
)
inline

The thread block rotates its blocked arrangement of input items, shifting it down by one item.

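  • Assumes a blocked arrangement of (BLOCK_THREADS * ITEMS_PER_THREAD) items across the thread block, where thread_i owns the i-th range of ITEMS_PER_THREAD contiguous items. For multi-dimensional thread blocks, a row-major thread ordering is assumed.
  • Efficiency is increased with increased granularity ITEMS_PER_THREAD. Performance is also typically increased until the additional register pressure or shared memory allocation size causes SM occupancy to fall too low.
  • A subsequent __syncthreads() threadblock barrier should be invoked after calling this method if the collective's temporary storage (e.g., temp_storage) is to be reused or repurposed.
Parameters
[in]  input  The calling thread's input items
[out]  prev  The corresponding predecessor items (may be aliased to input). The value prev[0] is not updated for thread_(BLOCK_THREADS-1).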
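The following host-side sketch models one reading of "shifting down by one item": every output slot takes the next item in blocked order, so a thread's last slot comes from the first item of the following thread. That reading is an assumption. Under it, the slot left untouched is the final slot of the last thread, which coincides with the prev[0] mentioned in the parameter note only when ITEMS_PER_THREAD is 1. `Block` and `down_model` are hypothetical names, not part of CUB.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// block[t][k] is item k held by thread t (blocked arrangement).
using Block = std::vector<std::vector<int>>;

// Hypothetical reference model; prev carries each thread's prior contents.
Block down_model(const Block& input, Block prev)
{
    assert(input.size() == prev.size());
    const std::size_t threads = input.size();
    for (std::size_t t = 0; t < threads; ++t) {
        const std::size_t items = input[t].size();
        assert(items > 0 && prev[t].size() == items);
        // Within a thread, each slot takes the item one position later.
        for (std::size_t k = 0; k + 1 < items; ++k)
            prev[t][k] = input[t][k + 1];
        // The last slot takes the first item of the next thread, if any.
        if (t + 1 < threads) {
            assert(!input[t + 1].empty());
            prev[t][items - 1] = input[t + 1].front();
        }
    }
    return prev;
}
```

With three threads holding {0,1}, {2,3}, {4,5} and prev pre-filled with -1, the model returns {1,2}, {3,4}, {5,-1}.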

Definition at line 263 of file block_shuffle.cuh.

◆ Down() [2/2]

template<typename T , int BLOCK_DIM_X, int BLOCK_DIM_Y = 1, int BLOCK_DIM_Z = 1, int PTX_ARCH = CUB_PTX_ARCH>
template<int ITEMS_PER_THREAD>
__device__ __forceinline__ void cub::BlockShuffle< T, BLOCK_DIM_X, BLOCK_DIM_Y, BLOCK_DIM_Z, PTX_ARCH >::Down ( T(&)  input[ITEMS_PER_THREAD],
T(&)  prev[ITEMS_PER_THREAD],
T &  block_prefix 
)
inline

The thread block rotates its blocked arrangement of input items, shifting it down by one item. All threads receive input[0] provided by thread_0.

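  • Assumes a blocked arrangement of (BLOCK_THREADS * ITEMS_PER_THREAD) items across the thread block, where thread_i owns the i-th range of ITEMS_PER_THREAD contiguous items. For multi-dimensional thread blocks, a row-major thread ordering is assumed.
  • Efficiency is increased with increased granularity ITEMS_PER_THREAD. Performance is also typically increased until the additional register pressure or shared memory allocation size causes SM occupancy to fall too low.
  • A subsequent __syncthreads() threadblock barrier should be invoked after calling this method if the collective's temporary storage (e.g., temp_storage) is to be reused or repurposed.
Parameters
[in]  input  The calling thread's input items
[out]  prev  The corresponding predecessor items (may be aliased to input). The value prev[0] is not updated for thread_(BLOCK_THREADS-1).
[out]  block_prefix  The item input[0] from thread_0, provided to all threads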
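To illustrate the extra output of this overload, here is a host-side sketch in which every thread would see the same block_prefix, namely input[0] of thread_0. The shifting rule (each slot takes the next item in blocked order) is an assumed reading of "shifting down by one item", and `Block`, `DownResult`, and `down_with_prefix_model` are hypothetical names.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// block[t][k] is item k held by thread t (blocked arrangement).
using Block = std::vector<std::vector<int>>;

// Hypothetical result pair: shifted items plus the broadcast value.
struct DownResult {
    Block prev;
    int block_prefix;  // input[0] of thread 0, identical for all threads
};

DownResult down_with_prefix_model(const Block& input, Block prev)
{
    assert(!input.empty() && input.size() == prev.size());
    const std::size_t threads = input.size();
    for (std::size_t t = 0; t < threads; ++t) {
        const std::size_t items = input[t].size();
        assert(items > 0 && prev[t].size() == items);
        for (std::size_t k = 0; k + 1 < items; ++k)
            prev[t][k] = input[t][k + 1];
        if (t + 1 < threads) {
            assert(!input[t + 1].empty());
            prev[t][items - 1] = input[t + 1].front();
        }
    }
    return DownResult{prev, input.front().front()};
}
```

The broadcast value is exactly the item that has no destination slot after the shift, so a caller can use it to stitch consecutive tiles together.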

Definition at line 289 of file block_shuffle.cuh.

◆ Offset()

template<typename T , int BLOCK_DIM_X, int BLOCK_DIM_Y = 1, int BLOCK_DIM_Z = 1, int PTX_ARCH = CUB_PTX_ARCH>
__device__ __forceinline__ void cub::BlockShuffle< T, BLOCK_DIM_X, BLOCK_DIM_Y, BLOCK_DIM_Z, PTX_ARCH >::Offset ( T  input,
T &  output,
int  distance = 1 
)
inline

Each thread_i obtains the input provided by thread_(i+distance). The offset distance may be negative.

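  • A subsequent __syncthreads() threadblock barrier should be invoked after calling this method if the collective's temporary storage (e.g., temp_storage) is to be reused or repurposed.
Parameters
[in]  input  The input item from the calling thread (thread_i)
[out]  output  The input item from the successor (or predecessor) thread_(i+distance) (may be aliased to input). This value is only updated for thread_i when 0 <= (i + distance) < BLOCK_THREADS
[in]  distance  Offset distance (may be negative)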
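A small host-side model may make the update rule concrete. It is a sketch, not CUB code: `offset_model` is a hypothetical name, and it assumes an output is written whenever the source index i + distance names a thread inside the block, leaving all other outputs at their previous value.

```cpp
#include <cassert>
#include <vector>

// in[i]  : item supplied by thread i
// out[i] : thread i's output variable on entry (kept if no source exists)
std::vector<int> offset_model(const std::vector<int>& in,
                              std::vector<int> out,
                              int distance = 1)
{
    assert(in.size() == out.size());
    const int block_threads = static_cast<int>(in.size());
    for (int i = 0; i < block_threads; ++i) {
        const int src = i + distance;
        // Only threads whose source lies inside the block are updated.
        if (src >= 0 && src < block_threads)
            out[i] = in[src];
    }
    return out;
}
```

For inputs {10, 20, 30, 40} and outputs pre-filled with -1, a distance of 1 gives {20, 30, 40, -1} and a distance of -2 gives {-1, -1, 10, 20}.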

Definition at line 171 of file block_shuffle.cuh.

◆ PrivateStorage()

template<typename T , int BLOCK_DIM_X, int BLOCK_DIM_Y = 1, int BLOCK_DIM_Z = 1, int PTX_ARCH = CUB_PTX_ARCH>
__device__ __forceinline__ _TempStorage & cub::BlockShuffle< T, BLOCK_DIM_X, BLOCK_DIM_Y, BLOCK_DIM_Z, PTX_ARCH >::PrivateStorage ( )
inline private

Internal storage allocator.

Definition at line 123 of file block_shuffle.cuh.

◆ Rotate()

template<typename T , int BLOCK_DIM_X, int BLOCK_DIM_Y = 1, int BLOCK_DIM_Z = 1, int PTX_ARCH = CUB_PTX_ARCH>
__device__ __forceinline__ void cub::BlockShuffle< T, BLOCK_DIM_X, BLOCK_DIM_Y, BLOCK_DIM_Z, PTX_ARCH >::Rotate ( T  input,
T &  output,
unsigned int  distance = 1 
)
inline

Each thread_i obtains the input provided by thread_((i+distance) % BLOCK_THREADS).

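  • A subsequent __syncthreads() threadblock barrier should be invoked after calling this method if the collective's temporary storage (e.g., temp_storage) is to be reused or repurposed.
Parameters
[in]  input  The calling thread's input item
[out]  output  The input item from thread_((i+distance) % BLOCK_THREADS) (may be aliased to input). This value is not updated for thread_(BLOCK_THREADS-1)
[in]  distance  Offset distance (0 < distance < BLOCK_THREADS)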
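The wrap-around indexing can be sketched on the host as follows. `rotate_model` is a hypothetical name, not CUB API. The sketch writes every thread's output, including the last thread, whose source wraps to thread_(distance-1); treat that as an assumption, because the parameter note says the last thread's value is not updated.

```cpp
#include <cassert>
#include <vector>

// in[i] is the item supplied by thread i; the result holds each thread's output.
std::vector<int> rotate_model(const std::vector<int>& in,
                              unsigned int distance = 1)
{
    const unsigned int block_threads = static_cast<unsigned int>(in.size());
    // Documented range for the distance parameter.
    assert(distance > 0 && distance < block_threads);
    std::vector<int> out(in.size());
    for (unsigned int i = 0; i < block_threads; ++i)
        out[i] = in[(i + distance) % block_threads];  // source index wraps
    return out;
}
```

With inputs {10, 20, 30, 40}, a distance of 1 yields {20, 30, 40, 10} and a distance of 3 yields {40, 10, 20, 30}.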

Definition at line 191 of file block_shuffle.cuh.

◆ Up() [1/2]

template<typename T , int BLOCK_DIM_X, int BLOCK_DIM_Y = 1, int BLOCK_DIM_Z = 1, int PTX_ARCH = CUB_PTX_ARCH>
template<int ITEMS_PER_THREAD>
__device__ __forceinline__ void cub::BlockShuffle< T, BLOCK_DIM_X, BLOCK_DIM_Y, BLOCK_DIM_Z, PTX_ARCH >::Up ( T(&)  input[ITEMS_PER_THREAD],
T(&)  prev[ITEMS_PER_THREAD] 
)
inline

The thread block rotates its blocked arrangement of input items, shifting it up by one item.

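  • Assumes a blocked arrangement of (BLOCK_THREADS * ITEMS_PER_THREAD) items across the thread block, where thread_i owns the i-th range of ITEMS_PER_THREAD contiguous items. For multi-dimensional thread blocks, a row-major thread ordering is assumed.
  • Efficiency is increased with increased granularity ITEMS_PER_THREAD. Performance is also typically increased until the additional register pressure or shared memory allocation size causes SM occupancy to fall too low.
  • A subsequent __syncthreads() threadblock barrier should be invoked after calling this method if the collective's temporary storage (e.g., temp_storage) is to be reused or repurposed.
Parameters
[in]  input  The calling thread's input items
[out]  prev  The corresponding predecessor items (may be aliased to input). The item prev[0] is not updated for thread_0.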
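As a host-side illustration of the predecessor rule, the sketch below gives every slot the item that precedes it in blocked order, so a thread's first slot comes from the last item of the previous thread and thread_0 keeps its old prev[0]. `Block` and `up_model` are hypothetical names, and this models the documented behaviour rather than CUB's implementation.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// block[t][k] is item k held by thread t (blocked arrangement).
using Block = std::vector<std::vector<int>>;

// Hypothetical reference model; prev carries each thread's prior contents.
Block up_model(const Block& input, Block prev)
{
    assert(input.size() == prev.size());
    for (std::size_t t = 0; t < input.size(); ++t) {
        const std::size_t items = input[t].size();
        assert(items > 0 && prev[t].size() == items);
        // Within a thread, each slot takes the item one position earlier.
        for (std::size_t k = items - 1; k > 0; --k)
            prev[t][k] = input[t][k - 1];
        // The first slot takes the last item of the previous thread, if any.
        if (t > 0)
            prev[t][0] = input[t - 1].back();
    }
    return prev;
}
```

With three threads holding {0,1}, {2,3}, {4,5} and prev pre-filled with -1, the model returns {-1,0}, {1,2}, {3,4}.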

Definition at line 217 of file block_shuffle.cuh.

◆ Up() [2/2]

template<typename T , int BLOCK_DIM_X, int BLOCK_DIM_Y = 1, int BLOCK_DIM_Z = 1, int PTX_ARCH = CUB_PTX_ARCH>
template<int ITEMS_PER_THREAD>
__device__ __forceinline__ void cub::BlockShuffle< T, BLOCK_DIM_X, BLOCK_DIM_Y, BLOCK_DIM_Z, PTX_ARCH >::Up ( T(&)  input[ITEMS_PER_THREAD],
T(&)  prev[ITEMS_PER_THREAD],
T &  block_suffix 
)
inline

The thread block rotates its blocked arrangement of input items, shifting it up by one item. All threads receive the input provided by thread_(BLOCK_THREADS-1).

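  • Assumes a blocked arrangement of (BLOCK_THREADS * ITEMS_PER_THREAD) items across the thread block, where thread_i owns the i-th range of ITEMS_PER_THREAD contiguous items. For multi-dimensional thread blocks, a row-major thread ordering is assumed.
  • Efficiency is increased with increased granularity ITEMS_PER_THREAD. Performance is also typically increased until the additional register pressure or shared memory allocation size causes SM occupancy to fall too low.
  • A subsequent __syncthreads() threadblock barrier should be invoked after calling this method if the collective's temporary storage (e.g., temp_storage) is to be reused or repurposed.
Parameters
[in]  input  The calling thread's input items
[out]  prev  The corresponding predecessor items (may be aliased to input). The item prev[0] is not updated for thread_0.
[out]  block_suffix  The item input[ITEMS_PER_THREAD-1] from thread_(BLOCK_THREADS-1), provided to all threads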
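The block_suffix output can be pictured with a host-side sketch: after the shift, the last item of the last thread has no destination slot, and that item is what every thread receives. `Block`, `UpResult`, and `up_with_suffix_model` are hypothetical names used only for illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// block[t][k] is item k held by thread t (blocked arrangement).
using Block = std::vector<std::vector<int>>;

// Hypothetical result pair: shifted items plus the broadcast value.
struct UpResult {
    Block prev;
    int block_suffix;  // last item of the last thread, identical for all threads
};

UpResult up_with_suffix_model(const Block& input, Block prev)
{
    assert(!input.empty() && input.size() == prev.size());
    for (std::size_t t = 0; t < input.size(); ++t) {
        const std::size_t items = input[t].size();
        assert(items > 0 && prev[t].size() == items);
        for (std::size_t k = items - 1; k > 0; --k)
            prev[t][k] = input[t][k - 1];
        if (t > 0)
            prev[t][0] = input[t - 1].back();
    }
    return UpResult{prev, input.back().back()};
}
```

For two threads holding {0,1} and {2,9}, the shifted items are {-1,0} and {1,2} (with prev pre-filled with -1) and the suffix is 9.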

Definition at line 244 of file block_shuffle.cuh.

Field Documentation

◆ linear_tid

template<typename T , int BLOCK_DIM_X, int BLOCK_DIM_Y = 1, int BLOCK_DIM_Z = 1, int PTX_ARCH = CUB_PTX_ARCH>
unsigned int cub::BlockShuffle< T, BLOCK_DIM_X, BLOCK_DIM_Y, BLOCK_DIM_Z, PTX_ARCH >::linear_tid
private

Linear thread-id.
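For multi-dimensional blocks, a linear thread id is conventionally obtained by row-major flattening of the (x, y, z) thread index. The host-side sketch below shows that arithmetic; it assumes row-major ordering, and `linear_tid_model` is a hypothetical helper, not a CUB function.

```cpp
#include <cassert>

// Row-major flattening: x varies fastest, then y, then z (assumed ordering).
unsigned int linear_tid_model(unsigned int x, unsigned int y, unsigned int z,
                              unsigned int dim_x, unsigned int dim_y)
{
    assert(x < dim_x && y < dim_y);
    return (z * dim_y + y) * dim_x + x;
}
```

In an 8 x 4 block, the thread at (x=3, y=2, z=1) maps to linear id 51 under this ordering.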

Definition at line 115 of file block_shuffle.cuh.

◆ temp_storage

template<typename T , int BLOCK_DIM_X, int BLOCK_DIM_Y = 1, int BLOCK_DIM_Z = 1, int PTX_ARCH = CUB_PTX_ARCH>
_TempStorage& cub::BlockShuffle< T, BLOCK_DIM_X, BLOCK_DIM_Y, BLOCK_DIM_Z, PTX_ARCH >::temp_storage
private

Shared storage reference.

Definition at line 112 of file block_shuffle.cuh.


The documentation for this class was generated from the following file: