The BlockShuffle class provides collective methods for shuffling data partitioned across a CUDA thread block.
| T | The data type to be exchanged. |
| BLOCK_DIM_X | The thread block length in threads along the X dimension |
| BLOCK_DIM_Y | [optional] The thread block length in threads along the Y dimension (default: 1) |
| BLOCK_DIM_Z | [optional] The thread block length in threads along the Z dimension (default: 1) |
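| PTX_ARCH | [optional] The PTX compute capability for which to specialize this collective, formatted as per the __CUDA_ARCH__ macro (e.g., 350 for sm_35) (default: the value of __CUDA_ARCH__ during the current compiler pass) |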
Definition at line 70 of file block_shuffle.cuh.
Data Structures | |
| struct | _TempStorage |
| Shared memory storage layout type (last element from each thread's input) | |
| struct | TempStorage |
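| The operations exposed by BlockShuffle require a temporary memory allocation of this nested type for thread communication. This opaque storage can be allocated directly using the __shared__ keyword, aliased to externally allocated memory (shared or global), or union'd with other storage allocation types to facilitate memory reuse. | |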
Public Member Functions | |
Collective constructors | |
| __device__ __forceinline__ | BlockShuffle () |
| Collective constructor using a private static allocation of shared memory as temporary storage. | |
| __device__ __forceinline__ | BlockShuffle (TempStorage &temp_storage) |
| Collective constructor using the specified memory allocation as temporary storage. | |
Shuffle movement | |
| __device__ __forceinline__ void | Offset (T input, T &output, int distance=1) |
| Each thread_i obtains the input provided by thread_(i+distance). The offset distance may be negative. | |
| __device__ __forceinline__ void | Rotate (T input, T &output, unsigned int distance=1) |
| Each thread_i obtains the input provided by thread_((i + distance) % BLOCK_THREADS), wrapping around the end of the block. | |
| template<int ITEMS_PER_THREAD> | |
| __device__ __forceinline__ void | Up (T(&input)[ITEMS_PER_THREAD], T(&prev)[ITEMS_PER_THREAD]) |
| The thread block rotates its blocked arrangement of input items, shifting it up by one item. | |
| template<int ITEMS_PER_THREAD> | |
| __device__ __forceinline__ void | Up (T(&input)[ITEMS_PER_THREAD], T(&prev)[ITEMS_PER_THREAD], T &block_suffix) |
| The thread block rotates its blocked arrangement of input items, shifting it up by one item. All threads receive input[ITEMS_PER_THREAD-1] provided by thread_(BLOCK_THREADS-1). | |
| template<int ITEMS_PER_THREAD> | |
| __device__ __forceinline__ void | Down (T(&input)[ITEMS_PER_THREAD], T(&prev)[ITEMS_PER_THREAD]) |
| The thread block rotates its blocked arrangement of input items, shifting it down by one item. | |
| template<int ITEMS_PER_THREAD> | |
| __device__ __forceinline__ void | Down (T(&input)[ITEMS_PER_THREAD], T(&prev)[ITEMS_PER_THREAD], T &block_prefix) |
| The thread block rotates its blocked arrangement of input items, shifting it down by one item. All threads receive input[0] provided by thread_0. | |
Private Member Functions | |
| __device__ __forceinline__ _TempStorage & | PrivateStorage () |
| Internal storage allocator. | |
Private Attributes | |
| _TempStorage & | temp_storage |
| Shared storage reference. | |
| unsigned int | linear_tid |
| Linear thread-id. | |
|
private |
Definition at line 78 of file block_shuffle.cuh.
| __device__ __forceinline__ BlockShuffle () | inline |
Collective constructor using a private static allocation of shared memory as temporary storage.
Definition at line 140 of file block_shuffle.cuh.
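As a minimal sketch of this constructor in use, the kernel below lets BlockShuffle allocate its own shared memory and has each thread read its right-hand neighbour's value. The block size `kBlockThreads`, the kernel `NeighbourKernel`, and the host helper `RunNeighbourExample` are hypothetical names chosen for illustration; they are not part of the library.

```cuda
#include <cub/block/block_shuffle.cuh>
#include <cuda_runtime.h>
#include <vector>

constexpr int kBlockThreads = 64;  // hypothetical block size for this sketch

// Each thread fetches the item held by the thread to its right.
__global__ void NeighbourKernel(const int* in, int* out)
{
    // Specialize BlockShuffle for int items and a 1D block of 64 threads.
    using BlockShuffleT = cub::BlockShuffle<int, kBlockThreads>;

    int item = in[threadIdx.x];
    int next = -1;  // sentinel: the last thread has no successor, so it keeps this

    // Default constructor: the collective uses a private static
    // allocation of shared memory as its temporary storage.
    BlockShuffleT().Offset(item, next, 1);

    out[threadIdx.x] = next;
}

// Hypothetical host helper: launches one block and returns the results.
std::vector<int> RunNeighbourExample()
{
    std::vector<int> host(kBlockThreads);
    for (int i = 0; i < kBlockThreads; ++i) host[i] = 10 * i;

    int* d_in = nullptr;
    int* d_out = nullptr;
    const size_t bytes = kBlockThreads * sizeof(int);
    cudaMalloc(&d_in, bytes);
    cudaMalloc(&d_out, bytes);
    cudaMemcpy(d_in, host.data(), bytes, cudaMemcpyHostToDevice);

    NeighbourKernel<<<1, kBlockThreads>>>(d_in, d_out);

    // The blocking copy also waits for the kernel to finish.
    cudaMemcpy(host.data(), d_out, bytes, cudaMemcpyDeviceToHost);
    cudaFree(d_in);
    cudaFree(d_out);
    return host;
}
```

The kernel must be launched with exactly `kBlockThreads` threads per block, because the block dimensions are fixed at compile time through the template parameters.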
| __device__ __forceinline__ BlockShuffle (TempStorage &temp_storage) | inline |
Collective constructor using the specified memory allocation as temporary storage.
| [in] | temp_storage | Reference to memory allocation having layout type TempStorage |
Definition at line 150 of file block_shuffle.cuh.
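The following sketch shows one way this constructor might be used when the same allocation serves two shuffles in a row. It assumes the usual rule for block-wide collectives that a `__syncthreads()` barrier is needed before the temporary storage is reused. The names `kStencilThreads`, `StencilKernel`, and `RunStencilExample` are hypothetical.

```cuda
#include <cub/block/block_shuffle.cuh>
#include <cuda_runtime.h>
#include <vector>

constexpr int kStencilThreads = 64;  // hypothetical block size for this sketch

// Three-point sum: left neighbour + own item + right neighbour.
__global__ void StencilKernel(const int* in, int* out)
{
    using BlockShuffleT = cub::BlockShuffle<int, kStencilThreads>;

    // Explicit allocation with the TempStorage layout type.
    __shared__ typename BlockShuffleT::TempStorage temp_storage;

    int item = in[threadIdx.x];
    int from_left = item;   // edge threads fall back to their own item
    int from_right = item;

    BlockShuffleT(temp_storage).Offset(item, from_left, -1);

    __syncthreads();  // barrier before the same storage is reused

    BlockShuffleT(temp_storage).Offset(item, from_right, 1);

    out[threadIdx.x] = from_left + item + from_right;
}

// Hypothetical host helper: input is 0, 1, 2, ..., 63.
std::vector<int> RunStencilExample()
{
    std::vector<int> host(kStencilThreads);
    for (int i = 0; i < kStencilThreads; ++i) host[i] = i;

    int* d_in = nullptr;
    int* d_out = nullptr;
    const size_t bytes = kStencilThreads * sizeof(int);
    cudaMalloc(&d_in, bytes);
    cudaMalloc(&d_out, bytes);
    cudaMemcpy(d_in, host.data(), bytes, cudaMemcpyHostToDevice);

    StencilKernel<<<1, kStencilThreads>>>(d_in, d_out);

    cudaMemcpy(host.data(), d_out, bytes, cudaMemcpyDeviceToHost);
    cudaFree(d_in);
    cudaFree(d_out);
    return host;
}
```

Passing the storage explicitly is what makes it possible to place it in a union with the temporary storage of other collectives, at the cost of managing the barriers yourself.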
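| template<int ITEMS_PER_THREAD> __device__ __forceinline__ void Down (T(&input)[ITEMS_PER_THREAD], T(&prev)[ITEMS_PER_THREAD]) | inline |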
The thread block rotates its blocked arrangement of input items, shifting it down by one item.
| [in] | input | The calling thread's input items |
| [out] | prev | The corresponding predecessor items (may be aliased to input). The value prev[0] is not updated for thread_(BLOCK_THREADS-1). |
Definition at line 263 of file block_shuffle.cuh.
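| template<int ITEMS_PER_THREAD> __device__ __forceinline__ void Down (T(&input)[ITEMS_PER_THREAD], T(&prev)[ITEMS_PER_THREAD], T &block_prefix) | inline |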
The thread block rotates its blocked arrangement of input items, shifting it down by one item. All threads receive input[0] provided by thread_0.
| [in] | input | The calling thread's input items |
| [out] | prev | The corresponding predecessor items (may be aliased to input). The value prev[0] is not updated for thread_(BLOCK_THREADS-1). |
| [out] | block_prefix | The item input[0] from thread_0, provided to all threads |
Definition at line 289 of file block_shuffle.cuh.
| __device__ __forceinline__ void Offset (T input, T &output, int distance=1) | inline |
Each thread_i obtains the input provided by thread_(i+distance). The offset distance may be negative.
| [in] | input | The input item from the calling thread (thread_i) |
| [out] | output | The input item from the successor (or predecessor) thread_(i+distance) (may be aliased to input). This value is only updated for thread_i when 0 <= (i + distance) <= BLOCK_THREADS-1 |
| [in] | distance | Offset distance (may be negative) |
Definition at line 171 of file block_shuffle.cuh.
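To illustrate the boundary behaviour with a negative distance, here is a small sketch in which every thread asks for the item two positions to its left. Threads 0 and 1 have no such predecessor, so their output keeps the sentinel they were initialized with. `kOffsetThreads`, `LookBackKernel`, and `RunLookBackExample` are hypothetical names.

```cuda
#include <cub/block/block_shuffle.cuh>
#include <cuda_runtime.h>
#include <vector>

constexpr int kOffsetThreads = 64;  // hypothetical block size for this sketch

__global__ void LookBackKernel(const int* in, int* out)
{
    using BlockShuffleT = cub::BlockShuffle<int, kOffsetThreads>;
    __shared__ typename BlockShuffleT::TempStorage temp_storage;

    int item = in[threadIdx.x];
    int two_back = -1;  // sentinel: left untouched when i - 2 < 0

    // Negative distance: read from the thread two positions earlier.
    BlockShuffleT(temp_storage).Offset(item, two_back, -2);

    out[threadIdx.x] = two_back;
}

// Hypothetical host helper: input is the squares 0, 1, 4, 9, ...
std::vector<int> RunLookBackExample()
{
    std::vector<int> host(kOffsetThreads);
    for (int i = 0; i < kOffsetThreads; ++i) host[i] = i * i;

    int* d_in = nullptr;
    int* d_out = nullptr;
    const size_t bytes = kOffsetThreads * sizeof(int);
    cudaMalloc(&d_in, bytes);
    cudaMalloc(&d_out, bytes);
    cudaMemcpy(d_in, host.data(), bytes, cudaMemcpyHostToDevice);

    LookBackKernel<<<1, kOffsetThreads>>>(d_in, d_out);

    cudaMemcpy(host.data(), d_out, bytes, cudaMemcpyDeviceToHost);
    cudaFree(d_in);
    cudaFree(d_out);
    return host;
}
```

Because out-of-range threads are simply skipped, the caller chooses the boundary value by initializing `output` before the call.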
| __device__ __forceinline__ _TempStorage & PrivateStorage () | inline, private |
Internal storage allocator.
Definition at line 123 of file block_shuffle.cuh.
| __device__ __forceinline__ void Rotate (T input, T &output, unsigned int distance=1) | inline |
Each thread_i obtains the input provided by thread_((i + distance) % BLOCK_THREADS).
| [in] | input | The calling thread's input item |
| [out] | output | The input item from thread_((i + distance) % BLOCK_THREADS) (may be aliased to input). This value is not updated for thread_(BLOCK_THREADS-1) |
| [in] | distance | Offset distance (0 < distance < BLOCK_THREADS) |
Definition at line 191 of file block_shuffle.cuh.
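A short sketch of the wrap-around behaviour, assuming a 1D block of 64 threads and a rotation distance of 3: thread 0 reads from thread 3, and thread 61 wraps around to thread 0. The names `kRotateThreads`, `RotateKernel`, and `RunRotateExample` are hypothetical.

```cuda
#include <cub/block/block_shuffle.cuh>
#include <cuda_runtime.h>
#include <vector>

constexpr int kRotateThreads = 64;  // hypothetical block size for this sketch

__global__ void RotateKernel(const int* in, int* out)
{
    using BlockShuffleT = cub::BlockShuffle<int, kRotateThreads>;
    __shared__ typename BlockShuffleT::TempStorage temp_storage;

    int item = in[threadIdx.x];
    int rotated = item;

    // Distance must satisfy 0 < distance < BLOCK_THREADS.
    BlockShuffleT(temp_storage).Rotate(item, rotated, 3u);

    out[threadIdx.x] = rotated;
}

// Hypothetical host helper: input is 100, 101, ..., 163.
std::vector<int> RunRotateExample()
{
    std::vector<int> host(kRotateThreads);
    for (int i = 0; i < kRotateThreads; ++i) host[i] = 100 + i;

    int* d_in = nullptr;
    int* d_out = nullptr;
    const size_t bytes = kRotateThreads * sizeof(int);
    cudaMalloc(&d_in, bytes);
    cudaMalloc(&d_out, bytes);
    cudaMemcpy(d_in, host.data(), bytes, cudaMemcpyHostToDevice);

    RotateKernel<<<1, kRotateThreads>>>(d_in, d_out);

    cudaMemcpy(host.data(), d_out, bytes, cudaMemcpyDeviceToHost);
    cudaFree(d_in);
    cudaFree(d_out);
    return host;
}
```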
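| template<int ITEMS_PER_THREAD> __device__ __forceinline__ void Up (T(&input)[ITEMS_PER_THREAD], T(&prev)[ITEMS_PER_THREAD]) | inline |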
The thread block rotates its blocked arrangement of input items, shifting it up by one item.
| [in] | input | The calling thread's input items |
| [out] | prev | The corresponding predecessor items (may be aliased to input). The item prev[0] is not updated for thread_0. |
Definition at line 217 of file block_shuffle.cuh.
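| template<int ITEMS_PER_THREAD> __device__ __forceinline__ void Up (T(&input)[ITEMS_PER_THREAD], T(&prev)[ITEMS_PER_THREAD], T &block_suffix) | inline |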
The thread block rotates its blocked arrangement of input items, shifting it up by one item. All threads receive input[ITEMS_PER_THREAD-1] provided by thread_(BLOCK_THREADS-1).
| [in] | input | The calling thread's input items |
| [out] | prev | The corresponding predecessor items (may be aliased to input). The item prev[0] is not updated for thread_0. |
| [out] | block_suffix | The item input[ITEMS_PER_THREAD-1] from thread_(BLOCK_THREADS-1), provided to all threads |
Definition at line 244 of file block_shuffle.cuh.
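The sketch below illustrates the shift on a blocked arrangement, assuming 32 threads with 4 items each, numbered 0 to 127 across the block. After the call every position should hold the number of the item before it, the very first position keeps its sentinel, and every thread holds the block's last item in `block_suffix`. The names `kUpThreads`, `kItemsPerThread`, `ShiftUpKernel`, and `RunShiftUpExample` are hypothetical.

```cuda
#include <cub/block/block_shuffle.cuh>
#include <cuda_runtime.h>
#include <vector>

constexpr int kUpThreads = 32;      // hypothetical block size
constexpr int kItemsPerThread = 4;  // hypothetical items per thread
constexpr int kTotalItems = kUpThreads * kItemsPerThread;

// out must hold kTotalItems + 1 ints; the extra slot stores block_suffix.
__global__ void ShiftUpKernel(int* out)
{
    using BlockShuffleT = cub::BlockShuffle<int, kUpThreads>;
    __shared__ typename BlockShuffleT::TempStorage temp_storage;

    const int base = static_cast<int>(threadIdx.x) * kItemsPerThread;

    int items[kItemsPerThread];
    int prev[kItemsPerThread];
    for (int k = 0; k < kItemsPerThread; ++k)
    {
        items[k] = base + k;  // blocked arrangement: consecutive items per thread
        prev[k] = -1;         // sentinel: thread 0 keeps this in prev[0]
    }

    int block_suffix = -1;
    BlockShuffleT(temp_storage).Up(items, prev, block_suffix);

    for (int k = 0; k < kItemsPerThread; ++k)
        out[base + k] = prev[k];

    if (threadIdx.x == 0)
        out[kTotalItems] = block_suffix;
}

// Hypothetical host helper: returns the shifted items followed by block_suffix.
std::vector<int> RunShiftUpExample()
{
    std::vector<int> host(kTotalItems + 1);

    int* d_out = nullptr;
    const size_t bytes = host.size() * sizeof(int);
    cudaMalloc(&d_out, bytes);

    ShiftUpKernel<<<1, kUpThreads>>>(d_out);

    cudaMemcpy(host.data(), d_out, bytes, cudaMemcpyDeviceToHost);
    cudaFree(d_out);
    return host;
}
```

The two-argument overload of `Up` is called the same way, just without the `block_suffix` argument.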
| unsigned int linear_tid | private |
Linear thread-id.
Definition at line 115 of file block_shuffle.cuh.
| _TempStorage & temp_storage | private |
Shared storage reference.
Definition at line 112 of file block_shuffle.cuh.