The BlockStore class provides collective data movement methods for writing a blocked arrangement of items partitioned across a CUDA thread block to a linear segment of memory.

Template Parameters | |
| T | The type of data to be written. |
| BLOCK_DIM_X | The thread block length in threads along the X dimension. |
| ITEMS_PER_THREAD | The number of consecutive items partitioned onto each thread. |
| ALGORITHM | [optional] cub::BlockStoreAlgorithm tuning policy enumeration (default: cub::BLOCK_STORE_DIRECT) |
| WARP_TIME_SLICING | [optional] Whether only one warp's worth of shared memory should be allocated and time-sliced among block-warps during any store-related data transpositions, versus each warp having its own storage (default: false) |
| BLOCK_DIM_Y | [optional] The thread block length in threads along the Y dimension (default: 1) |
| BLOCK_DIM_Z | [optional] The thread block length in threads along the Z dimension (default: 1) |
| PTX_ARCH | [optional] The PTX compute capability for which to specialize this collective, formatted as per the __CUDA_ARCH__ macro, e.g. 350 for sm_35 (default: the value of __CUDA_ARCH__ during the current compiler pass) |
When BlockStore is specialized for BLOCK_STORE_WARP_TRANSPOSE, items are locally reordered among threads so that memory references will be efficiently coalesced using a warp-striped access pattern. For example, suppose a block of 128 threads holds a blocked arrangement of 512 integers, with each thread owning 4 consecutive items, so that thread_data across the block of threads is { [0,1,2,3], [4,5,6,7], ..., [508,509,510,511] }. The output d_data will be 0, 1, 2, 3, 4, 5, ....

Definition at line 515 of file block_store.cuh.
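The 512-integer case described above can be sketched as a small program. This is a minimal illustration rather than the library's own sample: the kernel name `ExampleStoreKernel`, the host helper `RunExampleStore`, and the constants `kBlockThreads` and `kItemsPerThread` are hypothetical names introduced here, and CUDA error checking is left out for brevity.

```cuda
#include <cub/cub.cuh>
#include <cuda_runtime.h>
#include <vector>

// Assumed sizes matching the example: 128 threads x 4 items = 512 items.
constexpr int kBlockThreads   = 128;
constexpr int kItemsPerThread = 4;

// Hypothetical kernel: each thread fills its 4 consecutive items, then the
// block collectively writes them to d_data.
__global__ void ExampleStoreKernel(int *d_data)
{
    // Specialize BlockStore for a 1D block of 128 threads owning 4 ints each.
    using BlockStore = cub::BlockStore<int, kBlockThreads, kItemsPerThread,
                                       cub::BLOCK_STORE_WARP_TRANSPOSE>;

    // Shared memory needed by the collective for the transposition.
    __shared__ typename BlockStore::TempStorage temp_storage;

    // Blocked arrangement: thread t owns items [4t, 4t+1, 4t+2, 4t+3].
    int thread_data[kItemsPerThread];
    for (int i = 0; i < kItemsPerThread; ++i)
        thread_data[i] = threadIdx.x * kItemsPerThread + i;

    // Collective store; every thread in the block must call it.
    BlockStore(temp_storage).Store(d_data, thread_data);
}

// Hypothetical host helper: launches one block and copies the result back.
std::vector<int> RunExampleStore()
{
    const int num_items = kBlockThreads * kItemsPerThread;
    int *d_data = nullptr;
    cudaMalloc(&d_data, num_items * sizeof(int));

    ExampleStoreKernel<<<1, kBlockThreads>>>(d_data);

    std::vector<int> h_data(num_items);
    // cudaMemcpy waits for the kernel to finish before copying.
    cudaMemcpy(h_data.data(), d_data, num_items * sizeof(int),
               cudaMemcpyDeviceToHost);
    cudaFree(d_data);
    return h_data;
}
```

Only the first four template arguments are spelled out here, so the block dimensions along Y and Z and the PTX architecture keep their defaults.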
Data Structures | |
| struct | StoreInternal |
| Store helper. | |
| struct | StoreInternal< BLOCK_STORE_DIRECT, DUMMY > |
| struct | StoreInternal< BLOCK_STORE_TRANSPOSE, DUMMY > |
| struct | StoreInternal< BLOCK_STORE_VECTORIZE, DUMMY > |
| struct | StoreInternal< BLOCK_STORE_WARP_TRANSPOSE, DUMMY > |
| struct | StoreInternal< BLOCK_STORE_WARP_TRANSPOSE_TIMESLICED, DUMMY > |
| struct | TempStorage |
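| The operations exposed by BlockStore require a temporary memory allocation of this nested type for thread communication. This opaque storage can be allocated directly using the __shared__ keyword, or it can be aliased to externally allocated memory (shared or global) or union'd with other storage allocation types to facilitate memory reuse. | |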
Public Member Functions | |
Collective constructors | |
| __device__ __forceinline__ | BlockStore () |
| Collective constructor using a private static allocation of shared memory as temporary storage. | |
| __device__ __forceinline__ | BlockStore (TempStorage &temp_storage) |
| Collective constructor using the specified memory allocation as temporary storage. | |
Data movement | |
| template<typename OutputIteratorT > | |
| __device__ __forceinline__ void | Store (OutputIteratorT block_itr, T(&items)[ITEMS_PER_THREAD]) |
| Store items into a linear segment of memory. | |
| template<typename OutputIteratorT > | |
| __device__ __forceinline__ void | Store (OutputIteratorT block_itr, T(&items)[ITEMS_PER_THREAD], int valid_items) |
| Store items into a linear segment of memory, guarded by range. | |
Private Types | |
| enum | { BLOCK_THREADS = BLOCK_DIM_X * BLOCK_DIM_Y * BLOCK_DIM_Z } |
| Constants. | |
| typedef StoreInternal< ALGORITHM, 0 > | InternalStore |
| Internal store implementation to use. | |
| typedef InternalStore::TempStorage | _TempStorage |
| Shared memory storage layout type. | |
Private Member Functions | |
| __device__ __forceinline__ _TempStorage & | PrivateStorage () |
| Internal storage allocator. | |
Private Attributes | |
| _TempStorage & | temp_storage |
| Thread reference to shared storage. | |
| int | linear_tid |
| Linear thread-id. | |
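The range-guarded overload of Store listed under Data movement can be sketched in the same way. In this illustration the kernel `GuardedStoreKernel`, the host helper `RunGuardedStore`, and the constants are hypothetical names, the output buffer is pre-filled with -1 so untouched elements are visible, and error checking is omitted.

```cuda
#include <cub/cub.cuh>
#include <cuda_runtime.h>
#include <vector>

// Assumed sizes for this sketch: 128 threads x 4 items = 512 items.
constexpr int kGuardedThreads = 128;
constexpr int kGuardedItems   = 4;

// Hypothetical kernel: writes only the first valid_items of the tile.
__global__ void GuardedStoreKernel(int *d_data, int valid_items)
{
    using BlockStore = cub::BlockStore<int, kGuardedThreads, kGuardedItems,
                                       cub::BLOCK_STORE_WARP_TRANSPOSE>;
    __shared__ typename BlockStore::TempStorage temp_storage;

    // Blocked arrangement: thread t owns items [4t, 4t+1, 4t+2, 4t+3].
    int thread_data[kGuardedItems];
    for (int i = 0; i < kGuardedItems; ++i)
        thread_data[i] = threadIdx.x * kGuardedItems + i;

    // Guarded store: items at linear offsets >= valid_items are not written.
    BlockStore(temp_storage).Store(d_data, thread_data, valid_items);
}

// Hypothetical host helper: returns the full 512-element buffer after the
// guarded store so the caller can see which elements were written.
std::vector<int> RunGuardedStore(int valid_items)
{
    const int num_items = kGuardedThreads * kGuardedItems;
    const size_t bytes  = num_items * sizeof(int);

    int *d_data = nullptr;
    cudaMalloc(&d_data, bytes);
    // Setting every byte to 0xFF makes each int read back as -1.
    cudaMemset(d_data, 0xFF, bytes);

    GuardedStoreKernel<<<1, kGuardedThreads>>>(d_data, valid_items);

    std::vector<int> h_data(num_items);
    cudaMemcpy(h_data.data(), d_data, bytes, cudaMemcpyDeviceToHost);
    cudaFree(d_data);
    return h_data;
}
```

Calling `RunGuardedStore(5)` should leave the first five elements holding 0 through 4 and the remaining elements at their -1 fill value, which is the typical pattern for handling a partially full last tile.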