The BlockExchange class provides collective methods for rearranging data partitioned across a CUDA thread block. More...

Detailed Description

template<typename InputT, int BLOCK_DIM_X, int ITEMS_PER_THREAD, bool WARP_TIME_SLICING = false, int BLOCK_DIM_Y = 1, int BLOCK_DIM_Z = 1, int PTX_ARCH = CUB_PTX_ARCH>
class cub::BlockExchange< InputT, BLOCK_DIM_X, ITEMS_PER_THREAD, WARP_TIME_SLICING, BLOCK_DIM_Y, BLOCK_DIM_Z, PTX_ARCH >

The BlockExchange class provides collective methods for rearranging data partitioned across a CUDA thread block.

Template Parameters

T	The data type to be exchanged.
BLOCK_DIM_X	The thread block length in threads along the X dimension
ITEMS_PER_THREAD	The number of items partitioned onto each thread.
WARP_TIME_SLICING	[optional] When `true`, only use enough shared memory for a single warp's worth of tile data, time-slicing the block-wide exchange over multiple synchronized rounds. Yields a smaller memory footprint at the expense of decreased parallelism. (Default: false)
BLOCK_DIM_Y	[optional] The thread block length in threads along the Y dimension (default: 1)
BLOCK_DIM_Z	[optional] The thread block length in threads along the Z dimension (default: 1)
PTX_ARCH	[optional] \ptxversion

Overview

It is commonplace for blocks of threads to rearrange data items between threads. For example, the device-accessible memory subsystem prefers access patterns where data items are "striped" across threads (where consecutive threads access consecutive items), yet most block-wide operations prefer a "blocked" partitioning of items across threads (where consecutive items belong to a single thread).
BlockExchange supports the following types of data exchanges:
- Transposing between blocked and striped arrangements
- Transposing between blocked and warp-striped arrangements
- Scattering ranked items to a blocked arrangement
- Scattering ranked items to a striped arrangement
\rowmajor

A Simple Example: \blockcollective{BlockExchange}

: The code snippet below illustrates the conversion from a "blocked" to a "striped" arrangement of 512 integer items partitioned across 128 threads where each thread owns 4 items.

: #include <cub/cub.cuh> // or equivalently <cub/block/block_exchange.cuh>

__global__ void ExampleKernel(int *d_data, ...)

{

// Specialize BlockExchange for a 1D block of 128 threads owning 4 integer items each

typedef cub::BlockExchange<int, 128, 4> BlockExchange;

// Allocate shared memory for BlockExchange

__shared__ typename BlockExchange::TempStorage temp_storage;

// Load a tile of data striped across threads

int thread_data[4];

cub::LoadDirectStriped<128>(threadIdx.x, d_data, thread_data);

// Collectively exchange data into a blocked arrangement across threads

BlockExchange(temp_storage).StripedToBlocked(thread_data);

cub::BlockExchange
The BlockExchange class provides collective methods for rearranging data partitioned across a CUDA th...
Definition block_exchange.cuh:117

cub::BlockExchange::BlockExchange
__device__ __forceinline__ BlockExchange()
Collective constructor using a private static allocation of shared memory as temporary storage.
Definition block_exchange.cuh:713

cub::BlockExchange::temp_storage
_TempStorage & temp_storage
Shared storage reference.
Definition block_exchange.cuh:175

cub::BlockExchange::TempStorage
\smemstorage{BlockExchange}
Definition block_exchange.cuh:165

: Suppose the set of striped input thread_data across the block of threads is { [0,128,256,384], [1,129,257,385], ..., [127,255,383,511] }. The corresponding output thread_data in those threads will be { [0,1,2,3], [4,5,6,7], [8,9,10,11], ..., [508,509,510,511] }.

Performance Considerations

Proper device-specific padding ensures zero bank conflicts for most types.

Definition at line 116 of file block_exchange.cuh.

Data Structures
struct	TempStorage
	\smemstorage{BlockExchange} More...

Public Member Functions
Collective constructors
__device__ __forceinline__	BlockExchange ()
	Collective constructor using a private static allocation of shared memory as temporary storage.

__device__ __forceinline__	BlockExchange (TempStorage &temp_storage)
	Collective constructor using the specified memory allocation as temporary storage.

Structured exchanges
template<typename OutputT >
__device__ __forceinline__ void	StripedToBlocked (InputT input_items[ITEMS_PER_THREAD], OutputT output_items[ITEMS_PER_THREAD])
	Transposes data items from striped arrangement to blocked arrangement.

template<typename OutputT >
__device__ __forceinline__ void	BlockedToStriped (InputT input_items[ITEMS_PER_THREAD], OutputT output_items[ITEMS_PER_THREAD])
	Transposes data items from blocked arrangement to striped arrangement.

template<typename OutputT >
__device__ __forceinline__ void	WarpStripedToBlocked (InputT input_items[ITEMS_PER_THREAD], OutputT output_items[ITEMS_PER_THREAD])
	Transposes data items from warp-striped arrangement to blocked arrangement.

template<typename OutputT >
__device__ __forceinline__ void	BlockedToWarpStriped (InputT input_items[ITEMS_PER_THREAD], OutputT output_items[ITEMS_PER_THREAD])
	Transposes data items from blocked arrangement to warp-striped arrangement.

Scatter exchanges
template<typename OutputT , typename OffsetT >
__device__ __forceinline__ void	ScatterToBlocked (InputT input_items[ITEMS_PER_THREAD], OutputT output_items[ITEMS_PER_THREAD], OffsetT ranks[ITEMS_PER_THREAD])
	Exchanges data items annotated by rank into blocked arrangement.

template<typename OutputT , typename OffsetT >
__device__ __forceinline__ void	ScatterToStriped (InputT input_items[ITEMS_PER_THREAD], OutputT output_items[ITEMS_PER_THREAD], OffsetT ranks[ITEMS_PER_THREAD])
	Exchanges data items annotated by rank into striped arrangement.

template<typename OutputT , typename OffsetT >
__device__ __forceinline__ void	ScatterToStripedGuarded (InputT input_items[ITEMS_PER_THREAD], OutputT output_items[ITEMS_PER_THREAD], OffsetT ranks[ITEMS_PER_THREAD])
	Exchanges data items annotated by rank into striped arrangement. Items with rank -1 are not exchanged.

template<typename OutputT , typename OffsetT , typename ValidFlag >
__device__ __forceinline__ void	ScatterToStripedFlagged (InputT input_items[ITEMS_PER_THREAD], OutputT output_items[ITEMS_PER_THREAD], OffsetT ranks[ITEMS_PER_THREAD], ValidFlag is_valid[ITEMS_PER_THREAD])
	Exchanges valid data items annotated by rank into striped arrangement.

__device__ __forceinline__ void	StripedToBlocked (InputT items[ITEMS_PER_THREAD])

__device__ __forceinline__ void	BlockedToStriped (InputT items[ITEMS_PER_THREAD])

__device__ __forceinline__ void	WarpStripedToBlocked (InputT items[ITEMS_PER_THREAD])

__device__ __forceinline__ void	BlockedToWarpStriped (InputT items[ITEMS_PER_THREAD])

template<typename OffsetT >
__device__ __forceinline__ void	ScatterToBlocked (InputT items[ITEMS_PER_THREAD], OffsetT ranks[ITEMS_PER_THREAD])

template<typename OffsetT >
__device__ __forceinline__ void	ScatterToStriped (InputT items[ITEMS_PER_THREAD], OffsetT ranks[ITEMS_PER_THREAD])

template<typename OffsetT >
__device__ __forceinline__ void	ScatterToStripedGuarded (InputT items[ITEMS_PER_THREAD], OffsetT ranks[ITEMS_PER_THREAD])

template<typename OffsetT , typename ValidFlag >
__device__ __forceinline__ void	ScatterToStripedFlagged (InputT items[ITEMS_PER_THREAD], OffsetT ranks[ITEMS_PER_THREAD], ValidFlag is_valid[ITEMS_PER_THREAD])

Private Types
enum	{ BLOCK_THREADS = BLOCK_DIM_X * BLOCK_DIM_Y * BLOCK_DIM_Z , LOG_WARP_THREADS = CUB_LOG_WARP_THREADS(PTX_ARCH) , WARP_THREADS = 1 << LOG_WARP_THREADS , WARPS = (BLOCK_THREADS + WARP_THREADS - 1) / WARP_THREADS , LOG_SMEM_BANKS = CUB_LOG_SMEM_BANKS(PTX_ARCH) , SMEM_BANKS = 1 << LOG_SMEM_BANKS , TILE_ITEMS = BLOCK_THREADS * ITEMS_PER_THREAD , TIME_SLICES = (WARP_TIME_SLICING) ? WARPS : 1 , TIME_SLICED_THREADS = (WARP_TIME_SLICING) ? CUB_MIN(BLOCK_THREADS, WARP_THREADS) : BLOCK_THREADS , TIME_SLICED_ITEMS = TIME_SLICED_THREADS * ITEMS_PER_THREAD , WARP_TIME_SLICED_THREADS = CUB_MIN(BLOCK_THREADS, WARP_THREADS) , WARP_TIME_SLICED_ITEMS = WARP_TIME_SLICED_THREADS * ITEMS_PER_THREAD , INSERT_PADDING = (ITEMS_PER_THREAD > 4) && (PowerOfTwo<ITEMS_PER_THREAD>::VALUE) , PADDING_ITEMS = (INSERT_PADDING) ? (TIME_SLICED_ITEMS >> LOG_SMEM_BANKS) : 0 }
	Constants. More...

Private Member Functions
struct	__align__ (16) _TempStorage
	Shared memory storage layout type.

__device__ __forceinline__ _TempStorage &	PrivateStorage ()
	Internal storage allocator.

template<typename OutputT >
__device__ __forceinline__ void	BlockedToStriped (InputT input_items[ITEMS_PER_THREAD], OutputT output_items[ITEMS_PER_THREAD], Int2Type< false >)

template<typename OutputT >
__device__ __forceinline__ void	BlockedToStriped (InputT input_items[ITEMS_PER_THREAD], OutputT output_items[ITEMS_PER_THREAD], Int2Type< true >)

template<typename OutputT >
__device__ __forceinline__ void	BlockedToWarpStriped (InputT input_items[ITEMS_PER_THREAD], OutputT output_items[ITEMS_PER_THREAD], Int2Type< false >)

template<typename OutputT >
__device__ __forceinline__ void	BlockedToWarpStriped (InputT input_items[ITEMS_PER_THREAD], OutputT output_items[ITEMS_PER_THREAD], Int2Type< true >)

template<typename OutputT >
__device__ __forceinline__ void	StripedToBlocked (InputT input_items[ITEMS_PER_THREAD], OutputT output_items[ITEMS_PER_THREAD], Int2Type< false >)

template<typename OutputT >
__device__ __forceinline__ void	StripedToBlocked (InputT input_items[ITEMS_PER_THREAD], OutputT output_items[ITEMS_PER_THREAD], Int2Type< true >)

template<typename OutputT >
__device__ __forceinline__ void	WarpStripedToBlocked (InputT input_items[ITEMS_PER_THREAD], OutputT output_items[ITEMS_PER_THREAD], Int2Type< false >)

template<typename OutputT >
__device__ __forceinline__ void	WarpStripedToBlocked (InputT input_items[ITEMS_PER_THREAD], OutputT output_items[ITEMS_PER_THREAD], Int2Type< true >)

template<typename OutputT , typename OffsetT >
__device__ __forceinline__ void	ScatterToBlocked (InputT input_items[ITEMS_PER_THREAD], OutputT output_items[ITEMS_PER_THREAD], OffsetT ranks[ITEMS_PER_THREAD], Int2Type< false >)

template<typename OutputT , typename OffsetT >
__device__ __forceinline__ void	ScatterToBlocked (InputT input_items[ITEMS_PER_THREAD], OutputT output_items[ITEMS_PER_THREAD], OffsetT ranks[ITEMS_PER_THREAD], Int2Type< true >)

template<typename OutputT , typename OffsetT >
__device__ __forceinline__ void	ScatterToStriped (InputT input_items[ITEMS_PER_THREAD], OutputT output_items[ITEMS_PER_THREAD], OffsetT ranks[ITEMS_PER_THREAD], Int2Type< false >)

template<typename OutputT , typename OffsetT >
__device__ __forceinline__ void	ScatterToStriped (InputT input_items[ITEMS_PER_THREAD], OutputT output_items[ITEMS_PER_THREAD], OffsetT ranks[ITEMS_PER_THREAD], Int2Type< true >)

Private Attributes
_TempStorage &	temp_storage
	Shared storage reference.

unsigned int	linear_tid
	Linear thread-id.

unsigned int	lane_id

unsigned int	warp_id

unsigned int	warp_offset

Member Enumeration Documentation

◆ anonymous enum

template<typename InputT , int BLOCK_DIM_X, int ITEMS_PER_THREAD, bool WARP_TIME_SLICING = false, int BLOCK_DIM_Y = 1, int BLOCK_DIM_Z = 1, int PTX_ARCH = CUB_PTX_ARCH>

anonymous enum

private

Constants.

Enumerator
BLOCK_THREADS	The thread block size in threads.

Definition at line 125 of file block_exchange.cuh.

Constructor & Destructor Documentation

◆ BlockExchange() [1/2]

template<typename InputT , int BLOCK_DIM_X, int ITEMS_PER_THREAD, bool WARP_TIME_SLICING = false, int BLOCK_DIM_Y = 1, int BLOCK_DIM_Z = 1, int PTX_ARCH = CUB_PTX_ARCH>

__device__ __forceinline__ cub::BlockExchange< InputT, BLOCK_DIM_X, ITEMS_PER_THREAD, WARP_TIME_SLICING, BLOCK_DIM_Y, BLOCK_DIM_Z, PTX_ARCH >::BlockExchange ( )

inline

Collective constructor using a private static allocation of shared memory as temporary storage.

Definition at line 713 of file block_exchange.cuh.

◆ BlockExchange() [2/2]

template<typename InputT , int BLOCK_DIM_X, int ITEMS_PER_THREAD, bool WARP_TIME_SLICING = false, int BLOCK_DIM_Y = 1, int BLOCK_DIM_Z = 1, int PTX_ARCH = CUB_PTX_ARCH>

__device__ __forceinline__ cub::BlockExchange< InputT, BLOCK_DIM_X, ITEMS_PER_THREAD, WARP_TIME_SLICING, BLOCK_DIM_Y, BLOCK_DIM_Z, PTX_ARCH >::BlockExchange ( TempStorage & temp_storage )

inline

Collective constructor using the specified memory allocation as temporary storage.

Parameters

[in] temp_storage Reference to memory allocation having layout type TempStorage

Definition at line 726 of file block_exchange.cuh.

Member Function Documentation

◆ align()

template<typename InputT , int BLOCK_DIM_X, int ITEMS_PER_THREAD, bool WARP_TIME_SLICING = false, int BLOCK_DIM_Y = 1, int BLOCK_DIM_Z = 1, int PTX_ARCH = CUB_PTX_ARCH>

struct cub::BlockExchange< InputT, BLOCK_DIM_X, ITEMS_PER_THREAD, WARP_TIME_SLICING, BLOCK_DIM_Y, BLOCK_DIM_Z, PTX_ARCH >::__align__ ( 16 )

inlineprivate

Shared memory storage layout type.

Definition at line 1247 of file block_exchange.cuh.

◆ BlockedToStriped() [1/4]

template<typename InputT , int BLOCK_DIM_X, int ITEMS_PER_THREAD, bool WARP_TIME_SLICING = false, int BLOCK_DIM_Y = 1, int BLOCK_DIM_Z = 1, int PTX_ARCH = CUB_PTX_ARCH>

template<typename OutputT >

__device__ __forceinline__ void cub::BlockExchange< InputT, BLOCK_DIM_X, ITEMS_PER_THREAD, WARP_TIME_SLICING, BLOCK_DIM_Y, BLOCK_DIM_Z, PTX_ARCH >::BlockedToStriped	(	InputT	input_items[ITEMS_PER_THREAD],
		OutputT	output_items[ITEMS_PER_THREAD]
	)

inline

Transposes data items from blocked arrangement to striped arrangement.

\smemreuse

Snippet: The code snippet below illustrates the conversion from a "blocked" to a "striped" arrangement of 512 integer items partitioned across 128 threads where each thread owns 4 items.

: #include <cub/cub.cuh> // or equivalently <cub/block/block_exchange.cuh>

__global__ void ExampleKernel(int *d_data, ...)

{

// Specialize BlockExchange for a 1D block of 128 threads owning 4 integer items each

typedef cub::BlockExchange<int, 128, 4> BlockExchange;

// Allocate shared memory for BlockExchange

__shared__ typename BlockExchange::TempStorage temp_storage;

// Obtain a segment of consecutive items that are blocked across threads

int thread_data[4];

...

// Collectively exchange data into a striped arrangement across threads

BlockExchange(temp_storage).BlockedToStriped(thread_data, thread_data);

// Store data striped across block threads into an ordered tile

cub::StoreDirectStriped<STORE_DEFAULT, 128>(threadIdx.x, d_data, thread_data);

: Suppose the set of blocked input thread_data across the block of threads is { [0,1,2,3], [4,5,6,7], [8,9,10,11], ..., [508,509,510,511] }. The corresponding output thread_data in those threads will be { [0,128,256,384], [1,129,257,385], ..., [127,255,383,511] } in preparation for storing to device-accessible memory.

Parameters

[in]	input_items	Items to exchange, converting between striped and blocked arrangements.
[out]	output_items	Items from exchange, converting between striped and blocked arrangements.

Definition at line 829 of file block_exchange.cuh.

◆ BlockedToStriped() [2/4]

template<typename InputT , int BLOCK_DIM_X, int ITEMS_PER_THREAD, bool WARP_TIME_SLICING = false, int BLOCK_DIM_Y = 1, int BLOCK_DIM_Z = 1, int PTX_ARCH = CUB_PTX_ARCH>

template<typename OutputT >

__device__ __forceinline__ void cub::BlockExchange< InputT, BLOCK_DIM_X, ITEMS_PER_THREAD, WARP_TIME_SLICING, BLOCK_DIM_Y, BLOCK_DIM_Z, PTX_ARCH >::BlockedToStriped	(	InputT	input_items[ITEMS_PER_THREAD],
		OutputT	output_items[ITEMS_PER_THREAD],
		Int2Type< false >
	)

inlineprivate

Transposes data items from blocked arrangement to striped arrangement. Specialized for no timeslicing.

Parameters

[in]	input_items	Items to exchange, converting between blocked and striped arrangements.
[out]	output_items	Items to exchange, converting between blocked and striped arrangements.

Definition at line 200 of file block_exchange.cuh.

◆ BlockedToStriped() [3/4]

template<typename InputT , int BLOCK_DIM_X, int ITEMS_PER_THREAD, bool WARP_TIME_SLICING = false, int BLOCK_DIM_Y = 1, int BLOCK_DIM_Z = 1, int PTX_ARCH = CUB_PTX_ARCH>

template<typename OutputT >

__device__ __forceinline__ void cub::BlockExchange< InputT, BLOCK_DIM_X, ITEMS_PER_THREAD, WARP_TIME_SLICING, BLOCK_DIM_Y, BLOCK_DIM_Z, PTX_ARCH >::BlockedToStriped	(	InputT	input_items[ITEMS_PER_THREAD],
		OutputT	output_items[ITEMS_PER_THREAD],
		Int2Type< true >
	)

inlineprivate

Transposes data items from blocked arrangement to striped arrangement. Specialized for warp-timeslicing.

Parameters

[in]	input_items	Items to exchange, converting between blocked and striped arrangements.
[out]	output_items	Items to exchange, converting between blocked and striped arrangements.

Definition at line 229 of file block_exchange.cuh.

◆ BlockedToStriped() [4/4]

template<typename InputT , int BLOCK_DIM_X, int ITEMS_PER_THREAD, bool WARP_TIME_SLICING = false, int BLOCK_DIM_Y = 1, int BLOCK_DIM_Z = 1, int PTX_ARCH = CUB_PTX_ARCH>

__device__ __forceinline__ void cub::BlockExchange< InputT, BLOCK_DIM_X, ITEMS_PER_THREAD, WARP_TIME_SLICING, BLOCK_DIM_Y, BLOCK_DIM_Z, PTX_ARCH >::BlockedToStriped ( InputT items[ITEMS_PER_THREAD] )

inline

Parameters

items [in-out] Items to exchange, converting between striped and blocked arrangements.

Definition at line 1069 of file block_exchange.cuh.

◆ BlockedToWarpStriped() [1/4]

template<typename InputT , int BLOCK_DIM_X, int ITEMS_PER_THREAD, bool WARP_TIME_SLICING = false, int BLOCK_DIM_Y = 1, int BLOCK_DIM_Z = 1, int PTX_ARCH = CUB_PTX_ARCH>

template<typename OutputT >

__device__ __forceinline__ void cub::BlockExchange< InputT, BLOCK_DIM_X, ITEMS_PER_THREAD, WARP_TIME_SLICING, BLOCK_DIM_Y, BLOCK_DIM_Z, PTX_ARCH >::BlockedToWarpStriped	(	InputT	input_items[ITEMS_PER_THREAD],
		OutputT	output_items[ITEMS_PER_THREAD]
	)

inline

Transposes data items from blocked arrangement to warp-striped arrangement.

\smemreuse

Snippet: The code snippet below illustrates the conversion from a "blocked" to a "warp-striped" arrangement of 512 integer items partitioned across 128 threads where each thread owns 4 items.

: #include <cub/cub.cuh> // or equivalently <cub/block/block_exchange.cuh>

__global__ void ExampleKernel(int *d_data, ...)

{

// Specialize BlockExchange for a 1D block of 128 threads owning 4 integer items each

typedef cub::BlockExchange<int, 128, 4> BlockExchange;

// Allocate shared memory for BlockExchange

__shared__ typename BlockExchange::TempStorage temp_storage;

// Obtain a segment of consecutive items that are blocked across threads

int thread_data[4];

...

// Collectively exchange data into a warp-striped arrangement across threads

BlockExchange(temp_storage).BlockedToWarpStriped(thread_data, thread_data);

// Store data striped across warp threads into an ordered tile

cub::StoreDirectStriped<STORE_DEFAULT, 128>(threadIdx.x, d_data, thread_data);

: Suppose the set of blocked input thread_data across the block of threads is { [0,1,2,3], [4,5,6,7], [8,9,10,11], ..., [508,509,510,511] }. The corresponding output thread_data in those threads will be { [0,32,64,96], [1,33,65,97], [2,34,66,98], ..., [415,447,479,511] } in preparation for storing to device-accessible memory. (The first 128 items are striped across the first warp of 32 threads, the second 128 items are striped across the second warp, etc.)

Parameters

[in]	input_items	Items to exchange, converting between striped and blocked arrangements.
[out]	output_items	Items from exchange, converting between striped and blocked arrangements.

Definition at line 928 of file block_exchange.cuh.

◆ BlockedToWarpStriped() [2/4]

template<typename InputT , int BLOCK_DIM_X, int ITEMS_PER_THREAD, bool WARP_TIME_SLICING = false, int BLOCK_DIM_Y = 1, int BLOCK_DIM_Z = 1, int PTX_ARCH = CUB_PTX_ARCH>

template<typename OutputT >

__device__ __forceinline__ void cub::BlockExchange< InputT, BLOCK_DIM_X, ITEMS_PER_THREAD, WARP_TIME_SLICING, BLOCK_DIM_Y, BLOCK_DIM_Z, PTX_ARCH >::BlockedToWarpStriped	(	InputT	input_items[ITEMS_PER_THREAD],
		OutputT	output_items[ITEMS_PER_THREAD],
		Int2Type< false >
	)

inlineprivate

Transposes data items from blocked arrangement to warp-striped arrangement. Specialized for no timeslicing

Parameters

[in]	input_items	Items to exchange, converting between blocked and striped arrangements.
[out]	output_items	Items to exchange, converting between blocked and striped arrangements.

Definition at line 289 of file block_exchange.cuh.

◆ BlockedToWarpStriped() [3/4]

template<typename InputT , int BLOCK_DIM_X, int ITEMS_PER_THREAD, bool WARP_TIME_SLICING = false, int BLOCK_DIM_Y = 1, int BLOCK_DIM_Z = 1, int PTX_ARCH = CUB_PTX_ARCH>

template<typename OutputT >

__device__ __forceinline__ void cub::BlockExchange< InputT, BLOCK_DIM_X, ITEMS_PER_THREAD, WARP_TIME_SLICING, BLOCK_DIM_Y, BLOCK_DIM_Z, PTX_ARCH >::BlockedToWarpStriped	(	InputT	input_items[ITEMS_PER_THREAD],
		OutputT	output_items[ITEMS_PER_THREAD],
		Int2Type< true >
	)

inlineprivate

Transposes data items from blocked arrangement to warp-striped arrangement. Specialized for warp-timeslicing

Parameters

[in]	input_items	Items to exchange, converting between blocked and striped arrangements.
[out]	output_items	Items to exchange, converting between blocked and striped arrangements.

Definition at line 317 of file block_exchange.cuh.

◆ BlockedToWarpStriped() [4/4]

template<typename InputT , int BLOCK_DIM_X, int ITEMS_PER_THREAD, bool WARP_TIME_SLICING = false, int BLOCK_DIM_Y = 1, int BLOCK_DIM_Z = 1, int PTX_ARCH = CUB_PTX_ARCH>

__device__ __forceinline__ void cub::BlockExchange< InputT, BLOCK_DIM_X, ITEMS_PER_THREAD, WARP_TIME_SLICING, BLOCK_DIM_Y, BLOCK_DIM_Z, PTX_ARCH >::BlockedToWarpStriped ( InputT items[ITEMS_PER_THREAD] )

inline

Parameters

items [in-out] Items to exchange, converting between striped and blocked arrangements.

Definition at line 1081 of file block_exchange.cuh.

◆ PrivateStorage()

template<typename InputT , int BLOCK_DIM_X, int ITEMS_PER_THREAD, bool WARP_TIME_SLICING = false, int BLOCK_DIM_Y = 1, int BLOCK_DIM_Z = 1, int PTX_ARCH = CUB_PTX_ARCH>

__device__ __forceinline__ _TempStorage & cub::BlockExchange< InputT, BLOCK_DIM_X, ITEMS_PER_THREAD, WARP_TIME_SLICING, BLOCK_DIM_Y, BLOCK_DIM_Z, PTX_ARCH >::PrivateStorage ( )

inlineprivate

Internal storage allocator.

Definition at line 189 of file block_exchange.cuh.

◆ ScatterToBlocked() [1/4]

template<typename InputT , int BLOCK_DIM_X, int ITEMS_PER_THREAD, bool WARP_TIME_SLICING = false, int BLOCK_DIM_Y = 1, int BLOCK_DIM_Z = 1, int PTX_ARCH = CUB_PTX_ARCH>

template<typename OutputT , typename OffsetT >

__device__ __forceinline__ void cub::BlockExchange< InputT, BLOCK_DIM_X, ITEMS_PER_THREAD, WARP_TIME_SLICING, BLOCK_DIM_Y, BLOCK_DIM_Z, PTX_ARCH >::ScatterToBlocked	(	InputT	input_items[ITEMS_PER_THREAD],
		OutputT	output_items[ITEMS_PER_THREAD],
		OffsetT	ranks[ITEMS_PER_THREAD]
	)

inline

Exchanges data items annotated by rank into blocked arrangement.

\smemreuse

Template Parameters

OffsetT [inferred] Signed integer type for local offsets

Parameters

[in]	input_items	Items to exchange, converting between striped and blocked arrangements.
[out]	output_items	Items from exchange, converting between striped and blocked arrangements.
[in]	ranks	Corresponding scatter ranks

Definition at line 953 of file block_exchange.cuh.

◆ ScatterToBlocked() [2/4]

template<typename InputT , int BLOCK_DIM_X, int ITEMS_PER_THREAD, bool WARP_TIME_SLICING = false, int BLOCK_DIM_Y = 1, int BLOCK_DIM_Z = 1, int PTX_ARCH = CUB_PTX_ARCH>

template<typename OutputT , typename OffsetT >

__device__ __forceinline__ void cub::BlockExchange< InputT, BLOCK_DIM_X, ITEMS_PER_THREAD, WARP_TIME_SLICING, BLOCK_DIM_Y, BLOCK_DIM_Z, PTX_ARCH >::ScatterToBlocked	(	InputT	input_items[ITEMS_PER_THREAD],
		OutputT	output_items[ITEMS_PER_THREAD],
		OffsetT	ranks[ITEMS_PER_THREAD],
		Int2Type< false >
	)

inlineprivate

Exchanges data items annotated by rank into blocked arrangement. Specialized for no timeslicing.

Parameters

[in]	input_items	Items to exchange, converting between blocked and striped arrangements.
[out]	output_items	Items to exchange, converting between blocked and striped arrangements.
[in]	ranks	Corresponding scatter ranks

Definition at line 534 of file block_exchange.cuh.

◆ ScatterToBlocked() [3/4]

template<typename InputT , int BLOCK_DIM_X, int ITEMS_PER_THREAD, bool WARP_TIME_SLICING = false, int BLOCK_DIM_Y = 1, int BLOCK_DIM_Z = 1, int PTX_ARCH = CUB_PTX_ARCH>

template<typename OutputT , typename OffsetT >

__device__ __forceinline__ void cub::BlockExchange< InputT, BLOCK_DIM_X, ITEMS_PER_THREAD, WARP_TIME_SLICING, BLOCK_DIM_Y, BLOCK_DIM_Z, PTX_ARCH >::ScatterToBlocked	(	InputT	input_items[ITEMS_PER_THREAD],
		OutputT	output_items[ITEMS_PER_THREAD],
		OffsetT	ranks[ITEMS_PER_THREAD],
		Int2Type< true >
	)

inlineprivate

Exchanges data items annotated by rank into blocked arrangement. Specialized for warp-timeslicing.

Parameters

[in]	input_items	Items to exchange, converting between blocked and striped arrangements.
[out]	output_items	Items to exchange, converting between blocked and striped arrangements.
[in]	ranks	Corresponding scatter ranks

Definition at line 563 of file block_exchange.cuh.

◆ ScatterToBlocked() [4/4]

template<typename InputT , int BLOCK_DIM_X, int ITEMS_PER_THREAD, bool WARP_TIME_SLICING = false, int BLOCK_DIM_Y = 1, int BLOCK_DIM_Z = 1, int PTX_ARCH = CUB_PTX_ARCH>

template<typename OffsetT >

__device__ __forceinline__ void cub::BlockExchange< InputT, BLOCK_DIM_X, ITEMS_PER_THREAD, WARP_TIME_SLICING, BLOCK_DIM_Y, BLOCK_DIM_Z, PTX_ARCH >::ScatterToBlocked	(	InputT	items[ITEMS_PER_THREAD],
		OffsetT	ranks[ITEMS_PER_THREAD]
	)

inline

Parameters

	items	[in-out] Items to exchange, converting between striped and blocked arrangements.
[in]	ranks	Corresponding scatter ranks

Definition at line 1088 of file block_exchange.cuh.

◆ ScatterToStriped() [1/4]

template<typename InputT , int BLOCK_DIM_X, int ITEMS_PER_THREAD, bool WARP_TIME_SLICING = false, int BLOCK_DIM_Y = 1, int BLOCK_DIM_Z = 1, int PTX_ARCH = CUB_PTX_ARCH>

template<typename OutputT , typename OffsetT >

__device__ __forceinline__ void cub::BlockExchange< InputT, BLOCK_DIM_X, ITEMS_PER_THREAD, WARP_TIME_SLICING, BLOCK_DIM_Y, BLOCK_DIM_Z, PTX_ARCH >::ScatterToStriped	(	InputT	input_items[ITEMS_PER_THREAD],
		OutputT	output_items[ITEMS_PER_THREAD],
		OffsetT	ranks[ITEMS_PER_THREAD]
	)

inline

Exchanges data items annotated by rank into striped arrangement.

\smemreuse

Template Parameters

OffsetT [inferred] Signed integer type for local offsets

Parameters

[in]	input_items	Items to exchange, converting between striped and blocked arrangements.
[out]	output_items	Items from exchange, converting between striped and blocked arrangements.
[in]	ranks	Corresponding scatter ranks

Definition at line 972 of file block_exchange.cuh.

◆ ScatterToStriped() [2/4]

template<typename InputT , int BLOCK_DIM_X, int ITEMS_PER_THREAD, bool WARP_TIME_SLICING = false, int BLOCK_DIM_Y = 1, int BLOCK_DIM_Z = 1, int PTX_ARCH = CUB_PTX_ARCH>

template<typename OutputT , typename OffsetT >

__device__ __forceinline__ void cub::BlockExchange< InputT, BLOCK_DIM_X, ITEMS_PER_THREAD, WARP_TIME_SLICING, BLOCK_DIM_Y, BLOCK_DIM_Z, PTX_ARCH >::ScatterToStriped	(	InputT	input_items[ITEMS_PER_THREAD],
		OutputT	output_items[ITEMS_PER_THREAD],
		OffsetT	ranks[ITEMS_PER_THREAD],
		Int2Type< false >
	)

inlineprivate

Exchanges data items annotated by rank into striped arrangement. Specialized for no timeslicing.

Parameters

[in]	input_items	Items to exchange, converting between blocked and striped arrangements.
[out]	output_items	Items to exchange, converting between blocked and striped arrangements.
[in]	ranks	Corresponding scatter ranks

Definition at line 616 of file block_exchange.cuh.

◆ ScatterToStriped() [3/4]

template<typename InputT , int BLOCK_DIM_X, int ITEMS_PER_THREAD, bool WARP_TIME_SLICING = false, int BLOCK_DIM_Y = 1, int BLOCK_DIM_Z = 1, int PTX_ARCH = CUB_PTX_ARCH>

template<typename OutputT , typename OffsetT >

__device__ __forceinline__ void cub::BlockExchange< InputT, BLOCK_DIM_X, ITEMS_PER_THREAD, WARP_TIME_SLICING, BLOCK_DIM_Y, BLOCK_DIM_Z, PTX_ARCH >::ScatterToStriped	(	InputT	input_items[ITEMS_PER_THREAD],
		OutputT	output_items[ITEMS_PER_THREAD],
		OffsetT	ranks[ITEMS_PER_THREAD],
		Int2Type< true >
	)

inlineprivate

Exchanges data items annotated by rank into striped arrangement. Specialized for warp-timeslicing.

Parameters

[in]	input_items	Items to exchange, converting between blocked and striped arrangements.
[out]	output_items	Items to exchange, converting between blocked and striped arrangements.
[in]	ranks	Corresponding scatter ranks

Definition at line 646 of file block_exchange.cuh.

◆ ScatterToStriped() [4/4]

template<typename InputT , int BLOCK_DIM_X, int ITEMS_PER_THREAD, bool WARP_TIME_SLICING = false, int BLOCK_DIM_Y = 1, int BLOCK_DIM_Z = 1, int PTX_ARCH = CUB_PTX_ARCH>

template<typename OffsetT >

__device__ __forceinline__ void cub::BlockExchange< InputT, BLOCK_DIM_X, ITEMS_PER_THREAD, WARP_TIME_SLICING, BLOCK_DIM_Y, BLOCK_DIM_Z, PTX_ARCH >::ScatterToStriped	(	InputT	items[ITEMS_PER_THREAD],
		OffsetT	ranks[ITEMS_PER_THREAD]
	)

inline

Parameters

	items	[in-out] Items to exchange, converting between striped and blocked arrangements.
[in]	ranks	Corresponding scatter ranks

Definition at line 1096 of file block_exchange.cuh.

◆ ScatterToStripedFlagged() [1/2]

template<typename InputT , int BLOCK_DIM_X, int ITEMS_PER_THREAD, bool WARP_TIME_SLICING = false, int BLOCK_DIM_Y = 1, int BLOCK_DIM_Z = 1, int PTX_ARCH = CUB_PTX_ARCH>

template<typename OutputT , typename OffsetT , typename ValidFlag >

__device__ __forceinline__ void cub::BlockExchange< InputT, BLOCK_DIM_X, ITEMS_PER_THREAD, WARP_TIME_SLICING, BLOCK_DIM_Y, BLOCK_DIM_Z, PTX_ARCH >::ScatterToStripedFlagged	(	InputT	input_items[ITEMS_PER_THREAD],
		OutputT	output_items[ITEMS_PER_THREAD],
		OffsetT	ranks[ITEMS_PER_THREAD],
		ValidFlag	is_valid[ITEMS_PER_THREAD]
	)

inline

Exchanges valid data items annotated by rank into striped arrangement.

\smemreuse

Template Parameters

OffsetT	[inferred] Signed integer type for local offsets
ValidFlag	[inferred] FlagT type denoting which items are valid

Parameters

[in]	input_items	Items to exchange, converting between striped and blocked arrangements.
[out]	output_items	Items from exchange, converting between striped and blocked arrangements.
[in]	ranks	Corresponding scatter ranks
[in]	is_valid	Corresponding flag denoting item validity

Definition at line 1029 of file block_exchange.cuh.

◆ ScatterToStripedFlagged() [2/2]

template<typename InputT , int BLOCK_DIM_X, int ITEMS_PER_THREAD, bool WARP_TIME_SLICING = false, int BLOCK_DIM_Y = 1, int BLOCK_DIM_Z = 1, int PTX_ARCH = CUB_PTX_ARCH>

template<typename OffsetT , typename ValidFlag >

__device__ __forceinline__ void cub::BlockExchange< InputT, BLOCK_DIM_X, ITEMS_PER_THREAD, WARP_TIME_SLICING, BLOCK_DIM_Y, BLOCK_DIM_Z, PTX_ARCH >::ScatterToStripedFlagged	(	InputT	items[ITEMS_PER_THREAD],
		OffsetT	ranks[ITEMS_PER_THREAD],
		ValidFlag	is_valid[ITEMS_PER_THREAD]
	)

inline

Parameters

	items	[in-out] Items to exchange, converting between striped and blocked arrangements.
[in]	ranks	Corresponding scatter ranks
[in]	is_valid	Corresponding flag denoting item validity

Definition at line 1112 of file block_exchange.cuh.

◆ ScatterToStripedGuarded() [1/2]

template<typename InputT , int BLOCK_DIM_X, int ITEMS_PER_THREAD, bool WARP_TIME_SLICING = false, int BLOCK_DIM_Y = 1, int BLOCK_DIM_Z = 1, int PTX_ARCH = CUB_PTX_ARCH>

template<typename OutputT , typename OffsetT >

__device__ __forceinline__ void cub::BlockExchange< InputT, BLOCK_DIM_X, ITEMS_PER_THREAD, WARP_TIME_SLICING, BLOCK_DIM_Y, BLOCK_DIM_Z, PTX_ARCH >::ScatterToStripedGuarded	(	InputT	input_items[ITEMS_PER_THREAD],
		OutputT	output_items[ITEMS_PER_THREAD],
		OffsetT	ranks[ITEMS_PER_THREAD]
	)

inline

Exchanges data items annotated by rank into striped arrangement. Items with rank -1 are not exchanged.

\smemreuse

Template Parameters

OffsetT [inferred] Signed integer type for local offsets

Parameters

[in]	input_items	Items to exchange, converting between striped and blocked arrangements.
[out]	output_items	Items from exchange, converting between striped and blocked arrangements.
[in]	ranks	Corresponding scatter ranks

Definition at line 991 of file block_exchange.cuh.

◆ ScatterToStripedGuarded() [2/2]

template<typename InputT , int BLOCK_DIM_X, int ITEMS_PER_THREAD, bool WARP_TIME_SLICING = false, int BLOCK_DIM_Y = 1, int BLOCK_DIM_Z = 1, int PTX_ARCH = CUB_PTX_ARCH>

template<typename OffsetT >

__device__ __forceinline__ void cub::BlockExchange< InputT, BLOCK_DIM_X, ITEMS_PER_THREAD, WARP_TIME_SLICING, BLOCK_DIM_Y, BLOCK_DIM_Z, PTX_ARCH >::ScatterToStripedGuarded	(	InputT	items[ITEMS_PER_THREAD],
		OffsetT	ranks[ITEMS_PER_THREAD]
	)

inline

Parameters

	items	[in-out] Items to exchange, converting between striped and blocked arrangements.
[in]	ranks	Corresponding scatter ranks

Definition at line 1104 of file block_exchange.cuh.

◆ StripedToBlocked() [1/4]

template<typename InputT , int BLOCK_DIM_X, int ITEMS_PER_THREAD, bool WARP_TIME_SLICING = false, int BLOCK_DIM_Y = 1, int BLOCK_DIM_Z = 1, int PTX_ARCH = CUB_PTX_ARCH>

template<typename OutputT >

__device__ __forceinline__ void cub::BlockExchange< InputT, BLOCK_DIM_X, ITEMS_PER_THREAD, WARP_TIME_SLICING, BLOCK_DIM_Y, BLOCK_DIM_Z, PTX_ARCH >::StripedToBlocked	(	InputT	input_items[ITEMS_PER_THREAD],
		OutputT	output_items[ITEMS_PER_THREAD]
	)

inline

Transposes data items from striped arrangement to blocked arrangement.

\smemreuse

Snippet: The code snippet below illustrates the conversion from a "striped" to a "blocked" arrangement of 512 integer items partitioned across 128 threads where each thread owns 4 items.

: #include <cub/cub.cuh> // or equivalently <cub/block/block_exchange.cuh>

__global__ void ExampleKernel(int *d_data, ...)

{

// Specialize BlockExchange for a 1D block of 128 threads owning 4 integer items each

typedef cub::BlockExchange<int, 128, 4> BlockExchange;

// Allocate shared memory for BlockExchange

__shared__ typename BlockExchange::TempStorage temp_storage;

// Load a tile of ordered data into a striped arrangement across block threads

int thread_data[4];

cub::LoadDirectStriped<128>(threadIdx.x, d_data, thread_data);

// Collectively exchange data into a blocked arrangement across threads

BlockExchange(temp_storage).StripedToBlocked(thread_data, thread_data);

: Suppose the set of striped input thread_data across the block of threads is { [0,128,256,384], [1,129,257,385], ..., [127,255,383,511] } after loading from device-accessible memory. The corresponding output thread_data in those threads will be { [0,1,2,3], [4,5,6,7], [8,9,10,11], ..., [508,509,510,511] }.

Parameters

[in]	input_items	Items to exchange, converting between striped and blocked arrangements.
[out]	output_items	Items from exchange, converting between striped and blocked arrangements.

Definition at line 780 of file block_exchange.cuh.

◆ StripedToBlocked() [2/4]

template<typename InputT , int BLOCK_DIM_X, int ITEMS_PER_THREAD, bool WARP_TIME_SLICING = false, int BLOCK_DIM_Y = 1, int BLOCK_DIM_Z = 1, int PTX_ARCH = CUB_PTX_ARCH>

template<typename OutputT >

__device__ __forceinline__ void cub::BlockExchange< InputT, BLOCK_DIM_X, ITEMS_PER_THREAD, WARP_TIME_SLICING, BLOCK_DIM_Y, BLOCK_DIM_Z, PTX_ARCH >::StripedToBlocked	(	InputT	input_items[ITEMS_PER_THREAD],
		OutputT	output_items[ITEMS_PER_THREAD],
		Int2Type< false >
	)

inlineprivate

Transposes data items from striped arrangement to blocked arrangement. Specialized for no timeslicing.

Parameters

[in]	input_items	Items to exchange, converting between blocked and striped arrangements.
[out]	output_items	Items to exchange, converting between blocked and striped arrangements.

Definition at line 376 of file block_exchange.cuh.

◆ StripedToBlocked() [3/4]

template<typename InputT , int BLOCK_DIM_X, int ITEMS_PER_THREAD, bool WARP_TIME_SLICING = false, int BLOCK_DIM_Y = 1, int BLOCK_DIM_Z = 1, int PTX_ARCH = CUB_PTX_ARCH>

template<typename OutputT >

__device__ __forceinline__ void cub::BlockExchange< InputT, BLOCK_DIM_X, ITEMS_PER_THREAD, WARP_TIME_SLICING, BLOCK_DIM_Y, BLOCK_DIM_Z, PTX_ARCH >::StripedToBlocked	(	InputT	input_items[ITEMS_PER_THREAD],
		OutputT	output_items[ITEMS_PER_THREAD],
		Int2Type< true >
	)

inlineprivate

Transposes data items from striped arrangement to blocked arrangement. Specialized for warp-timeslicing.

Parameters

[in]	input_items	Items to exchange, converting between blocked and striped arrangements.
[out]	output_items	Items to exchange, converting between blocked and striped arrangements.

Definition at line 406 of file block_exchange.cuh.

◆ StripedToBlocked() [4/4]

template<typename InputT , int BLOCK_DIM_X, int ITEMS_PER_THREAD, bool WARP_TIME_SLICING = false, int BLOCK_DIM_Y = 1, int BLOCK_DIM_Z = 1, int PTX_ARCH = CUB_PTX_ARCH>

__device__ __forceinline__ void cub::BlockExchange< InputT, BLOCK_DIM_X, ITEMS_PER_THREAD, WARP_TIME_SLICING, BLOCK_DIM_Y, BLOCK_DIM_Z, PTX_ARCH >::StripedToBlocked ( InputT items[ITEMS_PER_THREAD] )

inline

Parameters

items [in-out] Items to exchange, converting between striped and blocked arrangements.

Definition at line 1063 of file block_exchange.cuh.

◆ WarpStripedToBlocked() [1/4]

template<typename InputT , int BLOCK_DIM_X, int ITEMS_PER_THREAD, bool WARP_TIME_SLICING = false, int BLOCK_DIM_Y = 1, int BLOCK_DIM_Z = 1, int PTX_ARCH = CUB_PTX_ARCH>

template<typename OutputT >

__device__ __forceinline__ void cub::BlockExchange< InputT, BLOCK_DIM_X, ITEMS_PER_THREAD, WARP_TIME_SLICING, BLOCK_DIM_Y, BLOCK_DIM_Z, PTX_ARCH >::WarpStripedToBlocked	(	InputT	input_items[ITEMS_PER_THREAD],
		OutputT	output_items[ITEMS_PER_THREAD]
	)

inline

Transposes data items from warp-striped arrangement to blocked arrangement.

\smemreuse

Snippet: The code snippet below illustrates the conversion from a "warp-striped" to a "blocked" arrangement of 512 integer items partitioned across 128 threads where each thread owns 4 items.

: #include <cub/cub.cuh> // or equivalently <cub/block/block_exchange.cuh>

__global__ void ExampleKernel(int *d_data, ...)

{

// Specialize BlockExchange for a 1D block of 128 threads owning 4 integer items each

typedef cub::BlockExchange<int, 128, 4> BlockExchange;

// Allocate shared memory for BlockExchange

__shared__ typename BlockExchange::TempStorage temp_storage;

// Load a tile of ordered data into a warp-striped arrangement across warp threads

int thread_data[4];

cub::LoadSWarptriped<LOAD_DEFAULT>(threadIdx.x, d_data, thread_data);

// Collectively exchange data into a blocked arrangement across threads

BlockExchange(temp_storage).WarpStripedToBlocked(thread_data);

: Suppose the set of warp-striped input thread_data across the block of threads is { [0,32,64,96], [1,33,65,97], [2,34,66,98], ..., [415,447,479,511] } after loading from device-accessible memory. (The first 128 items are striped across the first warp of 32 threads, the second 128 items are striped across the second warp, etc.) The corresponding output thread_data in those threads will be { [0,1,2,3], [4,5,6,7], [8,9,10,11], ..., [508,509,510,511] }.

Parameters

[in]	input_items	Items to exchange, converting between striped and blocked arrangements.
[out]	output_items	Items from exchange, converting between striped and blocked arrangements.

Definition at line 877 of file block_exchange.cuh.

◆ WarpStripedToBlocked() [2/4]

template<typename InputT , int BLOCK_DIM_X, int ITEMS_PER_THREAD, bool WARP_TIME_SLICING = false, int BLOCK_DIM_Y = 1, int BLOCK_DIM_Z = 1, int PTX_ARCH = CUB_PTX_ARCH>

template<typename OutputT >

__device__ __forceinline__ void cub::BlockExchange< InputT, BLOCK_DIM_X, ITEMS_PER_THREAD, WARP_TIME_SLICING, BLOCK_DIM_Y, BLOCK_DIM_Z, PTX_ARCH >::WarpStripedToBlocked	(	InputT	input_items[ITEMS_PER_THREAD],
		OutputT	output_items[ITEMS_PER_THREAD],
		Int2Type< false >
	)

inlineprivate

Transposes data items from warp-striped arrangement to blocked arrangement. Specialized for no timeslicing

Parameters

[in]	input_items	Items to exchange, converting between blocked and striped arrangements.
[out]	output_items	Items to exchange, converting between blocked and striped arrangements.

Definition at line 467 of file block_exchange.cuh.

◆ WarpStripedToBlocked() [3/4]

template<typename InputT , int BLOCK_DIM_X, int ITEMS_PER_THREAD, bool WARP_TIME_SLICING = false, int BLOCK_DIM_Y = 1, int BLOCK_DIM_Z = 1, int PTX_ARCH = CUB_PTX_ARCH>

template<typename OutputT >

__device__ __forceinline__ void cub::BlockExchange< InputT, BLOCK_DIM_X, ITEMS_PER_THREAD, WARP_TIME_SLICING, BLOCK_DIM_Y, BLOCK_DIM_Z, PTX_ARCH >::WarpStripedToBlocked	(	InputT	input_items[ITEMS_PER_THREAD],
		OutputT	output_items[ITEMS_PER_THREAD],
		Int2Type< true >
	)

inlineprivate

Transposes data items from warp-striped arrangement to blocked arrangement. Specialized for warp-timeslicing

Parameters

[in]	input_items	Items to exchange, converting between blocked and striped arrangements.
[out]	output_items	Items to exchange, converting between blocked and striped arrangements.

Definition at line 496 of file block_exchange.cuh.

◆ WarpStripedToBlocked() [4/4]

template<typename InputT , int BLOCK_DIM_X, int ITEMS_PER_THREAD, bool WARP_TIME_SLICING = false, int BLOCK_DIM_Y = 1, int BLOCK_DIM_Z = 1, int PTX_ARCH = CUB_PTX_ARCH>

__device__ __forceinline__ void cub::BlockExchange< InputT, BLOCK_DIM_X, ITEMS_PER_THREAD, WARP_TIME_SLICING, BLOCK_DIM_Y, BLOCK_DIM_Z, PTX_ARCH >::WarpStripedToBlocked ( InputT items[ITEMS_PER_THREAD] )

inline

Parameters

items [in-out] Items to exchange, converting between striped and blocked arrangements.

Definition at line 1075 of file block_exchange.cuh.

Field Documentation

◆ lane_id

template<typename InputT , int BLOCK_DIM_X, int ITEMS_PER_THREAD, bool WARP_TIME_SLICING = false, int BLOCK_DIM_Y = 1, int BLOCK_DIM_Z = 1, int PTX_ARCH = CUB_PTX_ARCH>

unsigned int cub::BlockExchange< InputT, BLOCK_DIM_X, ITEMS_PER_THREAD, WARP_TIME_SLICING, BLOCK_DIM_Y, BLOCK_DIM_Z, PTX_ARCH >::lane_id

private

Definition at line 179 of file block_exchange.cuh.

◆ linear_tid

template<typename InputT , int BLOCK_DIM_X, int ITEMS_PER_THREAD, bool WARP_TIME_SLICING = false, int BLOCK_DIM_Y = 1, int BLOCK_DIM_Z = 1, int PTX_ARCH = CUB_PTX_ARCH>

unsigned int cub::BlockExchange< InputT, BLOCK_DIM_X, ITEMS_PER_THREAD, WARP_TIME_SLICING, BLOCK_DIM_Y, BLOCK_DIM_Z, PTX_ARCH >::linear_tid

private

Linear thread-id.

Definition at line 178 of file block_exchange.cuh.

◆ temp_storage

template<typename InputT , int BLOCK_DIM_X, int ITEMS_PER_THREAD, bool WARP_TIME_SLICING = false, int BLOCK_DIM_Y = 1, int BLOCK_DIM_Z = 1, int PTX_ARCH = CUB_PTX_ARCH>

_TempStorage& cub::BlockExchange< InputT, BLOCK_DIM_X, ITEMS_PER_THREAD, WARP_TIME_SLICING, BLOCK_DIM_Y, BLOCK_DIM_Z, PTX_ARCH >::temp_storage

private

Shared storage reference.

Definition at line 175 of file block_exchange.cuh.

◆ warp_id

template<typename InputT , int BLOCK_DIM_X, int ITEMS_PER_THREAD, bool WARP_TIME_SLICING = false, int BLOCK_DIM_Y = 1, int BLOCK_DIM_Z = 1, int PTX_ARCH = CUB_PTX_ARCH>

unsigned int cub::BlockExchange< InputT, BLOCK_DIM_X, ITEMS_PER_THREAD, WARP_TIME_SLICING, BLOCK_DIM_Y, BLOCK_DIM_Z, PTX_ARCH >::warp_id

private

Definition at line 180 of file block_exchange.cuh.

◆ warp_offset

template<typename InputT , int BLOCK_DIM_X, int ITEMS_PER_THREAD, bool WARP_TIME_SLICING = false, int BLOCK_DIM_Y = 1, int BLOCK_DIM_Z = 1, int PTX_ARCH = CUB_PTX_ARCH>

unsigned int cub::BlockExchange< InputT, BLOCK_DIM_X, ITEMS_PER_THREAD, WARP_TIME_SLICING, BLOCK_DIM_Y, BLOCK_DIM_Z, PTX_ARCH >::warp_offset

private

Definition at line 181 of file block_exchange.cuh.

The documentation for this class was generated from the following file:

openfpm_data/src/util/cuda/cub_old/block/block_exchange.cuh

Detailed Description

Data Structures

Public Member Functions

Private Types

Private Member Functions

Private Attributes

Member Enumeration Documentation

◆ anonymous enum

Constructor & Destructor Documentation

◆ BlockExchange() [1/2]

◆ BlockExchange() [2/2]

Member Function Documentation

◆ __align__()

◆ BlockedToStriped() [1/4]

◆ BlockedToStriped() [2/4]

◆ BlockedToStriped() [3/4]

◆ BlockedToStriped() [4/4]

◆ BlockedToWarpStriped() [1/4]

◆ BlockedToWarpStriped() [2/4]

◆ BlockedToWarpStriped() [3/4]

◆ BlockedToWarpStriped() [4/4]

◆ PrivateStorage()

◆ ScatterToBlocked() [1/4]

◆ ScatterToBlocked() [2/4]

◆ ScatterToBlocked() [3/4]

◆ ScatterToBlocked() [4/4]

◆ ScatterToStriped() [1/4]

◆ ScatterToStriped() [2/4]

◆ ScatterToStriped() [3/4]

◆ ScatterToStriped() [4/4]

◆ ScatterToStripedFlagged() [1/2]

◆ ScatterToStripedFlagged() [2/2]

◆ ScatterToStripedGuarded() [1/2]

◆ ScatterToStripedGuarded() [2/2]

◆ StripedToBlocked() [1/4]

◆ StripedToBlocked() [2/4]

◆ StripedToBlocked() [3/4]

◆ StripedToBlocked() [4/4]

◆ WarpStripedToBlocked() [1/4]

◆ WarpStripedToBlocked() [2/4]

◆ WarpStripedToBlocked() [3/4]

◆ WarpStripedToBlocked() [4/4]

Field Documentation

◆ lane_id

◆ linear_tid

◆ temp_storage

◆ warp_id

◆ warp_offset

◆ align()