OpenFPM_pdata  4.1.0
Project that contain the implementation of distributed structures
 
Loading...
Searching...
No Matches
cub::BlockDiscontinuity< T, BLOCK_DIM_X, BLOCK_DIM_Y, BLOCK_DIM_Z, PTX_ARCH > Class Template Reference

The BlockDiscontinuity class provides collective methods for flagging discontinuities within an ordered set of items partitioned across a CUDA thread block. More...

Detailed Description

template<typename T, int BLOCK_DIM_X, int BLOCK_DIM_Y = 1, int BLOCK_DIM_Z = 1, int PTX_ARCH = CUB_PTX_ARCH>
class cub::BlockDiscontinuity< T, BLOCK_DIM_X, BLOCK_DIM_Y, BLOCK_DIM_Z, PTX_ARCH >

The BlockDiscontinuity class provides collective methods for flagging discontinuities within an ordered set of items partitioned across a CUDA thread block.

Template Parameters
TThe data type to be flagged.
BLOCK_DIM_XThe thread block length in threads along the X dimension
BLOCK_DIM_Y[optional] The thread block length in threads along the Y dimension (default: 1)
BLOCK_DIM_Z[optional] The thread block length in threads along the Z dimension (default: 1)
PTX_ARCH[optional] \ptxversion
Overview
  • A set of "head flags" (or "tail flags") is often used to indicate corresponding items that differ from their predecessors (or successors). For example, head flags are convenient for demarcating disjoint data segments as part of a segmented scan or reduction.
  • \blocked
Performance Considerations
  • \granularity
A Simple Example
\blockcollective{BlockDiscontinuity}
The code snippet below illustrates the head flagging of 512 integer items that are partitioned in a blocked arrangement across 128 threads where each thread owns 4 consecutive items.
#include <cub/cub.cuh> // or equivalently <cub/block/block_discontinuity.cuh>
__global__ void ExampleKernel(...)
{
// Specialize BlockDiscontinuity for a 1D block of 128 threads on type int
// Allocate shared memory for BlockDiscontinuity
// Obtain a segment of consecutive items that are blocked across threads
int thread_data[4];
...
// Collectively compute head flags for discontinuities in the segment
int head_flags[4];
BlockDiscontinuity(temp_storage).FlagHeads(head_flags, thread_data, cub::Inequality());
The BlockDiscontinuity class provides collective methods for flagging discontinuities within an order...
_TempStorage & temp_storage
Shared storage reference.
__device__ __forceinline__ BlockDiscontinuity()
Collective constructor using a private static allocation of shared memory as temporary storage.
\smemstorage{BlockDiscontinuity}
Default inequality functor.
Suppose the set of input thread_data across the block of threads is { [0,0,1,1], [1,1,1,1], [2,3,3,3], [3,4,4,4], ... }. The corresponding output head_flags in those threads will be { [1,0,1,0], [0,0,0,0], [1,1,0,0], [0,1,0,0], ... }.
Performance Considerations
  • Incurs zero bank conflicts for most types

Definition at line 108 of file block_discontinuity.cuh.

Data Structures

struct  _TempStorage
 Shared memory storage layout type (last element from each thread's input) More...
 
struct  ApplyOp
 Specialization for when FlagOp has third index param. More...
 
struct  ApplyOp< FlagOp, false >
 Specialization for when FlagOp does not have a third index param. More...
 
struct  Iterate
 Templated unrolling of item comparison (inductive case) More...
 
struct  Iterate< MAX_ITERATIONS, MAX_ITERATIONS >
 Templated unrolling of item comparison (termination case) More...
 
struct  TempStorage
 \smemstorage{BlockDiscontinuity} More...
 

Public Member Functions

Collective constructors
__device__ __forceinline__ BlockDiscontinuity ()
 Collective constructor using a private static allocation of shared memory as temporary storage.
 
__device__ __forceinline__ BlockDiscontinuity (TempStorage &temp_storage)
 Collective constructor using the specified memory allocation as temporary storage.
 
Head flag operations
template<int ITEMS_PER_THREAD, typename FlagT , typename FlagOp >
__device__ __forceinline__ void FlagHeads (FlagT(&head_flags)[ITEMS_PER_THREAD], T(&input)[ITEMS_PER_THREAD], T(&preds)[ITEMS_PER_THREAD], FlagOp flag_op)
 
template<int ITEMS_PER_THREAD, typename FlagT , typename FlagOp >
__device__ __forceinline__ void FlagHeads (FlagT(&head_flags)[ITEMS_PER_THREAD], T(&input)[ITEMS_PER_THREAD], T(&preds)[ITEMS_PER_THREAD], FlagOp flag_op, T tile_predecessor_item)
 
template<int ITEMS_PER_THREAD, typename FlagT , typename FlagOp >
__device__ __forceinline__ void FlagHeads (FlagT(&head_flags)[ITEMS_PER_THREAD], T(&input)[ITEMS_PER_THREAD], FlagOp flag_op)
 Sets head flags indicating discontinuities between items partitioned across the thread block, for which the first item has no reference and is always flagged.
 
template<int ITEMS_PER_THREAD, typename FlagT , typename FlagOp >
__device__ __forceinline__ void FlagHeads (FlagT(&head_flags)[ITEMS_PER_THREAD], T(&input)[ITEMS_PER_THREAD], FlagOp flag_op, T tile_predecessor_item)
 Sets head flags indicating discontinuities between items partitioned across the thread block.
 
Tail flag operations
template<int ITEMS_PER_THREAD, typename FlagT , typename FlagOp >
__device__ __forceinline__ void FlagTails (FlagT(&tail_flags)[ITEMS_PER_THREAD], T(&input)[ITEMS_PER_THREAD], FlagOp flag_op)
 Sets tail flags indicating discontinuities between items partitioned across the thread block, for which the last item has no reference and is always flagged.
 
template<int ITEMS_PER_THREAD, typename FlagT , typename FlagOp >
__device__ __forceinline__ void FlagTails (FlagT(&tail_flags)[ITEMS_PER_THREAD], T(&input)[ITEMS_PER_THREAD], FlagOp flag_op, T tile_successor_item)
 Sets tail flags indicating discontinuities between items partitioned across the thread block.
 
Head & tail flag operations
template<int ITEMS_PER_THREAD, typename FlagT , typename FlagOp >
__device__ __forceinline__ void FlagHeadsAndTails (FlagT(&head_flags)[ITEMS_PER_THREAD], FlagT(&tail_flags)[ITEMS_PER_THREAD], T(&input)[ITEMS_PER_THREAD], FlagOp flag_op)
 Sets both head and tail flags indicating discontinuities between items partitioned across the thread block.
 
template<int ITEMS_PER_THREAD, typename FlagT , typename FlagOp >
__device__ __forceinline__ void FlagHeadsAndTails (FlagT(&head_flags)[ITEMS_PER_THREAD], FlagT(&tail_flags)[ITEMS_PER_THREAD], T tile_successor_item, T(&input)[ITEMS_PER_THREAD], FlagOp flag_op)
 Sets both head and tail flags indicating discontinuities between items partitioned across the thread block.
 
template<int ITEMS_PER_THREAD, typename FlagT , typename FlagOp >
__device__ __forceinline__ void FlagHeadsAndTails (FlagT(&head_flags)[ITEMS_PER_THREAD], T tile_predecessor_item, FlagT(&tail_flags)[ITEMS_PER_THREAD], T(&input)[ITEMS_PER_THREAD], FlagOp flag_op)
 Sets both head and tail flags indicating discontinuities between items partitioned across the thread block.
 
template<int ITEMS_PER_THREAD, typename FlagT , typename FlagOp >
__device__ __forceinline__ void FlagHeadsAndTails (FlagT(&head_flags)[ITEMS_PER_THREAD], T tile_predecessor_item, FlagT(&tail_flags)[ITEMS_PER_THREAD], T tile_successor_item, T(&input)[ITEMS_PER_THREAD], FlagOp flag_op)
 Sets both head and tail flags indicating discontinuities between items partitioned across the thread block.
 

Private Types

enum  { BLOCK_THREADS = BLOCK_DIM_X * BLOCK_DIM_Y * BLOCK_DIM_Z }
 Constants. More...
 

Private Member Functions

__device__ __forceinline__ _TempStoragePrivateStorage ()
 Internal storage allocator.
 

Private Attributes

_TempStoragetemp_storage
 Shared storage reference.
 
unsigned int linear_tid
 Linear thread-id.
 

Member Enumeration Documentation

◆ anonymous enum

template<typename T , int BLOCK_DIM_X, int BLOCK_DIM_Y = 1, int BLOCK_DIM_Z = 1, int PTX_ARCH = CUB_PTX_ARCH>
anonymous enum
private

Constants.

Enumerator
BLOCK_THREADS 

The thread block size in threads.

Definition at line 117 of file block_discontinuity.cuh.

Constructor & Destructor Documentation

◆ BlockDiscontinuity() [1/2]

template<typename T , int BLOCK_DIM_X, int BLOCK_DIM_Y = 1, int BLOCK_DIM_Z = 1, int PTX_ARCH = CUB_PTX_ARCH>
__device__ __forceinline__ cub::BlockDiscontinuity< T, BLOCK_DIM_X, BLOCK_DIM_Y, BLOCK_DIM_Z, PTX_ARCH >::BlockDiscontinuity ( )
inline

Collective constructor using a private static allocation of shared memory as temporary storage.

Definition at line 271 of file block_discontinuity.cuh.

◆ BlockDiscontinuity() [2/2]

template<typename T , int BLOCK_DIM_X, int BLOCK_DIM_Y = 1, int BLOCK_DIM_Z = 1, int PTX_ARCH = CUB_PTX_ARCH>
__device__ __forceinline__ cub::BlockDiscontinuity< T, BLOCK_DIM_X, BLOCK_DIM_Y, BLOCK_DIM_Z, PTX_ARCH >::BlockDiscontinuity ( TempStorage temp_storage)
inline

Collective constructor using the specified memory allocation as temporary storage.

Parameters
[in]temp_storageReference to memory allocation having layout type TempStorage

Definition at line 281 of file block_discontinuity.cuh.

Member Function Documentation

◆ FlagHeads() [1/4]

template<typename T , int BLOCK_DIM_X, int BLOCK_DIM_Y = 1, int BLOCK_DIM_Z = 1, int PTX_ARCH = CUB_PTX_ARCH>
template<int ITEMS_PER_THREAD, typename FlagT , typename FlagOp >
__device__ __forceinline__ void cub::BlockDiscontinuity< T, BLOCK_DIM_X, BLOCK_DIM_Y, BLOCK_DIM_Z, PTX_ARCH >::FlagHeads ( FlagT(&)  head_flags[ITEMS_PER_THREAD],
T(&)  input[ITEMS_PER_THREAD],
FlagOp  flag_op 
)
inline

Sets head flags indicating discontinuities between items partitioned across the thread block, for which the first item has no reference and is always flagged.

  • The flag head_flagsi is set for item inputi when flag_op(previous-item, inputi) returns true (where previous-item is either the preceding item in the same thread or the last item in the previous thread).
  • For thread0, item input0 is always flagged.
  • \blocked
  • \granularity
  • \smemreuse
Snippet
The code snippet below illustrates the head-flagging of 512 integer items that are partitioned in a blocked arrangement across 128 threads where each thread owns 4 consecutive items.
#include <cub/cub.cuh> // or equivalently <cub/block/block_discontinuity.cuh>
__global__ void ExampleKernel(...)
{
// Specialize BlockDiscontinuity for a 1D block of 128 threads on type int
// Allocate shared memory for BlockDiscontinuity
// Obtain a segment of consecutive items that are blocked across threads
int thread_data[4];
...
// Collectively compute head flags for discontinuities in the segment
int head_flags[4];
BlockDiscontinuity(temp_storage).FlagHeads(head_flags, thread_data, cub::Inequality());
Suppose the set of input thread_data across the block of threads is { [0,0,1,1], [1,1,1,1], [2,3,3,3], [3,4,4,4], ... }. The corresponding output head_flags in those threads will be { [1,0,1,0], [0,0,0,0], [1,1,0,0], [0,1,0,0], ... }.
Template Parameters
ITEMS_PER_THREAD[inferred] The number of consecutive items partitioned onto each thread.
FlagT[inferred] The flag type (must be an integer type)
FlagOp[inferred] Binary predicate functor type having member T operator()(const T &a, const T &b) or member T operator()(const T &a, const T &b, unsigned int b_index), and returning true if a discontinuity exists between a and b, otherwise false. b_index is the rank of b in the aggregate tile of data.
Parameters
[out]head_flagsCalling thread's discontinuity head_flags
[in]inputCalling thread's input items
[in]flag_opBinary boolean flag predicate

Definition at line 411 of file block_discontinuity.cuh.

◆ FlagHeads() [2/4]

template<typename T , int BLOCK_DIM_X, int BLOCK_DIM_Y = 1, int BLOCK_DIM_Z = 1, int PTX_ARCH = CUB_PTX_ARCH>
template<int ITEMS_PER_THREAD, typename FlagT , typename FlagOp >
__device__ __forceinline__ void cub::BlockDiscontinuity< T, BLOCK_DIM_X, BLOCK_DIM_Y, BLOCK_DIM_Z, PTX_ARCH >::FlagHeads ( FlagT(&)  head_flags[ITEMS_PER_THREAD],
T(&)  input[ITEMS_PER_THREAD],
FlagOp  flag_op,
tile_predecessor_item 
)
inline

Sets head flags indicating discontinuities between items partitioned across the thread block.

  • The flag head_flagsi is set for item inputi when flag_op(previous-item, inputi) returns true (where previous-item is either the preceding item in the same thread or the last item in the previous thread).
  • For thread0, item input0 is compared against tile_predecessor_item.
  • \blocked
  • \granularity
  • \smemreuse
Snippet
The code snippet below illustrates the head-flagging of 512 integer items that are partitioned in a blocked arrangement across 128 threads where each thread owns 4 consecutive items.
#include <cub/cub.cuh> // or equivalently <cub/block/block_discontinuity.cuh>
__global__ void ExampleKernel(...)
{
// Specialize BlockDiscontinuity for a 1D block of 128 threads on type int
// Allocate shared memory for BlockDiscontinuity
// Obtain a segment of consecutive items that are blocked across threads
int thread_data[4];
...
// Have thread0 obtain the predecessor item for the entire tile
int tile_predecessor_item;
if (threadIdx.x == 0) tile_predecessor_item == ...
// Collectively compute head flags for discontinuities in the segment
int head_flags[4];
head_flags, thread_data, cub::Inequality(), tile_predecessor_item);
Suppose the set of input thread_data across the block of threads is { [0,0,1,1], [1,1,1,1], [2,3,3,3], [3,4,4,4], ... }, and that tile_predecessor_item is 0. The corresponding output head_flags in those threads will be { [0,0,1,0], [0,0,0,0], [1,1,0,0], [0,1,0,0], ... }.
Template Parameters
ITEMS_PER_THREAD[inferred] The number of consecutive items partitioned onto each thread.
FlagT[inferred] The flag type (must be an integer type)
FlagOp[inferred] Binary predicate functor type having member T operator()(const T &a, const T &b) or member T operator()(const T &a, const T &b, unsigned int b_index), and returning true if a discontinuity exists between a and b, otherwise false. b_index is the rank of b in the aggregate tile of data.
Parameters
[out]head_flagsCalling thread's discontinuity head_flags
[in]inputCalling thread's input items
[in]flag_opBinary boolean flag predicate
[in]tile_predecessor_item[thread0 only] Item with which to compare the first tile item (input0 from thread0).

Definition at line 480 of file block_discontinuity.cuh.

◆ FlagHeads() [3/4]

template<typename T , int BLOCK_DIM_X, int BLOCK_DIM_Y = 1, int BLOCK_DIM_Z = 1, int PTX_ARCH = CUB_PTX_ARCH>
template<int ITEMS_PER_THREAD, typename FlagT , typename FlagOp >
__device__ __forceinline__ void cub::BlockDiscontinuity< T, BLOCK_DIM_X, BLOCK_DIM_Y, BLOCK_DIM_Z, PTX_ARCH >::FlagHeads ( FlagT(&)  head_flags[ITEMS_PER_THREAD],
T(&)  input[ITEMS_PER_THREAD],
T(&)  preds[ITEMS_PER_THREAD],
FlagOp  flag_op 
)
inline
Parameters
[out]head_flagsCalling thread's discontinuity head_flags
[in]inputCalling thread's input items
[out]predsCalling thread's predecessor items
[in]flag_opBinary boolean flag predicate

Definition at line 302 of file block_discontinuity.cuh.

◆ FlagHeads() [4/4]

template<typename T , int BLOCK_DIM_X, int BLOCK_DIM_Y = 1, int BLOCK_DIM_Z = 1, int PTX_ARCH = CUB_PTX_ARCH>
template<int ITEMS_PER_THREAD, typename FlagT , typename FlagOp >
__device__ __forceinline__ void cub::BlockDiscontinuity< T, BLOCK_DIM_X, BLOCK_DIM_Y, BLOCK_DIM_Z, PTX_ARCH >::FlagHeads ( FlagT(&)  head_flags[ITEMS_PER_THREAD],
T(&)  input[ITEMS_PER_THREAD],
T(&)  preds[ITEMS_PER_THREAD],
FlagOp  flag_op,
tile_predecessor_item 
)
inline
Parameters
[out]head_flagsCalling thread's discontinuity head_flags
[in]inputCalling thread's input items
[out]predsCalling thread's predecessor items
[in]flag_opBinary boolean flag predicate
[in]tile_predecessor_item[thread0 only] Item with which to compare the first tile item (input0 from thread0).

Definition at line 332 of file block_discontinuity.cuh.

◆ FlagHeadsAndTails() [1/4]

template<typename T , int BLOCK_DIM_X, int BLOCK_DIM_Y = 1, int BLOCK_DIM_Z = 1, int PTX_ARCH = CUB_PTX_ARCH>
template<int ITEMS_PER_THREAD, typename FlagT , typename FlagOp >
__device__ __forceinline__ void cub::BlockDiscontinuity< T, BLOCK_DIM_X, BLOCK_DIM_Y, BLOCK_DIM_Z, PTX_ARCH >::FlagHeadsAndTails ( FlagT(&)  head_flags[ITEMS_PER_THREAD],
FlagT(&)  tail_flags[ITEMS_PER_THREAD],
tile_successor_item,
T(&)  input[ITEMS_PER_THREAD],
FlagOp  flag_op 
)
inline

Sets both head and tail flags indicating discontinuities between items partitioned across the thread block.

  • The flag head_flagsi is set for item inputi when flag_op(previous-item, inputi) returns true (where previous-item is either the preceding item in the same thread or the last item in the previous thread).
  • For thread0, item input0 is always flagged.
  • The flag tail_flagsi is set for item inputi when flag_op(inputi, next-item) returns true (where next-item is either the next item in the same thread or the first item in the next thread).
  • For threadBLOCK_THREADS-1, item inputITEMS_PER_THREAD-1 is compared against tile_predecessor_item.
  • \blocked
  • \granularity
  • \smemreuse
Snippet
The code snippet below illustrates the head- and tail-flagging of 512 integer items that are partitioned in a blocked arrangement across 128 threads where each thread owns 4 consecutive items.
#include <cub/cub.cuh> // or equivalently <cub/block/block_discontinuity.cuh>
__global__ void ExampleKernel(...)
{
// Specialize BlockDiscontinuity for a 1D block of 128 threads on type int
// Allocate shared memory for BlockDiscontinuity
// Obtain a segment of consecutive items that are blocked across threads
int thread_data[4];
...
// Have thread127 obtain the successor item for the entire tile
int tile_successor_item;
if (threadIdx.x == 127) tile_successor_item == ...
// Collectively compute head and flags for discontinuities in the segment
int head_flags[4];
int tail_flags[4];
head_flags, tail_flags, tile_successor_item, thread_data, cub::Inequality());
Suppose the set of input thread_data across the block of threads is { [0,0,1,1], [1,1,1,1], [2,3,3,3], ..., [124,125,125,125] } and that the tile_successor_item is 125. The corresponding output head_flags in those threads will be { [1,0,1,0], [0,0,0,0], [1,1,0,0], [0,1,0,0], ... }. and the corresponding output tail_flags in those threads will be { [0,1,0,0], [0,0,0,1], [1,0,0,...], ..., [1,0,0,0] }.
Template Parameters
ITEMS_PER_THREAD[inferred] The number of consecutive items partitioned onto each thread.
FlagT[inferred] The flag type (must be an integer type)
FlagOp[inferred] Binary predicate functor type having member T operator()(const T &a, const T &b) or member T operator()(const T &a, const T &b, unsigned int b_index), and returning true if a discontinuity exists between a and b, otherwise false. b_index is the rank of b in the aggregate tile of data.
Parameters
[out]head_flagsCalling thread's discontinuity head_flags
[out]tail_flagsCalling thread's discontinuity tail_flags
[in]tile_successor_item[threadBLOCK_THREADS-1 only] Item with which to compare the last tile item (inputITEMS_PER_THREAD-1 from threadBLOCK_THREADS-1).
[in]inputCalling thread's input items
[in]flag_opBinary boolean flag predicate

Definition at line 851 of file block_discontinuity.cuh.

◆ FlagHeadsAndTails() [2/4]

template<typename T , int BLOCK_DIM_X, int BLOCK_DIM_Y = 1, int BLOCK_DIM_Z = 1, int PTX_ARCH = CUB_PTX_ARCH>
template<int ITEMS_PER_THREAD, typename FlagT , typename FlagOp >
__device__ __forceinline__ void cub::BlockDiscontinuity< T, BLOCK_DIM_X, BLOCK_DIM_Y, BLOCK_DIM_Z, PTX_ARCH >::FlagHeadsAndTails ( FlagT(&)  head_flags[ITEMS_PER_THREAD],
FlagT(&)  tail_flags[ITEMS_PER_THREAD],
T(&)  input[ITEMS_PER_THREAD],
FlagOp  flag_op 
)
inline

Sets both head and tail flags indicating discontinuities between items partitioned across the thread block.

  • The flag head_flagsi is set for item inputi when flag_op(previous-item, inputi) returns true (where previous-item is either the preceding item in the same thread or the last item in the previous thread).
  • For thread0, item input0 is always flagged.
  • The flag tail_flagsi is set for item inputi when flag_op(inputi, next-item) returns true (where next-item is either the next item in the same thread or the first item in the next thread).
  • For threadBLOCK_THREADS-1, item inputITEMS_PER_THREAD-1 is always flagged.
  • \blocked
  • \granularity
  • \smemreuse
Snippet
The code snippet below illustrates the head- and tail-flagging of 512 integer items that are partitioned in a blocked arrangement across 128 threads where each thread owns 4 consecutive items.
#include <cub/cub.cuh> // or equivalently <cub/block/block_discontinuity.cuh>
__global__ void ExampleKernel(...)
{
// Specialize BlockDiscontinuity for a 1D block of 128 threads on type int
// Allocate shared memory for BlockDiscontinuity
// Obtain a segment of consecutive items that are blocked across threads
int thread_data[4];
...
// Collectively compute head and flags for discontinuities in the segment
int head_flags[4];
int tail_flags[4];
head_flags, tail_flags, thread_data, cub::Inequality());
Suppose the set of input thread_data across the block of threads is { [0,0,1,1], [1,1,1,1], [2,3,3,3], ..., [124,125,125,125] } and that the tile_successor_item is 125. The corresponding output head_flags in those threads will be { [1,0,1,0], [0,0,0,0], [1,1,0,0], [0,1,0,0], ... }. and the corresponding output tail_flags in those threads will be { [0,1,0,0], [0,0,0,1], [1,0,0,...], ..., [1,0,0,1] }.
Template Parameters
ITEMS_PER_THREAD[inferred] The number of consecutive items partitioned onto each thread.
FlagT[inferred] The flag type (must be an integer type)
FlagOp[inferred] Binary predicate functor type having member T operator()(const T &a, const T &b) or member T operator()(const T &a, const T &b, unsigned int b_index), and returning true if a discontinuity exists between a and b, otherwise false. b_index is the rank of b in the aggregate tile of data.
Parameters
[out]head_flagsCalling thread's discontinuity head_flags
[out]tail_flagsCalling thread's discontinuity tail_flags
[in]inputCalling thread's input items
[in]flag_opBinary boolean flag predicate

Definition at line 735 of file block_discontinuity.cuh.

◆ FlagHeadsAndTails() [3/4]

template<typename T , int BLOCK_DIM_X, int BLOCK_DIM_Y = 1, int BLOCK_DIM_Z = 1, int PTX_ARCH = CUB_PTX_ARCH>
template<int ITEMS_PER_THREAD, typename FlagT , typename FlagOp >
__device__ __forceinline__ void cub::BlockDiscontinuity< T, BLOCK_DIM_X, BLOCK_DIM_Y, BLOCK_DIM_Z, PTX_ARCH >::FlagHeadsAndTails ( FlagT(&)  head_flags[ITEMS_PER_THREAD],
tile_predecessor_item,
FlagT(&)  tail_flags[ITEMS_PER_THREAD],
tile_successor_item,
T(&)  input[ITEMS_PER_THREAD],
FlagOp  flag_op 
)
inline

Sets both head and tail flags indicating discontinuities between items partitioned across the thread block.

  • The flag head_flagsi is set for item inputi when flag_op(previous-item, inputi) returns true (where previous-item is either the preceding item in the same thread or the last item in the previous thread).
  • For thread0, item input0 is compared against tile_predecessor_item.
  • The flag tail_flagsi is set for item inputi when flag_op(inputi, next-item) returns true (where next-item is either the next item in the same thread or the first item in the next thread).
  • For threadBLOCK_THREADS-1, item inputITEMS_PER_THREAD-1 is compared against tile_successor_item.
  • \blocked
  • \granularity
  • \smemreuse
Snippet
The code snippet below illustrates the head- and tail-flagging of 512 integer items that are partitioned in a blocked arrangement across 128 threads where each thread owns 4 consecutive items.
#include <cub/cub.cuh> // or equivalently <cub/block/block_discontinuity.cuh>
__global__ void ExampleKernel(...)
{
// Specialize BlockDiscontinuity for a 1D block of 128 threads on type int
// Allocate shared memory for BlockDiscontinuity
// Obtain a segment of consecutive items that are blocked across threads
int thread_data[4];
...
// Have thread0 obtain the predecessor item for the entire tile
int tile_predecessor_item;
if (threadIdx.x == 0) tile_predecessor_item == ...
// Have thread127 obtain the successor item for the entire tile
int tile_successor_item;
if (threadIdx.x == 127) tile_successor_item == ...
// Collectively compute head and flags for discontinuities in the segment
int head_flags[4];
int tail_flags[4];
head_flags, tile_predecessor_item, tail_flags, tile_successor_item,
thread_data, cub::Inequality());
Suppose the set of input thread_data across the block of threads is { [0,0,1,1], [1,1,1,1], [2,3,3,3], ..., [124,125,125,125] }, that the tile_predecessor_item is 0, and that the tile_successor_item is 125. The corresponding output head_flags in those threads will be { [0,0,1,0], [0,0,0,0], [1,1,0,0], [0,1,0,0], ... }. and the corresponding output tail_flags in those threads will be { [0,1,0,0], [0,0,0,1], [1,0,0,...], ..., [1,0,0,0] }.
Template Parameters
ITEMS_PER_THREAD[inferred] The number of consecutive items partitioned onto each thread.
FlagT[inferred] The flag type (must be an integer type)
FlagOp[inferred] Binary predicate functor type having member T operator()(const T &a, const T &b) or member T operator()(const T &a, const T &b, unsigned int b_index), and returning true if a discontinuity exists between a and b, otherwise false. b_index is the rank of b in the aggregate tile of data.
Parameters
[out]head_flagsCalling thread's discontinuity head_flags
[in]tile_predecessor_item[thread0 only] Item with which to compare the first tile item (input0 from thread0).
[out]tail_flagsCalling thread's discontinuity tail_flags
[in]tile_successor_item[threadBLOCK_THREADS-1 only] Item with which to compare the last tile item (inputITEMS_PER_THREAD-1 from threadBLOCK_THREADS-1).
[in]inputCalling thread's input items
[in]flag_opBinary boolean flag predicate

Definition at line 1094 of file block_discontinuity.cuh.

◆ FlagHeadsAndTails() [4/4]

template<typename T , int BLOCK_DIM_X, int BLOCK_DIM_Y = 1, int BLOCK_DIM_Z = 1, int PTX_ARCH = CUB_PTX_ARCH>
template<int ITEMS_PER_THREAD, typename FlagT , typename FlagOp >
__device__ __forceinline__ void cub::BlockDiscontinuity< T, BLOCK_DIM_X, BLOCK_DIM_Y, BLOCK_DIM_Z, PTX_ARCH >::FlagHeadsAndTails ( FlagT(&)  head_flags[ITEMS_PER_THREAD],
tile_predecessor_item,
FlagT(&)  tail_flags[ITEMS_PER_THREAD],
T(&)  input[ITEMS_PER_THREAD],
FlagOp  flag_op 
)
inline

Sets both head and tail flags indicating discontinuities between items partitioned across the thread block.

  • The flag head_flagsi is set for item inputi when flag_op(previous-item, inputi) returns true (where previous-item is either the preceding item in the same thread or the last item in the previous thread).
  • For thread0, item input0 is compared against tile_predecessor_item.
  • The flag tail_flagsi is set for item inputi when flag_op(inputi, next-item) returns true (where next-item is either the next item in the same thread or the first item in the next thread).
  • For threadBLOCK_THREADS-1, item inputITEMS_PER_THREAD-1 is always flagged.
  • \blocked
  • \granularity
  • \smemreuse
Snippet
The code snippet below illustrates the head- and tail-flagging of 512 integer items that are partitioned in a blocked arrangement across 128 threads where each thread owns 4 consecutive items.
#include <cub/cub.cuh> // or equivalently <cub/block/block_discontinuity.cuh>
__global__ void ExampleKernel(...)
{
// Specialize BlockDiscontinuity for a 1D block of 128 threads on type int
// Allocate shared memory for BlockDiscontinuity
// Obtain a segment of consecutive items that are blocked across threads
int thread_data[4];
...
// Have thread0 obtain the predecessor item for the entire tile
int tile_predecessor_item;
if (threadIdx.x == 0) tile_predecessor_item == ...
// Have thread127 obtain the successor item for the entire tile
int tile_successor_item;
if (threadIdx.x == 127) tile_successor_item == ...
// Collectively compute head and flags for discontinuities in the segment
int head_flags[4];
int tail_flags[4];
head_flags, tile_predecessor_item, tail_flags, tile_successor_item,
thread_data, cub::Inequality());
Suppose the set of input thread_data across the block of threads is { [0,0,1,1], [1,1,1,1], [2,3,3,3], ..., [124,125,125,125] }, that the tile_predecessor_item is 0, and that the tile_successor_item is 125. The corresponding output head_flags in those threads will be { [0,0,1,0], [0,0,0,0], [1,1,0,0], [0,1,0,0], ... }. and the corresponding output tail_flags in those threads will be { [0,1,0,0], [0,0,0,1], [1,0,0,...], ..., [1,0,0,1] }.
Template Parameters
ITEMS_PER_THREAD[inferred] The number of consecutive items partitioned onto each thread.
FlagT[inferred] The flag type (must be an integer type)
FlagOp[inferred] Binary predicate functor type having member T operator()(const T &a, const T &b) or member T operator()(const T &a, const T &b, unsigned int b_index), and returning true if a discontinuity exists between a and b, otherwise false. b_index is the rank of b in the aggregate tile of data.
Parameters
[out]head_flagsCalling thread's discontinuity head_flags
[in]tile_predecessor_item[thread0 only] Item with which to compare the first tile item (input0 from thread0).
[out]tail_flagsCalling thread's discontinuity tail_flags
[in]inputCalling thread's input items
[in]flag_opBinary boolean flag predicate

Definition at line 975 of file block_discontinuity.cuh.

◆ FlagTails() [1/2]

template<typename T , int BLOCK_DIM_X, int BLOCK_DIM_Y = 1, int BLOCK_DIM_Z = 1, int PTX_ARCH = CUB_PTX_ARCH>
template<int ITEMS_PER_THREAD, typename FlagT , typename FlagOp >
__device__ __forceinline__ void cub::BlockDiscontinuity< T, BLOCK_DIM_X, BLOCK_DIM_Y, BLOCK_DIM_Z, PTX_ARCH >::FlagTails ( FlagT(&)  tail_flags[ITEMS_PER_THREAD],
T(&)  input[ITEMS_PER_THREAD],
FlagOp  flag_op 
)
inline

Sets tail flags indicating discontinuities between items partitioned across the thread block, for which the last item has no reference and is always flagged.

  • The flag tail_flagsi is set for item inputi when flag_op(inputi, next-item) returns true (where next-item is either the next item in the same thread or the first item in the next thread).
  • For threadBLOCK_THREADS-1, item inputITEMS_PER_THREAD-1 is always flagged.
  • \blocked
  • \granularity
  • \smemreuse
Snippet
The code snippet below illustrates the tail-flagging of 512 integer items that are partitioned in a blocked arrangement across 128 threads where each thread owns 4 consecutive items.
#include <cub/cub.cuh> // or equivalently <cub/block/block_discontinuity.cuh>
__global__ void ExampleKernel(...)
{
// Specialize BlockDiscontinuity for a 1D block of 128 threads on type int
// Allocate shared memory for BlockDiscontinuity
// Obtain a segment of consecutive items that are blocked across threads
int thread_data[4];
...
// Collectively compute tail flags for discontinuities in the segment
int tail_flags[4];
BlockDiscontinuity(temp_storage).FlagTails(tail_flags, thread_data, cub::Inequality());
Suppose the set of input thread_data across the block of threads is { [0,0,1,1], [1,1,1,1], [2,3,3,3], ..., [124,125,125,125] }. The corresponding output tail_flags in those threads will be { [0,1,0,0], [0,0,0,1], [1,0,0,...], ..., [1,0,0,1] }.
Template Parameters
ITEMS_PER_THREAD[inferred] The number of consecutive items partitioned onto each thread.
FlagT[inferred] The flag type (must be an integer type)
FlagOp[inferred] Binary predicate functor type having member T operator()(const T &a, const T &b) or member T operator()(const T &a, const T &b, unsigned int b_index), and returning true if a discontinuity exists between a and b, otherwise false. b_index is the rank of b in the aggregate tile of data.
Parameters
[out]tail_flagsCalling thread's discontinuity tail_flags
[in]inputCalling thread's input items
[in]flag_opBinary boolean flag predicate

Definition at line 553 of file block_discontinuity.cuh.

◆ FlagTails() [2/2]

template<typename T , int BLOCK_DIM_X, int BLOCK_DIM_Y = 1, int BLOCK_DIM_Z = 1, int PTX_ARCH = CUB_PTX_ARCH>
template<int ITEMS_PER_THREAD, typename FlagT , typename FlagOp >
__device__ __forceinline__ void cub::BlockDiscontinuity< T, BLOCK_DIM_X, BLOCK_DIM_Y, BLOCK_DIM_Z, PTX_ARCH >::FlagTails ( FlagT(&)  tail_flags[ITEMS_PER_THREAD],
T(&)  input[ITEMS_PER_THREAD],
FlagOp  flag_op,
tile_successor_item 
)
inline

Sets tail flags indicating discontinuities between items partitioned across the thread block.

  • The flag tail_flagsi is set for item inputi when flag_op(inputi, next-item) returns true (where next-item is either the next item in the same thread or the first item in the next thread).
  • For threadBLOCK_THREADS-1, item inputITEMS_PER_THREAD-1 is compared against tile_successor_item.
  • \blocked
  • \granularity
  • \smemreuse
Snippet
The code snippet below illustrates the tail-flagging of 512 integer items that are partitioned in a blocked arrangement across 128 threads where each thread owns 4 consecutive items.
#include <cub/cub.cuh> // or equivalently <cub/block/block_discontinuity.cuh>
__global__ void ExampleKernel(...)
{
// Specialize BlockDiscontinuity for a 1D block of 128 threads on type int
// Allocate shared memory for BlockDiscontinuity
// Obtain a segment of consecutive items that are blocked across threads
int thread_data[4];
...
// Have thread127 obtain the successor item for the entire tile
int tile_successor_item;
if (threadIdx.x == 127) tile_successor_item == ...
// Collectively compute tail flags for discontinuities in the segment
int tail_flags[4];
tail_flags, thread_data, cub::Inequality(), tile_successor_item);
Suppose the set of input thread_data across the block of threads is { [0,0,1,1], [1,1,1,1], [2,3,3,3], ..., [124,125,125,125] } and that tile_successor_item is 125. The corresponding output tail_flags in those threads will be { [0,1,0,0], [0,0,0,1], [1,0,0,...], ..., [1,0,0,0] }.
Template Parameters
ITEMS_PER_THREAD[inferred] The number of consecutive items partitioned onto each thread.
FlagT[inferred] The flag type (must be an integer type)
FlagOp[inferred] Binary predicate functor type having member T operator()(const T &a, const T &b) or member T operator()(const T &a, const T &b, unsigned int b_index), and returning true if a discontinuity exists between a and b, otherwise false. b_index is the rank of b in the aggregate tile of data.
Parameters
[out]tail_flagsCalling thread's discontinuity tail_flags
[in]inputCalling thread's input items
[in]flag_opBinary boolean flag predicate
[in]tile_successor_item[threadBLOCK_THREADS-1 only] Item with which to compare the last tile item (inputITEMS_PER_THREAD-1 from threadBLOCK_THREADS-1).

Definition at line 637 of file block_discontinuity.cuh.

◆ PrivateStorage()

template<typename T , int BLOCK_DIM_X, int BLOCK_DIM_Y = 1, int BLOCK_DIM_Z = 1, int PTX_ARCH = CUB_PTX_ARCH>
__device__ __forceinline__ _TempStorage & cub::BlockDiscontinuity< T, BLOCK_DIM_X, BLOCK_DIM_Y, BLOCK_DIM_Z, PTX_ARCH >::PrivateStorage ( )
inlineprivate

Internal storage allocator.

Definition at line 137 of file block_discontinuity.cuh.

Field Documentation

◆ linear_tid

template<typename T , int BLOCK_DIM_X, int BLOCK_DIM_Y = 1, int BLOCK_DIM_Z = 1, int PTX_ARCH = CUB_PTX_ARCH>
unsigned int cub::BlockDiscontinuity< T, BLOCK_DIM_X, BLOCK_DIM_Y, BLOCK_DIM_Z, PTX_ARCH >::linear_tid
private

Linear thread-id.

Definition at line 254 of file block_discontinuity.cuh.

◆ temp_storage

template<typename T , int BLOCK_DIM_X, int BLOCK_DIM_Y = 1, int BLOCK_DIM_Z = 1, int PTX_ARCH = CUB_PTX_ARCH>
_TempStorage& cub::BlockDiscontinuity< T, BLOCK_DIM_X, BLOCK_DIM_Y, BLOCK_DIM_Z, PTX_ARCH >::temp_storage
private

Shared storage reference.

Definition at line 251 of file block_discontinuity.cuh.


The documentation for this class was generated from the following file: