DeviceSegmentedRadixSort provides device-wide, parallel operations for computing a batched radix sort across multiple, non-overlapping sequences of data items residing within device-accessible memory. More...
DeviceSegmentedRadixSort provides device-wide, parallel operations for computing a batched radix sort across multiple, non-overlapping sequences of data items residing within device-accessible memory.
unsigned char
, int
, double
, etc.) as well as CUDA's __half
half-precision floating-point type. Although the direct radix sorting method can only be applied to unsigned integral types, DeviceSegmentedRadixSort is able to sort signed and floating-point types via simple bit-wise transformations that ensure lexicographic key ordering.Definition at line 76 of file device_segmented_radix_sort.cuh.
Static Public Member Functions | |
Key-value pairs | |
template<typename KeyT , typename ValueT , typename OffsetIteratorT > | |
static CUB_RUNTIME_FUNCTION cudaError_t | SortPairs (void *d_temp_storage, size_t &temp_storage_bytes, const KeyT *d_keys_in, KeyT *d_keys_out, const ValueT *d_values_in, ValueT *d_values_out, int num_items, int num_segments, OffsetIteratorT d_begin_offsets, OffsetIteratorT d_end_offsets, int begin_bit=0, int end_bit=sizeof(KeyT) *8, cudaStream_t stream=0, bool debug_synchronous=false) |
Sorts segments of key-value pairs into ascending order. (~2N auxiliary storage required) | |
template<typename KeyT , typename ValueT , typename OffsetIteratorT > | |
static CUB_RUNTIME_FUNCTION cudaError_t | SortPairs (void *d_temp_storage, size_t &temp_storage_bytes, DoubleBuffer< KeyT > &d_keys, DoubleBuffer< ValueT > &d_values, int num_items, int num_segments, OffsetIteratorT d_begin_offsets, OffsetIteratorT d_end_offsets, int begin_bit=0, int end_bit=sizeof(KeyT) *8, cudaStream_t stream=0, bool debug_synchronous=false) |
Sorts segments of key-value pairs into ascending order. (~N auxiliary storage required) | |
template<typename KeyT , typename ValueT , typename OffsetIteratorT > | |
static CUB_RUNTIME_FUNCTION cudaError_t | SortPairsDescending (void *d_temp_storage, size_t &temp_storage_bytes, const KeyT *d_keys_in, KeyT *d_keys_out, const ValueT *d_values_in, ValueT *d_values_out, int num_items, int num_segments, OffsetIteratorT d_begin_offsets, OffsetIteratorT d_end_offsets, int begin_bit=0, int end_bit=sizeof(KeyT) *8, cudaStream_t stream=0, bool debug_synchronous=false) |
Sorts segments of key-value pairs into descending order. (~2N auxiliary storage required). | |
template<typename KeyT , typename ValueT , typename OffsetIteratorT > | |
static CUB_RUNTIME_FUNCTION cudaError_t | SortPairsDescending (void *d_temp_storage, size_t &temp_storage_bytes, DoubleBuffer< KeyT > &d_keys, DoubleBuffer< ValueT > &d_values, int num_items, int num_segments, OffsetIteratorT d_begin_offsets, OffsetIteratorT d_end_offsets, int begin_bit=0, int end_bit=sizeof(KeyT) *8, cudaStream_t stream=0, bool debug_synchronous=false) |
Sorts segments of key-value pairs into descending order. (~N auxiliary storage required). | |
Keys-only | |
template<typename KeyT , typename OffsetIteratorT > | |
static CUB_RUNTIME_FUNCTION cudaError_t | SortKeys (void *d_temp_storage, size_t &temp_storage_bytes, const KeyT *d_keys_in, KeyT *d_keys_out, int num_items, int num_segments, OffsetIteratorT d_begin_offsets, OffsetIteratorT d_end_offsets, int begin_bit=0, int end_bit=sizeof(KeyT) *8, cudaStream_t stream=0, bool debug_synchronous=false) |
Sorts segments of keys into ascending order. (~2N auxiliary storage required) | |
template<typename KeyT , typename OffsetIteratorT > | |
static CUB_RUNTIME_FUNCTION cudaError_t | SortKeys (void *d_temp_storage, size_t &temp_storage_bytes, DoubleBuffer< KeyT > &d_keys, int num_items, int num_segments, OffsetIteratorT d_begin_offsets, OffsetIteratorT d_end_offsets, int begin_bit=0, int end_bit=sizeof(KeyT) *8, cudaStream_t stream=0, bool debug_synchronous=false) |
Sorts segments of keys into ascending order. (~N auxiliary storage required). | |
template<typename KeyT , typename OffsetIteratorT > | |
static CUB_RUNTIME_FUNCTION cudaError_t | SortKeysDescending (void *d_temp_storage, size_t &temp_storage_bytes, const KeyT *d_keys_in, KeyT *d_keys_out, int num_items, int num_segments, OffsetIteratorT d_begin_offsets, OffsetIteratorT d_end_offsets, int begin_bit=0, int end_bit=sizeof(KeyT) *8, cudaStream_t stream=0, bool debug_synchronous=false) |
Sorts segments of keys into descending order. (~2N auxiliary storage required). | |
template<typename KeyT , typename OffsetIteratorT > | |
static CUB_RUNTIME_FUNCTION cudaError_t | SortKeysDescending (void *d_temp_storage, size_t &temp_storage_bytes, DoubleBuffer< KeyT > &d_keys, int num_items, int num_segments, OffsetIteratorT d_begin_offsets, OffsetIteratorT d_end_offsets, int begin_bit=0, int end_bit=sizeof(KeyT) *8, cudaStream_t stream=0, bool debug_synchronous=false) |
Sorts segments of keys into descending order. (~N auxiliary storage required). | |
|
inlinestatic |
Sorts segments of keys into ascending order. (~2N auxiliary storage required)
[begin_bit, end_bit)
of differentiating key bits can be specified. This can reduce overall sorting overhead and yield a corresponding performance improvement.segment_offsets
(of length num_segments+1
) can be aliased for both the d_begin_offsets
and d_end_offsets
parameters (where the latter is specified as segment_offsets+1
).P
) temporary storage, see the sorting interface using DoubleBuffer wrappers below.int
keys. KeyT | [inferred] Key type |
OffsetIteratorT | [inferred] Random-access input iterator type for reading segment offsets \iterator |
[in] | d_temp_storage | Device-accessible allocation of temporary storage. When NULL, the required allocation size is written to temp_storage_bytes and no work is done. |
[in,out] | temp_storage_bytes | Reference to size in bytes of d_temp_storage allocation |
[in] | d_keys_in | Device-accessible pointer to the input data of key data to sort |
[out] | d_keys_out | Device-accessible pointer to the sorted output sequence of key data |
[in] | num_items | The total number of items to sort (across all segments) |
[in] | num_segments | The number of segments that comprise the sorting data |
[in] | d_begin_offsets | Pointer to the sequence of beginning offsets of length num_segments , such that d_begin_offsets[i] is the first element of the ith data segment in d_keys_* and d_values_* |
[in] | d_end_offsets | Pointer to the sequence of ending offsets of length num_segments , such that d_end_offsets[i]-1 is the last element of the ith data segment in d_keys_* and d_values_* . If d_end_offsets[i]-1 <= d_begin_offsets[i] , the ith is considered empty. |
[in] | begin_bit | [optional] The least-significant bit index (inclusive) needed for key comparison |
[in] | end_bit | [optional] The most-significant bit index (exclusive) needed for key comparison (e.g., sizeof(unsigned int) * 8) |
[in] | stream | [optional] CUDA stream to launch kernels within. Default is stream0. |
[in] | debug_synchronous | [optional] Whether or not to synchronize the stream after every kernel launch to check for errors. Also causes launch configurations to be printed to the console. Default is false . |
Definition at line 546 of file device_segmented_radix_sort.cuh.
|
inlinestatic |
Sorts segments of keys into ascending order. (~N auxiliary storage required).
segment_offsets
(of length num_segments+1
) can be aliased for both the d_begin_offsets
and d_end_offsets
parameters (where the latter is specified as segment_offsets+1
).[begin_bit, end_bit)
of differentiating key bits can be specified. This can reduce overall sorting overhead and yield a corresponding performance improvement.int
keys. KeyT | [inferred] Key type |
OffsetIteratorT | [inferred] Random-access input iterator type for reading segment offsets \iterator |
[in] | d_temp_storage | Device-accessible allocation of temporary storage. When NULL, the required allocation size is written to temp_storage_bytes and no work is done. |
[in,out] | temp_storage_bytes | Reference to size in bytes of d_temp_storage allocation |
[in,out] | d_keys | Reference to the double-buffer of keys whose "current" device-accessible buffer contains the unsorted input keys and, upon return, is updated to point to the sorted output keys |
[in] | num_items | The total number of items to sort (across all segments) |
[in] | num_segments | The number of segments that comprise the sorting data |
[in] | d_begin_offsets | Pointer to the sequence of beginning offsets of length num_segments , such that d_begin_offsets[i] is the first element of the ith data segment in d_keys_* and d_values_* |
[in] | d_end_offsets | Pointer to the sequence of ending offsets of length num_segments , such that d_end_offsets[i]-1 is the last element of the ith data segment in d_keys_* and d_values_* . If d_end_offsets[i]-1 <= d_begin_offsets[i] , the ith is considered empty. |
[in] | begin_bit | [optional] The least-significant bit index (inclusive) needed for key comparison |
[in] | end_bit | [optional] The most-significant bit index (exclusive) needed for key comparison (e.g., sizeof(unsigned int) * 8) |
[in] | stream | [optional] CUDA stream to launch kernels within. Default is stream0. |
[in] | debug_synchronous | [optional] Whether or not to synchronize the stream after every kernel launch to check for errors. Also causes launch configurations to be printed to the console. Default is false . |
Definition at line 645 of file device_segmented_radix_sort.cuh.
|
inlinestatic |
Sorts segments of keys into descending order. (~2N auxiliary storage required).
segment_offsets
(of length num_segments+1
) can be aliased for both the d_begin_offsets
and d_end_offsets
parameters (where the latter is specified as segment_offsets+1
).[begin_bit, end_bit)
of differentiating key bits can be specified. This can reduce overall sorting overhead and yield a corresponding performance improvement.P
) temporary storage, see the sorting interface using DoubleBuffer wrappers below.int
keys. KeyT | [inferred] Key type |
OffsetIteratorT | [inferred] Random-access input iterator type for reading segment offsets \iterator |
[in] | d_temp_storage | Device-accessible allocation of temporary storage. When NULL, the required allocation size is written to temp_storage_bytes and no work is done. |
[in,out] | temp_storage_bytes | Reference to size in bytes of d_temp_storage allocation |
[in] | d_keys_in | Device-accessible pointer to the input data of key data to sort |
[out] | d_keys_out | Device-accessible pointer to the sorted output sequence of key data |
[in] | num_items | The total number of items to sort (across all segments) |
[in] | num_segments | The number of segments that comprise the sorting data |
[in] | d_begin_offsets | Pointer to the sequence of beginning offsets of length num_segments , such that d_begin_offsets[i] is the first element of the ith data segment in d_keys_* and d_values_* |
[in] | d_end_offsets | Pointer to the sequence of ending offsets of length num_segments , such that d_end_offsets[i]-1 is the last element of the ith data segment in d_keys_* and d_values_* . If d_end_offsets[i]-1 <= d_begin_offsets[i] , the ith is considered empty. |
[in] | begin_bit | [optional] The least-significant bit index (inclusive) needed for key comparison |
[in] | end_bit | [optional] The most-significant bit index (exclusive) needed for key comparison (e.g., sizeof(unsigned int) * 8) |
[in] | stream | [optional] CUDA stream to launch kernels within. Default is stream0. |
[in] | debug_synchronous | [optional] Whether or not to synchronize the stream after every kernel launch to check for errors. Also causes launch configurations to be printed to the console. Default is false . |
Definition at line 734 of file device_segmented_radix_sort.cuh.
|
inlinestatic |
Sorts segments of keys into descending order. (~N auxiliary storage required).
segment_offsets
(of length num_segments+1
) can be aliased for both the d_begin_offsets
and d_end_offsets
parameters (where the latter is specified as segment_offsets+1
).[begin_bit, end_bit)
of differentiating key bits can be specified. This can reduce overall sorting overhead and yield a corresponding performance improvement.int
keys. KeyT | [inferred] Key type |
OffsetIteratorT | [inferred] Random-access input iterator type for reading segment offsets \iterator |
[in] | d_temp_storage | Device-accessible allocation of temporary storage. When NULL, the required allocation size is written to temp_storage_bytes and no work is done. |
[in,out] | temp_storage_bytes | Reference to size in bytes of d_temp_storage allocation |
[in,out] | d_keys | Reference to the double-buffer of keys whose "current" device-accessible buffer contains the unsorted input keys and, upon return, is updated to point to the sorted output keys |
[in] | num_items | The total number of items to sort (across all segments) |
[in] | num_segments | The number of segments that comprise the sorting data |
[in] | d_begin_offsets | Pointer to the sequence of beginning offsets of length num_segments , such that d_begin_offsets[i] is the first element of the ith data segment in d_keys_* and d_values_* |
[in] | d_end_offsets | Pointer to the sequence of ending offsets of length num_segments , such that d_end_offsets[i]-1 is the last element of the ith data segment in d_keys_* and d_values_* . If d_end_offsets[i]-1 <= d_begin_offsets[i] , the ith is considered empty. |
[in] | begin_bit | [optional] The least-significant bit index (inclusive) needed for key comparison |
[in] | end_bit | [optional] The most-significant bit index (exclusive) needed for key comparison (e.g., sizeof(unsigned int) * 8) |
[in] | stream | [optional] CUDA stream to launch kernels within. Default is stream0. |
[in] | debug_synchronous | [optional] Whether or not to synchronize the stream after every kernel launch to check for errors. Also causes launch configurations to be printed to the console. Default is false . |
Definition at line 832 of file device_segmented_radix_sort.cuh.
|
inlinestatic |
Sorts segments of key-value pairs into ascending order. (~2N auxiliary storage required)
segment_offsets
(of length num_segments+1
) can be aliased for both the d_begin_offsets
and d_end_offsets
parameters (where the latter is specified as segment_offsets+1
).[begin_bit, end_bit)
of differentiating key bits can be specified. This can reduce overall sorting overhead and yield a corresponding performance improvement.P
) temporary storage, see the sorting interface using DoubleBuffer wrappers below.int
keys with associated vector of int
values. KeyT | [inferred] Key type |
ValueT | [inferred] Value type |
OffsetIteratorT | [inferred] Random-access input iterator type for reading segment offsets \iterator |
[in] | d_temp_storage | Device-accessible allocation of temporary storage. When NULL, the required allocation size is written to temp_storage_bytes and no work is done. |
[in,out] | temp_storage_bytes | Reference to size in bytes of d_temp_storage allocation |
[in] | d_keys_in | Device-accessible pointer to the input data of key data to sort |
[out] | d_keys_out | Device-accessible pointer to the sorted output sequence of key data |
[in] | d_values_in | Device-accessible pointer to the corresponding input sequence of associated value items |
[out] | d_values_out | Device-accessible pointer to the correspondingly-reordered output sequence of associated value items |
[in] | num_items | The total number of items to sort (across all segments) |
[in] | num_segments | The number of segments that comprise the sorting data |
[in] | d_begin_offsets | Pointer to the sequence of beginning offsets of length num_segments , such that d_begin_offsets[i] is the first element of the ith data segment in d_keys_* and d_values_* |
[in] | d_end_offsets | Pointer to the sequence of ending offsets of length num_segments , such that d_end_offsets[i]-1 is the last element of the ith data segment in d_keys_* and d_values_* . If d_end_offsets[i]-1 <= d_begin_offsets[i] , the ith is considered empty. |
[in] | begin_bit | [optional] The least-significant bit index (inclusive) needed for key comparison |
[in] | end_bit | [optional] The most-significant bit index (exclusive) needed for key comparison (e.g., sizeof(unsigned int) * 8) |
[in] | stream | [optional] CUDA stream to launch kernels within. Default is stream0. |
[in] | debug_synchronous | [optional] Whether or not to synchronize the stream after every kernel launch to check for errors. Also causes launch configurations to be printed to the console. Default is false . |
Definition at line 143 of file device_segmented_radix_sort.cuh.
|
inlinestatic |
Sorts segments of key-value pairs into ascending order. (~N auxiliary storage required)
segment_offsets
(of length num_segments+1
) can be aliased for both the d_begin_offsets
and d_end_offsets
parameters (where the latter is specified as segment_offsets+1
).[begin_bit, end_bit)
of differentiating key bits can be specified. This can reduce overall sorting overhead and yield a corresponding performance improvement.int
keys with associated vector of int
values. KeyT | [inferred] Key type |
ValueT | [inferred] Value type |
OffsetIteratorT | [inferred] Random-access input iterator type for reading segment offsets \iterator |
[in] | d_temp_storage | Device-accessible allocation of temporary storage. When NULL, the required allocation size is written to temp_storage_bytes and no work is done. |
[in,out] | temp_storage_bytes | Reference to size in bytes of d_temp_storage allocation |
[in,out] | d_keys | Reference to the double-buffer of keys whose "current" device-accessible buffer contains the unsorted input keys and, upon return, is updated to point to the sorted output keys |
[in,out] | d_values | Double-buffer of values whose "current" device-accessible buffer contains the unsorted input values and, upon return, is updated to point to the sorted output values |
[in] | num_items | The total number of items to sort (across all segments) |
[in] | num_segments | The number of segments that comprise the sorting data |
[in] | d_begin_offsets | Pointer to the sequence of beginning offsets of length num_segments , such that d_begin_offsets[i] is the first element of the ith data segment in d_keys_* and d_values_* |
[in] | d_end_offsets | Pointer to the sequence of ending offsets of length num_segments , such that d_end_offsets[i]-1 is the last element of the ith data segment in d_keys_* and d_values_* . If d_end_offsets[i]-1 <= d_begin_offsets[i] , the ith is considered empty. |
[in] | begin_bit | [optional] The least-significant bit index (inclusive) needed for key comparison |
[in] | end_bit | [optional] The most-significant bit index (exclusive) needed for key comparison (e.g., sizeof(unsigned int) * 8) |
[in] | stream | [optional] CUDA stream to launch kernels within. Default is stream0. |
[in] | debug_synchronous | [optional] Whether or not to synchronize the stream after every kernel launch to check for errors. Also causes launch configurations to be printed to the console. Default is false . |
Definition at line 252 of file device_segmented_radix_sort.cuh.
|
inlinestatic |
Sorts segments of key-value pairs into descending order. (~2N auxiliary storage required).
segment_offsets
(of length num_segments+1
) can be aliased for both the d_begin_offsets
and d_end_offsets
parameters (where the latter is specified as segment_offsets+1
).[begin_bit, end_bit)
of differentiating key bits can be specified. This can reduce overall sorting overhead and yield a corresponding performance improvement.P
) temporary storage, see the sorting interface using DoubleBuffer wrappers below.int
keys with associated vector of int
values. KeyT | [inferred] Key type |
ValueT | [inferred] Value type |
OffsetIteratorT | [inferred] Random-access input iterator type for reading segment offsets \iterator |
[in] | d_temp_storage | Device-accessible allocation of temporary storage. When NULL, the required allocation size is written to temp_storage_bytes and no work is done. |
[in,out] | temp_storage_bytes | Reference to size in bytes of d_temp_storage allocation |
[in] | d_keys_in | Device-accessible pointer to the input data of key data to sort |
[out] | d_keys_out | Device-accessible pointer to the sorted output sequence of key data |
[in] | d_values_in | Device-accessible pointer to the corresponding input sequence of associated value items |
[out] | d_values_out | Device-accessible pointer to the correspondingly-reordered output sequence of associated value items |
[in] | num_items | The total number of items to sort (across all segments) |
[in] | num_segments | The number of segments that comprise the sorting data |
[in] | d_begin_offsets | Pointer to the sequence of beginning offsets of length num_segments , such that d_begin_offsets[i] is the first element of the ith data segment in d_keys_* and d_values_* |
[in] | d_end_offsets | Pointer to the sequence of ending offsets of length num_segments , such that d_end_offsets[i]-1 is the last element of the ith data segment in d_keys_* and d_values_* . If d_end_offsets[i]-1 <= d_begin_offsets[i] , the ith is considered empty. |
[in] | begin_bit | [optional] The least-significant bit index (inclusive) needed for key comparison |
[in] | end_bit | [optional] The most-significant bit index (exclusive) needed for key comparison (e.g., sizeof(unsigned int) * 8) |
[in] | stream | [optional] CUDA stream to launch kernels within. Default is stream0. |
[in] | debug_synchronous | [optional] Whether or not to synchronize the stream after every kernel launch to check for errors. Also causes launch configurations to be printed to the console. Default is false . |
Definition at line 345 of file device_segmented_radix_sort.cuh.
|
inlinestatic |
Sorts segments of key-value pairs into descending order. (~N auxiliary storage required).
segment_offsets
(of length num_segments+1
) can be aliased for both the d_begin_offsets
and d_end_offsets
parameters (where the latter is specified as segment_offsets+1
).[begin_bit, end_bit)
of differentiating key bits can be specified. This can reduce overall sorting overhead and yield a corresponding performance improvement.int
keys with associated vector of int
values. KeyT | [inferred] Key type |
ValueT | [inferred] Value type |
OffsetIteratorT | [inferred] Random-access input iterator type for reading segment offsets \iterator |
[in] | d_temp_storage | Device-accessible allocation of temporary storage. When NULL, the required allocation size is written to temp_storage_bytes and no work is done. |
[in,out] | temp_storage_bytes | Reference to size in bytes of d_temp_storage allocation |
[in,out] | d_keys | Reference to the double-buffer of keys whose "current" device-accessible buffer contains the unsorted input keys and, upon return, is updated to point to the sorted output keys |
[in,out] | d_values | Double-buffer of values whose "current" device-accessible buffer contains the unsorted input values and, upon return, is updated to point to the sorted output values |
[in] | num_items | The total number of items to sort (across all segments) |
[in] | num_segments | The number of segments that comprise the sorting data |
[in] | d_begin_offsets | Pointer to the sequence of beginning offsets of length num_segments , such that d_begin_offsets[i] is the first element of the ith data segment in d_keys_* and d_values_* |
[in] | d_end_offsets | Pointer to the sequence of ending offsets of length num_segments , such that d_end_offsets[i]-1 is the last element of the ith data segment in d_keys_* and d_values_* . If d_end_offsets[i]-1 <= d_begin_offsets[i] , the ith is considered empty. |
[in] | begin_bit | [optional] The least-significant bit index (inclusive) needed for key comparison |
[in] | end_bit | [optional] The most-significant bit index (exclusive) needed for key comparison (e.g., sizeof(unsigned int) * 8) |
[in] | stream | [optional] CUDA stream to launch kernels within. Default is stream0. |
[in] | debug_synchronous | [optional] Whether or not to synchronize the stream after every kernel launch to check for errors. Also causes launch configurations to be printed to the console. Default is false . |
Definition at line 454 of file device_segmented_radix_sort.cuh.