DeviceRadixSort provides device-wide, parallel operations for computing a radix sort across a sequence of data items residing within device-accessible memory.
DeviceRadixSort can sort the built-in C++ numeric primitive types (unsigned char, int, double, etc.) as well as CUDA's __half half-precision floating-point type. Although the direct radix sorting method can only be applied to unsigned integral types, DeviceRadixSort is able to sort signed and floating-point types via simple bit-wise transformations that ensure lexicographic key ordering.
[Performance chart for uint32 keys]
Definition at line 83 of file device_radix_sort.cuh.
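The bit-wise transformations mentioned above can be illustrated with a short host-side sketch. It assumes 32-bit two's-complement int and IEEE-754 float. The helper names are hypothetical, this is not CUB's internal code, and it does not special-case NaN or signed zero.

```cuda
#include <cstdint>
#include <cstring>

// Hypothetical helper: maps a signed 32-bit integer to an unsigned key whose
// unsigned order matches signed order. Flipping the sign bit moves negative
// values below non-negative ones.
inline std::uint32_t OrderedBitsFromInt(std::int32_t x) {
    std::uint32_t u;
    std::memcpy(&u, &x, sizeof(u));  // reinterpret the bits without UB
    return u ^ 0x80000000u;
}

// Hypothetical helper: maps an IEEE-754 float to an unsigned key whose
// unsigned order matches numeric order (NaN and -0.0 not special-cased).
// Negative floats: flip all bits, so larger magnitudes become smaller keys.
// Non-negative floats: flip only the sign bit, so they sort above negatives.
inline std::uint32_t OrderedBitsFromFloat(float f) {
    static_assert(sizeof(float) == sizeof(std::uint32_t), "expects 32-bit float");
    std::uint32_t u;
    std::memcpy(&u, &f, sizeof(u));
    return (u & 0x80000000u) ? ~u : (u ^ 0x80000000u);
}
```

Sorting the transformed values as unsigned integers and then applying the inverse mapping yields numeric order for the original keys.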
Static Public Member Functions

Key-value pairs

template<typename KeyT, typename ValueT>
static CUB_RUNTIME_FUNCTION cudaError_t SortPairs(void *d_temp_storage, size_t &temp_storage_bytes, const KeyT *d_keys_in, KeyT *d_keys_out, const ValueT *d_values_in, ValueT *d_values_out, int num_items, int begin_bit = 0, int end_bit = sizeof(KeyT) * 8, cudaStream_t stream = 0, bool debug_synchronous = false)
Sorts key-value pairs into ascending order. (~2N auxiliary storage required)

template<typename KeyT, typename ValueT>
static CUB_RUNTIME_FUNCTION cudaError_t SortPairs(void *d_temp_storage, size_t &temp_storage_bytes, DoubleBuffer<KeyT> &d_keys, DoubleBuffer<ValueT> &d_values, int num_items, int begin_bit = 0, int end_bit = sizeof(KeyT) * 8, cudaStream_t stream = 0, bool debug_synchronous = false)
Sorts key-value pairs into ascending order. (~N auxiliary storage required)

template<typename KeyT, typename ValueT>
static CUB_RUNTIME_FUNCTION cudaError_t SortPairsDescending(void *d_temp_storage, size_t &temp_storage_bytes, const KeyT *d_keys_in, KeyT *d_keys_out, const ValueT *d_values_in, ValueT *d_values_out, int num_items, int begin_bit = 0, int end_bit = sizeof(KeyT) * 8, cudaStream_t stream = 0, bool debug_synchronous = false)
Sorts key-value pairs into descending order. (~2N auxiliary storage required)

template<typename KeyT, typename ValueT>
static CUB_RUNTIME_FUNCTION cudaError_t SortPairsDescending(void *d_temp_storage, size_t &temp_storage_bytes, DoubleBuffer<KeyT> &d_keys, DoubleBuffer<ValueT> &d_values, int num_items, int begin_bit = 0, int end_bit = sizeof(KeyT) * 8, cudaStream_t stream = 0, bool debug_synchronous = false)
Sorts key-value pairs into descending order. (~N auxiliary storage required)

Keys-only

template<typename KeyT>
static CUB_RUNTIME_FUNCTION cudaError_t SortKeys(void *d_temp_storage, size_t &temp_storage_bytes, const KeyT *d_keys_in, KeyT *d_keys_out, int num_items, int begin_bit = 0, int end_bit = sizeof(KeyT) * 8, cudaStream_t stream = 0, bool debug_synchronous = false)
Sorts keys into ascending order. (~2N auxiliary storage required)

template<typename KeyT>
static CUB_RUNTIME_FUNCTION cudaError_t SortKeys(void *d_temp_storage, size_t &temp_storage_bytes, DoubleBuffer<KeyT> &d_keys, int num_items, int begin_bit = 0, int end_bit = sizeof(KeyT) * 8, cudaStream_t stream = 0, bool debug_synchronous = false)
Sorts keys into ascending order. (~N auxiliary storage required)

template<typename KeyT>
static CUB_RUNTIME_FUNCTION cudaError_t SortKeysDescending(void *d_temp_storage, size_t &temp_storage_bytes, const KeyT *d_keys_in, KeyT *d_keys_out, int num_items, int begin_bit = 0, int end_bit = sizeof(KeyT) * 8, cudaStream_t stream = 0, bool debug_synchronous = false)
Sorts keys into descending order. (~2N auxiliary storage required)

template<typename KeyT>
static CUB_RUNTIME_FUNCTION cudaError_t SortKeysDescending(void *d_temp_storage, size_t &temp_storage_bytes, DoubleBuffer<KeyT> &d_keys, int num_items, int begin_bit = 0, int end_bit = sizeof(KeyT) * 8, cudaStream_t stream = 0, bool debug_synchronous = false)
Sorts keys into descending order. (~N auxiliary storage required)
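SortKeys() (input/output pointer overload) [inline, static]
Sorts keys into ascending order. (~2N auxiliary storage required)
An optional bit subrange [begin_bit, end_bit) of differentiating key bits can be specified. This can reduce overall sorting overhead and yield a corresponding performance improvement.
This operation requires a relatively large allocation of temporary device storage that is O(N+P), where N is the length of the input and P is the number of streaming multiprocessors on the device. For sorting using only O(P) temporary storage, see the sorting interface using DoubleBuffer wrappers below.
[Performance charts for uint32 and uint64 keys, respectively]
Template Parameters
KeyT | [inferred] Key type |
Parameters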
[in] | d_temp_storage | Device-accessible allocation of temporary storage. When NULL, the required allocation size is written to temp_storage_bytes and no work is done. |
[in,out] | temp_storage_bytes | Reference to size in bytes of d_temp_storage allocation |
[in] | d_keys_in | Pointer to the input sequence of key data to sort |
[out] | d_keys_out | Pointer to the sorted output sequence of key data |
[in] | num_items | Number of items to sort |
[in] | begin_bit | [optional] The least-significant bit index (inclusive) needed for key comparison |
[in] | end_bit | [optional] The most-significant bit index (exclusive) needed for key comparison (e.g., sizeof(unsigned int) * 8) |
[in] | stream | [optional] CUDA stream to launch kernels within. Default is stream 0. |
[in] | debug_synchronous | [optional] Whether or not to synchronize the stream after every kernel launch to check for errors. Also causes launch configurations to be printed to the console. Default is false. |
Definition at line 507 of file device_radix_sort.cuh.
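The two-call pattern implied by d_temp_storage and temp_storage_bytes (first call with d_temp_storage == NULL to obtain the size, second call to sort) can be sketched as follows. This is a minimal example, assuming the CUB and Thrust headers that ship with the CUDA Toolkit. SortKeysAscending and CheckCuda are hypothetical helper names, and thrust::device_vector is used only as a convenient RAII allocator, not because DeviceRadixSort requires it.

```cuda
#include <cub/device/device_radix_sort.cuh>
#include <cuda_runtime.h>
#include <thrust/copy.h>
#include <thrust/device_vector.h>
#include <stdexcept>
#include <vector>

// Hypothetical helper: throws if a CUDA runtime or CUB call reports an error.
inline void CheckCuda(cudaError_t err) {
    if (err != cudaSuccess) throw std::runtime_error(cudaGetErrorString(err));
}

// Hypothetical helper: sorts int keys into ascending order using the
// d_keys_in/d_keys_out overload of cub::DeviceRadixSort::SortKeys.
std::vector<int> SortKeysAscending(const std::vector<int>& h_keys) {
    const int num_items = static_cast<int>(h_keys.size());
    if (num_items == 0) return {};

    thrust::device_vector<int> keys_in(h_keys.begin(), h_keys.end());
    thrust::device_vector<int> keys_out(num_items);
    const int* d_keys_in = thrust::raw_pointer_cast(keys_in.data());
    int* d_keys_out = thrust::raw_pointer_cast(keys_out.data());

    // Call 1: d_temp_storage is NULL, so only temp_storage_bytes is written.
    void* d_temp_storage = nullptr;
    size_t temp_storage_bytes = 0;
    CheckCuda(cub::DeviceRadixSort::SortKeys(d_temp_storage, temp_storage_bytes,
                                             d_keys_in, d_keys_out, num_items));

    // Call 2: allocate the requested temporary storage and run the sort.
    thrust::device_vector<unsigned char> temp(temp_storage_bytes);
    d_temp_storage = thrust::raw_pointer_cast(temp.data());
    CheckCuda(cub::DeviceRadixSort::SortKeys(d_temp_storage, temp_storage_bytes,
                                             d_keys_in, d_keys_out, num_items));

    // The device-to-host copy waits for the sort issued on the default stream.
    std::vector<int> h_out(num_items);
    thrust::copy(keys_out.begin(), keys_out.end(), h_out.begin());
    return h_out;
}
```

For example, SortKeysAscending({8, 6, 7, 5, 3, 0, 9}) yields {0, 3, 5, 6, 7, 8, 9}.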
SortKeys() (DoubleBuffer overload) [inline, static]
Sorts keys into ascending order. (~N auxiliary storage required)
An optional bit subrange [begin_bit, end_bit) of differentiating key bits can be specified. This can reduce overall sorting overhead and yield a corresponding performance improvement.
[Performance charts for uint32 and uint64 keys, respectively]
Template Parameters
KeyT | [inferred] Key type |
Parameters
[in] | d_temp_storage | Device-accessible allocation of temporary storage. When NULL, the required allocation size is written to temp_storage_bytes and no work is done. |
[in,out] | temp_storage_bytes | Reference to size in bytes of d_temp_storage allocation |
[in,out] | d_keys | Reference to the double-buffer of keys whose "current" device-accessible buffer contains the unsorted input keys and, upon return, is updated to point to the sorted output keys |
[in] | num_items | Number of items to sort |
[in] | begin_bit | [optional] The least-significant bit index (inclusive) needed for key comparison |
[in] | end_bit | [optional] The most-significant bit index (exclusive) needed for key comparison (e.g., sizeof(unsigned int) * 8) |
[in] | stream | [optional] CUDA stream to launch kernels within. Default is stream 0. |
[in] | debug_synchronous | [optional] Whether or not to synchronize the stream after every kernel launch to check for errors. Also causes launch configurations to be printed to the console. Default is false. |
Definition at line 596 of file device_radix_sort.cuh.
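A sketch of the DoubleBuffer overload that also restricts the sort to a bit subrange. It assumes, for illustration only, that every key is below 65536, so bits [0, 16) are the only differentiating bits. SortSmallKeys and CheckCuda are hypothetical names. The result is read through d_keys.Current() because the sort decides which buffer ends up holding the output.

```cuda
#include <cub/device/device_radix_sort.cuh>
#include <cuda_runtime.h>
#include <thrust/device_vector.h>
#include <stdexcept>
#include <vector>

// Hypothetical helper: throws if a CUDA runtime or CUB call reports an error.
inline void CheckCuda(cudaError_t err) {
    if (err != cudaSuccess) throw std::runtime_error(cudaGetErrorString(err));
}

// Hypothetical helper: sorts keys ascending with the DoubleBuffer overload.
// Assumption: every key is < 65536, so end_bit = 16 is sufficient and can
// reduce the number of digit passes compared with the full 32 bits.
std::vector<unsigned int> SortSmallKeys(const std::vector<unsigned int>& h_keys) {
    const int num_items = static_cast<int>(h_keys.size());
    if (num_items == 0) return {};

    // Two equally sized buffers; the first ("current") holds the input.
    thrust::device_vector<unsigned int> buf0(h_keys.begin(), h_keys.end());
    thrust::device_vector<unsigned int> buf1(num_items);
    cub::DoubleBuffer<unsigned int> d_keys(thrust::raw_pointer_cast(buf0.data()),
                                           thrust::raw_pointer_cast(buf1.data()));

    const int begin_bit = 0, end_bit = 16;
    void* d_temp_storage = nullptr;
    size_t temp_storage_bytes = 0;  // first call: size query only
    CheckCuda(cub::DeviceRadixSort::SortKeys(d_temp_storage, temp_storage_bytes,
                                             d_keys, num_items, begin_bit, end_bit));
    thrust::device_vector<unsigned char> temp(temp_storage_bytes);
    d_temp_storage = thrust::raw_pointer_cast(temp.data());
    CheckCuda(cub::DeviceRadixSort::SortKeys(d_temp_storage, temp_storage_bytes,
                                             d_keys, num_items, begin_bit, end_bit));

    // Either buffer may hold the result; Current() points at the sorted keys.
    std::vector<unsigned int> h_out(num_items);
    CheckCuda(cudaMemcpy(h_out.data(), d_keys.Current(),
                         num_items * sizeof(unsigned int), cudaMemcpyDeviceToHost));
    return h_out;
}
```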
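SortKeysDescending() (input/output pointer overload) [inline, static]
Sorts keys into descending order. (~2N auxiliary storage required)
An optional bit subrange [begin_bit, end_bit) of differentiating key bits can be specified. This can reduce overall sorting overhead and yield a corresponding performance improvement.
This operation requires a relatively large allocation of temporary device storage that is O(N+P), where N is the length of the input and P is the number of streaming multiprocessors on the device. For sorting using only O(P) temporary storage, see the sorting interface using DoubleBuffer wrappers below.
Template Parameters
KeyT | [inferred] Key type |
Parameters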
[in] | d_temp_storage | Device-accessible allocation of temporary storage. When NULL, the required allocation size is written to temp_storage_bytes and no work is done. |
[in,out] | temp_storage_bytes | Reference to size in bytes of d_temp_storage allocation |
[in] | d_keys_in | Pointer to the input sequence of key data to sort |
[out] | d_keys_out | Pointer to the sorted output sequence of key data |
[in] | num_items | Number of items to sort |
[in] | begin_bit | [optional] The least-significant bit index (inclusive) needed for key comparison |
[in] | end_bit | [optional] The most-significant bit index (exclusive) needed for key comparison (e.g., sizeof(unsigned int) * 8) |
[in] | stream | [optional] CUDA stream to launch kernels within. Default is stream 0. |
[in] | debug_synchronous | [optional] Whether or not to synchronize the stream after every kernel launch to check for errors. Also causes launch configurations to be printed to the console. Default is false. |
Definition at line 671 of file device_radix_sort.cuh.
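A sketch of SortKeysDescending with separate input and output buffers, using float keys to show that negative floating-point values need no caller-side transformation. SortFloatsDescending and CheckCuda are hypothetical names, and the Thrust containers are only an allocation convenience.

```cuda
#include <cub/device/device_radix_sort.cuh>
#include <cuda_runtime.h>
#include <thrust/copy.h>
#include <thrust/device_vector.h>
#include <stdexcept>
#include <vector>

// Hypothetical helper: throws if a CUDA runtime or CUB call reports an error.
inline void CheckCuda(cudaError_t err) {
    if (err != cudaSuccess) throw std::runtime_error(cudaGetErrorString(err));
}

// Hypothetical helper: sorts float keys into descending order with the
// d_keys_in/d_keys_out overload of SortKeysDescending.
std::vector<float> SortFloatsDescending(const std::vector<float>& h_keys) {
    const int num_items = static_cast<int>(h_keys.size());
    if (num_items == 0) return {};

    thrust::device_vector<float> keys_in(h_keys.begin(), h_keys.end());
    thrust::device_vector<float> keys_out(num_items);
    const float* d_keys_in = thrust::raw_pointer_cast(keys_in.data());
    float* d_keys_out = thrust::raw_pointer_cast(keys_out.data());

    void* d_temp_storage = nullptr;
    size_t temp_storage_bytes = 0;  // first call: size query only
    CheckCuda(cub::DeviceRadixSort::SortKeysDescending(
        d_temp_storage, temp_storage_bytes, d_keys_in, d_keys_out, num_items));
    thrust::device_vector<unsigned char> temp(temp_storage_bytes);
    d_temp_storage = thrust::raw_pointer_cast(temp.data());
    CheckCuda(cub::DeviceRadixSort::SortKeysDescending(
        d_temp_storage, temp_storage_bytes, d_keys_in, d_keys_out, num_items));

    std::vector<float> h_out(num_items);
    thrust::copy(keys_out.begin(), keys_out.end(), h_out.begin());
    return h_out;
}
```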
SortKeysDescending() (DoubleBuffer overload) [inline, static]
Sorts keys into descending order. (~N auxiliary storage required)
An optional bit subrange [begin_bit, end_bit) of differentiating key bits can be specified. This can reduce overall sorting overhead and yield a corresponding performance improvement.
Template Parameters
KeyT | [inferred] Key type |
Parameters
[in] | d_temp_storage | Device-accessible allocation of temporary storage. When NULL, the required allocation size is written to temp_storage_bytes and no work is done. |
[in,out] | temp_storage_bytes | Reference to size in bytes of d_temp_storage allocation |
[in,out] | d_keys | Reference to the double-buffer of keys whose "current" device-accessible buffer contains the unsorted input keys and, upon return, is updated to point to the sorted output keys |
[in] | num_items | Number of items to sort |
[in] | begin_bit | [optional] The least-significant bit index (inclusive) needed for key comparison |
[in] | end_bit | [optional] The most-significant bit index (exclusive) needed for key comparison (e.g., sizeof(unsigned int) * 8) |
[in] | stream | [optional] CUDA stream to launch kernels within. Default is stream 0. |
[in] | debug_synchronous | [optional] Whether or not to synchronize the stream after every kernel launch to check for errors. Also causes launch configurations to be printed to the console. Default is false. |
Definition at line 755 of file device_radix_sort.cuh.
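A sketch of the DoubleBuffer form of SortKeysDescending with 64-bit signed keys. Instead of calling Current(), it inspects the DoubleBuffer's selector member (0 or 1) to find which of the two Thrust vectors holds the sorted output. SortInt64Descending and CheckCuda are hypothetical names.

```cuda
#include <cub/device/device_radix_sort.cuh>
#include <cuda_runtime.h>
#include <thrust/copy.h>
#include <thrust/device_vector.h>
#include <stdexcept>
#include <vector>

// Hypothetical helper: throws if a CUDA runtime or CUB call reports an error.
inline void CheckCuda(cudaError_t err) {
    if (err != cudaSuccess) throw std::runtime_error(cudaGetErrorString(err));
}

// Hypothetical helper: sorts 64-bit signed keys into descending order with
// the DoubleBuffer overload of SortKeysDescending.
std::vector<long long> SortInt64Descending(const std::vector<long long>& h_keys) {
    const int num_items = static_cast<int>(h_keys.size());
    if (num_items == 0) return {};

    thrust::device_vector<long long> buf0(h_keys.begin(), h_keys.end());
    thrust::device_vector<long long> buf1(num_items);
    // buf0 is passed as "current" (selector 0), buf1 as the alternate.
    cub::DoubleBuffer<long long> d_keys(thrust::raw_pointer_cast(buf0.data()),
                                        thrust::raw_pointer_cast(buf1.data()));

    void* d_temp_storage = nullptr;
    size_t temp_storage_bytes = 0;  // first call: size query only
    CheckCuda(cub::DeviceRadixSort::SortKeysDescending(
        d_temp_storage, temp_storage_bytes, d_keys, num_items));
    thrust::device_vector<unsigned char> temp(temp_storage_bytes);
    d_temp_storage = thrust::raw_pointer_cast(temp.data());
    CheckCuda(cub::DeviceRadixSort::SortKeysDescending(
        d_temp_storage, temp_storage_bytes, d_keys, num_items));

    // After the sort, selector indexes the buffer that is now "current".
    const thrust::device_vector<long long>& sorted = (d_keys.selector == 0) ? buf0 : buf1;
    std::vector<long long> h_out(num_items);
    thrust::copy(sorted.begin(), sorted.end(), h_out.begin());
    return h_out;
}
```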
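SortPairs() (input/output pointer overload) [inline, static]
Sorts key-value pairs into ascending order. (~2N auxiliary storage required)
An optional bit subrange [begin_bit, end_bit) of differentiating key bits can be specified. This can reduce overall sorting overhead and yield a corresponding performance improvement.
This operation requires a relatively large allocation of temporary device storage that is O(N+P), where N is the length of the input and P is the number of streaming multiprocessors on the device. For sorting using only O(P) temporary storage, see the sorting interface using DoubleBuffer wrappers below.
[Performance charts for uint32,uint32 and uint64,uint64 key-value pairs, respectively]
Template Parameters
KeyT | [inferred] Key type |
ValueT | [inferred] Value type |
Parameters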
[in] | d_temp_storage | Device-accessible allocation of temporary storage. When NULL, the required allocation size is written to temp_storage_bytes and no work is done. |
[in,out] | temp_storage_bytes | Reference to size in bytes of d_temp_storage allocation |
[in] | d_keys_in | Pointer to the input sequence of key data to sort |
[out] | d_keys_out | Pointer to the sorted output sequence of key data |
[in] | d_values_in | Pointer to the corresponding input sequence of associated value items |
[out] | d_values_out | Pointer to the correspondingly-reordered output sequence of associated value items |
[in] | num_items | Number of items to sort |
[in] | begin_bit | [optional] The least-significant bit index (inclusive) needed for key comparison |
[in] | end_bit | [optional] The most-significant bit index (exclusive) needed for key comparison (e.g., sizeof(unsigned int) * 8) |
[in] | stream | [optional] CUDA stream to launch kernels within. Default is stream 0. |
[in] | debug_synchronous | [optional] Whether or not to synchronize the stream after every kernel launch to check for errors. Also causes launch configurations to be printed to the console. Default is false. |
Definition at line 148 of file device_radix_sort.cuh.
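A common use of SortPairs is an argsort: pass the indices 0..N-1 as values, and the reordered values record where each sorted key came from. The sketch below assumes int keys. ArgsortAscending and CheckCuda are hypothetical helpers, and thrust::sequence fills the index array.

```cuda
#include <cub/device/device_radix_sort.cuh>
#include <cuda_runtime.h>
#include <thrust/copy.h>
#include <thrust/device_vector.h>
#include <thrust/sequence.h>
#include <stdexcept>
#include <vector>

// Hypothetical helper: throws if a CUDA runtime or CUB call reports an error.
inline void CheckCuda(cudaError_t err) {
    if (err != cudaSuccess) throw std::runtime_error(cudaGetErrorString(err));
}

// Hypothetical helper: argsort built on the pointer overload of SortPairs.
void ArgsortAscending(const std::vector<int>& h_keys,
                      std::vector<int>& sorted_keys, std::vector<int>& order) {
    const int num_items = static_cast<int>(h_keys.size());
    sorted_keys.resize(num_items);
    order.resize(num_items);
    if (num_items == 0) return;

    thrust::device_vector<int> keys_in(h_keys.begin(), h_keys.end()), keys_out(num_items);
    thrust::device_vector<int> values_in(num_items), values_out(num_items);
    thrust::sequence(values_in.begin(), values_in.end());  // 0, 1, ..., n-1

    const int* d_keys_in = thrust::raw_pointer_cast(keys_in.data());
    int* d_keys_out = thrust::raw_pointer_cast(keys_out.data());
    const int* d_values_in = thrust::raw_pointer_cast(values_in.data());
    int* d_values_out = thrust::raw_pointer_cast(values_out.data());

    void* d_temp_storage = nullptr;
    size_t temp_storage_bytes = 0;  // first call: size query only
    CheckCuda(cub::DeviceRadixSort::SortPairs(d_temp_storage, temp_storage_bytes,
        d_keys_in, d_keys_out, d_values_in, d_values_out, num_items));
    thrust::device_vector<unsigned char> temp(temp_storage_bytes);
    d_temp_storage = thrust::raw_pointer_cast(temp.data());
    CheckCuda(cub::DeviceRadixSort::SortPairs(d_temp_storage, temp_storage_bytes,
        d_keys_in, d_keys_out, d_values_in, d_values_out, num_items));

    thrust::copy(keys_out.begin(), keys_out.end(), sorted_keys.begin());
    thrust::copy(values_out.begin(), values_out.end(), order.begin());
}
```

For keys {8, 6, 7, 5, 3, 0, 9}, the index output is {5, 4, 3, 1, 2, 0, 6}: the smallest key, 0, came from position 5.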
SortPairs() (DoubleBuffer overload) [inline, static]
Sorts key-value pairs into ascending order. (~N auxiliary storage required)
An optional bit subrange [begin_bit, end_bit) of differentiating key bits can be specified. This can reduce overall sorting overhead and yield a corresponding performance improvement.
[Performance charts for uint32,uint32 and uint64,uint64 key-value pairs, respectively]
Template Parameters
KeyT | [inferred] Key type |
ValueT | [inferred] Value type |
Parameters
[in] | d_temp_storage | Device-accessible allocation of temporary storage. When NULL, the required allocation size is written to temp_storage_bytes and no work is done. |
[in,out] | temp_storage_bytes | Reference to size in bytes of d_temp_storage allocation |
[in,out] | d_keys | Reference to the double-buffer of keys whose "current" device-accessible buffer contains the unsorted input keys and, upon return, is updated to point to the sorted output keys |
[in,out] | d_values | Double-buffer of values whose "current" device-accessible buffer contains the unsorted input values and, upon return, is updated to point to the sorted output values |
[in] | num_items | Number of items to sort |
[in] | begin_bit | [optional] The least-significant bit index (inclusive) needed for key comparison |
[in] | end_bit | [optional] The most-significant bit index (exclusive) needed for key comparison (e.g., sizeof(unsigned int) * 8) |
[in] | stream | [optional] CUDA stream to launch kernels within. Default is stream 0. |
[in] | debug_synchronous | [optional] Whether or not to synchronize the stream after every kernel launch to check for errors. Also causes launch configurations to be printed to the console. Default is false. |
Definition at line 249 of file device_radix_sort.cuh.
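A sketch of the DoubleBuffer form of SortPairs for unsigned int keys with float values. The key and value double-buffers are passed separately, and after the sort each is read through its own Current() pointer. SortPairsDoubleBuffer and CheckCuda are hypothetical names.

```cuda
#include <cub/device/device_radix_sort.cuh>
#include <cuda_runtime.h>
#include <thrust/device_vector.h>
#include <stdexcept>
#include <vector>

// Hypothetical helper: throws if a CUDA runtime or CUB call reports an error.
inline void CheckCuda(cudaError_t err) {
    if (err != cudaSuccess) throw std::runtime_error(cudaGetErrorString(err));
}

// Hypothetical helper: sorts (key, value) pairs ascending by key with the
// DoubleBuffer overload of SortPairs.
void SortPairsDoubleBuffer(const std::vector<unsigned int>& h_keys,
                           const std::vector<float>& h_values,
                           std::vector<unsigned int>& out_keys,
                           std::vector<float>& out_values) {
    if (h_keys.size() != h_values.size()) throw std::invalid_argument("size mismatch");
    const int num_items = static_cast<int>(h_keys.size());
    out_keys.resize(num_items);
    out_values.resize(num_items);
    if (num_items == 0) return;

    thrust::device_vector<unsigned int> k0(h_keys.begin(), h_keys.end()), k1(num_items);
    thrust::device_vector<float> v0(h_values.begin(), h_values.end()), v1(num_items);
    cub::DoubleBuffer<unsigned int> d_keys(thrust::raw_pointer_cast(k0.data()),
                                           thrust::raw_pointer_cast(k1.data()));
    cub::DoubleBuffer<float> d_values(thrust::raw_pointer_cast(v0.data()),
                                      thrust::raw_pointer_cast(v1.data()));

    void* d_temp_storage = nullptr;
    size_t temp_storage_bytes = 0;  // first call: size query only
    CheckCuda(cub::DeviceRadixSort::SortPairs(d_temp_storage, temp_storage_bytes,
                                              d_keys, d_values, num_items));
    thrust::device_vector<unsigned char> temp(temp_storage_bytes);
    d_temp_storage = thrust::raw_pointer_cast(temp.data());
    CheckCuda(cub::DeviceRadixSort::SortPairs(d_temp_storage, temp_storage_bytes,
                                              d_keys, d_values, num_items));

    // Read each result through its own Current() pointer.
    CheckCuda(cudaMemcpy(out_keys.data(), d_keys.Current(),
                         num_items * sizeof(unsigned int), cudaMemcpyDeviceToHost));
    CheckCuda(cudaMemcpy(out_values.data(), d_values.Current(),
                         num_items * sizeof(float), cudaMemcpyDeviceToHost));
}
```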
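SortPairsDescending() (input/output pointer overload) [inline, static]
Sorts key-value pairs into descending order. (~2N auxiliary storage required)
An optional bit subrange [begin_bit, end_bit) of differentiating key bits can be specified. This can reduce overall sorting overhead and yield a corresponding performance improvement.
This operation requires a relatively large allocation of temporary device storage that is O(N+P), where N is the length of the input and P is the number of streaming multiprocessors on the device. For sorting using only O(P) temporary storage, see the sorting interface using DoubleBuffer wrappers below.
Template Parameters
KeyT | [inferred] Key type |
ValueT | [inferred] Value type |
Parameters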
[in] | d_temp_storage | Device-accessible allocation of temporary storage. When NULL, the required allocation size is written to temp_storage_bytes and no work is done. |
[in,out] | temp_storage_bytes | Reference to size in bytes of d_temp_storage allocation |
[in] | d_keys_in | Pointer to the input sequence of key data to sort |
[out] | d_keys_out | Pointer to the sorted output sequence of key data |
[in] | d_values_in | Pointer to the corresponding input sequence of associated value items |
[out] | d_values_out | Pointer to the correspondingly-reordered output sequence of associated value items |
[in] | num_items | Number of items to sort |
[in] | begin_bit | [optional] The least-significant bit index (inclusive) needed for key comparison |
[in] | end_bit | [optional] The most-significant bit index (exclusive) needed for key comparison (e.g., sizeof(unsigned int) * 8) |
[in] | stream | [optional] CUDA stream to launch kernels within. Default is stream 0. |
[in] | debug_synchronous | [optional] Whether or not to synchronize the stream after every kernel launch to check for errors. Also causes launch configurations to be printed to the console. Default is false. |
Definition at line 329 of file device_radix_sort.cuh.
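A sketch of SortPairsDescending used to rank items by score, highest first. Float scores are the keys and int item IDs travel along as values. RankByScore and CheckCuda are hypothetical names.

```cuda
#include <cub/device/device_radix_sort.cuh>
#include <cuda_runtime.h>
#include <thrust/copy.h>
#include <thrust/device_vector.h>
#include <stdexcept>
#include <vector>

// Hypothetical helper: throws if a CUDA runtime or CUB call reports an error.
inline void CheckCuda(cudaError_t err) {
    if (err != cudaSuccess) throw std::runtime_error(cudaGetErrorString(err));
}

// Hypothetical helper: orders (score, id) pairs by descending score with the
// pointer overload of SortPairsDescending.
void RankByScore(const std::vector<float>& h_scores, const std::vector<int>& h_ids,
                 std::vector<float>& ranked_scores, std::vector<int>& ranked_ids) {
    if (h_scores.size() != h_ids.size()) throw std::invalid_argument("size mismatch");
    const int num_items = static_cast<int>(h_scores.size());
    ranked_scores.resize(num_items);
    ranked_ids.resize(num_items);
    if (num_items == 0) return;

    thrust::device_vector<float> scores_in(h_scores.begin(), h_scores.end()), scores_out(num_items);
    thrust::device_vector<int> ids_in(h_ids.begin(), h_ids.end()), ids_out(num_items);
    const float* d_keys_in = thrust::raw_pointer_cast(scores_in.data());
    float* d_keys_out = thrust::raw_pointer_cast(scores_out.data());
    const int* d_values_in = thrust::raw_pointer_cast(ids_in.data());
    int* d_values_out = thrust::raw_pointer_cast(ids_out.data());

    void* d_temp_storage = nullptr;
    size_t temp_storage_bytes = 0;  // first call: size query only
    CheckCuda(cub::DeviceRadixSort::SortPairsDescending(d_temp_storage, temp_storage_bytes,
        d_keys_in, d_keys_out, d_values_in, d_values_out, num_items));
    thrust::device_vector<unsigned char> temp(temp_storage_bytes);
    d_temp_storage = thrust::raw_pointer_cast(temp.data());
    CheckCuda(cub::DeviceRadixSort::SortPairsDescending(d_temp_storage, temp_storage_bytes,
        d_keys_in, d_keys_out, d_values_in, d_values_out, num_items));

    thrust::copy(scores_out.begin(), scores_out.end(), ranked_scores.begin());
    thrust::copy(ids_out.begin(), ids_out.end(), ranked_ids.begin());
}
```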
SortPairsDescending() (DoubleBuffer overload) [inline, static]
Sorts key-value pairs into descending order. (~N auxiliary storage required)
An optional bit subrange [begin_bit, end_bit) of differentiating key bits can be specified. This can reduce overall sorting overhead and yield a corresponding performance improvement.
Template Parameters
KeyT | [inferred] Key type |
ValueT | [inferred] Value type |
Parameters
[in] | d_temp_storage | Device-accessible allocation of temporary storage. When NULL, the required allocation size is written to temp_storage_bytes and no work is done. |
[in,out] | temp_storage_bytes | Reference to size in bytes of d_temp_storage allocation |
[in,out] | d_keys | Reference to the double-buffer of keys whose "current" device-accessible buffer contains the unsorted input keys and, upon return, is updated to point to the sorted output keys |
[in,out] | d_values | Double-buffer of values whose "current" device-accessible buffer contains the unsorted input values and, upon return, is updated to point to the sorted output values |
[in] | num_items | Number of items to sort |
[in] | begin_bit | [optional] The least-significant bit index (inclusive) needed for key comparison |
[in] | end_bit | [optional] The most-significant bit index (exclusive) needed for key comparison (e.g., sizeof(unsigned int) * 8) |
[in] | stream | [optional] CUDA stream to launch kernels within. Default is stream 0. |
[in] | debug_synchronous | [optional] Whether or not to synchronize the stream after every kernel launch to check for errors. Also causes launch configurations to be printed to the console. Default is false. |
Definition at line 425 of file device_radix_sort.cuh.
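A sketch of the DoubleBuffer form of SortPairsDescending that uses the optional stream argument. Because stream follows begin_bit and end_bit positionally, those two are passed explicitly with their default values. It assumes nvcc's default (legacy) default-stream semantics, so the blocking stream created here is ordered after the Thrust initialization issued on stream 0. StreamGuard, SortPairsDescendingOnStream, and CheckCuda are hypothetical helpers.

```cuda
#include <cub/device/device_radix_sort.cuh>
#include <cuda_runtime.h>
#include <thrust/device_vector.h>
#include <stdexcept>
#include <vector>

// Hypothetical helper: throws if a CUDA runtime or CUB call reports an error.
inline void CheckCuda(cudaError_t err) {
    if (err != cudaSuccess) throw std::runtime_error(cudaGetErrorString(err));
}

// Hypothetical RAII wrapper: destroys the stream on every exit path.
struct StreamGuard {
    cudaStream_t stream = nullptr;
    StreamGuard() = default;
    StreamGuard(const StreamGuard&) = delete;
    StreamGuard& operator=(const StreamGuard&) = delete;
    ~StreamGuard() { if (stream) cudaStreamDestroy(stream); }
};

// Hypothetical helper: sorts (int key, int value) pairs into descending key
// order on a caller-created stream.
void SortPairsDescendingOnStream(const std::vector<int>& h_keys, const std::vector<int>& h_values,
                                 std::vector<int>& out_keys, std::vector<int>& out_values) {
    if (h_keys.size() != h_values.size()) throw std::invalid_argument("size mismatch");
    const int num_items = static_cast<int>(h_keys.size());
    out_keys.resize(num_items);
    out_values.resize(num_items);
    if (num_items == 0) return;

    thrust::device_vector<int> k0(h_keys.begin(), h_keys.end()), k1(num_items);
    thrust::device_vector<int> v0(h_values.begin(), h_values.end()), v1(num_items);
    cub::DoubleBuffer<int> d_keys(thrust::raw_pointer_cast(k0.data()), thrust::raw_pointer_cast(k1.data()));
    cub::DoubleBuffer<int> d_values(thrust::raw_pointer_cast(v0.data()), thrust::raw_pointer_cast(v1.data()));

    StreamGuard guard;
    CheckCuda(cudaStreamCreate(&guard.stream));  // blocking w.r.t. stream 0
    const int begin_bit = 0, end_bit = static_cast<int>(sizeof(int) * 8);

    void* d_temp_storage = nullptr;
    size_t temp_storage_bytes = 0;  // first call: size query only
    CheckCuda(cub::DeviceRadixSort::SortPairsDescending(d_temp_storage, temp_storage_bytes,
        d_keys, d_values, num_items, begin_bit, end_bit, guard.stream));
    thrust::device_vector<unsigned char> temp(temp_storage_bytes);
    d_temp_storage = thrust::raw_pointer_cast(temp.data());
    CheckCuda(cub::DeviceRadixSort::SortPairsDescending(d_temp_storage, temp_storage_bytes,
        d_keys, d_values, num_items, begin_bit, end_bit, guard.stream));

    // Copy results back on the same stream, then wait for the stream to finish.
    CheckCuda(cudaMemcpyAsync(out_keys.data(), d_keys.Current(), num_items * sizeof(int),
                              cudaMemcpyDeviceToHost, guard.stream));
    CheckCuda(cudaMemcpyAsync(out_values.data(), d_values.Current(), num_items * sizeof(int),
                              cudaMemcpyDeviceToHost, guard.stream));
    CheckCuda(cudaStreamSynchronize(guard.stream));
}
```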