DeviceSegmentedReduce provides device-wide, parallel operations for computing a reduction across multiple sequences of data items residing within device-accessible memory.

Definition at line 65 of file device_segmented_reduce.cuh.
Static Public Member Functions | |
| template<typename InputIteratorT , typename OutputIteratorT , typename OffsetIteratorT , typename ReductionOp , typename T > | |
| static CUB_RUNTIME_FUNCTION cudaError_t | Reduce (void *d_temp_storage, size_t &temp_storage_bytes, InputIteratorT d_in, OutputIteratorT d_out, int num_segments, OffsetIteratorT d_begin_offsets, OffsetIteratorT d_end_offsets, ReductionOp reduction_op, T initial_value, cudaStream_t stream=0, bool debug_synchronous=false) |
| Computes a device-wide segmented reduction using the specified binary reduction_op functor. | |
| template<typename InputIteratorT , typename OutputIteratorT , typename OffsetIteratorT > | |
| static CUB_RUNTIME_FUNCTION cudaError_t | Sum (void *d_temp_storage, size_t &temp_storage_bytes, InputIteratorT d_in, OutputIteratorT d_out, int num_segments, OffsetIteratorT d_begin_offsets, OffsetIteratorT d_end_offsets, cudaStream_t stream=0, bool debug_synchronous=false) |
| Computes a device-wide segmented sum using the addition ('+') operator. | |
| template<typename InputIteratorT , typename OutputIteratorT , typename OffsetIteratorT > | |
| static CUB_RUNTIME_FUNCTION cudaError_t | Min (void *d_temp_storage, size_t &temp_storage_bytes, InputIteratorT d_in, OutputIteratorT d_out, int num_segments, OffsetIteratorT d_begin_offsets, OffsetIteratorT d_end_offsets, cudaStream_t stream=0, bool debug_synchronous=false) |
| Computes a device-wide segmented minimum using the less-than ('<') operator. | |
| template<typename InputIteratorT , typename OutputIteratorT , typename OffsetIteratorT > | |
| static CUB_RUNTIME_FUNCTION cudaError_t | ArgMin (void *d_temp_storage, size_t &temp_storage_bytes, InputIteratorT d_in, OutputIteratorT d_out, int num_segments, OffsetIteratorT d_begin_offsets, OffsetIteratorT d_end_offsets, cudaStream_t stream=0, bool debug_synchronous=false) |
| Finds the first device-wide minimum in each segment using the less-than ('<') operator, also returning the in-segment index of that item. | |
| template<typename InputIteratorT , typename OutputIteratorT , typename OffsetIteratorT > | |
| static CUB_RUNTIME_FUNCTION cudaError_t | Max (void *d_temp_storage, size_t &temp_storage_bytes, InputIteratorT d_in, OutputIteratorT d_out, int num_segments, OffsetIteratorT d_begin_offsets, OffsetIteratorT d_end_offsets, cudaStream_t stream=0, bool debug_synchronous=false) |
| Computes a device-wide segmented maximum using the greater-than ('>') operator. | |
| template<typename InputIteratorT , typename OutputIteratorT , typename OffsetIteratorT > | |
| static CUB_RUNTIME_FUNCTION cudaError_t | ArgMax (void *d_temp_storage, size_t &temp_storage_bytes, InputIteratorT d_in, OutputIteratorT d_out, int num_segments, OffsetIteratorT d_begin_offsets, OffsetIteratorT d_end_offsets, cudaStream_t stream=0, bool debug_synchronous=false) |
| Finds the first device-wide maximum in each segment using the greater-than ('>') operator, also returning the in-segment index of that item. | |
ArgMax() [inline, static]
Finds the first device-wide maximum in each segment using the greater-than ('>') operator, also returning the in-segment index of that item.
- The output value type of d_out is cub::KeyValuePair<int, T> (assuming the value type of d_in is T).
  - The maximum of the ith segment is written to d_out[i].value and its offset in that segment is written to d_out[i].key.
  - The {1, std::numeric_limits<T>::lowest()} tuple is produced for zero-length inputs.
- When the input is a contiguous sequence of segments, a single sequence segment_offsets (of length num_segments+1) can be aliased for both the d_begin_offsets and d_end_offsets parameters (where the latter is specified as segment_offsets+1).
- Does not support > operators that are non-commutative.
Template Parameters
| InputIteratorT | [inferred] Random-access input iterator type for reading input items (of some type T) |
| OutputIteratorT | [inferred] Output iterator type for recording the reduced aggregate (having value type KeyValuePair<int, T>) |
| OffsetIteratorT | [inferred] Random-access input iterator type for reading segment offsets |
Parameters
| [in] | d_temp_storage | Device-accessible allocation of temporary storage. When NULL, the required allocation size is written to temp_storage_bytes and no work is done. |
| [in,out] | temp_storage_bytes | Reference to size in bytes of d_temp_storage allocation |
| [in] | d_in | Pointer to the input sequence of data items |
| [out] | d_out | Pointer to the output aggregate |
| [in] | num_segments | The number of segments that comprise the input data |
| [in] | d_begin_offsets | Pointer to the sequence of beginning offsets of length num_segments, such that d_begin_offsets[i] is the index of the first element of the ith data segment in d_in |
| [in] | d_end_offsets | Pointer to the sequence of ending offsets of length num_segments, such that d_end_offsets[i]-1 is the index of the last element of the ith data segment in d_in. If d_end_offsets[i] <= d_begin_offsets[i], the ith segment is considered empty. |
| [in] | stream | [optional] CUDA stream to launch kernels within. Default is stream 0. |
| [in] | debug_synchronous | [optional] Whether or not to synchronize the stream after every kernel launch to check for errors. Also causes launch configurations to be printed to the console. Default is false. |
Definition at line 568 of file device_segmented_reduce.cuh.
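The ArgMax semantics described above can be sketched on the host as follows. This is a sequential reference for the documented behavior, not CUB's implementation: the function name segmented_argmax_reference is hypothetical, and std::pair<int, T> stands in for cub::KeyValuePair<int, T> (first = key, second = value).

```cpp
#include <cassert>
#include <cstddef>
#include <limits>
#include <utility>
#include <vector>

// Host-side reference for the documented ArgMax semantics (hypothetical helper,
// not part of CUB). Each result pair is {key, value}:
// key = in-segment index of the first maximum, value = the maximum.
template <typename T>
std::vector<std::pair<int, T>> segmented_argmax_reference(
    const std::vector<T>& in,
    const std::vector<int>& begin_offsets,
    const std::vector<int>& end_offsets)
{
    assert(begin_offsets.size() == end_offsets.size());
    std::vector<std::pair<int, T>> out;
    out.reserve(begin_offsets.size());
    for (std::size_t s = 0; s < begin_offsets.size(); ++s) {
        // Documented result for a zero-length segment.
        std::pair<int, T> best{1, std::numeric_limits<T>::lowest()};
        for (int i = begin_offsets[s]; i < end_offsets[s]; ++i) {
            // Strict '>' keeps the first occurrence of the maximum.
            if (i == begin_offsets[s] || in[i] > best.second) {
                best = {i - begin_offsets[s], in[i]};
            }
        }
        out.push_back(best);
    }
    return out;
}
```

With in = {8, 6, 7, 5, 3, 0, 9}, begin offsets {0, 3, 3}, and end offsets {3, 3, 7}, this sketch yields {0, 8}, {1, lowest()}, and {3, 9}.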
ArgMin() [inline, static]
Finds the first device-wide minimum in each segment using the less-than ('<') operator, also returning the in-segment index of that item.
- The output value type of d_out is cub::KeyValuePair<int, T> (assuming the value type of d_in is T).
  - The minimum of the ith segment is written to d_out[i].value and its offset in that segment is written to d_out[i].key.
  - The {1, std::numeric_limits<T>::max()} tuple is produced for zero-length inputs.
- When the input is a contiguous sequence of segments, a single sequence segment_offsets (of length num_segments+1) can be aliased for both the d_begin_offsets and d_end_offsets parameters (where the latter is specified as segment_offsets+1).
- Does not support < operators that are non-commutative.
Template Parameters
| InputIteratorT | [inferred] Random-access input iterator type for reading input items (of some type T) |
| OutputIteratorT | [inferred] Output iterator type for recording the reduced aggregate (having value type KeyValuePair<int, T>) |
| OffsetIteratorT | [inferred] Random-access input iterator type for reading segment offsets |
Parameters
| [in] | d_temp_storage | Device-accessible allocation of temporary storage. When NULL, the required allocation size is written to temp_storage_bytes and no work is done. |
| [in,out] | temp_storage_bytes | Reference to size in bytes of d_temp_storage allocation |
| [in] | d_in | Pointer to the input sequence of data items |
| [out] | d_out | Pointer to the output aggregate |
| [in] | num_segments | The number of segments that comprise the input data |
| [in] | d_begin_offsets | Pointer to the sequence of beginning offsets of length num_segments, such that d_begin_offsets[i] is the index of the first element of the ith data segment in d_in |
| [in] | d_end_offsets | Pointer to the sequence of ending offsets of length num_segments, such that d_end_offsets[i]-1 is the index of the last element of the ith data segment in d_in. If d_end_offsets[i] <= d_begin_offsets[i], the ith segment is considered empty. |
| [in] | stream | [optional] CUDA stream to launch kernels within. Default is stream 0. |
| [in] | debug_synchronous | [optional] Whether or not to synchronize the stream after every kernel launch to check for errors. Also causes launch configurations to be printed to the console. Default is false. |
Definition at line 385 of file device_segmented_reduce.cuh.
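Because ArgMin reports the index relative to the start of each segment, callers often need to add the segment's beginning offset to locate the item in d_in. The host-side sketch below illustrates that, under stated assumptions: KeyValue is a hypothetical stand-in for cub::KeyValuePair<int, T>, both function names are invented for illustration, and returning -1 for an empty segment is a convention of this sketch only.

```cpp
#include <cassert>
#include <cstddef>
#include <limits>
#include <vector>

// Hypothetical stand-in for cub::KeyValuePair<int, T>.
template <typename T>
struct KeyValue {
    int key;   // in-segment index of the first minimum
    T value;   // the minimum itself
};

// Sequential reference for the documented ArgMin semantics (not CUB's code).
template <typename T>
std::vector<KeyValue<T>> segmented_argmin_reference(
    const std::vector<T>& in,
    const std::vector<int>& begin_offsets,
    const std::vector<int>& end_offsets)
{
    assert(begin_offsets.size() == end_offsets.size());
    std::vector<KeyValue<T>> out;
    out.reserve(begin_offsets.size());
    for (std::size_t s = 0; s < begin_offsets.size(); ++s) {
        // Documented result for a zero-length segment.
        KeyValue<T> best{1, std::numeric_limits<T>::max()};
        for (int i = begin_offsets[s]; i < end_offsets[s]; ++i) {
            // Strict '<' keeps the first occurrence of the minimum.
            if (i == begin_offsets[s] || in[i] < best.value) {
                best = KeyValue<T>{i - begin_offsets[s], in[i]};
            }
        }
        out.push_back(best);
    }
    return out;
}

// Converts an in-segment key to an index into the whole input.
// Returns -1 for an empty segment (a convention chosen for this sketch).
inline int global_index(int key, int begin_offset, int end_offset)
{
    return (end_offset <= begin_offset) ? -1 : begin_offset + key;
}
```

For in = {8, 6, 7, 5, 3, 0, 9} with begin offsets {0, 3, 3} and end offsets {3, 3, 7}, the third segment yields key 2 and value 0, and global_index(2, 3, 7) returns 5, the position of that 0 in the input.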
Max() [inline, static]
Computes a device-wide segmented maximum using the greater-than ('>') operator.
- Uses std::numeric_limits<T>::lowest() as the initial value of the reduction.
- When the input is a contiguous sequence of segments, a single sequence segment_offsets (of length num_segments+1) can be aliased for both the d_begin_offsets and d_end_offsets parameters (where the latter is specified as segment_offsets+1).
- Does not support > operators that are non-commutative.
Template Parameters
| InputIteratorT | [inferred] Random-access input iterator type for reading input items |
| OutputIteratorT | [inferred] Output iterator type for recording the reduced aggregate |
| OffsetIteratorT | [inferred] Random-access input iterator type for reading segment offsets |
Parameters
| [in] | d_temp_storage | Device-accessible allocation of temporary storage. When NULL, the required allocation size is written to temp_storage_bytes and no work is done. |
| [in,out] | temp_storage_bytes | Reference to size in bytes of d_temp_storage allocation |
| [in] | d_in | Pointer to the input sequence of data items |
| [out] | d_out | Pointer to the output aggregate |
| [in] | num_segments | The number of segments that comprise the input data |
| [in] | d_begin_offsets | Pointer to the sequence of beginning offsets of length num_segments, such that d_begin_offsets[i] is the index of the first element of the ith data segment in d_in |
| [in] | d_end_offsets | Pointer to the sequence of ending offsets of length num_segments, such that d_end_offsets[i]-1 is the index of the last element of the ith data segment in d_in. If d_end_offsets[i] <= d_begin_offsets[i], the ith segment is considered empty. |
| [in] | stream | [optional] CUDA stream to launch kernels within. Default is stream 0. |
| [in] | debug_synchronous | [optional] Whether or not to synchronize the stream after every kernel launch to check for errors. Also causes launch configurations to be printed to the console. Default is false. |
Definition at line 483 of file device_segmented_reduce.cuh.
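Since d_begin_offsets and d_end_offsets are separate sequences, segments do not have to be contiguous or cover the whole input. A minimal host-side sketch of that case follows; segmented_max_reference is a hypothetical name, and the sequential loop only models the documented result, not how CUB computes it.

```cpp
#include <cassert>
#include <cstddef>
#include <limits>
#include <vector>

// Sequential reference for the documented Max semantics (hypothetical helper).
// Segments are given by independent begin/end offsets, so they may leave gaps.
template <typename T>
std::vector<T> segmented_max_reference(
    const std::vector<T>& in,
    const std::vector<int>& begin_offsets,
    const std::vector<int>& end_offsets)
{
    assert(begin_offsets.size() == end_offsets.size());
    std::vector<T> out(begin_offsets.size());
    for (std::size_t s = 0; s < begin_offsets.size(); ++s) {
        // lowest() is the documented initial value, so it is also
        // the result for an empty segment.
        T acc = std::numeric_limits<T>::lowest();
        for (int i = begin_offsets[s]; i < end_offsets[s]; ++i) {
            if (in[i] > acc) acc = in[i];
        }
        out[s] = acc;
    }
    return out;
}
```

With in = {8, 6, 7, 5, 3, 0, 9}, begin offsets {0, 4, 6}, and end offsets {2, 6, 6}, the segments are {8, 6}, {3, 0}, and an empty one, so the result is {8, 3, lowest()}; the items 7, 5, and 9 belong to no segment and are ignored.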
Min() [inline, static]
Computes a device-wide segmented minimum using the less-than ('<') operator.
- Uses std::numeric_limits<T>::max() as the initial value of the reduction for each segment.
- When the input is a contiguous sequence of segments, a single sequence segment_offsets (of length num_segments+1) can be aliased for both the d_begin_offsets and d_end_offsets parameters (where the latter is specified as segment_offsets+1).
- Does not support < operators that are non-commutative.
Template Parameters
| InputIteratorT | [inferred] Random-access input iterator type for reading input items |
| OutputIteratorT | [inferred] Output iterator type for recording the reduced aggregate |
| OffsetIteratorT | [inferred] Random-access input iterator type for reading segment offsets |
Parameters
| [in] | d_temp_storage | Device-accessible allocation of temporary storage. When NULL, the required allocation size is written to temp_storage_bytes and no work is done. |
| [in,out] | temp_storage_bytes | Reference to size in bytes of d_temp_storage allocation |
| [in] | d_in | Pointer to the input sequence of data items |
| [out] | d_out | Pointer to the output aggregate |
| [in] | num_segments | The number of segments that comprise the input data |
| [in] | d_begin_offsets | Pointer to the sequence of beginning offsets of length num_segments, such that d_begin_offsets[i] is the index of the first element of the ith data segment in d_in |
| [in] | d_end_offsets | Pointer to the sequence of ending offsets of length num_segments, such that d_end_offsets[i]-1 is the index of the last element of the ith data segment in d_in. If d_end_offsets[i] <= d_begin_offsets[i], the ith segment is considered empty. |
| [in] | stream | [optional] CUDA stream to launch kernels within. Default is stream 0. |
| [in] | debug_synchronous | [optional] Whether or not to synchronize the stream after every kernel launch to check for errors. Also causes launch configurations to be printed to the console. Default is false. |
Definition at line 300 of file device_segmented_reduce.cuh.
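The aliasing of a single segment_offsets array for both offset parameters can be illustrated on the host. In the sketch below, segmented_min_reference and segmented_min_contiguous are hypothetical names; the pointer-based signature is chosen only to mirror d_begin_offsets and d_end_offsets, and the loop is a sequential model of the documented result rather than CUB's algorithm.

```cpp
#include <cassert>
#include <limits>
#include <vector>

// Sequential reference for the documented Min semantics (hypothetical helper).
template <typename T>
void segmented_min_reference(const T* in, T* out, int num_segments,
                             const int* begin_offsets, const int* end_offsets)
{
    for (int s = 0; s < num_segments; ++s) {
        // max() is the documented initial value for each segment.
        T acc = std::numeric_limits<T>::max();
        for (int i = begin_offsets[s]; i < end_offsets[s]; ++i) {
            if (in[i] < acc) acc = in[i];
        }
        out[s] = acc;
    }
}

// Contiguous segments: one offsets array of length num_segments + 1
// serves as both the begin offsets and (shifted by one) the end offsets.
template <typename T>
std::vector<T> segmented_min_contiguous(const std::vector<T>& in,
                                        const std::vector<int>& segment_offsets)
{
    assert(!segment_offsets.empty());
    const int num_segments = static_cast<int>(segment_offsets.size()) - 1;
    std::vector<T> out(num_segments);
    segmented_min_reference(in.data(), out.data(), num_segments,
                            segment_offsets.data(),       // begin offsets
                            segment_offsets.data() + 1);  // end offsets
    return out;
}
```

With in = {8, 6, 7, 5, 3, 0, 9} and segment_offsets = {0, 3, 3, 7}, the three segments are {8, 6, 7}, an empty segment, and {5, 3, 0, 9}, giving {6, max(), 0}.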
Reduce() [inline, static]
Computes a device-wide segmented reduction using the specified binary reduction_op functor.
- When the input is a contiguous sequence of segments, a single sequence segment_offsets (of length num_segments+1) can be aliased for both the d_begin_offsets and d_end_offsets parameters (where the latter is specified as segment_offsets+1).
Template Parameters
| InputIteratorT | [inferred] Random-access input iterator type for reading input items |
| OutputIteratorT | [inferred] Output iterator type for recording the reduced aggregate |
| OffsetIteratorT | [inferred] Random-access input iterator type for reading segment offsets |
| ReductionOp | [inferred] Binary reduction functor type having member T operator()(const T &a, const T &b) |
| T | [inferred] Data element type that is convertible to the value type of InputIteratorT |
Parameters
| [in] | d_temp_storage | Device-accessible allocation of temporary storage. When NULL, the required allocation size is written to temp_storage_bytes and no work is done. |
| [in,out] | temp_storage_bytes | Reference to size in bytes of d_temp_storage allocation |
| [in] | d_in | Pointer to the input sequence of data items |
| [out] | d_out | Pointer to the output aggregate |
| [in] | num_segments | The number of segments that comprise the input data |
| [in] | d_begin_offsets | Pointer to the sequence of beginning offsets of length num_segments, such that d_begin_offsets[i] is the index of the first element of the ith data segment in d_in |
| [in] | d_end_offsets | Pointer to the sequence of ending offsets of length num_segments, such that d_end_offsets[i]-1 is the index of the last element of the ith data segment in d_in. If d_end_offsets[i] <= d_begin_offsets[i], the ith segment is considered empty. |
| [in] | reduction_op | Binary reduction functor |
| [in] | initial_value | Initial value of the reduction for each segment |
| [in] | stream | [optional] CUDA stream to launch kernels within. Default is stream 0. |
| [in] | debug_synchronous | [optional] Whether or not to synchronize the stream after every kernel launch to check for errors. Also causes launch configurations to be printed to the console. Default is false. |
Definition at line 133 of file device_segmented_reduce.cuh.
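The roles of reduction_op and initial_value can be sketched with a sequential host-side fold. This assumes that initial_value seeds each segment's reduction, as the parameter description suggests; a left-to-right fold matches a parallel reduction only for associative operators. The Product functor and segmented_reduce_reference are hypothetical names, not part of CUB.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical functor with the documented shape T operator()(const T&, const T&).
struct Product {
    template <typename T>
    T operator()(const T& a, const T& b) const { return a * b; }
};

// Sequential reference: folds each segment with reduction_op, starting from
// initial_value. An empty segment therefore yields initial_value unchanged.
template <typename T, typename ReductionOp>
std::vector<T> segmented_reduce_reference(
    const std::vector<T>& in,
    const std::vector<int>& begin_offsets,
    const std::vector<int>& end_offsets,
    ReductionOp reduction_op,
    T initial_value)
{
    assert(begin_offsets.size() == end_offsets.size());
    std::vector<T> out(begin_offsets.size(), initial_value);
    for (std::size_t s = 0; s < begin_offsets.size(); ++s) {
        for (int i = begin_offsets[s]; i < end_offsets[s]; ++i) {
            out[s] = reduction_op(out[s], in[i]);
        }
    }
    return out;
}
```

With in = {8, 6, 7, 5, 3, 0, 9}, begin offsets {0, 3, 3}, end offsets {3, 3, 7}, Product as the operator, and an initial value of 1, the result is {336, 1, 0}.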
Sum() [inline, static]
Computes a device-wide segmented sum using the addition ('+') operator.
- Uses 0 as the initial value of the reduction for each segment.
- When the input is a contiguous sequence of segments, a single sequence segment_offsets (of length num_segments+1) can be aliased for both the d_begin_offsets and d_end_offsets parameters (where the latter is specified as segment_offsets+1).
- Does not support + operators that are non-commutative.
Template Parameters
| InputIteratorT | [inferred] Random-access input iterator type for reading input items |
| OutputIteratorT | [inferred] Output iterator type for recording the reduced aggregate |
| OffsetIteratorT | [inferred] Random-access input iterator type for reading segment offsets |
Parameters
| [in] | d_temp_storage | Device-accessible allocation of temporary storage. When NULL, the required allocation size is written to temp_storage_bytes and no work is done. |
| [in,out] | temp_storage_bytes | Reference to size in bytes of d_temp_storage allocation |
| [in] | d_in | Pointer to the input sequence of data items |
| [out] | d_out | Pointer to the output aggregate |
| [in] | num_segments | The number of segments that comprise the input data |
| [in] | d_begin_offsets | Pointer to the sequence of beginning offsets of length num_segments, such that d_begin_offsets[i] is the index of the first element of the ith data segment in d_in |
| [in] | d_end_offsets | Pointer to the sequence of ending offsets of length num_segments, such that d_end_offsets[i]-1 is the index of the last element of the ith data segment in d_in. If d_end_offsets[i] <= d_begin_offsets[i], the ith segment is considered empty. |
| [in] | stream | [optional] CUDA stream to launch kernels within. Default is stream 0. |
| [in] | debug_synchronous | [optional] Whether or not to synchronize the stream after every kernel launch to check for errors. Also causes launch configurations to be printed to the console. Default is false. |
Definition at line 215 of file device_segmented_reduce.cuh.
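When segments are contiguous, the segment_offsets array of length num_segments+1 is commonly derived from per-segment lengths with a prefix sum. The host-side sketch below shows one way to build it and then models the documented Sum result sequentially; offsets_from_lengths and segmented_sum_reference are hypothetical names, not CUB functions.

```cpp
#include <cassert>
#include <cstddef>
#include <numeric>
#include <vector>

// Builds segment_offsets (length num_segments + 1) from per-segment lengths:
// offsets[0] = 0 and offsets[s + 1] = offsets[s] + lengths[s].
inline std::vector<int> offsets_from_lengths(const std::vector<int>& lengths)
{
    std::vector<int> offsets(lengths.size() + 1, 0);
    std::partial_sum(lengths.begin(), lengths.end(), offsets.begin() + 1);
    return offsets;
}

// Sequential reference for the documented Sum semantics (hypothetical helper).
// Segment s spans [segment_offsets[s], segment_offsets[s + 1]).
template <typename T>
std::vector<T> segmented_sum_reference(const std::vector<T>& in,
                                       const std::vector<int>& segment_offsets)
{
    assert(!segment_offsets.empty());
    const std::size_t num_segments = segment_offsets.size() - 1;
    // 0 is the documented initial value, so empty segments sum to 0.
    std::vector<T> out(num_segments, T(0));
    for (std::size_t s = 0; s < num_segments; ++s) {
        for (int i = segment_offsets[s]; i < segment_offsets[s + 1]; ++i) {
            out[s] = out[s] + in[i];
        }
    }
    return out;
}
```

Segment lengths {3, 0, 4} produce segment_offsets = {0, 3, 3, 7}; applied to in = {8, 6, 7, 5, 3, 0, 9}, the sums are {21, 0, 17}.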