DeviceRunLengthEncode provides device-wide, parallel operations for demarcating "runs" of same-valued items within a sequence residing within device-accessible memory.
- Overview
- A run-length encoding computes a simple compressed representation of a sequence of input elements such that each maximal "run" of consecutive same-valued data items is encoded as a single data value along with a count of the elements in that run.
- Usage Considerations
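- Dynamic parallelism. DeviceRunLengthEncode methods can be called within kernel code on devices in which CUDA dynamic parallelism is supported.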
- Performance
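- The work-complexity of run-length encode as a function of input size is linear, resulting in performance throughput that plateaus with problem sizes large enough to saturate the GPU.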
- [Chart: DeviceRunLengthEncode::Encode performance across different CUDA architectures for int32 items. Segments have lengths uniformly sampled from [1,1000].]
- Performance plots for other scenarios can be found in the detailed method descriptions below.
Definition at line 78 of file device_run_length_encode.cuh.
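The encoding described in the overview can be illustrated with a small sequential reference on the host. This is only a sketch for checking expected results: the names `RunLengthEncoding` and `host_run_length_encode` are hypothetical helpers, not part of CUB, and the loop says nothing about how the parallel device implementation is organized.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Result of a run-length encoding: one (key, count) pair per maximal run.
// Hypothetical helper type for illustration; not part of CUB.
struct RunLengthEncoding {
    std::vector<int> unique_out;  // first key of each run
    std::vector<int> counts_out;  // length of each run
    int num_runs = 0;             // total number of runs
};

// Sequential host-side reference; consecutive items are compared with ==.
inline RunLengthEncoding host_run_length_encode(const std::vector<int>& in) {
    RunLengthEncoding result;
    for (std::size_t i = 0; i < in.size(); ++i) {
        if (i == 0 || !(in[i] == in[i - 1])) {
            // A new run starts here: record its key with a count of 1.
            result.unique_out.push_back(in[i]);
            result.counts_out.push_back(1);
        } else {
            // Same value as the previous item: extend the current run.
            ++result.counts_out.back();
        }
    }
    result.num_runs = static_cast<int>(result.unique_out.size());
    assert(result.unique_out.size() == result.counts_out.size());
    return result;
}
```

For the input [0, 2, 2, 9, 5, 5, 5, 8], this reference yields the keys [0, 2, 9, 5, 8] with counts [1, 2, 1, 3, 1], for a total of 5 runs.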
template<typename InputIteratorT, typename UniqueOutputIteratorT, typename LengthsOutputIteratorT, typename NumRunsOutputIteratorT>
CUB_RUNTIME_FUNCTION static __forceinline__ cudaError_t Encode (void *d_temp_storage, size_t &temp_storage_bytes, InputIteratorT d_in, UniqueOutputIteratorT d_unique_out, LengthsOutputIteratorT d_counts_out, NumRunsOutputIteratorT d_num_runs_out, int num_items, cudaStream_t stream=0, bool debug_synchronous=false)
    Computes a run-length encoding of the sequence d_in.

template<typename InputIteratorT, typename OffsetsOutputIteratorT, typename LengthsOutputIteratorT, typename NumRunsOutputIteratorT>
CUB_RUNTIME_FUNCTION static __forceinline__ cudaError_t NonTrivialRuns (void *d_temp_storage, size_t &temp_storage_bytes, InputIteratorT d_in, OffsetsOutputIteratorT d_offsets_out, LengthsOutputIteratorT d_lengths_out, NumRunsOutputIteratorT d_num_runs_out, int num_items, cudaStream_t stream=0, bool debug_synchronous=false)
    Enumerates the starting offsets and lengths of all non-trivial runs (of length > 1) of same-valued keys in the sequence d_in.

◆ Encode()
template<typename InputIteratorT, typename UniqueOutputIteratorT, typename LengthsOutputIteratorT, typename NumRunsOutputIteratorT>
CUB_RUNTIME_FUNCTION static __forceinline__ cudaError_t cub::DeviceRunLengthEncode::Encode (
    void *d_temp_storage,
    size_t &temp_storage_bytes,
    InputIteratorT d_in,
    UniqueOutputIteratorT d_unique_out,
    LengthsOutputIteratorT d_counts_out,
    NumRunsOutputIteratorT d_num_runs_out,
    int num_items,
    cudaStream_t stream = 0,
    bool debug_synchronous = false
)
inline static
Computes a run-length encoding of the sequence d_in.
- For the ith run encountered, the first key of the run and its length are written to d_unique_out[i] and d_counts_out[i], respectively.
- The total number of runs encountered is written to d_num_runs_out.
- The == equality operator is used to determine whether values are equivalent.
- When d_temp_storage is NULL, no work is done and the required allocation size is returned in temp_storage_bytes.
- Performance
- [Charts: saturated encode performance across different CUDA architectures for int32 and int64 items, respectively. Segments have lengths uniformly sampled from [1,1000].]
- [Charts: the same measurements with segment lengths uniformly sampled from [1,10].]
- Snippet
- The code snippet below illustrates the run-length encoding of a sequence of int values.
#include <cub/cub.cuh>
// Declare, allocate, and initialize device-accessible pointers for input and output
int num_items;        // total number of input data items
int *d_in;            // input sequence of keys
int *d_unique_out;    // output: one key per run
int *d_counts_out;    // output: one count per run
int *d_num_runs_out;  // output: total number of runs encountered
...
void *d_temp_storage = NULL;
size_t temp_storage_bytes = 0;
cub::DeviceRunLengthEncode::Encode(d_temp_storage, temp_storage_bytes, d_in, d_unique_out, d_counts_out, d_num_runs_out, num_items);  // size query only
cudaMalloc(&d_temp_storage, temp_storage_bytes);  // allocate temporary storage
cub::DeviceRunLengthEncode::Encode(d_temp_storage, temp_storage_bytes, d_in, d_unique_out, d_counts_out, d_num_runs_out, num_items);  // run encoding
- Template Parameters
-
InputIteratorT | [inferred] Random-access input iterator type for reading input items (may be a simple pointer type) |
UniqueOutputIteratorT | [inferred] Random-access output iterator type for writing unique output items (may be a simple pointer type) |
LengthsOutputIteratorT | [inferred] Random-access output iterator type for writing output counts (may be a simple pointer type) |
NumRunsOutputIteratorT | [inferred] Output iterator type for recording the number of runs encountered (may be a simple pointer type) |
- Parameters
-
[in] | d_temp_storage | Device-accessible allocation of temporary storage. When NULL, the required allocation size is written to temp_storage_bytes and no work is done. |
[in,out] | temp_storage_bytes | Reference to size in bytes of d_temp_storage allocation |
[in] | d_in | Pointer to the input sequence of keys |
[out] | d_unique_out | Pointer to the output sequence of unique keys (one key per run) |
[out] | d_counts_out | Pointer to the output sequence of run-lengths (one count per run) |
[out] | d_num_runs_out | Pointer to total number of runs encountered (i.e., the length of d_unique_out) |
[in] | num_items | Total number of input data items (i.e., the length of d_in) |
[in] | stream | [optional] CUDA stream to launch kernels within. Default is stream 0. |
[in] | debug_synchronous | [optional] Whether or not to synchronize the stream after every kernel launch to check for errors. May cause significant slowdown. Default is false. |
Definition at line 148 of file device_run_length_encode.cuh.
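The two-call convention for d_temp_storage (first a size query with a NULL pointer, then the real call) can be mimicked on the host to make the control flow concrete. The sketch below is a hypothetical stand-in named `mock_encode`; it is not cub::DeviceRunLengthEncode::Encode, launches no kernels, and its scratch-space requirement of one int per input item is an assumption chosen purely for illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical host-only stand-in that mimics the two-phase convention.
// Returns 0 on success and 1 if the supplied scratch space is too small.
inline int mock_encode(void* temp_storage, std::size_t& temp_storage_bytes,
                       const int* in, int* unique_out, int* counts_out,
                       int* num_runs_out, int num_items) {
    // Assumption for illustration: one int of scratch space per input item.
    const std::size_t required =
        static_cast<std::size_t>(num_items) * sizeof(int);
    if (temp_storage == nullptr) {
        // Phase 1: report the required size and do no work.
        temp_storage_bytes = required;
        return 0;
    }
    if (temp_storage_bytes < required) {
        return 1;
    }
    // Phase 2: mark run heads in the scratch space, then compact them.
    int* head_flags = static_cast<int*>(temp_storage);
    for (int i = 0; i < num_items; ++i) {
        head_flags[i] = (i == 0 || !(in[i] == in[i - 1])) ? 1 : 0;
    }
    int runs = 0;
    for (int i = 0; i < num_items; ++i) {
        if (head_flags[i]) {
            unique_out[runs] = in[i];
            counts_out[runs] = 1;
            ++runs;
        } else {
            ++counts_out[runs - 1];
        }
    }
    *num_runs_out = runs;
    return 0;
}
```

A caller invokes the function once with a null scratch pointer to learn the size, allocates that many bytes, and then repeats the call with identical arguments. The outputs are untouched after the first call and populated after the second.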
◆ NonTrivialRuns()
template<typename InputIteratorT, typename OffsetsOutputIteratorT, typename LengthsOutputIteratorT, typename NumRunsOutputIteratorT>
CUB_RUNTIME_FUNCTION static __forceinline__ cudaError_t cub::DeviceRunLengthEncode::NonTrivialRuns (
    void *d_temp_storage,
    size_t &temp_storage_bytes,
    InputIteratorT d_in,
    OffsetsOutputIteratorT d_offsets_out,
    LengthsOutputIteratorT d_lengths_out,
    NumRunsOutputIteratorT d_num_runs_out,
    int num_items,
    cudaStream_t stream = 0,
    bool debug_synchronous = false
)
inline static
Enumerates the starting offsets and lengths of all non-trivial runs (of length > 1) of same-valued keys in the sequence d_in.
- For the ith non-trivial run, the run's starting offset and its length are written to d_offsets_out[i] and d_lengths_out[i], respectively.
- The total number of non-trivial runs encountered is written to d_num_runs_out.
- The == equality operator is used to determine whether values are equivalent.
- When d_temp_storage is NULL, no work is done and the required allocation size is returned in temp_storage_bytes.
- Snippet
- The code snippet below illustrates the identification of non-trivial runs within a sequence of int values.
#include <cub/cub.cuh>
// Declare, allocate, and initialize device-accessible pointers for input and output
int num_items;        // total number of input data items
int *d_in;            // input sequence of data items
int *d_offsets_out;   // output: one offset per non-trivial run
int *d_lengths_out;   // output: one length per non-trivial run
int *d_num_runs_out;  // output: total number of non-trivial runs
...
void *d_temp_storage = NULL;
size_t temp_storage_bytes = 0;
cub::DeviceRunLengthEncode::NonTrivialRuns(d_temp_storage, temp_storage_bytes, d_in, d_offsets_out, d_lengths_out, d_num_runs_out, num_items);  // size query only
cudaMalloc(&d_temp_storage, temp_storage_bytes);  // allocate temporary storage
cub::DeviceRunLengthEncode::NonTrivialRuns(d_temp_storage, temp_storage_bytes, d_in, d_offsets_out, d_lengths_out, d_num_runs_out, num_items);  // enumerate runs
- Template Parameters
-
InputIteratorT | [inferred] Random-access input iterator type for reading input items (may be a simple pointer type) |
OffsetsOutputIteratorT | [inferred] Random-access output iterator type for writing run-offset values (may be a simple pointer type) |
LengthsOutputIteratorT | [inferred] Random-access output iterator type for writing run-length values (may be a simple pointer type) |
NumRunsOutputIteratorT | [inferred] Output iterator type for recording the number of runs encountered (may be a simple pointer type) |
- Parameters
-
[in] | d_temp_storage | Device-accessible allocation of temporary storage. When NULL, the required allocation size is written to temp_storage_bytes and no work is done. |
[in,out] | temp_storage_bytes | Reference to size in bytes of d_temp_storage allocation |
[in] | d_in | Pointer to input sequence of data items |
[out] | d_offsets_out | Pointer to output sequence of run-offsets (one offset per non-trivial run) |
[out] | d_lengths_out | Pointer to output sequence of run-lengths (one count per non-trivial run) |
[out] | d_num_runs_out | Pointer to total number of non-trivial runs (i.e., the length of d_offsets_out) |
[in] | num_items | Total number of input data items (i.e., the length of d_in) |
[in] | stream | [optional] CUDA stream to launch kernels within. Default is stream 0. |
[in] | debug_synchronous | [optional] Whether or not to synchronize the stream after every kernel launch to check for errors. May cause significant slowdown. Default is false. |
Definition at line 244 of file device_run_length_encode.cuh.
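The output of NonTrivialRuns can be cross-checked against a short sequential reference on the host. The sketch below is an assumption-laden illustration: `NonTrivialRunsResult` and `host_non_trivial_runs` are hypothetical names, not CUB API, and the loop only reproduces the documented semantics (runs of length > 1, compared with ==) rather than the device algorithm.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Offsets and lengths of runs longer than one item.
// Hypothetical helper type for illustration; not part of CUB.
struct NonTrivialRunsResult {
    std::vector<int> offsets_out;  // starting offset of each non-trivial run
    std::vector<int> lengths_out;  // length of each non-trivial run
    int num_runs = 0;              // number of non-trivial runs
};

// Sequential host-side reference for the documented semantics.
inline NonTrivialRunsResult host_non_trivial_runs(const std::vector<int>& in) {
    NonTrivialRunsResult result;
    std::size_t run_start = 0;
    for (std::size_t i = 1; i <= in.size(); ++i) {
        // A run ends at the end of the input or where the value changes.
        if (i == in.size() || !(in[i] == in[run_start])) {
            const std::size_t length = i - run_start;
            if (length > 1) {
                // Only runs of length > 1 are reported.
                result.offsets_out.push_back(static_cast<int>(run_start));
                result.lengths_out.push_back(static_cast<int>(length));
            }
            run_start = i;
        }
    }
    result.num_runs = static_cast<int>(result.offsets_out.size());
    assert(result.offsets_out.size() == result.lengths_out.size());
    return result;
}
```

For the input [0, 2, 2, 9, 5, 5, 5, 8], the reference reports offsets [1, 4] with lengths [2, 3], so 2 non-trivial runs; the single-item runs 0, 9, and 8 are skipped.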
The documentation for this struct was generated from the following file:
- device_run_length_encode.cuh