OpenFPM_pdata  4.1.0
Project that contain the implementation of distributed structures
 
Loading...
Searching...
No Matches
cub::DeviceRunLengthEncode Struct Reference

DeviceRunLengthEncode provides device-wide, parallel operations for demarcating "runs" of same-valued items within a sequence residing within device-accessible memory. More...

Detailed Description

DeviceRunLengthEncode provides device-wide, parallel operations for demarcating "runs" of same-valued items within a sequence residing within device-accessible memory.

Overview
A run-length encoding computes a simple compressed representation of a sequence of input elements such that each maximal "run" of consecutive same-valued data items is encoded as a single data value along with a count of the elements in that run.
Usage Considerations
\cdp_class{DeviceRunLengthEncode}
Performance
\linear_performance{run-length encode}
The following chart illustrates DeviceRunLengthEncode::RunLengthEncode performance across different CUDA architectures for int32 items. Segments have lengths uniformly sampled from [1,1000].
\plots_below

Definition at line 78 of file device_run_length_encode.cuh.

Static Public Member Functions

template<typename InputIteratorT , typename UniqueOutputIteratorT , typename LengthsOutputIteratorT , typename NumRunsOutputIteratorT >
CUB_RUNTIME_FUNCTION static __forceinline__ cudaError_t Encode (void *d_temp_storage, size_t &temp_storage_bytes, InputIteratorT d_in, UniqueOutputIteratorT d_unique_out, LengthsOutputIteratorT d_counts_out, NumRunsOutputIteratorT d_num_runs_out, int num_items, cudaStream_t stream=0, bool debug_synchronous=false)
 Computes a run-length encoding of the sequence d_in.
 
template<typename InputIteratorT , typename OffsetsOutputIteratorT , typename LengthsOutputIteratorT , typename NumRunsOutputIteratorT >
CUB_RUNTIME_FUNCTION static __forceinline__ cudaError_t NonTrivialRuns (void *d_temp_storage, size_t &temp_storage_bytes, InputIteratorT d_in, OffsetsOutputIteratorT d_offsets_out, LengthsOutputIteratorT d_lengths_out, NumRunsOutputIteratorT d_num_runs_out, int num_items, cudaStream_t stream=0, bool debug_synchronous=false)
 Enumerates the starting offsets and lengths of all non-trivial runs (of length > 1) of same-valued keys in the sequence d_in.
 

Member Function Documentation

◆ Encode()

template<typename InputIteratorT , typename UniqueOutputIteratorT , typename LengthsOutputIteratorT , typename NumRunsOutputIteratorT >
CUB_RUNTIME_FUNCTION static __forceinline__ cudaError_t cub::DeviceRunLengthEncode::Encode ( void *  d_temp_storage,
size_t &  temp_storage_bytes,
InputIteratorT  d_in,
UniqueOutputIteratorT  d_unique_out,
LengthsOutputIteratorT  d_counts_out,
NumRunsOutputIteratorT  d_num_runs_out,
int  num_items,
cudaStream_t  stream = 0,
bool  debug_synchronous = false 
)
inlinestatic

Computes a run-length encoding of the sequence d_in.

  • For the ith run encountered, the first key of the run and its length are written to d_unique_out[i] and d_counts_out[i], respectively.
  • The total number of runs encountered is written to d_num_runs_out.
  • The == equality operator is used to determine whether values are equivalent
  • \devicestorage
Performance
The following charts illustrate saturated encode performance across different CUDA architectures for int32 and int64 items, respectively. Segments have lengths uniformly sampled from [1,1000].
The following charts are similar, but with segment lengths uniformly sampled from [1,10]:
Snippet
The code snippet below illustrates the run-length encoding of a sequence of int values.
#include <cub/cub.cuh> // or equivalently <cub/device/device_run_length_encode.cuh>
// Declare, allocate, and initialize device-accessible pointers for input and output
int num_items; // e.g., 8
int *d_in; // e.g., [0, 2, 2, 9, 5, 5, 5, 8]
int *d_unique_out; // e.g., [ , , , , , , , ]
int *d_counts_out; // e.g., [ , , , , , , , ]
int *d_num_runs_out; // e.g., [ ]
...
// Determine temporary device storage requirements
void *d_temp_storage = NULL;
size_t temp_storage_bytes = 0;
cub::DeviceRunLengthEncode::Encode(d_temp_storage, temp_storage_bytes, d_in, d_unique_out, d_counts_out, d_num_runs_out, num_items);
// Allocate temporary storage
cudaMalloc(&d_temp_storage, temp_storage_bytes);
// Run encoding
cub::DeviceRunLengthEncode::Encode(d_temp_storage, temp_storage_bytes, d_in, d_unique_out, d_counts_out, d_num_runs_out, num_items);
// d_unique_out <-- [0, 2, 9, 5, 8]
// d_counts_out <-- [1, 2, 1, 3, 1]
// d_num_runs_out <-- [5]
UniqueOutputIteratorT d_unique_out
< Pointer to the input sequence of keys
UniqueOutputIteratorT ValuesInputIteratorT AggregatesOutputIteratorT NumRunsOutputIteratorT d_num_runs_out
Pointer to total number of runs encountered (i.e., the length of d_unique_out)
KeyT const ValueT ValueT OffsetT OffsetT num_items
[in] Total number of input data items
CUB_RUNTIME_FUNCTION static __forceinline__ cudaError_t Encode(void *d_temp_storage, size_t &temp_storage_bytes, InputIteratorT d_in, UniqueOutputIteratorT d_unique_out, LengthsOutputIteratorT d_counts_out, NumRunsOutputIteratorT d_num_runs_out, int num_items, cudaStream_t stream=0, bool debug_synchronous=false)
Computes a run-length encoding of the sequence d_in.
Template Parameters
InputIteratorT[inferred] Random-access input iterator type for reading input items \iterator
UniqueOutputIteratorT[inferred] Random-access output iterator type for writing unique output items \iterator
LengthsOutputIteratorT[inferred] Random-access output iterator type for writing output counts \iterator
NumRunsOutputIteratorT[inferred] Output iterator type for recording the number of runs encountered \iterator
Parameters
[in]d_temp_storageDevice-accessible allocation of temporary storage. When NULL, the required allocation size is written to temp_storage_bytes and no work is done.
[in,out]temp_storage_bytesReference to size in bytes of d_temp_storage allocation
[in]d_inPointer to the input sequence of keys
[out]d_unique_outPointer to the output sequence of unique keys (one key per run)
[out]d_counts_outPointer to the output sequence of run-lengths (one count per run)
[out]d_num_runs_outPointer to total number of runs
[in]num_itemsTotal number of associated key+value pairs (i.e., the length of d_in_keys and d_in_values)
[in]stream[optional] CUDA stream to launch kernels within. Default is stream0.
[in]debug_synchronous[optional] Whether or not to synchronize the stream after every kernel launch to check for errors. May cause significant slowdown. Default is false.

Definition at line 148 of file device_run_length_encode.cuh.

◆ NonTrivialRuns()

template<typename InputIteratorT , typename OffsetsOutputIteratorT , typename LengthsOutputIteratorT , typename NumRunsOutputIteratorT >
CUB_RUNTIME_FUNCTION static __forceinline__ cudaError_t cub::DeviceRunLengthEncode::NonTrivialRuns ( void *  d_temp_storage,
size_t &  temp_storage_bytes,
InputIteratorT  d_in,
OffsetsOutputIteratorT  d_offsets_out,
LengthsOutputIteratorT  d_lengths_out,
NumRunsOutputIteratorT  d_num_runs_out,
int  num_items,
cudaStream_t  stream = 0,
bool  debug_synchronous = false 
)
inlinestatic

Enumerates the starting offsets and lengths of all non-trivial runs (of length > 1) of same-valued keys in the sequence d_in.

  • For the ith non-trivial run, the run's starting offset and its length are written to d_offsets_out[i] and d_lengths_out[i], respectively.
  • The total number of runs encountered is written to d_num_runs_out.
  • The == equality operator is used to determine whether values are equivalent
  • \devicestorage
Performance
Snippet
The code snippet below illustrates the identification of non-trivial runs within a sequence of int values.
#include <cub/cub.cuh> // or equivalently <cub/device/device_run_length_encode.cuh>
// Declare, allocate, and initialize device-accessible pointers for input and output
int num_items; // e.g., 8
int *d_in; // e.g., [0, 2, 2, 9, 5, 5, 5, 8]
int *d_offsets_out; // e.g., [ , , , , , , , ]
int *d_lengths_out; // e.g., [ , , , , , , , ]
int *d_num_runs_out; // e.g., [ ]
...
// Determine temporary device storage requirements
void *d_temp_storage = NULL;
size_t temp_storage_bytes = 0;
// Allocate temporary storage
cudaMalloc(&d_temp_storage, temp_storage_bytes);
// Run encoding
// d_offsets_out <-- [1, 4]
// d_lengths_out <-- [2, 3]
// d_num_runs_out <-- [2]
OffsetsOutputIteratorT d_offsets_out
< [in] Pointer to input sequence of data items
OffsetsOutputIteratorT LengthsOutputIteratorT d_lengths_out
[out] Pointer to output sequence of run-lengths
CUB_RUNTIME_FUNCTION static __forceinline__ cudaError_t NonTrivialRuns(void *d_temp_storage, size_t &temp_storage_bytes, InputIteratorT d_in, OffsetsOutputIteratorT d_offsets_out, LengthsOutputIteratorT d_lengths_out, NumRunsOutputIteratorT d_num_runs_out, int num_items, cudaStream_t stream=0, bool debug_synchronous=false)
Enumerates the starting offsets and lengths of all non-trivial runs (of length > 1) of same-valued ke...
Template Parameters
InputIteratorT[inferred] Random-access input iterator type for reading input items \iterator
OffsetsOutputIteratorT[inferred] Random-access output iterator type for writing run-offset values \iterator
LengthsOutputIteratorT[inferred] Random-access output iterator type for writing run-length values \iterator
NumRunsOutputIteratorT[inferred] Output iterator type for recording the number of runs encountered \iterator
Parameters
[in]d_temp_storageDevice-accessible allocation of temporary storage. When NULL, the required allocation size is written to temp_storage_bytes and no work is done.
[in,out]temp_storage_bytesReference to size in bytes of d_temp_storage allocation
[in]d_inPointer to input sequence of data items
[out]d_offsets_outPointer to output sequence of run-offsets (one offset per non-trivial run)
[out]d_lengths_outPointer to output sequence of run-lengths (one count per non-trivial run)
[out]d_num_runs_outPointer to total number of runs (i.e., length of d_offsets_out)
[in]num_itemsTotal number of associated key+value pairs (i.e., the length of d_in_keys and d_in_values)
[in]stream[optional] CUDA stream to launch kernels within. Default is stream0.
[in]debug_synchronous[optional] Whether or not to synchronize the stream after every kernel launch to check for errors. May cause significant slowdown. Default is false.

Definition at line 244 of file device_run_length_encode.cuh.


The documentation for this struct was generated from the following file: