Table$to_arrow_table

Table$to_arrow_table(max_results=NULL, variables=NULL, batch_preprocessor=NULL, max_parallelization=parallely::availableCores()) → Arrow Table

Returns an Arrow Table representing a table on Redivis. Because Arrow is the underlying transport format for Redivis tables, loading data directly into an Arrow Table is the most performant and memory-efficient way to read a table.

Parameters:

max_results : int, default NULL The maximum number of records to load into the Arrow Table. If not specified, the entire table will be loaded.

variables : list(str) | character vector, default NULL The specific variables to return, e.g., variables = c("name", "date"). If not specified, all variables in the table will be returned.

batch_preprocessor : function, default NULL Function used to preprocess the data, invoked on each batch of records as it is initially loaded. This can help reduce the size of the data before the final table is assembled. The function accepts one argument, an Arrow RecordBatch, and must return an Arrow RecordBatch or NULL. If you prefer to work with the data solely in a streaming manner, see Table$to_arrow_batch_reader().

max_parallelization : int, default parallely::availableCores() The maximum parallelization when loading the table. Uses the future::multicore strategy when supported, falling back to future::multisession if not.
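As an illustration of the batch_preprocessor contract described above (one Arrow RecordBatch in, one Arrow RecordBatch out), here is a minimal sketch. The function name drop_missing_names is hypothetical, and the columns name and date are borrowed from the variables example; the sample batch is built locally with the arrow package only to show the function's behavior.

```r
library(arrow)

# Hypothetical preprocessor: keep only the rows whose `name` is not missing.
# It takes a single Arrow RecordBatch and returns an Arrow RecordBatch.
drop_missing_names <- function(batch) {
  # Pull the column into a plain R vector so we get an R logical mask
  keep <- !is.na(as.vector(batch$name))
  batch[keep, ]
}

# Locally built sample batch, used only to demonstrate the function
sample_batch <- record_batch(
  name = c("a", NA, "c"),
  date = as.Date(c("2020-01-01", "2020-01-02", "2020-01-03"))
)
filtered <- drop_missing_names(sample_batch)

# With a real Redivis table (assumed to exist as `table`), it would be passed as:
# arrow_table <- table$to_arrow_table(batch_preprocessor = drop_missing_names)
```

Because the function runs on each batch as it arrives, rows dropped here never reach the final Arrow Table.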

Returns:

Arrow Table
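A short usage sketch, assuming table is a Redivis Table object obtained elsewhere (how it is constructed is not shown here). The helper name load_subset is hypothetical, and the variable names come from the variables example above.

```r
library(arrow)

# Hypothetical helper: load the first `n` records of two variables
# from a Redivis table into an Arrow Table.
load_subset <- function(table, n = 100) {
  table$to_arrow_table(
    max_results = n,
    variables = c("name", "date")
  )
}

# Usage (requires a real Redivis table and authentication):
# arrow_table <- load_subset(table, n = 100)
# arrow_table$num_rows
# df <- as.data.frame(arrow_table)  # convert to a data frame if needed
```

Limiting max_results and variables up front keeps the transfer small, which is useful when exploring a large table.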


