Query.to_arrow_table
Query.to_arrow_table(max_results=None, *, progress=True, batch_preprocessor=None, max_parallelization=os.cpu_count()) → pyarrow.Table
Returns the query results as a PyArrow Table. Because Arrow is the underlying transport format for Redivis data, loading results directly into an Arrow table will always be the most performant in-memory option.
Parameters:
max_results
: int, default None
The maximum number of rows to return. If not specified, all rows in the query results will be read.
progress
: bool, default True
Whether to show a progress bar.
batch_preprocessor
: function, default None
Function used to preprocess the data, invoked for each batch of records as they are initially loaded. This can help reduce the size of the data before it is loaded into the table. The function accepts one argument, a pyarrow.RecordBatch, and must return a pyarrow.RecordBatch or None. If you prefer to work with the data solely in a streaming manner, see Query.to_arrow_batch_iterator().
max_parallelization
: int, default os.cpu_count()
The maximum number of threads utilized when loading the query.
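As an illustration of the batch_preprocessor parameter described above, here is a minimal sketch of a preprocessor that keeps only a subset of columns from each batch. The column names, `keep_columns`, and `load_trimmed_table` are hypothetical names chosen for this example; `query` is assumed to be an existing Redivis Query object. Only the documented contract is relied on: the function receives a pyarrow.RecordBatch and returns a pyarrow.RecordBatch.

```python
# Hypothetical column names, used only for illustration.
COLUMNS_TO_KEEP = ["id", "name"]


def keep_columns(batch):
    """Drop every column except COLUMNS_TO_KEEP from one pyarrow.RecordBatch."""
    # RecordBatch.select returns a new RecordBatch containing only the named columns.
    return batch.select(COLUMNS_TO_KEEP)


def load_trimmed_table(query):
    """Load `query` (assumed to be a Redivis Query) with the preprocessor applied per batch."""
    return query.to_arrow_table(batch_preprocessor=keep_columns, progress=False)
```

Selecting columns per batch means the dropped columns are never accumulated in the final table, which is the size reduction the parameter description refers to.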
Returns:
pyarrow.Table
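A short usage sketch may help tie the parameters together. It assumes the `redivis` package is installed, credentials are configured, and that `redivis.query(sql)` is used to construct the Query object; `thread_cap` and `preview_query` are hypothetical helper names, not part of the library.

```python
import os


def thread_cap(fraction=0.5):
    """Return a thread count that is a fraction of the available cores, at least 1."""
    cores = os.cpu_count() or 1  # os.cpu_count() may return None
    return max(1, int(cores * fraction))


def preview_query(sql, row_limit=1000):
    """Run `sql` on Redivis and return at most `row_limit` rows as a pyarrow.Table."""
    # Imported lazily: needs the redivis package and valid credentials (assumption).
    import redivis

    query = redivis.query(sql)
    return query.to_arrow_table(
        max_results=row_limit,               # cap the number of rows read
        progress=False,                      # suppress the progress bar
        max_parallelization=thread_cap(),    # use fewer threads than the default
    )
```

Calling `preview_query` with your own SQL string returns a pyarrow.Table, so standard pyarrow members such as `table.num_rows` or `table.to_pandas()` can be used on the result.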