Query.to_polars_lazyframe

Query.to_polars_lazyframe(max_results=None, *, progress=True, batch_preprocessor=None, max_parallelization=os.cpu_count()) → polars.LazyFrame

Returns a representation of the query results as a polars.LazyFrame, which can be used for parallel processing and larger-than-memory analysis. The underlying LazyFrame is backed by Arrow Feather files on disk, so loading query results with this method does not consume significant memory. The Feather files backing the LazyFrame are stored in your operating system's temp directory, unless the REDIVIS_TMPDIR environment variable is set.

Parameters:

max_results : int, default None The maximum number of rows to return. If not specified, all rows in the query results will be read.

progress : bool, default True Whether to show a progress bar.

batch_preprocessor : function, default None Function used to preprocess the data, invoked for each batch of records as they are initially loaded. This can be helpful in reducing the size of the data before it is loaded into a dataframe. The function accepts one argument, a pyarrow.RecordBatch, and must return a pyarrow.RecordBatch or None. If you prefer to work with the data solely in a streaming manner, see Query.to_arrow_batch_iterator().

max_parallelization : int, default os.cpu_count() The maximum number of threads utilized when loading the query.

Returns:

polars.LazyFrame
