Table.to_arrow_batch_iterator

Table.to_arrow_batch_iterator(max_results=None, *, variables=None, progress=True) → pyarrow.RecordBatch iterator

Returns an iterator that yields the table's contents as a sequence of PyArrow RecordBatches. Allows for streaming workflows where only a small portion of the table is held in memory at a time.

Parameters:

max_results : int, default None
    The maximum number of rows to return. If not specified, all rows in the table will be read.

variables : list<str>, default None
    A list of variable names to read, improving performance when not all variables are needed. If unspecified, all variables will be represented in the returned rows. Variable names are case-insensitive, though the names in the results will reflect the variable's true casing. The order of the columns returned will correspond to the order of names in this list.

progress : bool, default True
    Whether to show a progress bar.

Yields:

pyarrow.RecordBatch
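
Example:

The following is a minimal sketch of a streaming workflow, not a definitive usage pattern. It tallies batches and rows without ever holding the full table in memory. The helper names `summarize_batches` and `example_with_redivis` are hypothetical, as are the user, dataset, table, and variable names; the example also assumes the table is reached via `redivis.user(...).dataset(...).table(...)` and that valid credentials are configured.

```python
def summarize_batches(batches):
    """Consume an iterator of record batches and return (batch_count, row_count).

    Only one batch is referenced at a time, so memory use stays bounded
    by the size of a single batch rather than the size of the table.
    """
    batch_count = 0
    row_count = 0
    for batch in batches:
        batch_count += 1
        row_count += batch.num_rows  # pyarrow.RecordBatch exposes num_rows
    return batch_count, row_count


def example_with_redivis():
    """Hypothetical usage; requires the redivis package and credentials."""
    import redivis

    # Placeholder names: substitute your own user, dataset, and table.
    table = redivis.user("some_user").dataset("some_dataset").table("some_table")

    # Read only two (hypothetical) variables, cap the row count,
    # and suppress the progress bar.
    iterator = table.to_arrow_batch_iterator(
        max_results=10_000,
        variables=["id", "name"],
        progress=False,
    )
    return summarize_batches(iterator)
```

Because `variables` is keyword-only in the signature above, it must be passed by name. Each yielded batch can also be converted or processed individually (for example, written to disk) before the next one is fetched.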
