Table.to_pandas_dataframe
Table.to_pandas_dataframe(max_results=None, *, variables=None, progress=True, batch_preprocessor=None, dtype_backend="pyarrow", date_as_object=False, max_parallelization=os.cpu_count()) → pandas.DataFrame
Returns a representation of the table as a Pandas dataframe.
Parameters:
max_results
: int, default None
The maximum number of rows to return. If not specified, all rows in the table will be read.
variables
: list<str>, default None
A list of variable names to read, improving performance when not all variables are needed. If unspecified, all variables will be represented in the returned rows. Variable names are case-insensitive, though the names in the results will reflect the variable's true casing. The order of the columns returned will correspond to the order of names in this list.
progress
: bool, default True
Whether to show a progress bar.
batch_preprocessor
: function, default None
Function used to preprocess the data, invoked for each batch of records as they are initially loaded. This can be helpful in reducing the size of the data before being loaded into a dataframe. The function accepts one argument, a pyarrow.RecordBatch, and must return a pyarrow.RecordBatch or None. If you prefer to work with the data solely in a streaming manner, see Table.to_arrow_batch_iterator().
dtype_backend
: {"pyarrow","numpy","numpy_nullable"}, default "pyarrow"
The data type backend to use for the dataframe. PyArrow-backed dtypes, available as of pandas 2.0, offer substantially improved performance and memory efficiency, alongside straightforward type mapping to the data in Redivis. PyArrow dtypes will work with most existing code, and can be converted to numpy dtypes as needed. If you prefer to work with numpy dtypes, consult the pandas data type conversion documentation to learn how Redivis types are mapped to numpy.
date_as_object
: bool, default False
Whether variables of Redivis type date should be expressed as datetime.date objects, rather than the default np.datetime64[ns]. Only relevant for the "numpy" and "numpy_nullable" dtype backends.
max_parallelization
: int, default os.cpu_count()
The maximum number of threads utilized when loading the table.
Returns:
pandas.DataFrame