Query.to_arrow_dataset

Query.to_arrow_dataset(max_results=None, *, progress=True, batch_preprocessor=None, max_parallelization=os.cpu_count()) → pyarrow.dataset.Dataset

Returns a representation of the query results as a pyarrow.dataset.Dataset. Pyarrow datasets are backed by files on disk rather than held in memory, allowing you to load a query's results with minimal memory overhead. The files backing the dataset are stored in your operating system's temp directory, unless the REDIVIS_TMPDIR environment variable is set.

Parameters:

max_results : int, default None The maximum number of rows to return. If not specified, all rows in the query results will be read.

progress : bool, default True Whether to show a progress bar.

batch_preprocessor : function, default None Function used to preprocess the data, invoked for each batch of records as they are initially loaded. This can be helpful in reducing the size of the data before it is written to disk. The function accepts one argument, a pyarrow.RecordBatch, and must return a pyarrow.RecordBatch or None. If you prefer to work with the data solely in a streaming manner, see Query.to_arrow_batch_iterator(). A sketch of a batch preprocessor follows this parameter list.

max_parallelization : int, default os.cpu_count() The maximum number of threads utilized when loading the query.

Returns:

A pyarrow.dataset.Dataset containing the query results.

See also

pyarrow.dataset.Dataset
pyarrow.RecordBatch
Query.to_arrow_batch_iterator()
Query.to_arrow_table()
Query.to_geopandas_dataframe()
Query.to_dask_dataframe()
Query.to_pandas_dataframe()
Query.to_polars_lazyframe()