Query

class Query

Used to execute a SQL query against table(s) in Redivis, using the Redivis SQL query syntax, and read out the results.

Constructors

redivis.query(query_string)

Execute a SQL query within the current default scope (either a dataset or workflow). In a Redivis notebook, the default scope will always be the notebook's workflow, and the notebook's source table can be referenced via the _source_ identifier. If no default scope is specified, all tables in the query must be fully qualified. Consult the referencing resources documentation to learn more.

Dataset.query(query_string)

Execute a SQL query scoped to a specific dataset. Tables referenced by the query do not need to be fully qualified, since the table lookup is already scoped to the dataset. Consult the referencing resources documentation to learn more.

Workflow.query(query_string)

Execute a SQL query scoped to a specific workflow. Tables referenced by the query do not need to be fully qualified, since the table lookup is already scoped to the workflow. Consult the referencing resources documentation to learn more.
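
To illustrate how the three constructors relate, the sketch below defines a hypothetical helper, `scoped_query` (not part of the redivis package). It forwards to `Dataset.query` or `Workflow.query` when a scope object is supplied, and otherwise falls back to `redivis.query`. It assumes only the `query(query_string)` methods documented above.

```python
def scoped_query(query_string, scope=None):
    """Hypothetical helper: run query_string against an optional scope.

    scope is assumed to be a redivis Dataset or Workflow (anything with a
    .query(query_string) method). Without a scope, redivis.query() is used,
    so tables must be fully qualified unless a default scope applies.
    """
    if scope is not None:
        # Dataset.query / Workflow.query: unqualified table names resolve within the scope
        return scope.query(query_string)
    import redivis  # imported lazily so the helper can be exercised without the package
    return redivis.query(query_string)
```

Assuming a dataset handle such as `redivis.user("demo").dataset("iris_species")` is passed as `scope`, the iris query in the examples could refer to the table simply as `iris` rather than `demo.iris_species.iris`.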

Examples

# Execute any SQL query and read the results
query = redivis.query("SELECT 1 + 1 AS two, 'foo' AS bar")
query.to_pandas_dataframe()
# 	two	bar
# 0	2	foo

# The query can reference any table on Redivis 
query = redivis.query("""
    SELECT * 
    FROM demo.iris_species.iris 
    WHERE SepalLengthCm > 5
""")
query.to_pandas_dataframe()
# 	Id	SepalLengthCm	SepalWidthCm	PetalLengthCm	PetalWidthCm	Species
# 0	33	5.2	        4.1	        1.5	        0.1	        Iris-setosa
# ...

# Other methods to read data:
# query.to_arrow_batch_iterator()
# query.to_arrow_dataset()
# query.to_arrow_table()
# query.to_geopandas_dataframe()
# query.to_dask_dataframe()
# query.to_polars_lazyframe()

Attributes

properties

A dict containing the API resource representation of the query. It is always populated once the query has been created, and can be refreshed by calling query.get().

Methods

Query.download_files([path, *, overwrite, ...])

Download all files represented by a file_id variable in the query results to a local directory.

Query.get()

Fetch query metadata. Once called, the properties attribute on the query will be fully populated.

Query.list_files([max_results, *, ...])

Return a list of File instances for query results containing a file_id variable.
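
Both file methods only apply when the query results contain a file_id variable. The following is a minimal sketch using a hypothetical `fetch_files` helper: it lists a handful of File instances for inspection, then downloads every file. The `max_results` argument and the keyword-only `overwrite` argument come from the signatures above; nothing else about File objects is assumed.

```python
def fetch_files(query, path, preview=5, overwrite=False):
    """Hypothetical helper: inspect a few File instances, then download all files.

    Assumes query is a redivis Query whose results include a file_id variable.
    Returns the previewed File instances so the caller can inspect them.
    """
    # list_files returns File instances; max_results caps how many are listed
    sample = query.list_files(max_results=preview)
    # overwrite is keyword-only in the documented download_files signature
    query.download_files(path, overwrite=overwrite)
    return sample
```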

Query.list_rows([max_results, *, variables, ...])

Deprecated. Return a list of named tuples referencing the rows of the query results. Use Query.to_arrow_table().to_pydict() instead.
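
Because `to_pydict()` returns a column-oriented dict (each variable name mapped to a list of values) rather than rows, code migrating from `list_rows` may need to transpose it. The hypothetical `pydict_to_rows` below is one way to do that with only the standard library; recent pyarrow versions also provide `Table.to_pylist()`, which returns one dict per row.

```python
from collections import namedtuple


def pydict_to_rows(columns):
    """Hypothetical helper: turn the dict from to_arrow_table().to_pydict()
    into a list of named tuples, one per row (similar in shape to list_rows())."""
    # rename=True replaces any variable names that are not valid Python identifiers
    Row = namedtuple("Row", list(columns), rename=True)
    # zip(*values) walks the columns in lockstep, yielding one tuple per row
    return [Row(*values) for values in zip(*columns.values())]
```

Usage: `rows = pydict_to_rows(query.to_arrow_table().to_pydict())`.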

Query.to_arrow_batch_iterator([max_results, ...])

Return an iterator that yields pyarrow.RecordBatch objects, for processing the query results in a memory-efficient, streaming manner.
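
A sketch of streaming aggregation, assuming a hypothetical `streaming_mean` helper and a numeric variable name you supply. Each batch is reduced and discarded before the next one is read, so roughly one batch of results is held in memory at a time rather than the full result set.

```python
def streaming_mean(query, variable):
    """Hypothetical helper: mean of one numeric variable, computed batch by batch.

    Returns None if the variable has no non-null values.
    """
    total = 0.0
    count = 0
    for batch in query.to_arrow_batch_iterator():
        # RecordBatch.column() accepts a column name; drop nulls before summing
        values = [v for v in batch.column(variable).to_pylist() if v is not None]
        total += sum(values)
        count += len(values)
    return total / count if count else None
```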

Query.to_arrow_dataset([max_results, ...])

Return a pyarrow.dataset.Dataset for the query results. Data is backed by disk, allowing for larger-than-memory analysis.
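
With a disk-backed dataset, it is usually cheapest to select variables before materializing anything. The hypothetical `load_variables` helper below passes `columns=` to `pyarrow.dataset.Dataset.to_table()`, so only the requested variables end up in the in-memory table. `to_table()` also accepts a `filter=` expression (built with `pyarrow.dataset.field`) for row-level filtering during the scan.

```python
def load_variables(query, variables):
    """Hypothetical helper: materialize only the listed variables in memory.

    to_arrow_dataset() returns a disk-backed pyarrow.dataset.Dataset; the
    columns= projection is applied during the scan, so other variables are
    not included in the resulting pyarrow.Table.
    """
    dataset = query.to_arrow_dataset()
    return dataset.to_table(columns=list(variables))
```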

Query.to_arrow_table([max_results, ...])

Return a pyarrow.Table with the query results.

Query.to_dataframe([max_results, ...])

Deprecated. Please use to_[geo]pandas_dataframe instead.

Query.to_geopandas_dataframe([max_results, ...])

Return a geopandas.GeoDataFrame, for working with query results that contain a geography variable.

Query.to_dask_dataframe([max_results, ...])

Return a dask.DataFrame. Data is backed by disk, allowing for larger-than-memory analysis.

Query.to_pandas_dataframe([max_results, ...])

Return a pandas.DataFrame with the query results.

Query.to_polars_lazyframe([max_results, ...])

Return a polars.LazyFrame. Data is backed by disk, allowing for larger-than-memory analysis.
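
Since the reader methods share the same `[max_results, ...]` calling pattern, choosing a backend can be reduced to a lookup. The `READERS` mapping and `read_results` helper below are hypothetical conveniences, not part of the redivis package; they assume only the method names documented above, and deliberately leave out the deprecated `to_dataframe`.

```python
# Hypothetical mapping from a backend name to a documented Query reader method
READERS = {
    "arrow": "to_arrow_table",
    "pandas": "to_pandas_dataframe",
    "geopandas": "to_geopandas_dataframe",
    "dask": "to_dask_dataframe",
    "polars": "to_polars_lazyframe",
}


def read_results(query, backend="pandas", **kwargs):
    """Hypothetical helper: read query results with the chosen backend.

    Keyword arguments such as max_results are passed through unchanged.
    dask and polars results are lazy; call .compute() on a dask DataFrame
    or .collect() on a polars LazyFrame to materialize them.
    """
    try:
        method_name = READERS[backend]
    except KeyError:
        raise ValueError(f"unknown backend {backend!r}; expected one of {sorted(READERS)}") from None
    return getattr(query, method_name)(**kwargs)
```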
