Query

class Query

Used to execute a SQL query against table(s) in Redivis, using the Redivis SQL query syntax, and read out the results.

Constructors

redivis.query(query_string)

Execute a SQL query within the current default scope (either a dataset or workflow). In a Redivis notebook, the default scope will always be the notebook's workflow, and the notebook's source table can be referenced via the _source_ identifier. If no default scope is specified, all tables in the query must be fully qualified. Consult the referencing resources documentation to learn more.

Dataset.query(query_string)

Execute a SQL query scoped to a specific dataset. Tables referenced by the query do not need to be fully qualified, since the table lookup is already scoped to the dataset. Consult the referencing resources documentation to learn more.

Workflow.query(query_string)

Execute a SQL query scoped to a specific workflow. Tables referenced by the query do not need to be fully qualified, since the table lookup is already scoped to the workflow. Consult the referencing resources documentation to learn more.

Examples

import redivis

# Execute any SQL query and read the results
query = redivis.query("SELECT 1 + 1 AS two, 'foo' AS bar")
query.to_pandas_dataframe()
# 	two	bar
# 0	2	foo

# The query can reference any table on Redivis by its fully qualified name
query = redivis.query("""
    SELECT * 
    FROM demo.iris_species.iris 
    WHERE SepalLengthCm > 5
""")
query.to_pandas_dataframe()
# 	Id	SepalLengthCm	SepalWidthCm	PetalLengthCm	PetalWidthCm	Species
# 0	33	5.2	        4.1	        1.5	        0.1	        Iris-setosa
# ...

# Other methods to read data:
# query.to_arrow_batch_iterator()
# query.to_arrow_dataset()
# query.to_arrow_table()
# query.to_geopandas_dataframe()
# query.to_dask_dataframe()
# query.to_polars_lazyframe()

# To simplify table references, execute a query scoped to a dataset or workflow
dataset = redivis.organization("Demo").dataset("CMS 2014 Medicare Data")
query = dataset.query("""
    SELECT 
        hospice_providers.name, 
        inpatient_charges.drg_definition
    -- The tables inpatient_charges and hospice_providers are assumed to be
    -- within the scoped dataset
    FROM inpatient_charges
    INNER JOIN hospice_providers 
        ON hospice_providers.provider_id = inpatient_charges.provider_id
""")

# In a notebook, all queries are scoped to the current workflow.
# Additionally, the notebook's source table can simply be referenced as _source_
query = redivis.query("SELECT * FROM _source_ LIMIT 10")
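As a sketch of the streaming pattern that query.to_arrow_batch_iterator() enables, the hypothetical helper below computes the mean of one column while converting only one batch at a time into Python objects. It assumes only that each yielded object has a to_pydict() method, which pyarrow.RecordBatch provides. The helper name streaming_mean and the choice to skip nulls are illustrative assumptions, not part of the redivis API.

```python
from typing import Iterable, Optional


def streaming_mean(batches: Iterable, column: str) -> Optional[float]:
    """Mean of `column` across an iterable of record batches.

    Hypothetical helper (not part of redivis): each batch only needs a
    to_pydict() method, as pyarrow.RecordBatch has. None values (SQL NULLs)
    are skipped, similar to SQL AVG. Returns None if no non-null values exist.
    """
    total = 0.0
    count = 0
    for batch in batches:
        # Only the current batch is converted to Python objects, so memory use
        # is bounded by the batch size rather than by the full result set.
        for value in batch.to_pydict()[column]:
            if value is not None:
                total += value
                count += 1
    return total / count if count else None
```

For example, streaming_mean(query.to_arrow_batch_iterator(), "SepalLengthCm") could be run against the iris query above. A pyarrow.dataset.Dataset returned by query.to_arrow_dataset() can be consumed the same way through its to_batches() method.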

Attributes

properties

A dict containing the API resource representation of the query. It is always populated after the query has been created, and can be refreshed by calling query.get().

Methods

Query.download_files([path, *, overwrite, ...])

Download all files represented by a file_id variable in the query results to a local directory.

Query.get()

Fetch query metadata. Once called, the properties attribute on the query will be fully populated.

Query.list_files([max_results, *, ...])

Return a list of File instances for query results containing a file_id variable.

Query.list_variables([max_results, *, ...])

Return a list of Variable instances associated with this query's output.

Query.list_rows([max_results, *, variables, ...])

Deprecated. Return a list of named tuples referencing the rows of the query results. Use Query.to_arrow_table().to_pydict() instead.

Query.to_arrow_batch_iterator([...])

Return an iterator that yields pyarrow.RecordBatches, for processing the query results in a memory-efficient streaming manner.

Query.to_arrow_dataset([max_results, ...])

Return a pyarrow.dataset.Dataset for the query results. Data is backed by disk, allowing for larger-than-memory analysis.

Query.to_arrow_table([max_results, ...])

Return a pyarrow.Table with the query results.

Query.to_dataframe([max_results, ...])

Deprecated. Please use to_[geo]pandas_dataframe instead.

Query.to_geopandas_dataframe([...])

Return a geopandas.GeoDataFrame with the query results, for working with results that contain a geography variable.

Query.to_dask_dataframe([max_results, ...])

Return a dask.DataFrame. Data is backed by disk, allowing for larger-than-memory analysis.

Query.to_pandas_dataframe([max_results, ...])

Return a pandas.DataFrame with the query results.

Query.to_polars_lazyframe([max_results])

Return a polars.LazyFrame. Data is backed by disk, allowing for larger-than-memory analysis.

Query.variable(name)

Reference a Variable within the query's output.
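Because Query.list_rows is deprecated in favor of Query.to_arrow_table().to_pydict(), code that relied on row-wise named tuples may need a small adapter. to_pydict() returns a column-oriented dict (one list of values per column), so the hypothetical helper below transposes it into one named tuple per row. This is a sketch assuming that dict shape; rows_from_pydict is not part of the redivis library.

```python
from collections import namedtuple
from typing import Any, Dict, List


def rows_from_pydict(columns: Dict[str, List[Any]]) -> List[tuple]:
    """Convert a column-oriented dict into a list of named tuples, one per row.

    Hypothetical helper (not part of redivis). rename=True replaces any column
    name that is not a valid Python identifier with a positional name like _0.
    """
    Row = namedtuple("Row", list(columns.keys()), rename=True)
    # zip(*...) transposes the per-column lists into per-row value tuples.
    return [Row(*values) for values in zip(*columns.values())]
```

Usage would look like rows = rows_from_pydict(query.to_arrow_table().to_pydict()), after which each row's values are available as attributes, for example rows[0].bar for the first example query above.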
