Reference

The redivis Python module provides an interface for constructing representations of Redivis entities and for creating, modifying, reading, and deleting them.

Resources are generally constructed by chaining together multiple constructor methods, reflecting the hierarchical nature of entities in Redivis. For example, to list all variables in a table (which belongs to a dataset, which in turn belongs to an organization), we would write:

import redivis

variables = (
    redivis.organization("Demo")         # Returns an instance of an Organization
    .dataset("CMS 2014 Medicare Data")   # Returns an instance of a Dataset
    .table("Home health agencies")       # Returns an instance of a Table
    .list_variables()                    # Returns a list of Variable instances
)

Interfaces

redivis: The redivis namespace. Provides constructor methods for most of the other classes.

Dataset: Class representing a Redivis dataset. Provides constructor methods for Tables and Queries scoped to a given dataset, as well as methods for creating, deleting, and updating datasets.

File: Class representing a non-tabular file on Redivis.

Organization: Class representing a Redivis organization. Provides constructor methods for Datasets and Workflows scoped to a given organization.

Workflow: Class representing a Redivis workflow. Provides constructor methods for Tables and Queries scoped to a given workflow.

Query: Class representing a running SQL query that references tables on Redivis.

Upload: Class representing a tabular data upload on a Table.

Table: Class representing a Redivis table. Provides numerous methods for reading data from the table, as well as for uploading data and metadata.

User: Class representing a Redivis user. Provides constructor methods for Datasets and Workflows scoped to a given user.

Variable: Class representing a specific variable within a Table.

Environment variables

The following environment variables may be set to modify the behavior of the redivis-python client.

REDIVIS_API_TOKEN

If you are using this library in an external environment, you'll need to set this environment variable to your API token in order to authenticate. This is not necessary for code executed in Redivis notebooks.

Important: this token acts as a password, and should never be inlined in your code, committed to source control, or otherwise published.
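A minimal sketch of keeping the token out of your source, assuming you have already exported REDIVIS_API_TOKEN in your shell or secrets manager. The helper require_api_token is hypothetical and not part of the redivis package; it only checks that the variable is present, so a missing token fails early with a clear message and the token itself is never printed.

```python
import os


def require_api_token(env=None):
    """Hypothetical helper: confirm REDIVIS_API_TOKEN is set, without echoing it.

    Reads from os.environ by default; a plain dict can be passed for testing.
    """
    env = os.environ if env is None else env
    token = env.get("REDIVIS_API_TOKEN", "").strip()
    if not token:
        # Deliberately no fallback to a hard-coded token.
        raise RuntimeError(
            "REDIVIS_API_TOKEN is not set; export it in your environment "
            "instead of inlining it in code."
        )
    return token


# Usage outside Redivis notebooks, before making any redivis calls:
# require_api_token()
# import redivis
```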

REDIVIS_DEFAULT_WORKFLOW

If set, tables referenced via redivis.table() and unqualified table names in redivis.query() will be assumed to be within the default workflow. In Redivis notebooks, this environment variable is always set to the current workflow.

Takes the form user_name.workflow_identifier. Learn more about referencing resources.
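In an external environment, you could set the default workflow from Python before referencing any tables. This is a sketch: "user_name.workflow_identifier" is a placeholder rather than a real workflow, "my_table" is a hypothetical table name, and setting the variable before importing redivis is a conservative assumption about when the client reads it.

```python
import os

# Placeholder: replace with your own "user_name.workflow_identifier".
os.environ["REDIVIS_DEFAULT_WORKFLOW"] = "user_name.workflow_identifier"

# With the default set, unqualified table names resolve within that workflow.
# This requires the redivis package and an API token, so it is commented out:
# import redivis
# table = redivis.table("my_table")  # "my_table" is a hypothetical table name
```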

REDIVIS_DEFAULT_DATASET

If set, tables referenced via redivis.table() and unqualified table names in redivis.query() will be assumed to be within the default dataset.

Takes the form owner_name.dataset_identifier. If both a default dataset and a default workflow are set, the default workflow takes precedence. Learn more about referencing resources.
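The precedence rule can be summarized in a few lines of Python. The function default_scope below is hypothetical: it only mirrors the documented behavior (a default workflow supersedes a default dataset) and does not call into, or reflect the internals of, the redivis client.

```python
import os


def default_scope(env=None):
    """Hypothetical: return ("workflow" | "dataset", reference), or None.

    Mirrors the documented rule that REDIVIS_DEFAULT_WORKFLOW takes
    precedence over REDIVIS_DEFAULT_DATASET when both are set.
    """
    env = os.environ if env is None else env
    workflow = env.get("REDIVIS_DEFAULT_WORKFLOW")
    if workflow:
        return ("workflow", workflow)
    dataset = env.get("REDIVIS_DEFAULT_DATASET")
    if dataset:
        return ("dataset", dataset)
    return None


# With both set, the workflow wins:
# default_scope({"REDIVIS_DEFAULT_WORKFLOW": "u.w", "REDIVIS_DEFAULT_DATASET": "o.d"})
# returns ("workflow", "u.w")
```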

REDIVIS_TMPDIR

If set, this directory is used to temporarily store data for disk-backed data objects (e.g., see Table.to_arrow_dataset). Otherwise, the operating system's default temporary directory is used.
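For example, to keep disk-backed objects on a larger scratch volume, you might point REDIVIS_TMPDIR at a dedicated directory. The sketch below uses tempfile.mkdtemp as a stand-in for such a path; in practice you would substitute a directory on a disk with enough free space. The to_arrow_dataset call is commented out because it needs the redivis package and network access, and "my_table" is a hypothetical table name.

```python
import os
import tempfile

# Stand-in scratch directory; replace with a path on a volume with ample space.
scratch_dir = tempfile.mkdtemp(prefix="redivis-")
os.environ["REDIVIS_TMPDIR"] = scratch_dir

# Disk-backed reads would then store their temporary data under scratch_dir:
# import redivis
# ds = redivis.table("my_table").to_arrow_dataset()  # hypothetical table name
```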

Pandas datatype conversions

When reading data from a table, query, or upload, you can return the results as a pandas DataFrame. By default, this DataFrame uses the pyarrow dtype backend, whose types closely match how data is stored in Redivis tables. The pyarrow backend is recommended when possible: it avoids many of the challenges with nullable data inherent in NumPy datatypes, and can easily be converted as needed.
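To see the nullability problem that the pyarrow backend sidesteps, consider how plain pandas infers NumPy dtypes for columns with missing values. This example uses pandas alone, with no Redivis calls: an integer column containing a null is promoted to float64, and a boolean column containing a null falls back to object.

```python
import pandas as pd

# Integers with a missing value: NumPy has no nullable int, so pandas uses float64.
ints = pd.Series([1, 2, None])
print(ints.dtype)  # float64

# Booleans with a missing value: stored as generic Python objects.
flags = pd.Series([True, False, None])
print(flags.dtype)  # object

# Converting to a nullable extension dtype restores integer semantics.
ints_nullable = ints.astype("Int64")
print(ints_nullable.dtype)  # Int64
```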

However, to mimic historic pandas behavior, you can instead specify dtype_backend="numpy" or dtype_backend="numpy_nullable". The latter uses the experimental nullable datatypes in pandas, which support nullable booleans and integers.
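If you already have a NumPy-backed DataFrame, pandas' own DataFrame.convert_dtypes produces a comparable set of nullable extension types. The commented line sketches requesting the backend at read time; the method name to_pandas_dataframe and the table name "my_table" are assumptions to check against the Table reference, while the dtype_backend values come from this page.

```python
import pandas as pd

# At read time (requires redivis; method and table name are assumptions):
# df = redivis.table("my_table").to_pandas_dataframe(dtype_backend="numpy_nullable")

# Comparable conversion in plain pandas, starting from NumPy-style dtypes.
df = pd.DataFrame({"flag": [True, None], "n": [1, None]})
print(df.dtypes.astype(str).tolist())  # ['object', 'float64']

# convert_dtypes infers pandas' nullable extension types.
nullable = df.convert_dtypes()
print(nullable.dtypes.astype(str).tolist())  # ['boolean', 'Int64']
```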

When loading a Redivis table into pandas with either of these backends, the following conversions apply:

| Redivis type | Pandas type |
| --- | --- |
| Float | float64 |
| DateTime | pd.Timestamp (np.datetime64[ns]) |
| Date | pd.Timestamp (np.datetime64[ns]) |
| Time | object (with datetime.time objects) |
| Geography | str if a pandas.DataFrame; geopandas.GeoSeries if a geopandas.GeoDataFrame |

With dtype_backend="numpy":

| Redivis type | Pandas type |
| --- | --- |
| Boolean | bool |
| Boolean with nulls | object (with values True, False, None) |
| Integer | int64 |
| Integer with nulls | float64 |
| String | str |

With dtype_backend="numpy_nullable":

| Redivis type | Pandas type |
| --- | --- |
| Boolean | pd.BooleanDtype() |
| Integer | pd.Int64Dtype() |
| String | pd.StringDtype() |

For Date variables, if you prefer to work with datetime.date objects rather than NumPy's datetime64 dtype, provide the argument date_as_object=True.
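For comparison, a similar result can be obtained in plain pandas after loading, by converting a datetime64 column to datetime.date objects with the .dt.date accessor. This is a pandas-only sketch with made-up dates; it is not how date_as_object is implemented.

```python
import datetime

import pandas as pd

# A Date column as it would arrive by default: a datetime64 dtype.
dates = pd.Series(pd.to_datetime(["2014-01-01", "2014-06-30"]))

# Convert to Python datetime.date objects (stored with object dtype).
as_objects = dates.dt.date
print(as_objects.dtype)  # object
print(type(as_objects.iloc[0]) is datetime.date)  # True
```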
