Reference
The redivis Python module provides an interface to construct representations of Redivis entities and to create, modify, read, and delete them.
Resources are generally constructed by chaining together multiple constructor methods, reflecting the hierarchical nature of entities in Redivis. For example, to list all variables on a table (which belongs to a dataset in an organization), we would chain the organization, dataset, and table constructors and then list the variables on the resulting table.
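The chaining described above might look like the following minimal sketch. It assumes the redivis package is installed and authenticated, and that the Table class exposes a list_variables() method; the helper name list_table_variables and the identifiers in the usage comment are placeholders, not real resources.

```python
def list_table_variables(organization_name, dataset_name, table_name):
    """Return the variables of a table by chaining Redivis constructors."""
    # Imported lazily so this sketch can be loaded without the package installed.
    import redivis

    table = (
        redivis.organization(organization_name)  # top of the hierarchy
        .dataset(dataset_name)                   # dataset scoped to that organization
        .table(table_name)                       # table scoped to that dataset
    )
    # Assumption: list_variables() is the Table method that lists its variables.
    return table.list_variables()


# Hypothetical usage; substitute an organization, dataset, and table you can access:
# variables = list_table_variables("Demo", "iris_species", "iris")
```

Each call in the chain narrows the scope, so the final table reference is unambiguous without any global configuration.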
Interfaces
redivis
The redivis namespace. Provides constructor methods for most of the other classes.
Dataset
Class representing a Redivis dataset. Provides constructor methods for Tables and Queries scoped to a given dataset, as well as methods for creating, deleting, and updating datasets.
File
Class representing a non-tabular file on Redivis.
Organization
Class representing a Redivis organization. Provides constructor methods for Datasets and Workflows scoped to a given organization.
Workflow
Class representing a Redivis workflow. Provides constructor methods for Tables and Queries scoped to a given workflow.
Query
Class representing a running SQL query that references tables on Redivis.
Upload
Class representing a tabular data upload on a Table.
Table
Class representing a Redivis table. Provides numerous methods for reading data from the table, as well as for uploading data and metadata.
User
Class representing a Redivis user. Provides constructor methods for Datasets and Workflows scoped to a given user.
Variable
Class representing a specific variable within a Table.
Environment variables
The following environment variables may be set to modify the behavior of the redivis-python client.
REDIVIS_API_TOKEN
If using this library in an external environment, you'll need to set this environment variable to your API token in order to authenticate. This is not relevant for code executed in Redivis notebooks.
Important: this token acts as a password, and should never be inlined in your code, committed to source control, or otherwise published.
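One way to keep the token out of source code is to read it from the environment and fail early when it is missing. The helper below is a hypothetical illustration (require_api_token is not part of the library, which reads the variable on its own); it only shows the pattern of never writing the token into a script.

```python
import os


def require_api_token(environ=os.environ):
    """Return the API token from the environment, or raise a clear error."""
    token = environ.get("REDIVIS_API_TOKEN", "").strip()
    if not token:
        raise RuntimeError(
            "REDIVIS_API_TOKEN is not set. Export it in your shell or secrets "
            "manager rather than writing the token into source code."
        )
    return token
```

Calling such a check at the top of a script surfaces a missing token before any request is attempted.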
REDIVIS_DEFAULT_WORKFLOW
If set, tables referenced via redivis.table() and unqualified table names in redivis.query() will be assumed to be within the default workflow. In Redivis notebooks, this environment variable is always set to the notebook's workflow.
Takes the form user_name.workflow_identifier.
REDIVIS_DEFAULT_DATASET
If set, tables referenced via redivis.table() and unqualified table names in redivis.query() will be assumed to be within the default dataset.
Takes the form owner_name.dataset_identifier. If both a default dataset and a default workflow are set, the default workflow supersedes the dataset.
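The precedence rule can be illustrated with a small sketch. This is not the library's implementation; resolve_default_scope is a hypothetical helper that mirrors the documented behavior, where a default workflow wins over a default dataset and both values take the form owner_name.identifier.

```python
import os


def resolve_default_scope(environ=os.environ):
    """Return (kind, owner, identifier) for the default scope, or None if unset."""
    # The workflow is checked first because it supersedes the dataset.
    for kind, key in (
        ("workflow", "REDIVIS_DEFAULT_WORKFLOW"),
        ("dataset", "REDIVIS_DEFAULT_DATASET"),
    ):
        value = environ.get(key)
        if value:
            # Both variables take the form owner_name.identifier.
            owner, separator, identifier = value.partition(".")
            if not (owner and separator and identifier):
                raise ValueError(
                    f"{key} should look like owner_name.identifier, got {value!r}"
                )
            return kind, owner, identifier
    return None
```

With only REDIVIS_DEFAULT_DATASET set, the helper reports a dataset scope; once REDIVIS_DEFAULT_WORKFLOW is also set, the workflow is returned instead.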
REDIVIS_TMPDIR
If set, this directory will be used to temporarily store data for disk-backed data objects (e.g., see Table.to_arrow_dataset). Otherwise, the default OS temp directory will be used.
Pandas datatype conversions
When reading data from a table, query, or upload, you have the option to return the results as a pandas DataFrame. By default, this DataFrame uses the pyarrow dtype backend, whose types map directly to how the data is stored in Redivis tables. Pyarrow is recommended when possible, as it avoids many of the challenges with nullable data inherent in the numpy datatypes (and can easily be converted as needed).
However, to mimic historic pandas behavior, you can instead specify dtype_backend="numpy" or dtype_backend="numpy_nullable". The latter uses the experimental nullable datatypes in pandas, which allow for nullable booleans and integers.
When loading a Redivis table into pandas using these dtypes, the following conversions will apply:
Both backends
Redivis type | pandas dtype
Float | float64
DateTime | pd.Timestamp (np.datetime64[ns])
Date | pd.Timestamp (np.datetime64[ns])
Time | object (with datetime.time objects)
Geography | str if a pandas.DataFrame; geopandas.GeoSeries if a geopandas.GeoDataFrame

dtype_backend="numpy"
Redivis type | pandas dtype
Boolean | bool
Boolean with nulls | object (with values True, False, None)
Integer | int64
Integer with nulls | float64
String | str

dtype_backend="numpy_nullable"
Redivis type | pandas dtype
Boolean | pd.BooleanDtype()
Integer | pd.Int64Dtype()
String | pd.StringDtype()
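The difference between the two numpy-style backends for nullable columns can be reproduced with pandas alone. The sketch below does not touch Redivis; it assumes pandas is installed and builds small Series by hand to show the dtypes listed above for integers and booleans that contain nulls.

```python
import pandas as pd

# Plain NumPy-backed inference: a missing value forces integers to float64 ...
numpy_ints = pd.Series([1, None, 3])
# ... and booleans into a generic object column holding True, None, and False.
numpy_bools = pd.Series([True, None, False])

# The nullable extension dtypes keep the logical type and store pd.NA instead.
nullable_ints = pd.Series([1, None, 3], dtype="Int64")
nullable_bools = pd.Series([True, None, False], dtype="boolean")

for name, series in [
    ("numpy integer with nulls", numpy_ints),
    ("numpy boolean with nulls", numpy_bools),
    ("nullable integer", nullable_ints),
    ("nullable boolean", nullable_bools),
]:
    print(f"{name}: {series.dtype}")
```

The silent promotion of integers to float64 is the main reason to prefer the nullable or pyarrow backends when a column may contain missing values.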
For Date variables, if you prefer to work with datetime.date objects rather than NumPy's datetime64 dtype, provide the argument date_as_object=True.
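Putting the backend and date options together, a read might look like the sketch below. It assumes the redivis package is installed and authenticated, and that the Table class offers a to_pandas_dataframe() method accepting dtype_backend and date_as_object; the method name is an assumption not confirmed on this page, and read_table_as_pandas is a hypothetical wrapper.

```python
def read_table_as_pandas(table_name, dtype_backend="pyarrow", date_as_object=False):
    """Load a Redivis table into pandas with an explicit dtype backend."""
    # Imported lazily so this sketch can be loaded without the package installed.
    import redivis

    table = redivis.table(table_name)
    # Assumption: to_pandas_dataframe accepts these two keyword arguments.
    return table.to_pandas_dataframe(
        dtype_backend=dtype_backend,
        date_as_object=date_as_object,
    )


# Hypothetical usage, with a placeholder table name:
# df = read_table_as_pandas("my_table", dtype_backend="numpy_nullable", date_as_object=True)
```

Keeping the pyarrow default in the wrapper mirrors the recommendation above, so the numpy backends remain an explicit opt-in.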