Referencing resources

Overview

In many parts of this API, we will be referencing table resources on Redivis, as well as related dataset and project resources. In the user interface, we generally interact with tables via their assigned names. However, these names can pose a challenge when using the API because:

  1. Dataset, project, and table names can be modified at any time

  2. Names can contain a wide array of characters that could cause issues (e.g., when referencing tables within SQL queries).

At the same time, human-readable names allow for quicker prototyping and reduce the risk of error. As such, Redivis provides a flexible mechanism to programmatically reference these resources, with the goal of allowing for intuitive and expressive code.

All table resources contain a qualifiedReference field that can be used for subsequent calls within this API or in a SQL query. Additionally, you can find this information by clicking the "API Information" link on a dataset page, or the "Download" button on tables within a project.

On a dataset's page, view API information by clicking the link in the right panel.
From within a project, click the "Download" button on any table to view API information.

General structure

All tables on Redivis belong to either a dataset or a project. All datasets belong to either a user or an organization, and all projects belong to a user.

A table reference reflects this hierarchy, taking the following form:

ownerName.projectIdentifier|datasetIdentifier.tableIdentifier

The ownerName will be the shortName of the user or organization that owns the dataset or project:

ownerName = userShortName|organizationShortName

The project identifier consists of the (escaped) project name and/or the project reference id, prefaced by a colon:

projectIdentifier = {projectName:projectReferenceId}

The dataset identifier consists of the (escaped) dataset name and/or the dataset reference id, prefaced by a colon. Additionally, the dataset identifier may contain a sample flag (:sample) as well as a version identifier. The version identifier selects a particular version of the dataset (of the form :v1_0), the current version (:current), or the next, unreleased version (:next).

datasetIdentifier = datasetName:datasetReferenceId[:sample][versionIdentifier]
versionIdentifier = :v1_0|:current|:next

Finally, the table identifier consists of the (escaped) table name and/or the table reference id, prefaced by a colon.

tableIdentifier = tableName:tableReferenceId
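The structure above can be sketched as a small string-building helper. This is only an illustration: build_table_reference and its parameter names are hypothetical and not part of any Redivis client library. It assumes the names passed in are already escaped.

```python
from typing import Optional, Union


def build_table_reference(
    owner: str,
    container: str,
    table: str,
    container_id: Optional[Union[int, str]] = None,
    table_id: Optional[Union[int, str]] = None,
    version: Optional[str] = None,
    sample: bool = False,
) -> str:
    """Assemble owner.container.table, where container is a dataset or project.

    Hypothetical helper: names are assumed to be escaped already.
    version and sample only make sense for datasets.
    """
    container_part = container
    if container_id is not None:
        container_part += f":{container_id}"  # reference id, prefaced by a colon
    if version is not None:
        container_part += f":{version}"  # e.g. "v1_0", "current", or "next"
    if sample:
        container_part += ":sample"

    table_part = table
    if table_id is not None:
        table_part += f":{table_id}"

    return f"{owner}.{container_part}.{table_part}"


reference = build_table_reference(
    "stanfordphs", "epa_air_quality", "ozone_o3",
    container_id=155, table_id=1, version="v1_0", sample=True,
)
print(reference)
```

Keeping each optional part behind its own argument makes it easy to start with names only while prototyping and add reference ids later for stability.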

Escaping names

The owner's short name (userShortName or organizationShortName) never needs to be escaped, as these names can only contain word characters ([A-Za-z0-9_]). Note that these names are case-insensitive.

Dataset, project, and table names can contain a wide array of characters. To facilitate programmatic references, these names can be escaped with the following rules:

  1. Every character in a name or version tag that is neither alphanumeric nor an underscore is replaced by an underscore (_) character.

  2. Multiple underscore characters are collapsed into one.

  3. Leading and trailing underscores are removed.

  4. All names are case-insensitive.

For example:

  • Census dataset: 1940-1980 -> census_dataset_1940_1980

  • ~~Leading and trailing characters. -> leading_and_trailing_characters
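The four rules can be sketched in a few lines of Python. The function name escape_name is hypothetical, and two details are assumptions rather than documented behavior: "alphanumeric" is treated as ASCII letters and digits, and case-insensitivity is handled by lowercasing, which matches the examples above.

```python
import re


def escape_name(name: str) -> str:
    """Escape a dataset, project, or table name (illustrative sketch)."""
    # Rule 1: replace anything that is not an ASCII letter, digit, or underscore.
    escaped = re.sub(r"[^A-Za-z0-9_]", "_", name)
    # Rule 2: collapse runs of underscores into a single underscore.
    escaped = re.sub(r"_+", "_", escaped)
    # Rule 3: remove leading and trailing underscores.
    escaped = escaped.strip("_")
    # Rule 4: names are case-insensitive; lowercase as a canonical form (assumption).
    return escaped.lower()


print(escape_name("Census dataset: 1940-1980"))
print(escape_name("~~Leading and trailing characters."))
```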

Uniqueness is enforced for all escaped names within the relevant scope. For example, all tables in a project and all datasets in an organization will have a unique escaped name.

If a name contains colons (:), periods (.), or backticks (`), they must be escaped.

API Examples

We can list all datasets owned by a user or organization via the datasets/list endpoint:

GET /api/v1/users/imathews/datasets
GET /api/v1/organizations/stanfordphs/datasets

We can then get a specific dataset by referencing its name:

GET /api/v1/datasets/stanfordphs.epa_air_quality
# to ensure that this is a permanent reference (even if the name changes)
# we can add a reference id
GET /api/v1/datasets/stanfordphs.epa_air_quality:155

We can get the Ozone (O3) table on this dataset at:

GET /api/v1/tables/stanfordphs.epa_air_quality:155.ozone_o3
# to ensure that this is a permanent reference (even if the table name changes)
# we can add a reference id
GET /api/v1/tables/stanfordphs.epa_air_quality:155.ozone_o3:1

By default this gets the latest version. If we want to pin it to a specific version, we can add a version tag:

GET /api/v1/tables/stanfordphs.epa_air_quality:v1_0:155.ozone_o3:1
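As a rough sketch, the endpoint paths above can be built from a reference string as follows. The host in BASE_URL is an assumption, resource_url is a hypothetical helper, and authentication is omitted entirely; only the /api/v1/datasets/ and /api/v1/tables/ paths come from the examples above.

```python
from urllib.parse import quote

# Assumed host; only the /api/v1/... paths appear in this documentation.
BASE_URL = "https://redivis.com"


def resource_url(kind: str, reference: str) -> str:
    """Build the URL for a dataset or table reference (hypothetical helper)."""
    if kind not in ("datasets", "tables"):
        raise ValueError(f"unsupported resource kind: {kind}")
    # Escaped names only contain word characters, so just the ':' separators
    # need to be kept as-is; '.' and '_' are never percent-encoded by quote().
    return f"{BASE_URL}/api/v1/{kind}/{quote(reference, safe=':')}"


dataset_url = resource_url("datasets", "stanfordphs.epa_air_quality:155")
table_url = resource_url("tables", "stanfordphs.epa_air_quality:v1_0:155.ozone_o3:1")
print(dataset_url)
print(table_url)
```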

SQL Examples

We can reference the Ozone (O3) table from the EPA Air Quality dataset as:

SELECT state FROM `stanfordphs.epa_air_quality.ozone_o3`

By default this uses the latest version of the dataset. If we want to work with version 1.0:

SELECT state FROM `stanfordphs.epa_air_quality:v1_0.ozone_o3`

If we want to work with the 1% sample:

# The order of the suffixes does not matter
SELECT state FROM `stanfordphs.epa_air_quality:v1_0:sample.ozone_o3`

Finally, we can provide ids to prevent our reference from breaking if a dataset or table is renamed:

SELECT state
FROM `stanfordphs.epa_air_quality:155:v1_0:sample.ozone_o3:1`
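When queries are generated programmatically, the reference can be wrapped in backticks before it is placed in the FROM clause. The following is a minimal sketch; quote_sql_reference is a hypothetical helper, and it rejects any reference that still contains a backtick, since such a name has not been escaped.

```python
def quote_sql_reference(reference: str) -> str:
    """Wrap a table reference in backticks for use in a SQL query (sketch)."""
    if "`" in reference:
        # An escaped name never contains a backtick, so this is unescaped input.
        raise ValueError("reference contains an unescaped backtick")
    return f"`{reference}`"


table = quote_sql_reference("stanfordphs.epa_air_quality:155:v1_0:sample.ozone_o3:1")
query = f"SELECT state FROM {table}"
print(query)
```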

Referencing tables in a project is quite similar, though projects don’t have versions or samples:

SELECT mean_drg_cost
FROM ianmathews91.medicare_public_example.high_cost_in_providers_in_CA_outpu