Table

Work with tables on Redivis

class Table

Tables are the fundamental data-containing entity in Redivis. Every table belongs to either a dataset or a workflow, and is made up of rows and variables (columns). Various methods allow you to read table data, as well as to create, update, and delete tables belonging to an unreleased version of a dataset.

Certain tables are file index tables, which represent a collection of non-tabular files, with each row corresponding to one file. File index tables provide additional methods for working with these files.

Constructors

redivis.table(name)

Return a Table within the current default scope (either a dataset or a workflow). In a Redivis notebook, the default scope is always the notebook's workflow. If no default scope is specified, the table reference passed as name must be fully qualified (see below).

The table reference is a string that uniquely identifies the table. In some cases this may simply be the table name, though in others you'll want to include additional information to identify the table and to ensure reproducibility if the table name changes. Consult the referencing resources documentation to learn more.

Within a Redivis notebook, you can pass "_source_" as the table reference to refer automatically to the notebook's source table.

Dataset.table(name)

Return a Table within a specific dataset. The table reference passed as name does not need to be fully qualified, since the lookup is already scoped to the dataset. Consult the referencing resources documentation to learn more.

Workflow.table(name)

Return a Table within a specific workflow. The table reference passed as name does not need to be fully qualified, since the lookup is already scoped to the workflow.

Dataset.list_tables()

Return a list of Tables within a dataset.

Workflow.list_tables()

Return a list of Tables within a workflow.

Examples

import redivis

dataset = redivis.organization("Demo").dataset("iris_species")
table = dataset.table("Iris")

table.exists() # -> True
table.get() # table.properties is now populated with the table resource definition

table.variable("SepalLengthCm") # -> Returns a variable reference
table.to_pandas_dataframe()     # -> Returns a pandas dataframe for the table

dataset = redivis.organization("Demo").dataset("iris_species")
table = dataset.table("Iris")

table.to_pandas_dataframe()
#    Id  SepalLengthCm  SepalWidthCm  PetalLengthCm  PetalWidthCm      Species
# 0  33            5.2           4.1            1.5           0.1  Iris-setosa
# ...

# Other methods to read data:
# table.to_arrow_batch_iterator()
# table.to_arrow_dataset()
# table.to_arrow_table()
# table.to_geopandas_dataframe()
# table.to_dask_dataframe()
# table.to_polars_lazyframe()

dataset = redivis.user("user_or_organization_name").dataset("my dataset")

# Tables can only be created on an unreleased version. 
# If necessary, create a new version:
# dataset = dataset.create_next_version()

# Keep a reference to the new table so it can be used for the upload below
table = dataset.table("my_new_table")
table.create(description="some description")

# Learn more about uploading data in the Upload documentation
upload = table.upload('data.csv').create('/path/to/file')

dataset = redivis.organization("Demo").dataset("iris_species")
table = dataset.table("Iris")

variables = table.list_variables()

for variable in variables:
    print(variable.properties)

Attributes

dataset

A reference to the Dataset instance that constructed this table. Will be None if the table belongs to a workflow.

workflow

A reference to the Workflow instance that constructed this table. Will be None if the table belongs to a dataset.

properties

A dict containing the API resource representation of the table. This will only be populated after certain methods are called, particularly the get method, and will otherwise be None.
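
Because properties stays None until a method such as get has run, code that reads it usually guards against the unpopulated case. A minimal sketch of such a guard follows; load_properties is a hypothetical helper, not part of the redivis client, and it relies only on the properties attribute and get method described on this page.

```python
def load_properties(table):
    """Return table.properties, fetching metadata first if needed.

    Hypothetical helper (not part of the redivis client). It assumes only
    what this page states: `properties` is None until populated, and
    `get()` populates it.
    """
    if table.properties is None:
        table.get()
    return table.properties


# Intended usage, assuming a table reference like the one in the examples above:
# table = redivis.organization("Demo").dataset("iris_species").table("Iris")
# print(load_properties(table))
```

Calling the helper a second time does not trigger another fetch, since properties is already populated.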

qualified_reference

The fully qualified reference to this table, which can be used, for example, in a SQL query:

demo.reddit:prpw:v1_0.posts:7q4m

scoped_reference

The canonical reference for the table, without any qualifiers. E.g., posts:7q4m
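
Judging only from the example above, a qualified reference appears to consist of three dot-separated parts (owner, dataset, table), with the last part matching the scoped reference. The sketch below splits a reference on that assumption. split_qualified_reference is a hypothetical helper, not a client function, and the three-part layout is inferred from a single example rather than from a specification, so references with any other shape are rejected rather than guessed at.

```python
def split_qualified_reference(reference):
    """Split "owner.dataset.table" into its parts.

    Assumption: exactly three dot-separated parts, as in
    "demo.reddit:prpw:v1_0.posts:7q4m". Not an official parser.
    """
    parts = reference.split(".")
    if len(parts) != 3:
        raise ValueError(
            f"expected 3 dot-separated parts, got {len(parts)}: {reference!r}"
        )
    owner, dataset, table = parts
    return {"owner": owner, "dataset": dataset, "table": table}


parts = split_qualified_reference("demo.reddit:prpw:v1_0.posts:7q4m")
print(parts["table"])  # posts:7q4m
```

In practice, prefer reading qualified_reference and scoped_reference from the Table instance over parsing strings by hand.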

Methods

Reading data and metadata

Table.download([path, *, format, ...])

Export a table in a particular format and download it to disk.

Table.download_files([path, *, overwrite, ...])

Download all files represented in a file index table to a local directory.

Table.exists()

Check whether the table exists.

Table.get()

Fetch table metadata. Once called, the properties attribute on the table will be fully populated.

Table.list_files([max_results, *, ...])

Return a list of File instances in a file index table.

Table.list_rows([max_results, *, variables, ...])

Deprecated. Return a list of named tuples referencing the rows of the table. Use Table.to_arrow_table().to_pydict() instead.
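
When migrating away from list_rows, note that pyarrow's Table.to_pydict() returns data by column (a dict mapping each column name to a list of values), not by row. If row-shaped records are needed, the columns can be zipped back together. A small sketch, where rows_from_columns is a hypothetical helper and the input dict, filled with illustrative values, stands in for the result of Table.to_arrow_table().to_pydict():

```python
def rows_from_columns(columns):
    """Turn a column-oriented dict ({name: [values, ...]}) into a list of row dicts."""
    names = list(columns)
    return [dict(zip(names, values)) for values in zip(*columns.values())]


# Illustrative values only; in real use this would be
# table.to_arrow_table().to_pydict()
columns = {"Id": [1, 2], "Species": ["a", "b"]}
rows = rows_from_columns(columns)
print(rows[0]["Species"])  # a
```

pyarrow.Table also provides to_pylist(), which returns a list of row dicts directly and may be simpler when the whole table fits in memory.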

Table.list_variables([max_results])

Return a list of Variable instances associated with this table.

Table.to_arrow_batch_iterator([...])

Return an iterator that yields pyarrow.RecordBatches, for processing the table's data in a memory-efficient streaming manner.
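
To illustrate the streaming pattern, the sketch below computes a column mean while holding only one batch at a time. streaming_mean is a hypothetical helper, not part of the client; it assumes each yielded batch offers to_pydict(), as pyarrow.RecordBatch does, and the column name in the usage comment is taken from the Iris example on this page.

```python
def streaming_mean(batches, column):
    """Mean of one numeric column across an iterable of record batches.

    Hypothetical helper. Each batch is expected to provide to_pydict(),
    returning {column_name: [values, ...]}. Nulls (None) are skipped.
    Returns None if there are no non-null values.
    """
    total = 0.0
    count = 0
    for batch in batches:
        for value in batch.to_pydict()[column]:
            if value is not None:
                total += value
                count += 1
    return total / count if count else None


# Intended usage, assuming the Iris table from the examples above:
# mean = streaming_mean(table.to_arrow_batch_iterator(), "SepalLengthCm")
```

Only running totals are kept between batches, so memory use does not grow with the number of rows in the table.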

Table.to_arrow_dataset([max_results, ...])

Return a pyarrow.dataset.Dataset for the table. Data is backed by disk, allowing for larger-than-memory analysis.

Table.to_arrow_table([max_results, ...])

Return a pyarrow.Table with the table's data.

Table.to_dataframe([max_results, ...])

Deprecated. Use Table.to_pandas_dataframe() or Table.to_geopandas_dataframe() instead.

Table.to_geopandas_dataframe([...])

Return a geopandas.GeoDataFrame with the table's data. Intended for tables that contain a geography variable.

Table.to_dask_dataframe([max_results, ...])

Return a dask.dataframe.DataFrame. Data is backed by disk, allowing for larger-than-memory analysis.

Table.to_pandas_dataframe([max_results, ...])

Return a pandas.DataFrame with the table's data.

Table.to_polars_lazyframe([max_results])

Return a polars.LazyFrame. Data is backed by disk, allowing for larger-than-memory analysis.

Table.variable(name)

Reference a Variable within the table.

Uploading and modifying data

Table.add_files(*, [files, directory])

Upload non-tabular files to an unreleased file index table.

Table.create([description, ...])

Create a table within a dataset if it doesn't already exist. The table must belong to an unreleased version of the dataset.

Table.delete()

Delete a table belonging to an unreleased version of a dataset.

Table.list_uploads([max_results])

Return a list of uploads on a table.

Table.update()

Update properties on the table (name, description).

Table.upload()

Create a reference to an Upload on the table, which can subsequently be used to upload tabular data.

