Insert rows into the upload. Can only be called on unreleased uploads of type "stream". Should be called at most once per second per upload; for better performance, batch multiple rows into a single request, up to a limit of 10MB per request.
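For example, a minimal batching sketch (the insert_in_batches helper, the all_rows name, and the batch size are illustrative, not part of the API; upload is assumed to be an existing stream upload):
import time

def insert_in_batches(upload, all_rows, batch_size=500):
    # Choose a batch size that keeps each request under the 10MB limit
    for i in range(0, len(all_rows), batch_size):
        upload.insert_rows(all_rows[i : i + batch_size])
        time.sleep(1)  # stay under the once-per-second guidance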
Parameters:
rows : list<dict<varname, val>>
The rows to insert. A list of dicts, with each dict representing a single row, where the keys are the variable names, and the values are the value for that variable in that row. E.g.,
[{ "var1": 1, "var2": "foo"}, { "var1": None, "var2": "bar" }]
update_schema: bool, default False
Whether to automatically update the schema as new rows come in, relaxing variable types and adding new variables as needed. If False, an error will be thrown if any of the rows in the insert request would cause a schema update. Note that passing update_schema=True incurs a significant performance overhead.
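As a rough sketch of what relaxation means (assuming an existing stream upload whose schema declares var2 as an integer, as in the example below; var4 is a hypothetical new variable):
upload.insert_rows(
    # "not-a-number" is incompatible with var2's integer type, and var4 is not in
    # the schema; with update_schema=False this request would throw, but with True
    # var2's type would be relaxed to accommodate the string, and var4 would be
    # added to the schema.
    [{ "var1": "hello", "var2": "not-a-number", "var4": 1 }],
    update_schema=True,
)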
Returns:
dict
Examples
import redivis
dataset = redivis.user("user_name").dataset("dataset_name", version="next")
table = dataset.table("table_name")
# schema is optional if update_schema is set to True on the insert_rows request
schema = [
    { "name": "var1", "type": "string" },
    { "name": "var2", "type": "integer" },
    { "name": "var3", "type": "dateTime" }
]
rows = [
    { "var1": "hello", "var2": 1, "var3": None },
    # dateTime must be in the format YYYY-MM-DD[ |T]HH:MM:SS[.ssssss]
    { "var1": "world", "var2": 2, "var3": "2020-01-01T00:00:00.123" }
]
# Reference each upload with its name, which must be unique amongst other uploads
# for the current version of this table.
upload = table.upload(name="some_streamed_data")
# Only call create if the upload doesn't already exist
upload.create(
    type="stream",
    # schema is optional if update_schema is set to True on insert_rows
    schema=schema,
    # If True, will only create the upload if an upload with this name doesn't already exist.
    # Otherwise, a counter will be added to the name to preserve name uniqueness.
    if_not_exists=False,
    # If skip_bad_records is True, ignore records that are incompatible with the existing schema.
    # This has no effect when update_schema is set to True on the insert_rows request.
    skip_bad_records=False  # Optional, default is False
)
insert_response = upload.insert_rows(
    rows,
    # If update_schema is set to True, variables can be added by subsequent streams,
    # and variable types will be relaxed if new values are incompatible with the previous type.
    # If False, an error will be thrown if a row would cause a schema update,
    # unless skip_bad_records is set to True on the upload (in which case such rows will be ignored).
    update_schema=False,
)
# See REST API / uploads / insertRows
print(insert_response)
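As a side note on the dateTime format mentioned above, Python's standard datetime.isoformat() produces a compatible string; a small sketch reusing the upload from this example:
from datetime import datetime

# isoformat() yields YYYY-MM-DDTHH:MM:SS[.ffffff], which matches the accepted format
row = { "var1": "now", "var2": 3, "var3": datetime(2020, 1, 1, 12, 30).isoformat() }
upload.insert_rows([row])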