Upload$create

Upload$create(content = NULL, ...args) → Upload

Creates a new upload on a table and sends the provided data. The table must belong to an unreleased version; otherwise, the function will throw an error. After calling create, the properties attribute on the Upload will be fully populated with the contents of the upload.get resource definition.

Parameters:

content : char | connection | raw | data.frame | arrow.Table | arrow.Dataset | sf.tibble The content of the upload. If a character vector, it is assumed to be the local path to the file. If a connection, it can be a connection to a file, URL, or other stream. If a raw vector, it is assumed to be the contents of the file. Other formats are converted to parquet and then uploaded. This argument is required, unless either 1) the upload is of type="stream", in which case content must be omitted and rows are sent later via Upload$insert_rows() (see the streaming sketch in the Examples below); or 2) a transfer_specification is provided.

type : str The type of file being uploaded. A list of valid types can be found in the upload.post API documentation. If uploading data from an in-memory object (e.g., a data.frame or arrow Table), this parameter will be ignored, as Redivis internally converts the data to parquet and sets the type accordingly. Otherwise, if no type is provided, the type will be inferred from any file extension in the upload's name, or an error will be thrown if the extension isn't recognized.

transfer_specification : list<sourceType, sourcePath, identity> Used for transferring files from an external source, such as S3, a URL, or another table on Redivis. The values provided should match the specification for transferSpecification in the upload.post payload. A hedged sketch appears at the end of the Examples below.

metadata : list<name=list<label, description, valueLabels>> Provide optional metadata for the variables in the file. This parameter is a named list mapping variable names to the metadata for that variable, which is itself a named list containing any of label=str, description=str, and valueLabels=list(value=str, label=str). Variable names are matched case-insensitively, and extraneous metadata entries are ignored. See the metadata sketch in the Examples below.

if_not_exists : bool, default FALSE Only create the upload if an upload with this name doesn't already exist; otherwise, return the existing upload. If set to FALSE and an upload with the given name already exists (for the current version of the table), an error will be thrown unless rename_on_conflict=TRUE or replace_on_conflict=TRUE.

rename_on_conflict : bool, default FALSE By default, creating an upload with the same name as one that already exists for the particular table + version will raise an error. If set to TRUE, a new upload will be created, with a counter appended to its name to ensure name uniqueness across all uploads on the current version of the table. This option is ignored if if_not_exists=TRUE. Only one of rename_on_conflict and replace_on_conflict may be TRUE.

replace_on_conflict : bool, default FALSE By default, creating an upload with the same name as one that already exists for the particular table + version will raise an error. If set to TRUE, the previous upload with the same name will be deleted, and then this upload will be created. This option is ignored if if_not_exists=TRUE. Only one of rename_on_conflict and replace_on_conflict may be TRUE.

skip_bad_records : bool, default FALSE Whether to ignore invalid or unparsable records. If FALSE, the upload will fail if it encounters any bad records. If TRUE, the badRecordsCount attribute will be set in the upload properties.

allow_jagged_rows : bool, default FALSE Whether to allow rows that have more or fewer columns than the header row. Use caution when setting this to TRUE, as jagged rows often suggest a parsing issue; ignoring those errors could lead to data corruption.

remove_on_fail : bool, default FALSE If TRUE, the upload will automatically be deleted if the import fails.

wait_for_finish : bool, default TRUE If TRUE, wait for the upload to be fully imported before returning. If FALSE, return as soon as the data has been transferred to Redivis, but before it has been fully validated and processed. When FALSE, remove_on_fail is ignored.

raise_on_fail : bool, default FALSE Whether to raise an exception if the upload fails.

progress : bool, default TRUE Whether to show a progress bar.

delimiter : str Only relevant for the delimited type; the character used as the delimiter in the data (most often a comma). If not specified, it will be automatically inferred by scanning the first 10,000 records (up to 100MB) of the file.

has_header_row : bool, default TRUE Only relevant for the delimited type; whether the first row of the data is a header containing variable names.

has_quoted_newlines : bool Only relevant for the delimited type. Set to TRUE if you know there are line breaks within any of the data values in the file, at the tradeoff of substantially reduced import performance. If not specified, it will be automatically inferred by scanning the first 10,000 records (up to 100MB) of the file.

quote_character : str Only relevant for the delimited type. The character used to escape fields that contain the delimiter (most often " for compliant delimited files). If set to NULL, Redivis will attempt to auto-infer the quote character by scanning the first 10,000 records (up to 100MB) of the file.

escape_character : str Only relevant for the delimited type. The character that precedes any occurrence of the quote character when it should be treated as its literal value, rather than as the start or end of a quoted sequence (typically the escape character matches the quote character, but it is sometimes a backslash \). If set to NULL, Redivis will attempt to auto-infer the escape character by scanning the first 10,000 records (up to 100MB) of the file.

schema : list<list<name, type>> Only relevant for uploads of type "stream". Defines an initial schema that will be validated on subsequent calls to Upload$insert_rows (see the streaming sketch in the Examples below). Takes the form: list(list(name="var_name", type="integer"), list(...), ...)

Returns:

Upload

Examples

dataset <- redivis$user("user_name")$dataset("dataset_name", version="next")
table <- dataset$table("table_name")

upload <- table$upload()$create(
    "./data.csv",           # Path to file, data.frame, raw vector, etc
    type="delimited",       # Inferred from file extension if not provided
    remove_on_fail=TRUE,    # Remove the upload if a failure occurs
    wait_for_finish=TRUE,   # Wait for the upload to finish processing
    raise_on_fail=TRUE      # Raise an error on failure
)
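
The sketch below illustrates passing variable metadata and delimited-file parsing options at create time. It is a hedged example: the file path, the variable names (id, score), and their labels are hypothetical, and the metadata structure follows the named-list format described under the metadata parameter above.

table <- redivis$user("user_name")$dataset("dataset_name", version="next")$table("table_name")

upload <- table$upload()$create(
    "./survey.csv",                 # Hypothetical delimited file
    type="delimited",
    delimiter=",",                  # Omit to let Redivis infer the delimiter
    quote_character='"',
    skip_bad_records=TRUE,          # Don't fail on unparsable records
    metadata=list(
        id=list(label="Respondent ID"),
        score=list(
            label="Survey score",
            description="Composite score from 0 to 100",
            valueLabels=list(value="-1", label="Missing")
        )
    )
)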

dataset <- redivis$user("user_name")$dataset("dataset_name", version="next")
table <- dataset$table("table_name")

dir_path <- "/path/to/data/directory"

for (filename in list.files(dir_path)){
    upload <- table$upload()$create(content=base::file.path(dir_path, filename))
}
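
Below is a minimal, hedged sketch of a streaming upload: no content is passed at create time, an initial schema is declared, and rows are appended afterwards with Upload$insert_rows. Passing a name to table$upload(), the variable names, and the exact row format accepted by insert_rows are all assumptions here; consult the Upload$insert_rows reference for the authoritative signature.

table <- redivis$user("user_name")$dataset("dataset_name", version="next")$table("table_name")

# content must be omitted when type="stream"
upload <- table$upload("streamed_records")$create(
    type="stream",
    schema=list(
        list(name="id", type="integer"),
        list(name="value", type="float")
    )
)

# Append rows in batches; a list of named lists is assumed here
upload$insert_rows(list(
    list(id=1, value=0.92),
    list(id=2, value=0.87)
))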

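Finally, a hedged sketch of creating an upload from an external source via transfer_specification; content is omitted in this case. The sourceType value and source URL are hypothetical; valid values are defined by the transferSpecification object in the upload.post payload.

table <- redivis$user("user_name")$dataset("dataset_name", version="next")$table("table_name")

upload <- table$upload("remote_data.csv")$create(
    transfer_specification=list(
        sourceType="url",                            # Assumed value; see the upload.post payload
        sourcePath="https://example.com/data.csv"    # Hypothetical source location
    )
)
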
See more uploading data examples ->