Table$to_arrow_batch_reader

Table$to_arrow_batch_reader(max_results=NULL, variables=NULL) → RecordBatchReader

Returns a reader that mimics the Arrow RecordBatchStreamReader, which can then be consumed to process batches of rows in a streaming fashion. By streaming data, you can avoid the need to load more than a small chunk of data into memory at a time. This approach is highly scalable, since it won't be limited by available memory or disk.

The batch reader exposes a $read_next_batch() method, which should be repeatedly called to fetch the next Arrow RecordBatch. If next() returns NULL, this signals that there is no more data left to read.

Parameters:

max_results : int, default NULL The max number of records to load into the dataframe. If not specified, the entire table will be loaded.

variables : list(str) | character vector The specific variables to return, e.g., variables = c("name", "date") . If not specified, all variables in the table will be returned.

Returns:

class RecordBatchReader()

Methods:

$read_next_batch()→ Arrow RecordBatch | NULL Read the next batch of records. If NULL is returned, indicates that there is nothing left to read.

$close()→ void Close the underlying connection and releases all resources.

Notes:

The returned instance largely mirrors the behavior of the Arrow RecordBatchStreamReader, though only the $read_next_batch() method is implemented.

Make sure to call on.exit(batch_reader$close()) to ensure resources are released if your application prematurely exits.

Examples

batch_reader = redivis$table("test_scores")$to_arrow_batch_reader()

# Make sure the batch_reader gets closed when we exit this function,
# including if an error occurs
on.exit(batch_reader$close())

count <- 0
total <- 0

# Streaming works especially well when performing various aggregations
while (!is.null(batch <- reader$read_next_batch())){
  # batch is an instance of an Arrow RecordBatch -> https://arrow.apache.org/docs/r/reference/record_batch.html
  scores <- batch$scores
  count <- count + length(scores)
  total <- sum(scores)
}

print(str_interp("The average of all test cores was ${total/count}"))

PreviousTable$list_variables NextTable$to_arrow_dataset

Last updated 6 months ago

Was this helpful?