bigrquery

An R package for interfacing with the Redivis BigQuery endpoints

Overview

The redivis.bigrquery package is a light fork of the awesome r-dbi/bigrquery package, allowing you to leverage its functionality to interface with tables stored on Redivis. All authentication is managed via your Redivis API credentials.

Please note that the only supported methods are those that involve querying tables. Interfaces involved in listing BigQuery resource, referencing BigQuery datasets, or any calls to create, modify, or delete BigQuery resources are not supported.

Installation

Install from GitHub:

# install.packages('devtools')
devtools::install_github("redivis/bigrquery")

Authentication

The REDIVIS_API_TOKEN environment variable must be set to your Redivis API token, and the token must have data.data scope (or, if referencing sample tables, data.sample).

REDIVIS_API_TOKEN=your_api_token Rscript [your_script.R]
# or, within R
Sys.setenv(REDIVIS_API_TOKEN = "your_api_token")

bigrquery

The bigrquery package makes it easy to work with data stored in Google BigQuery by allowing you to query BigQuery tables and retrieve metadata about your projects, datasets, tables, and jobs. The bigrquery package provides three levels of abstraction on top of BigQuery:

  • The low-level API provides thin wrappers over the underlying REST API. All the low-level functions start with bq_, and mostly have the form bq_noun_verb(). This level of abstraction is most appropriate if you’re familiar with the REST API and you want do something not supported in the higher-level APIs.

  • The DBI interface wraps the low-level API and makes working with BigQuery like working with any other database system. This is most convenient layer if you want to execute SQL queries in BigQuery or upload smaller amounts (i.e. <100 MB) of data.

  • The dplyr interface lets you treat BigQuery tables as if they are in-memory data frames. This is the most convenient layer if you don’t want to write SQL, but instead want dbplyr to write it for you.

Usage

Low-level API

library(redivis.bigrquery)
sql <- "SELECT
provider_id, provider_state, mean_drg_cost
FROM `ianmathews91.medicare_public_example.high_cost_ca_providers`
LIMIT 10"
tb <- bq_project_query(sql)
bq_table_download(tb, max_results = 10)

DBI

library(DBI)
con <- dbConnect(redivis.bigrquery::bigquery())
sql <- "SELECT
provider_id, provider_state, mean_drg_cost
FROM `ianmathews91.medicare_public_example.high_cost_ca_providers`
LIMIT 10"
dbGetQuery(con, sql, n = 10)

dplyr

library(DBI)
library(dplyr)
con <- dbConnect(redivis.bigrquery::bigquery())
providers <- tbl(con, "ianmathews91.medicare_public_example.high_cost_ca_providers")
providers %>%
select(provider_id, provider_state, mean_drg_cost) %>%
head(10) %>%
collect()

Referencing tables

All tables referenced in SQL query strings should follow Redivis entity reference rules.

Useful links