The Big R-Book. Philippe J. S. De Brouwer
4.8.3 Databases
Spreadsheets and CSV-file are good carriers for reasonably small datasets. Any company that holds a lot of data will use a database system – see for example Chapter 13 “RDBMS” on page 285. Importing data from a database system is somewhat different. The data is usually structured in “tables” (logical units of rectangular data); however, they will seldom contain all information that we need and usually have too much rows.We need also some protocol to communicate with the database: that is the role of the structured query language (SQL) – see Chapter 14 “SQL on page 291.
database – MySQL
R can connect to many popular database systems. For example, MySQL: as usual there is a package that will provide this functionality.
MySQL
if(!any(grepl(“xls”, installed.packages()))){ install.packages(“RMySQL”)} library(RMySQL)
RMySQL
Connecting to the Database
This is explained in more detail in Chapter 15 “Connecting R to an SQL Database” on page 327, but the code segment below will already get you the essential ingredients.
# The connection is stored in an R object myConnection and # it needs the database name (db_name), username and password myConnection = dbConnect(MySQL(), user = ‘root’, password = ‘xxx’, dbname = ‘db_name’, host = ‘localhost’) # e.g. list the tables available in this database. dbListTables(myConnection)
dbConnect()
dbSendQuery()
Fetching Data Drom a Database
Once connected to the database, we can select and download the data that we will use for our analysis in R.
# Prepare the query for the database result <- dbSendQuery(myConnection, “SELECT * from tbl_students WHERE age > 33”) # fetch() will get us the results, it takes a parameter n, which # is the number of desired records. # Fetch all the records(with n = -1) and store it in a data frame. data <- fetch(result, n = -1)
Update Queries
It is also possible to manipulate data in the database directly from R. This allows use to prepare data in the database first and then download it and do our analysis, while we keep all the code in one file.
fetch()
The dbSendQuery()
function can be used to send any query, including UPDATE, INSERT, CREATE TABLE and DROP TABLE queries so we can push results back to the database.
sSQL = “” sSQL[1] <- “UPDATE tbl_students SET score = ‘A’ WHERE raw_score > 90;” sSQL[2] <- “INSERT INTO tbl_students (name, class, score, raw_score) VALUES (‘Robert’, ‘Grade 0’, 88,NULL);” sSQL[3] <- “DROP TABLE IF EXISTS tbl_students;” for (k in c(1:3)){ dbSendQuery(myConnection, sSQL[k]) }
dbSendQuery()
Create Tables from R Data-frames
Finally, it is possible to write back data from a dataset in R to a table in a database.
dbWriteTable(myConnection, “tbl_name”, data_frame_name[, ], overwrite = TRUE)
dbWriteTable()
Even while connections will be automatically closed when the scope of the database object is lost, it is a good idea to close a connection explicitly. Closing a connection explicitly, makes sure that it is closed and does not remain open if something went wrong, and hence the resources of our RDBMS are freed. Closing a database connection can be done with dbDisconnect(myConnection, …)
.
dbDisconnect()
Notes
1 1 There are different package management systems for different flavours of Linux and discussing them all is not only beyond the scope of this book, but not really necessary.We assume that if you use Genttoo, that you will know what to do or where to choose.
2 a This behaviour is caused by the dispatcher-function implementation of an object-oriented programming model. To understand how this works and what it means, we refer to Section 6.2 “S3 Objects” on page 122.
3 2 The character type in R is what in most other languages would be called a “string.” In other words, that is text that – generally – will not allow much arithmetic.
4 a See Chapter 7 “Tidy R with the Tidyverse” on page 161 and note that tibbles (the data-frame alternative in the tidyverse) do not coerce text to factors.
5 3 If your programming experience is limited, this might seem a little confusing. It is best to accept this and read further. Then, after reading Section 5.1 “Environments in R” on page 110 have another look at this section.
6 4 A direct link to this file (in zip-format) is here: http://www.ecb.europa.eu/stats/eurofxref/eurofxref-hist.zip?c6f8f9a0a5f970e31538be5271051b3c.
7 5 With R it is also possible to read files directly from the Internet by supplying the URL in to the function read.csv().
♣5♣ Lexical Scoping and Environments
5.1 Environments in R
Environments can be thought of as a set of existing or declared objects (functions, variables, etc.). When we start R, it will create an environment before the command prompt is available to the user.
The top-level environment is the R command prompt. This is the “global environment” and known as R_GlobalEnv
and can be accessed as .GlobalEnv
.
global environment
R_GlobalEnv
As