The Big R-Book. Philippe J. S. De Brouwer

The Big R-Book - Philippe J. S. De Brouwer


Скачать книгу

      4.8.3 Databases

      Spreadsheets and CSV-file are good carriers for reasonably small datasets. Any company that holds a lot of data will use a database system – see for example Chapter 13RDBMS” on page 285. Importing data from a database system is somewhat different. The data is usually structured in “tables” (logical units of rectangular data); however, they will seldom contain all information that we need and usually have too much rows.We need also some protocol to communicate with the database: that is the role of the structured query language (SQL) – see Chapter 14SQL on page 291.

       database – MySQL

      R can connect to many popular database systems. For example, MySQL: as usual there is a package that will provide this functionality.

       MySQL

      if(!any(grepl(“xls”, installed.packages()))){ install.packages(“RMySQL”)} library(RMySQL)

       RMySQL

      Connecting to the Database

      This is explained in more detail in Chapter 15Connecting R to an SQL Database” on page 327, but the code segment below will already get you the essential ingredients.

      # The connection is stored in an R object myConnection and # it needs the database name (db_name), username and password myConnection = dbConnect(MySQL(), user = ‘root’, password = ‘xxx’, dbname = ‘db_name’, host = ‘localhost’) # e.g. list the tables available in this database. dbListTables(myConnection)

       dbConnect()

       dbSendQuery()

      Fetching Data Drom a Database

      Once connected to the database, we can select and download the data that we will use for our analysis in R.

      Update Queries

      It is also possible to manipulate data in the database directly from R. This allows use to prepare data in the database first and then download it and do our analysis, while we keep all the code in one file.

       fetch()

      The dbSendQuery() function can be used to send any query, including UPDATE, INSERT, CREATE TABLE and DROP TABLE queries so we can push results back to the database.

      sSQL = “” sSQL[1] <- “UPDATE tbl_students SET score = ‘A’ WHERE raw_score > 90;” sSQL[2] <- “INSERT INTO tbl_students (name, class, score, raw_score) VALUES (‘Robert’, ‘Grade 0’, 88,NULL);” sSQL[3] <- “DROP TABLE IF EXISTS tbl_students;” for (k in c(1:3)){ dbSendQuery(myConnection, sSQL[k]) }

       dbSendQuery()

      Create Tables from R Data-frames

      Finally, it is possible to write back data from a dataset in R to a table in a database.

      dbWriteTable(myConnection, “tbl_name”, data_frame_name[, ], overwrite = TRUE)

       dbWriteTable()

      image Warning – Closing the database connection

      Even while connections will be automatically closed when the scope of the database object is lost, it is a good idea to close a connection explicitly. Closing a connection explicitly, makes sure that it is closed and does not remain open if something went wrong, and hence the resources of our RDBMS are freed. Closing a database connection can be done with dbDisconnect(myConnection, …).

       dbDisconnect()

      1 1 There are different package management systems for different flavours of Linux and discussing them all is not only beyond the scope of this book, but not really necessary.We assume that if you use Genttoo, that you will know what to do or where to choose.

      2 a This behaviour is caused by the dispatcher-function implementation of an object-oriented programming model. To understand how this works and what it means, we refer to Section 6.2 “S3 Objects” on page 122.

      3 2 The character type in R is what in most other languages would be called a “string.” In other words, that is text that – generally – will not allow much arithmetic.

      4 a See Chapter 7 “Tidy R with the Tidyverse” on page 161 and note that tibbles (the data-frame alternative in the tidyverse) do not coerce text to factors.

      5 3 If your programming experience is limited, this might seem a little confusing. It is best to accept this and read further. Then, after reading Section 5.1 “Environments in R” on page 110 have another look at this section.

      6 4 A direct link to this file (in zip-format) is here: http://www.ecb.europa.eu/stats/eurofxref/eurofxref-hist.zip?c6f8f9a0a5f970e31538be5271051b3c.

      7 5 With R it is also possible to read files directly from the Internet by supplying the URL in to the function read.csv().

      5.1 Environments in R

      Environments can be thought of as a set of existing or declared objects (functions, variables, etc.). When we start R, it will create an environment before the command prompt is available to the user.

      The top-level environment is the R command prompt. This is the “global environment” and known as R_GlobalEnv and can be accessed as .GlobalEnv.

       global environment

       R_GlobalEnv

      As


Скачать книгу