Sports Analytics in Practice with R. Ted Kwartler
type are used. However, you can coerce either object type to the other using `as.matrix` or `as.data.frame` to switch. Just keep in mind the mixed data coercion mentioned previously.
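The coercion behavior described above can be sketched as follows. The `demoFrame` object and its column `text3` are hypothetical stand-ins created only for this illustration; they are not the objects built earlier in the chapter.

```r
# Hypothetical mixed-type data frame (an assumption for illustration)
demoFrame <- data.frame(
  number1  = c(1, 2, 3),
  logical2 = c(TRUE, FALSE, TRUE),
  text3    = c("a", "b", "c"),
  stringsAsFactors = FALSE
)

# A matrix holds a single data type, so mixed columns are coerced to character
demoMatrix <- as.matrix(demoFrame)
class(demoMatrix[1, 1])   # "character"

# Converting back returns a data frame, but the original types are not restored
backToFrame <- as.data.frame(demoMatrix, stringsAsFactors = FALSE)
sapply(backToFrame, class)

# A data frame with only numeric columns stays numeric as a matrix
numericOnly <- as.matrix(demoFrame["number1"])
is.numeric(numericOnly)   # TRUE
```

Note that the round trip is lossy for mixed data: once the values have become text inside the matrix, the columns must be converted back explicitly, for example with `as.numeric`.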
The last data type discussed in this book is a “list” object. There are other object types, including time series and arrays, but for the most part this book employs mixed data types with data frames and, sparingly, lists. If you are familiar with spreadsheets, think of a list as a “workbook” containing multiple “worksheets.” Each tab of a spreadsheet can contain different data, even single values, and different data types. A list is similar in that each list “element” can contain a single value, multiple values, matrices, data frames, or even more lists! The following code creates a list object with varying data types and lengths, while Figure 1.7 is a graphical representation of the list.
Figure 1.7 The representation of the list with varying objects.
xList <- list(xDataFrame, fanTweet, teamA, xVec)
In complex R objects, you can get specific sections of the data by name or through indexing. The previous list has four elements, denoted with double square brackets such as `[[2]]`. To access a specific list element, you can call the object `xList` along with its specific element index as shown below to select the fourth element, the vector of numbers.
xList[[4]]
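Because the objects placed in `xList` were created earlier in the chapter, the following self-contained sketch rebuilds a similar four-element list from hypothetical stand-ins (`demoFrame`, `demoTweet`, `demoTeam`, and `demoVec` are assumptions, not the book's objects). It also contrasts double brackets with single brackets on a list.

```r
# Hypothetical stand-ins for the four objects stored in the list
demoFrame <- data.frame(number1 = c(1, 2, 3), logical2 = c(TRUE, FALSE, TRUE))
demoTweet <- "Great game tonight!"
demoTeam  <- c("Player A", "Player B")
demoVec   <- c(10, 20, 30)

demoList <- list(demoFrame, demoTweet, demoTeam, demoVec)

length(demoList)       # 4
demoList[[4]]          # the numeric vector itself
class(demoList[[4]])   # "numeric"
class(demoList[4])     # "list": single brackets keep the list wrapper
```

Double brackets extract the element itself, which is usually what you want before indexing further into it.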
The same can be done with matrices or data frames using single brackets. Indexing row and column data requires two inputs separated by a comma. The selection for rows comes first, followed by the selection for columns. For example, let’s first call the `xDataFrame` object in its entirety to establish familiarity. Then select the first row and third column, which represents a single cell value of the data frame. Next, you can select a different row and column combination on your own within the console to confirm that a single value is returned.
xDataFrame
xDataFrame[1,3]
Indexing also works for entire columns or entire rows. This is done by leaving the rows position blank or the columns position blank on either side of the comma. To call the second column of the data frame simply use single brackets, nothing on the left of the comma and a 2 to the right of the comma as shown.
xDataFrame[, 2]
Similarly, you can switch the index number to the left of the comma to obtain a specific row. Here, the entire fourth row is returned while the column position is left blank.
xDataFrame[4, ]
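The three indexing patterns above can be tried on a small, self-contained example. The `demoFrame` object below is a hypothetical stand-in with made-up values; only the bracket syntax mirrors the text.

```r
# Hypothetical data frame with four rows and three columns
demoFrame <- data.frame(
  number1  = c(1, 2, 3, 4),
  logical2 = c(TRUE, FALSE, TRUE, FALSE),
  text3    = c("a", "b", "c", "d"),
  stringsAsFactors = FALSE
)

singleCell <- demoFrame[1, 3]   # row 1, column 3: "a"
secondCol  <- demoFrame[, 2]    # rows left blank: entire second column
fourthRow  <- demoFrame[4, ]    # columns left blank: entire fourth row

singleCell
secondCol
fourthRow
```

A single column comes back as a plain vector, while a single row comes back as a one-row data frame because its cells can hold different data types.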
Besides the ability to have multiple data types, another benefit of the data frame object is the ability to declare a column by its name using the `$` sign. For example, instead of an index position, the column name `$number1` will return the entire first column of the data frame object. The two methods, indexing or by name, are equivalent and can be used interchangeably as long as the column has a declared name.
xDataFrame$number1
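As a quick check of that equivalence, the sketch below compares the two approaches on a hypothetical data frame (`demoFrame` and its values are assumptions for illustration).

```r
# Hypothetical data frame whose first column is named number1
demoFrame <- data.frame(number1 = c(1, 2, 3), logical2 = c(TRUE, FALSE, TRUE))

byName  <- demoFrame$number1   # select the column by name
byIndex <- demoFrame[, 1]      # select the same column by position

identical(byName, byIndex)     # TRUE
names(demoFrame)               # lists the names available to use with $
```

Selecting by name tends to be easier to read and keeps working if the column order changes, whereas an index position silently points at a different column.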
In fact, indexing can become more complex. You can access a specific list element, then a specific row, column, or single value by utilizing double then single brackets or `$` as shown. First, the fourth element of the list is obtained with `[[4]]`; then the second value is obtained within that vector. Keep in mind there is no need for a comma because a vector does not have rows or columns. Instead, a vector merely has a position. In this case, “2” is returned.
# 4th element, vector 2nd position
xList[[4]][2]
Next, the first list element is accessed, and as a data frame, the single brackets with a comma refer to the second row.
# 1st element, 2nd row
xList[[1]][2,]
Similarly, the same data frame is indexed to return the first column because the “1” is to the right of the comma within the single square brackets.
# 1st element, 1st column
xList[[1]][,1]
Of course, you can also use both rows and column positions separated by the comma within the single brackets.
# 1st element, 2nd row, 1st column
xList[[1]][2,1]
Just to make things a bit more complex, if the list element is a data frame with named vectors, the second part of the code can employ the `$` along with the name. This will return the first list element, a data frame, and only the named column called “logical2.”
# 1st element, named column with $
xList[[1]]$logical2
Lastly, since the column of this list element is being accessed, it too can be indexed. Once again, the single column does not have a row and column pairing; it only has a position. Thus, no comma is needed and only the third position is returned in this example.
# 1st element, named column with $, third position
xList[[1]]$logical2[3]
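One way to read these chained expressions is from left to right, with each bracket or `$` acting on the result of the previous step. The sketch below uses a hypothetical two-element list (`demoList` and its values are assumptions) to show that the chained form matches the step-by-step form.

```r
# Hypothetical list: a data frame followed by a numeric vector
demoList <- list(
  data.frame(number1 = c(1, 2, 3), logical2 = c(TRUE, FALSE, TRUE)),
  c(10, 20, 30)
)

# Chained: list element, then named column, then position
chained <- demoList[[1]]$logical2[3]

# Step by step: the same selection with intermediate objects
firstElement <- demoList[[1]]            # the data frame
namedColumn  <- firstElement$logical2    # a logical vector
stepwise     <- namedColumn[3]           # third position

identical(chained, stepwise)   # TRUE
demoList[[2]][2]               # 20: second position of the vector
demoList[[1]][2, 1]            # 2: second row, first column
```

When a chained expression is hard to follow, breaking it into intermediate objects like this and printing each one in the console is a reasonable way to see what every step returns.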
If all this seems wildly complex, do not fret. Throughout the book, extensive explanation is given for functions, inputs, and indexing. Further, with enough practice, this becomes commonplace and more readily understood.
So far, this basic explanation of R functionality has relied on base-R functions and libraries that are part of the standard installation. As mentioned previously, R can be specialized to a particular task by loading libraries. In order to obtain libraries, the `install.packages` function must be run with a package name to download the specialized functions. This is done only once per library so that the library code is installed locally to your R installation. After the download occurs, you can merely call the `library` function with the name in order to enable the specialized functionality using the local installation. The code below installs a popular graphics library based on the “grammar of graphics,” known as `ggplot2`, using the `install.packages` function. After it is downloaded, the next line merely loads it as part of your R environment. This allows your R session to call functions within a “namespace” that includes base-R and now `ggplot2` functions. It serves the purpose of specializing R for improved visualizations.
install.packages('ggplot2')
library(ggplot2)
Throughout this book, multiple libraries are loaded. Novice R programmers can run into errors and frustrations regarding package installations. When executing scripts in this book that begin with `library(…)`, an error of “there is no package called …” means you first need to use `install.packages` to download the functionality to your library. Additionally, errors may occur during the `install.packages` step. This can be due to multiple reasons but most often stems from the fact that a package to be downloaded requires another package first. As a result, carefully read the console messages during the install phase to identify any other package prerequisites. If the `install.packages` function executes correctly, then it is not necessary to repeat that function for each script. Thus, the code in this book only calls `library` for each specific library, enabling the corresponding functionality needed for the task at hand. This assumes all libraries have been previously and successfully installed.
To specialize R, first install a package with `install.packages` and the corresponding name. If it installs without issue, simply call `library` any time your R session needs the specialized functionality of that library. You will only need to use `install.packages` once, but `library` will need to be called each time you start R and require the specialized functions of a particular library.
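If you are unsure whether a package has already been installed, one option is to check before calling `library`. The sketch below is not from the book; it uses the base-R function `requireNamespace`, which returns `TRUE` or `FALSE` without attaching the package, and only prints a suggestion rather than installing anything. The variable name `pkg` is an assumption for illustration.

```r
# Name of the package to check (ggplot2 is the example used in the text)
pkg <- "ggplot2"

# requireNamespace() reports whether the package is installed locally
isInstalled <- requireNamespace(pkg, quietly = TRUE)

if (isInstalled) {
  message(pkg, " is installed; library(", pkg, ") should load it.")
} else {
  message(pkg, " was not found; run install.packages('", pkg, "') once first.")
}
```

This keeps the one-time `install.packages` step separate from the per-session `library` call, in line with the workflow described above.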
In at least one instance in the book, a custom function is needed to make the code more concise. A custom function is like any other function loaded from a library. It is defined for an operation and requires an input and returns a value or object. The code below creates a simple custom function as an example. The function is declared as `plus3` with the `function` statement. Next, the input parameter is declared as `x`. This means the function will be called `plus3` and requires an input temporarily