The Big R-Book. Philippe J. S. De Brouwer
data, business, regulations and customers. For effectiveness in every step, one needs to pay attention to communication: permanent contact with all stakeholders and the environment is key.
Note
1 The term “singularity” refers to the point in time where an intelligent system would be able to produce an even more intelligent system that can in turn create another system that is a certain percentage smarter in a time that is a certain percentage shorter. This inevitably leads to an exponentially increasing creation of better systems. This time series converges to one point in time, where the “intelligence” of the machine would hit its absolute limits. The first record of the subject is by Stanislaw Ulam in a discussion with John von Neumann in the 1950s, and an early and convincing publication is Good (1966). It is also elaborately explored in Kurzweil (2010).
♣3♣ Conventions
This book is formatted with LaTeX. The people who know this markup language will have high expectations for the consistency and format of this book. As you can expect, there is
1 a table of contents at the start;
2 an index at the end, from page 1103;
3 a bibliography on page 1088;
4 as well as a list of all short-hands and symbols used on page 1117.
This is a book with a programming language as leitmotif and hence you might expect to find a lot of chunks of code. R is an interpreted language and it is usually accessed by opening the software R (simply type R on the command prompt and press enter).1
# This is code
1+pi
## [1] 4.141593
Sys.getenv(c("EDITOR","USER","SHELL", "LC_NUMERIC"))
##        EDITOR          USER         SHELL    LC_NUMERIC
##          "vi"        "root"   "/bin/bash" "pl_PL.UTF-8"
As you can see, the code is highlighted: not all things have the same colour, which makes it easier to read and understand what is going on. The first line is a “comment”, which means that R will not do anything with it; it is for human use only. The next line is a simple sum. In your R terminal, this is what you will type or copy after the > prompt. It will rather look like this:
> # This is code
> 1+pi
[1] 4.141593
> Sys.getenv(c("EDITOR","USER","SHELL","LC_NUMERIC"))
       EDITOR          USER         SHELL    LC_NUMERIC
         "vi"    "philippe"   "/bin/bash" "pl_PL.UTF-8"
>
In this book, there is nothing in front of a command and the reply of R is preceded by two pound signs: “##”.2 The pound sign (#) is also the symbol used by R to precede a comment, hence R will ignore such a line if it is fed into the command prompt. This allows you to copy and paste lines or whole chunks if you are working from an electronic version of the book. If the > sign preceded the command, then R would not understand it. If you accidentally copy the output from the book as well, nothing will happen, because the #-sign tells R to ignore the rest of the line (this is a comment for humans, not for the machine).
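To make this copy-and-paste behaviour concrete, the following is a minimal sketch that uses the base R function parse() to show what R keeps from a pasted chunk. The variable names (book_chunk, expressions, with_prompt) are our own, chosen for illustration only.

```r
# Lines as they appear in this book: a comment, a command and R's reply.
book_chunk <- c("# This is code",
                "1+pi",
                "## [1] 4.141593")

# parse() turns text into R expressions; comment lines yield no expression.
expressions <- parse(text = book_chunk)
length(expressions)                # only the sum is left to execute

# Evaluating the remaining expression gives the same result as typing it.
result <- eval(expressions[[1]])
result

# The same command with the console prompt in front is not valid R code,
# so parse() raises an error, which we catch and replace by a short label.
with_prompt <- tryCatch(parse(text = "> 1+pi"),
                        error = function(e) "syntax error")
with_prompt
```

The lines that start with a pound sign disappear during parsing, while a leading > stops R before anything is evaluated.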
The function Sys.getenv() returns all environment variables if no parameter is given. If it is supplied with a vector of variable names, then it will only return those.
In the example above, the function got four variables supplied, hence it only reports on these four. You will also notice that the variables are wrapped in a special function c(…). This is because the function Sys.getenv() expects one vector as argument and the function c() will create the vector out of a list of supplied arguments.
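The two ways of calling Sys.getenv() can be tried side by side. The sketch below is only an illustration: the variable names BIG_R_BOOK_DEMO and BIG_R_BOOK_MISSING are hypothetical and were invented for this example, so that the result does not depend on how your system is configured.

```r
# Define a demo variable for this session only (hypothetical name).
Sys.setenv(BIG_R_BOOK_DEMO = "hello")

# Without arguments, Sys.getenv() returns every environment variable.
all_vars <- Sys.getenv()
length(all_vars)                   # depends on your system

# c() combines its arguments into one character vector ...
wanted <- c("BIG_R_BOOK_DEMO", "BIG_R_BOOK_MISSING")
is.character(wanted)
length(wanted)

# ... and that single vector is the argument of Sys.getenv().
some_vars <- Sys.getenv(wanted)
names(some_vars)                   # the result is named after the variables
some_vars

# A variable that is not set is reported as an empty string by default;
# the argument unset changes that placeholder.
missing_var <- Sys.getenv("BIG_R_BOOK_MISSING", unset = NA)
missing_var
```

Note that a request for more than one variable returns a named vector, which is why the names are printed above the values in the output.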
Note that in the paragraph above, the name of the function Sys.getenv() is mono-spaced. That is our convention for using code within text. Even in the index, at the end of this book, we will follow that convention.
You will also have noticed that in text – such as this line – we refer to code fragments and functions using a fixed-width font, as for example in “the function mean() calculates the average.” When this part of the code needs emphasizing or is used as a word in the sentence, we might want to highlight it additionally as follows: mean(1 + pi).
Some other conventions also follow from this small piece of code. We will assume that you are using Linux (unless mentioned otherwise). But do not worry: that is not something that will stand in your way. In Chapter 4 “The Basics of R” on page 21, we will get you started in Windows and all other things will be pretty much the same. Also, while most books are United States centric, we want to be as inclusive as possible and not assume that you live in the United States and work with United States data.
As a rule, we take a country-agnostic stance and follow the ISO standards3 for dates and for the dimensions of other variables. For example, we will use meters instead of feet.
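As a small illustration of what this convention means in practice, the sketch below writes a date in the ISO 8601 form (year-month-day) and converts feet to meters. The date and the helper function feet_to_meters() are our own examples and are not used elsewhere in this book.

```r
# as.Date() reads the ISO 8601 form YYYY-MM-DD without further hints.
d <- as.Date("2019-03-15")         # an arbitrary example date
format(d, "%Y-%m-%d")

# Other notations need an explicit format before R can read them.
us_style <- as.Date("03/15/2019", format = "%m/%d/%Y")
d == us_style                      # both notations describe the same day

# One foot is exactly 0.3048 meters.
feet_to_meters <- function(feet) feet * 0.3048   # hypothetical helper
feet_to_meters(10)
```

The ISO form has the additional advantage that sorting the text representation also sorts the dates chronologically.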
Learning works best when you can do something with the knowledge that you are acquiring. Therefore, we will usually even show the code of a plot that is mainly there for illustrative purposes, so you can immediately try everything yourself.
When the code produces a plot (chart or graph), then the plot will generally appear at that point between the code lines. For example, suppose that we want to show the generator function for the normal distribution.
# generate 1000 random numbers from a normal distribution
# with mean 100 and standard deviation 2
x <- rnorm(1000, mean = 100, sd = 2)

# to illustrate previous, we show the histogram.
hist(x, col = "khaki3")
# This code follows the 'hist' command.
# In rare cases the plot will be on this page
# alone and this comment is on the previous page.
In most cases, the plot will be just after the code that generates it, even if the code continues after the plot(…) command. Therefore, the plot will usually sit exactly where the code creates it. However, in some rare cases, this will not be possible (it would create a page layout that would not be aesthetically appealing). The plot will then appear near the code chunk (maybe on the next page). To help you find and identify the plot in such a case, we will usually add a numbered caption to the plot.
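If you want to convince yourself that rnorm() draws from the normal distribution with the requested parameters, the following sketch compares the sample with the theoretical density. The seed, the plot title and the overlaid curve are our own additions for illustration; the exact numbers you see depend on the seed.

```r
set.seed(1890)                     # arbitrary seed, so the draw is repeatable
x <- rnorm(1000, mean = 100, sd = 2)

# The sample statistics should be close to the requested parameters.
sample_mean <- mean(x)
sample_sd   <- sd(x)
sample_mean
sample_sd

# Histogram on the density scale, so that a density curve can be added.
h <- hist(x, col = "khaki3", freq = FALSE, main = "Histogram of x")

# Overlay the theoretical density of N(100, 2); inside curve() the
# symbol x is the plotting variable, not our sample.
curve(dnorm(x, mean = 100, sd = 2), add = TRUE, lwd = 2)

# hist() also returns the bin counts; together they cover the sample.
total <- sum(h$counts)
total
```

When this is run in a script instead of an interactive session, R writes the plot to a file (by default Rplots.pdf) rather than opening a window.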
The R code is so ubiquitous and integrated in the text that it will appear just where it should be (though charts might move). The code chunks are an integral part of the text, and the comments that appear in them might not be repeated in the normal text later.
There is also some other code, from the command prompt and/or from SQL environments. That code appears much less often, so these fragments are numbered and appear as in Listings 3.1 and 3.2.
Listing 3.1: This is what you would see if you start R in the command line terminal. Note that the last sign is the R-prompt, inviting you to type commands. This code fragment is typical for how code that is not in the R-language has been typeset in this book.
$ R

R version 3.4.4 (2018-03-15) -- "Someone to Lean On"
Copyright (C) 2018 The R Foundation for Statistical Computing
Platform: x86_64-pc-linux-gnu (64-bit)

R is free software and comes with ABSOLUTELY NO WARRANTY.
You are welcome to redistribute it under certain conditions.
Type 'license()' or 'licence()' for distribution details.

  Natural language support but running in an English locale

R is a collaborative project with many contributors.