The Big R-Book. Philippe J. S. De Brouwer
general, a warning is important to read once you start working on your own.
Note that the boxes with a shadow are “lifted off the page” and are somewhat independent from the flow of the main text. Those without a shadow are part of the main flow of the text (definitions, examples, etc.).
Notes
1 You will, of course, first have to install the base software R. More about this in Chapter 4 “The Basics of R” on page 21.
2 The number sign, #, is also known as the “hash sign” or “pound sign.” It probably evolved from the abbreviation of “libra pondo” (a pound weight). It is currently used in many different fields: as part of phone numbers, in programming languages (e.g. in a URL it indicates a sub-address, in R it precedes a comment), as the command prompt for the root user in Unix and Linux, in set theory (#S is the cardinality of the set S), in topology (A#B is the connected sum of manifolds A and B), in number theory (#n is the primorial of n), to tag keywords in social media, etc. The pronunciation hence varies widely: it is read as “hash” when used to tag keywords (#book would be the hash sign and the tag book), so reading the “#”-sign as “hashtag” is at least superfluous. Most often, it is pronounced as “pound.” Note that musical notation uses another symbol, ♯, which is pronounced as “sharp” (e.g. C♯).
3 ISO standards are the standards published by the International Organization for Standardization (ISO). This international standard-setting body, founded on 23 February 1947, promotes worldwide proprietary, industrial and commercial standards. At this point, 164 countries are members, and decisions are made by representatives of those countries. ISO is also recognised by the United Nations.
4 The Basics of R
In this book we will approach data and analytics from a practitioner's point of view, and our tool of choice is R. R is in some sense a re-implementation of S (a programming language written in 1976 by John Chambers at Bell Labs) with added lexical scoping semantics. Usually, code written in S will also run in R.
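Lexical scoping means that a function looks up its free variables in the environment where it was defined, not in the environment from which it is called. The following minimal sketch illustrates this; the names f, g, make_counter and counter are purely illustrative.

```r
# Lexical scoping: f() finds `x` in the environment where f was defined
# (the global environment), not in the environment of its caller g().
x <- 1
f <- function() x
g <- function() {
  x <- 2   # local to g(); not visible to f()
  f()
}
g()  # returns 1 (under dynamic scoping it would return 2)

# Closures: the inner function keeps access to `count`, a variable of
# the environment in which it was created.
make_counter <- function() {
  count <- 0
  function() {
    count <<- count + 1  # `<<-` updates `count` in the enclosing environment
    count
  }
}
counter <- make_counter()
counter()  # returns 1
counter()  # returns 2
```

Each call of make_counter() creates a fresh environment, so two counters created this way do not share their count.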
R is a modern language with a rather short history. In 1992, the R-project was started by Ross Ihaka and Robert Gentleman at the University of Auckland, New Zealand. The first version was available in 1995 and the first stable version was available in 2000.
Now, the R Development Core Team (of which Chambers is a member) develops R further and maintains the code. For some years now, Microsoft has embraced the project and provides MRAN (Microsoft R Application Network). This package is also free and open source software (FOSS) and has some advantages over standard R, such as enhanced performance (e.g. multi-thread support) and more reproducible results (e.g. the checkpoint package).
Essentially, R is …
a programming language built for statistical analysis, graphics representation and reporting;
an interpreted computer language which allows branching, looping and modular programming, as well as object-oriented and functional programming.
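The language features listed above can be illustrated with a few lines of base R. This is a minimal sketch; parity() and the class "person" are hypothetical names chosen only for the example.

```r
# Modular programming: a small reusable function (hypothetical helper).
parity <- function(n) {
  if (n %% 2 == 0) "even" else "odd"   # branching
}

for (i in 1:3) {                        # looping
  cat(i, "is", parity(i), "\n")
}

# Functional style: apply an anonymous function to every element.
squares <- sapply(1:5, function(k) k^2) # 1 4 9 16 25

# Object-oriented style (S3): a class attribute plus a method for print().
p <- structure(list(name = "Ann"), class = "person")
print.person <- function(x, ...) {
  cat("Person: ", x$name, "\n", sep = "")
  invisible(x)
}
print(p)                                # prints: Person: Ann
```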
R offers its users …
integration with procedures written in C, C++, .Net, Python, or Fortran for efficiency;
zero purchase cost (available under the GNU General Public License), and pre-compiled binary versions are provided for various operating systems like Linux, Windows, and Mac;
simplicity and effectiveness;
a free and open environment;
an effective data handling and storage facility;
a suite of operators for calculations on arrays, lists, vectors, and matrices;
a large, coherent, and integrated collection of tools for data analysis;
graphical facilities for data analysis and display, either on screen or in print;
a supportive on-line community;
the ability for you to stand on the shoulders of giants (e.g. by using libraries).
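The operators mentioned in the list work element-wise on whole vectors, matrices and arrays, so many calculations need no explicit loop. A short sketch with made-up numbers:

```r
# Vectors: arithmetic is element-wise.
v <- c(2, 4, 6)
v * 10          # 20 40 60
v + c(1, 1, 1)  # 3 5 7
sum(v)          # 12

# Matrices: element-wise product (*) versus matrix product (%*%).
m <- matrix(1:4, nrow = 2)  # filled column by column: rows (1, 3) and (2, 4)
m * m                       # element-wise: rows (1, 9) and (4, 16)
m %*% m                     # matrix product: rows (7, 15) and (10, 22)

# Arrays generalise matrices to more dimensions.
a <- array(1:24, dim = c(2, 3, 4))
a[1, 2, 3]      # 15

# Lists can hold elements of different types.
l <- list(numbers = v, label = "example")
l$numbers[2]    # 4
```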
R is arguably the most widely used statistical programming language; it is used everywhere from universities to business applications, and it is still rapidly gaining popularity.
If at any point you are trying to solve a particular issue and you are stuck, the online community will be very helpful. To get unstuck, do the following:
First, look up your problem by adding the keyword “R” to the search string. Most probably, someone else encountered the very same problem before you, and the answer is already posted. Avoid posting a question that has been answered before.
If you need to ask your question in a forum such as www.stackexchange.com, then you will need to add a minimal reproducible example. The package reprex can help you do just that.
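A minimal reproducible example is small, self-contained, and shows the problem on anyone's machine. The sketch below is one possible shape for such an example; the data and the question are hypothetical. With the reprex package installed, wrapping code like this in reprex::reprex({ ... }) renders it together with its output, ready to paste into a forum; that call is only shown as a comment here.

```r
# Hypothetical question: "Why does mean() return NA for my data?"

# Build a tiny data set in the code itself instead of reading a local file.
# (If random numbers were involved, call set.seed() first so that others
# get the same values; dput(df) prints code that recreates your own data.)
df <- data.frame(
  group = c("a", "a", "b", "b"),
  value = c(1.5, NA, 3.2, 4.1)
)

# Show only the lines needed to trigger the problem.
mean(df$value)                # NA, because of the missing value
mean(df$value, na.rm = TRUE)  # drops the NA before averaging

# Rendering with reprex (requires the reprex package; not run here):
# reprex::reprex({
#   df <- data.frame(value = c(1.5, NA, 3.2, 4.1))
#   mean(df$value)
# })
```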
4.1 Getting Started with R
Before we can start, we need a working installation of R on our computer. On Linux, this can be done via the command line. On Debian and its many derivatives such as Ubuntu or Mint, this looks as follows:1
sudo apt-get install r-base
On Windows or Mac, refer to https://cran.r-project.org and download the right package for your system.
To start R, open the command line and type R (followed by [enter]). This starts the R interpreter (or R console), where you can do all your data crunching. To leave the environment, type q() followed by [enter].
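Once the console shows its > prompt, a few commands like the following (our own illustration, not part of the installation instructions) confirm that the installation works and show which version is running:

```r
# Which version of R is running?
R.version.string

# A first calculation and a first assignment
1 + 1             # [1] 2
x <- c(3, 5, 7)   # `<-` assigns a value to a name
mean(x)           # [1] 5

# ?mean would open the help page of mean().
# q(save = "no") would quit without saving the workspace; both are
# commented out so that this example can also be run as a script.
```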
It is also possible to use R online:
https://www.tutorialspoint.com/execute_r_online.php