The Big R-Book. Philippe J. S. De Brouwer
<- new_value
5 For larger projects, where many people work on the same code, it makes sense to have a look at R6 (it allows private methods, for example).
In fact, the OO system implemented in R is such that it does not get in the way of what you want to do: you can carry out analyses of great complexity and practical applicability without even knowing about the OO systems. For the casual and the novice user alike, the OO system takes much of the complexity away. For example, one can type summary(foo), and it will work regardless of what this “foo” is: you will be provided with a summary that is relevant for the type of object that foo represents.
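As a minimal sketch of this behaviour, the same call to summary() can be applied to objects of different classes. The object names below (x_num, x_fac, x_df) and their contents are made up for illustration only.

```r
# Three small objects of different classes (hypothetical example data)
x_num <- c(1.5, 2.0, 4.5)                      # numeric vector
x_fac <- factor(c("a", "b", "a"))              # factor
x_df  <- data.frame(id = 1:3, score = x_num)   # data frame

# The same generic call dispatches to a different method for each class
s_num <- summary(x_num)   # quartiles and mean
s_fac <- summary(x_fac)   # counts per level
s_df  <- summary(x_df)    # column-wise summaries

print(s_num)
print(s_fac)
print(s_df)

# A few of the methods that the generic summary() can dispatch to
head(methods(summary))
```

The user never has to pick the right method: R selects it based on the class of the object that is passed in.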
Notes
1 1 Object-oriented programming refers to a programming style that provides a methodology for modelling a logical system (a real-life concept such as “student”) as “objects.” In further code it is then possible to address this object via its methods and attributes. For example, the object student can have a method “age” that returns the age of the student based on its attribute birth-date.
2 2 A reader who knows C might want to know that this is the object-like functionality that is provided by the struct keyword in C. R is written in C, and base R is indeed programmed with those structures.
3 3 Experienced C-users might want to think of this as something like the statement switch(TYPEOF(x)) in C.
4 4 A good way to see a generic function is as an overloaded function with a twist.
5 5 Sometimes generic functions are also referred to as “generics.”
6 a How to measure and improve speed is described in Chapter 40 “The Need for Speed” on page 793.
7 6 Notice that there are no objects of these names in base R, but you will find some, for example, in the methods package. This package provides formally defined methods and objects for R.
8 a UpperCamelCase is easy to understand when compared with lowerCamelCase, dot.separator and snake_case. These are all ways to keep long names (of objects, variables and functions) readable in code. They are all good alternatives, and each programmer has his or her preference, though in many communities there are some unwritten rules. These rules are best followed because that makes your code much easier to read.
9 7 R5 has recently been added to R (2015). It responds to the need for mutable objects and as such makes packages such as R.oo, proto and mutatr obsolete.
10 a The development of R6 can be followed up here: https://www.r-project.org/nosvn/pandoc/R6.html
7. Tidy R with the Tidyverse
7.1. The Philosophy of the Tidyverse
R is Free and Open Source Software (FOSS), which implies that it is free to use, but also that you have access to the code, if desired. Like most FOSS projects, R is also easy to expand. Fortunately, it is also a popular language, and some of these millions of R users¹ might have created a package that enhances R's functionality to do just what you need. This allows any R user to stand on the shoulders of giants: you do not have to re-invent the wheel, but you can just pick a package and expand your knowledge and that of humanity. That is great, and that is one of the most important reasons to use R. However, this also has a dark side: the popularity of the language and the ease of expanding it mean that there are literally thousands of packages available. It is easy to be overwhelmed by the variety and vast number of packages available, and this is also one of the key weaknesses of R.
Most of those packages will require one or more other packages to be loaded first. These packages will in turn have dependencies on yet other (or the same) packages. These dependencies might require a certain version of the upstream package. This package maintenance problem used to be known as “dependency hell.” The package manager of R does, however, do a good job, and it will usually work as expected.
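To get a feel for such dependency chains, one can inspect the packages that are already installed. The following is only a sketch: it assumes that the database returned by installed.packages() is an acceptable stand-in for a repository index, and it uses the stats package purely as an example.

```r
# Look up dependencies among the packages installed on this machine,
# so no network access is needed. "stats" is just an example package.
db <- installed.packages()

# Direct dependencies (Depends, Imports, LinkingTo) of stats
direct <- tools::package_dependencies("stats", db = db)

# The same, but following the dependencies of the dependencies as well
all_deps <- tools::package_dependencies("stats", db = db, recursive = TRUE)

print(direct$stats)
print(all_deps$stats)
```

For a package with many upstream packages the recursive list is typically much longer than the direct one, which is exactly where version conflicts can creep in.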
Using the same code again after a few years is usually more challenging. In the meantime you might have updated R to a newer version, and most packages will have been updated too. It might happen that some packages have become obsolete and are not maintained anymore, so that no version for the new R is available. This can cause some other packages to fail.
Maintaining code is not a big challenge if you just write a project for a course at the university and will never use it again. Code maintenance becomes an issue when you want to use the code later … but it becomes a serious problem if other colleagues need to review your work, expand it and change it later (while you might not be available).
Another issue is that, because of this flexibility, core R is not very consistent (though some people will argue that Linux does an even worse job here and is still the best operating system, or OS).
Consistency does matter, and it follows from the choice of a programming philosophy. For example, R is software for doing things with data, so each function should have a first argument that refers to the data. Many functions follow this rule, but not all. Similar issues exist for the arguments of functions and for the names of objects and classes (e.g. there is vector and Date, etc.).
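A few base R functions illustrate the point. The selection below is our own and is only meant as a sketch; it inspects the formal arguments and the names of functions that ship with R.

```r
# Position of the data argument differs between base R functions
args_mean <- names(formals(mean))[1]    # "x": the data comes first
args_gsub <- names(formals(gsub))[1:3]  # the data ("x") comes third
args_lm   <- names(formals(lm))[1:2]    # formula first, data second

print(args_mean)
print(args_gsub)
print(args_lm)

# Naming conventions differ as well: all of these functions exist in base R
fun_names <- c("Sys.time",     # capital letter and dot
               "system.time",  # lower case and dot
               "rowSums",      # lowerCamelCase
               "seq_len")      # snake_case
print(sapply(fun_names, exists))
```

None of these choices is wrong in itself, but together they mean that the user has to remember the conventions of each function separately.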
Then there is the tidyverse. It is a recent addition to R that is both a collection of often used functionalities and a philosophy.
The developers of tidyverse promote²:
Use existing and common data structures. All the packages in the tidyverse share common S3 class types; this means that, in general, functions will accept data frames (or tibbles). More low-level functions will work with the base R vector types.
Reuse data structures in your code. The idea here is that there is a better option than always overwriting a variable or creating a new one on every line: pass on the output of one line to the next with a “pipe”: %>%. To be accepted in the tidyverse, the functions in a package need to be able to use this pipe.³
Keep functions concise and clear. For example, do not mix side-effects and transformations, use verbs as function names wherever possible (unless they become too generic or meaningless, of course), and keep functions short (they do only one thing, but do it well).
Embrace R as a functional programming language. This means that reflexes that you might have from, say, C++, C#, Python, PHP, etc., will have to be adjusted. For example, it is best to use immutable objects and copy-on-modify semantics and to avoid the refclass model (see Section 6.4 “The Reference Class, refclass, RC or R5 Model” on page 113). Use, where possible, the generic functions provided by S3 and S4. Avoid writing loops (such as repeat and for) but use the apply family of functions (or refer to the