The Big R-Book. Philippe J. S. De Brouwer
name of the class itself is not confusing. Where the function print.data.frame() potentially can be the specific method for the print function for a data.frame, it can also be the specific method for the print.data function for a frame object. The name of the class tibble does not use the dot and hence cannot be confusing.
To illustrate some of these differences, consider the following code:
# -- data frame --
df <- data.frame("value" = pi, "name" = "pi")
df$na                    # partial matching of column names
## [1] pi
## Levels: pi

# automatic conversion to factor (the default before R 4.0.0),
# plus data frame accepts strings:
df[,"name"]
## [1] pi
## Levels: pi

df[,c("name", "value")]
##   name    value
## 1   pi 3.141593

# -- tibble --
df <- tibble("value" = pi, "name" = "pi")
df$name                  # column name
## [1] "pi"

df$nam                   # no partial matching, but a warning
## Warning: Unknown or uninitialised column: ‘nam’.
## NULL

df[,"name"]              # this returns a tibble (no simplification)
## # A tibble: 1 x 1
##   name
##   <chr>
## 1 pi

df[,c("name", "value")]  # no conversion to factor
## # A tibble: 1 x 2
##   name  value
##   <chr> <dbl>
## 1 pi     3.14
Partial matching is one of the nicer features of R, and it certainly was an advantage for interactive use. However, when using R in batch mode, it might be dangerous. Partial matching is especially dangerous in a corporate environment: datasets can have hundreds of columns and many names look alike, e.g. BAL180801, BAL180802, and BAL180803. Up to a certain point it is safe to use partial matching, since it only works when R can identify the variable uniquely. But it is bound to happen that you create new columns and suddenly someone else's code stops working (because the abbreviation no longer identifies one single column).
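The risk described above can be reproduced in a few lines of base R. This is a minimal sketch: the data frame `d` and its values are made up for illustration, and only the column names are borrowed from the text.

```r
# Hypothetical data: one balance column, named as in the text
d <- data.frame(BAL180801 = c(10, 20))

# "BAL" identifies exactly one column, so partial matching succeeds
before <- d$BAL
print(before)

# Later, a second and similarly named column is added
d$BAL180802 <- c(30, 40)

# "BAL" is now ambiguous: `$` no longer finds a unique match
after <- d$BAL
print(after)

# The full column name keeps working
print(d$BAL180801)
```

Note that the second lookup does not raise an error: it quietly yields `NULL`, which is why such a bug can travel far before it is noticed. Setting `options(warnPartialMatchDollar = TRUE)` makes base R warn whenever `$` relies on partial matching.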
Digression – Changing how a tibble is printed
To adjust the default behaviour of print on a tibble, run the function options() as follows:
options(
  tibble.print_max = n, # If there are more than n
  tibble.print_min = m, # rows, only print the m first
                        # (set n to Inf to show all)
  tibble.width     = l  # max output width in characters
                        # (set to Inf to show all columns)
)
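As a concrete instance of the template above, the following sketch fills in the placeholders. The values 20 and 5 are arbitrary choices for illustration, not recommendations from the text.

```r
# Illustrative values only; choose limits that suit your console
options(
  tibble.print_max = 20,   # tibbles with more than 20 rows ...
  tibble.print_min = 5,    # ... are printed with their first 5 rows only
  tibble.width     = Inf   # never hide columns because of the output width
)

# Inspect a setting that is currently active
getOption("tibble.print_min")
```

These settings last for the current R session only; to make them permanent they can be placed in a startup file such as `.Rprofile`.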
Tibbles are also data frames, and most older functions (that are unaware of tibbles) will work just fine. However, it may happen that some function does not work with a tibble. If that happens, it is possible to coerce the tibble back into a data frame with the function as.data.frame().
tb <- tibble(c("a", "b", "c"), c(1,2,3), 9L,9)
is.data.frame(tb)
## [1] TRUE

# Note also that tibble did no conversion to factors, and
# note that the tibble also recycles the scalars:
tb
## # A tibble: 3 x 4
##   `c("a", "b", "c")` `c(1, 2, 3)`  `9L`   `9`
##   <chr>                     <dbl> <int> <dbl>
## 1 a                             1     9     9
## 2 b                             2     9     9
## 3 c                             3     9     9

# Coerce the tibble to data-frame:
as.data.frame(tb)
##   c("a", "b", "c") c(1, 2, 3) 9L 9
## 1                a          1  9 9
## 2                b          2  9 9
## 3                c          3  9 9

# A tibble does not recycle shorter vectors, so this fails:
fail <- tibble(c("a", "b", "c"), c(1,2))
## Error: Tibble columns must have consistent lengths, only values of length one are recycled:
## * Length 2: Column ‘c(1, 2)’
## * Length 3: Column ‘c("a", "b", "c")’

# That is a major advantage and will save many programming errors.
The function view(tibble) works as expected and is most useful when working with RStudio, where it will open the tibble in a special tab.
While on the surface a tibble does the same as a data.frame, tibbles have some crucial advantages and we warmly recommend using them.
7.3.2 Piping with R
This section is not about creating beautiful music; it explains an argument-passing system in R. Similar to the pipe operator, |, in Linux, the operator %>% from the package magrittr passes the output of one line to the first argument of the function on the next line.11
When writing code, it is common to work on one object for a while. For example, we import data, then clean it, add columns, delete others, summarize the data, etc.
To start, consider a simple example:
t <- tibble("x" = runif(10))
t <- within(t, y <- 2 * x + 4 + rnorm(10, mean=0,sd=0.5))
This can also be written with the piping operator from magrittr:
t <- tibble("x" = runif(10)) %>%
     within(y <- 2 * x + 4 + rnorm(10, mean=0,sd=0.5))
Behind the scenes, R feeds the output of the expression to the left of the pipe operator as the main (first) input of the function to its right. This means that the following are equivalent:
# 1. pipe:
a %>% f()
# 2. pipe with shortened function:
a %>% f
# 3. is equivalent with:
f(a)
Example – Pipe operator
a <- c(1:10)
a %>% mean()
## [1] 5.5
a %>% mean
## [1] 5.5
mean(a)
## [1] 5.5
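The same rule applies when the function on the right takes further arguments: the piped value is inserted as the first argument and the others stay where they are. The following sketch checks this with the base functions round(), sum() and sqrt(); the vector `x` is an arbitrary example and assumes only that magrittr is installed.

```r
library(magrittr)  # provides %>%

x <- c(1.234, 5.678)

# The left-hand side becomes the first argument; other arguments stay in place
piped  <- x %>% round(1)
direct <- round(x, 1)
identical(piped, direct)

# A chain of pipes corresponds to nested calls, read from the inside out
chained <- x %>% sum() %>% sqrt()
nested  <- sqrt(sum(x))
identical(chained, nested)
```

The chained form lists the steps in the order in which they are executed, whereas the nested form has to be read from the innermost call outwards.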
It might be useful to pronounce the pipe operator, %>%, as “then” to understand what it does.
# The following line
c <- a %>%
f()
# is equivalent with: