The Big R-Book. Philippe J. S. De Brouwer

returns the number of levels in the factor object.

       nlevels()

      # The nlevels function returns the number of levels:
      print(nlevels(factor_feedback))
      ## [1] 3

      Digression – The reduced importance of factors

      When R was in its infancy, computing power and memory were far more limited than today, and in most cases it made sense to coerce strings to factors. For example, the base-R functions that load data into a data frame (i.e. two-dimensional data) silently converted strings to factors in R versions before 4.0.0. Today, that is most probably not what you need. Therefore, we recommend making it a habit to use the functions from the tidyverse (see Chapter 7 “Tidy R with the Tidyverse” on page 161).
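      The difference between the two behaviours can be made visible with base R alone. The sketch below is not from the book: the vector names_vec and the data frames df_fct and df_chr are hypothetical. It sets stringsAsFactors explicitly, so the result does not depend on the default of the installed R version.

```r
# Hypothetical example data (not from the book).
names_vec <- c("Anna", "Bob", "Carl")

# Explicitly ask for the historical behaviour: strings become factors.
df_fct <- data.frame(name = names_vec, stringsAsFactors = TRUE)

# Explicitly keep strings as character vectors.
df_chr <- data.frame(name = names_vec, stringsAsFactors = FALSE)

# Compare the resulting column types.
class(df_fct$name)
class(df_chr$name)
```

      Stating the argument explicitly is one way to keep scripts reproducible across R versions.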

      4.3.7.2 Ordering Factors

      In the example about creating a factor object for feedback, one will have noticed that the plot function shows the labels in alphabetical order and not in an order that would be logical for us humans. It is possible to impose a certain order on the levels by providing them, in the correct order, while creating the factor object.

      feedback <- c("Good", "Good", "Bad", "Average", "Bad", "Good")
      factor_feedback <- factor(feedback, levels = c("Bad", "Average", "Good"))
      plot(factor_feedback)

      In Figure 4.2 on page 63 we notice that the order is now as desired (it is the order that we provided via the argument levels in the function factor()).
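      The effect of the levels argument can also be checked without plotting. The following is a minimal sketch that reuses the feedback vector; the object ordered_feedback and the use of ordered = TRUE are additions for illustration and are not part of the text above.

```r
feedback <- c("Good", "Good", "Bad", "Average", "Bad", "Good")

# Without explicit levels, R sorts the levels itself.
levels(factor(feedback))

# With explicit levels, the order that we provide is kept.
factor_feedback <- factor(feedback, levels = c("Bad", "Average", "Good"))
levels(factor_feedback)

# ordered = TRUE additionally makes the factor an ordered factor,
# so that comparisons such as < become meaningful.
ordered_feedback <- factor(feedback,
                           levels  = c("Bad", "Average", "Good"),
                           ordered = TRUE)

# Is the third answer ("Bad") lower than the first answer ("Good")?
ordered_feedback[3] < ordered_feedback[1]
```

      Providing levels only fixes the display order; comparisons between values require an ordered factor.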

      Generate Factors with the Function gl()

      Function use for gl()

      gl(n, k, length = n*k, labels = seq_len(n), ordered = FALSE) with

       n: The number of levels

       k: The number of replications (for each level)

       length (optional): An integer giving the length of the result

       labels (optional): A vector with the labels

       ordered: A boolean variable indicating whether the result should be an ordered factor.

       gl()
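      A short sketch of how these arguments interact is given below. The object names f, g and h and the label values are hypothetical and chosen only for illustration.

```r
# Three levels, each replicated twice, with descriptive labels.
f <- gl(3, 2, labels = c("Bad", "Average", "Good"))
f

# With length, the pattern of levels is repeated up to the requested length.
g <- gl(2, 1, length = 6, labels = c("no", "yes"))
g

# ordered = TRUE returns an ordered factor.
h <- gl(3, 1, ordered = TRUE)
is.ordered(h)
```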

      Question #4

      Use the dataset mtcars (from the package datasets, which R loads by default) and explore the distribution of the number of gears. Then explore the correlation between gears and transmission.

      Question #5

      Then focus on the transmission and create a factor-object with the words “automatic” and “manual” instead of the numbers 0 and 1.

      Use ?mtcars to find out the exact definition of the data.

       mtcars
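      The general technique of replacing numeric codes by words can be sketched on a small vector. The data below is hypothetical and deliberately not taken from mtcars; the names switch_state and switch_factor and the labels "off" and "on" are assumptions for illustration only.

```r
# Hypothetical 0/1 codes (not taken from mtcars).
switch_state <- c(0, 1, 1, 0, 1)

# levels lists the values found in the data; labels gives the words that
# replace them, in the same order.
switch_factor <- factor(switch_state,
                        levels = c(0, 1),
                        labels = c("off", "on"))

switch_factor
table(switch_factor)
```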

      Question #6

      Use the dataset mtcars (from the package datasets) and explore the distribution of the horsepower (hp). How would you proceed to create a factor (e.g. Low, Medium, High) for this attribute? Hint: Use the function cut().

       cut()
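      As an illustration of how cut() works in general, the sketch below bins a small numeric vector into three classes. The vector x and the break points are arbitrary assumptions and are not derived from mtcars; sensible break points for real data have to be chosen after looking at its distribution.

```r
# Hypothetical measurements (not taken from mtcars).
x <- c(52, 95, 110, 150, 175, 245, 335)

# The break points are arbitrary assumptions. By default the intervals
# are open on the left and closed on the right: (0,100], (100,200], (200,Inf].
x_class <- cut(x,
               breaks = c(0, 100, 200, Inf),
               labels = c("Low", "Medium", "High"))

# Count the number of observations in each class.
table(x_class)
```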

      4.3.8 Data Frames

      4.3.8.1 Introduction to Data Frames

       data frame

       rectangular data

      Data frames are very useful for statistical modelling; they are objects that contain data in tabular form. Unlike in a matrix, each column of a data frame can contain a different type of data. For example, the first column can be a factor, the second logical, and the third numerical. A data frame is a composite data type consisting of a list of vectors of equal length.

      Data frames are created using the data.frame() function.
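      A minimal sketch of such a call is shown below. The data frame people, its column names and its values are hypothetical and serve only to illustrate that columns of different types can sit side by side.

```r
# Hypothetical data (names and values are illustrative only).
people <- data.frame(
  name   = c("Anna", "Bob", "Carl"),
  member = c(TRUE, FALSE, TRUE),
  score  = c(78.5, 88, 92),
  stringsAsFactors = FALSE
)

# Each column keeps its own type, but all columns have the same length.
sapply(people, class)
dim(people)

# Internally, a data frame is a list of its columns.
is.list(people)
```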

       data.frame()

       pairs()

Figure: The standard plot for a data frame in R shows each column plotted against every other column. This is useful to see correlations or how the data is generally structured.

      4.3.8.2 Accessing Information from a Data Frame

       summary()

       head()

       tail()

      # Get the structure of the data frame:
      str(data_test)
      ## 'data.frame':    5 obs. of  4 variables:
      ##  $ Name  : Factor w/ 5 levels "Laura","Lisa",..: 5 4 3 2 1
      ##  $ Gender: Factor w/ 2 levels "Female","Male": 2 2 1 1 1
      ##  $ Score : num  78 88 92 89 84
      ##  $ Age   : num  42 38 26 30 35
      # Note that the names became factors (see warning below)

      # Get the summary of the data frame:
      summary(data_test)
      ##     Name      Gender      Score          Age
      ##  Laura:1   Female:3   Min.   :78.0   Min.   :26.0
      ##  Lisa :1   Male  :2   1st Qu.:84.0   1st Qu.:30.0
      ##  Paula:1              Median :88.0   Median

