The Big R-Book. Philippe J. S. De Brouwer



some measures of central tendency become more appropriate to use than others. In the following sections, we look at the mean, mode, and median, and learn how to calculate them and under which conditions each is most appropriate.

      8.1.1 Mean


      Probably the most widely used measure of central tendency is the “mean.” In this section, we start from the arithmetic mean and then illustrate some other concepts that may be better suited to certain situations.


      8.1.1.1 The Arithmetic Mean


      The most popular type of mean is the “arithmetic mean.” It is the average of a set of numerical values, calculated by adding those values together and then dividing the sum by the number of values in the set.


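      Definition: Arithmetic mean

      $\bar{x} = \sum_{n=1}^{N} P(x_n)\, x_n$ (for discrete distributions)

      $\bar{x} = \int_{-\infty}^{+\infty} x\, f(x)\, dx$ (for continuous distributions)

      Here P(x) is the probability of observing x, and f(x) is the probability density function. For K observations $x_k$, the mean is estimated by

$$\bar{x} = \frac{1}{K} \sum_{k=1}^{K} x_k$$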

      Not surprisingly, the arithmetic mean in R is calculated by the function mean().


      # The mean of a vector:
      x <- c(1, 2, 3, 4, 5, 60)
      mean(x)
      ## [1] 12.5

      # Missing values propagate: a single NA makes the result NA:
      x <- c(1, 2, 3, 4, 5, 60, NA)
      mean(x)
      ## [1] NA

      # Missing values can be ignored with na.rm = TRUE:
      mean(x, na.rm = TRUE)
      ## [1] 12.5

      # This also works for a matrix (the mean is taken over all elements):
      M <- matrix(c(1, 2, 3, 4, 5, 60), nrow = 3)
      mean(M)
      ## [1] 12.5
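The definition for discrete distributions weights each value by its probability. As a minimal sketch, assuming a fair six-sided die as illustrative data (the names `x`, `P`, and `mean_discrete` are not from the original text), the sum can be written out directly and compared with `weighted.mean()`:

```r
# Illustrative data: outcomes of a fair six-sided die and their probabilities
x <- 1:6
P <- rep(1 / 6, 6)

# Discrete definition: sum over all outcomes of P(x_n) * x_n
mean_discrete <- sum(P * x)
mean_discrete

# weighted.mean() computes sum(w * x) / sum(w), so with weights that
# sum to one it matches the definition
mean_weighted <- weighted.mean(x, w = P)
mean_weighted
```

When all outcomes are equally likely, as here, both results coincide with `mean(x)`.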

      Hint – Outliers

      The mean is highly influenced by outliers. To mitigate this to some extent, the parameter trim allows us to remove the tails: mean() sorts all values and then drops the given fraction of observations (between 0 and 0.5) from each end before averaging.

      v <- c(1, 2, 3, 4, 5, 6000)
      mean(v)
      ## [1] 1002.5
      mean(v, trim = 0.2)
      ## [1] 3.5
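To make the effect of trim more tangible, the trimming can be reproduced by hand. The helper `trimmed_mean()` below is a hypothetical function written for illustration only. It is a sketch that assumes trim is below 0.5 and that the input contains no missing values:

```r
# Hypothetical helper that mimics mean(x, trim = ...) for trim < 0.5
trimmed_mean <- function(x, trim) {
  n <- length(x)
  k <- floor(n * trim)              # number of observations dropped per tail
  x_sorted <- sort(x)
  mean(x_sorted[(k + 1):(n - k)])   # average what remains in the middle
}

v <- c(1, 2, 3, 4, 5, 6000)
by_hand  <- trimmed_mean(v, trim = 0.2)   # drops 1 and 6000
built_in <- mean(v, trim = 0.2)
c(by_hand, built_in)
```

With six observations and trim = 0.2, one observation is removed from each tail (floor(6 * 0.2) = 1), so the result is the average of 2, 3, 4, and 5.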

      8.1.1.2 Generalised Means


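      More generally, a mean can be defined as follows:

      Definition: f-mean

$$\bar{x}_f = f^{-1}\left( \frac{1}{K} \sum_{k=1}^{K} f(x_k) \right)$$

      Particular choices of f give the familiar means:

      $f(x) = x$: arithmetic mean,

      $f(x) = \frac{1}{x}$: harmonic mean,

      $f(x) = x^m$: power mean,

      $f(x) = \ln x$: geometric mean, $\bar{x} = \left( \prod_{k=1}^{K} x_k \right)^{1/K}$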
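The f-mean translates almost literally into R: transform the values with f, take the arithmetic mean, and transform back with the inverse of f. The function `f_mean()` and the sample vector below are hypothetical names and data introduced for this sketch. The caller is assumed to supply a matching pair of function and inverse:

```r
# Hypothetical helper: apply f, average, then map back with the inverse of f
f_mean <- function(x, f, f_inv) {
  f_inv(mean(f(x)))
}

x <- c(1, 2, 4)   # illustrative positive values

arith <- f_mean(x, identity, identity)                    # f(x) = x
harm  <- f_mean(x, function(v) 1 / v, function(v) 1 / v)  # f(x) = 1/x
geom  <- f_mean(x, log, exp)                              # f(x) = ln x

c(harmonic = harm, geometric = geom, arithmetic = arith)
```

For positive values that are not all equal, the harmonic mean is the smallest of the three and the arithmetic mean the largest.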


      The Power Mean

      One particular generalized mean is the power mean or Hölder mean. It is defined for a set of K positive numbers $x_k$ by

$$\bar{x}(m) = \left( \frac{1}{K} \sum_{k=1}^{K} x_k^m \right)^{1/m}$$


      By choosing particular values for m, one obtains the quadratic, arithmetic, geometric, and harmonic means, as well as the maximum and minimum as limiting cases:


      m → ∞: maximum of xk

      m = 2: quadratic mean

      m = 1: arithmetic mean

      m → 0: geometric mean

      m = −1: harmonic mean

      m → −∞: minimum of xk
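The special cases listed above can be checked numerically. The function `power_mean()` below is a hypothetical helper written for this sketch. It handles the limits m → 0 and m → ±∞ explicitly, because the general formula cannot be evaluated at those values:

```r
# Hypothetical helper: power (Hölder) mean of positive numbers
power_mean <- function(x, m) {
  stopifnot(all(x > 0))
  if (m == 0)    return(exp(mean(log(x))))  # limit m -> 0: geometric mean
  if (m == Inf)  return(max(x))             # limit m -> +Inf
  if (m == -Inf) return(min(x))             # limit m -> -Inf
  mean(x^m)^(1 / m)
}

x  <- c(1, 2, 4)   # illustrative positive values
ms <- c(-Inf, -1, 0, 1, 2, Inf)
pm <- sapply(ms, function(m) power_mean(x, m))
names(pm) <- c("min", "harmonic", "geometric", "arithmetic", "quadratic", "max")
pm
```

The resulting values never decrease as m grows, which is why the minimum and the maximum bracket all the other means.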

      Example: Which mean makes most sense?

      returns <- c(0.5, -0.5, 0.5, -0.5)

      # Arithmetic mean:
      aritmean <- mean(returns)

      # The ln-mean (mean of the log-returns):
      log_returns <- log(returns + 1)
      logmean <- mean(log_returns)
      exp(logmean) - 1
      ## [1] -0.1339746

      # What is the value of the investment after these returns:
      V_0 <- 1
      V_T <- V_0 * prod(returns + 1)
      V_T
      ## [1] 0.5625

      # Compare this to our predictions, compounding each mean
      # over all periods:
      K <- length(returns)
      ## mean of log-returns
      V_0 * exp(logmean)^K
      ## [1] 0.5625
      ## mean of returns
      V_0 * (aritmean + 1)^K
      ## [1] 1
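One way to see why the mean of the log-returns is the better guide in this example is to write out the compounding explicitly. The derivation below is a sketch in which $r_k$ denotes the k-th return and K the number of periods. Both symbols are introduced here for illustration:

```latex
\begin{align*}
V_T &= V_0 \prod_{k=1}^{K} (1 + r_k)
     = V_0 \exp\left( \sum_{k=1}^{K} \ln(1 + r_k) \right)
     = V_0 \left[ \exp\left( \frac{1}{K} \sum_{k=1}^{K} \ln(1 + r_k) \right) \right]^{K} \\
    &= 1 \times (1.5 \times 0.5 \times 1.5 \times 0.5) = 0.5625
\end{align*}
```

The term in square brackets is the average growth factor per period, here $\sqrt{0.75} \approx 0.866$, which corresponds to the return of about −13.4% obtained from the mean of the log-returns. The arithmetic mean of the returns is 0, which would wrongly suggest that the investment keeps its value.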

      8.1.2 The Median

