The Big R-Book. Philippe J. S. De Brouwer

dnorm(x,mean(SP500),sd(SP500)),col="blue",lwd=2)

      A better way to check for normality is to study the Q-Q plot. A Q-Q plot compares the sample quantiles with the theoretical quantiles of the reference distribution (here the normal distribution), and it makes very clear where deviations appear.


       library(MASS)

       qqnorm(SP500,col="red"); qqline(SP500,col="blue")

Figure: A Q-Q plot is a good way to judge whether a set of observations is normally distributed.
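      To make concrete what qqnorm() draws, the following minimal sketch builds the same plot by hand. It uses simulated data rather than SP500 so that it runs without the MASS package; the seed, the sample size and the variable names are our own assumptions.

```r
# Simulated sample (an assumption: stands in for the SP500 returns)
set.seed(42)
x <- rnorm(200, mean = 1, sd = 2)

# Sample quantiles: simply the sorted observations
sample_q <- sort(x)

# Theoretical quantiles of the standard normal at the same
# plotting positions that qqnorm() uses
theory_q <- qnorm(ppoints(length(x)))

# The hand-made Q-Q plot: points close to a straight line
# indicate that the sample is approximately normal
plot(theory_q, sample_q, col = "red",
     xlab = "Theoretical quantiles", ylab = "Sample quantiles",
     main = "Q-Q plot built by hand")

# Compare with the coordinates computed by qqnorm()
qq <- qqnorm(x, plot.it = FALSE)
same_x <- isTRUE(all.equal(sort(qq$x), theory_q))
same_y <- isTRUE(all.equal(sort(qq$y), sample_q))
print(c(same_x, same_y))
```

      Because the simulated sample is drawn from a normal distribution, the points should lie close to a straight line; heavy tails, as often seen in financial returns, would show up as points bending away from that line at both ends.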

      8.4.2 Binomial Distribution

      The binomial distribution models the number of successes in a fixed number of independent trials, each of which has only two possible outcomes. For example, the probability of finding exactly 6 heads when tossing a coin 10 times is given by the binomial distribution.
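      As a minimal sketch of this example, the probability of exactly 6 heads in 10 tosses can be computed from the binomial formula choose(n, k) * p^k * (1 - p)^(n - k) and compared with the result of dbinom(). We assume a fair coin (p = 0.5); the variable names are ours.

```r
n_tosses <- 10   # number of trials
k_heads  <- 6    # number of successes we are interested in
p_head   <- 0.5  # assumption: the coin is fair

# Directly from the binomial formula
by_formula <- choose(n_tosses, k_heads) *
  p_head^k_heads * (1 - p_head)^(n_tosses - k_heads)

# The same probability via the built-in density function
by_dbinom <- dbinom(k_heads, size = n_tosses, prob = p_head)

print(by_formula)
## [1] 0.2050781
print(by_dbinom)
## [1] 0.2050781
```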


      The Binomial Distribution in R

      As for all distributions, R has four built-in functions for the binomial distribution:

       dbinom(x, size, prob): The density (probability mass) function, i.e. the probability of exactly x successes

       pbinom(x, size, prob): The cumulative probability, i.e. the probability of at most x successes

       qbinom(p, size, prob): The quantile function, which gives the smallest number whose cumulative probability is at least the given probability p

       rbinom(n, size, prob): Generates random numbers following the binomial distribution

      The following parameters are used:

       x: A vector of numbers

       p: A vector of probabilities

       n: The number of observations

       size: The number of trials

       prob: The probability of success of each trial
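      The short sketch below shows how the four functions relate to each other for one set of parameters. The values (10 trials, success probability 0.5, at most 3 successes) and the seed are arbitrary choices made for illustration.

```r
size <- 10   # number of trials
prob <- 0.5  # probability of success of each trial

# d: probability of exactly 3 successes
p_exact <- dbinom(3, size, prob)

# p: probability of at most 3 successes ...
p_at_most <- pbinom(3, size, prob)
# ... which equals the sum of the densities for 0, 1, 2 and 3
sum_d <- sum(dbinom(0:3, size, prob))

# q: smallest number of successes k with P(X <= k) >= 0.5
median_k <- qbinom(0.5, size, prob)

# r: five random draws (the seed only makes them reproducible)
set.seed(123)
draws <- rbinom(5, size, prob)

print(c(p_exact = p_exact, p_at_most = p_at_most, sum_d = sum_d))
print(median_k)
## [1] 5
print(draws)
```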

      An Example of the Binomial Distribution

Figure: The probability of getting at most x tails when flipping a fair coin, illustrated with the binomial distribution.

      # Probability of getting 5 or fewer heads from 10 tosses of
      # a coin.
      pbinom(5,10,0.5)
      ## [1] 0.6230469

      # Visualize the cumulative probability of at most 1 to 10 tails
      x <- 1:10
      y <- pbinom(x,10,0.5)
      plot(x,y,type="b",col="blue", lwd=3,
           xlab="Number of tails",
           ylab="prob of maximum x tails",
           main="Ten tosses of a coin")

      # The 25% quantile: the smallest number of heads k for which
      # the probability of at most k heads in 10 tosses is at least 0.25.
      qbinom(0.25,10,1/2)
      ## [1] 4

      Similar to the normal distribution, random draws of the binomial distribution can be obtained via a function that starts with the letter ‘r’: rbinom().


      # Find 20 random numbers of tails from an event of 10 tosses
      # of a coin
      rbinom(20,10,.5)
      ## [1] 5 7 2 6 7 4 6 7 3 2 5 9 5 9 5 5 5 5 5 6
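      Since rbinom() returns different numbers on every call, it can help to fix the seed and to check the draws against the theoretical distribution. The sketch below does this for a larger sample; the seed and the sample size of 10 000 are arbitrary assumptions.

```r
set.seed(2024)  # arbitrary seed, only for reproducibility

# 10 000 experiments of 10 tosses each
draws <- rbinom(10000, size = 10, prob = 0.5)

# The sample mean should be close to size * prob = 5
sample_mean <- mean(draws)
print(sample_mean)

# Observed relative frequencies of 0, 1, ..., 10 tails ...
observed <- as.numeric(table(factor(draws, levels = 0:10))) / length(draws)
# ... compared with the theoretical probabilities
expected <- dbinom(0:10, size = 10, prob = 0.5)

print(round(cbind(tails = 0:10, observed, expected), 3))
```

      With this many draws the observed frequencies should differ from the theoretical probabilities only in the second or third decimal.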

      Mileage may vary, but in much research people want to document what they have done and will need to include some summary statistics in their paper or model documentation. The standard summary of the relevant object might be sufficient.

      N <- 100
      t <- data.frame(id = 1:N, result = rnorm(N))
      summary(t)
      ##        id             result
      ##  Min.   :  1.00   Min.   :-1.8278
      ##  1st Qu.: 25.75   1st Qu.:-0.5888
      ##  Median : 50.50   Median :-0.0487
      ##  Mean   : 50.50   Mean   :-0.0252
      ##  3rd Qu.: 75.25   3rd Qu.: 0.4902
      ##  Max.   :100.00   Max.   : 2.3215

      Note – A tibble is a special form of data frame

      A tibble and a data frame will produce the same summaries.

      We might want to produce some specific information that somehow follows the format of the table. To illustrate this, we start from the dataset mtcars and assume that we want to make a summary per brand for the top brands (defined as those appearing most frequently in our data).

      library(tidyverse) # not only for %>% but also for group_by, etc.

      # In mtcars the type of the car is only in the row names,
      # so we need to extract it to add it to the data
      n <- rownames(mtcars)

      # Now, add a column brand (use the first letters of the type)
      t <- mtcars %>%
        mutate(brand = str_sub(n, 1, 4)) # add column

      To achieve this, the function group_by() from dplyr will be very handy. Note that this function does not change the dataset as such; rather, it adds a layer of information about the grouping.
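      The following sketch illustrates this behaviour of group_by(). It is self-contained and therefore rebuilds the brand column itself; it loads only dplyr instead of the whole tidyverse and uses base R's substr() in place of str_sub(). The variable names and the choice of summary columns (number of cars and average mpg) are our own assumptions, not necessarily the summary built in the remainder of the text.

```r
library(dplyr)  # assumption: dplyr alone is sufficient for this sketch

# Rebuild the data with a brand column (first four letters of the type)
car_names <- rownames(mtcars)
cars_with_brand <- mtcars %>%
  mutate(brand = substr(car_names, 1, 4))

# Grouping leaves the data itself untouched ...
grouped <- cars_with_brand %>% group_by(brand)
print(dim(cars_with_brand))
print(dim(grouped))

# ... it only records by which variable the rows are grouped
print(is_grouped_df(grouped))
print(group_vars(grouped))

# The grouping takes effect in later steps, such as summarise()
per_brand <- grouped %>%
  summarise(n_cars = n(), avg_mpg = mean(mpg)) %>%
  arrange(desc(n_cars))

print(head(per_brand, 3))
```

      The summary has one row per brand, sorted so that the most frequent brands come first, which is one way to identify the top brands.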


