child: skew = −0.09, kurtosis = −0.35, se = 0.08; parent: se = 0.06
The skew for child has a value of −0.09, indicating a slight negative skew. This is confirmed by visualizing the distribution, though the skew is mild enough that it takes a relatively close inspection to spot:
> hist(child)
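If a numerical check of the skewness is desired, one option (among several) is the skew() function from the psych package. The sketch below is only illustrative; it assumes the psych package is installed and that child and parent are already in the workspace, as in the call to hist() above:

> library(psych)     # assumed available; provides skew()
> skew(child)        # sample skewness of child
> skew(parent)       # sample skewness of parent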
2.11 SAMPLING DISTRIBUTIONS
Sampling distributions are a cornerstone of statistical inference. The sampling distribution of a statistic is a theoretical probability distribution of that statistic. As defined by DeGroot and Schervish (2002), “the sampling distribution of a statistic tells us what values a statistic is likely to assume and how likely it is to assume those values prior to observing our data” (p. 391).
As an example, we will generate a theoretical sampling distribution of the mean for a given population with mean μ and variance σ². The distribution we will create is entirely idealized in that it does not exist anywhere in nature. It is simply a statistical theory of how the distribution of means might look if we were able to take an infinite number of samples of a given size from a given population and, on each of these samples, calculate the sample mean statistic.
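Although no such distribution exists in nature, we can approximate what it would look like by brute-force simulation. The following sketch uses entirely arbitrary values, not drawn from the text (a normal population with μ = 100 and σ = 15, samples of size n = 25, and 10,000 replications), simply to make the idea concrete:

> set.seed(123)    # for reproducibility
> means <- replicate(10000, mean(rnorm(n = 25, mean = 100, sd = 15)))
> hist(means)      # approximate sampling distribution of the mean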
When we derive sampling distributions for a statistic, we are asking the following question:
If we were to draw an infinite number of samples of size n from this population and calculate the sample mean on each sample, what would the distribution of sample means look like?
If we can specify this distribution, then we can evaluate obtained sample means relative to it. That is, we will be able to compare our obtained means (i.e., the ones we obtain in real empirical research) to the theoretical sampling distribution of means, and answer the question:
If my obtained sample mean really did come from this population, what is the probability of obtaining a mean such as this?
If the probability is low, you might then decide to reject the assumption that the sample mean you obtained arose from the population in question. It could have, to be sure, but it probably did not. For continuous measures, our interpretation above is slightly informal, since the probability of any particular value of the sample mean in a continuous distribution is essentially equal to 0 (i.e., in the limit, the probability equals 0). Hence, the question is usually posed such that we seek to know the probability of obtaining a mean such as the one we obtained or more extreme.
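To make this concrete, such a probability can be approximated by simulation. The values below are purely hypothetical (a population with μ = 100 and σ = 15, samples of size n = 25, and an obtained sample mean of 106) and do not come from any example in this chapter:

> set.seed(1)
> means <- replicate(10000, mean(rnorm(n = 25, mean = 100, sd = 15)))
> mean(means >= 106)    # proportion of simulated means as large as or larger than 106

A small proportion here plays the same role as a small probability in the argument above: it suggests that a mean such as 106 would be unusual if the sample really did come from the hypothesized population.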
2.11.1 Sampling Distribution of the Mean
Since we regularly calculate and analyze sample means in our data, we are often interested in the sampling distribution of the mean. If we regularly computed medians instead, we would be equally interested in the sampling distribution of the median.
Recall that when we consider any distribution, whether theoretical or empirical, we are usually especially interested in two things about that distribution: a measure of central tendency and a measure of dispersion or variability. Why do we want to know these two things? Because they summarize our observations: instead of inspecting each individual data point to get an adequate description of the objects under study, we can simply request the mean and standard deviation as telling the story (albeit an incomplete one) of the obtained observations. Similarly, when we derive a sampling distribution, we are interested in the mean and standard deviation of that theoretical distribution of a statistic.
We already know how to calculate means and standard deviations for real empirical distributions. However, we do not know how to calculate means and standard deviations for sampling distributions. It seems reasonable that the mean and standard deviation of a sampling distribution should depend in some way on the given population from which we are sampling. For instance, if we are sampling from a population that has a mean μ = 20.0 and population standard deviation σ = 5, it seems plausible that the sampling distribution of the mean should look different than if we were sampling from a population with μ = 10.0 and σ = 2. It makes sense that different populations should give rise to different theoretical sampling distributions.
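As a quick illustration of this point, the sketch below simulates sample means from the two populations just described, using an arbitrary sample size of n = 25, and shows that the resulting distributions of means are centered differently and differ in spread:

> set.seed(123)
> means.A <- replicate(10000, mean(rnorm(n = 25, mean = 20, sd = 5)))   # population with mu = 20, sigma = 5
> means.B <- replicate(10000, mean(rnorm(n = 25, mean = 10, sd = 2)))   # population with mu = 10, sigma = 2
> mean(means.A); sd(means.A)    # centered near 20, wider spread
> mean(means.B); sd(means.B)    # centered near 10, narrower spread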
What we need then is a way to specify the sampling distribution of the mean for a given population. That is, if we draw sample means from this population, what does the sampling distribution of the mean look like for this population? To answer this question, we need both the expectation of the sampling distribution (i.e., its mean) as well as the standard deviation of the sampling distribution (i.e., its standard error (SE)). We know that the expectation of the sample mean ȳ is equal to the population mean. That is, E(ȳ) = μ.

To understand why E(ȳ) = μ, recall that the sample mean is defined as

ȳ = (y1 + y2 + ⋯ + yn)/n

Incorporating this into the expectation for ȳ, we have

E(ȳ) = E[(y1 + y2 + ⋯ + yn)/n] = (1/n)E(y1 + y2 + ⋯ + yn)

There is a rule of expectations that says that the expectation of the sum of random variables is equal to the sum of the individual expectations. This being the case, we can write the expectation of the sample mean as

E(ȳ) = (1/n)[E(y1) + E(y2) + ⋯ + E(yn)]

Since the expectation of each y1 through yn is E(y1) = μ, E(y2) = μ, …, E(yn) = μ, we can write

E(ȳ) = (1/n)(μ + μ + ⋯ + μ) = nμ/n

We note that the n values in the numerator and denominator cancel, and so we end up with

E(ȳ) = μ

Using the fact that E(yi) = μ, we can also say that the expected value of a sampling distribution of the mean is equal to the mean of the population from which we did the theoretical sampling. That is, E(ȳ) = μ.
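This result can be informally checked by simulation: the average of a large number of simulated sample means should fall very close to the population mean. The sketch below again uses arbitrary values (μ = 20, σ = 5, n = 25) and is a consistency check rather than a proof:

> set.seed(123)
> means <- replicate(10000, mean(rnorm(n = 25, mean = 20, sd = 5)))
> mean(means)    # approximately 20, consistent with E(ybar) = mu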
We now need a measure of the dispersion of a sampling distribution of the mean. At first glance, it may seem reasonable to assume that the variance of the sampling distribution of means should equal the variance of the population from which the sample means were drawn. However, this is not