Student Study Guide to Accompany Statistics Alive!. Wendy J. Steinberg
Quiz
1 Here are the results of a pop quiz given in an American history course: 2, 1, 9, 3, 2, 4, 5, 6, 7, 8, 2, 3, 7, 7, 2, 4, 7, 8, 5. The quiz was graded on a scale of 1 to 10. What are the mean, median, and mode of these scores?
2 Which measure of central tendency would best describe the distribution of the scores given in Question 1?
3 You have just completed teaching a computer science course for the first time. Many of your students did not do well, and the distribution of final grades was positively skewed. You would rather not terrify your students at the start of your next class, so you decide to be a little deceptive with how you report the performance of your previous students. Which measure of central tendency would be the most appropriate?
4 Following the example from Question 3, you only have 10 students enroll in your course the following semester. After the first test, you obtain the following grades (on a 0–70 scale): 45, 64, 57, 34, 65, 49, 44, 58, 58, 62. What is the mean of your class grades?
5 If you were to add one outlier to a symmetrical distribution, what would you expect to happen with regard to the measures of central tendency?
Quiz Answers
1 Mean = 4.84; median = 4.75, interpolated within the interval 4.5 to 5.5 as 4.5 + (9.5 − 9)/2 = 4.75 (the middle score itself is 5); mode = 2 and 7 (four occurrences each).
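2 The distribution is bimodal, so no single measure of central tendency describes it well, although both modes (2 and 7) may be reported.
3 You should report the mean. In a positively skewed distribution, the few high scores pull the mean above the median and the mode, so it is the highest of the three values. (And you should make your tests easier!)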
4 The mean would be 53.6.
5 The mean would change toward the direction of the outlier, but the mode and median would be relatively unaffected.
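The answers to Questions 1 and 4 can be checked with a short script. This is a minimal sketch using Python's standard `statistics` module; the variable names are illustrative, and `median_grouped` is used here on the assumption that the 4.75 in Answer 1 is an interpolated median with a one-point score interval.

```python
import statistics

# Scores from Question 1 (19 quiz scores on a 1-to-10 scale).
quiz_scores = [2, 1, 9, 3, 2, 4, 5, 6, 7, 8, 2, 3, 7, 7, 2, 4, 7, 8, 5]

mean_q1 = statistics.mean(quiz_scores)          # sum of scores / number of scores
middle_score = statistics.median(quiz_scores)   # the 10th of the 19 sorted scores
# Interpolated median: treats each score as an interval one point wide.
interpolated_median = statistics.median_grouped(quiz_scores, interval=1)
modes_q1 = statistics.multimode(quiz_scores)    # every most-frequent score

print(round(mean_q1, 2), middle_score, interpolated_median, modes_q1)

# Grades from Question 4 (10 students, 0-to-70 scale).
test_grades = [45, 64, 57, 34, 65, 49, 44, 58, 58, 62]
mean_q4 = statistics.mean(test_grades)
print(mean_q4)
```

`statistics.multimode` requires Python 3.8 or later; it returns all tied modes, which is what a bimodal distribution calls for.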
Module 6 Range, Variance, and Standard Deviation
Learning Objectives
Understand what dispersion means as it pertains to a set of data
Calculate the range of a set of data
Calculate the variance of a set of data
Understand how a standard deviation is obtained from the variance
Determine when the mean absolute deviation is used
Distinguish between descriptive (N) and inferential (n − 1) formulas for the variance and standard deviation
Module Summary
Dispersion is the extent to which the scores in a distribution are spread out or clustered together. Similar to measures of central tendency, measures of dispersion, or variability, are expressed as single values.
The range is the difference between the highest score and the lowest score in a data set. The range is the simplest measure of dispersion and is very insensitive to changes in the distribution; adding scores to the center of the distribution will not affect the range. The only way the range can be affected is by changing the most extreme scores. Due to these properties, the range is not used often.
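The insensitivity of the range can be illustrated with a small sketch. The grades are those from Question 4 of the preceding quiz; the added scores and the function name `score_range` are hypothetical, chosen only for illustration.

```python
def score_range(scores):
    """Range = highest score minus lowest score."""
    return max(scores) - min(scores)

# Grades from Question 4 of the preceding quiz.
grades = [45, 64, 57, 34, 65, 49, 44, 58, 58, 62]
original_range = score_range(grades)

# Hypothetical scores added near the center: the extremes are untouched.
with_central_scores = grades + [50, 52, 55]
central_range = score_range(with_central_scores)

# A hypothetical new highest score: the range changes.
with_new_extreme = grades + [70]
extreme_range = score_range(with_new_extreme)

print(original_range, central_range, extreme_range)
```

Only the third call gives a different value, because only it changes the highest or lowest score.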
Variance (s² for samples; σ² for populations) is the average squared distance of each score from the mean. The formula for variance is (depending on your instructor’s preference):
$$s^2 = \frac{\sum (X - \bar{X})^2}{N} \quad \text{or} \quad s^2 = \frac{\sum (X - \bar{X})^2}{n - 1}$$
where X is each score and X̄ is the mean of the scores.
The definition of variance becomes more understandable by reviewing the formula. The numerator states that for every score in the distribution, you should obtain the deviation score, which is the distance of each score from the mean. This is found by subtracting the mean from each score. All the deviation scores sum to 0 (this is a good way to check your work when finding variances). To proceed with obtaining the variance, you must square each deviation score. This removes the negative signs from the deviation scores and allows them to sum to a value other than 0. You then divide the sum of the squared deviation scores by the number of scores in the sample (or 1 fewer) to obtain the variance, or the average squared distance from the mean.
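The steps just described can be sketched in a few lines of Python. The data are the Question 4 grades from the preceding quiz, reused here only as an example; the variable names are illustrative.

```python
# Question 4 grades from the preceding quiz, reused as example data.
grades = [45, 64, 57, 34, 65, 49, 44, 58, 58, 62]
n = len(grades)
mean = sum(grades) / n

# Step 1: deviation score = score minus mean.
deviations = [x - mean for x in grades]

# Check: deviation scores sum to 0 (apart from floating-point rounding).
deviation_sum = sum(deviations)

# Step 2: square each deviation score, then add them up.
sum_of_squares = sum(d ** 2 for d in deviations)

# Step 3: divide by N (descriptive) or by n - 1 (inferential).
variance_descriptive = sum_of_squares / n
variance_inferential = sum_of_squares / (n - 1)

print(round(deviation_sum, 6))
print(round(variance_descriptive, 2), round(variance_inferential, 2))
```

The n − 1 version is always the larger of the two, because the same sum of squares is divided by a smaller number.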
Unfortunately, the average squared distance from the mean is difficult to interpret. The variance tells you the average distance from the mean in squared (area) units, which are hard to interpret, as opposed to the linear units we normally use. Linear distance is the distance in original units, or regular score points (e.g., the linear distance between 3 and 5 is 2). To convert the variance (in area units) back to linear units, you take the square root of the variance. This result is referred to as the standard deviation. The standard deviation (s for samples, σ for populations) is roughly the average amount a score differs (or deviates) from the mean. The formula for standard deviation is (depending on the formula used for the variance above):
$$s = \sqrt{\frac{\sum (X - \bar{X})^2}{N}} \quad \text{or} \quad s = \sqrt{\frac{\sum (X - \bar{X})^2}{n - 1}}$$
The standard deviation is a standardized measure of dispersion, meaning that it can be used when working with a specific type of distribution called the normal curve. The normal curve will be discussed at length in later chapters.
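As a brief illustration, the square-root step can be sketched with Python's standard `statistics` module, where `pvariance`/`pstdev` divide by N and `variance`/`stdev` divide by n − 1. The data are the Question 4 grades from the preceding quiz, used only as an example.

```python
import math
import statistics

# Question 4 grades from the preceding quiz, reused as example data.
grades = [45, 64, 57, 34, 65, 49, 44, 58, 58, 62]

# Standard deviation = square root of the variance.
sd_descriptive = math.sqrt(statistics.pvariance(grades))  # N in the denominator
sd_inferential = math.sqrt(statistics.variance(grades))   # n - 1 in the denominator

# The library's own standard deviation functions give the same values.
print(round(sd_descriptive, 2), round(statistics.pstdev(grades), 2))
print(round(sd_inferential, 2), round(statistics.stdev(grades), 2))
```

Both results are in grade points, the original units, which is what makes the standard deviation easier to interpret than the variance.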
Not all measures of linear dispersion are standardized. One unstandardized measure of dispersion is the mean absolute deviation, which uses the absolute value of each deviation score rather than the squaring technique. Although this provides a more intuitive measure of dispersion, the measure does not fall within known locations on the normal curve, which limits its use.
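A minimal sketch of the mean absolute deviation follows, set beside the descriptive standard deviation for comparison. The data are the Question 4 grades from the preceding quiz; the function name `mean_absolute_deviation` is illustrative and not taken from a library.

```python
import statistics

def mean_absolute_deviation(scores):
    """Average of the absolute deviation scores |score - mean|."""
    mean = sum(scores) / len(scores)
    return sum(abs(x - mean) for x in scores) / len(scores)

# Question 4 grades from the preceding quiz, reused as example data.
grades = [45, 64, 57, 34, 65, 49, 44, 58, 58, 62]

mad = mean_absolute_deviation(grades)
sd = statistics.pstdev(grades)   # descriptive standard deviation (N denominator)

print(round(mad, 2), round(sd, 2))
```

Both values are in the original score units, but they differ: squaring gives large deviations more weight, so the standard deviation comes out larger than the mean absolute deviation for these grades.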
A statistic (or parameter) that is used to measure dispersion in describing a sample is called a descriptive statistic. Alternatively, you can use sample statistics to make guesses about larger populations. This is commonly referred to as inferential statistics. You are using the sample data to make inferences (guesses) about the larger population.
When using sample variances and standard deviations to infer about a larger population, it is important to note that your estimate will not be precise. In other words, your sample variance likely will not equal the population variance. In fact, when N is used in the denominator, the sample variance tends, on average, to be smaller than the population variance. Placing n − 1 in the denominator of the variance and standard deviation formulas adjusts for this bias when estimating the population variance and standard deviation from a sample. This adjustment increases the variance (or standard deviation), which helps to better approximate the population variance (or standard deviation). Because of the inferential use of the standard deviation and variance that will be introduced later, some instructors prefer to use n − 1 in the denominator of even the descriptive standard deviation and variance formulas. Ask your instructor which formula is preferred.
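The bias can be seen directly in a tiny example. The sketch below assumes a hypothetical population of four scores and lists every possible sample of size 2 drawn with replacement, then averages the two versions of the sample variance across all of those samples. The population values and variable names are invented for illustration.

```python
from itertools import product
import statistics

# Hypothetical population of four scores (not from the text).
population = [2, 4, 6, 8]
sample_size = 2
population_variance = statistics.pvariance(population)

# Every possible sample of size 2, drawn with replacement (16 samples).
samples = list(product(population, repeat=sample_size))

# Average sample variance using N in the denominator.
avg_var_n = statistics.mean(statistics.pvariance(s) for s in samples)

# Average sample variance using n - 1 in the denominator.
avg_var_n_minus_1 = statistics.mean(statistics.variance(s) for s in samples)

print(population_variance, avg_var_n, avg_var_n_minus_1)
```

Across all samples, the N version averages half the population variance here, while the n − 1 version averages exactly the population variance. Individual samples can still land above or below it; the correction concerns the average over many samples.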
Computational Exercises
The following are the test grades for students in your European History class after the first test.
1 Find the deviation score for each test grade. What is the sum of these deviations?
2 Find the variance for the grades. Find the standard deviation for the grades.
3 How many standard deviations from the mean is the person with the highest grade? The person with the lowest score?
4 One of those who received a 99 is an exceptional student who turned in an extra-credit project that was worth an additional 10 points on the test. What would the variance and standard deviation be with this person’s correct grade? (Recalculate the measures of dispersion using this new high score.)
5 Reflecting back on your response to Question 4, which measure of dispersion had the largest change in absolute units?
6 How many of the original grades are between 1 and 2 SD above the mean?
The manager of a shoe store is interested in determining how many of each shoe size were sold the previous day. The store has made 10 sales of shoes with the following sizes:
7. Find the variance and standard deviation for these