Analysing Quantitative Data. Raymond A Kent
1.1 The interpretation of Cronbach’s coefficient alpha
Alpha has effectively become the measure of choice for establishing the reliability of multi-item scales. Its availability at the click of a mouse button in survey analysis programs like SPSS has almost certainly meant that it is commonly reported by researchers, but often with little understanding of what it means and what its limitations are.
Despite its wide use, there is little guidance in the literature (and none from Cronbach himself) as to what constitutes an ‘acceptable’ or ‘sufficient’ value for alpha to achieve. Most users of the statistic cite Nunnally’s (1978) recommendation that a value of 0.7 should be achieved. The coefficient is then treated as a kind of ‘test’; if alpha for any scale is greater than 0.7 then it is deemed to be sufficiently reliable. However, all those authors making recommendations about acceptable levels of alpha, including Nunnally, indicate that the desired degree of reliability is a function of the purpose of the research, for example whether it is exploratory or applied. Nunnally himself in 1978 suggested that for preliminary research ‘reliabilities of 0.70 or higher will suffice’. For ‘basic’ research, he suggests that ‘increasing reliabilities much beyond 0.80 is often wasteful of time and funds’ (Nunnally, 1978: 245). In contrast, for applied research, 0.80 ‘is not nearly high enough’. Where important decisions depend on the outcome of the measurement process a reliability of 0.90 ‘is the minimum that should be tolerated’.
None of Nunnally’s recommendations, however, have an empirical basis, a theoretical justification or an analytical rationale. Rather they seem to reflect either experience or intuition. Interestingly, Nunnally had changed his own recommendations from his 1967 edition of Psychometric Theory, which recommended that the minimally acceptable reliability for preliminary research should be in the range of 0.5 to 0.6.
Peterson (1994) reports the results of a study to ascertain the values of alpha actually obtained in articles and papers based on empirical work. From a sample of over 800 marketing and psychology-related journal articles, conference papers and some unpublished manuscripts, he reviewed all the alpha coefficients reported in each study, resulting in 4,286 coefficients covering a 33-year period. Reported coefficients ranged from 0.6 to 0.99 with a mean of 0.77. About 75 per cent were 0.7 or greater and 50 per cent were 0.8 or greater. Peterson found that reported alphas were not greatly affected by research design characteristics such as sample size, type of sample, number of scale categories, type of scale, mode of administration or type of research. One exception is that, during scale development, items are often eliminated if their presence depresses the value of alpha; not surprisingly, the alpha coefficients reported were significantly related to the number of items eliminated.
It is important to remember that alpha measures only internal consistency; if error factors associated with the passage of time are of concern to the researcher, then it will not be the most appropriate statistic. However, since alpha approximates the mean of all possible split-half reliabilities, it can be seen as a superior measure of scale equivalence. It is not, however, as is commonly supposed, an indication of unidimensionality. Alpha can, in fact, be quite high despite the presence of several different dimensions (Cortina, 1993).
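To make the statistic concrete, the short sketch below computes alpha directly from its standard formula, alpha = (k/(k − 1))(1 − sum of the item variances / variance of the total score), where k is the number of items. The function and the small data matrix are our own illustration and are not drawn from any of the studies discussed here.

```python
import numpy as np

def cronbach_alpha(items: np.ndarray) -> float:
    """Cronbach's alpha for a respondents-by-items matrix of scores.

    alpha = (k / (k - 1)) * (1 - sum of item variances / variance of total score)
    """
    k = items.shape[1]
    item_variances = items.var(axis=0, ddof=1)       # variance of each item
    total_variance = items.sum(axis=1).var(ddof=1)   # variance of the summed scale score
    return (k / (k - 1)) * (1 - item_variances.sum() / total_variance)

# Illustrative data: five respondents rating three items on a 1-5 scale.
scores = np.array([[4, 5, 4],
                   [3, 3, 2],
                   [5, 4, 5],
                   [2, 2, 3],
                   [4, 4, 4]])
print(round(cronbach_alpha(scores), 2))   # strong inter-item agreement gives a high alpha
```

Because the variance of the total score already contains the item covariances, items that covary strongly drive the ratio down and alpha up; the coefficient says nothing about whether those covariances arise from one dimension or from several.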
When interpreting alpha coefficients it is often forgotten, furthermore, that the values achieved are partly a function of the number of items. For a three-item scale with an alpha of 0.80, the average inter-item correlation is 0.57; for a ten-item scale with an alpha of 0.80 it is only 0.28. In evaluating, say, a 40-item scale, alpha will be relatively large simply because of the number of items, and the number of items is hardly a measure of scale quality in itself. When many items are pooled, internal consistency estimates are inevitably large and invariant, and therefore of limited value.
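The figures quoted above follow from the standardized form of alpha, α = kr̄/(1 + (k − 1)r̄), where k is the number of items and r̄ the average inter-item correlation. Rearranging gives r̄ = α/(k − α(k − 1)), which the short sketch below (our own illustration) uses to show how modest the average correlation can be on longer scales.

```python
def average_interitem_r(alpha: float, k: int) -> float:
    """Average inter-item correlation implied by a standardized alpha
    on a k-item scale: inverts alpha = k*r / (1 + (k - 1)*r)."""
    return alpha / (k - alpha * (k - 1))

for k in (3, 10, 40):
    print(k, round(average_interitem_r(0.80, k), 3))
# 3 items  -> 0.571
# 10 items -> 0.286
# 40 items -> 0.091
```

On a 40-item scale an average inter-item correlation below 0.1 is enough to push alpha to 0.80, which is why a high alpha on a long scale says little about the quality of the items themselves.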
Alpha, in short, should be used with some caution. It is appropriate only when the researcher requires a measure of internal consistency, and even then it is helpful only if the number of items is fairly limited. The value of alpha taken as ‘acceptable’ must be related to the purpose of the research, and even then used as an indication rather than as a ‘test’ to be passed at a fixed value. Furthermore, if researchers are concerned about dimensionality, then procedures like factor analysis are probably more appropriate. For a more recent review of Cronbach’s alpha see Lee and Hooley (2005).
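As a rough first check on dimensionality, short of a full factor analysis, the eigenvalues of the inter-item correlation matrix can be inspected: more than one eigenvalue well above 1 suggests that the items tap more than one dimension, however high alpha may be. The data in the sketch below are invented purely for illustration.

```python
import numpy as np

# Invented scores for six respondents on four items: items 1-2 and items 3-4
# are designed to form two distinct clusters.
scores = np.array([[5, 4, 1, 2],
                   [4, 5, 2, 1],
                   [2, 1, 5, 4],
                   [1, 2, 4, 5],
                   [5, 4, 5, 4],
                   [1, 2, 1, 2]])

corr = np.corrcoef(scores, rowvar=False)               # 4 x 4 inter-item correlation matrix
eigenvalues = np.sort(np.linalg.eigvalsh(corr))[::-1]  # largest first
print(eigenvalues.round(2))                            # two eigenvalues exceed 1 here,
                                                       # reflecting the two item clusters
```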
Key points and wider issues
In constructing data, there is the potential for error from a range of sources, including inappropriate specification of cases, biased selection of cases, random sampling error, poor data capture techniques, non-response, response error, interviewer error and measurement error. In reporting the results of social survey research, most researchers tend to focus on random sampling error. This is largely because they can calculate the probability of given sizes of such error based on assumptions about values, differences or covariations in the population of cases from which the sample was drawn, the size of the sample and the variability of the measures taken for a given property. These calculations are explained in Chapters 4–6. However, random sampling error is likely to be only a small proportion of overall error and will, furthermore, exist only if a random sample has been taken. Assael and Keon (1982), for example, estimated that random sampling error accounted for less than 5 per cent of total survey error. Some researchers will present evidence of measure reliability; less often, evidence of validity will be offered. There is a considerable literature on ways of calculating error of various kinds (for a review see Lessler and Kalsbeek, 1992). However, the calculations can get quite complex and most formulae assume metric data, taking the ‘mean square error’ as the key dependent variable, which is then decomposed into contributions from a range of sources of bias plus random error.
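The usual starting point for such formulae is the decomposition of the mean square error of an estimate into a random (sampling) variance component and a squared bias component. The expression below is the generic textbook form, given here for orientation rather than as Lessler and Kalsbeek's own notation:

```latex
\mathrm{MSE}(\hat{\theta}) \;=\; E\!\left[(\hat{\theta} - \theta)^2\right]
\;=\; \underbrace{\mathrm{Var}(\hat{\theta})}_{\text{random error}}
\;+\; \underbrace{\left[E(\hat{\theta}) - \theta\right]^2}_{\text{squared bias (non-sampling sources combined)}}
```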
In practice, researchers are more likely to focus on ways of minimizing the likelihood of error arising in the first place by adopting strategies and procedures to control its occurrence.
Error of various kinds can always be reduced by spending more money, for example on more interviewer training and supervision, on sophisticated random sampling techniques, on pilot testing or on getting a higher response rate. However, the reduction in error has to be traded off against the extra cost involved. Furthermore, errors are often interrelated so that attempts to reduce one kind of error may actually increase another; for example, minimizing the non-response errors by persuading more reluctant respondents may well increase response error. Non-sampling errors tend to be pervasive, not well behaved and do not decrease – indeed may increase – with the size of the sample. It is sometimes even difficult to see whether they cause under- or over-estimation of population characteristics. There is, in addition, the paradox that the more efficient the sample design is in controlling sampling fluctuations, the more important in proportion become bias and non-sampling errors.
The implications of this chapter for the alcohol marketing dataset
The alcohol marketing study is based on the construction of quantitative data using a combination of interviewer-completed and self-completed (but personally delivered) questionnaires in a cross-sectional survey research design. The resulting dataset is far from error free and contains many ambiguities. The researchers explain that they sent an information pack ‘to the homes of all second year (12–14 years) pupils attending schools in three local authority areas in the west of Scotland’ (Gordon et al., 2010a). This, they say, ‘generated a sample of 920 respondents’. Although they call this set of cases a ‘sample’, they do not say what population it is meant to represent. Nor do they say how many packs were sent out, so the response rate cannot be calculated. Accordingly, it is probably safest to regard the 920 cases as an incomplete census of all the pupils in the selected schools. It is almost certainly not a random sample, nor indeed really a sample of any kind. The date when these packs were sent is not stated, but from other sources it was probably between October 2006 and March 2007. The authors do not discuss why the 12–14-year age group is the appropriate one for this research or why the west of Scotland was chosen as its location.
In the original dataset, over 1,600 variables