Analysing Quantitative Data. Raymond A Kent
at face value as a ‘true’ record of respondents’ perceptions.
This is fine if, as researchers, what we wish to measure is perceived health status, perceived likelihood of drinking alcohol in the next year, or self-defined social class; however, different individuals will define these in different ways. Furthermore, for various reasons, the respondent may give a ‘wrong’ answer, for example because he or she recalls events incorrectly, is misinformed, or is exaggerating or fabricating. The respondent’s answers are also likely to be affected by mood, situational factors, willingness or reluctance to impart feelings or information, the wording of the question, the way it was put, or his or her understanding of it.
In these circumstances, researchers may seek more ‘objective’ measures – ones that are more readily observable or recordable and are more likely to be comparable across respondents. Researchers may, for example, take an indicator of the concept rather than the concept itself. Gross national product is commonly taken as an indicator of a country’s wealth; repeat purchase may be taken as an indicator of brand loyalty. Indirect measurement assumes that there is a degree of correspondence between the concept and the indicator deployed, but recognizes that the indicator is not the concept itself, only a reflection of it. Such measurement depends on the presumed relationships between observations and the concept of interest.
With concepts as complex as health status, social class or academic ability, asking just one question of respondents or taking just one measure of a nation’s wealth may be insufficient. Such concepts will tend to have several dimensions, aspects or facets. A separate measure may be taken of each aspect, and these are then combined to derive an overall measure. This may involve adding up recorded values and then taking an average, it might entail subtracting one value from another to derive differences, or it may mean using more complex statistical techniques. One of the most commonly used methods of derived measurement in the social sciences, particularly for measuring attitudes, is the summated rating scale. A rating is an ordered classification of a grade given by a respondent in a survey, such as ‘Excellent’, ‘Good’, ‘Fair’, ‘Poor’, ‘Very poor’. In order to be able to add up ratings for several aspects, a numerical value is assigned to each category, for example 5, 4, 3, 2 and 1. These can now be totalled to give an overall score.
Suppose 150 respondents in a survey are asked to rate their level of satisfaction with five aspects of a service from ‘Very satisfied’ to ‘Very dissatisfied’ and values are allocated as illustrated in Figure 1.2. Total scores can now be added up. The maximum score a customer can give is 5 on each aspect, totalling 25. The minimum total is 5. These totals can then be divided by five to give an average value for each case.
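The arithmetic of a summated rating scale can be sketched in a few lines of Python. The ratings below are hypothetical, not taken from the survey described above; they simply stand for one respondent’s coded answers on the five aspects.

```python
# Summated rating scale sketch (hypothetical data): one respondent's ratings
# on five service aspects, coded 5 = 'Very satisfied' down to 1 = 'Very dissatisfied'.

ratings = [5, 4, 4, 3, 5]

total = sum(ratings)            # lies between 5 (all 1s) and 25 (all 5s)
average = total / len(ratings)  # average value for this case

print(total, average)           # 21 4.2
```

In a real survey the same two lines of arithmetic would be applied to each of the 150 cases in turn.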
Figure 1.2 A summated rating scale
A particular version of a summated rating scale to measure attitudes was developed by Likert in 1932. Likert scales are based on getting respondents to indicate their degree of agreement or disagreement with a series of statements about the object or focus of the attitude. Usually, these are on five-point ratings from ‘Strongly agree’, through ‘Agree’, ‘Neither agree nor disagree’, ‘Disagree’ to ‘Strongly disagree’. Likert’s main concern was with unidimensionality, that is, ensuring that all the items measure the ‘same’ thing. Accordingly, he recommended a series of steps:
1 A large list of attitude statements, both positive and negative, concerning the object of the attitude is generated, usually based on the results of qualitative research.
2 The response categories are given codes, typically 5 for ‘Strongly agree’ down to 1 for ‘Strongly disagree’ (these may need to be reversed for negative statements). The assigned codes are then treated as numerical values.
3 The list is tested on a screening sample of 100–200 respondents representative of the larger group to be studied and a total is derived for each respondent by adding up the values.
4 Statements that do not discriminate (i.e. everybody gives the same or similar answers), or that do not correlate with the total, are discarded. This is a procedure Likert called ‘item analysis’ and it avoids cluttering up the final scale with items that are either irrelevant or inconsistent with the other items.
5 The remaining statements, such as the ones in Figure 1.3, are then administered to the main sample of respondents, usually as part of a wider questionnaire survey. The items in Figure 1.3 were generated by ‘converting’ the items in Figure 1.2 into a set of Likert items.
6 Totals are derived for each respondent. These totals can be used in a variety of ways that are explained in Chapter 6.
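The item-analysis step (step 4) can be sketched as follows. The responses, the choice of Pearson correlation with the item–total score, and the 0.3 cut-off are all illustrative assumptions for this sketch, not prescriptions from Likert; negative statements are assumed to have been reverse-coded already (step 2).

```python
# Sketch of Likert 'item analysis' on a (hypothetical) screening sample.
# Each row is one respondent's coded answers to four attitude items,
# with negative statements assumed already reverse-coded.

from statistics import mean, pstdev

responses = [
    [5, 4, 3, 5],
    [4, 4, 3, 4],
    [2, 1, 3, 2],
    [5, 5, 3, 4],
    [1, 2, 3, 1],
]

def pearson(x, y):
    """Pearson correlation coefficient, computed from first principles."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    denom = (sum((a - mx) ** 2 for a in x) * sum((b - my) ** 2 for b in y)) ** 0.5
    return cov / denom if denom else 0.0

totals = [sum(row) for row in responses]   # each respondent's summed score

for i in range(len(responses[0])):
    item = [row[i] for row in responses]
    if pstdev(item) == 0:
        # everybody gives the same answer: the item does not discriminate
        print(f"item {i}: discard (no variation)")
    else:
        r = pearson(item, totals)
        verdict = "keep" if r > 0.3 else "discard"   # 0.3 cut-off is illustrative
        print(f"item {i}: r = {r:.2f} -> {verdict}")
```

In this made-up data, item 2 receives the same answer from everyone and is discarded for failing to discriminate, while the remaining items correlate strongly with the totals and are kept.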
Figure 1.3 A Likert scale
There are a number of fairly fundamental problems with the Likert scale, and indeed all summated rating scales:
The totals for each respondent may be derived from very different combinations of response. Thus a score of 15 may be derived either by neither agreeing nor disagreeing with all the items or by strongly agreeing with some and strongly disagreeing with others. Consequently, it is often a good idea also to analyse the pattern of responses on an item-by-item basis.
The derived totals are in no sense absolute; they only show relative positions. There are no ‘units’ of agreement or disagreement and, as in this example, the minimum score is often not 0, so a respondent scoring 20 is not ‘twice’ as favourable as one scoring 10. All we can really say is that a score of 20 is ‘higher’ than a score of 10 or 15 or whatever.
The screening sample and subsequent item analysis are often omitted by researchers who simply generate the statements, probably derived from or based on previous tests, and go straight to the main sample. This is in many ways a pity, since leaving out scale refinement and purification will result in more ambiguous, less valid and less reliable instruments.
The process of summating the ratings is potentially imposing a number system that forces metric characteristics (see ‘values’ in the next section) onto concepts that may not inherently possess these characteristics.
Such scales assume that individuals lie along a single dimension from positive to negative.
The analysis of data from summated rating scales can be quite complex, yet is seldom discussed in books on research methodology or data analysis. It involves using a range of univariate, bivariate and, sometimes, multivariate techniques, which are considered in Chapters 4–6 of this book. For a specific discussion of an example of analysing such data, see Kent (2007: 323–8).
While derived measures create a single total for each case, multidimensional models, by contrast, allow for the possibility not only that there is more than one characteristic underlying a set of observations, but also that these cannot be summed or transposed into a derived score. One possibility is to generate a profile in which each dimension is described separately in order to present a more complete picture. Ratings can be used to calculate an average across cases separately for each item, so that, for Figure 1.3, for example, there would be an average score for ‘I get through very quickly’ and another for ‘I always get the right person’, and so on. There would be no attempt to add up scores for the five items.
A more common way of obtaining a profile is to use a semantic differential. These measures were developed by Osgood et al. (1957) and were designed originally to investigate the underlying structure of words, but have subsequently been adapted to measure, for example, images of organizations or the services they offer. They present characteristics as a series of opposites, which may be either bipolar, like ‘sweet’ through to ‘sour’, or monopolar, like ‘sweet’ through to ‘not sweet’. Respondents may be asked to indicate, usually on a seven-value rating, where between the two extremes their views lie, as illustrated
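The item-by-item profile just described can be sketched as follows. The response data and the last three item names are hypothetical; only the first two item wordings come from Figure 1.3.

```python
# Item-by-item profile sketch (hypothetical data): instead of summing each
# respondent's ratings into one score, average each item across all respondents.

responses = [
    [5, 3, 4, 2, 5],   # respondent 1's ratings on the five items
    [4, 2, 5, 3, 4],   # respondent 2
    [3, 3, 3, 4, 3],   # respondent 3
]

items = [
    "I get through very quickly",
    "I always get the right person",
    "item 3", "item 4", "item 5",   # placeholder names for the remaining items
]

# zip(*responses) transposes rows (cases) into columns (items)
profile = {name: sum(col) / len(col)
           for name, col in zip(items, zip(*responses))}

for name, avg in profile.items():
    print(f"{name}: {avg:.2f}")
```

Each item keeps its own average, so no information is lost by collapsing the five dimensions into a single total.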