Analysing Quantitative Data. Raymond A Kent
missing values and ‘Don’t know’ responses;
coding open-ended questions;
creating crisp or fuzzy set memberships from nominal, ordered category, ranked or metric measures.
Most of these tasks can be accomplished by using SPSS procedures, which are explained in Boxes 2.2–2.6 in this chapter.
Regrouping values
Where a nominal or ordered category variable has many values, particularly if the dataset contains fewer than 300 or so cases or if the frequencies in some of the categories are very small, it may make sense to combine categories: adjacent ones if the variable is ordered, or in a way that ‘makes sense’ if it is nominal. In the alcohol marketing survey, respondents were asked what they felt about alcohol adverts on the whole. Figure 2.4 shows that only 13 of the 920 respondents answered ‘I like alcohol adverts a lot’. It seems sensible to add these to the next category, ‘I like alcohol adverts a little’. To keep the value set balanced, disliking adverts a little and disliking them a lot can also be added together. The 17 who responded ‘Don’t know’ can be treated as missing values (the handling of missing values is considered later in this chapter). The resulting table is shown in Figure 2.5. Note that the Valid Percent is based on the 903 non-missing responses, that is, the 920 respondents less the 17 ‘Don’t know’ answers. Box 2.2 shows you how to do this in SPSS.
Figure 2.4 Liking of alcohol adverts
Figure 2.5 Liking of alcohol adverts collapsed to three categories
Box 2.2 Regrouping values in SPSS
If you need to transform a variable by regrouping categories, then it is the Recode procedure that you need. From the Value Labels box (which you can obtain from the Variable View screen) you can see that the codes allocated to the responses on how respondents felt about alcohol adverts as a whole are as shown in Figure 2.6. We need to add together codes 1 and 2, codes 4 and 5, and treat code 6 as a missing value. In SPSS, from the Menu bar, select Transform|Recode Into Different Variables. From the list of variables, select Likeads (‘How do you feel about alcohol adverts on the whole?’) and transfer it to the Input Variable –> Output Variable box. Now click on Old and New Values. We need codes 1 and 2 to become 1, so in the Old Value dialog area on the left click on the first Range radio button and enter 1 through 2. In the New Value dialog area on the right enter 1 in the Value box and click on Add. This instruction will now be entered into the Old –> New box. We want to change code 3 to 2, so click on the Value radio button under Old Value and enter 3. Now enter 2 under New Value and click on Add. We now want codes 4 and 5 to be 3. Click on the Range radio button and enter 4 through 5. Under New Value enter 3 and click on Add. Finally, click on the Value radio button under Old Value and enter 6, click on System-missing under New Value and click on Add. Only when all four instructions appear in the Old –> New box, click on Continue. Give the Output Variable a name in the Name box, for example Likeads3, and click on Change then OK. The new variable will appear as the last column.
Figure 2.6 Liking of alcohol adverts
To add value labels for the new variable, change to the Variable View. Click on the right corner of the Values cell in the row for the new variable to obtain the Value Labels dialog box. Enter 1 in Value and a label for the combined liking categories, for example Like, under Value Label and click on Add. Now enter 2 in Value and Neither under Value Label and click on Add. Finally, enter 3 in Value and a label for the combined disliking categories, for example Dislike, under Value Label and click on Add. Now click on Continue and OK. You can now check the result using the Analyze|Descriptive Statistics|Frequencies procedure.
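The logic of the recode in Box 2.2 can also be sketched outside SPSS. The Python fragment below is only an illustration under stated assumptions, not part of the SPSS procedure: the names RECODE_MAP, recode_likeads and valid_percent, the short list of responses, and the use of None to stand for a system-missing value are all invented for the example. Only the recode rules themselves (1 and 2 become 1, 3 becomes 2, 4 and 5 become 3, 6 becomes missing) come from the box.

```python
# Recode rules from Box 2.2: 1,2 -> 1; 3 -> 2; 4,5 -> 3; 6 -> missing.
# None stands in for an SPSS system-missing value (an assumption).
RECODE_MAP = {1: 1, 2: 1, 3: 2, 4: 3, 5: 3, 6: None}


def recode_likeads(code):
    """Return the three-category code, or None for a missing value."""
    # Codes outside 1-6 are also treated as missing (an assumption).
    return RECODE_MAP.get(code)


def valid_percent(values, category):
    """Percentage of the non-missing values that fall in one category."""
    valid = [v for v in values if v is not None]
    if not valid:
        return 0.0
    return 100.0 * sum(1 for v in valid if v == category) / len(valid)


# Hypothetical responses for illustration, not the survey data.
likeads = [1, 2, 3, 3, 4, 5, 6, 2]
likeads3 = [recode_likeads(code) for code in likeads]

print(likeads3)
print(valid_percent(likeads3, 1))
```

As with the Valid Percent column in SPSS, the percentage is calculated on the non-missing cases only, so the ‘Don’t know’ response drops out of the base.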
Creating class intervals
Class intervals are used to group together ranges of values on metric measures to enable the researcher to get an overview of the distribution. The intervals must be non-overlapping and as far as possible of the same width. In the alcohol marketing data, young people were asked whether or not they had seen adverts for alcohol in any of 16 different channels, and a new variable, Total number of channels seen (Totalseen), was created. Figure 2.7 shows the frequencies for each number of channels. Thus 33 respondents claimed to have seen no adverts for alcohol on any of the listed channels and one had seen them on all channels, but the majority, just over 50 per cent, said they had seen them on between four and seven channels.
The creation of class intervals may be approached in a number of ways. From the Cumulative Percent column, you can see that nearly 50 per cent (49.3 per cent) had seen such adverts on up to five channels. If the researcher wanted to create a two-value measure, then the 920 cases could be split into those who had seen alcohol adverts on up to five channels and those who had seen them on six or more channels, as shown in Figure 2.8. This is not a binary measure, because Six or more channels seen is not the ‘absence’ of membership of the category Up to five channels seen. It is really a two-value nominal measure – a dichotomy. Furthermore, the category Up to five channels seen includes the 33 who said they had not seen any. These do not sit well in any classification of the number of channels in which alcohol adverts were ‘seen’. It would make sense to keep the 33 as a separate category, as in the three-value measure in Figure 2.8. Another option is to treat the 33 as ‘missing values’ and exclude them from the table. The treatment of missing values is considered later in this chapter.
The two- and three-value solutions above can be considered not so much as ‘intervals’ as ordered category solutions. To keep the measure metric, the number of channels in which adverts for alcohol were seen could be grouped into several intervals of equal width, for example, of three channels: 0–2, 3–5, 6–8, 9–11, 12–14 and 15–16. The last interval covers only two channels: nothing is ever perfect and compromises have to be made. The usefulness of doing this is limited when there are few values to be grouped. However, if there were 100 or so values (as there would be with age in years for individuals), then grouping into class intervals (of perhaps 10 years) and obtaining a frequency distribution would enable the researcher to overview the entire distribution pattern in a simple table.
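The equal-width grouping just described can be sketched in a few lines of Python. This is a minimal illustration rather than an SPSS procedure: the function names class_interval and interval_frequencies and the short list of Totalseen values are hypothetical, and the code assumes whole-number values running from 0 to 16 with intervals three channels wide.

```python
def class_interval(value, width=3, maximum=16):
    """Return the (lower, upper) bounds of the interval containing value."""
    if not 0 <= value <= maximum:
        raise ValueError("value must lie between 0 and maximum")
    lower = (value // width) * width
    # The final interval is cut short at the maximum (15-16 here).
    upper = min(lower + width - 1, maximum)
    return lower, upper


def interval_frequencies(values, width=3, maximum=16):
    """Count how many values fall into each equal-width interval."""
    counts = {}
    # Pre-fill every interval so that empty ones still appear in the table.
    for lower in range(0, maximum + 1, width):
        counts[(lower, min(lower + width - 1, maximum))] = 0
    for value in values:
        counts[class_interval(value, width, maximum)] += 1
    return counts


# Hypothetical numbers of channels seen, not the survey data.
totalseen = [0, 2, 3, 5, 5, 6, 7, 11, 16]
for (low, high), count in interval_frequencies(totalseen).items():
    print(f"{low}-{high}: {count}")
```

Changing the width argument is a quick way of trying out different groupings to see how far the choice of interval affects the picture of the distribution.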
Figure 2.7 Number of channels on which adverts for alcohol have been seen
Figure 2.8 Figure 2.7 recoded into two- and three-value measures
The number, width and placing of the intervals are matters for researcher judgement and may be subject to trial and error, with the researcher trying out different groupings to see to what extent this may affect the results. To view a distribution a useful rule of thumb is to create between about 5 and 15 intervals. If there are outliers – values that are substantially different from the general body of values – then there may need to be open-ended classes at either or both ends of the table. This is quite a common way of dealing with extreme values, but it does mean that the width of the open-ended intervals is unknown. Creating class intervals in SPSS is explained in Box 2.3.
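Open-ended classes at both ends of a table can be sketched as follows. This Python fragment is an illustration only: the function name open_ended_class, the cut points and the ages are all hypothetical, and whole-number values are assumed so that a class such as 20–29 can be labelled by its lowest and highest value.

```python
from bisect import bisect_right


def open_ended_class(value, cut_points):
    """Label the class containing value, with open-ended classes at both ends.

    cut_points must be sorted in ascending order. Values below the first
    cut point fall into 'Under ...' and values at or above the last cut
    point fall into '... or more'.
    """
    position = bisect_right(cut_points, value)
    if position == 0:
        return f"Under {cut_points[0]}"
    if position == len(cut_points):
        return f"{cut_points[-1]} or more"
    # Closed class: from one cut point up to just below the next.
    return f"{cut_points[position - 1]}-{cut_points[position] - 1}"


# Hypothetical cut points giving 10-year intervals with open ends.
cuts = [20, 30, 40, 50, 60]
ages = [17, 20, 29, 45, 59, 60, 93]  # hypothetical ages, including outliers
print([open_ended_class(age, cuts) for age in ages])
```

The outlying ages of 17 and 93 are absorbed into the two open-ended classes, whose widths are left unstated.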
Box 2.3 Creating class intervals in SPSS
To create class intervals in SPSS you need the Recode procedure again. In Old and New Values enter the ranges 0–5 and 6–16, giving these the new codes of 1 and 2 for the two-value solution, and 0–2, 3–5, 6–8, and so on for the metric class interval solution, again giving each interval a new code. Note that ‘Up to 5 channels seen’ and ‘Six or more channels seen’ are the new researcher-defined values, but what SPSS is calling New Values are the codes that ‘stand for’ the new categories.
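The distinction between the new codes and the categories they stand for can be sketched in Python. The fragment is an illustration under stated assumptions: the function recode_ranges and the rule lists are hypothetical names, the codes and labels chosen for the three-value solution (including ‘No channels seen’ and ‘One to five channels seen’) are assumptions rather than the contents of Figure 2.8, and None stands for a value that no range covers.

```python
def recode_ranges(value, rules):
    """Return the new code of the first (low, high, code) rule covering value."""
    for low, high, new_code in rules:
        if low <= value <= high:
            return new_code
    return None  # no range covers this value


# Two-value solution from Box 2.3: 0-5 becomes 1, 6-16 becomes 2.
TWO_VALUE = [(0, 5, 1), (6, 16, 2)]
TWO_VALUE_LABELS = {1: "Up to 5 channels seen", 2: "Six or more channels seen"}

# Three-value solution keeping those who saw none separate
# (codes and labels are assumptions made for this sketch).
THREE_VALUE = [(0, 0, 0), (1, 5, 1), (6, 16, 2)]
THREE_VALUE_LABELS = {
    0: "No channels seen",
    1: "One to five channels seen",
    2: "Six or more channels seen",
}

# Hypothetical numbers of channels seen, not the survey data.
totalseen = [0, 4, 5, 6, 16]
two = [recode_ranges(v, TWO_VALUE) for v in totalseen]
three = [recode_ranges(v, THREE_VALUE) for v in totalseen]

print(two, [TWO_VALUE_LABELS[c] for c in two])
print(three, [THREE_VALUE_LABELS[c] for c in three])
```

The numbers in the rule lists play the part of the SPSS New Values, while the label dictionaries hold the researcher-defined categories.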
Computing totals
It is sometimes helpful to add together the values recorded either for two or more variables within cases or for