Analysing Quantitative Data. Raymond A Kent
range of software for assembling data is briefly reviewed at the end of Chapter 4. Some researchers like to assemble data first in a spreadsheet such as Excel before exporting them to a survey analysis package. Some survey analysis packages allow the researcher to enter the data by clicking the appropriate box against the answer given on an electronic version of the questionnaire. In the background, the software creates the data matrix. The researcher then needs neither to pre-code response categories on the questionnaire nor to post-code the questionnaires once they have been completed. With online surveys, the data matrix is built up automatically as respondents submit their completed questionnaires. Box 2.1 explains how to enter data into SPSS, the package used throughout this text.
Mistakes can, of course, occur in the data entry process. Any entry that falls outside the range of codes allocated to a given variable will quickly show up in a table. Provided the questionnaires have been numbered, it is a simple matter to find, in the data matrix, the number of the respondent for whom a wrong code was entered and then to look up on the questionnaire what the code should have been. In some packages, any entry outside a specified range is flagged as the data are being keyed in. To detect erroneous codes that fall inside the specified range, the data may be subjected to double-entry validation. In effect, the data are entered twice, usually by two different people; the computer flags any discrepancies between the two entries, and these can then be checked against the original questionnaire.
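The two checks just described can be sketched in a few lines of Python. This is a minimal illustration, not how SPSS or any other package implements them. It assumes each case is a dictionary with a respondent number under the hypothetical key "id", and the allowed codes in ALLOWED_CODES are made up for the example; in practice they would come from the codebook. The range check catches codes that cannot be valid, while comparing two independent entries catches slips that happen to fall inside the valid range.

```python
# Hypothetical allowed codes per variable (an assumption for illustration;
# real code ranges come from the project's codebook).
ALLOWED_CODES = {
    "Drinkstatus": {1, 2},
    "Gender": {1, 2},
}


def out_of_range(rows, allowed):
    """Return (respondent number, variable, value) for each code outside its allowed set."""
    problems = []
    for row in rows:
        for var, codes in allowed.items():
            value = row.get(var)
            if value not in codes:
                problems.append((row["id"], var, value))
    return problems


def double_entry_discrepancies(first, second):
    """Compare two independent entries of the same questionnaires, matched on respondent number."""
    first_by_id = {row["id"]: row for row in first}
    second_by_id = {row["id"]: row for row in second}
    discrepancies = []
    for rid in sorted(first_by_id.keys() | second_by_id.keys()):
        a = first_by_id.get(rid, {})
        b = second_by_id.get(rid, {})
        for var in sorted((a.keys() | b.keys()) - {"id"}):
            if a.get(var) != b.get(var):
                # Flag for checking against the original questionnaire.
                discrepancies.append((rid, var, a.get(var), b.get(var)))
    return discrepancies
```

For example, if the first keyer types Gender 2 for respondent 1 and the second types 1, only the double-entry comparison catches it, because both codes are in range; a code of 7 for Drinkstatus entered identically by both keyers is caught only by the range check.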
Box 2.1 Entering data into IBM SPSS Statistics
This software is one of the most widely used survey analysis computer programs and focuses exclusively on variable-based statistical analysis. It has gone through many versions, the latest being version 22.0. This text uses version 19.0. For more information and a free download of a demo version visit www.spss.com/spss.
You can almost certainly obtain access to SPSS by logging on to your own university or college network applications. The main window you will work in is the Data Editor window (Figure 2.2). Before you reach the Data Editor, however, you need to tell SPSS what you want to do: open an existing data source, type in new data, and so on. If you are entering data for the first time, select the Type in data radio button. The Data Editor offers a data matrix whose rows represent cases (no row should contain data on more than one case) and whose columns represent the variables. The cells created by the intersection of rows and columns contain the values of the variables for each case. No cell can contain more than one value.
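The rules of the data matrix (one case per row, one variable per column, one value per cell) can be made concrete with a small Python class. This is only a sketch of the structure, not of anything inside SPSS; the class name DataMatrix and its methods are hypothetical, chosen for the illustration.

```python
from dataclasses import dataclass, field


@dataclass
class DataMatrix:
    """Rows are cases, columns are variables; each cell holds exactly one value."""
    variables: list[str]
    rows: list[list] = field(default_factory=list)

    def add_case(self, values):
        # A row holds one case, so it must supply one value per variable.
        if len(values) != len(self.variables):
            raise ValueError(
                f"expected {len(self.variables)} values, got {len(values)}"
            )
        # No cell may contain more than one value.
        if any(isinstance(v, (list, tuple, set, dict)) for v in values):
            raise ValueError("each cell must hold a single value")
        self.rows.append(list(values))

    def cell(self, case_index, variable):
        # A cell is where a case (row) meets a variable (column).
        return self.rows[case_index][self.variables.index(variable)]

    def column(self, variable):
        # All values of one variable, one per case.
        j = self.variables.index(variable)
        return [row[j] for row in self.rows]
```

Creating DataMatrix(["Drinkstatus", "Gender"]) and adding the cases [1, 2] and [2, 1] gives a two-by-two matrix; trying to add a case with a missing value, or with two values in one cell, is rejected.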
Figure 2.2 IBM SPSS Statistics Data Editor screen
The alcohol marketing dataset was introduced in Chapter 1. The full dataset consists of 61 properties and 920 cases, and is available at https://study.sagepub.com/kent. As an exercise in data entry, instead of attempting to enter over 40,000 values, try entering just the eight key variables for the first 12 cases, which are shown in Figure 3.1 in the next chapter.
The key dependent variables relate to drinking behaviour. Those included here are Drinkstatus (whether or not they have ever had a proper alcoholic drink), Intentions (whether they think they will drink alcohol at any time in the next year) and Initiation (how old they were when they had their first proper alcoholic drink). Three key independent variables have been picked out: Totalaware (the number of alcohol marketing channels seen), Totalinvolve (the number of marketing involvements) and Likeads (how they feel about alcohol ads as a whole). Finally, there are two demographics: Gender (male or female) and Socialclass (A, B, C1, C2, D, E).
Before entering any data, it is advisable first to name the variables (if you do not, you will be supplied with exciting names like var00001 and var00002). Names must begin with a letter and must not end with a full stop/period. They must contain no spaces, and they must not be one of the reserved keywords that SPSS uses as special computing terms, for example and, not, eq, by and all.
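The naming rules above can be expressed as a simple checker, which is handy when planning names for many variables before typing them in. This Python sketch tests only the rules stated in this paragraph, plus the full set of SPSS reserved keywords (ALL, AND, BY, EQ, GE, GT, LE, LT, NE, NOT, OR, TO, WITH). It is not a complete implementation of SPSS's naming rules, which also cover matters such as maximum length and which other characters are permitted; the function name name_problems is hypothetical.

```python
import re

# SPSS reserved keywords; a variable name may not match any of them (in any case).
RESERVED = {"ALL", "AND", "BY", "EQ", "GE", "GT", "LE", "LT", "NE", "NOT", "OR", "TO", "WITH"}


def name_problems(name):
    """Return a list of the stated naming rules that `name` breaks (empty if none)."""
    problems = []
    if not name or not name[0].isalpha():
        problems.append("must begin with a letter")
    if name.endswith("."):
        problems.append("must not end with a full stop")
    if re.search(r"\s", name):
        problems.append("must not contain spaces")
    if name.upper() in RESERVED:
        problems.append("is a reserved keyword")
    return problems
```

Names such as Drinkstatus and Socialclass pass every check, whereas Drink status, by or 2ndwave would each be reported with the rule they break.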
To enter variable names, click on the Variable View tab at the bottom left of the Data Editor window. Each variable now occupies a row rather than a column, as it does in Data View. Enter the name of the first variable, Drinkstatus, in the top left box. As soon as you press Enter, the down arrow or the right arrow, the remaining boxes will be filled with default settings, except for Label. It is always better to enter labels, since these are what are printed in your tables and graphs. Labels can be the wording of the question asked or a fuller explanation of the variable. For Drinkstatus, for example, you could type Have you ever had a proper alcoholic drink? Then enter labels for the remaining seven variables.
For categorical variables, you will also need to enter Values and Value Labels. Click on the appropriate cell under Values, then click on the little blue box at the right of the cell. This opens the Value Labels dialog box. Enter a code value (e.g. 1) and its label (e.g. Yes), then click Add. Repeat for each value. Note that, in SPSS, the allocated codes are called ‘values’, while the values in words are ‘labels’.
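The distinction between values (codes) and labels (words) is essentially a lookup table from one to the other, applied whenever results are printed. The Python sketch below shows the idea for a frequency count. The coding 1 = Yes follows the example above, but 2 = No is an assumption made for the illustration, and the helper labelled_counts is hypothetical, not part of SPSS.

```python
from collections import Counter

# Codes ('values') mapped to words ('labels'); 2 = "No" is assumed for illustration.
VALUE_LABELS = {
    "Drinkstatus": {1: "Yes", 2: "No"},
}


def labelled_counts(codes, labels):
    """Count each code and report it under its label, in code order."""
    counts = Counter(codes)
    # Codes without a label are reported as the raw code, so unexpected
    # entries stay visible in the output rather than disappearing.
    return {labels.get(code, code): n for code, n in sorted(counts.items())}
```

Counting the codes 1, 2, 1, 1, 9 with the Drinkstatus labels reports Yes three times and No once, and leaves the stray code 9 visible as 9, which is exactly the kind of out-of-range entry that shows up in a table.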
The default under Decimals is usually two decimal places. If all the variables are integers, it is worth changing this to 0: simply click on the cell and use the little down arrow to reduce it to zero. Under Measure, you can enter the correct type of measure: Nominal, Ordinal or Scale. Note that Nominal includes binary measures, Ordinal does not distinguish between ordered category and ranked measures, and Scale refers to what were called metric measures in Chapter 1. The default setting is Scale. Changing Measure to Nominal or Ordinal as appropriate has three benefits. It places a useful icon against each listed variable, making the variables easy to spot; it makes a difference to some operations in SPSS; and it forces you to think about what kind of measure each variable attains.
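Why the level of measurement matters can be illustrated with a choice of average. The Python sketch below is a hypothetical helper, not a description of what SPSS does internally: it returns the mode for nominal measures, the (lower) median for ordinal measures, so that the answer is always one of the observed categories, and the mean for scale measures. The three levels mirror the Nominal, Ordinal and Scale settings under Measure.

```python
from enum import Enum
from statistics import mean, median_low, mode


class Measure(Enum):
    NOMINAL = "Nominal"  # includes binary measures
    ORDINAL = "Ordinal"  # ordered categories and ranks alike
    SCALE = "Scale"      # metric measures


def summarise(values, measure):
    """Pick an average that respects the level of measurement (illustrative only)."""
    if measure is Measure.NOMINAL:
        return ("mode", mode(values))
    if measure is Measure.ORDINAL:
        # median_low never averages two categories, so the result is an observed value.
        return ("median", median_low(values))
    return ("mean", mean(values))
```

The same four numbers 1, 2, 3, 4 give a median of 2 if they are treated as ordinal categories but a mean of 2.5 if they are treated as a scale measure, which is one reason it pays to set Measure correctly.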
To copy variable information such as value labels to another variable, just use Edit|Copy and Edit|Paste. SPSS does not have an automatic timed backup facility, so you need to save your work regularly as you go along, using the File|Save sequence as for other Windows applications. The first time you save, you will be given the Save As dialog box; make sure this shows the drive and folder you want. File|Exit will take you out of SPSS and back to the Windows desktop. If there are unsaved changes, SPSS will ask whether you want to save them before exiting. Always save any changes to your data; saving output is less important because it can quickly be recreated. The completed Variable View is shown in Figure 2.3.
Figure 2.3 The completed Variable View
Key points and wider issues
The careful checking, editing, coding and assembly of data should never be neglected. If poor-quality data are entered into the analysis, then no matter how sophisticated the statistical techniques applied, the result will be a poor or untrustworthy analysis. This is often summed up in the phrase ‘Garbage In, Garbage Out’ (GIGO). Checking, editing, coding, assembling and entering data into a data matrix commonly take up a substantial proportion of the time an analyst spends on the data.
Transforming
Before beginning data analysis, the researcher may wish to transform some of the variables in a number of ways that might include:
regrouping values on a nominal or ordered category measure to create fewer categories;
creating class intervals from metric measures;
computing totals or other scores from combinations of several values of variables;
treating groups of variables as a single multiple response question;
upgrading