Analysing Quantitative Data. Raymond A Kent


the transformation of some of the properties in a dataset may involve one or more of a number of activities, including, for variables, regrouping values on a nominal or ordered category measure to create fewer categories, creating class intervals from metric measures, computing totals or other scores from combinations of several variables, treating groups of variables as a multiple response question, upgrading or downgrading measures, handling missing values and ‘Don’t know’ responses, or coding open-ended questions;

that, for set memberships, transformations may entail creating crisp sets or fuzzy sets from existing variables;

how survey analysis software like SPSS can be used for assembling data and assisting in data transformations;

that many of the codes used in the original alcohol marketing dataset were illogical or inconsistent and many data transformations were needed before analysis could begin.

Introduction

Before quantitative data constructed by researchers can be analysed using techniques appropriate to the objectives of the research, they need to be prepared for analysis. In their raw form, captured data will consist of stacks of completed paper questionnaires or diaries, entries into an online questionnaire or records made by researchers themselves. Before statistical techniques can be applied to the data, they will need to undergo many of the various processes listed in Figure 2.1.

Figure 2.1 The data preparation process

Checking and editing

Most quantitative data in the social sciences will have been captured using some form of questionnaire, whether paper or electronic, so the first step involves checking these for usability as they are received. Questionnaires returned by interviewers or by respondents may be unusable for a number of reasons, for example:

an unacceptable number of the questions that are appropriate for a given respondent have not been answered;

the pattern of responses indicates that the respondent either did not understand or did not follow instructions – for example, questions that require a single response may have been given two or more responses;

one or more pages are physically missing;

the questionnaire has been answered by somebody who is not a member of the survey population;

the questionnaire was received too late to include in the analysis.

The number of returned but unusable questionnaires will generally be quite small, so discarding these will not usually be a problem. If the number of discards is large, the researcher will need to check whether they are in any obvious way different from those that are usable. Either way the number of discards should be declared in the report of the research.

Editing involves verifying response consistency and accuracy, making necessary corrections and deciding whether some or all parts of a questionnaire should be discarded. Some of these checks include:

logical checks, for example a 16-year-old claiming to have a PhD, a male claiming to have had an epidural at the birth of his last child, or a respondent who has answered a series of questions about his or her usage of a particular product when other responses indicate that he or she does not possess one or has never used one;

range checks, for example a code of 8 is entered when there are only six response categories for that question;

response set checks, for example somebody has ‘strongly agreed’ with all the items on a Likert scale.

Where a question fails a logical check, the pattern of responses in the rest of the questionnaire may be scrutinized to find the most likely explanation for the apparent inconsistency. Range check failures may be referred back to the original respondent. Response set checks may indicate that the respondent is simply being frivolous and the questionnaire may be discarded.
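The range and response set checks described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: each questionnaire is represented as a dictionary of question names and entered codes, and the question names (`q1`, `a1` to `a4`) and the permitted code ranges are hypothetical, not taken from any dataset discussed in this book. The response set check here flags any respondent who gave the same answer to every Likert item, which includes the case of ‘strongly agreeing’ with all of them.

```python
# Hypothetical set-up: q1 has six response categories, coded 1 to 6,
# and a1..a4 are the items of a five-point Likert scale.
VALID_CODES = {"q1": range(1, 7)}
LIKERT_ITEMS = ["a1", "a2", "a3", "a4"]


def range_check_failures(record, valid_codes):
    """Return the questions whose entered code is outside the permitted codes."""
    return [
        question
        for question, allowed in valid_codes.items()
        if record.get(question) is not None and record[question] not in allowed
    ]


def has_response_set(record, items):
    """True if every item was answered and all the answers are identical."""
    answers = [record.get(item) for item in items]
    return None not in answers and len(set(answers)) == 1


# A code of 8 has been entered for q1, and every Likert item is coded 5.
respondent = {"q1": 8, "a1": 5, "a2": 5, "a3": 5, "a4": 5}

print(range_check_failures(respondent, VALID_CODES))  # ['q1']
print(has_response_set(respondent, LIKERT_ITEMS))     # True
```

A flagged questionnaire is not automatically discarded; the flags simply identify cases that the researcher should scrutinize, refer back to the respondent or set aside.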

If questionnaires are checked and edited as they come in, it may still be possible to remedy fieldwork deficiencies before they turn into a major problem. If problems are traced to particular interviewers, for example, then they can be replaced or asked to undergo further training. It may be possible to re-contact respondents to seek clarification or completion before the last date on which data can be processed. If this is not feasible, the researcher can treat these as missing values, that is, treat the questions involved as unanswered. This may be suitable if the unsatisfactory responses are not key properties and the number of questions concerned is quite small. If this is not the case, values may be imputed. How this can be done is discussed later in the chapter. The alternative is to discard the questionnaire.

Coding

As was explained in Chapter 1, data analysis software usually requires that all the values to be entered are either already numerical (as in age = 23) or are given a numeric code that ‘stands for’ a value expressed in words. Binary variables will normally be coded either as 1 or 0 or as 1 or 2. The categories for nominal and ordered category variables will generally be numbered 1, 2, 3, 4, and so on. Note that, for ordered categories, it makes sense to give the highest code number to the highest or most positive value, as in Figures 1.2 and 1.3 in Chapter 1. Metric data already have numerical values that can be entered directly, for example the number of units of alcohol consumed last week as 10.7. Some, perhaps all, of the categorical responses on a questionnaire will have been pre-coded, that is, they are already numbered on the questionnaire. If not, they need to be coded afterwards by the researcher.
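As a rough illustration of these coding conventions, the following Python sketch converts verbal responses into numeric codes. The variable names, category labels and code numbers are all assumptions made for the example; they are not the codes used in any questionnaire referred to in this book. The ordered category scheme gives the highest code to the most positive value, and the metric value is entered as it stands.

```python
# Hypothetical coding schemes for illustration only.
SEX_CODES = {"Male": 1, "Female": 2}  # binary variable coded 1 or 2
AGREEMENT_CODES = {                   # ordered categories, highest code = most positive
    "Strongly disagree": 1,
    "Disagree": 2,
    "Neither agree nor disagree": 3,
    "Agree": 4,
    "Strongly agree": 5,
}


def code_value(response, scheme):
    """Return the numeric code for a verbal response, or None if it is not in the scheme."""
    return scheme.get(response)


raw = {"sex": "Female", "agreement": "Agree", "units_last_week": 10.7}

coded = {
    "sex": code_value(raw["sex"], SEX_CODES),
    "agreement": code_value(raw["agreement"], AGREEMENT_CODES),
    "units_last_week": raw["units_last_week"],  # metric value entered directly
}

print(coded)  # {'sex': 2, 'agreement': 4, 'units_last_week': 10.7}
```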

Qualitative responses to open-ended questions will normally be classified into categories, which are then coded. The categories developed should meet the minimum requirements for a binary or nominal measure, namely, they should be exhaustive, mutually exclusive and refer to a single dimension. If, however, most of the spaces left for text in the questionnaire have been left empty, it may not be worthwhile doing this. Some pre-coded questions may have an ‘Other, please specify’ category, in which case some further coding may be worthwhile.

If a question is unanswered, the researcher, when entering data into a survey analysis program, can record a missing value or enter a code for the reason, such as ‘Not applicable’ or ‘Refused to answer’. For multiple response questions where the respondent can indicate more than one category as applicable, each response category will need to be treated as a separate variable, and will usually be coded as 1 if the category is ticked and 0 or 2 if not. The treatment of open-ended and multiple response questions is considered in more detail later in this chapter.
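The treatment of a multiple response question as a set of separate variables might be sketched as follows. The response categories, the `drinks_` prefix and the codes 98 and 99 for ‘Not applicable’ and ‘Refused to answer’ are hypothetical choices for the example; a real project would use whatever codes the researcher has defined. Here an unticked category is coded 0, although 2 would serve equally well.

```python
# Hypothetical response categories for a 'tick all that apply' question.
CATEGORIES = ["beer", "wine", "spirits"]

# Hypothetical reason codes for unanswered questions.
MISSING_CODES = {"not_applicable": 98, "refused": 99}


def expand_multiple_response(ticked, categories, prefix="drinks_"):
    """Create one variable per category: 1 if the category was ticked, 0 if not."""
    return {prefix + category: (1 if category in ticked else 0) for category in categories}


def code_unanswered(reason=None):
    """Return the reason code for an unanswered question, or None for a plain missing value."""
    return MISSING_CODES.get(reason)


print(expand_multiple_response({"beer", "spirits"}, CATEGORIES))
# {'drinks_beer': 1, 'drinks_wine': 0, 'drinks_spirits': 1}

print(code_unanswered("refused"))  # 99
print(code_unanswered())           # None
```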

In large-scale projects, and particularly when data entry is to be performed by a number of people or by subcontractors, researchers will often develop a codebook, which lists all the variable names (which are short, one-word identifiers), the variable labels (which are more extended descriptions of the variables and which appear as table or chart headings), the response categories used and the code numbers assigned. This means that any researcher can work on the dataset irrespective of whether or not they were involved in the project in its formative stages. Codebooks, however, are not always needed. Survey analysis packages like SPSS record all this information as part of the data matrix.
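The contents of a codebook, as described above, can be pictured as a simple data structure. The sketch below is one possible representation in Python, not the format used by SPSS or any other package; the class name `CodebookEntry`, the helper `decode` and the example variables are all hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class CodebookEntry:
    name: str   # short, one-word variable name
    label: str  # extended description, used for table or chart headings
    codes: dict = field(default_factory=dict)  # code number -> response category


# Hypothetical codebook with one categorical and one metric variable.
entries = [
    CodebookEntry("sex", "Sex of respondent", {1: "Male", 2: "Female"}),
    CodebookEntry("units", "Units of alcohol consumed last week"),
]
codebook = {entry.name: entry for entry in entries}


def decode(variable, code, codebook):
    """Return the response category for a code, or the value itself if no codes are defined."""
    return codebook[variable].codes.get(code, code)


print(codebook["sex"].label)            # Sex of respondent
print(decode("sex", 2, codebook))       # Female
print(decode("units", 10.7, codebook))  # 10.7
```

Keeping the names, labels and codes together in one place means that anyone entering or analysing the data works from the same definitions.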

Assembling

Data assembly means gathering together all the checked, edited and coded questionnaires, diaries or other forms of record, and entering the values for each variable for each case into data analysis software. This is usually achieved in a framework of rows and columns for storing the data called a data matrix. Data matrices are explained in more detail in Chapter
