Analysing Quantitative Data. Raymond A Kent
All scientists, whether of the physical or social variety, study cases. They look for patterns or causal mechanisms among a set of entities under investigation – entities that they deem to be ‘instances’ of the phenomena being studied. Cases may be ‘micro’ entities like individuals, families, social groups of individuals or households, or they may be ‘macro’ entities like organizations, governments or nation-states. Alternatively, cases might be places, geographical areas, objects or events. Usually, these entities have specified characteristics in common, for example they are ‘alcoholics’, ‘single-person households’, ‘small businesses’, ‘private hospitals’, ‘faith schools’ or ‘democratic countries’. Cases may be legally recognized entities like a joint stock company, they may be researcher constructs, like a ‘small to medium enterprise’, or they may be a statistical artefact such as a health sector generated from a cluster analysis. Cases may be complex adaptive systems like nation-states; they may have open boundaries, emergent properties and transformational potential like social networks.
Cases are always located in time and in a geographical area; the time may be a moment or a period. Cases become sites, or potential sites, for the capture of data, and while each case is inherently separate from other cases and unique in its multifarious characteristics, it is nevertheless treated as sufficiently ‘similar’ to other cases in its display of the characteristics being studied. Researchers define cases by outlining the selected characteristics that distinguish between members and non-members of a research population, but often they do so without offering a rationale for the times and places selected or for the characteristics selected, and without discussing what other characteristics may be important. Researchers are apt to use terms like ‘case’, ‘number of cases’ or ‘case study’, but often without any consideration of what a ‘case’ is, how it may be defined for any particular piece of research, why its creation or selection is relevant and appropriate to the objectives of the research, and why a single case, a small number of cases, a sample of cases or an entire population of cases selected for study is suitable.
Researchers who are constructing quantitative data tend to treat cases as ‘real’ objects and, furthermore, to accept established entities like households, neighbourhoods, small to medium enterprises, organizations, firms or cities as cases. These are seen to exist independently of any particular research effort; however, the researcher still needs to define them, for example what counts as a ‘household’ or a ‘city’. Researchers seldom problematize the nature and boundaries of their cases; for example, what does the concept of ‘a social network’ mean in the context of this research and can there be degrees of membership of this category? Researchers quite frequently assume that entities like organizations or groups of individuals can ‘behave’ and ‘think’ like individuals. Companies in particular are often described as ‘doing’ things like putting up prices or reacting to competitors in particular ways. Universities or hospitals may even be seen to have attitudes and opinions. Organizations can now even be sued for ‘corporate manslaughter’. Committees within the organization, perhaps as a result of a vote or some other process, ‘decide’ to do things or to take a view on an issue.
In survey research the cases are typically individuals. This is understandable since it is only an individual who can respond to survey questions; furthermore, the individual as an entity is easy to define – we know what an ‘individual’ is and what its boundaries are – but researchers still need to define what sorts of individual are members of the research population, for example ‘current prisoners serving sentences of five or more years in the UK in December 2013’. The individual acts as the ‘base’ unit against which values are recorded and from aggregations of which higher order units can be constructed and described. Thus the characteristics of ‘departments’ in an organization might be derived from aggregations of characteristics of the individuals who are its members. If the researcher calculates an average age or proportion male then these are features of the higher order unit, not of any individual cases. If the researcher is interested in making comparisons between departments in an organization, then implicitly, if not explicitly, departments now become cases.
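The move from individuals to higher order units can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the records, the field names (`department`, `age`, `sex`) and the function `aggregate_by_department` are all hypothetical and do not come from any real study.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical individual-level records: each dict is one case (an individual).
individuals = [
    {"id": 1, "department": "Sales", "age": 30, "sex": "male"},
    {"id": 2, "department": "Sales", "age": 40, "sex": "female"},
    {"id": 3, "department": "Finance", "age": 50, "sex": "male"},
    {"id": 4, "department": "Finance", "age": 44, "sex": "male"},
    {"id": 5, "department": "Finance", "age": 26, "sex": "female"},
]


def aggregate_by_department(records):
    """Derive department-level properties from individual-level records."""
    # Group the individuals by the department they belong to.
    groups = defaultdict(list)
    for person in records:
        groups[person["department"]].append(person)

    # Each department now becomes a case with its own aggregate properties.
    departments = {}
    for name, members in groups.items():
        n_male = sum(1 for p in members if p["sex"] == "male")
        departments[name] = {
            "n_members": len(members),
            "mean_age": mean(p["age"] for p in members),
            "proportion_male": n_male / len(members),
        }
    return departments


departments = aggregate_by_department(individuals)
for name, properties in departments.items():
    print(name, properties)
```

The resulting mean age and proportion male describe the department, not any individual in it. Nothing in this aggregation captures the ‘global’ properties of a department, such as where its boundaries lie.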
The problem with this aggregative approach is that it may tempt the researcher to avoid considering issues relating to holistic, ‘global’ properties of cases, for example the boundaries between departments in an organization, the ‘reality’ or ‘entity-ness’ of households, along with issues of emergent properties that are not just the sum of individual characteristics. There are all kinds of assumptions – conceptual, theoretical and meta-theoretical – that are involved here. In short, the process of case creation or ‘casing’ – to use Ragin’s (1992) term – and the resultant ‘case-ness’ of the entity are not just philosophical issues, but have very practical implications for the process of data construction.
Any quantitative research will focus its attention on a specified set of cases. The set may be small or large; it may be a sample of cases, an entire population of cases or an incomplete attempt to engage the entire population. In any one particular piece of research there may be more than one type of case; indeed cases may be nested in hierarchical fashion – individuals within departments, within organizations, within industries, within regions, for example. When researchers talk about the ‘number of cases’ in their research, they may be referring to a variety of different things: the total population of cases in the defined set (but whose precise number may not be known), the number of cases selected for the sample, the number of usable returns in a survey, the number for which values have been successfully recorded for a given property, or the number used in a particular calculation or statistical procedure.
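The different meanings of ‘number of cases’ can be made concrete with a small Python sketch. The eight records below are invented for illustration, and the field names (`sampled`, `usable_return`, `age`, `income`) and the function `count_cases` are hypothetical. The last count assumes a calculation that needs both age and income, so cases missing either value are dropped.

```python
# Hypothetical population of eight cases; None marks a value not recorded.
population = [
    {"id": 1, "sampled": True, "usable_return": True, "age": 34, "income": 21000},
    {"id": 2, "sampled": True, "usable_return": True, "age": None, "income": 30000},
    {"id": 3, "sampled": True, "usable_return": True, "age": 51, "income": None},
    {"id": 4, "sampled": True, "usable_return": False, "age": None, "income": None},
    {"id": 5, "sampled": True, "usable_return": True, "age": 45, "income": 28000},
    {"id": 6, "sampled": False, "usable_return": False, "age": None, "income": None},
    {"id": 7, "sampled": False, "usable_return": False, "age": None, "income": None},
    {"id": 8, "sampled": True, "usable_return": True, "age": 29, "income": 19000},
]


def count_cases(records):
    """Return several different 'numbers of cases' for the same study."""
    sampled = [r for r in records if r["sampled"]]
    usable = [r for r in sampled if r["usable_return"]]
    age_recorded = [r for r in usable if r["age"] is not None]
    # Cases usable in a calculation that needs both age and income.
    complete = [r for r in age_recorded if r["income"] is not None]
    return {
        "population": len(records),
        "sampled": len(sampled),
        "usable_returns": len(usable),
        "age_recorded": len(age_recorded),
        "used_in_calculation": len(complete),
    }


counts = count_cases(population)
print(counts)
```

With these invented records the five counts are 8, 6, 5, 4 and 3, which is why a report of the ‘number of cases’ needs to say which count is meant.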
Particularly where macro entities are involved, the number of cases available for study may be severely limited, for example there may only be 20 or 30 organizations of the type the researcher wishes to study. Researchers may, alternatively, decide to restrict severely the number of cases to study so that each one can be investigated in detail. Cases may well be selected in ways that meet the purposes of the researcher, for example because they are ones to which researchers have access, that they consider to be the most important, that represent extremes in terms of the phenomena or outcomes being studied, that are typical or that correspond to the needs of particular research designs. Thus cases may be selected because they are most similar except in terms of the outcome and factors that theory suggests may be important in contributing to that outcome. Extraneous factors are thus ‘controlled’ as far as possible and differences in outcome may be attributed to the remaining factors that differentiate the cases. Alternatively, the strategy may be to select cases that are most different in the belief that contrasting cases will eliminate factors that are not linked to identical outcomes. Case selection may be an iterative process whereby new cases may be added or others dropped in the ongoing piece of research when new hypotheses arise that may be confirmed with more similar, or falsified with more different, cases.
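The ‘most similar’ selection strategy described above can be illustrated with a rough Python sketch. It is a simplification under stated assumptions, not an established procedure: the four cases, their properties (`region`, `size`, `funding`, `outcome`) and the function `most_similar_pairs` are hypothetical, and ‘similar’ is taken to mean identical values on every extraneous factor.

```python
from itertools import combinations

# Hypothetical cases with two extraneous factors, one theoretical factor
# and an outcome.
cases = {
    "A": {"region": "north", "size": "large", "funding": "public", "outcome": "fail"},
    "B": {"region": "north", "size": "large", "funding": "private", "outcome": "succeed"},
    "C": {"region": "south", "size": "small", "funding": "public", "outcome": "fail"},
    "D": {"region": "south", "size": "large", "funding": "private", "outcome": "succeed"},
}

EXTRANEOUS = ["region", "size"]  # factors to be 'controlled'
THEORETICAL = ["funding"]        # factors that theory suggests may matter


def most_similar_pairs(cases, extraneous, theoretical, outcome="outcome"):
    """Find pairs of cases alike on extraneous factors but differing in outcome."""
    pairs = []
    for (name_a, a), (name_b, b) in combinations(cases.items(), 2):
        same_background = all(a[f] == b[f] for f in extraneous)
        different_outcome = a[outcome] != b[outcome]
        if same_background and different_outcome:
            # Record which theoretical factors differentiate the two cases.
            differing = [f for f in theoretical if a[f] != b[f]]
            pairs.append((name_a, name_b, differing))
    return pairs


pairs = most_similar_pairs(cases, EXTRANEOUS, THEORETICAL)
print(pairs)
```

Only cases A and B share the same region and size while differing in outcome, so the difference in outcome might be attributed to the one remaining factor that separates them, here the hypothetical `funding`.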
Larger numbers of cases may be used in research, perhaps because the population of cases is quite large and researchers want to study them all, or because researchers want a sample that is, as far as possible, representative of the population of cases from which the sample was drawn. This might enable researchers to draw conclusions about the population from evidence in the sample. Cases may be selected using randomized techniques which are the equivalent of drawing names out of a hat and are independent of human judgement, or, in a survey, interviewers may be asked to make a selection, but according to agreed rules, or they may be given quotas of types of respondent to fill.
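The contrast between randomized selection and quota selection can be sketched in Python. This is a minimal illustration, assuming a hypothetical population of 100 individuals and invented function names (`simple_random_sample`, `quota_sample`); real quota sampling depends on interviewer judgement, which is crudely represented here by the order in which cases are encountered.

```python
import random

# Hypothetical population of 100 individuals, alternating by sex.
population = [
    {"id": i, "sex": "female" if i % 2 == 0 else "male"} for i in range(1, 101)
]


def simple_random_sample(cases, n, seed=None):
    """Select n cases without replacement, like drawing names out of a hat."""
    rng = random.Random(seed)
    return rng.sample(cases, n)


def quota_sample(cases_in_encounter_order, quotas, key):
    """Accept cases in the order encountered until every quota is filled."""
    remaining = dict(quotas)
    selected = []
    for case in cases_in_encounter_order:
        category = case[key]
        if remaining.get(category, 0) > 0:
            selected.append(case)
            remaining[category] -= 1
        if not any(remaining.values()):
            break  # all quotas have been filled
    return selected


srs = simple_random_sample(population, 10, seed=1)
quota = quota_sample(population, {"female": 5, "male": 5}, key="sex")
print([c["id"] for c in srs])
print([c["id"] for c in quota])
```

The random sample gives every case the same chance of selection, whereas the quota sample guarantees the required mix of respondents but takes whichever cases happen to be encountered first.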
For descriptive studies, it may be sufficient or appropriate to select only those cases that have experienced the phenomenon being studied, for example ‘failing’ schools. However, for research that goes on to examine relationships between properties, it will be usual to include both cases that have experienced the phenomenon and cases that have not. If, for example, a researcher wishes to study patterns of political mobilization within ethnic groups, then he or she will include countries that have and have not had this experience. However, does the wider population include all countries on the globe, or just those countries in which political mobilization is thought possible? If the latter, is that all countries in which there is ethnic diversity, countries where there is also ethnic inequality or countries where there is in addition a non-repressive political system?
Key