Introduction to Experimental Linguistics. Sandrine Zufferey
understand a second language superficially, and people capable of perfectly mastering both languages. A corollary of such a definition would be that very few people would belong to the monolingual group, since many people are familiar with one or more languages, apart from their mother tongue. At the other extreme, we could consider as belonging to the bilingual group only those with a perfect command of their second language. In this case, the bilingual group would be more homogeneous, in the sense that all those belonging to it would have similar competences in their second language. But this definition raises additional questions: what do we mean by perfect command and how can command be measured? This example illustrates the need to clearly and precisely define the variables investigated in a research process. This definition procedure is called the operationalization of a research question. It represents a crucial phase in quantitative research, and we will discuss it in depth in Chapter 2.
To summarize, quantitative research aims to investigate the relationship between two or more variables. To do this, it starts from a hypothesis and defines the measures used for studying the chosen variables. Then, it relies on numerical data collected from a large number of people and analyzes these data using statistical tests, in order to generalize the results.
1.1.2. Observational research and experimental research
Quantitative approaches in linguistics draw an important distinction between observational research and experimental research. The first example of a research tool, the questionnaire, is frequently used in linguistics to collect data in a quantitative manner. A questionnaire is a set of questions aimed at collecting different types of information about speakers, such as personal characteristics, their use of certain words or linguistic structures, or their point of view about certain linguistic phenomena. Let us now imagine that you wish to know whether there is a difference in the way that French speakers from France, Belgium and Switzerland refer to yogurt. As Avanzi (2019) did, you could directly ask a large number of French, Belgian and Swiss people to tell you which of the two possible names, yaourt or yoghourt, they use on a daily basis. By counting the responses of more than 7,000 people, Avanzi showed that the form yaourt is mainly used in France, whereas it is never used in Switzerland, where yoghourt is the only form in use. In Belgium, the choice between yaourt and yoghourt varies from region to region.
In a slightly different way, instead of relying on the answers of people in a questionnaire, you could use linguistic data retrieved from natural productions and carry out a corpus study. In such studies, linguistic productions in the form of texts, audio or video recordings are used with the aim of counting the occurrences of a word, a grammatical form or any other linguistic characteristic. In order to research the uses of yaourt or yoghourt in France, Belgium and Switzerland, it would first be necessary to select corpora comprising linguistic productions collected from these different regions. These data could come from French, Belgian and Swiss newspapers, for example. The number of occurrences of each form could be counted in each corpus and then compared, in order to reveal differences in the use of these forms from country to country.
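The counting step of such a corpus study can be sketched in a few lines of Python. This is only a minimal illustration under stated assumptions: the two "corpora" below are short invented texts, not real newspaper data, and the helper name count_forms is hypothetical. A real study would load much larger corpora from files and would need to handle further spelling variants.

```python
import re
from collections import Counter

# Toy corpora invented for illustration only; they are NOT real newspaper data.
corpora = {
    "France": "Un yaourt nature. Le yaourt aux fruits. Un yoghourt importé.",
    "Switzerland": "Un yoghourt nature. Le yoghourt aux fruits.",
}

FORMS = ("yaourt", "yoghourt")


def count_forms(text, forms=FORMS):
    """Count whole-word, case-insensitive occurrences of each form.

    A plural -s is accepted (e.g. "yaourts"); other variants are ignored.
    """
    tokens = re.findall(r"\w+", text.lower())
    counts = Counter()
    for token in tokens:
        for form in forms:
            if token in (form, form + "s"):
                counts[form] += 1
    # Return every form, including those with zero occurrences.
    return {form: counts[form] for form in forms}


results = {region: count_forms(text) for region, text in corpora.items()}

for region, counts in results.items():
    total = sum(counts.values())
    # Relative frequencies make corpora of different sizes comparable.
    shares = {form: n / total for form, n in counts.items()} if total else {}
    print(region, counts, shares)
```

Comparing relative frequencies rather than raw counts matters as soon as the corpora differ in size, which is almost always the case with real newspaper collections.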
Another way of studying quantitative data is to examine the link between two variables. Let us imagine that you wish to study the relation between learners’ age and their ability to acquire a second language. Extensive research has already been devoted to this topic and suggests that the older people are when learning a second language, the more difficult it is for them to reach a high level of proficiency (see DeKeyser and Larson-Hall (2005) for a review). In order to confirm (or refute) this hypothesis, you could test a large number of people who start learning a language at different ages and measure their language proficiency after a certain period of time. In this example, the first variable, the age when learning begins, is a quantitative variable. Likewise, the second variable, language proficiency, can be measured quantitatively using a language test. Using an appropriate statistical test, it is possible to show the existence of a link between these two variables. This type of procedure is called correlational research and unveils the degree of dependence between two variables, which is called correlation. In the case of our example, if age plays a role in second language acquisition, the correlation obtained by our test would show that the older a person is when the process of learning a language begins, the lower their mastery of the language will be after a certain learning period.
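To make the notion of correlation more concrete, here is a minimal sketch that computes one common measure of it, Pearson's correlation coefficient r. The text above does not name a specific coefficient, so this choice is an assumption made for the example. The ages and test scores are invented values, not results from any study, and the function name pearson_r is hypothetical.

```python
import math


def pearson_r(xs, ys):
    """Return Pearson's correlation coefficient for two equal-length sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length with at least 2 values")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of cross-products and sums of squared deviations from the means.
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined when a variable is constant")
    return cov / math.sqrt(var_x * var_y)


# Invented data for illustration: age at onset of learning (years)
# and score on a hypothetical proficiency test (0-100).
ages = [4, 8, 12, 16, 20, 30]
scores = [95, 90, 82, 75, 70, 60]

r = pearson_r(ages, scores)
print(f"r = {r:.2f}")
```

The coefficient ranges from -1 to 1. With these invented values it is strongly negative, which corresponds to the pattern described above: the later learning begins, the lower the proficiency score. On its own, r only describes the strength of the association; a statistical test is still needed to decide whether it can be generalized beyond the sample.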
The various studies described above correspond to research based on data observation. This type of research is generally used when, for practical or ethical reasons, it is necessary to observe variables from the outside. In this type of research, researchers do not interfere with the object of study, but observe the relationship between two variables at a given moment. As a consequence, the results of an observational study must be kept at a descriptive level, since it is not possible to infer a causal relation between two variables. In our example of a correlational study, the age when learning begins is related to language proficiency, but it is not possible to state that an increase in age is the cause of the decrease in language proficiency. Other variables not considered in our research might also explain the relationship between the variables examined. We could imagine, for example, that the context in which second language acquisition takes place is not the same depending on the age when the learning process begins. It is likely that when young children learn a second language, this takes place within a family setting, where parents may speak different languages or a different language from that of the external environment. When older people start learning a language, it is probable that they grew up in a monolingual linguistic environment and later discovered a second language at school, or when moving to another country, for example. The type of linguistic exchanges may also differ depending on age, as well as the motivation to learn, cognitive skills or many other variables. These external variables that are left aside during research are called confounding variables and are related to the two variables examined, age and language proficiency. It could be that language learning conditions, rather than age itself, account for the differences in language levels. Since it is impossible to separate the effect of the variables examined from that of confounding variables, research based on the observation of data should not draw conclusions about a causal relation between two variables.
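The role of a confounding variable can be illustrated with a small, deliberately artificial sketch. All values below are invented, and the scenario is an assumption built for the example: proficiency depends only on the learning context (home or school), never on age within a context, yet age and context are tied together because the early learners all learned at home. Pearson's r is again used as the correlation measure, and pearson_r is a hypothetical helper.

```python
import math


def pearson_r(xs, ys):
    """Pearson's correlation coefficient for two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


# Invented data: (age at onset, proficiency score) for two learning contexts.
# Within each context, the scores are unrelated to age by construction.
home = [(2, 90), (3, 94), (4, 94), (5, 90)]        # early learners, family setting
school = [(12, 70), (13, 74), (14, 74), (15, 70)]  # late learners, classroom


def correlation(pairs):
    ages, scores = zip(*pairs)
    return pearson_r(ages, scores)


r_home = correlation(home)
r_school = correlation(school)
r_pooled = correlation(home + school)  # context ignored, as in a naive analysis

print(f"home:   r = {r_home:.2f}")
print(f"school: r = {r_school:.2f}")
print(f"pooled: r = {r_pooled:.2f}")
```

When the context is ignored, the pooled data show a strong negative correlation between age and proficiency, although within each context the correlation is zero. An observer who had only recorded age and proficiency could not tell this situation apart from one in which age itself lowers proficiency, which is why observational results should stay at a descriptive level.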
In order to determine a causal relation between two variables, it is necessary to exclude any confounding variable. By using experimental methodology, the variables of interest can be manipulated to determine what effect a variable has on another variable, regardless of other possibly interfering variables. In other words, rather than observing natural data, the experimental methodology defines the conditions under which a phenomenon could be observed and then sets up an experiment in which these conditions can be manipulated, in order to measure their influence on the phenomenon under investigation. In the rest of this chapter, we will describe in more detail the various characteristics of experimental research.
1.2. Characteristics of experimental research
In this section, we will first stress the fact that experimental research must be based on a research question that makes it possible to formulate precise hypotheses. We will then see that in order to empirically assess a hypothesis, an experimental study must manipulate variables of interest while controlling other variables, which may influence the outcome of the experiment. Finally, we will discuss some methodological aspects of data collection, so that they can be analyzed through the use of statistics. These points will be elaborated in detail in the chapters dedicated to these different aspects.
1.2.1. Research questions and hypotheses
We have already emphasized that experimental research is part of a scientific process. It builds on existing knowledge in a research field and aims to increase such knowledge by studying a research question generated on the basis of an existing theory. A scientific research question identifies the potential cause of a phenomenon and postulates a cause-and-effect relation between this cause and the phenomenon. For example, the question “how do we understand a text?” is not a research question, as it is too vague. Such a question corresponds to a general research topic, from which many