Handbook of Web Surveys. Jelke Bethlehem
a self‐completed questionnaire; thus, for example, self‐completed surveys are better at capturing sensitive topics. If the questionnaire is long, self‐completion, especially in web mode, does not favor high participation rates or small measurement error. This becomes especially critical in mobile web surveys, where participation is likely to be disturbed by other activities or tasks the respondent is engaged in.
The second step is Metadata definition. Recent literature and empirical research have drawn significant attention to metadata, that is, the set of information describing the survey (elements for evaluating data quality, such as the list and description of the variables, the target population, the sample units, and so forth). Metadata refer to variables and activities occurring in every step of the process. The International Organization for Standardization (ISO) defines metadata as “data describing and defining other data in a certain context.” Here, the context is the set of conditions under which data collection and processing take place. Defining metadata is an important step, recommended for all surveys regardless of mode. The details to consider in mobile web surveys (and in web‐only surveys) differ from those of traditional surveys because of the characteristics of the various steps and sub‐steps. Metadata are mainly semantic; a poor or imprecise metadata description affects the quality of the questionnaire and can cause errors in the answers (questions not correctly understood, or questions that each respondent could interpret in a different way). Metadata are useful for selecting appropriate sampling techniques and for guiding and evaluating the procedures of the survey process. The so‐called metadata database contains the metadata, which complement the data database. See Example 3.1. Data users should consult the metadata to be aware of the quality and meaning of the data they are going to use. The surveyor, in turn, should record the metadata precisely, because accuracy is crucial for carrying out the survey process correctly and usefully. For example, if a surveyor draws a probability‐based sample from a largely incomplete sampling list, it will not be possible to obtain highly accurate results from the inference process.
Thus, information about the sampling frame's characteristics (coverage and reference time of the frame, i.e., is it up to date or an old one?) is an important element of quality. Another example is a poor or insufficient description of a variable; in this case, each respondent could assign a different meaning to the variable, producing results of little meaning. From the user's point of view, this implies both using low‐quality data and being unable to understand what the variable actually means and how to interpret the data.
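As a rough illustration of how such metadata might be kept in machine‐readable form, the sketch below records a few frame characteristics next to the variable descriptions and flags obvious problems. The names `FrameMetadata`, `SurveyMetadata`, and `quality_warnings`, as well as the thresholds (90% coverage, a two‐year‐old frame), are hypothetical choices made for this example; they do not come from any standard, and the figures are invented.

```python
from dataclasses import dataclass, field
from datetime import date
from typing import Dict, List


@dataclass
class FrameMetadata:
    """Characteristics of the sampling frame (hypothetical structure)."""
    name: str
    units_in_frame: int        # units listed in the frame
    units_in_population: int   # best available count of the target population
    reference_date: date       # when the frame was last updated

    def coverage_rate(self) -> float:
        """Share of the target population that the frame lists."""
        return self.units_in_frame / self.units_in_population


@dataclass
class SurveyMetadata:
    """Minimal survey-level metadata: population, frame, variable descriptions."""
    target_population: str
    frame: FrameMetadata
    variables: Dict[str, str] = field(default_factory=dict)


def quality_warnings(meta: SurveyMetadata, today: date,
                     min_coverage: float = 0.90,
                     max_age_days: int = 730) -> List[str]:
    """Return human-readable warnings; thresholds are illustrative assumptions."""
    warnings = []
    if meta.frame.coverage_rate() < min_coverage:
        warnings.append("frame coverage below threshold")
    if (today - meta.frame.reference_date).days > max_age_days:
        warnings.append("frame reference date is old")
    for var_name, description in meta.variables.items():
        if not description.strip():
            warnings.append(f"variable '{var_name}' has no description")
    return warnings


# Invented figures, for illustration only.
meta = SurveyMetadata(
    target_population="Residents aged 15-74",
    frame=FrameMetadata("e-mail register", 6200, 10000, date(2015, 1, 1)),
    variables={"employee": "Person who works for an employer and is paid.",
               "income": ""},
)
warnings = quality_warnings(meta, today=date(2018, 1, 1))
for w in warnings:
    print(w)
```

In this invented case all three checks fire: the frame lists only 62% of the population, it is three years old, and one variable has no description.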
EXAMPLE 3.1 A metadata database: variables definitions
The Eurostat website offers a metadata database that includes the Euro‐SDMX Metadata Structure (ESMS; SDMX is a set of international standards for the exchange of statistical information between organizations), classifications, legislation and methodology, concepts and definitions (CODED, Eurostat's Concepts and Definitions Database, and other online glossaries relating to survey statistics), a glossary, national methodologies, and standard code lists.
One section reports the description of the variables in different sources. Variable descriptions are detailed explanations of the meaning the researcher intends for each variable in the questionnaire, and they are one example of basic metadata. For example, for the purposes of the Labour Force Survey, the following definition is used: “Employees are defined as persons who work for a public or private employer and who receive compensation in the form of wages, salaries, fees, gratuities, payment by results or payment in kind; non‐conscripted members of the armed forces are also included.”
In structural business statistics, employees are defined as “those persons who work for an employer and who have an employment contract and receive compensation in the form of wages, salaries, fees, gratuities, piecework pay or remuneration in kind.”
Furthermore, a worker is a wage or salary earner of a particular unit if he or she receives a wage or salary from the unit, regardless of where he or she works (in or outside the production unit). A worker from a temporary employment agency is considered to be an employee of the temporary employment agency, not of the unit (customer) in which he or she works. The metadata state that “employees include part‐time workers, seasonal workers, persons on strike or on short‐term leave, but excludes those persons on long‐term leave. Employees does not include voluntary workers.”
If the variable is not precisely defined, respondents could complete the questionnaire according to different concepts; one could exclude part‐time workers, whereas another could include them. In such a survey, therefore, measurement error would arise, or a high number of nonresponses to the specific question (item nonresponse) would emerge because of the unclear variable definition.
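The effect of an imprecise definition can be made concrete with a small sketch. The function `is_employee` encodes only the inclusion and exclusion rules quoted above (part‐time workers count; persons on long‐term leave and voluntary workers do not), while `is_employee_misread` mimics a respondent who wrongly leaves out part‐time workers. The `Worker` class, both function names, and the staff list are hypothetical and invented for this illustration.

```python
from dataclasses import dataclass


@dataclass
class Worker:
    """Hypothetical record for one person working in a unit."""
    part_time: bool = False
    on_long_term_leave: bool = False
    volunteer: bool = False


def is_employee(w: Worker) -> bool:
    """Apply the quoted rules: volunteers and long-term leave are excluded."""
    if w.volunteer or w.on_long_term_leave:
        return False
    return True  # full-time and part-time workers both count


def is_employee_misread(w: Worker) -> bool:
    """A respondent's misreading that also drops part-time workers."""
    return is_employee(w) and not w.part_time


# Invented staff list: 2 full-time, 2 part-time, 1 on long-term leave, 1 volunteer.
staff = [
    Worker(), Worker(),
    Worker(part_time=True), Worker(part_time=True),
    Worker(on_long_term_leave=True),
    Worker(volunteer=True),
]

count_defined = sum(is_employee(w) for w in staff)
count_misread = sum(is_employee_misread(w) for w in staff)
print(count_defined, count_misread)  # prints "4 2"
```

The same unit reports four employees under the stated definition and two under the misreading; a precise variable description in the metadata is what removes this kind of discrepancy.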
Most statistical organizations, both national statistical institutes (NSIs) and various research bodies, present a section on metadata. Research institutes, market research companies, and every business or institution collecting survey data should provide a clear metadata definition and communicate it to users.
The third step is Designing the mobile web or web‐only survey; this may be broken down into sub‐steps. The first two basic sub‐steps to consider are as follows: (1) decide if the study should be experimental or observational, and (2) decide the mode of data collection.
Regarding sub‐step 1, Decide if the study should be experimental or observational, keep in mind that an experimental study tries to capture how different factors affect the results; the task is to highlight relationships between the factors and the results (or outputs). There is no special interest in estimating the values of the variables at the target population level. Observational studies, by contrast, aim at estimating the values of the variables at the target population level. Designing a survey for an experimental study does not necessarily require a probability‐based sample, because the main task is to obtain a sort of case study for investigating causal relationships. For example, in‐the‐moment surveys, which typically reach the respondent on a smartphone, often lack probability sampling criteria; thus, they mostly have the value of experimental studies, capturing emotions and opinions while the individual is experiencing some event or action. Observational studies focus on estimating variables; therefore, a probability‐based sampling technique is crucial, and the sampling design is an important step. Socioeconomic surveys generally aim at producing estimates for the whole target population.
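To see why known selection probabilities matter for population‐level estimation, consider the minimal sketch below. It uses the Horvitz–Thompson estimator, a standard design‐based estimator that the text does not name and that is introduced here only as an illustration: each sampled value is weighted by the inverse of its inclusion probability. The function name, the population size, and the sample values are hypothetical.

```python
from typing import Sequence


def horvitz_thompson_total(values: Sequence[float],
                           inclusion_probs: Sequence[float]) -> float:
    """Estimate a population total as the sum of y_i / pi_i over the sample."""
    if len(values) != len(inclusion_probs):
        raise ValueError("values and inclusion_probs must have the same length")
    if any(not (0.0 < p <= 1.0) for p in inclusion_probs):
        raise ValueError("inclusion probabilities must lie in (0, 1]")
    return sum(y / p for y, p in zip(values, inclusion_probs))


# Invented example: simple random sample of n = 5 from N = 1000 persons;
# 1 means "employed", 0 means "not employed".
N = 1000
sample = [1, 0, 1, 1, 0]
n = len(sample)
pi = [n / N] * n  # under simple random sampling every unit has pi = n / N

est_total = horvitz_thompson_total(sample, pi)  # about 600 employed persons
est_share = est_total / N                       # about 0.6
print(round(est_total), round(est_share, 2))
```

Without a probability‐based design the inclusion probabilities are unknown, so weights of this kind cannot be formed; this is the sense in which the sampling design is crucial for observational studies.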
Sub‐step 2, Decide the mode of data collection, is important because it establishes whether organizing a survey only via the web (or mobile web) is feasible and effective. The criteria for mode selection are general and relate to several aspects of the research environment and the specific issue. A mobile web survey, or a web survey in general, is self‐completed and therefore fits extremely well with sensitive research questions and/or short and simple questionnaires. Complex questionnaires may also be implemented efficiently; this happens particularly in official statistics. In this case, however, web‐only or mobile web is more problematic, and mixed‐mode is preferable. One relevant constraint on the use of a probability‐based mobile web survey is the availability of an adequate sampling frame. Thus, the choice of mode depends on many factors, a critical one being sampling frame availability. An inadequate mode choice may give rise to many types of errors (coverage errors, extremely high unit nonresponse, and so forth), leading to a poor‐quality result. Because adequate mode selection is so important for a probability‐based mobile web survey, Thorsdottir and Biffignandi present a flowchart showing the major steps that drive the mode choice. Figure 3.2 presents the actions and decisions to be undertaken when choosing the mode of data collection.
Moving to Figure 3.2, when selecting the mode, the first problem is deciding whether it is possible to draw a probability‐based sample from the target population under study, i.e., whether everyone has an e‐mail address.
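The mode‐selection criteria discussed in the text can be sketched as a small decision function. This is not a reproduction of Figure 3.2; it encodes only the considerations mentioned above (frame availability, questionnaire complexity, topic sensitivity), and the class `StudyProfile`, the function `suggest_mode`, and the returned labels are hypothetical.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class StudyProfile:
    """Hypothetical summary of the study; field names are assumptions."""
    needs_population_estimates: bool     # observational study
    has_adequate_frame: bool             # e.g. e-mail addresses for everyone
    long_or_complex_questionnaire: bool
    sensitive_topic: bool


def suggest_mode(p: StudyProfile) -> Tuple[str, List[str]]:
    """Return a suggested mode and the reasons behind it (illustrative rules)."""
    reasons = []
    if p.sensitive_topic:
        reasons.append("self-completion suits sensitive topics")
    # First question: can a probability-based sample be drawn at all?
    if p.needs_population_estimates and not p.has_adequate_frame:
        reasons.append("no adequate frame for a probability-based web sample")
        return "mixed-mode", reasons
    if p.long_or_complex_questionnaire:
        reasons.append("long or complex questionnaire is problematic on web only")
        return "mixed-mode", reasons
    reasons.append("short, simple questionnaire fits web self-completion")
    return "web or mobile web", reasons


mode, why = suggest_mode(StudyProfile(
    needs_population_estimates=True,
    has_adequate_frame=False,
    long_or_complex_questionnaire=False,
    sensitive_topic=True,
))
print(mode, why)
```

The frame check comes first, mirroring the first question raised for Figure 3.2; a real decision would weigh further factors (costs, timeliness, expected nonresponse) that this sketch leaves out.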