Practical Statistics for Nursing and Health Care. Jim Fowler
(Section 2.4), where a list of all asthmatic patients registered with the GP practice exists. Quota and cluster sampling are used when it is not possible or practicable to enumerate every member of the study population.
2.7 Simple Random Sampling
In a simple random sampling design, every individual in the study population has an equal chance of being included in the sample. That is to say, steps are taken to avoid bias in the sampling. In our asthma example above, the population being sampled is all patients registered with the GP practice who are known to have asthma (say, 800). To select a simple random sample of size n = 20, each patient (‘sampling unit’) is assigned a unique number: 1, 2, 3, and so on, until all 800 patients have been numbered. Then 20 numbers in the range 1–800 are selected at random, and the patients (sampling units) corresponding to these numbers represent the sample.
First, use may be made of random number tables. Appendix A is such a table. The numbers are arranged in groups of five in rows and columns, but this arrangement is arbitrary. Starting at the top left corner, you may read: 2, 3, 1, 5, 7, 5, 4 …; or 23, 15, 75, 48, …; or 231, 575, 485 …; or 23.1, 57.5, 48.5, 90.1, …; and so on, according to your needs. When you have obtained the numbers you need for your investigation, mark the place in pencil. Next time, carry on where you left off. It is possible that a random number will prescribe a subject (sampling unit) that has already been drawn. In this event, ignore the number and take the next random number. The purpose is to eliminate your prejudice as to which items should be selected for measurement. Unfortunately, observer bias, conscious or unconscious, is notoriously difficult to avoid when gathering data in support of a particular hunch!
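The table-reading procedure can be sketched in a few lines of Python. This is only an illustration: the function name `read_table` is hypothetical, the digit string is the run of digits quoted above from the top left corner of the table, and skipping numbers that fall outside 1–800 is an assumption (the text only mentions skipping numbers already drawn).

```python
def read_table(digits, width, population_size):
    """Read successive `width`-digit numbers from a string of random digits,
    keeping only usable unit numbers that have not been drawn before."""
    chosen = []
    for i in range(0, len(digits) - width + 1, width):
        number = int(digits[i:i + width])
        # Assumption: numbers outside 1..population_size match no patient, so skip them
        if number < 1 or number > population_size:
            continue
        # A unit that has already been drawn is ignored, as described in the text
        if number in chosen:
            continue
        chosen.append(number)
    return chosen


digits = "231575485901"  # digits quoted in the text, read left to right
print(read_table(digits, 3, 800))  # [231, 575, 485]  (901 exceeds 800 and is skipped)
```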
Second, many calculators and statistical software packages have a facility for generating random numbers. For example, in the LibreOffice Calc spreadsheet, typing ‘=RAND()’ in a cell and pressing <Enter> generates a random number between 0.0 and 1.0 in the form of a decimal fraction, e.g. 0.2771459. To generate more random decimal fractions, use the mouse to drag the lower right corner of the cell containing the ‘=RAND()’ formula down the required number of rows. Please note that many spreadsheets have an auto‐update function whereby formulae are recalculated after each calculation, so the random numbers change. To avoid this, copy the column of random numbers you have generated, go to a new cell, right‐click and select ‘Paste Special’, tick the ‘Numbers’ box and then click ‘OK’. Once you have fixed the random decimal fractions, you may read off the digits after the decimal point to provide a set of integers: 2, 7, 7, 1, … by taking one digit at a time; or 27, 71, 45, … by taking two digits at a time; or 277, 145, …; or 2.7, 7.1, …; and so on, according to your needs.
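For readers who prefer a scripting language to a spreadsheet, the same selection can be sketched with Python's standard `random` module. The function name `simple_random_sample` and the `seed` argument are illustrative choices, not part of the text; the population of 800 and sample of 20 come from the asthma example.

```python
import random


def simple_random_sample(population_size, sample_size, seed=None):
    """Draw `sample_size` distinct patient numbers from 1..population_size."""
    rng = random.Random(seed)  # a fixed seed makes the draw repeatable
    # sample() draws without replacement, so no patient can be chosen twice
    return sorted(rng.sample(range(1, population_size + 1), sample_size))


sample = simple_random_sample(800, 20, seed=1)
print(sample)
```

Because the draw is made without replacement, there is no need to discard repeated numbers by hand as with the printed table.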
Random sampling is the preferred approach to sampling. Although it does not guarantee that a representative sample is taken from the study population (due to sampling error, described in Section 10.1), it gives a better chance than any other method of achieving this.
2.8 Systematic Sampling
Systematic sampling has similarities with simple random sampling, in that the first subject in the sample is chosen at random and then every subsequent tenth or twentieth patient (for example) is chosen to cover the entire range of the population.
Example 2.1 Systematic Sampling Interval Calculation
What interval is required to select a systematic sample of size 20 from a population of 800?
The required fixed interval is the population size divided by the sample size: 800/20 = 40.
Therefore, the first patient (‘sampling unit’) is selected at random (as described in Section 2.7) from among patients numbered 1–40. Suppose number 23 is selected. The sample then comprises patients 23, 63, 103, 143, …, 783.
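Example 2.1 can be reproduced with a short Python sketch. The function name `systematic_sample` is hypothetical, and the sketch assumes the population size divides exactly by the sample size, as it does here (interval 40).

```python
import random


def systematic_sample(population_size, sample_size, start=None, seed=None):
    """Return every interval-th patient number, beginning at a random start."""
    interval = population_size // sample_size  # assumes an exact division
    if start is None:
        # The first unit is chosen at random from 1..interval
        start = random.Random(seed).randint(1, interval)
    return [start + k * interval for k in range(sample_size)]


# Fixing the start at 23 reproduces the sample in the example: 23, 63, ..., 783
print(systematic_sample(800, 20, start=23))
```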
A disadvantage of systematic sampling occurs when the patients are listed in the population in some sort of periodic order, and thus we might inadvertently systematically exclude a subgroup of the population. For example, suppose a population of 800 patients is listed by ‘first attendance’ at the clinic and that, over a 20‐week period, 40 patients registered per week: 20 during the daytime and 20 during the evening surgeries. If these patients were listed in the following order: Week 1 daytime patients, Week 1 evening patients, Week 2 daytime patients, …, Week 20 evening patients, then selecting patients 23, 63, …, 783 would result in a sample consisting entirely of evening clinic patients and would exclude all the daytime patients. This could generate a biased, or unrepresentative, sample.
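The periodicity problem can be checked directly. The sketch below assumes the listing order described above (within each block of 40, the first 20 patients are daytime and the next 20 are evening); the helper name `session_of` is hypothetical.

```python
def session_of(patient_number, per_session=20):
    """Return the surgery session for a patient in the periodic listing."""
    # Position within the week's block of 40: 0-19 daytime, 20-39 evening
    position_in_week = (patient_number - 1) % (2 * per_session)
    return "daytime" if position_in_week < per_session else "evening"


sample = range(23, 801, 40)  # patients 23, 63, ..., 783
sessions = {session_of(p) for p in sample}
print(sessions)  # {'evening'}
```

Because the sampling interval equals the length of the weekly cycle, every selected patient falls at the same position in the cycle, so any start between 21 and 40 yields only evening patients and any start between 1 and 20 yields only daytime patients.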
An argument in favour of systematic sampling arises when patients are listed in the population in chronological order, say, by date of first attendance at the GP practice. Because the sample is spread evenly across the whole list, a systematic sample would yield units whose age distribution is more likely to represent that of the study population.
2.9 Stratified Sampling
Stratified sampling is effective when the population comprises a number of subgroups (or ‘sub‐populations’) that are thought to have an effect on the data being collected, such as male and female, different age groupings, or ethnic origin. These subgroups are called strata. A stratum (‘layer’) is defined as a collection of individuals (sampling units) that are as alike as possible. For example, the credibility of results from a study of breast cancer would be in doubt if the proportion of premenopausal patients differed between two samples selected for comparison. By defining two strata, namely ‘pre‐menopausal patients’ and ‘not pre‐menopausal patients’, this problem is avoided.
A simple random sample is taken from each stratum. The resulting stratified samples are then more likely to reproduce the characteristics of the population. The two main approaches to deciding how many individuals should be sampled from each stratum are equal allocation and proportional allocation. The first approach results in an equal number of individuals per stratum, while the second provides samples in which the sample size from each stratum reflects the size of that stratum in the population.
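The two allocation rules, followed by a simple random sample within each stratum, can be sketched as follows. The function names are hypothetical and the stratum sizes (600 and 200) are invented purely for illustration; they do not come from the text. With proportional allocation, rounding may make the stratum samples add up to slightly more or less than the intended total.

```python
import random


def allocate(stratum_sizes, total_sample, method="proportional"):
    """Decide how many units to draw from each stratum."""
    if method == "equal":
        share = total_sample // len(stratum_sizes)
        return {name: share for name in stratum_sizes}
    population = sum(stratum_sizes.values())
    # Proportional: each stratum's share mirrors its share of the population
    return {name: round(total_sample * size / population)
            for name, size in stratum_sizes.items()}


def stratified_sample(strata, allocation, seed=None):
    """Take a simple random sample of the allocated size from each stratum."""
    rng = random.Random(seed)
    return {name: sorted(rng.sample(units, allocation[name]))
            for name, units in strata.items()}


# Hypothetical strata: patient numbers 1-600 and 601-800
strata = {
    "pre-menopausal": list(range(1, 601)),
    "not pre-menopausal": list(range(601, 801)),
}
sizes = {name: len(units) for name, units in strata.items()}

print(allocate(sizes, 20, method="equal"))         # 10 from each stratum
print(allocate(sizes, 20, method="proportional"))  # 15 and 5
print(stratified_sample(strata, allocate(sizes, 20), seed=1))
```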
2.10 Quota Sampling
Quota sampling differs from stratified sampling in that a simple random sample is not chosen from each stratum. Instead, the sample is obtained by using the most accessible patients, as long as they represent the identified subgroups. For example, if we require details relating to 20 women patients with asthma between 30 and 50 years of age, we do not identify all individuals satisfying these criteria in the population in order to take a simple random sample of these. Rather, we simply select the first 20 individuals who present themselves and fulfil these criteria.
Quota sampling is so called because the number of sampling units (e.g. patients) required in a particular sample is referred to as the quota to be obtained. If making comparisons between different subgroups (e.g. adults and children), the sizes of the samples from each subgroup are usually chosen to reflect the proportions in the population. For example, if there are twice as many adults as children in the available population, the quota of adults is twice as large as that of children.
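Filling quotas from patients in the order they present can be sketched as below. The function name `quota_sample`, the arrival list and the quotas (four adults to two children, reflecting a 2:1 ratio) are all hypothetical illustrations rather than data from the text.

```python
def quota_sample(arrivals, quotas):
    """Accept patients in order of presentation until every quota is filled."""
    counts = {group: 0 for group in quotas}
    chosen = []
    for patient_id, group in arrivals:
        # Accept the patient only if their subgroup still has places left
        if group in quotas and counts[group] < quotas[group]:
            chosen.append(patient_id)
            counts[group] += 1
        if counts == quotas:
            break  # all quotas met; later arrivals are not considered
    return chosen


# Hypothetical arrivals as (patient id, subgroup), in order of presentation
arrivals = [(1, "adult"), (2, "child"), (3, "adult"), (4, "adult"),
            (5, "child"), (6, "child"), (7, "adult"), (8, "adult")]
quotas = {"adult": 4, "child": 2}

print(quota_sample(arrivals, quotas))  # [1, 2, 3, 4, 5, 7]
```

Note that no random selection takes place: who is included depends entirely on who happens to present first.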
The main problem with quota sampling is that accessible individuals may not be representative of the study population. Patients who attend their GP practice regularly may differ from those who do not, or who are unable to attend because of work or other commitments.
2.11 Cluster Sampling