What is sampling?
In this article
- What is sampling?
- Types of sampling
- Probability Sampling
- Systematic Samples
- Simple Random Samples
- Stratified Random Samples
- Random Cluster Sampling
- Important Notes
- Helpful References
In research, sampling refers to the selection of a smaller group of participants from the population of interest. While it would be ideal for the entire population you are researching to take part in your study, logistically this may not be feasible. Therefore, by researching a smaller, representative group drawn from your population of interest, we are able to generalise the findings and make inferences about the whole population.
Types of Sampling:
Samples can be obtained in a variety of ways, and can include:
- Haphazard: haphazard sampling refers to the selection of a sample of participants using ‘trial and error’ or ‘hit and miss’ approaches. Haphazard sampling does not rely on any specific criteria or approaches, and often means that your results will be unpredictable and prone to error because there is no systematic approach used.
- Purposive: purposive sampling is the selection of a non-random sample, whereby participants are specifically chosen for a particular reason (i.e. they were chosen on purpose because they meet certain criteria). While purposive samples are advantageous because they focus on a specific group, if the researcher makes an error in judgement about whether an individual meets the criteria for inclusion, this will influence the results.
- Convenience: convenience sampling is another form of non-random sampling, where participants are chosen for convenience-related reasons, such as accessibility or availability. Such samples are advantageous because participants are often readily available and easy to access; however, this approach often increases the risk of bias in the results.
While each of the above approaches has its place, a common limitation of all three is that they are quite subjective in nature and rely heavily on researcher discretion. Another limitation is that, since these approaches are non-random, they raise questions about whether we can generalise our findings to the broader population of interest. While how you sample will be based on your specific research question and study design, generally the preferred method of sampling is the probability approach.
Probability Sampling:
Probability sampling, also referred to as random sampling, is the independent and random selection of participants based on probability theory, in that it is controlled by chance alone. Sampling based on probability is advantageous because it increases the likelihood of obtaining a sample that is representative of the population you are interested in. For a sample to be genuinely random, each member of the population of interest must have an equal chance of being selected, and the selection of one participant must occur independently of the selection of any other participant.
There are several subtypes of probability sampling, including systematic, simple random, stratified random and cluster samples. We will explore each of these types of samples using examples based on a controversial topic: does pineapple belong on pizza?
Systematic Samples:
A systematic sample is a type of probability sampling; however, systematic samples are not random. In systematic sampling, a rule for selecting participants is pre-determined and applied. A common form of this is to select every ‘nth’ person to be part of the sample.
For our pizza example, suppose you wanted to select a sample of 25 guests out of 100 who are attending a pizza party, by surveying every 4th guest who arrives. Each guest is allocated a number from 1 to 100, and as they enter you ask every 4th person whether pineapple belongs on pizza. By choosing every 4th person, you obtain a probability sample because 25% of guests have been selected; however, this sample is not random because any guest who was not number 4, 8, 12 and so on had zero chance of being selected.
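The every-4th-guest rule can be sketched in a few lines of Python. This is only an illustration: the function name `systematic_sample` and its `start` parameter are assumptions made for this example, not part of any standard library.

```python
def systematic_sample(population, step, start=None):
    """Select every `step`-th member, beginning at 1-based position `start`."""
    if step < 1:
        raise ValueError("step must be at least 1")
    if start is None:
        start = step  # the rule used in the example: guests 4, 8, 12, ...
    # Convert the 1-based position to a 0-based index, then take every step-th item.
    return population[start - 1::step]


guests = list(range(1, 101))  # guest numbers 1 to 100, in order of arrival
sample = systematic_sample(guests, step=4)
print(len(sample), sample[:3], sample[-1])  # 25 [4, 8, 12] 100
```

Because the rule is fixed in advance, running this again always returns the same 25 guests, which is exactly why the sample is not random.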
Simple Random Samples:
Simple random samples are random samples selected from the population of interest where each member of the population is equally likely to be selected. Simple random selection is often used when the population you are sampling is relatively homogeneous, or similar. Selection from this population is made using entirely randomised methods, such as by lottery.
Continuing our pizza example, suppose that most of the guests at the pizza party are pineapple lovers, and each guest is to be allocated a number from 1 to 100. While your colleague is handing out numbers to guests, you hop online and locate a random number generator. Once each guest has a number, you generate a random number between 1 and 100, and you ask the guest who corresponds with the generated number whether pineapple belongs on pizza. You repeat this, skipping any number that has already come up, until you have a sample of 25 different guests out of the 100 attending. You can then collate the answers obtained and see what the consensus amongst pineapple lovers is about whether it belongs on pizza. This is a probability sample because 25% of guests have been selected, and it is random because every guest has an equal chance of being selected. Simple random selection was used because the population was relatively homogeneous, in that most of the guests are known lovers of pineapple.
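The random number generator step could look something like the sketch below, which uses Python's built-in `random` module in place of an online generator. The helper name `simple_random_sample` and the `seed` argument are assumptions added for illustration; the seed only makes the draw repeatable.

```python
import random


def simple_random_sample(population, size, seed=None):
    """Draw `size` distinct members, each equally likely to be chosen."""
    rng = random.Random(seed)  # a private generator, so the seed affects only this draw
    # sample() draws without replacement, so no guest can be picked twice.
    return rng.sample(population, size)


guests = list(range(1, 101))  # guest numbers 1 to 100
chosen = simple_random_sample(guests, 25, seed=2021)
print(sorted(chosen))
```

Drawing without replacement matters here: generating 25 independent numbers between 1 and 100 would probably repeat at least one guest.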
Stratified Random Samples:
A stratified random sample is a random sample where two or more groups are represented from your population of interest. Stratified random sampling is more often used when the population you are sampling is relatively heterogeneous, or there are notable subgroups present. This involves dividing your population into the smaller groups and then randomly selecting a sample from each; in essence, you are treating each subgroup as its own population. Common examples include stratifying by age, sex or ethnicity.
In relation to our pizza example, let’s assume that at our pizza party there are 55 females and 45 males, and you have reason to believe that the two sexes will respond differently to the question of whether pineapple belongs on pizza. To be representative, and for a sample of 25%, you determine that you will need to survey 14 females (25% of 55, rounded up) and 11 males (25% of 45, rounded down). You randomly allocate each female a number from 1 to 55, and each male a number from 1 to 45. Starting with the female group, as before you hop onto the random number generator and generate a number between 1 and 55, and ask the female guest who corresponds with the generated number whether pineapple belongs on pizza. You repeat this until you have surveyed 14 different females. Once this has been completed, you repeat the entire process for the males. We can then compare the responses of females to males to see if one sex prefers pineapple on pizza more than the other sex. Like with simple random sampling, this example is a probability sample because approximately 25% of guests from each subgroup have been selected, and it is random because every guest within a subgroup has an equal chance of being selected. Stratified random selection was used because the population was heterogeneous, in that there were males and females.
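The arithmetic behind the 14 and 11, and the separate draw within each group, can be sketched as follows. The function names and the guest labels (`F1`, `M1` and so on) are made up for this illustration, and rounding each quota to the nearest whole person is an assumption; with other group sizes the rounded quotas may not add up to exactly the overall target.

```python
import random


def proportional_allocation(stratum_sizes, fraction):
    """Number to sample per stratum: the same fraction of each, rounded to whole people."""
    return {name: round(size * fraction) for name, size in stratum_sizes.items()}


def stratified_sample(strata, fraction, seed=None):
    """Run an independent simple random draw inside every stratum."""
    rng = random.Random(seed)
    sizes = {name: len(members) for name, members in strata.items()}
    quotas = proportional_allocation(sizes, fraction)
    return {name: rng.sample(members, quotas[name]) for name, members in strata.items()}


# Hypothetical guest labels: 55 females and 45 males, as in the example.
strata = {
    "female": [f"F{i}" for i in range(1, 56)],
    "male": [f"M{i}" for i in range(1, 46)],
}
picked = stratified_sample(strata, 0.25, seed=2021)
print({name: len(members) for name, members in picked.items()})
# {'female': 14, 'male': 11}
```

Here 25% of 55 is 13.75 and 25% of 45 is 11.25, which round to 14 and 11, matching the quotas in the example.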
Random Cluster Sampling:
Cluster sampling occurs when groups of the population of interest are selected at random. Cluster sampling often occurs in two stages. In the first stage, the population of interest is broken down into the known clusters. Then, in the second stage, multiple clusters are randomly selected and participants within each of the chosen clusters are randomly chosen to comprise the final sample.
At our pizza party of 100 guests, let’s assume that there are clusters of guests who prefer different types of pizza bases: thin, medium, thick, cheese crust, hot-dog crust and gluten-free. You allocate each cluster a number from 1 to 6, and using a random number generator, determine that for a sample of approximately 25% you will be sampling guests from the clusters who prefer thin and cheese crust pizza bases. You would then follow the same procedure as stratified random sampling, as you now have two groups to sample from. Each individual from the thin crust and cheese crust groups is allocated a number, and 25% of each group is randomly selected via random number generation and asked whether they think pineapple belongs on pizza. Like before, this is a probability sample because 25% of guests from each chosen cluster were selected, and it is random because every guest within a chosen cluster has an equal chance of being selected. However, cluster sampling is not necessarily as representative as stratified random sampling because not all clusters were examined.
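The two stages described above might be sketched like this. The cluster sizes below are invented purely for illustration (the example does not say how many guests prefer each base), and `two_stage_cluster_sample` is a hypothetical helper name.

```python
import random


def two_stage_cluster_sample(clusters, n_clusters, fraction, seed=None):
    """Stage 1: pick clusters at random. Stage 2: sample within each picked cluster."""
    rng = random.Random(seed)
    # Stage 1: every cluster has the same chance of being chosen.
    chosen_names = rng.sample(sorted(clusters), n_clusters)
    sample = {}
    for name in chosen_names:
        members = clusters[name]
        # Stage 2: a simple random draw of the given fraction of this cluster.
        sample[name] = rng.sample(members, round(len(members) * fraction))
    return sample


# Hypothetical cluster sizes that add up to the 100 guests in the example.
sizes = {"thin": 24, "medium": 20, "thick": 16,
         "cheese crust": 20, "hot-dog crust": 8, "gluten-free": 12}
clusters = {name: [f"{name} #{i}" for i in range(1, n + 1)] for name, n in sizes.items()}

result = two_stage_cluster_sample(clusters, n_clusters=2, fraction=0.25, seed=2021)
for name, members in result.items():
    print(name, len(members), "of", sizes[name])
```

Guests in the four clusters that are not drawn in the first stage never appear in the sample, which is the reason given above for cluster samples being less representative than stratified ones.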
Important Notes:
Probability sampling is advantageous because it reduces sampling bias and helps your sample reflect the diversity of the population. Independent and random sampling is also an assumption of many inferential statistical tests, so if this assumption is not met then certain types of analyses cannot be performed. However, it’s important to remember that while probability sampling is preferred, how you sample your population of interest is dependent on your research question and study design. And, most importantly: yes, pineapple does belong on pizza! ;)
Helpful References:
- Australian Bureau of Statistics (2021). Sample Design.
- Health Knowledge (2021). Methods of sampling from a population.