How to use this tool
Examples of applying this tool
Usability test
A product development team runs a usability test. The person doing research identifies that 3 out of 7 participants experience a specific usability issue, and the team is deciding whether the issue is prevalent enough to address. The team plots the data and determines there is a 90% likelihood that at least 24% of users in the real world would experience this issue. (It is unlikely that the real-world rate is exactly 42.9%, i.e. 3/7.) They decide this is a large enough share of users to justify design and engineering changes to rectify the problem.
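This kind of lower bound can be reproduced with a Beta posterior. The sketch below assumes a uniform Beta(1, 1) prior, so 3 issues out of 7 participants yield a Beta(4, 5) posterior; the tool's actual prior may differ, so treat the exact output as illustrative.

```python
from scipy.stats import beta

# Observed: 3 of 7 participants hit the usability issue.
hits, n = 3, 7

# Assumption: a uniform Beta(1, 1) prior, giving a Beta(4, 5) posterior
# over the real-world rate. The tool may use a different prior.
posterior = beta(1 + hits, 1 + (n - hits))

# The 10th percentile of the posterior: there is a 90% chance the
# real-world rate is at least this value (~24% here).
lower_bound = posterior.ppf(0.10)
print(f"90% chance the real-world rate is at least {lower_bound:.1%}")
```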
Small-n survey results
Imagine a survey receives 20 responses: a typical 1% response rate from the 2,000 people contacted to complete it.
In this survey, 12 of 20 people said they hold a certain opinion that the business cares about. Using the tool, the team determines that there is an 80% likelihood that 46-72% of the full audience holds this opinion, and prioritizes the relevant work accordingly.
However, in the same survey only 3 out of 20 people held another opinion; this yields a range of 9-29%, so the related work is deferred until further notice.
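Both intervals can be reproduced the same way as in the usability-test example. This is a minimal sketch, again assuming a uniform Beta(1, 1) prior and an equal-tailed 80% credible interval; the tool's exact method may differ slightly.

```python
from scipy.stats import beta

def credible_interval(hits: int, n: int, mass: float = 0.80):
    """Equal-tailed credible interval, assuming a uniform Beta(1, 1) prior."""
    return beta(1 + hits, 1 + n - hits).interval(mass)

# 12 of 20 respondents hold the opinion: roughly 46-72%.
print(credible_interval(12, 20))
# 3 of 20 respondents hold the opinion: roughly 9-29%.
print(credible_interval(3, 20))
```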
When to use caution: catastrophic failure with low prevalence
A researcher observes that 1 out of 10 participants in a study was unable to check out their cart, resulting in potential lost revenue. They plot these results and find that there is a 90% likelihood that 4.9% or more of users in the real world would experience this failure.
This is a moment not to rely on this tool: despite the small percentage, the team decides to fix the issue given its crucial importance to the business.
Examples of catastrophic failures that should be addressed even when observed at low rates:
- Inability to create an account
- Inability to search or filter, for a product reliant on finding content or items
- Inability to buy a product or service
Research validity
Assumptions about how the research is conducted
- The plotted charts depend on the research being conducted in an unbiased, non-leading manner.
- The data must be collected from a representative sample of participants.
Preparing a representative sample
Imagine there is a jar of 1,000 gumballs, a mix of red and blue. The red gumballs represent people in the “real world” who experience a usability issue; the blue gumballs represent people who do not.
In reality, there are 465 red gumballs: 46.5% of people will encounter this usability problem.
However, it's not a reasonable use of time to recruit all 1,000 people to understand what problems to solve. In fact, even recruiting 100, or 20, would be a significant investment of time to understand what the issues are and how to solve them.
So, the team has someone conduct a usability study with 8 users. Now, there is no way to land on the “true” number of 46.5%: the closest the person doing the research could get is 3/8 (37.5%) or 4/8 (50%). But that's not the point; what they need is a rough estimate of how many people have this issue out in the real world, which is why this website uses Bayesian statistics to estimate a range of probable values rather than trying to pin down the exact percentage.
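To see why an 8-person sample can only ever approximate the true rate, here is a small illustrative simulation. The jar counts come from the example above; the seed and the number of repeated draws are arbitrary choices for the sketch.

```python
import numpy as np

rng = np.random.default_rng(0)  # arbitrary seed for reproducibility

# Jar of 1,000 gumballs: 465 red (has the issue) and 535 blue (does not).
jar = np.array([1] * 465 + [0] * 535)

# Run the 8-person study ten times and record each observed rate.
observed_rates = [rng.choice(jar, size=8, replace=False).mean() for _ in range(10)]
print(observed_rates)  # values like 0.375 or 0.5, never exactly 0.465
```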
This is where a representative sample comes in. Say this is an issue that new users encounter more often, whereas power users rarely hit the same problems. If recruitment targets power users, the study might end up with 7 “blue gumball” participants and only 1 “red gumball” participant, which would make the problem look smaller than it is in reality.
Recruiting the appropriate participants is key to ensuring that the data reflects the reality you are estimating.
Why use a Bayesian approach for small-n qualitative research?
Qualitative research frequently uses small-n data. The resulting insights paint with broad brushstrokes, and often lack granularity because of the small sample size. This does not challenge the validity of the results; it simply means that small-n research detects patterns that occur at scale.
However, there is value in translating that small-n data into probabilities, which give us a range of expected “real world” values. Imagine we are building a product with 100,000 monthly active users (MAUs); if there is a strong chance that a meaningful portion of users (say, 15%, or 15,000 people) experiences an issue, then even without knowing the “true” value we can prioritize and fix it, confident that it was a productive use of the team's time.
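Turning a rate into an impact estimate is simple arithmetic. In this sketch, the 15% figure is the hypothetical lower bound from the paragraph above, not an output of the tool.

```python
mau = 100_000          # monthly active users
lower_bound = 0.15     # hypothetical lower bound on the real-world rate

affected = int(mau * lower_bound)
print(f"At least ~{affected:,} of {mau:,} MAUs are likely affected")
```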
This website is intended to give credible intervals for an observed phenomenon when the number of participants is small.