We’re Gonna Need a Bigger Sample

Statistics

Eye tracking has characteristics of both quantitative and qualitative research. On one hand, you have access to a large amount of high-precision data, appropriate for detailed statistical analysis of visual behavior. On the other hand, observation of just a few eye tracking video sessions can provide valuable insight into visibility. By no means does this represent a methodological weakness; quite the opposite. The dovetailing of quantitative and qualitative research interests in eye tracking is a great strength. It does, however, complicate one aspect of research: SAMPLE SIZE.

Which of these researchers has a better sense of sample size in eye tracking studies – (A) a user experience analyst who plans to test 12 participants on a website or (B) a package designer who plans to test 120 participants viewing a new line of packages? The answer is both…or maybe neither. It’s impossible to say for sure because the appropriate N depends on the specific goals of each eye tracking study. When planning your own research, here are a few questions you might ask yourself to help determine the right sample size:

Do I care about the numbers?
It sounds like an obvious ‘yes,’ but that’s not always the right answer. In some usability studies it just isn’t important to know that users spend 1.4% of their time viewing the contact information or that users take an average of 6.5 seconds to locate a particular button. Many of our clients are only looking for obvious visibility and usability problems, coupled with eye tracking-aided qualitative feedback. They might even be actively changing the test site during data collection to account for what they observe in real time. This type of subjective analysis does not require a large sample, and recruiting, incentivizing and analyzing participants can be expensive. If you are interested in a directional assessment of observable behaviors rather than numerical analysis, you may be able to get by with 10 to 15 participants in your eye tracking study.

Do I care about statistical validity?
A statistically significant result is a wonderful thing. It means that the observed outcome is unlikely to have occurred by chance. It means that your conclusion – VERSION A is better than VERSION B because it received 18.7% more attention – is not just your conclusion. It’s backed up by strong statistical theory. When statistical evidence is part of your objective, you’re going to need a larger sample. How large? Well, that depends on a number of factors, including the type of statistical tests you’ll be running (e.g. ANOVA, chi-square, correlation, etc.) and what you would consider a meaningful statistical difference. We typically consult with our statistician before we kick off each study to determine the optimal sample size. The recommended N typically lands somewhere between 13 and 50 participants per cell. And if you’re not sure what I mean by ‘per cell,’ keep reading. Cells are at the core of our next question.
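To make the “how large?” question a little more concrete, here is a minimal power-analysis sketch for a simple two-cell comparison (say, mean fixation duration on VERSION A vs. VERSION B), assuming a two-sample t-test and using Python’s statsmodels library. The effect size, alpha and power values are illustrative assumptions, not figures from any particular study.

```python
# Minimal sketch: per-cell sample size for a two-sample t-test
# (e.g., mean fixation duration on VERSION A vs. VERSION B).
# The effect size, alpha and power below are illustrative assumptions.
from statsmodels.stats.power import TTestIndPower

analysis = TTestIndPower()
n_per_cell = analysis.solve_power(
    effect_size=0.8,        # assumed large effect (Cohen's d)
    alpha=0.05,             # 5% chance of a false positive
    power=0.80,             # 80% chance of detecting a real difference
    alternative="two-sided",
)
print(f"Participants needed per cell: {n_per_cell:.0f}")  # ~26 for these inputs
```

Note how sensitive the answer is: shrink the assumed effect size and the required N per cell grows quickly, which is one reason the statistician consultation mentioned above is worth having before fieldwork begins.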

What (if anything) am I comparing?
Let’s say the goal of your research is to compare the fixations of professional and amateur pilots in a flight simulator. This very basic study design would have two cells corresponding to the two comparison groups – pros and amateurs. What if you decided to consider gender differences as well? Okay, now you have four cells (pro-men, pro-women, amateur-men and amateur-women). As you add more comparison factors to your study, you will probably need additional cells, which means you need to test more people. As you might imagine, the sample size calculation can get complicated, and again, sometimes it’s best to consult a statistician. Anyway, here’s a takeaway for all of us non-statisticians: adding statistical comparisons (i.e. either subject groups or mutually exclusive test conditions) to your analysis plan usually means adding cells (of 13 to 50 participants each) to your study design. There are exceptions, but that’s a useful rule of thumb.
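As a back-of-the-envelope illustration of how cells multiply the total sample, here is a short sketch built on the hypothetical pilot example above; the per-cell N of 25 is an assumption picked from the 13-to-50 rule of thumb, not a recommendation for any specific study.

```python
# Sketch: total sample size grows with the number of cells.
# The comparison factors and per-cell N below are hypothetical.
comparison_factors = {
    "experience": ["professional", "amateur"],
    "gender": ["men", "women"],
}

n_cells = 1
for levels in comparison_factors.values():
    n_cells *= len(levels)  # 2 experience levels x 2 genders = 4 cells

per_cell_n = 25  # assumed, within the 13-to-50 per-cell rule of thumb
total_n = n_cells * per_cell_n
print(f"{n_cells} cells x {per_cell_n} participants per cell = {total_n} total")
```

The point of the arithmetic is simply that every factor you add tends to multiply, not add to, the number of participants you need to recruit.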

Note: I should address one other sample size question that we often hear: how many participants do I need to get a valid heat map? We’re not sure that’s really the right question to ask. A heat map is a graphic representation of data points, not a quantifiable gauge of visual behavior. Asking how many participants make a ‘valid’ heat map is a little like asking how many brush strokes make a ‘valid’ painting. It’s not really a calculable thing. Furthermore, there are factors aside from sample size that affect how the heat map is drawn (e.g. time on page, spacing of components, page size, presence or absence of images, etc.). We’ve said it before in this blog: a heat map is a useful illustration, but it’s just a small piece of the analysis puzzle. Our recommendation is to select a sample size that meets your study objectives, not one that produces attractive visualizations.
