Measuring Expertise – A New Era in Training


In today’s world, highly skilled jobs are becoming more demanding as personnel are expected to perform critical tasks using highly complex systems in difficult environments. Surgeons and pilots are only two examples of individuals facing such environments. Pressures on the medical and aviation industries are enormous. In the medical realm, our aging population means more medical conditions to treat and thus an ever-increasing demand for skilled medical personnel. In the aviation arena, more people are travelling to more places, adding further strain on the aviation industry to supply ever more pilots to meet demand both now and into the future.

Despite these two examples being seemingly quite different, they face a common basic problem: time-efficient and resource-efficient training. The quality of training needs to be maintained, and ideally even improved, as time goes on. Today, training in both of these fields is conducted with a mixture of time in the classroom, time using simulators (computer-based or otherwise) and ultimately time doing the actual job in a highly supervised environment. Typically, trainees must successfully complete a set of defined “tasks” in order to exit the training and move towards the appropriate form of certification.

The problem with the current approach (as many trainers in these fields will tell you) is that the candidates who complete these training programs are highly varied and don’t necessarily have comparable levels of proficiency. I was shocked myself when one trainer (in charge of training surgeons in a particular technique) showed me a list of surgical residents who would be graduating that year. Of that list, he commented, “I would only allow two of them to touch me.” The ultimate decision as to whether a person qualifies comes from a combination of checkboxes showing that the trainee completed the prerequisite tasks and possibly a subjective determination from one or more supervisors that the candidate is ready. The latter may be hard to withhold if the trainee did in fact complete the mandated tasks.

EyeTracking has now begun working with a number of entities to apply its patented technology to this problem. The training community has long searched for an objective measure of expertise or competence in order to make better and more consistent determinations as to when a trainee is proficient. Some organizations already use eye movement data to understand where trainees look as they perform their training scenarios. Eye movement information is an important and valuable asset, especially in an aircraft, where it is important for pilots to maintain specific scan patterns, continually viewing specific instruments and readouts in a pre-defined order and time period.

Now, instructors understand that scan behavior can be trained and measured. What the instructors don’t know, however, is how hard each task is for each trainee. Ideally, when a trainee passes a task, he or she does so using a reasonable level of mental effort. The worrisome situation is the trainee who passes but is at or near his or her upper limit of manageable cognitive effort and is thus on the verge of making serious mistakes because of high cognitive workload. Looking at the scan pattern “trace” alone fails to identify when the operator’s workload began to climb or became too high. Perhaps it had already been elevated for the last ten minutes, and ultimately mental fatigue is what led to the pilot’s error. Simply put, eye movement data alone usually only pinpoints the ultimate point of failure, which in many situations is too late. Our goal should be to fix the problem when it starts and not let it escalate into a catastrophe.

Let’s dig deeper into this previous thought for a moment, as I think this example underlines many different training and certification issues of today. While we can train a person to perform a set of actions, we can’t actually know how hard that action is for the person to complete. I am told that many of the jobs we are discussing are highly competitive, so simply asking trainees how hard they find a training task to be is unlikely to get you an answer other than “no problem” or “fine.” The upsides of attaining certification in these jobs include a highly competitive salary, the chance to realize a lifetime’s ambition, or both. Military trainee pilots, for example, often do not want to show any weakness to their superiors and/or peers, a trait that instructors try hard to train out of them.

EyeTracking’s revolutionary technology for measuring level of mental effort goes to the heart of this issue. It can be easily integrated into a wide range of training environments today, including medical, aircraft and automotive simulators, to enable instructors to gain a purely objective understanding of their trainees’ mental effort. Using eye tracking cameras (either worn as glasses, or unobtrusively mounted onto a desk, console, dashboard or cockpit), we monitor small changes in pupil diameter to provide a measure of cognitive workload. Because we are using eye tracking, we can of course also tell where a person is looking. So now we know where a person is looking and how hard (or not) the brain is working. We know whether trainees are “spacing out” as they look at a display or whether they are mentally engaged. We know when each person’s workload rises or drops outside of its norm. We also know if a person’s workload is elevated for a given scenario compared to other trainees. So even if that person successfully completes a given scenario, the instructor may wish to concentrate on retraining for that scenario if this trainee was working much harder than the others to complete it. Why is this important? Imagine a pilot who successfully completes an instrument-only nighttime approach and landing scenario but in actuality had such high cognitive workload that he or she barely completed the task. What if that pilot is ultimately certified and encounters a similar situation with additional stressors, such as an unconscious co-pilot, an engine failure or some other unforeseen problem? The pilot who struggled under high workload in training is unlikely to cope with the additional demands as easily as a pilot with more spare cognitive capacity.
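To make the idea of a personal workload norm concrete, here is a minimal sketch in Python. It is purely illustrative: the ICA itself is a patented, proprietary algorithm, and the function name, threshold and numbers below are hypothetical stand-ins for any per-second workload score.

```python
# Illustrative sketch: flag moments when a trainee's workload signal
# drifts above his or her personal baseline. "Workload" here is any
# per-second score; it is NOT the actual ICA computation.

from statistics import mean, stdev

def flag_elevated_workload(workload, baseline, threshold=2.0):
    """Return indices where workload exceeds baseline mean + threshold * SD."""
    mu = mean(baseline)
    sigma = stdev(baseline)
    return [i for i, w in enumerate(workload)
            if sigma > 0 and (w - mu) / sigma > threshold]

# Hypothetical numbers: baseline from an easy familiarization run,
# workload from a demanding night-landing scenario.
baseline = [0.31, 0.29, 0.33, 0.30, 0.32, 0.28, 0.31]
scenario = [0.30, 0.34, 0.52, 0.61, 0.58, 0.33, 0.31]
print(flag_elevated_workload(scenario, baseline))  # -> [2, 3, 4]
```

An instructor reviewing the flagged seconds alongside the scan-pattern trace could then see not just where the trainee looked, but when the effort spiked.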

In today’s training programs, many trainees who are not yet fully trained may well get the OK to proceed through their program. If we could know which ones are having cognitive difficulty, the instructors of tomorrow can adapt and tailor training to the pilots and surgeons under their supervision to ensure a higher level of safety and success.

Reach out to us today to see how you can use our technology in your training program. Take advantage of the latest developments in training and move your training environment to the next level. Email us at

Patent Notice: EyeTracking, Inc.’s Cognitive Workload, Cognitive State and Level of Proficiency technologies are protected by Patents: US 7,344,251, US 7,438,418 and US 6,090,051 and all International Counterparts.

Featured image from Unsplash.

We’re Gonna Need a Bigger Sample


Eye tracking has characteristics of both quantitative and qualitative research. On one hand, you have access to a large amount of high-precision data, appropriate for detailed statistical analysis of visual behavior. On the other hand, observation of just a few eye tracking video sessions can provide valuable insight into visibility. By no means does this represent a methodological weakness; in fact, quite the opposite. The dovetailing of quantitative and qualitative research interests in eye tracking is a great strength. It does, however, complicate one aspect of research: SAMPLE SIZE.

Which of these researchers has a better sense of sample size in eye tracking studies – (A) a user experience analyst who plans to test 12 participants on a website or (B) a package designer who plans to test 120 participants viewing a new line of packages? The answer is both…or maybe neither. It’s impossible to say for sure because the appropriate N is dependent on the specific goals of each eye tracking study. When planning your own research, here are a few questions that you might ask yourself to help determine the right sample size:

Do I care about the numbers?
It sounds like an obvious ‘yes,’ but that’s not always the right answer. In some usability studies it just isn’t important to know that users spend 1.4% of their time viewing the contact information or that users average 6.5 seconds in locating a particular button. Many of our clients are only looking for obvious visibility and usability problems, coupled with eye tracking-aided qualitative feedback. They might even be actively changing the test site during data collection to account for what they observe in real time. This type of subjective analysis does not require a large sample. Recruiting, incentivizing and analyzing participants can be expensive. If you are interested in a directional assessment of observable behaviors rather than numerical analysis, you may be able to get by with 10 – 15 participants in your eye tracking study.

Do I care about statistical validity?
A statistically significant result is a wonderful thing. It means that the observed outcome is unlikely to have occurred due to chance. It means that your conclusion – VERSION A is better than VERSION B because it received 18.7% more attention – is not just your conclusion. It’s backed up by strong statistical theory. When statistical evidence is part of your objective, you’re going to need a larger sample. How large? Well, that depends on a number of factors, including the type of statistical tests that you’ll be running (e.g. ANOVA, chi-square, correlation, etc.) and what you would consider to be a meaningful statistical difference. We typically consult with our statistician before we kick off each study to determine the optimal sample size. The recommended N typically lands somewhere between 13 – 50 participants per cell. And if you’re not sure what I mean by ‘per cell,’ keep reading. Cells are at the core of our next question.
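For readers who want a ballpark figure before calling the statistician, the classic normal-approximation formula for a two-sample comparison gives a rough per-cell estimate. This is a simplification (real planning would account for the specific test, variance estimates and multiple comparisons), and the function below is a hypothetical sketch, not our statistician’s actual procedure.

```python
# Rough per-cell sample size for a two-cell comparison (e.g. VERSION A
# vs. VERSION B), using the standard normal-approximation formula:
#   n per cell = 2 * ((z_alpha/2 + z_beta) / d)^2
# where d is the expected effect size in standard-deviation units.

import math

def n_per_cell(effect_size, z_alpha=1.96, z_beta=0.84):
    """Participants per cell for alpha = .05 (two-tailed) and power = .80."""
    return math.ceil(2 * ((z_alpha + z_beta) / effect_size) ** 2)

print(n_per_cell(1.0))  # very large effect: 16 per cell
print(n_per_cell(0.8))  # large effect: 25 per cell
print(n_per_cell(0.5))  # medium effect: 63 per cell, beyond the 13 - 50 rule of thumb
```

Notice that only fairly large expected effects land inside the 13 – 50 per-cell range quoted above; subtler differences demand considerably more participants.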

What (if anything) am I comparing?
Let’s say the goal of your research is to compare the fixations of professional and amateur pilots in a flight simulator. This very basic study design would have two cells corresponding to the two comparison groups – pros and amateurs. What if you decided to consider gender differences as well? Okay, now you have four cells (pro-men, pro-women, amateur-men and amateur-women). As you add more features to your study you probably need to include additional cells, which means you need to test more people. As you might imagine, the sample size calculation can get complicated, and again, sometimes it’s best to consult a statistician. Anyway, here’s a takeaway for all of us non-statisticians: adding statistical comparisons (i.e. either subject groups or mutually-exclusive test conditions) to your analysis plan usually means adding cells (of 13 – 50 participants) to your study design. There are exceptions, but that’s a useful rule of thumb.
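The cell arithmetic itself is simple multiplication, as this quick sketch shows (the function name and the 15-per-cell figure are illustrative):

```python
# Factorial-design arithmetic from the pilot example: cells multiply,
# so each added comparison factor multiplies the required sample.

from math import prod

def total_participants(levels_per_factor, n_per_cell):
    """Return (number of cells, total N) for a fully crossed design."""
    cells = prod(levels_per_factor)
    return cells, cells * n_per_cell

# Skill only (pro vs. amateur) at 15 participants per cell:
print(total_participants([2], 15))     # -> (2, 30)
# Add gender (male vs. female) as a second factor:
print(total_participants([2, 2], 15))  # -> (4, 60)
```

Adding that one extra factor doubled the recruiting budget, which is exactly why the analysis plan should be settled before fieldwork begins.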

Note: I should address one other sample size question that we often hear. How many participants do I need to get a valid heat map? We’re not sure that’s really the right question to be asking. A heat map is a graphic representation of data points, not a quantifiable gauge of visual behavior. Asking how many participants make a ‘valid’ heat map is a little like asking how many brush strokes make a ‘valid’ painting. It’s not really a calculable thing. Furthermore, there are factors aside from sample size that impact the drawing of the heat map (e.g. time on page, spacing of components, page size, presence/absence of images, etc.). We’ve said it before in this blog: a heat map is a useful illustration, but it’s just a small piece of the analysis puzzle. Our recommendation is that your sample size should be selected to meet study objectives, not to produce attractive visualizations.

Featured image from Unsplash.

Literature Review: A Decade of the Index of Cognitive Activity


In 2002, Dr. Sandra Marshall presented a landmark paper at the IEEE 7th Conference on Human Factors and Power Plants, introducing the Index of Cognitive Activity (ICA). This innovative technique “provides an objective psychophysiological measurement of cognitive workload” from pupil-based eye tracking data. In the decade since this conference, the ICA has been used by eye tracking researchers all over the world in a wide variety of contexts.

In this installment of the EyeTracking blog, we’ll take a look at some of the most interesting applications of the ICA. There are many to choose from, but here are a few of the greatest hits…

The ICA in Automotive Research

Understanding the workload of drivers is central to automotive design and regulation. Schwalm et al. collected ICA data during a driving simulation including lane changes and secondary tasks. Analyses of workload for the entire task and on a second-by-second basis indicated that the ICA (a) responded appropriately to changes in task demands, (b) correlated well with task success and self-reported workload and (c) identified shifts in participant strategy throughout the task. The researchers conclude that the ICA could be a valuable instrument in driver safety applications including learning, skill acquisition, drug effects and aging effects.

The ICA in Surgical Skill Assessment

Currently, surgical skill assessments rely heavily on subjective measures, which are susceptible to multiple biases. Richstone et al. investigated the use of the ICA and other eye metrics as an objective tool for assessing skill among laparoscopic surgeons. In this study, a sample of surgeons participated in live and simulated surgeries. Non-linear neural network analysis with the ICA and other eye metrics as inputs was able to classify expert and non-expert surgeons with greater than 90% accuracy. This application of the ICA may play an integral role in future documentation of skill throughout surgical training and provide meaningful metrics for surgeon credentialing.

The ICA in Military Team Environments

Many activities require teams of individuals to work together productively over a sustained period of time. Dr. Sandra Marshall describes a networked system for evaluating cognitive workload and/or fatigue of team members as they perform a task. The research was conducted at the Naval Postgraduate School in Monterey, CA under the Adaptive Architectures for Command and Control (A2C2) Research Program sponsored by the Office of Naval Research. Results demonstrated the viability of the ICA as a real-time monitor of team workload. This data can be examined by a supervisor or input directly into the operating system to manage unacceptable levels of workload in individual team members.

The ICA Across Eye Tracking Hardware Systems

Different research scenarios demand different eye tracking equipment. Because the ICA is utilized in so many disparate fields of study, it is important to validate this metric across different hardware systems. Bartels & Marshall evaluated four eye trackers (SMI’s Red 250, SR Research’s EyeLink II, Tobii’s TX 300 and Seeing Machines’ faceLAB 5) to determine the extent to which manufacturer, system type (head-mounted vs. remote) and sampling rate (60 Hz vs. 250 Hz) affected the recording of cognitive workload data. Each of the four systems successfully captured the ICA during a workload-inducing task. These results demonstrate the robustness of the ICA as a valid workload measure that can be applied in almost any eye tracking context.

The Index of Cognitive Activity is offered as part of EyeTracking, Inc.’s research services. It is also available through the EyeWorks Cognitive Workload Module.


Richstone, L., Schwartz, M., Seideman, C., Cadeddu, J., Marshall, S., & Kavoussi, L. (2010). Eye metrics as an objective assessment of surgical skill. Annals of Surgery, 252(1), 177-182.

Marshall, S. (2009). What the eyes reveal: Measuring the cognitive workload of teams. In Proceedings of the 13th International Conference on Human-Computer Interaction, San Diego, CA, July 2009.

Schwalm, M., Keinath, A., & Zimmer, H. (2008). Pupillometry as a method for measuring mental workload within a simulated driving task. In Human Factors for Assistance and Automation. Shaker Publishing, 75-87.

Bartels, M., & Marshall, S. (2012). Measuring cognitive workload across different eye tracking hardware platforms. Paper presented at the 2012 Eye Tracking Research and Applications Symposium, Santa Barbara, CA, March 2012.

Patent Notice:

Methods, processes and technology in this document are protected by patents, including US Patent Nos.: 6,090,051, 7,344,251, 7,438,418 and 6,572,562 and all corresponding foreign counterparts.

The Danger of Safety


The semiautonomous vehicle is the future of the automotive industry. Innovations such as forward collision avoidance radar and lane departure warning systems are evidence of a clear trend – little by little, demands on the driver are being shifted to the car. It’s easy to see how these and other safety advances could make our roadways less dangerous. After all, the vast majority of traffic accidents are the result of human error. Any technology that can take a bit of responsibility away from the guy fiddling with the radio and playing Angry Birds while traveling 70 MPH down the freeway is welcome.

But let’s not forget the ‘semi’ in semiautonomous. A recent feature in Wired Magazine explains the risks inherent in the automation of certain aspects of the driving experience. While computerized assistance can improve safety in stressful situations, it may actually have the opposite effect in less taxing ones. The deciding factor is cognitive load. Until vehicles reach the point of being fully autonomous, the driver must remain mentally engaged at all times. That isn’t a problem when navigating the gridlock of downtown at rush hour (i.e. high cognitive load), but consider the open road at its most hypnotic – a long, straight, featureless desert highway late at night. It can get quite boring. You might flip on the cruise control. You might activate voice navigation to let you know when to exit. Such actions reduce the cognitive load of a task that is already, perhaps, too low. The potential consequences include decreased situational awareness and increased reaction time. This can be a dangerous combination as you speed toward that stalled truck in your lane a few miles ahead.

So it seems that a safeguard is required to ensure that our safety features do indeed keep us safe. More specifically, the semiautonomous vehicle needs a means of monitoring the mental state of the driver, a way to determine whether or not he or she is sufficiently engaged in steering, braking, accelerating, etc. There are several ways to measure task-based cognitive workload. They run the gamut from paper-and-pencil subjective ratings (e.g. the NASA-TLX) to complex objective readings of brain activity (e.g. EEG). Obviously, you aren’t going to ask people to fill out a questionnaire or wear a network of electrodes every time they take a trip to the supermarket. The goal is to make driving safer without adding further complications. If we want to monitor workload in a real world driving scenario, we’re going to need something a bit more subtle.

EyeTracking, Inc. has a solution. The Index of Cognitive Activity (ICA) is an objective, unobtrusive means of measuring cognitive workload. Instead of relying on driver feedback or direct physiological sensors, the ICA algorithm analyzes fluctuations in pupil size while minimizing light effects. Best of all, this patented metric relies on a tool that will most likely be available in tomorrow’s cars anyway – eye tracking. The benefits of monitoring not only point of gaze but also workload are undeniable. In this model of ICA-enhanced eye tracking, your car will be able to address four critical driving questions: (1) are your eyes open? (2) are your eyes focused on the road? (3) are you cognitively overwhelmed? and (4) are you cognitively underwhelmed? This information can be used in real time to alert you to the greatest hazard out there – your own visual and mental behavior.
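As a thought experiment, the four checks could be wired together roughly as follows. Everything here is hypothetical: the workload score, the thresholds and the data structures are placeholders, not EyeTracking’s actual ICA implementation or any automaker’s API.

```python
# Hypothetical sketch of the four driver-state checks. The workload
# score stands in for an ICA-like pupil-based measure; thresholds
# are illustrative placeholders.

from dataclasses import dataclass

@dataclass
class GazeSample:
    eyes_open: bool
    gaze_on_road: bool   # point of gaze falls within the windshield region
    workload: float      # 0.0 - 1.0, an ICA-like cognitive workload score

def driver_alerts(sample, low=0.15, high=0.85):
    """Return human-readable alerts for the four critical questions."""
    alerts = []
    if not sample.eyes_open:
        alerts.append("eyes closed")
    elif not sample.gaze_on_road:
        alerts.append("eyes off road")
    if sample.workload > high:
        alerts.append("cognitively overwhelmed")
    elif sample.workload < low:
        alerts.append("cognitively underwhelmed (disengaged)")
    return alerts

# Eyes open and on the road, but the mind has wandered:
print(driver_alerts(GazeSample(True, True, 0.05)))
# -> ['cognitively underwhelmed (disengaged)']
```

The point of the sketch is the last case: gaze data alone would report this driver as perfectly attentive, while the workload channel reveals the disengagement.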

Several major automakers have discovered this valuable metric and put it to use in their testing labs. For example, the BMW Group conducts groundbreaking research using the ICA to evaluate cognitive workload during critical driving events (Schwalm, 2008). For another automaker, the ICA has been employed to examine the differences between professional racers and normal drivers. These and other applications represent key steps toward integration of a cognitive workload gauge into the next generation of automobiles. Additional R&D is required, but hopefully a new breed of semiautonomous vehicles, capable of evaluating the mental state of the driver, is just a bit further down the road.

Featured image from Unsplash.

Monitoring Wakefulness of Air Traffic Controllers


At midnight on Wednesday, March 23rd, two commercial airplanes approaching Ronald Reagan Airport in Washington, D.C. requested permission to land. The tower responded with only silence. After repeated attempts at communication, both pilots were forced to navigate their descent through the darkness without the assistance of Air Traffic Control. The landings were successful and no one was injured, but when it was revealed that the controller on duty was asleep at his post, the story captured national attention.

Fatigue is unavoidable for the air traffic controller. The combination of long hours, monotonous tasks and high stress will eventually lead to physical and mental exhaustion, no matter how many cups of coffee are consumed. The event described above is just one of five such cases reported in the past month. This is not a pleasant thought for the frequent flyers among us. It means that at any given time, as we hurtle through the atmosphere in a combustible tube traveling 500 miles per hour, suspended 30,000 feet above the earth, the person charged with guiding us safely to the ground might be fighting that pesky recurrent nod of the head that we have all experienced during one workday or another (hopefully in lower-leverage situations). To say the least, this prospect raises concerns.

New government regulations have already been put in place to increase staffing and decrease hours, but technology may offer a more proactive solution. The application of eye tracking to aviation and transportation security is not new. Over the past decade we have conducted research with the FAA, TSA, ONR and NASA to examine the visual behavior and cognitive state of system operators. It’s easy to see how this technology might be applied to our current situation with ATC. The challenge, after all, is making sure that the controller’s eyes are open and pointed at the screen. What better method for achieving this than eye tracking? It’s the most objective and reliable tool available for ensuring that attention remains focused during critical aviation events.

And while you’re at it, you might as well get the most out of this technology. In addition to detecting when the eyes are open and directed at the screen, eye tracking can determine whether or not a person is looking at the appropriate SECTION of the screen. Such data could be used in real time to alert the controller to an unnoticed situation before it becomes a crisis. Another applicable component of eye tracking is the detection of cognitive state. Fatigue, boredom and mental overload each leave a unique signature upon the eye. By examining fluctuations in pupil size (using the Index of Cognitive Activity) along with eye movements, blinks and divergence, we are able to determine whether or not a person is cognitively impaired. In the case of ATC, this information could be used to alert the supervisor when a given controller is too tired or stressed and needs to take a break.

Putting more controllers in the tower for shorter periods of time is certainly a step in the right direction. However, the use of eye tracking in air traffic control would provide an additional safeguard, one that most air travellers would be delighted to know is in place.