Benchmarking in UX research

Many user researchers, especially those who focus on qualitative methods, are asked about quantifying the user experience. We are asked to include quantitative data to supplement quotes or video/audio clips. Qualitative user researchers, myself included, may look towards surveys to add that quantitative spice. However, there is much more to quantitative data than surveys, and surveys are often not the ideal method.

If you are so lucky as to be churning out user research insights that your team is using to iterate and improve your product, you may be asked this (seemingly) simple question: how do you know your insights and recommendations are actually improving the user experience of your product? The first time I was asked that question, I was tongue-tied. “Well, if we are doing what the user is telling us then, of course, it is improving.” Let me tell you, that didn’t cut it.

Soon afterwards, I learned about benchmark studies. These studies allow you to test how a website or app is progressing over time, and where it falls compared to competitors, an earlier version, or industry benchmarks. Benchmarking allows you to answer that question concretely.

How do I conduct a benchmarking study?

1. Set up a plan

You have to start with a conversation with your team, in order to agree on the goals and objectives of your benchmarking study. Ideally, you answer the following questions:

First and foremost, can we conduct benchmarking studies on a regular basis? How often are new iterations or versions being released? How many competitors do we want to benchmark against and how often? Do we have the budget to run these tests on an ongoing basis?
What are we trying to learn? How is the product progressing over time? How does it compare to different competitors? Do we want to understand more about a particular flow? This will help you determine whether benchmarking is really the right methodology for your goals.
What are we actually trying to measure? What parts of the app/website are we looking to measure: particular features or the overall experience? How difficult or easy is it to complete the most important tasks on the website/app?

2. Write an interview script

Once you have your goals and objectives set out, you need to write the script for the interviews. This script will be similar to a usability testing script: the questions need to focus on the most important and basic tasks on your website/app. Each task needs an action and a final goal. For example:

For Amazon, “Find a product you’d like to buy.”
For Wunderlist, “Create a new to-do.”
For World of Warcraft, “Sign up for a free trial.”

As you can see, these tasks are extremely straightforward. Don’t give any hints or indications on how to actually complete the task. That will completely skew the data. I know it can be hard to watch participants struggle with your product, but that struggle is part of the benchmark and of the insights you can bring back. For example:

“Click the plus icon to create a new to-do.” = Bad wording
“Create a new to-do” = Good wording

If you would like to include additional questions in the script, you can add follow-up questions, such as asking participants to rate the difficulty of the task.

After you complete the script, and once everyone has had a chance to contribute suggestions or ideas, keep it as consistent as possible from one study to the next. It is really difficult to compare data if the interview script changes.

3. Pick your participants

As you are writing and finalizing your script, it is a good idea to begin choosing and recruiting the target participants. Although typical user research studies, such as qualitative interviews or usability testing, generally call for fewer participants, it is important to realize we are working with hard numbers and quantitative data. It is a really good idea to set the total number of users to 25 or more. At 25+ users, the margin of error around your numbers shrinks, so you can more easily reach statistical significance and draw more valid conclusions from your data.
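To make the sample-size point a little more concrete, here is a minimal sketch of how the uncertainty around a completion rate narrows as you add participants. It uses the adjusted Wald (Agresti-Coull) interval, which is my own choice for illustration and not something prescribed above; the function name and the 80% completion rate are hypothetical.

```python
import math

def adjusted_wald_interval(successes, n, z=1.96):
    """Approximate 95% confidence interval for a completion rate.

    Uses the adjusted Wald (Agresti-Coull) method: add z^2/2 successes
    and z^2 trials before computing the usual Wald interval.
    """
    n_adj = n + z ** 2
    p_adj = (successes + z ** 2 / 2) / n_adj
    margin = z * math.sqrt(p_adj * (1 - p_adj) / n_adj)
    return max(0.0, p_adj - margin), min(1.0, p_adj + margin)

# Hypothetical example: the same observed 80% completion rate
# at different sample sizes.
for n in (5, 10, 25, 50):
    successes = round(0.8 * n)
    low, high = adjusted_wald_interval(successes, n)
    print(f"n={n:>2}: {successes}/{n} completed, interval {low:.0%} to {high:.0%}")
```

The interval at 25 participants is roughly half as wide as the one at 5, which is the practical reason for recruiting more people than you would for a qualitative study.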

Since you will be conducting studies on a regular basis, you don’t have to worry about going to the same group of users over and over again. It would be beneficial to include some previous participants in new studies, but it is fine to supplement them with new participants. The only important note is to be consistent with the types of people you are testing with. Did you test with specific users of your product who hold a certain role? Or did you do some guerrilla testing with students? Make sure you are testing with the same type of users in the next round.

How often should I be running benchmarking studies?

In order to determine how often you should/can run the benchmarking studies, you have to consider:

What stage is your product at? If you are early in the process and continuously releasing updates/improvements, you will need to run benchmarking studies more often. If your product is further along, you could run the benchmarking quarterly.
What is your budget? If you are testing with around 25 users each time, how many times can you realistically test with your budget?
If you are releasing updates on a more random basis, you could run ad-hoc benchmarking studies that coincide with releases, though this might not be the most effective way to show data.
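The budget question is simple arithmetic, and it can help to write it down before talking to your team. Below is a rough sketch; every number in it (annual budget, incentive per participant, fixed cost per round) is a made-up placeholder, and the function name is hypothetical.

```python
def affordable_rounds(annual_budget, participants_per_round,
                      cost_per_participant, fixed_cost_per_round=0.0):
    """Return how many full benchmarking rounds fit into the budget."""
    cost_per_round = (participants_per_round * cost_per_participant
                      + fixed_cost_per_round)
    if cost_per_round <= 0:
        raise ValueError("cost per round must be positive")
    return int(annual_budget // cost_per_round)

# Hypothetical figures: 25 participants at 75 each, plus 500 in
# tooling/recruiting overhead per round, against a 12,000 budget.
rounds = affordable_rounds(
    annual_budget=12_000,
    participants_per_round=25,
    cost_per_participant=75,
    fixed_cost_per_round=500,
)
print(f"Rounds per year: {rounds}")
```

If the answer comes out below two, that is a sign to rethink the scope, since a single round gives you nothing to compare against.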

You really want to see progress over time and how your research insights are potentially improving the user experience. Determine with your team and executives the most impactful way to document these patterns and trends. Just make sure you can run more than one study, or the results will be wasted!

What metrics should I be using?

There are many metrics to look at when conducting a benchmark study. As I mentioned, many benchmarking studies will consist of task-like questions, so it is very important to quantify these tasks. Below are some effective and common ways to quantify tasks:

Task Metrics

Task Success: This simple metric tells you if a user was able to complete a given task (0=Fail, 1=Pass). You can get fancier with this one and assign additional levels that denote the difficulty users had with the task, but you need to agree on the levels with your team prior to the study.
Time on Task: This metric measures how long it takes participants to complete or fail a given task. It gives you a few different options to report on: average task completion time, average task failure time, or overall average task time (of both completed and failed tasks).
Number of errors: This metric gives you the number of errors a user committed while trying to complete a task. It also gives you insight into the common errors users run into while trying to complete the task. If many of your users try to complete a task in a different way than you intended, a common pattern of errors may emerge.
Single Ease Question (SEQ): The SEQ is one question (on a 7-point scale) that measures the participant’s perceived ease of a task. It is asked after every task is completed (or failed).
Subjective Mental Effort Question (SMEQ): The SMEQ allows users to rate how mentally difficult a task was to complete.
SUM (Single Usability Metric): This measurement combines completion rates, ease, and time on task into a single metric that describes the usability and experience of a task.
Confidence: Confidence is a 7-point scale that asks users to rate how confident they were that they completed the task successfully.
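As a rough illustration of how the simpler task metrics above turn into numbers, here is a small sketch that summarizes one task. The data layout, the function name, and the five sample participants are all hypothetical; in practice you would load these from wherever you log your sessions.

```python
from statistics import mean

# Hypothetical results for one task, one tuple per participant:
# (completed, seconds_on_task, error_count, seq_rating_1_to_7)
results = [
    (True, 42.0, 0, 6),
    (True, 55.5, 1, 5),
    (False, 120.0, 3, 2),
    (True, 38.5, 0, 7),
    (False, 95.0, 2, 3),
]

def summarize_task(results):
    """Compute completion rate, time-on-task averages, errors, and SEQ."""
    completed_times = [t for ok, t, _, _ in results if ok]
    failed_times = [t for ok, t, _, _ in results if not ok]
    return {
        "completion_rate": mean(1 if ok else 0 for ok, _, _, _ in results),
        "avg_time_completed": mean(completed_times) if completed_times else None,
        "avg_time_failed": mean(failed_times) if failed_times else None,
        "avg_time_overall": mean(t for _, t, _, _ in results),
        "avg_errors": mean(e for _, _, e, _ in results),
        "avg_seq": mean(s for _, _, _, s in results),
    }

summary = summarize_task(results)
for name, value in summary.items():
    print(f"{name}: {value}")
```

Reporting the three time averages separately matters: failed attempts often run much longer than successful ones, so the overall average alone can hide what is going on.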

Using a combination of these metrics can help you highlight high-priority problem areas. For example, if participants report high confidence that they successfully completed a task, yet the majority are failing, there is a large gap between what participants believe they did and what actually happened, and that mismatch is a problem worth prioritizing.
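One way to spot that kind of discrepancy is to count the participants who failed a task but still rated themselves as confident. The sketch below assumes a 7-point confidence scale and treats a rating of 6 or 7 as "confident"; that cutoff and the function name are my own assumptions, so agree on a threshold with your team.

```python
def confident_failure_rate(responses, confident_at=6):
    """Share of participants who failed the task yet felt confident.

    responses: list of (completed, confidence_1_to_7) tuples.
    confident_at: lowest rating counted as "confident" (assumed cutoff).
    """
    if not responses:
        raise ValueError("no responses to analyze")
    confident_failures = sum(
        1 for completed, confidence in responses
        if not completed and confidence >= confident_at
    )
    return confident_failures / len(responses)

# Hypothetical data: two of five participants failed but rated 6 or 7.
responses = [(False, 7), (False, 6), (True, 7), (False, 3), (True, 5)]
rate = confident_failure_rate(responses)
print(f"Confident failures: {rate:.0%}")
```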

Questionnaire Metrics

SUS: The System Usability Scale (SUS) has become an industry standard and measures the perceived usability of a user experience. Because of its popularity, you can reference published statistics (for example, the average SUS score is 68).
SUPR-Q: This questionnaire is ideal for benchmarking a product’s user experience. It allows participants to rate the overall quality of a product’s user experience, based on four factors: usability, trust & credibility, appearance, and loyalty.
SUPR-Qm: This questionnaire for the mobile app user experience is administered dynamically using an adaptive algorithm.
NPS: The Net Promoter Score is an index ranging from -100 to 100 that measures the willingness of customers to recommend a company’s products or services to others. It is used to gauge the customer’s overall satisfaction with a company’s product or service and the customer’s loyalty to the brand. The NPS can be applied to both business-to-consumer and business-to-business experiences.
Satisfaction: You can ask participants to rate their level of satisfaction with the performance of your product or even your brand in general. You can also ask about more specific parts of your product to get a more targeted measure of satisfaction.
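Two of these questionnaires have scoring rules that are easy to get wrong by hand, so here is a short sketch of the standard calculations. The SUS uses ten items rated 1 to 5, where odd-numbered items contribute (rating - 1) and even-numbered items contribute (5 - rating), and the sum is multiplied by 2.5. The NPS is the percentage of promoters (9 or 10) minus the percentage of detractors (0 to 6). The function names and sample answers are hypothetical.

```python
def sus_score(responses):
    """Score one participant's ten SUS answers (each 1-5) on a 0-100 scale."""
    if len(responses) != 10 or any(r < 1 or r > 5 for r in responses):
        raise ValueError("SUS needs exactly ten answers between 1 and 5")
    total = 0
    for item_number, rating in enumerate(responses, start=1):
        if item_number % 2 == 1:   # odd items are positively worded
            total += rating - 1
        else:                      # even items are negatively worded
            total += 5 - rating
    return total * 2.5

def net_promoter_score(ratings):
    """Compute NPS from 0-10 'likelihood to recommend' ratings."""
    if not ratings:
        raise ValueError("no ratings to score")
    promoters = sum(1 for r in ratings if r >= 9)
    detractors = sum(1 for r in ratings if r <= 6)
    return 100 * (promoters - detractors) / len(ratings)

# Hypothetical answers from one SUS participant and ten NPS respondents.
print(sus_score([4, 2, 4, 2, 4, 2, 4, 2, 4, 2]))               # 75.0
print(net_promoter_score([10, 9, 8, 7, 6, 3, 10, 9, 5, 9]))    # 20.0
```

For a study-level SUS number, score each participant separately and then average the individual scores.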

What do I compare my product to?

After you have completed your benchmarking study, what should you compare those numbers to, and what should you aim for in order to improve the product? There are general benchmarks companies should aim to meet or exceed. Below are examples from MeasuringU:

Single Ease Question average: ~5.1
Completion rate average: 78%
SUS average: 68
SUPR-Q average: 50%
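If you keep these averages in one place, comparing a round of results against them takes a few lines. The sketch below hard-codes the MeasuringU figures listed above; the dictionary keys, the function name, and the sample scores are hypothetical.

```python
# General averages quoted above (from MeasuringU).
BENCHMARKS = {
    "seq": 5.1,               # Single Ease Question, 7-point scale
    "completion_rate": 0.78,  # share of tasks completed
    "sus": 68.0,              # System Usability Scale, 0-100
    "supr_q": 50.0,           # SUPR-Q, in percent
}

def compare_to_benchmarks(scores, benchmarks=BENCHMARKS):
    """Return the gap between each score and its benchmark.

    Positive numbers mean you are above the general average.
    Metrics missing from `scores` are skipped.
    """
    return {
        name: round(scores[name] - benchmarks[name], 2)
        for name in benchmarks
        if name in scores
    }

# Hypothetical results from one benchmarking round.
our_scores = {"seq": 5.4, "completion_rate": 0.72, "sus": 71.5}
gaps = compare_to_benchmarks(our_scores)
for name, gap in gaps.items():
    status = "above" if gap > 0 else "at or below"
    print(f"{name}: {gap:+} ({status} average)")
```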

You can also find averages for your specific industry, either online or through your own benchmarking analysis. For example, this website could show you how your NPS compares to other similar companies.

Finally, you can compare yourself to a competitor and strive to meet or exceed them on the key metrics mentioned above. Some companies publish their metrics online, so you can access their scores. If not, you will have to conduct your own benchmarking study on the competitors you want to compare your product to. When comparing against a competitor, always remember that you don’t really know how satisfied their customers are with their product, so keep that in mind when drawing conclusions!
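When you do run your own study on a competitor, it is worth checking whether a difference in completion rates is bigger than what chance alone would produce. Below is a minimal sketch using a pooled two-proportion z-test. This is a technique I am bringing in for illustration, not something the benchmarks above require, and with very small samples an exact test is usually a safer choice. The function name and the sample counts are hypothetical.

```python
import math

def two_proportion_z_test(success_a, n_a, success_b, n_b):
    """Pooled two-proportion z-test; returns (z, two-sided p-value)."""
    p_a = success_a / n_a
    p_b = success_b / n_b
    pooled = (success_a + success_b) / (n_a + n_b)
    std_error = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    if std_error == 0:
        return 0.0, 1.0  # identical all-pass or all-fail results
    z = (p_a - p_b) / std_error
    # Two-sided p-value from the standard normal distribution.
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return z, p_value

# Hypothetical data: 21 of 25 participants completed the task on our
# product, 15 of 25 on the competitor's.
z, p_value = two_proportion_z_test(21, 25, 15, 25)
print(f"z = {z:.2f}, p = {p_value:.3f}")
```

The same check works for comparing two rounds of your own product, as long as the script and the type of participants stayed consistent between them.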

Even if you have to start small, benchmarking can grant you a world of insights into a product, and can be a great tool in measuring the success of your user research practice at a company!