Open-ended qualitative questions elicit detailed user stories about behaviors, motivations, and pain points to guide product decisions and discovery.

SUS in brief: a 10-question scale, scored 0–100, that measures perceived usability and gives you a quick, reliable benchmark for tracking and comparing product usability.
The System Usability Scale (SUS) is a 10-question survey that measures how usable people find a product. It was created by John Brooke in 1986 and has become the industry standard for measuring perceived usability.
Each question uses a 5-point scale from "Strongly Disagree" to "Strongly Agree." The scoring converts responses into a number from 0 to 100, though it's not a percentage.
Research teams use SUS because it's:
Quick - takes users 2-3 minutes to complete
Free - no licensing fees or usage restrictions
Reliable - produces consistent results across studies
Validated - decades of research confirming it works
Comparable - you can benchmark against other products
Notion runs SUS surveys after major feature releases to measure whether usability improved. They track scores over time, expecting increases when they've successfully made things easier to use.
The questionnaire uses specific wording that's been validated over decades. Don't change the wording or you'll invalidate comparisons to benchmark data.
The 10 standard questions:
1. I think that I would like to use this system frequently.
2. I found the system unnecessarily complex.
3. I thought the system was easy to use.
4. I think that I would need the support of a technical person to be able to use this system.
5. I found the various functions in this system were well integrated.
6. I thought there was too much inconsistency in this system.
7. I would imagine that most people would learn to use this system very quickly.
8. I found the system very cumbersome to use.
9. I felt very confident using the system.
10. I needed to learn a lot of things before I could get going with this system.
Notice the questions alternate between positive and negative statements. This is intentional - it prevents people from just checking the same answer down the column without thinking.
The scoring is a bit weird, but there's a reason for it. Here's the step-by-step process.
For each question, responses map to numbers:
Strongly Disagree = 1
Disagree = 2
Neutral = 3
Agree = 4
Strongly Agree = 5
This is where it gets funky. Odd-numbered and even-numbered questions score differently.
For odd-numbered questions (1, 3, 5, 7, 9): Subtract 1 from the user's response.
If they responded 5 (Strongly Agree), contribution = 5 - 1 = 4
If they responded 3 (Neutral), contribution = 3 - 1 = 2
If they responded 1 (Strongly Disagree), contribution = 1 - 1 = 0
For even-numbered questions (2, 4, 6, 8, 10): Subtract the user's response from 5.
If they responded 5 (Strongly Agree), contribution = 5 - 5 = 0
If they responded 3 (Neutral), contribution = 5 - 3 = 2
If they responded 1 (Strongly Disagree), contribution = 5 - 1 = 4
Why the difference? Even-numbered questions are negative statements. Someone strongly agreeing that your system is "unnecessarily complex" is bad. The scoring flips these so higher numbers always mean better usability.
Add up the contributions from all 10 questions. The range will be 0 to 40.
Take your sum and multiply by 2.5. This converts the 0-40 range to 0-100.
Example calculation:
User responses and contributions:
Q1 (odd): Agree (4), contribution = 4 - 1 = 3
Q2 (even): Disagree (2), contribution = 5 - 2 = 3
Q3 (odd): Strongly Agree (5), contribution = 5 - 1 = 4
Q4 (even): Strongly Disagree (1), contribution = 5 - 1 = 4
Q5 (odd): Agree (4), contribution = 4 - 1 = 3
Q6 (even): Disagree (2), contribution = 5 - 2 = 3
Q7 (odd): Agree (4), contribution = 4 - 1 = 3
Q8 (even): Disagree (2), contribution = 5 - 2 = 3
Q9 (odd): Agree (4), contribution = 4 - 1 = 3
Q10 (even): Strongly Disagree (1), contribution = 5 - 1 = 4
Sum = 33
SUS Score = 33 × 2.5 = 82.5
This user gave your product a SUS score of 82.5, which is quite good.
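If you score responses in a spreadsheet export or script rather than by hand, the arithmetic above is easy to automate. Here's a minimal Python sketch (the function name and input format are illustrative, not part of the SUS standard) that reproduces the 82.5 from the worked example:

```python
def sus_score(responses):
    """Compute one respondent's SUS score from ten 1-5 responses,
    in standard question order (1 = Strongly Disagree, 5 = Strongly Agree)."""
    if len(responses) != 10:
        raise ValueError("SUS requires exactly 10 responses")
    total = 0
    for i, r in enumerate(responses, start=1):
        if i % 2 == 1:       # odd-numbered (positive) questions: response - 1
            total += r - 1
        else:                # even-numbered (negative) questions: 5 - response
            total += 5 - r
    return total * 2.5       # scale the 0-40 sum to 0-100

# The worked example above:
print(sus_score([4, 2, 5, 1, 4, 2, 4, 2, 4, 1]))  # 82.5
```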
A SUS score of 82.5 is "quite good," but what does that actually mean? How do you interpret scores?
Research analyzing hundreds of SUS studies found the average score is 68. This becomes your baseline.
Above 68: Better than average usability
Below 68: Worse than average usability
But "average" doesn't mean "acceptable." It just means middle of the pack.
Jeff Sauro's research mapped SUS scores to letter grades based on percentile rankings:
80-100: Grade A (Excellent usability)
68-79: Grade B (Good usability)
51-67: Grade C (Okay usability)
26-50: Grade D (Poor usability)
0-25: Grade F (Awful usability)
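If you report grades alongside raw scores, a small helper keeps the mapping consistent across reports. Here's a minimal Python sketch using the simplified bands listed above (the function name and the exact cutoffs at the band edges are assumptions):

```python
def sus_grade(score):
    """Map a 0-100 SUS score to a letter grade using the simplified bands above."""
    if not 0 <= score <= 100:
        raise ValueError("SUS scores range from 0 to 100")
    if score >= 80:
        return "A"   # excellent usability
    if score >= 68:
        return "B"   # good usability
    if score >= 51:
        return "C"   # okay usability
    if score >= 26:
        return "D"   # poor usability
    return "F"       # awful usability

print(sus_grade(82.5))  # A
```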
Figma's mobile app launched with a SUS score of 73 (grade B). After six months of improvements informed by user research, they increased it to 81 (grade A).
Another interpretation framework divides scores into acceptability ranges:
Above 71.4: Acceptable
51-71: Marginal (needs improvement)
Below 51: Not acceptable
These thresholds come from mapping SUS scores to other usability measures and user satisfaction data.
90-100: Exceptional. Users love your product's usability. Superhuman consistently scores in this range.
80-89: Excellent. Users find it easy to use with minimal frustration. Linear and Notion typically score here.
70-79: Good. Usable but room for improvement. Most successful products fall in this range.
60-69: Okay. Users can accomplish tasks but experience friction. Many enterprise tools score here.
50-59: Poor. Users struggle. Significant usability improvements needed.
Below 50: Terrible. Major problems preventing users from accomplishing basic tasks.
How many responses do you need for reliable SUS scores?
Minimum: 12-15 responses for a single product evaluation. Fewer than this and your score could vary widely.
Recommended: 20-30 responses for more confidence in your results.
For comparisons: 30+ per condition when comparing two designs or products against each other.
Calendly runs SUS surveys with 30-40 responses per major feature. This gives them confidence that score differences reflect real usability changes, not random variation.
Small sample sizes aren't useless, just less reliable. A score of 85 from 10 people gives you a general sense, but don't make major decisions based on it.
SUS fits specific research situations better than others.
Testing two different designs? Have users try both and complete SUS for each. Significant score differences indicate real usability differences.
Miro tested two navigation approaches. Design A scored 71, Design B scored 79. The 8-point difference suggested Design B was meaningfully more usable. They validated with deeper usability testing and shipped Design B.
Run SUS surveys regularly (quarterly or after major releases) to track whether usability is improving.
Airtable tracks SUS scores quarterly. When scores drop, they investigate what changed and conduct targeted usability research to identify specific problems.
Have users evaluate your product and competitors. SUS scores help quantify perceived usability differences.
Just make sure participants have actually used the products they're rating. Don't ask them to rate based on quick demos.
Need a fast read on perceived usability? SUS takes 2-3 minutes and gives you a number you can track.
It's not comprehensive usability testing. It's a quick measure of user perception.
"Improve usability" is vague. "Increase SUS score from 68 to 75 by Q3" is measurable.
Linear sets SUS score targets for major features. If a feature launches below 70, they prioritize usability improvements until it reaches acceptable levels.
SUS isn't appropriate for every situation.
SUS tells you there's a problem (low score) but not what the problem is. For identifying specific usability issues, use usability testing, heuristic evaluation, or analytics.
Think of SUS like a thermometer. It tells you someone has a fever but not why. You need other methods to diagnose the illness.
A single-purpose tool with minimal interface might get high scores from everyone, making SUS less useful for tracking improvements.
Users need enough experience to form opinions about usability. Having someone try your product for five minutes then complete SUS produces unreliable results.
Wait until users have accomplished at least 2-3 real tasks before asking for SUS ratings.
SUS measures overall product usability. Don't use it for specific features in isolation. Use feature-specific satisfaction questions instead.
Notion doesn't run SUS for individual features like databases or templates. They run it for the overall Notion experience.
How you administer SUS affects result quality.
Don't paraphrase or "improve" the questions. The weird formal language ("I would imagine that most people...") has been validated. Changes invalidate benchmarking.
Replace "system" with your product name: "I think that I would like to use Notion frequently" instead of "I think that I would like to use this system frequently."
Administer SUS after users have completed meaningful tasks with your product. Not after a demo, after actual usage.
Dropbox waits until users have used a feature for at least a week before sending SUS surveys. Initial reactions differ from informed opinions.
Include a brief introduction explaining what users should think about when answering.
"Please rate your experience using [Product Name] for [specific use case/tasks]. Think about your overall experience over the past [timeframe]."
This focuses responses on relevant usage rather than their entire history with your product.
Put SUS questions together without other questions in between. The 10 questions should flow as a unit.
You can ask other questions before or after; just don't interrupt the SUS questionnaire itself.
Present the 5-point scale consistently:
Strongly Disagree
Disagree
Neutral
Agree
Strongly Agree
Don't use different labels or numbers of points. Stick with the standard.
Use radio buttons, not dropdowns. Users should see all options at once for easy selection.
Ensure the survey is mobile-friendly. Many users will complete it on phones.
Webflow's SUS surveys are optimized for mobile since many designers switch between desktop and mobile contexts.
You've collected responses. Now what?
First, calculate individual SUS scores following the scoring method. Don't average the raw question responses.
Average all individual SUS scores. This is your reported SUS score.
With 25 participants scoring: 82, 79, 91, 73, 68, 85, 77, 82, 88, 71, 76, 84, 69, 92, 78, 81, 75, 87, 74, 83, 79, 86, 72, 80, 85
Mean SUS = (sum of all scores) / 25 = 1,997 / 25 = 79.9
With smaller samples, include confidence intervals showing the range where the true score likely falls.
"SUS score: 79.8 (95% CI: 76.2-83.4)" tells stakeholders you're reasonably confident the true score falls between 76 and 83.
When comparing scores, differences of 5 points or less might not be meaningful, especially with small samples.
A change from 72 to 75 isn't necessarily real improvement. It could be sampling variation. A change from 72 to 82 is likely meaningful.
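If you want a statistical check on whether a gap between two designs reflects more than sampling variation, one common approach (not something SUS itself prescribes) is a two-sample t-test on the individual participant scores. A sketch with hypothetical data:

```python
from scipy import stats

# Hypothetical individual SUS scores for two designs (illustrative numbers only)
design_a = [70, 75, 68, 72, 74, 71, 69, 73, 76, 70, 72, 68, 74, 71, 73]
design_b = [78, 82, 75, 80, 79, 83, 77, 81, 76, 80, 82, 78, 79, 77, 81]

# Welch's t-test: does not assume the two groups have equal variances
t_stat, p_value = stats.ttest_ind(design_a, design_b, equal_var=False)
print(f"p = {p_value:.4f}")  # a small p-value suggests the difference isn't just noise
```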
When tracking over time, plot SUS scores showing trends. This makes progress (or regression) obvious.
Amplitude tracks SUS quarterly and plots scores with trend lines. They can see whether usability is improving, stable, or degrading over time.
SUS tells you the what (usability score). Combine it with usability testing or user interviews to understand the why.
"Our SUS score dropped from 78 to 71 last quarter. Follow-up research revealed the new navigation confused users."
"I found the system unnecessarily complex" sounds awkward. Teams often want to "improve" it to "The system is too complex" or "The system is overly complicated."
Don't. The awkward wording is validated and tested. Changes invalidate your results and prevent benchmarking.
SUS after five minutes with a product produces meaningless scores. Users need enough experience to form real opinions.
A 2-point SUS difference between Design A (73) and Design B (75) doesn't mean much, especially with small samples.
Look for differences of 5-8 points or more before considering them meaningful.
A SUS score of 68 doesn't mean 68% of users found it usable or that users are 68% satisfied. It's a scaled score where 68 happens to be average.
SUS is one measure. It shouldn't be your only usability metric. Combine it with task completion rates, error rates, time on task, and qualitative feedback.
Five responses give you a number but not much confidence. Aim for at least 15, preferably 20-30.
Several variations exist for specific contexts.
Some research suggests that all-positive wording might be clearer. The Positive SUS (PSUS) rephrases all ten questions positively.
Original: "I found the system unnecessarily complex" PSUS: "I found the system to be simple"
The scoring changes accordingly. PSUS isn't as widely used, making benchmarking harder.
The original SUS says "system" which works for software but sounds odd for websites. Substituting "website" is common and acceptable.
UMUX (Usability Metric for User Experience): A 4-question alternative that's faster but less thoroughly validated.
SUPR-Q: Specifically designed for websites, measuring usability, credibility, loyalty, and appearance.
NASA-TLX: Measures task workload rather than usability, useful for complex interfaces.
Most research teams stick with SUS because it's well-validated and allows benchmarking.
Before making changes, establish your current SUS score. You need a baseline to measure improvement against.
Linear measured SUS for their issue creation flow before redesigning it. The baseline of 69 gave them a clear target: get above 75.
Based on your baseline, set realistic goals.
If you're at 62, shooting for 85 might be unrealistic in one release. Target 68-70 first, then 75+, then 80+.
Use the same methodology each time. Same introduction, same timing after usage, same delivery method.
Inconsistency makes it hard to know if score changes reflect real usability improvements or methodology differences.
When SUS scores are low, conduct usability testing to identify specific problems. When scores are high, interview users to understand what works well.
SUS tells you if there's a problem. Other methods tell you what the problem is and how to fix it.
"Our SUS score is 72" means nothing to most stakeholders. Frame it:
"Our SUS score is 72, which puts us in the 'Good' range (B grade). This is above average (68) but below our competitors who average 78."
Context makes the number meaningful.
If you've never used SUS:
Step 1: Pick a product or feature to measure. Start with something complete that users have experience with.
Step 2: Set up a survey using the exact SUS questions. Tools like Typeform, Google Forms, or Qualtrics work fine.
Step 3: Recruit 20-30 participants who've used the product meaningfully (at least a few times, ideally for a week or more).
Step 4: Have them complete the survey after a typical usage session.
Step 5: Calculate scores and compare to benchmarks.
Step 6: Share results with your team and identify whether deeper research is needed.
Notion started using SUS this way, measuring their overall product first. They got a baseline of 74. This established where they stood and gave them a number to beat.
SUS isn't magic. It's just one tool in your research toolkit.
Its value comes from being quick, standardized, and comparable. You can track changes over time, benchmark against competitors, and get a usability pulse without major research investments.
But it doesn't replace comprehensive usability testing, user interviews, or analytics. It complements them by providing a simple metric everyone can understand.
Used well, SUS helps research teams quantify usability, track progress, and communicate about user experience in a language the whole organization understands.
Ready to start using SUS? Download our free SUS Survey Template with the complete questionnaire, automated scoring calculator, and interpretation guidelines.
Want help implementing SUS in your research program? Book a free 30-minute consultation to discuss how to integrate SUS effectively into your research practice.