Open-ended qualitative questions elicit detailed user stories about behaviors, motivations, and pain points to guide product decisions and discovery.

SUS in brief: a 10-question scale, scored 0–100, that measures perceived usability and gives you a quick, reliable benchmark for tracking and comparing product usability.
The System Usability Scale (SUS) is a 10-question survey that measures how usable people find a product. It was created by John Brooke in 1986 and has become the industry standard for measuring perceived usability.
Each question uses a 5-point scale from "Strongly Disagree" to "Strongly Agree." The scoring converts responses into a number from 0 to 100, though it's not a percentage.
Research teams use SUS because it's:
Quick - takes users 2-3 minutes to complete
Free - no licensing fees or usage restrictions
Reliable - produces consistent results across studies
Validated - decades of research confirming it works
Comparable - you can benchmark against other products
Notion runs SUS surveys after major feature releases to measure whether usability improved. They track scores over time, expecting increases when they've successfully made things easier to use.
The questionnaire uses specific wording that's been validated over decades. Don't change the wording or you'll invalidate comparisons to benchmark data.
The 10 standard questions:
1. I think that I would like to use this system frequently.
2. I found the system unnecessarily complex.
3. I thought the system was easy to use.
4. I think that I would need the support of a technical person to be able to use this system.
5. I found the various functions in this system were well integrated.
6. I thought there was too much inconsistency in this system.
7. I would imagine that most people would learn to use this system very quickly.
8. I found the system very cumbersome to use.
9. I felt very confident using the system.
10. I needed to learn a lot of things before I could get going with this system.
Notice the questions alternate between positive and negative statements. This is intentional - it prevents people from just checking the same answer down the column without thinking.
The scoring is a bit weird, but there's a reason for it. Here's the step-by-step process.
For each question, responses map to numbers:
Strongly Disagree = 1
Disagree = 2
Neutral = 3
Agree = 4
Strongly Agree = 5
This is where it gets funky. Odd-numbered and even-numbered questions score differently.
For odd-numbered questions (1, 3, 5, 7, 9): Subtract 1 from the user's response.
If they responded 5 (Strongly Agree), contribution = 5 - 1 = 4
If they responded 3 (Neutral), contribution = 3 - 1 = 2
If they responded 1 (Strongly Disagree), contribution = 1 - 1 = 0
For even-numbered questions (2, 4, 6, 8, 10): Subtract the user's response from 5.
If they responded 5 (Strongly Agree), contribution = 5 - 5 = 0
If they responded 3 (Neutral), contribution = 5 - 3 = 2
If they responded 1 (Strongly Disagree), contribution = 5 - 1 = 4
Why the difference? Even-numbered questions are negative statements. Someone strongly agreeing that your system is "unnecessarily complex" is bad. The scoring flips these so higher numbers always mean better usability.
Add up the contributions from all 10 questions. The range will be 0 to 40.
Take your sum and multiply by 2.5. This converts the 0-40 range to 0-100.
Example calculation:
User responses and contributions:
Q1 (odd): Agree (4), contribution = 4 - 1 = 3
Q2 (even): Disagree (2), contribution = 5 - 2 = 3
Q3 (odd): Strongly Agree (5), contribution = 5 - 1 = 4
Q4 (even): Strongly Disagree (1), contribution = 5 - 1 = 4
Q5 (odd): Agree (4), contribution = 4 - 1 = 3
Q6 (even): Disagree (2), contribution = 5 - 2 = 3
Q7 (odd): Agree (4), contribution = 4 - 1 = 3
Q8 (even): Disagree (2), contribution = 5 - 2 = 3
Q9 (odd): Agree (4), contribution = 4 - 1 = 3
Q10 (even): Strongly Disagree (1), contribution = 5 - 1 = 4
Sum = 33
SUS Score = 33 × 2.5 = 82.5
This user gave your product a SUS score of 82.5, which is quite good.
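If you score responses in a spreadsheet export or script rather than by hand, the arithmetic above is easy to automate. Here's a minimal Python sketch (the function name and input format are illustrative, not part of the SUS standard) that reproduces the 82.5 from the worked example:

```python
def sus_score(responses):
    """Compute one respondent's SUS score from ten 1-5 responses,
    in standard question order (1 = Strongly Disagree, 5 = Strongly Agree)."""
    if len(responses) != 10:
        raise ValueError("SUS requires exactly 10 responses")
    total = 0
    for i, r in enumerate(responses, start=1):
        if i % 2 == 1:       # odd-numbered (positive) questions: response - 1
            total += r - 1
        else:                # even-numbered (negative) questions: 5 - response
            total += 5 - r
    return total * 2.5       # scale the 0-40 sum to 0-100

# The worked example above:
print(sus_score([4, 2, 5, 1, 4, 2, 4, 2, 4, 1]))  # 82.5
```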
A SUS score of 82.5 is "quite good," but what does that actually mean? How do you interpret scores?
Research analyzing hundreds of SUS studies found the average score is 68. This becomes your baseline.
Above 68: Better than average usability
Below 68: Worse than average usability
But "average" doesn't mean "acceptable." It just means middle of the pack.
Jeff Sauro's research mapped SUS scores to letter grades based on percentile rankings:
80-100: Grade A (Excellent usability)
68-79: Grade B (Good usability)
51-67: Grade C (Okay usability)
26-50: Grade D (Poor usability)
0-25: Grade F (Awful usability)
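If you report grades alongside raw scores, a small helper keeps the mapping consistent across reports. Here's a minimal Python sketch using the simplified bands listed above (the function name and the exact cutoffs at the band edges are assumptions):

```python
def sus_grade(score):
    """Map a 0-100 SUS score to a letter grade using the simplified bands above."""
    if not 0 <= score <= 100:
        raise ValueError("SUS scores range from 0 to 100")
    if score >= 80:
        return "A"   # excellent usability
    if score >= 68:
        return "B"   # good usability
    if score >= 51:
        return "C"   # okay usability
    if score >= 26:
        return "D"   # poor usability
    return "F"       # awful usability

print(sus_grade(82.5))  # A
```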
Figma's mobile app launched with a SUS score of 73 (grade B). After six months of improvements informed by user research, they increased it to 81 (grade A).
Another interpretation framework divides scores into acceptability ranges:
Above 71.4: Acceptable
51-71: Marginal (needs improvement)
Below 51: Not acceptable
These thresholds come from mapping SUS scores to other usability measures and user satisfaction data.
90-100: Exceptional. Users love your product's usability. Superhuman consistently scores in this range.
80-89: Excellent. Users find it easy to use with minimal frustration. Linear and Notion typically score here.
70-79: Good. Usable but room for improvement. Most successful products fall in this range.
60-69: Okay. Users can accomplish tasks but experience friction. Many enterprise tools score here.
50-59: Poor. Users struggle. Significant usability improvements needed.
Below 50: Terrible. Major problems preventing users from accomplishing basic tasks.
How many responses do you need for reliable SUS scores?
Minimum: 12-15 responses for a single product evaluation. Fewer than this and your score could vary widely.
Recommended: 20-30 responses for more confidence in your results.
For comparisons: 30+ per condition when comparing two designs or products against each other.
Calendly runs SUS surveys with 30-40 responses per major feature. This gives them confidence that score differences reflect real usability changes, not random variation.
Small sample sizes aren't useless, just less reliable. A score of 85 from 10 people gives you a general sense, but don't make major decisions based on it.
SUS fits specific research situations better than others.
Testing two different designs? Have users try both and complete SUS for each. Significant score differences indicate real usability differences.
Miro tested two navigation approaches. Design A scored 71, Design B scored 79. The 8-point difference suggested Design B was meaningfully more usable. They validated with deeper usability testing and shipped Design B.
Run SUS surveys regularly (quarterly or after major releases) to track whether usability is improving.
Airtable tracks SUS scores quarterly. When scores drop, they investigate what changed and conduct targeted usability research to identify specific problems.
Have users evaluate your product and competitors. SUS scores help quantify perceived usability differences.
Just make sure participants have actually used the products they're rating. Don't ask them to rate based on quick demos.
Need a fast read on perceived usability? SUS takes 2-3 minutes and gives you a number you can track.
It's not comprehensive usability testing. It's a quick measure of user perception.
"Improve usability" is vague. "Increase SUS score from 68 to 75 by Q3" is measurable.
Linear sets SUS score targets for major features. If a feature launches below 70, they prioritize usability improvements until it reaches acceptable levels.
SUS isn't appropriate for every situation.
SUS tells you there's a problem (low score) but not what the problem is. For identifying specific usability issues, use usability testing, heuristic evaluation, or analytics.
Think of SUS like a thermometer. It tells you someone has a fever but not why. You need other methods to diagnose the illness.
A single-purpose tool with minimal interface might get high scores from everyone, making SUS less useful for tracking improvements.
Users need enough experience to form opinions about usability. Having someone try your product for five minutes then complete SUS produces unreliable results.
Wait until users have accomplished at least 2-3 real tasks before asking for SUS ratings.
SUS measures overall product usability. Don't use it for specific features in isolation. Use feature-specific satisfaction questions instead.
Notion doesn't run SUS for individual features like databases or templates. They run it for the overall Notion experience.
How you administer SUS affects result quality.
Don't paraphrase or "improve" the questions. The weird formal language ("I would imagine that most people...") has been validated. Changes invalidate benchmarking.
Replace "system" with your product name: "I think that I would like to use Notion frequently" instead of "I think that I would like to use this system frequently."
Administer SUS after users have completed meaningful tasks with your product. Not after a demo, after actual usage.
Dropbox waits until users have used a feature for at least a week before sending SUS surveys. Initial reactions differ from informed opinions.
Include a brief introduction explaining what users should think about when answering.
"Please rate your experience using [Product Name] for [specific use case/tasks]. Think about your overall experience over the past [timeframe]."
This focuses responses on relevant usage rather than their entire history with your product.
Put SUS questions together without other questions in between. The 10 questions should flow as a unit.
You can ask other questions before or after; just don't interrupt the SUS questionnaire itself.
Present the 5-point scale consistently:
Strongly Disagree
Disagree
Neutral
Agree
Strongly Agree
Don't use different labels or numbers of points. Stick with the standard.
Use radio buttons, not dropdowns. Users should see all options at once for easy selection.
Ensure the survey is mobile-friendly. Many users will complete it on phones.
Webflow's SUS surveys are optimized for mobile since many designers switch between desktop and mobile contexts.
You've collected responses. Now what?
First, calculate individual SUS scores following the scoring method. Don't average the raw question responses.
Average all individual SUS scores. This is your reported SUS score.
With 25 participants scoring: 82, 79, 91, 73, 68, 85, 77, 82, 88, 71, 76, 84, 69, 92, 78, 81, 75, 87, 74, 83, 79, 86, 72, 80, 85
Mean SUS = (sum of all scores) / 25 = 1,997 / 25 = 79.9
With smaller samples, include confidence intervals showing the range where the true score likely falls.
"SUS score: 79.8 (95% CI: 76.2-83.4)" tells stakeholders you're reasonably confident the true score falls between 76 and 83.
When comparing scores, differences of 5 points or less might not be meaningful, especially with small samples.
A change from 72 to 75 isn't necessarily real improvement. It could be sampling variation. A change from 72 to 82 is likely meaningful.
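If you want a statistical check on whether a gap between two designs reflects more than sampling variation, one common approach (not something SUS itself prescribes) is a two-sample t-test on the individual participant scores. A sketch with hypothetical data:

```python
from scipy import stats

# Hypothetical individual SUS scores for two designs (illustrative numbers only)
design_a = [70, 75, 68, 72, 74, 71, 69, 73, 76, 70, 72, 68, 74, 71, 73]
design_b = [78, 82, 75, 80, 79, 83, 77, 81, 76, 80, 82, 78, 79, 77, 81]

# Welch's t-test: does not assume the two groups have equal variances
t_stat, p_value = stats.ttest_ind(design_a, design_b, equal_var=False)
print(f"p = {p_value:.4f}")  # a small p-value suggests the difference isn't just noise
```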
When tracking over time, plot SUS scores showing trends. This makes progress (or regression) obvious.
Amplitude tracks SUS quarterly and plots scores with trend lines. They can see whether usability is improving, stable, or degrading over time.
SUS tells you the what (usability score). Combine it with usability testing or user interviews to understand the why.
"Our SUS score dropped from 78 to 71 last quarter. Follow-up research revealed the new navigation confused users."
"I found the system unnecessarily complex" sounds awkward. Teams often want to "improve" it to "The system is too complex" or "The system is overly complicated."
Don't. The awkward wording is validated and tested. Changes invalidate your results and prevent benchmarking.
SUS after five minutes with a product produces meaningless scores. Users need enough experience to form real opinions.
A 2-point SUS difference between Design A (73) and Design B (75) doesn't mean much, especially with small samples.
Look for differences of 5-8 points or more before considering them meaningful.
A SUS score of 68 doesn't mean 68% of users found it usable or that users are 68% satisfied. It's a scaled score where 68 happens to be average.
SUS is one measure. It shouldn't be your only usability metric. Combine it with task completion rates, error rates, time on task, and qualitative feedback.
Five responses give you a number but not much confidence. Aim for at least 15, preferably 20-30.
Several variations exist for specific contexts.
Some research suggests that all-positive wording might be clearer. The Positive SUS (PSUS) rephrases all ten questions positively.
Original: "I found the system unnecessarily complex" PSUS: "I found the system to be simple"
The scoring changes accordingly. PSUS isn't as widely used, making benchmarking harder.
The original SUS says "system" which works for software but sounds odd for websites. Substituting "website" is common and acceptable.
UMUX (Usability Metric for User Experience): A 4-question alternative that's faster but less thoroughly validated.
SUPR-Q: Specifically designed for websites, measuring usability, credibility, loyalty, and appearance.
NASA-TLX: Measures task workload rather than usability, useful for complex interfaces.
Most research teams stick with SUS because it's well-validated and allows benchmarking.
Before making changes, establish your current SUS score. You need a baseline to measure improvement against.
Linear measured SUS for their issue creation flow before redesigning it. The baseline of 69 gave them a clear target: get above 75.
Based on your baseline, set realistic goals.
If you're at 62, shooting for 85 might be unrealistic in one release. Target 68-70 first, then 75+, then 80+.
Use the same methodology each time. Same introduction, same timing after usage, same delivery method.
Inconsistency makes it hard to know if score changes reflect real usability improvements or methodology differences.
When SUS scores are low, conduct usability testing to identify specific problems. When scores are high, interview users to understand what works well.
SUS tells you if there's a problem. Other methods tell you what the problem is and how to fix it.
"Our SUS score is 72" means nothing to most stakeholders. Frame it:
"Our SUS score is 72, which puts us in the 'Good' range (B grade). This is above average (68) but below our competitors who average 78."
Context makes the number meaningful.
If you've never used SUS:
Step 1: Pick a product or feature to measure. Start with something complete that users have experience with.
Step 2: Set up a survey using the exact SUS questions. Tools like Typeform, Google Forms, or Qualtrics work fine.
Step 3: Recruit 20-30 participants who've used the product meaningfully (at least a few times, ideally for a week or more).
Step 4: Have them complete the survey after a typical usage session.
Step 5: Calculate scores and compare to benchmarks.
Step 6: Share results with your team and identify whether deeper research is needed.
Notion started using SUS this way, measuring their overall product first. They got a baseline of 74. This established where they stood and gave them a number to beat.
SUS isn't magic. It's just one tool in your research toolkit.
Its value comes from being quick, standardized, and comparable. You can track changes over time, benchmark against competitors, and get a usability pulse without major research investments.
But it doesn't replace comprehensive usability testing, user interviews, or analytics. It complements them by providing a simple metric everyone can understand.
Used well, SUS helps research teams quantify usability, track progress, and communicate about user experience in a language the whole organization understands.
Ready to start using SUS? Download our free SUS Survey Template with the complete questionnaire, automated scoring calculator, and interpretation guidelines.
Want help implementing SUS in your research program? Book a free 30-minute consultation to discuss how to integrate SUS effectively into your research practice.