UI/UX Research
December 13, 2025

Heuristic evaluation: 10 Nielsen principles explained

Heuristic evaluation applies Nielsen's 10 usability heuristics in an expert review to find usability problems quickly, before user testing.

Heuristic evaluation means having experts review your interface against established usability principles to find problems. It's not user testing. It's expert review.

Think of it like a building inspector checking a house against building codes before people move in. The inspector knows what makes houses safe and functional, checks against standards, and identifies violations. You're doing the same with interfaces.

Jakob Nielsen developed 10 usability heuristics in 1994 that remain the standard framework. These principles describe characteristics of usable interfaces based on decades of research.

Research teams use heuristic evaluation to:

  • Find obvious usability problems quickly before expensive user testing

  • Evaluate competitive products to understand their strengths and weaknesses

  • Assess designs early when they're still easy to change

  • Train team members on usability principles

Figma's research team uses heuristic evaluation before every major feature launch. They catch obvious issues quickly, then focus user testing on subtler problems heuristics might miss.

The 10 Nielsen usability heuristics

Each heuristic represents a different aspect of usability. Let's go through them with real examples.

1. Visibility of system status

The principle: The system should always keep users informed about what's happening through appropriate feedback within reasonable time.

Users shouldn't wonder "did that work?" or "is something happening?" The interface should show its current state clearly.

Good examples:

Dropbox's sync status icon shows whether files are syncing, synced, or have errors. Users always know if their files are safely uploaded.

Gmail's "sending" animation when you send email. You see the message is being sent, then get confirmation when it's done.

Linear's loading states that show which specific data is loading rather than generic spinners. "Loading issues..." tells you what's happening.

Bad examples:

Buttons that don't show any response when clicked, leaving users wondering if they worked.

Forms that submit without confirmation, making users unsure if their action succeeded.

Long processes without progress indicators, forcing users to guess how long to wait.

How to evaluate: Go through every interactive element. Click buttons, submit forms, trigger actions. Does the interface clearly show that your action was registered? Do you know what's happening at each step?
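
To make the pattern concrete, here's a minimal TypeScript sketch of what the good examples share: an async action that reports every state transition to the UI instead of going silent. The function and status names are invented for illustration, not taken from any product above.

```typescript
type SaveStatus = "idle" | "saving" | "saved" | "error";

// Report every state transition so the interface never goes silent.
async function saveDocument(
  upload: () => Promise<void>,
  onStatus: (status: SaveStatus, detail?: string) => void
): Promise<void> {
  onStatus("saving", "Saving your changes...");
  try {
    await upload();
    onStatus("saved", "All changes saved");
  } catch {
    onStatus("error", "Save failed - check your connection and retry");
  }
}

// Usage: the callback drives whatever indicator the UI shows.
saveDocument(
  () => new Promise((resolve) => setTimeout(resolve, 500)),
  (status, detail) => console.log(`[${status}] ${detail ?? ""}`)
);
```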

2. Match between system and the real world

The principle: The system should speak the user's language with words, phrases, and concepts familiar to them rather than system-oriented terms.

Your interface should match how users think, not how your database is structured.

Good examples:

Notion calls their organizational units "pages" and "databases" - familiar concepts - rather than technical terms like "nodes" or "records."

Calendly uses "event types" rather than "appointment templates" because users think about creating different types of meetings, not templates.

Superhuman's email keyboard shortcuts mirror Gmail's shortcuts because most users already know Gmail.

Bad examples:

Error messages like "Error 404" or "Null pointer exception" that mean nothing to normal users.

Navigation using company org chart terminology that users don't understand.

Technical jargon when plain language would work: "authenticate" instead of "log in," "terminate" instead of "end."

How to evaluate: Read every piece of text. Would your target users understand it without explanation? Are you using terms from their world or yours? Check if concepts match user mental models.

3. User control and freedom

The principle: Users often perform actions by mistake and need clearly marked emergency exits to leave unwanted states without going through extended processes.

People make mistakes. Your interface should make it easy to undo them.

Good examples:

Gmail's "undo send" feature letting you cancel sent emails within a few seconds.

Figma's infinite undo/redo allowing designers to experiment freely knowing they can always go back.

Notion's page history showing previous versions and letting you restore any version.

Bad examples:

Destructive actions with no undo, forcing users to start over if they make mistakes.

Wizards or flows you can't exit partway through without losing all progress.

Required fields preventing users from saving drafts and continuing later.

How to evaluate: Try making mistakes deliberately. Delete things, make wrong selections, go down wrong paths. Can you easily undo actions or escape from states? Are there clear exits at every point?
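
The engineering pattern behind "undo everything" is simpler than it looks: keep past states on a stack. A minimal sketch, with illustrative names:

```typescript
// Minimal undo/redo: snapshots of state on two stacks.
class History<T> {
  private past: T[] = [];
  private future: T[] = [];

  constructor(private present: T) {}

  get value(): T {
    return this.present;
  }

  // Every change is recorded, so any change can be reversed.
  set(next: T): void {
    this.past.push(this.present);
    this.present = next;
    this.future = []; // a new edit invalidates the redo stack
  }

  undo(): void {
    const prev = this.past.pop();
    if (prev !== undefined) {
      this.future.push(this.present);
      this.present = prev;
    }
  }

  redo(): void {
    const next = this.future.pop();
    if (next !== undefined) {
      this.past.push(this.present);
      this.present = next;
    }
  }
}

const title = new History("Draft");
title.set("Final");
title.undo();
console.log(title.value); // "Draft" - the mistake is reversible
```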

4. Consistency and standards

The principle: Users shouldn't have to wonder whether different words, situations, or actions mean the same thing. Follow platform and industry conventions.

Consistency reduces cognitive load. Users learn once and apply that learning everywhere.

Good examples:

Slack using standard conventions for text formatting - *bold*, _italic_, ~strikethrough~ - matching the shorthand users know from other tools.

Linear placing primary actions (Save, Submit, Send) on the right side of dialogs following platform conventions.

Stripe using consistent card designs across their dashboard - similar information always appears in the same place.

Bad examples:

Save buttons that sometimes say "Save," sometimes "Submit," sometimes "Update" with no clear reason.

Checkboxes that sometimes mean "include this" and sometimes mean "exclude this."

Icons with inconsistent meanings - a trash can that deletes in one place but archives in another.

How to evaluate: Look for inconsistencies. Do similar things look similar? Do the same actions always work the same way? Are you following conventions from the platforms your users know?

5. Error prevention

The principle: Better than good error messages is careful design preventing problems from occurring in the first place.

Good interfaces prevent errors rather than just handling them well after they happen.

Good examples:

Google Calendar's conflict detection showing when you're double-booking yourself before you confirm.

Grammarly's real-time suggestions preventing writing errors as you type rather than catching them later.

GitHub's protected branches preventing accidental deletions of important code.

Bad examples:

Allowing users to book overlapping meetings without warning.

Letting users delete important data without any safeguards or confirmations.

Accepting invalid data then showing errors after submission rather than during input.

How to evaluate: Look for places users could make mistakes. Does the interface prevent these mistakes or only catch them after they happen? Are constraints clear upfront?
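
Calendar conflict detection, for instance, reduces to an interval-overlap check run while the user is choosing a time, not after they confirm. A hedged sketch (data and names invented):

```typescript
interface Slot {
  start: Date;
  end: Date;
}

// Two slots overlap if each starts before the other ends.
function conflictsWith(candidate: Slot, booked: Slot[]): Slot[] {
  return booked.filter(
    (slot) => candidate.start < slot.end && slot.start < candidate.end
  );
}

// Run this while the user is picking a time, so the warning
// appears before they confirm - prevention, not cleanup.
const existing: Slot[] = [
  { start: new Date("2025-01-06T10:00"), end: new Date("2025-01-06T11:00") },
];
const proposed: Slot = {
  start: new Date("2025-01-06T10:30"),
  end: new Date("2025-01-06T11:30"),
};
console.log(conflictsWith(proposed, existing).length > 0); // true: warn now
```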

6. Recognition rather than recall

The principle: Minimize the user's memory load by making elements, actions, and options visible. Users shouldn't have to remember information from one part of the interface to another.

People are better at recognizing things they see than recalling things from memory.

Good examples:

Figma's recent files list showing thumbnails, not just names. You recognize the design you want rather than recalling exact filenames.

Notion's @ mentions showing available pages as you type rather than requiring you to remember exact page names.

Calendly's event type selection showing descriptions of each type rather than requiring users to remember what each does.

Bad examples:

Requiring users to remember and type exact commands rather than showing available options.

Multi-step processes where information from step 1 isn't visible in step 3 when you need it.

Codes or IDs users must remember rather than select from lists.

How to evaluate: Go through workflows. Does the interface show what you need when you need it? Or do you have to remember things from earlier steps or previous sessions?
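
Mechanically, "recognition" usually means filtering a visible list as the user types - the way @-mention pickers work - instead of requiring an exact name from memory. A minimal sketch with made-up data:

```typescript
interface Page {
  id: string;
  title: string;
}

// Show candidates matching the fragment typed so far;
// the user recognizes the right one instead of recalling it.
function suggest(query: string, pages: Page[], limit = 5): Page[] {
  const q = query.toLowerCase();
  return pages
    .filter((p) => p.title.toLowerCase().includes(q))
    .slice(0, limit);
}

const pages: Page[] = [
  { id: "1", title: "Q3 Roadmap" },
  { id: "2", title: "Roadmap Retrospective" },
  { id: "3", title: "Hiring Plan" },
];
console.log(suggest("road", pages).map((p) => p.title));
// ["Q3 Roadmap", "Roadmap Retrospective"]
```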

7. Flexibility and efficiency of use

The principle: Provide accelerators for expert users while keeping the interface simple for novices. Allow users to customize frequent actions.

Your interface should work for both beginners and experts.

Good examples:

Linear's keyboard shortcuts for power users while maintaining full mouse/click functionality for beginners.

Notion's slash commands providing quick access for experts while keeping menus available for everyone.

Gmail's keyboard shortcuts, labels, and filters letting power users work faster while the basic interface remains simple.

Bad examples:

Forcing everyone through step-by-step wizards that experts find tedious.

No keyboard shortcuts or ways to batch actions for frequent tasks.

Inability to customize workflows or save preferences for repeated actions.

How to evaluate: Try doing common tasks as both a beginner and an expert. Are there shortcuts and efficiencies available for experienced users? Can the interface scale with user skill?
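
One common way to build this is to define each command once and expose it through both a visible menu (for novices) and a keyboard accelerator (for experts). A sketch with invented command names:

```typescript
interface Command {
  id: string;
  label: string;     // shown in menus for novices
  shortcut?: string; // accelerator for experts
  run: () => void;
}

const commands: Command[] = [
  { id: "new-issue", label: "New issue", shortcut: "c", run: () => console.log("new issue") },
  { id: "search", label: "Search", shortcut: "/", run: () => console.log("search") },
];

// The same registry powers menus, a command palette, and key
// handling, so the novice and expert paths never diverge.
function handleKey(key: string): void {
  commands.find((c) => c.shortcut === key)?.run();
}

handleKey("c"); // expert path: one keystroke
commands.forEach((c) => console.log(`${c.label} (${c.shortcut ?? ""})`)); // novice path
```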

8. Aesthetic and minimalist design

The principle: Interfaces shouldn't contain information that's irrelevant or rarely needed. Every extra unit of information competes with relevant information.

Simple doesn't mean simplistic. It means focused on what matters.

Good examples:

Linear's clean interface showing only essential information about issues, hiding metadata until needed.

Superhuman's minimal email interface removing clutter and focusing on the current message.

Stripe's dashboard progressive disclosure - showing summary data upfront, details on demand.

Bad examples:

Dashboards showing 20 metrics when users primarily care about 3.

Forms with unnecessary fields just because the database can store them.

Interfaces cramming every possible action onto every screen.

How to evaluate: Look at each screen. What's the primary task? Does everything on screen support that task? What could be removed or hidden without losing functionality?

9. Help users recognize, diagnose, and recover from errors

The principle: Error messages should be expressed in plain language, precisely indicate the problem, and constructively suggest a solution.

When errors happen, help users fix them quickly.

Good examples:

Grammarly's specific error messages: "This word is usually spelled 'receive'" rather than just "spelling error."

GitHub's error messages that explain what went wrong and suggest fixes: "Branch protection rules prevent this. Try creating a pull request instead."

Stripe's validation messages showing exactly which field has a problem and what format is expected.

Bad examples:

Generic error messages: "An error occurred. Please try again."

Technical errors: "500 Internal Server Error" with no user-friendly explanation.

Messages that describe the problem but offer no solution or next steps.

How to evaluate: Trigger errors deliberately. Submit invalid forms, try unauthorized actions, break things intentionally. Are error messages helpful? Do they explain what's wrong and how to fix it?
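
The pattern in the good examples is consistent: say what's wrong, where, and what to do next. Here's a sketch of a validator that returns messages in that shape (field names and formats are hypothetical):

```typescript
interface FieldError {
  field: string;
  problem: string; // plain language, precise
  fix: string;     // constructive next step
}

function validateCard(expiry: string): FieldError[] {
  const errors: FieldError[] = [];
  // Precise diagnosis plus a suggested fix, not "an error occurred".
  if (!/^\d{2}\/\d{2}$/.test(expiry)) {
    errors.push({
      field: "expiry",
      problem: `"${expiry}" isn't a valid expiry date`,
      fix: "Use MM/YY format, for example 04/27",
    });
  }
  return errors;
}

console.log(validateCard("April 2027"));
// [{ field: "expiry", problem: "\"April 2027\" isn't a valid expiry date",
//    fix: "Use MM/YY format, for example 04/27" }]
```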

10. Help and documentation

The principle: Even the best interfaces sometimes need documentation. Help should be easy to search, focused on the user's task, list concrete steps, and not be too large.

Most users won't read documentation, but when they need it, it should be helpful.

Good examples:

Notion's inline help that appears contextually when you're stuck, rather than requiring you to search help docs.

Figma's tooltips showing keyboard shortcuts when you hover over tools.

Linear's command palette with search functionality letting you find actions without memorizing commands.

Bad examples:

Help documentation requiring you to read entire guides to find simple answers.

No contextual help, forcing users to leave the interface to search external docs.

Documentation written in technical language that only developers understand.

How to evaluate: Try doing unfamiliar tasks. Can you find help when you need it? Is it specific to your current task? Does it actually help you accomplish your goal?

How to conduct a heuristic evaluation

Understanding the heuristics is one thing. Using them effectively requires a structured approach.

Step 1: Define scope

Decide what you're evaluating. A specific feature? An entire product? A competitor's interface?

Be specific. "Evaluate the signup flow" is better than "evaluate the product."

Notion's research team evaluates specific workflows - "new user onboarding," "database creation," "page sharing" - rather than the entire product at once.

Step 2: Recruit evaluators

You need 3-5 evaluators. Why multiple people?

Research shows one evaluator finds about 35% of usability problems. Three evaluators find about 75%. Five evaluators find about 85%. Beyond five, you get diminishing returns.

Evaluators should understand usability but don't need to be experts in your specific domain. Sometimes external perspectives catch issues team members miss.
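
Those percentages are rounded points on the curve Nielsen and Landauer fitted to evaluation data: the proportion of problems found by i evaluators is roughly 1 - (1 - λ)^i, where λ is the proportion a single evaluator finds. A quick sketch with λ = 0.35:

```typescript
// Share of problems found by i independent evaluators,
// assuming each finds a fraction lambda (Nielsen-Landauer model).
function problemsFound(i: number, lambda = 0.35): number {
  return 1 - Math.pow(1 - lambda, i);
}

for (const i of [1, 3, 5, 8]) {
  console.log(`${i} evaluators: ${(problemsFound(i) * 100).toFixed(0)}%`);
}
// 1 evaluators: 35%
// 3 evaluators: 73%
// 5 evaluators: 88%
// 8 evaluators: 97%  <- diminishing returns past five
```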

Step 3: Independent evaluation

Each evaluator goes through the interface independently, noting issues they find.

Use a structured template documenting:

  • Which heuristic is violated

  • Where in the interface the problem occurs

  • Description of the specific issue

  • Severity rating (minor annoyance to critical problem)

Spend 1-2 hours for a focused feature evaluation. Don't rush but don't overthink either.

Linear's researchers use a shared spreadsheet where each evaluator logs issues in their own tab during independent evaluation.
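
If your team tracks findings in a sheet or in code, the template reduces to a small record type. A minimal sketch, with illustrative field names:

```typescript
type Heuristic =
  | "visibility-of-system-status"
  | "match-system-and-real-world"
  | "user-control-and-freedom"
  | "consistency-and-standards"
  | "error-prevention"
  | "recognition-over-recall"
  | "flexibility-and-efficiency"
  | "aesthetic-and-minimalist-design"
  | "error-recovery"
  | "help-and-documentation";

interface Issue {
  heuristic: Heuristic;        // which principle is violated
  location: string;            // where in the interface
  description: string;         // the specific problem observed
  severity: 0 | 1 | 2 | 3 | 4; // rated in step 4
  evaluator: string;           // filled in during independent review
}

const example: Issue = {
  heuristic: "visibility-of-system-status",
  location: "Signup form, submit button",
  description: "No feedback after clicking Submit; users double-submit",
  severity: 3,
  evaluator: "Evaluator A",
};
```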

Step 4: Severity rating

Not all problems are equal. Rate each issue:

  • Severity 0: Not actually a usability problem

  • Severity 1: Cosmetic problem, fix if time allows

  • Severity 2: Minor usability problem, low priority fix

  • Severity 3: Major usability problem, important to fix

  • Severity 4: Usability catastrophe, must fix before release

Consider both frequency (how often users encounter it) and impact (how badly it affects them).
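
One illustrative way to combine the two dimensions - this rubric is an assumption for the sketch, not Nielsen's official scale - is to score frequency and impact separately and map the product onto 0-4:

```typescript
// Illustrative rubric: frequency and impact each scored 0-3,
// combined and clamped onto the 0-4 severity scale.
function severity(frequency: 0 | 1 | 2 | 3, impact: 0 | 1 | 2 | 3): number {
  if (frequency === 0 || impact === 0) return 0; // not actually a problem
  return Math.min(4, Math.ceil((frequency * impact) / 2));
}

console.log(severity(1, 1)); // 1: rare and mild -> cosmetic
console.log(severity(3, 3)); // 4: frequent and blocking -> catastrophe
```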

Step 5: Consolidation meeting

Evaluators meet to discuss findings, consolidate duplicate issues, and agree on severity ratings.

This is where having multiple evaluators pays off. Discussion reveals whether one person's concern is actually a problem or just personal preference.

Create a final list of unique issues with agreed severity ratings.
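
Part of the consolidation can be mechanical: group entries that point at the same place, keep one per group, and carry the highest proposed severity into the discussion as a starting point. A sketch:

```typescript
interface Finding {
  heuristic: string;
  location: string;
  description: string;
  severity: number;
}

// Merge duplicates by heuristic + location, keeping the highest
// proposed severity as the starting point for group discussion.
function consolidate(findings: Finding[]): Finding[] {
  const merged = new Map<string, Finding>();
  for (const f of findings) {
    const key = `${f.heuristic}|${f.location}`;
    const current = merged.get(key);
    if (!current || f.severity > current.severity) merged.set(key, f);
  }
  return [...merged.values()];
}
```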

Step 6: Report and recommendations

Document findings in a format teams can act on.

For each issue include:

  • Screenshot showing the problem

  • Which heuristic is violated

  • Severity rating

  • Specific recommendation for fixing it

Figma's evaluation reports include before/after mockups showing current state and suggested improvements.

When to use heuristic evaluation

Heuristic evaluation fits specific situations better than others.

Early in design

Evaluate prototypes and mockups before building anything. Catching problems in design is way cheaper than catching them in code.

Calendly evaluates Figma prototypes before development starts. They fix issues when it's just changing pixels, not refactoring code.

Before user testing

Do heuristic evaluation first to catch obvious problems, then focus user testing on subtler issues experts might miss.

Why waste testing time on problems you could have caught with expert review? User testing is expensive. Use it for questions experts can't answer.

Competitive analysis

Evaluate competitors' products to understand their strengths and weaknesses.

Linear regularly evaluates competing project management tools, identifying what works well and where competitors struggle.

Regular audits

Periodically evaluate your own product to catch usability debt accumulating over time.

As products evolve, inconsistencies and violations creep in. Regular audits catch them before they compound.

Limitations of heuristic evaluation

Heuristic evaluation is useful but not perfect. Understand its limitations.

Misses context-specific problems

Evaluators guess at usage context. They might miss problems that only occur in specific situations real users face.

User testing with actual target users in real contexts catches problems heuristic evaluation misses.

Can produce false positives

Evaluators might flag things as problems that don't actually bother users. Some heuristic violations don't impact real usage.

Miro's research team validates heuristic findings with user testing before major fixes. Sometimes things that seem like problems don't actually affect users.

Depends on evaluator expertise

Poor evaluators find fewer and less important problems. The method is only as good as the people conducting it.

Doesn't reveal user mental models

Heuristic evaluation finds interface problems but doesn't reveal how users think about tasks or what they're trying to accomplish.

For understanding user needs and mental models, you need generative research methods like interviews and observations.

Combining heuristic evaluation with other methods

Heuristic evaluation works best alongside other research methods.

With user testing: Do heuristic evaluation first to catch obvious issues, then user testing to find problems experts miss and validate that fixes work.

With analytics: Use analytics to identify problem areas (where users struggle or drop off), then heuristic evaluation to diagnose specific usability issues.

With surveys: Survey users about satisfaction and pain points, then use heuristic evaluation to identify specific violations causing those issues.

Notion combines methods strategically. Heuristics catch interface problems. User interviews reveal user needs. Analytics show which problems affect the most users.

Common mistakes in heuristic evaluation

Treating it like user testing

Heuristic evaluation is expert review, not user testing. Evaluators assess against principles; they don't pretend to be users.

Don't say "I, as a user, would..." You're an expert evaluating against heuristics.

Skipping the independent evaluation phase

When evaluators work together from the start, groupthink emerges. One person's opinion influences others.

Independent evaluation then consolidation finds more issues.

Not using structured templates

Without structure, evaluators forget heuristics or miss documenting important details.

Use templates to ensure consistent, complete evaluations.

Confusing preferences with problems

"I don't like this color" isn't a heuristic violation unless it causes actual usability problems.

Focus on issues that violate principles, not personal preferences.

Evaluating too much at once

Trying to evaluate an entire complex product in one session leads to superficial evaluation.

Break it into manageable pieces - specific workflows or features.

Getting started with heuristic evaluation

If you've never done heuristic evaluation:

Start small: Evaluate one feature or workflow, not your entire product.

Use a template: Download a heuristic evaluation template with all 10 heuristics and structured fields for documenting issues.

Practice with familiar products: Evaluate well-known products to learn the method before evaluating your own.

Get multiple perspectives: Recruit diverse evaluators. Different backgrounds catch different issues.

Follow up with user testing: Validate your findings with real users to build confidence in the method.

Linear's research team started by evaluating their competitors using heuristic evaluation. This taught them the method and revealed opportunities for their own product.

The value of heuristic evaluation

Heuristic evaluation won't replace user testing, but it's a valuable tool in your research toolkit.

It's fast - 3-5 people can evaluate a feature in a few hours. It's cheap - no participant recruitment or incentives. It catches obvious problems before you invest in development.

Used appropriately alongside other research methods, heuristic evaluation helps teams build more usable products.

The 10 Nielsen heuristics provide a shared language for discussing usability. Instead of vague complaints, you can point to specific principles being violated and explain why it matters.

Ready to conduct your first heuristic evaluation? Download our free Heuristic Evaluation Template with structured worksheets, severity rating guides, and report templates.

Want training on effective heuristic evaluation? Book a free 30-minute consultation to discuss how to integrate heuristic evaluation into your research practice.
