How to write concept testing surveys that capture intent

A concept testing survey that captures genuine intent asks respondents to evaluate a specific idea against their real needs, not just react to novelty. The difference between a useful concept test and a misleading one often comes down to how the questions are written, ordered, and scaled.

Most product managers who run concept surveys get polite approval: respondents say an idea sounds good, then never buy the product. This guide shows you how to close that gap by designing surveys that measure what people will actually do, not just what they say they like.

Why standard approval questions fail

Generic questions like “Do you like this idea?” or “Is this useful?” produce inflated scores. Respondents default to positive answers because disagreement feels impolite, even in a survey where they cannot see the researcher’s reaction.

The result is a concept that scores well in testing but flops at launch. Product teams point to the survey data as evidence of demand, then discover that liking an idea and paying for it are very different behaviors.

Intent questions sidestep this by anchoring responses to behavior. Instead of “Do you like this?”, you ask “How likely are you to purchase this in the next three months?” or “Would you replace your current solution with this?” These questions require respondents to mentally simulate an action, which surfaces real friction and hesitation.

Survey structure: the five-section framework

A well-structured concept testing survey follows a consistent flow regardless of the concept being tested.

Section 1: screener and context (2 to 3 questions)

Start with a brief screener to confirm respondents match your target persona. For a B2B product, this might check job role, company size, and whether they currently use tools in the relevant category. For a consumer product, it might verify purchase behavior or problem familiarity.

Do not reveal the concept at this stage. You want baseline attitudes, not reactions to something you have already primed them with.

Section 2: concept stimulus (display, not a question)

Present the concept clearly. A short paragraph describing the problem it solves and the key benefit works well for early-stage ideas. For more developed concepts, a product image, mockup, or brief video adds context. Keep the stimulus factual and concise: three to five sentences is usually enough.

Avoid promotional language in the stimulus. Phrases like “revolutionary” or “game-changing” inflate subsequent approval scores and make it impossible to know whether respondents are reacting to the concept or the framing.

Section 3: comprehension (2 to 3 questions)

Before measuring intent, confirm respondents understood what you showed them. Ask:

“In your own words, what does this product do?” (open text)
“Which of the following best describes what this product is for?” (multiple choice with one correct option and plausible distractors)

High comprehension failure (more than 20 percent of respondents unable to describe the concept accurately) is a messaging problem, not a product problem. Fix the stimulus wording before drawing intent conclusions.
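The 20 percent threshold can be checked with a few lines of code. The sketch below is a minimal illustration, assuming each respondent's multiple-choice comprehension answer has already been marked correct or incorrect. The function name, input format, and sample data are hypothetical.

```python
def comprehension_failure_rate(passed_flags):
    """Return the share of respondents who failed the comprehension check.

    passed_flags: list of booleans, True if the respondent picked the
    correct description of the concept (hypothetical input format).
    """
    if not passed_flags:
        raise ValueError("no responses to evaluate")
    failures = sum(1 for passed in passed_flags if not passed)
    return failures / len(passed_flags)


# Threshold from the guideline above: more than 20 percent failing.
FAILURE_THRESHOLD = 0.20

# Made-up example data: 3 of 10 respondents misread the concept.
responses = [True, True, False, True, True, False, True, True, False, True]
rate = comprehension_failure_rate(responses)
needs_messaging_fix = rate > FAILURE_THRESHOLD
print(f"failure rate: {rate:.0%}, fix messaging first: {needs_messaging_fix}")
```

Open-text answers would need manual or assisted coding before they can feed a check like this one.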

Section 4: intent and appeal (4 to 6 questions)

This is the core of the survey. Use a structured sequence:

Overall appeal: “How appealing is this product to you?” on a five-point scale from Not at all appealing to Extremely appealing. This is a warm-up, not your primary signal.

Uniqueness: “Compared to products you have seen or used before, how unique is this?” on a five-point scale from Not at all unique to Completely unique. High uniqueness combined with low intent often signals a comprehension problem or a niche that is too small.

Purchase intent (the most important question): Use a five-point scale with explicit anchors:

Definitely would not buy
Probably would not buy
Might or might not buy
Probably would buy
Definitely would buy

Report the “top two box” score (options 4 and 5 combined, as a share of all responses) as your headline intent number. Industry norms vary by category, but a new B2B concept with a top-two-box score above 40 percent is generally worth pursuing.
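As a worked illustration of the top-two-box calculation, here is a minimal sketch. It assumes the five anchors are coded 1 (Definitely would not buy) through 5 (Definitely would buy); the coding, the function name, and the ratings are assumptions made for the example.

```python
from collections import Counter


def top_two_box(ratings):
    """Share of responses coded 4 or 5 on a five-point intent scale.

    Ratings outside 1 to 5 (for example, skipped questions stored as
    None) are ignored rather than counted in the base.
    """
    valid = [r for r in ratings if r in (1, 2, 3, 4, 5)]
    if not valid:
        raise ValueError("no valid ratings")
    counts = Counter(valid)
    return (counts[4] + counts[5]) / len(valid)


# Made-up ratings from ten respondents.
ratings = [5, 4, 3, 3, 2, 4, 1, 5, 3, 4]
score = top_two_box(ratings)
print(f"top-two-box: {score:.0%}")
```

Here five of the ten ratings are a 4 or a 5, so the headline number is 50 percent.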

Value for price: If pricing is relevant, add “At the price described, how would you rate the value for money?” on a scale from Very poor value to Excellent value.

Open-text follow-up: “What is the main reason you would (or would not) consider buying this product?” This single open question often reveals objections that closed questions miss entirely.

Section 5: diagnostics (3 to 4 questions)

Diagnostic questions help explain intent scores and guide iteration.

Diagnostic question	What it reveals
Which benefit matters most to you? (ranked list)	Where to focus positioning
What concerns, if any, do you have about this product? (open text)	Barriers to adoption
Who else in your organization would need to approve this purchase?	Buying committee complexity
Which current solution would this replace for you?	Competitive context

Diagnostics are especially useful when intent scores are average. A concept scoring 35 percent top-two-box might still be worth launching if diagnostics reveal that price is the only objection, and a pricing adjustment would move that number.
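The question counts given for each section can double as a quick sanity check on a draft questionnaire. The sketch below is one hypothetical way to encode them; the section keys and the `check_outline` helper are illustrative names, not part of any survey tool.

```python
# Question-count ranges per section, taken from the framework above.
# Section 2 is a display-only stimulus, so it carries no questions.
SECTION_RANGES = {
    "screener": (2, 3),
    "stimulus": (0, 0),
    "comprehension": (2, 3),
    "intent": (4, 6),
    "diagnostics": (3, 4),
}


def check_outline(question_counts):
    """Return a message for every section whose count is out of range."""
    problems = []
    for section, (low, high) in SECTION_RANGES.items():
        count = question_counts.get(section, 0)
        if not low <= count <= high:
            problems.append(
                f"{section}: {count} questions, expected {low} to {high}"
            )
    return problems


# Made-up draft that skimps on the comprehension check.
draft = {"screener": 3, "stimulus": 0, "comprehension": 1,
         "intent": 5, "diagnostics": 4}
print(check_outline(draft))
```

An empty list means the draft matches the suggested shape. Treat the ranges as guidelines rather than hard limits.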

Choosing the right scale

Inconsistent scales across questions confuse respondents and corrupt your data. Pick a convention and stick with it.

Five-point Likert scales suit most intent and attitudinal questions. They are easy to understand, translate well across languages, and give enough range to detect differences without introducing false precision.

Seven-point scales add granularity for attitudinal research where subtle differences matter, but they increase cognitive load and are harder to benchmark against industry norms.

Binary yes/no questions are useful for clear behavioral checkpoints (“Would you click to learn more?”) but strip out the intensity information you need for intent measurement.

Avoid numeric 1-to-10 scales for intent questions. Respondents interpret the endpoints inconsistently, and 10-point scales produce clustered data that looks like wide variance but is mostly noise.

For a deeper look at how different question formats affect response quality, this walkthrough of survey question types covers the tradeoffs in detail.

Ordering questions to reduce bias

Question order shapes responses. A few rules prevent the most common ordering errors.

Put general questions before specific ones. Ask about overall appeal before asking about specific features. If you ask about a feature first, respondents anchor their overall appeal score to that feature rather than evaluating the whole concept.

Put behavioral questions before attitudinal ones. Ask “How likely are you to buy this?” before asking “How innovative do you find this?” Attitudinal questions prime positive framing that inflates subsequent intent responses.

Put open-text questions at the end of each section, not the beginning. Open text after closed questions gives respondents context for elaborating. Open text at the start produces vague, generic responses.

Randomize multi-select answer options where possible. Fixed option ordering causes primacy effects: the first option receives disproportionate selection regardless of content.
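Option randomization is usually a setting in the survey platform, but the logic is simple enough to sketch. The example below is a minimal illustration using Python's standard `random` module. Keeping an option such as "None of the above" pinned in last place is an assumption of this sketch, and the option labels are made up.

```python
import random


def randomized_options(options, pinned_last=(), seed=None):
    """Return options in random order, keeping pinned items at the end.

    pinned_last holds options such as "None of the above" that should
    stay in a fixed final position (an assumption of this sketch).
    """
    rng = random.Random(seed)
    shuffled = [o for o in options if o not in pinned_last]
    rng.shuffle(shuffled)
    pinned = [o for o in options if o in pinned_last]
    return shuffled + pinned


benefits = ["Saves time", "Lowers cost", "Easier onboarding",
            "Better reporting", "None of the above"]

# Seeding with the respondent ID makes each order reproducible, so the
# order a respondent saw can be reconstructed during analysis.
for respondent_id in (101, 102, 103):
    order = randomized_options(
        benefits, pinned_last=("None of the above",), seed=respondent_id
    )
    print(respondent_id, order)
```

Ordered scales, such as the five-point intent scale, should keep their natural order; shuffling applies to unordered lists only.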

Common mistakes that undermine intent data

Leading language: “How much would this time-saving feature help you?” assumes the feature saves time. Rewrite as “How much time, if any, do you think this feature would save you per week?”

Double-barreled questions: “Is this product affordable and easy to use?” cannot be answered clearly. Split into two questions.

Skipping the comprehension check: Teams often skip this to save questions. Without it, you cannot tell whether a low intent score reflects a bad concept or a badly explained concept.

Treating all respondents equally: Weight or filter by screener responses before reporting. A respondent who barely meets the screener criteria should not carry the same weight as a respondent who is a perfect-fit buyer.

Forgetting to ask about switching: For competitive markets, asking “What would you need to give up to use this?” often reveals the real barrier, which is not price or appeal but inertia and switching cost.
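To illustrate the weighting point above, here is a minimal sketch of a weighted top-two-box score. The weighting scheme (1.0 for perfect-fit buyers, 0.5 for marginal fits) is a hypothetical example, not a recommended standard, and the data is made up.

```python
def weighted_top_two_box(responses):
    """Weighted share of ratings 4 or 5.

    responses: list of (rating, weight) pairs, where weight reflects how
    well the respondent fits the screener (hypothetical scheme).
    """
    total_weight = sum(weight for _, weight in responses)
    if total_weight <= 0:
        raise ValueError("weights must sum to a positive number")
    top_weight = sum(weight for rating, weight in responses if rating >= 4)
    return top_weight / total_weight


# Made-up data: (rating on the 1-to-5 intent scale, screener-fit weight).
responses = [(5, 1.0), (4, 1.0), (2, 1.0), (4, 0.5),
             (5, 0.5), (4, 0.5), (1, 1.0)]

unweighted = sum(1 for rating, _ in responses if rating >= 4) / len(responses)
weighted = weighted_top_two_box(responses)
print(f"unweighted: {unweighted:.0%}, weighted: {weighted:.0%}")
```

In this example the marginal-fit respondents are the most enthusiastic, so weighting pulls the headline score down. Filtering is the special case where marginal respondents get a weight of zero.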

Monadic versus comparative design

If you are choosing between two or three concepts, you have three design options.

A monadic design splits your sample into cells, each seeing one concept. Every cell answers the same survey. You compare intent scores across cells. This approach produces the cleanest intent signal because respondents are not influenced by seeing other concepts, but you need a larger total sample.

A sequential monadic design shows each respondent all concepts in randomized order, with the full survey repeated for each. This is more efficient for smaller samples but introduces order effects and comparison bias. Respondents often score the second concept relative to the first, not on its own merits.

A comparative design shows all concepts simultaneously and asks respondents to rank or choose. This works well for feature prioritization questions but poorly for absolute intent measurement, because a concept can win the comparison while still having low overall intent.

For most B2B product concepts, monadic design with 100 to 150 respondents per cell gives cleaner data than trying to economize with a sequential study.
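Splitting a sample into monadic cells amounts to a random, balanced assignment. The following is a minimal sketch of one way to do it, assuming respondents are identified by simple integer IDs; the function name and concept labels are illustrative.

```python
import random


def assign_cells(respondent_ids, concepts, seed=0):
    """Randomly assign each respondent to exactly one concept cell.

    Respondents are shuffled, then dealt out in turn, so cell sizes
    differ by at most one.
    """
    rng = random.Random(seed)
    shuffled = list(respondent_ids)
    rng.shuffle(shuffled)
    cells = {concept: [] for concept in concepts}
    for index, respondent in enumerate(shuffled):
        cells[concepts[index % len(concepts)]].append(respondent)
    return cells


# 300 hypothetical respondents split across two concepts.
cells = assign_cells(range(1, 301), ["Concept A", "Concept B"])
print({name: len(members) for name, members in cells.items()})
```

With 300 respondents and two concepts, each cell receives 150 people, the upper end of the range suggested above. Most survey platforms offer an equivalent quota or randomizer feature.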

Translating survey data into a decision

A concept test is not a vote. High intent scores are evidence, not a guarantee. Treat them as one input alongside competitive analysis, business model feasibility, and qualitative follow-up.

A useful decision framework:

Intent score	Diagnostic signal	Recommended action
Top-two-box above 50%	Strong appeal, clear value	Proceed to prototype or pilot
Top-two-box 35 to 50%	Mixed with identifiable objections	Iterate on messaging or features, retest
Top-two-box 35 to 50%	Mixed with no clear pattern	Run qualitative follow-up interviews
Top-two-box below 35%	Low across all segments	Revisit problem framing before investing further
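The table can also be read as a simple decision rule. The sketch below encodes it under two simplifying assumptions: the top and bottom rows are keyed on the score alone, and a single flag distinguishes identifiable objections from no clear pattern in the middle band. The function and parameter names are hypothetical.

```python
def recommend_action(top_two_box, clear_objections=False):
    """Map a top-two-box share (0 to 1) to the action in the table above.

    clear_objections: True when diagnostics show identifiable objections
    rather than no clear pattern. Only used in the 35 to 50 percent band.
    """
    if not 0 <= top_two_box <= 1:
        raise ValueError("top_two_box must be a share between 0 and 1")
    if top_two_box > 0.50:
        return "Proceed to prototype or pilot"
    if top_two_box >= 0.35:
        if clear_objections:
            return "Iterate on messaging or features, retest"
        return "Run qualitative follow-up interviews"
    return "Revisit problem framing before investing further"


print(recommend_action(0.42, clear_objections=True))
```

A rule like this is a starting point for discussion. As noted above, the score is one input alongside competitive analysis and business model feasibility.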

Qualitative interviews alongside your survey add depth that numbers cannot capture. See how to recruit participants for product research for guidance on building the right respondent mix, and this guide to writing user interview questions for follow-up techniques.

Getting the right respondents

Survey quality depends on respondent quality. A perfectly written survey filled with unqualified respondents produces useless data.

For B2B concept tests, recruit within your specific buyer persona: the right job function, company size, industry, and purchase authority. A concept for enterprise procurement software needs responses from procurement managers or finance leaders, not general business users.

Platforms like CleverX provide access to an 8M+ verified B2B and B2C panel across 150+ countries, allowing you to define tight screener criteria and recruit respondents who genuinely match your target buyer profile. For niche B2B audiences, verified professional panels are especially important because general consumer panels often lack the domain context to give meaningful intent signals.

You can also run concept testing alongside qualitative methods within a broader research framework, triangulating findings before committing to a build decision.

For external benchmarks on concept testing question design and intent measurement methodology, the Nielsen Norman Group’s survey best practices and Qualtrics research methodology documentation are useful references. The Market Research Society standards also provide guidance on questionnaire design and respondent ethics.

Frequently asked questions

What is the single most important question in a concept testing survey?

The purchase intent question is the strongest signal. A five-point scale from ‘Definitely would not buy’ to ‘Definitely would buy’ predicts real adoption far better than asking whether respondents like or dislike the concept. Pair it with an open-text follow-up to learn why.

How many questions should a concept testing survey have?

Keep it to 10 to 15 questions for a single-concept survey and no more than 20 for a comparative study. Longer surveys increase drop-off and fatigue, which skews your intent data toward respondents who are unusually motivated or unusually agreeable.

Should I use Likert scales or rating scales for concept testing?

Likert scales work better for attitudinal questions (agreement, likelihood, preference) because they capture direction and intensity. Simple 1-to-10 rating scales can work for a standalone overall appeal score, but avoid them for intent questions, where respondents interpret the endpoints inconsistently. Avoid mixing scale types within the same section, as respondents find it cognitively taxing.

How do I avoid social desirability bias in concept testing surveys?

Write questions in a neutral tone that does not signal the ‘correct’ answer. Replace ‘How much do you love this idea?’ with ‘How likely are you to use this?’ Use indirect techniques like asking what a friend or colleague would think. Randomize the order of unordered answer options where possible.

What is the difference between monadic and comparative concept testing surveys?

A monadic survey shows one concept to each respondent, capturing unbiased individual reactions. A comparative survey shows two or more concepts so respondents can rank or choose between them. Monadic tests give cleaner intent signals; comparative tests reveal relative preference when you need to pick between options.

How many respondents do I need for a concept testing survey?

For a single monadic concept test, 150 to 200 respondents is typically enough for a stable read on intent. For studies that compare two or three concepts across monadic cells, you will need 100 to 150 respondents per cell to detect meaningful differences between concepts. B2B concepts with narrow audiences can work with 50 to 75 qualified respondents, at the cost of wider uncertainty around the score.
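To see what these sample sizes buy you, it helps to estimate the margin of error around a top-two-box score. The sketch below uses the standard normal approximation for a proportion at 95 percent confidence. This method is an assumption added for illustration, not something the guidance above prescribes, and it assumes a simple random sample.

```python
import math


def margin_of_error(share, n, z=1.96):
    """Approximate margin of error for a proportion.

    Uses the normal approximation z * sqrt(p * (1 - p) / n); z = 1.96
    corresponds to a 95 percent confidence level.
    """
    if n <= 0:
        raise ValueError("n must be positive")
    if not 0 <= share <= 1:
        raise ValueError("share must be between 0 and 1")
    return z * math.sqrt(share * (1 - share) / n)


# A hypothetical 40 percent top-two-box score at three sample sizes.
for n in (50, 150, 200):
    moe = margin_of_error(0.40, n)
    print(f"n={n}: 40% top-two-box, plus or minus {moe:.1%}")
```

At 150 respondents, a 40 percent score carries a margin of roughly 8 points either way; at 50 respondents, the margin grows to roughly 14 points. That is why small B2B samples are better suited to directional reads than to fine comparisons between concepts.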