How to run AI-moderated interviews: a complete guide for research teams
A complete guide to running AI-moderated user interviews. Covers setup methodology, conversational AI vs static surveys, AI vs human moderator comparison, vendor landscape, prompt design, and validation techniques.
AI-moderated interviews use conversational AI to autonomously conduct user research sessions, dynamically probing responses and adapting in real time. Unlike static survey forms, they sustain natural dialogue with participants, eliciting responses 2.5 to 8 times longer than traditional questionnaires, with richer qualitative detail. Unlike human-moderated interviews, they scale to hundreds of sessions in hours instead of weeks. This guide covers how AI-moderated interviews work, when to use them versus human moderation or static surveys, the step-by-step setup methodology, prompt design best practices, vendor landscape, and validation techniques.
Frequently asked questions
What is an AI-moderated interview?
An AI-moderated interview is a research session where a conversational AI conducts the interview with a real human participant, asking questions, listening to responses, and dynamically following up based on what the participant says. Unlike static surveys (which deliver pre-scripted questions in a fixed order with no adaptation) and unlike fully human-moderated interviews (which require a human researcher to conduct each session), AI-moderated interviews combine the scalability of automation with the dynamic probing of conversational research. They are most useful for high-volume validation, survey augmentation, and research where consistency across sessions matters more than human rapport.
How are AI-moderated interviews different from static surveys?
Static surveys deliver fixed questions in a fixed order, with no adaptation to participant responses. Participants click through forms, often satisficing on long surveys and abandoning when fatigued. AI-moderated interviews instead conduct dynamic dialogues: the AI listens to each response, asks follow-up questions based on what was said, probes for deeper insight, and adapts the conversation in real time. Research from conversational AI vendors shows that participants give responses 2.5 to 8 times longer in conversational AI interviews than in equivalent static surveys, with higher completion rates and richer qualitative depth.
How are AI-moderated interviews different from human-moderated interviews?
AI-moderated interviews are faster, cheaper, and more scalable than human moderation. They run hundreds of sessions in hours instead of weeks, cost 70 to 90% less per session, and deliver consistent question framing across every interview. Human-moderated interviews retain the advantage of empathy, body language reading, and creative pivots in unexpected directions. The difference is consistency vs nuance: AI moderation excels at structured probing at scale, while humans excel at rapport, sensitive topics, and exploratory research where the path is unpredictable.
When should I use AI-moderated interviews?
Use AI-moderated interviews for high-volume validation studies, frequent product testing where consistency matters, survey augmentation where you need richer responses than static forms allow, and exploratory research with broad audiences where speed is critical. Avoid AI moderation for sensitive topics requiring empathy (health, mental health, trauma-informed research), highly exploratory research where unexpected pivots matter, and small-sample studies where the human moderator advantage outweighs speed gains. The synthetic respondents vs real participants comparison covers the broader decision framework for AI-assisted research.
How accurate are AI-moderated interviews compared to human-moderated?
AI-moderated interviews produce comparable quantitative results to human-moderated interviews on structured questions, with better consistency due to identical question framing across sessions. On qualitative depth, AI moderators capture more verbatim content per participant (longer, more thoughtful responses) but miss the emotional nuance and unexpected directions that skilled human moderators surface. The Nielsen Norman Group’s analysis of AI moderation found that AI works well for evaluative research with clear questions and falls short for open-ended discovery research that depends on moderator judgment.
What tools do I need to run AI-moderated interviews?
You need three things: a conversational AI platform that can conduct interviews (Outset.ai, Strella, Marvin AI, GroupSolver, Maze conversational AI, Qualtrics XM conversational, CleverX dialogue AI), a participant recruitment source (your customer panel, a recruitment platform, or in-product intercepts), and an analysis tool that handles the output (Dovetail, Marvin, or your existing research repository). Most AI-moderated interview platforms include transcription, theme tagging, and basic analysis built in, reducing the need for separate analysis tools for early-stage work.
How AI-moderated interviews work
AI-moderated interviews are technically simple but operationally sophisticated. Understanding the architecture helps you evaluate platforms and design effective studies.
The four core components
1. Conversational AI moderator. A large language model (typically GPT-4, Claude, Gemini, or similar) is configured with the role of a research interviewer. The model receives a discussion guide, understands the research objectives, and conducts the interview with participants in real time.
2. Participant interface. Participants interact with the AI through a chat interface (text), voice interface (real-time audio), or asynchronous voice (record-and-respond). Most platforms support multiple formats and let participants choose.
3. Probing logic. The AI is configured with instructions for when and how to follow up: probe for specific examples, ask “tell me more about that,” request clarification on ambiguous answers, and recognize when a topic is exhausted. This logic is what distinguishes AI moderation from static survey delivery.
4. Output processing. Sessions are recorded, transcribed, and tagged automatically. The AI may also generate session summaries, theme classifications, and sentiment scores. Output flows to research repositories like Dovetail, Marvin, or directly into reports.
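The output side can be pictured as one structured record per session. Here is a minimal sketch in Python; the field names are illustrative, not any vendor's actual schema:

```python
from dataclasses import dataclass, field

@dataclass
class SessionOutput:
    """Processed output of one AI-moderated session (illustrative schema)."""
    session_id: str
    transcript: list[str]                              # turn-by-turn lines
    summary: str = ""                                  # AI-generated summary
    themes: list[str] = field(default_factory=list)    # auto-tagged themes
    sentiment: float = 0.0                             # -1.0 to 1.0

    def to_repository_row(self) -> dict:
        """Flatten for export to a research repository or spreadsheet."""
        return {
            "session_id": self.session_id,
            "turns": len(self.transcript),
            "summary": self.summary,
            "themes": ";".join(self.themes),
            "sentiment": self.sentiment,
        }
```

A flat row like this is what typically lands in Dovetail, Marvin, or a plain spreadsheet for synthesis.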
The session lifecycle
A single AI-moderated interview session typically runs:
- Welcome and consent: AI introduces itself, explains the study, and captures consent
- Context-setting questions: AI asks background questions to establish participant context
- Core research questions: AI works through the discussion guide, probing as needed
- Adaptive follow-ups: AI follows interesting threads based on participant responses
- Wrap-up and thanks: AI summarizes, asks any closing questions, and ends the session
A typical session runs 10 to 20 minutes, shorter than a human-moderated equivalent but with more concentrated content because there is no small talk, scheduling overhead, or moderator pacing inefficiency.
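The lifecycle above is essentially a small state machine. The sketch below illustrates that flow in Python; the `ask` callback stands in for a real LLM-backed moderator turn, and the one-follow-up-per-question rule is a simplification (real platforms decide dynamically how many probes to ask):

```python
GUIDE = [
    "What does a typical shift handoff look like for you?",
    "Where does the current process break down?",
]

def run_session(ask):
    """Walk one participant through welcome -> consent -> guide -> wrap-up."""
    log = [("moderator", "Welcome! This session is AI-moderated.")]
    consent = ask("Do you consent to participate?")
    if consent.strip().lower() not in {"yes", "y"}:
        log.append(("moderator", "No problem, ending the session. Thank you!"))
        return log
    for question in GUIDE:
        answer = ask(question)
        log.append(("participant", answer))
        # One adaptive follow-up per core question in this sketch.
        follow_up = ask(f"Tell me more about that: {question}")
        log.append(("participant", follow_up))
    log.append(("moderator", "That's everything -- thanks for your time!"))
    return log
```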
Step-by-step methodology for running AI-moderated interviews
Step 1: Set goals and draft a discussion guide
Define what you want to learn from the interviews. AI moderation works best when objectives are specific and the discussion guide is tightly scoped.
Good objective: “Test usability of the new patient handoff feature in our telehealth app and identify the top 3 friction points for nurses on shift change.”
Bad objective: “Understand nurses better.”
Draft 6 to 10 core questions. AI moderators can handle longer discussion guides, but participant fatigue sets in and session quality drops after about question 10. Each question should:
- Be open-ended (not yes/no)
- Have a clear research purpose
- Allow for follow-up probing
- Avoid leading language
The AI generates dynamic follow-ups based on participant responses, so you don’t need to script every possible probe. Instead, provide guidance on what to probe for (specific examples, emotional reactions, workarounds).
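A guide structured this way pairs each core question with probe hints rather than scripted follow-ups, and a quick lint pass can flag likely closed (yes/no) questions before launch. A sketch, with a hypothetical guide format:

```python
# Question openers that usually signal a closed yes/no question.
YES_NO_OPENERS = ("do ", "does ", "did ", "is ", "are ", "was ", "were ",
                  "can ", "could ", "will ", "would ", "have ", "has ")

GUIDE = [
    {"question": "Walk me through your last patient handoff.",
     "probe_for": ["specific examples", "workarounds"]},
    {"question": "How do you feel when a handoff goes wrong?",
     "probe_for": ["emotional reactions", "causation"]},
]

def lint_guide(guide):
    """Flag likely yes/no questions so they can be reworded as open-ended."""
    return [item["question"] for item in guide
            if item["question"].strip().lower().startswith(YES_NO_OPENERS)]
```

The heuristic is crude (it only checks the opening word) but catches the most common drafting mistake cheaply.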
Step 2: Choose the format
| Format | Best for | Trade-offs |
|---|---|---|
| Real-time voice chat | Highest engagement, captures emotion, faster sessions | Participants must be available in real time |
| Asynchronous voice (record-and-respond) | Time-zone flexibility, participants think between responses | Less natural flow, longer to complete |
| Text chat | Easiest to scale, lowest tech requirements, participant comfort | Less emotional nuance, potentially shorter responses |
Voice formats produce richer data; text formats scale more easily. Most teams start with text and move to voice for studies where emotion matters.
Step 3: Configure the AI moderator
Configure the AI with:
- Role and tone: “You are a friendly, neutral research interviewer focused on understanding how nurses experience patient handoffs.”
- Discussion guide: The 6-10 core questions with brief context
- Probing instructions: When to ask for examples, when to follow up on emotion, when to move on
- Constraints: What NOT to do (avoid leading questions, don’t share opinions, don’t agree or disagree with participants)
- Compliance: For regulated work, restrict the AI from collecting PHI in chat windows
- Voice and subtitles: For voice formats, choose voice characteristics and enable subtitles for accessibility
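The configuration pieces above typically get assembled into a single system prompt. A minimal sketch of that assembly, with section headers that are illustrative rather than any platform's required format:

```python
def build_moderator_prompt(role, questions, probing, constraints):
    """Join role, guide, probing rules, and constraints into one system prompt."""
    sections = [
        f"ROLE:\n{role}",
        "DISCUSSION GUIDE:\n" + "\n".join(
            f"{i + 1}. {q}" for i, q in enumerate(questions)),
        "PROBING INSTRUCTIONS:\n" + "\n".join(f"- {p}" for p in probing),
        "CONSTRAINTS:\n" + "\n".join(f"- {c}" for c in constraints),
    ]
    return "\n\n".join(sections)

prompt = build_moderator_prompt(
    role="You are a friendly, neutral research interviewer.",
    questions=["Walk me through your last patient handoff."],
    probing=["Ask for a specific example before moving on."],
    constraints=["Do not share opinions.", "Do not lead the participant."],
)
```

Keeping the pieces separate like this makes it easy to iterate on probing instructions between pilot batches without touching the rest of the prompt.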
Step 4: Recruit participants and launch
Recruit participants through your usual channels: existing customer panel, recruitment platforms, in-product intercepts, or external panels. AI-moderated interviews scale easily, so you can recruit larger samples than human moderation supports.
For regulated industries, ensure:
- Recruitment complies with relevant regulations (HIPAA, COPPA, GDPR)
- AI vendor has appropriate Business Associate Agreements (BAAs) for healthcare
- Consent forms cover AI moderation specifically
- Data handling meets your retention and access requirements
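For regulated work, some teams add a surface-level scrub before transcripts are stored. The sketch below catches only obvious patterns (emails, US-style phone numbers) and is emphatically not sufficient for HIPAA compliance on its own; treat it as one layer alongside vendor BAAs and access controls:

```python
import re

# Assumption: a lightweight pre-storage scrub for obvious identifiers.
PATTERNS = {
    "email": re.compile(r"\b[\w.+-]+@[\w-]+\.[\w.-]+\b"),
    "phone": re.compile(r"\b\d{3}[-.\s]?\d{3}[-.\s]?\d{4}\b"),
}

def redact(text):
    """Replace matched identifiers with labeled placeholders."""
    for label, pattern in PATTERNS.items():
        text = pattern.sub(f"[{label.upper()} REDACTED]", text)
    return text
```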
Launch the study and let it run. AI moderators handle hundreds of sessions in parallel across time zones, completing in hours studies that would take human moderators weeks.
Step 5: Monitor and analyze
While the study runs, monitor early sessions for:
- Question clarity: Are participants understanding what’s being asked?
- AI behavior: Is the AI probing appropriately or asking irrelevant follow-ups?
- Response quality: Are responses substantive or shallow?
- Bias signals: Is the AI leading participants toward certain answers?
Pause and adjust if early sessions reveal problems. The advantage of AI moderation is that you can iterate the discussion guide between batches without losing time to reschedule moderators.
After data collection:
- Review automated transcription for accuracy
- Review AI-generated theme tags for accuracy
- Look for outliers and unexpected findings (the AI may have missed these)
- Export to your research repository for synthesis
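Manual transcript review scales badly, so after the first fully-reviewed batch most teams switch to spot-checking a random sample. A seeded sampler keeps the review set reproducible across reviewers (a sketch; the sample size is an assumption to tune):

```python
import random

def sample_for_review(session_ids, k=10, seed=42):
    """Pick a reproducible random sample of sessions for manual review."""
    rng = random.Random(seed)
    k = min(k, len(session_ids))
    return sorted(rng.sample(session_ids, k))
```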
Step 6: Iterate before scaling
Run a pilot of 5 to 10 sessions before scaling to full sample size. Use the pilot to:
- Refine prompts for neutrality
- Test edge cases (skeptical participants, unclear answers, off-topic responses)
- Validate that the AI captures the depth you need
- Adjust the discussion guide for clarity
The cost of piloting is low (10 sessions in an afternoon), and the cost of scaling a flawed study is high (hundreds of sessions with bad data).
AI-moderated vs human-moderated interviews
| Dimension | AI-moderated | Human-moderated |
|---|---|---|
| Speed to results | Hours for 100+ sessions | Days to weeks for 10-20 sessions |
| Cost per session | $5-$25 platform cost + incentive | $100-$300/hour moderator + incentive |
| Consistency | Identical framing across all sessions | Varies by moderator, fatigue, day |
| Probing quality | Good for structured questions; mechanical | Excellent; reads context and emotion |
| Empathy and rapport | Limited; functional politeness | High; genuine human connection |
| Body language reading | None | Strong (in person or video) |
| Creative pivots | Rare; sticks to guide | Common; pursues unexpected threads |
| Scaling | Trivially scalable to thousands | Bound by moderator capacity |
| Time-zone coverage | 24/7 | Bound by moderator schedule |
| Note-taking and transcription | Automatic | Requires separate tools/effort |
| Sensitive topics | Generally inappropriate | Required for trauma-informed work |
| Best for | Validation, frequent testing, broad audiences | Discovery, sensitive topics, complex contexts |
| Cost savings vs human | 70-90% cheaper | Baseline |
| Quality on novel topics | Weak (no judgment) | Strong (moderator adapts) |
| Consistency advantage | Major (no moderator drift) | Major weakness (moderator variance) |
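The cost gap in the table can be made concrete with rough arithmetic. The figures below apply the table's per-session ranges to a hypothetical 100-session study with a $50 incentive; all inputs are assumptions to vary for your own context:

```python
def study_cost(sessions, per_session_platform, incentive,
               human=False, moderator_rate=0.0, hours_per_session=1.0):
    """Rough total-cost estimate for one study (all figures assumptions)."""
    if human:
        return sessions * (moderator_rate * hours_per_session + incentive)
    return sessions * (per_session_platform + incentive)

# 100 sessions, $50 incentive each:
ai_low = study_cost(100, 5, 50)            # low-end AI platform: $5,500
human_high = study_cost(100, 0, 50, human=True,
                        moderator_rate=300)  # high-end human: $35,000
```

Even at the low end of the human range and the high end of the AI range, the gap stays large once incentives are held constant.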
When AI moderation wins
AI moderation is the better choice when:
- You need to run many sessions quickly (weekly or biweekly testing cycles)
- Consistency across sessions matters more than depth in any single session
- The research question is well-defined and structured
- You have a broad audience and want to maximize sample size
- Cost is a significant constraint
- You need 24/7 availability across time zones
When human moderation wins
Human moderation remains essential when:
- The research involves sensitive topics (health, mental health, trauma, financial vulnerability)
- The research is exploratory and the path may shift mid-session
- Building rapport is critical for honest responses
- Body language and emotional cues are important data
- Stakeholder credibility requires human researcher involvement
- The audience is unfamiliar with technology or uncomfortable with AI
Conversational AI vs static surveys
The most underappreciated comparison is between AI-moderated interviews and traditional static surveys. The differences are dramatic.
Why conversational AI outperforms static surveys
Response length: Participants give responses 2.5 to 8 times longer in conversational AI interviews compared to static survey free-text fields. The dialogue format encourages elaboration; the form format encourages brevity.
Completion rates: Conversational AI typically delivers higher completion rates for studies of equivalent length, because the dialogue feels less tedious than clicking through forms.
Depth of insight: Conversational AI can probe ambiguous answers in real time; static surveys cannot. The result is qualitative depth that static surveys cannot match.
Engagement quality: Participants treat conversational AI more like a real conversation, leading to less satisficing and more thoughtful answers.
Bias reduction: The conversational format reduces some forms of survey bias (response order effects, anchoring on multiple choice options) but introduces others (social desirability shifts, AI sycophancy if not configured carefully).
When static surveys remain better
Static surveys are still the right choice for:
- Pure quantitative research: Likert scales, rating tasks, structured data collection where you want clean comparable data across thousands of respondents
- Massive sample sizes: Surveys delivered to 10,000+ respondents where conversational AI cost would be prohibitive
- Highly structured tasks: A/B testing of design variants where participant input is limited to a few clicks
- Audiences uncomfortable with AI conversation: Some demographics prefer the predictability of forms
The hybrid approach
The most effective programs use both: static surveys for clean quantitative data on structured questions, and AI-moderated interviews for the qualitative depth that surveys cannot capture. Sending the same audience through both channels often produces complementary findings: the survey shows what people think, the conversation reveals why.
Vendor landscape
The AI-moderated interview space has matured rapidly since 2024. Here are the leading platforms in 2026.
| Platform | Focus | Notable strengths |
|---|---|---|
| Outset.ai | Research-focused AI moderation | Most established research-vertical platform; strong probing logic |
| Strella | Customer research with AI moderation | Voice-first; integrated analysis |
| Marvin AI (heymarvin) | AI-moderated interviewer at scale | "Scale interviews 1000x" positioning; integrated with research repository |
| Maze conversational AI | Usability testing with AI moderation | Built into broader Maze platform |
| Qualtrics XM conversational | Enterprise survey augmentation | Conversational AI layer on traditional XM |
| GroupSolver | AI-moderated chat surveys | Chat format; structured + open output |
| CleverX dialogue AI | AI-moderated interviews with verified panel | Combines AI moderation with verified participant panel for B2B research |
| Anthropic Interviewer | Anthropic’s research interview product | Built on Claude; experimental and research-grade |
What to evaluate when choosing a platform
1. Probing quality. Run a pilot interview yourself. Does the AI ask intelligent follow-ups, or does it just deliver scripted questions?
2. Output quality. How accurate is the transcription? Are the auto-generated themes useful or generic?
3. Compliance and security. Does the vendor sign BAAs for HIPAA work? Where is data stored? What’s the retention policy?
4. Format support. Does the platform support voice, text, and async formats? Can participants choose?
5. Integration with your research stack. Does it export to Dovetail, Marvin, your CRM, or wherever you store research?
6. Pricing model. Per-session, subscription, or enterprise? Match to your study volume.
7. Verified participant panels. Some platforms include access to verified participant panels, reducing the need for separate recruitment.
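One way to keep a vendor comparison honest is a weighted rubric over the seven criteria above. The weights below are a hypothetical starting point, not a recommendation; adjust them to your own priorities (e.g., compliance should dominate for regulated work):

```python
# Hypothetical weights for the seven evaluation criteria; must sum to 1.0.
WEIGHTS = {
    "probing_quality": 0.25,
    "output_quality": 0.20,
    "compliance": 0.15,
    "format_support": 0.10,
    "integration": 0.10,
    "pricing_fit": 0.10,
    "panel_access": 0.10,
}

def score_vendor(ratings):
    """Combine 1-5 ratings per criterion into one weighted 1-5 score."""
    assert set(ratings) == set(WEIGHTS), "rate every criterion"
    return sum(WEIGHTS[k] * ratings[k] for k in WEIGHTS)
```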
Prompt design best practices
The single biggest determinant of AI moderation quality is prompt design. These practices distinguish good prompts from bad.
1. Specify probing depth explicitly
Don’t assume the AI will probe well by default. Tell it:
For each main question, ask 2-3 follow-up questions to deepen the response. Probe for:
- Specific examples ("Can you tell me about a time when...")
- Emotional reactions ("How did that make you feel?")
- Workarounds and alternatives ("What did you do instead?")
- Causation ("Why do you think that happens?")
Move on when you've captured a clear answer or the participant indicates they have nothing more to add.
2. Constrain leading behavior
LLMs naturally drift toward agreement and positive framing. Counteract this:
Important constraints:
- Do NOT express opinions on the participant's responses
- Do NOT validate or invalidate their experiences with words like "great" or "interesting"
- Do NOT lead the participant toward specific answers
- Treat skepticism, criticism, and frustration as valuable data, not problems to solve
- If a participant disagrees with the product or feature, follow that thread; don't deflect
3. Define when to move on
Without explicit guidance, AI moderators may probe too long or move on too quickly. Specify:
Move to the next question when:
- The participant has given a substantive answer with at least one specific example
- The participant indicates they have nothing more to add
- The conversation is going in circles
- 3-4 follow-ups have been asked on the current question
Stay on the current question when:
- The answer is vague or generic
- A specific phrase suggests an interesting thread to pursue
- The participant mentions something contradictory to a prior answer
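The move-on rules above reduce to a small decision function. A sketch with illustrative thresholds (the 15-word "vague answer" cutoff and the done-signal phrases are assumptions to tune against pilot transcripts):

```python
def next_action(answer, follow_ups_asked, max_follow_ups=3):
    """Decide whether the moderator should probe again or move on."""
    done_signals = ("nothing more", "that's all", "that's it")
    if any(s in answer.lower() for s in done_signals):
        return "move_on"           # participant signaled they're finished
    if follow_ups_asked >= max_follow_ups:
        return "move_on"           # probe budget exhausted
    if len(answer.split()) < 15:   # short answer: likely vague, keep probing
        return "probe"
    return "move_on"
```

In practice the "vague or generic" judgment would come from the LLM itself; word count is the crudest possible proxy, used here only to make the control flow concrete.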
4. Handle edge cases
Tell the AI how to handle:
- Off-topic responses: Gently redirect to the research question
- Refusal to answer: Accept gracefully and move on
- Confusion: Clarify the question without leading
- Emotional distress: Acknowledge, offer to skip, and provide resources if appropriate
- Profanity or inappropriate content: Acknowledge and continue, or end the session per policy
5. Audit prompt outputs
For regulated work, log every prompt and response. Review periodically to catch drift, bias, or compliance issues. The AI may behave differently across model updates, audiences, or topics.
Validation and quality assurance
A few sessions of bad data can poison an entire study. These practices catch problems early.
1. Pilot with team members first
Run 3 to 5 pilot sessions with team members or friendly testers before launching to real participants. Walk through the AI moderator interaction yourself. Does it feel natural? Does it probe meaningfully?
2. Manual review of early sessions
For the first 10 to 20 real sessions, manually review every transcript. Look for:
- AI asking irrelevant follow-ups
- AI missing obvious threads
- AI leading participants
- AI failing to handle unusual responses
- Participant confusion or frustration
3. Compare AI tagging to manual review
The AI’s automatic theme tagging is convenient but imperfect. Spot-check by manually coding a sample of sessions and comparing to the AI’s tags. Significant disagreement signals a need for prompt refinement or manual coding.
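"Significant disagreement" can be quantified with a standard inter-coder agreement statistic such as Cohen's kappa, computed over sessions coded by both the AI and a human. A minimal implementation for single-label codes (the 0.6 threshold mentioned in the comment is a common rule of thumb, not a hard standard):

```python
from collections import Counter

def cohens_kappa(ai_tags, human_tags):
    """Cohen's kappa between AI and human theme codes for the same sessions.
    Values below ~0.6 commonly suggest the auto-tagging needs refinement."""
    assert len(ai_tags) == len(human_tags) and ai_tags
    n = len(ai_tags)
    observed = sum(a == h for a, h in zip(ai_tags, human_tags)) / n
    ai_freq, human_freq = Counter(ai_tags), Counter(human_tags)
    expected = sum(ai_freq[t] * human_freq[t] for t in ai_freq) / (n * n)
    if expected == 1:              # both coders used one identical label
        return 1.0
    return (observed - expected) / (1 - expected)
```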
4. Run parallel human-moderated sessions
For high-stakes studies, run a small number (5 to 10) of human-moderated sessions in parallel. Compare findings to validate that AI moderation captured the same insights. If significant differences emerge, the AI moderation is missing something important.
5. Track participant feedback
Many platforms allow a brief participant feedback survey at the end of each session. Look for signals about whether the AI felt natural, whether participants felt heard, and whether they would participate again.
Common mistakes
Mistake 1: Treating AI moderation as a free upgrade. AI moderation is a different methodology, not a faster version of human moderation. It excels at different things and fails at different things. Plan studies to play to AI’s strengths.
Mistake 2: Skipping the pilot. Launching a 200-session study without piloting is a fast way to collect 200 sessions of bad data. Pilot 5 to 10 sessions first, every time.
Mistake 3: Over-scripting the AI. AI moderators work better with high-level guidance and probing instructions than with rigid scripts. Trust the AI to follow up; don’t try to script every possible response.
Mistake 4: Under-constraining the AI. Without explicit constraints, AI moderators drift toward sycophancy, leading questions, and excessive agreement. Constrain explicitly.
Mistake 5: Ignoring privacy and compliance. AI moderation tools handle real participant data. They need the same privacy and compliance treatment as any research tool. See the user research compliance checklist for industry-specific requirements.
Mistake 6: Not validating against human moderation. For new use cases, run a parallel human-moderated study to validate that AI is capturing the same insights. Skipping validation is how teams discover they have been collecting bad data after the fact.
Mistake 7: Using AI moderation for sensitive topics. Mental health, financial distress, trauma, and similar sensitive contexts need human moderators with trauma-informed research training. AI is inappropriate for these contexts.
When AI-moderated interviews are the right tool
Use AI moderation when:
- The research question is structured and clear
- You need to scale beyond what human moderators can support
- Consistency across sessions matters more than nuance in any single session
- The audience is comfortable with AI interaction
- Speed and cost are significant constraints
- You can validate critical findings with smaller-sample human research
Avoid AI moderation when:
- The topic is sensitive (health, mental health, trauma, financial vulnerability)
- The research is exploratory and the direction may shift unpredictably
- The audience is unfamiliar with or uncomfortable with AI
- Stakeholders require human researcher credibility
- The sample is small enough that human moderation is feasible
- Body language and emotional cues are central data
For broader context on AI in research, see the guides on synthetic respondents, synthetic personas creation, synthetic vs real participants, and AI in user research. AI-moderated interviews are one of the most operationally mature applications of AI in research today, but they remain a complement to human-moderated work, not a replacement for it.