How to run AI-moderated interviews: a complete guide for research teams


AI-moderated interviews use conversational AI to autonomously conduct user research sessions, dynamically probing responses and adapting in real time. Unlike static survey forms, they sustain natural dialogue with participants, generating responses 2.5 to 8 times longer than traditional questionnaires, with richer qualitative detail. Unlike human-moderated interviews, they scale to hundreds of sessions in hours instead of weeks. This guide covers how AI-moderated interviews work, when to use them versus human moderation or static surveys, the step-by-step setup methodology, prompt design best practices, the vendor landscape, and validation techniques.

Frequently asked questions

What is an AI-moderated interview?

An AI-moderated interview is a research session where a conversational AI conducts the interview with a real human participant, asking questions, listening to responses, and dynamically following up based on what the participant says. Unlike static surveys (which deliver pre-scripted questions in a fixed order with no adaptation) and unlike fully human-moderated interviews (which require a human researcher to conduct each session), AI-moderated interviews combine the scalability of automation with the dynamic probing of conversational research. They are most useful for high-volume validation, survey augmentation, and research where consistency across sessions matters more than human rapport.

How are AI-moderated interviews different from static surveys?

Static surveys deliver fixed questions in a fixed order, with no adaptation to participant responses. Participants click through forms, often satisficing on long surveys and abandoning when fatigued. AI-moderated interviews instead conduct dynamic dialogues: the AI listens to each response, asks follow-up questions based on what was said, probes for deeper insight, and adapts the conversation in real time. Research from conversational AI vendors shows that participants give responses 2.5 to 8 times longer in conversational AI interviews than in equivalent static surveys, with higher completion rates and richer qualitative depth.

How are AI-moderated interviews different from human-moderated interviews?

AI-moderated interviews are faster, cheaper, and more scalable than human moderation. They run hundreds of sessions in hours instead of weeks, cost 70 to 90% less per session, and deliver consistent question framing across every interview. Human-moderated interviews retain the advantage of empathy, body language reading, and creative pivots in unexpected directions. The difference is consistency vs nuance: AI moderation excels at structured probing at scale, while humans excel at rapport, sensitive topics, and exploratory research where the path is unpredictable.

When should I use AI-moderated interviews?

Use AI-moderated interviews for high-volume validation studies, frequent product testing where consistency matters, survey augmentation where you need richer responses than static forms allow, and exploratory research with broad audiences where speed is critical. Avoid AI moderation for sensitive topics requiring empathy (health, mental health, trauma-informed research), highly exploratory research where unexpected pivots matter, and small-sample studies where the human moderator advantage outweighs speed gains. The synthetic respondents vs real participants comparison covers the broader decision framework for AI-assisted research.

How accurate are AI-moderated interviews compared to human-moderated?

AI-moderated interviews produce comparable quantitative results to human-moderated interviews on structured questions, with better consistency due to identical question framing across sessions. On qualitative depth, AI moderators capture more verbatim content per participant (longer, more thoughtful responses) but miss the emotional nuance and unexpected directions that skilled human moderators surface. The Nielsen Norman Group’s analysis of AI moderation found that AI works well for evaluative research with clear questions and falls short for open-ended discovery research that depends on moderator judgment.

What tools do I need to run AI-moderated interviews?

You need three things: a conversational AI platform that can conduct interviews (Outset.ai, Strella, Marvin AI, GroupSolver, Maze conversational AI, Qualtrics XM conversational, CleverX dialogue AI), a participant recruitment source (your customer panel, a recruitment platform, or in-product intercepts), and an analysis tool that handles the output (Dovetail, Marvin, or your existing research repository). Most AI-moderated interview platforms include transcription, theme tagging, and basic analysis built in, reducing the need for separate analysis tools for early-stage work.

How AI-moderated interviews work

AI-moderated interviews are technically simple but operationally sophisticated. Understanding the architecture helps you evaluate platforms and design effective studies.

The four core components

1. Conversational AI moderator. A large language model (typically GPT-4, Claude, Gemini, or similar) is configured with the role of a research interviewer. The model receives a discussion guide, understands the research objectives, and conducts the interview with participants in real time.

2. Participant interface. Participants interact with the AI through a chat interface (text), voice interface (real-time audio), or asynchronous voice (record-and-respond). Most platforms support multiple formats and let participants choose.

3. Probing logic. The AI is configured with instructions for when and how to follow up: probe for specific examples, ask “tell me more about that,” request clarification on ambiguous answers, and recognize when a topic is exhausted. This logic is what distinguishes AI moderation from static survey delivery.

4. Output processing. Sessions are recorded, transcribed, and tagged automatically. The AI may also generate session summaries, theme classifications, and sentiment scores. Output flows to research repositories like Dovetail, Marvin, or directly into reports.

The session lifecycle

A single AI-moderated interview session typically runs:

  1. Welcome and consent: AI introduces itself, explains the study, and captures consent
  2. Context-setting questions: AI asks background questions to establish participant context
  3. Core research questions: AI works through the discussion guide, probing as needed
  4. Adaptive follow-ups: AI follows interesting threads based on participant responses
  5. Wrap-up and thanks: AI summarizes, asks any closing questions, and ends the session

A typical session runs 10 to 20 minutes, shorter than a human-moderated equivalent but with more concentrated content because there is no small talk, scheduling overhead, or moderator pacing inefficiency.
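The five-stage lifecycle above can be sketched as a simple moderator loop. This is a minimal illustration, not any vendor's implementation: `ask_model` is a hypothetical stand-in for the platform's LLM call, and `get_participant_reply` stands in for the chat or voice interface.

```python
def ask_model(prompt: str) -> str:
    """Hypothetical stand-in for the platform's real LLM backend."""
    return f"[AI follow-up based on: {prompt[:40]}]"

def run_session(discussion_guide, get_participant_reply, max_followups=3):
    """Run one AI-moderated session and return the transcript."""
    transcript = []

    def exchange(question):
        transcript.append(("moderator", question))
        reply = get_participant_reply(question)
        transcript.append(("participant", reply))
        return reply

    # 1. Welcome and consent
    exchange("Hi! I'm an AI research moderator. Do you consent to participate?")
    # 2. Context-setting questions
    exchange("To start, tell me a bit about your role and daily work.")
    # 3-4. Core questions with adaptive follow-ups
    for question in discussion_guide:
        reply = exchange(question)
        for _ in range(max_followups):
            followup = ask_model(f"Given the answer '{reply}', ask one neutral follow-up.")
            reply = exchange(followup)
            if "nothing more" in reply.lower():  # participant signals the topic is exhausted
                break
    # 5. Wrap-up and thanks
    exchange("Thanks! Anything else you'd like to add before we finish?")
    return transcript
```

The probing budget (`max_followups`) and the exhaustion check are where real platforms differ most; the control flow itself is what separates this from static survey delivery.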

Step-by-step methodology for running AI-moderated interviews

Step 1: Set goals and draft a discussion guide

Define what you want to learn from the interviews. AI moderation works best when objectives are specific and the discussion guide is tightly scoped.

Good objective: “Test usability of the new patient handoff feature in our telehealth app and identify the top 3 friction points for nurses on shift change.”

Bad objective: “Understand nurses better.”

Draft 6 to 10 core questions. AI moderators can handle longer discussion guides, but participant fatigue and session quality drop after question 10. Each question should:

  • Be open-ended (not yes/no)
  • Have a clear research purpose
  • Allow for follow-up probing
  • Avoid leading language

The AI will generate dynamic follow-ups based on participant responses, so you don’t need to script every possible probe. Provide guidance on what to probe for (specific examples, emotional reactions, workarounds) rather than scripting every follow-up.
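Two of the four question criteria above (open-ended, non-leading) can be partially automated when drafting the guide. A rough sketch, with illustrative rather than exhaustive word lists:

```python
# Rough lint for draft discussion-guide questions. The word lists are
# illustrative starting points, not a validated screening instrument.

CLOSED_OPENERS = ("do ", "did ", "is ", "are ", "was ", "were ", "have ", "can ", "would ")
LEADING_PHRASES = ("don't you", "wouldn't you agree", "how much do you love")

def lint_question(question: str) -> list[str]:
    """Return a list of likely problems with a draft question."""
    issues = []
    q = question.strip().lower()
    if q.startswith(CLOSED_OPENERS):
        issues.append("likely yes/no question; rephrase as open-ended")
    if any(phrase in q for phrase in LEADING_PHRASES):
        issues.append("leading language detected")
    return issues
```

A check like this catches mechanical problems only; research purpose and probe-ability still need human judgment.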

Step 2: Choose the format

| Format | Best for | Trade-offs |
|---|---|---|
| Real-time voice chat | Highest engagement, captures emotion, faster sessions | Participants must be available in real time |
| Asynchronous voice (record-and-respond) | Time-zone flexibility, participants think between responses | Less natural flow, longer to complete |
| Text chat | Easiest to scale, lowest tech requirements, participant comfort | Less emotional nuance, potentially shorter responses |

Voice formats produce richer data; text formats scale more easily. Most teams start with text and move to voice for studies where emotion matters.

Step 3: Configure the AI moderator

Configure the AI with:

  • Role and tone: “You are a friendly, neutral research interviewer focused on understanding how nurses experience patient handoffs.”
  • Discussion guide: The 6-10 core questions with brief context
  • Probing instructions: When to ask for examples, when to follow up on emotion, when to move on
  • Constraints: What NOT to do (avoid leading questions, don’t share opinions, don’t agree or disagree with participants)
  • Compliance: For regulated work, restrict the AI from collecting PHI in chat windows
  • Voice and subtitles: For voice formats, choose voice characteristics and enable subtitles for accessibility
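The configuration bullets above typically compose into a single system prompt. A minimal sketch of that assembly, assuming a generic chat-style LLM where configuration is passed as a system message (section wording is illustrative):

```python
# Assembling moderator configuration into one system prompt.
# All section contents are example text; adapt to your study.

def build_moderator_prompt(role: str, questions: list[str],
                           probing: str, constraints: str) -> str:
    guide = "\n".join(f"{i}. {q}" for i, q in enumerate(questions, 1))
    return (
        f"{role}\n\n"
        f"Discussion guide:\n{guide}\n\n"
        f"Probing instructions:\n{probing}\n\n"
        f"Constraints:\n{constraints}\n"
    )

prompt = build_moderator_prompt(
    role=("You are a friendly, neutral research interviewer focused on "
          "understanding how nurses experience patient handoffs."),
    questions=["Walk me through your most recent shift handoff.",
               "What part of handoff causes the most friction?"],
    probing="Ask for specific examples; follow up on emotion; move on after 3 follow-ups.",
    constraints="Do not lead, share opinions, or agree/disagree with the participant.",
)
```

Keeping the sections separate in code makes it easy to iterate probing instructions and constraints between pilot batches without touching the discussion guide.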

Step 4: Recruit participants and launch

Recruit participants through your usual channels: existing customer panel, recruitment platforms, in-product intercepts, or external panels. AI-moderated interviews scale easily, so you can recruit larger samples than human moderation supports.

For regulated industries, ensure:

  • Recruitment complies with relevant regulations (HIPAA, COPPA, GDPR)
  • AI vendor has appropriate Business Associate Agreements (BAAs) for healthcare
  • Consent forms cover AI moderation specifically
  • Data handling meets your retention and access requirements

Launch the study and let it run. AI moderators handle hundreds of sessions in parallel across time zones, completing in hours studies that would take human moderators weeks.

Step 5: Monitor and analyze

While the study runs, monitor early sessions for:

  • Question clarity: Are participants understanding what’s being asked?
  • AI behavior: Is the AI probing appropriately or asking irrelevant follow-ups?
  • Response quality: Are responses substantive or shallow?
  • Bias signals: Is the AI leading participants toward certain answers?

Pause and adjust if early sessions reveal problems. The advantage of AI moderation is that you can iterate the discussion guide between batches without losing time to reschedule moderators.

After data collection:

  • Review automated transcription for accuracy
  • Review AI-generated theme tags for accuracy
  • Look for outliers and unexpected findings (the AI may have missed these)
  • Export to your research repository for synthesis
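The monitoring checks in Step 5 can be partially automated for early sessions. A sketch that flags shallow responses and possible moderator leading; the threshold and phrase list are illustrative assumptions, not validated cutoffs:

```python
# Automated spot-checks for early-session transcripts.
# Threshold (8 words/turn) and phrase list are illustrative only.

LEADING_MARKERS = ("great point", "i agree", "don't you think", "wouldn't you say")

def flag_session(transcript: list[tuple[str, str]]) -> list[str]:
    """transcript: list of (speaker, text) tuples; returns review flags."""
    flags = []
    participant_turns = [text for speaker, text in transcript if speaker == "participant"]
    if participant_turns:
        avg_words = sum(len(t.split()) for t in participant_turns) / len(participant_turns)
        if avg_words < 8:
            flags.append(f"shallow responses (avg {avg_words:.1f} words/turn)")
    for speaker, text in transcript:
        if speaker == "moderator" and any(m in text.lower() for m in LEADING_MARKERS):
            flags.append(f"possible leading language: {text!r}")
    return flags
```

Flagged sessions go to manual review; unflagged sessions are not automatically clean, so this supplements rather than replaces reading early transcripts.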

Step 6: Iterate before scaling

Run a pilot of 5 to 10 sessions before scaling to full sample size. Use the pilot to:

  • Refine prompts for neutrality
  • Test edge cases (skeptical participants, unclear answers, off-topic responses)
  • Validate that the AI captures the depth you need
  • Adjust the discussion guide for clarity

The cost of piloting is low (10 sessions in an afternoon), and the cost of scaling a flawed study is high (hundreds of sessions with bad data).

AI-moderated vs human-moderated interviews

| Dimension | AI-moderated | Human-moderated |
|---|---|---|
| Speed to results | Hours for 100+ sessions | Days to weeks for 10-20 sessions |
| Cost per session | $5-$25 platform cost + incentive | $100-$300/hour moderator + incentive |
| Consistency | Identical framing across all sessions | Varies by moderator, fatigue, day |
| Probing quality | Good for structured questions; mechanical | Excellent; reads context and emotion |
| Empathy and rapport | Limited; functional politeness | High; genuine human connection |
| Body language reading | None | Strong (in person or video) |
| Creative pivots | Rare; sticks to guide | Common; pursues unexpected threads |
| Scaling | Trivially scalable to thousands | Bound by moderator capacity |
| Time-zone coverage | 24/7 | Bound by moderator schedule |
| Note-taking and transcription | Automatic | Requires separate tools/effort |
| Sensitive topics | Generally inappropriate | Required for trauma-informed work |
| Best for | Validation, frequent testing, broad audiences | Discovery, sensitive topics, complex contexts |
| Cost savings vs human | 70-90% cheaper | Baseline |
| Quality on novel topics | Weak (no judgment) | Strong (moderator adapts) |
| Consistency advantage | Major (no moderator drift) | Major weakness (moderator variance) |

When AI moderation wins

AI moderation is the better choice when:

  • You need to run many sessions quickly (weekly or biweekly testing cycles)
  • Consistency across sessions matters more than depth in any single session
  • The research question is well-defined and structured
  • You have a broad audience and want to maximize sample size
  • Cost is a significant constraint
  • You need 24/7 availability across time zones

When human moderation wins

Human moderation remains essential when:

  • The research involves sensitive topics (health, mental health, trauma, financial vulnerability)
  • The research is exploratory and the path may shift mid-session
  • Building rapport is critical for honest responses
  • Body language and emotional cues are important data
  • Stakeholder credibility requires human researcher involvement
  • The audience is unfamiliar with technology or uncomfortable with AI

Conversational AI vs static surveys

The most underappreciated comparison is between AI-moderated interviews and traditional static surveys. The differences are dramatic.

Why conversational AI outperforms static surveys

Response length: Participants give responses 2.5 to 8 times longer in conversational AI interviews than in static survey free-text fields. The dialogue format encourages elaboration; the form format encourages brevity.

Completion rates: Conversational AI typically delivers higher completion rates for studies of equivalent length, because the dialogue feels less tedious than clicking through forms.

Depth of insight: Conversational AI can probe ambiguous answers in real time; static surveys cannot. The result is qualitative depth that static surveys cannot match.

Engagement quality: Participants treat conversational AI more like a real conversation, leading to less satisficing and more thoughtful answers.

Bias reduction: The conversational format reduces some forms of survey bias (response order effects, anchoring on multiple choice options) but introduces others (social desirability shifts, AI sycophancy if not configured carefully).

When static surveys remain better

Static surveys are still the right choice for:

  • Pure quantitative research: Likert scales, rating tasks, structured data collection where you want clean comparable data across thousands of respondents
  • Massive sample sizes: Surveys delivered to 10,000+ respondents where conversational AI cost would be prohibitive
  • Highly structured tasks: A/B testing of design variants where participant input is limited to a few clicks
  • Audiences uncomfortable with AI conversation: Some demographics prefer the predictability of forms

The hybrid approach

The most effective programs use both: static surveys for clean quantitative data on structured questions, and AI-moderated interviews for the qualitative depth that surveys cannot capture. Sending the same audience through both channels often produces complementary findings: the survey shows what people think; the conversation reveals why.

Vendor landscape

The AI-moderated interview space has matured rapidly since 2024. Here are the leading platforms in 2026.

| Platform | Focus | Notable strengths |
|---|---|---|
| Outset.ai | Research-focused AI moderation | Most established research-vertical platform; strong probing logic |
| Strella | Customer research with AI moderation | Voice-first; integrated analysis |
| Marvin AI (heymarvin) | AI-moderated interviewer at scale | "Scale interviews 1000x" positioning; integrated with research repository |
| Maze conversational AI | Usability testing with AI moderation | Built into broader Maze platform |
| Qualtrics XM conversational | Enterprise survey augmentation | Conversational AI layer on traditional XM |
| GroupSolver | AI-moderated chat surveys | Chat format; structured + open output |
| CleverX dialogue AI | AI-moderated interviews with verified panel | Combines AI moderation with verified participant panel for B2B research |
| Anthropic Interviewer | Anthropic's research interview product | Built on Claude; experimental and research-grade |

What to evaluate when choosing a platform

1. Probing quality. Run a pilot interview yourself. Does the AI ask intelligent follow-ups, or does it just deliver scripted questions?

2. Output quality. How accurate is the transcription? Are the auto-generated themes useful or generic?

3. Compliance and security. Does the vendor sign BAAs for HIPAA work? Where is data stored? What’s the retention policy?

4. Format support. Does the platform support voice, text, and async formats? Can participants choose?

5. Integration with your research stack. Does it export to Dovetail, Marvin, your CRM, or wherever you store research?

6. Pricing model. Per-session, subscription, or enterprise? Match to your study volume.

7. Verified participant panels. Some platforms include access to verified participant panels, reducing the need for separate recruitment.

Prompt design best practices

The single biggest determinant of AI moderation quality is prompt design. These practices distinguish good prompts from bad.

1. Specify probing depth explicitly

Don’t assume the AI will probe well by default. Tell it:

For each main question, ask 2-3 follow-up questions to deepen the response. Probe for:
- Specific examples ("Can you tell me about a time when...")
- Emotional reactions ("How did that make you feel?")
- Workarounds and alternatives ("What did you do instead?")
- Causation ("Why do you think that happens?")
Move on when you've captured a clear answer or the participant indicates they have nothing more to add.

2. Constrain leading behavior

LLMs naturally drift toward agreement and positive framing. Counteract this:

Important constraints:
- Do NOT express opinions on the participant's responses
- Do NOT validate or invalidate their experiences with words like "great" or "interesting"
- Do NOT lead the participant toward specific answers
- Treat skepticism, criticism, and frustration as valuable data, not problems to solve
- If a participant disagrees with the product or feature, follow that thread; don't deflect

3. Define when to move on

Without explicit guidance, AI moderators may probe too long or move on too quickly. Specify:

Move to the next question when:
- The participant has given a substantive answer with at least one specific example
- The participant indicates they have nothing more to add
- The conversation is going in circles
- 3-4 follow-ups have been asked on the current question

Stay on the current question when:
- The answer is vague or generic
- A specific phrase suggests an interesting thread to pursue
- The participant mentions something contradictory to a prior answer
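The move-on and stay-on rules above amount to a decision function. A minimal sketch, with the text-level signals (vagueness, contradiction, interesting thread) simplified to boolean inputs that a real system would infer from the transcript:

```python
# Move-on vs stay-on decision, mirroring the prompt rules above.
# Signal detection is simplified to booleans for illustration.

def should_move_on(answer_is_substantive: bool, has_specific_example: bool,
                   participant_done: bool, going_in_circles: bool,
                   followups_asked: int, answer_is_vague: bool,
                   interesting_thread: bool, contradicts_prior: bool) -> bool:
    # Stay-on signals win while the follow-up budget (3-4 probes) remains.
    if followups_asked < 4 and (answer_is_vague or interesting_thread or contradicts_prior):
        return False
    # Hard move-on conditions.
    if participant_done or going_in_circles or followups_asked >= 3:
        return True
    # Otherwise move on only once the answer is substantive and concrete.
    return answer_is_substantive and has_specific_example
```

Encoding the rules this way also makes them testable in a pilot: you can replay early transcripts and check whether the AI's actual behavior matched the intended policy.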

4. Handle edge cases

Tell the AI how to handle:

  • Off-topic responses: Gently redirect to the research question
  • Refusal to answer: Accept gracefully and move on
  • Confusion: Clarify the question without leading
  • Emotional distress: Acknowledge, offer to skip, and provide resources if appropriate
  • Profanity or inappropriate content: Acknowledge and continue, or end the session per policy

5. Audit prompt outputs

For regulated work, log every prompt and response. Review periodically to catch drift, bias, or compliance issues. The AI may behave differently across model updates, audiences, or topics.

Validation and quality assurance

A few sessions of bad data can poison an entire study. These practices catch problems early.

1. Pilot with team members first

Run 3 to 5 pilot sessions with team members or friendly testers before launching to real participants. Walk through the AI moderator interaction yourself. Does it feel natural? Does it probe meaningfully?

2. Manual review of early sessions

For the first 10 to 20 real sessions, manually review every transcript. Look for:

  • AI asking irrelevant follow-ups
  • AI missing obvious threads
  • AI leading participants
  • AI failing to handle unusual responses
  • Participant confusion or frustration

3. Compare AI tagging to manual review

The AI’s automatic theme tagging is convenient but imperfect. Spot-check by manually coding a sample of sessions and comparing to the AI’s tags. Significant disagreement signals a need for prompt refinement or manual coding.
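"Significant disagreement" can be quantified with Cohen's kappa between your manual codes and the AI's tags. A minimal implementation, assuming one theme label per session (interpretation thresholds such as "above ~0.6 is substantial agreement" are a common convention, not a hard rule):

```python
from collections import Counter

def cohens_kappa(manual: list[str], ai: list[str]) -> float:
    """Cohen's kappa for two raters assigning one label per item."""
    n = len(manual)
    # Observed agreement: fraction of sessions where the labels match.
    observed = sum(m == a for m, a in zip(manual, ai)) / n
    # Expected agreement by chance, from each rater's label distribution.
    m_counts, a_counts = Counter(manual), Counter(ai)
    labels = set(manual) | set(ai)
    expected = sum(m_counts[l] * a_counts[l] for l in labels) / (n * n)
    return (observed - expected) / (1 - expected)
```

Low kappa on the spot-checked sample is the concrete signal to refine the tagging prompt or fall back to manual coding.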

4. Run parallel human-moderated sessions

For high-stakes studies, run a small sample of human-moderated sessions (5 to 10) in parallel. Compare findings to validate that AI moderation captured the same insights. If significant differences emerge, the AI moderation is missing something important.

5. Track participant feedback

Many platforms allow a brief participant feedback survey at the end of each session. Look for signals about whether the AI felt natural, whether participants felt heard, and whether they would participate again.

Common mistakes

Mistake 1: Treating AI moderation as a free upgrade. AI moderation is a different methodology, not a faster version of human moderation. It excels at different things and fails at different things. Plan studies to play to AI’s strengths.

Mistake 2: Skipping the pilot. Launching a 200-session study without piloting is a fast way to collect 200 sessions of bad data. Pilot 5 to 10 sessions first, every time.

Mistake 3: Over-scripting the AI. AI moderators work better with high-level guidance and probing instructions than with rigid scripts. Trust the AI to follow up; don’t try to script every possible response.

Mistake 4: Under-constraining the AI. Without explicit constraints, AI moderators drift toward sycophancy, leading questions, and excessive agreement. Constrain explicitly.

Mistake 5: Ignoring privacy and compliance. AI moderation tools handle real participant data. They need the same privacy and compliance treatment as any research tool. See the user research compliance checklist for industry-specific requirements.

Mistake 6: Not validating against human moderation. For new use cases, run a parallel human-moderated study to validate that AI is capturing the same insights. Skipping validation is how teams discover they have been collecting bad data after the fact.

Mistake 7: Using AI moderation for sensitive topics. Mental health, financial distress, trauma, and similar sensitive contexts need human moderators with trauma-informed research training. AI is inappropriate for these contexts.

When AI-moderated interviews are the right tool

Use AI moderation when:

  • The research question is structured and clear
  • You need to scale beyond what human moderators can support
  • Consistency across sessions matters more than nuance in any single session
  • The audience is comfortable with AI interaction
  • Speed and cost are significant constraints
  • You can validate critical findings with smaller-sample human research

Avoid AI moderation when:

  • The topic is sensitive (health, mental health, trauma, financial vulnerability)
  • The research is exploratory and the direction may shift unpredictably
  • The audience is unfamiliar with or uncomfortable with AI
  • Stakeholders require human researcher credibility
  • The sample is small enough that human moderation is feasible
  • Body language and emotional cues are central data

For broader context on AI in research, see the guides on synthetic respondents, synthetic personas creation, synthetic vs real participants, and AI in user research. AI-moderated interviews are one of the most operationally mature applications of AI in research today, but they remain a complement to human-moderated work, not a replacement for it.