Study Design Guide

How to compose effective study scripts for UserTold.ai. Covers mode selection, segment sequencing, field-by-field guidance, anti-patterns, annotated examples, and the full SegmentV2 field reference.


Quick Schema Reference

Before writing a script, make sure you have the required fields for each segment mode.

| Field | Required | Type | Notes |
| --- | --- | --- | --- |
| version | yes | 2 | Must be exactly 2 |
| goals | yes | array | [{ id, description }] objects |
| segments | yes | array | Segment objects |
| segments[].id | yes | string | Unique within script |
| segments[].mode | yes | string | talk, speak, or observe |
| segments[].title | yes | string | Display label |
| segments[].speak_text | yes for speak | string | Spoken text delivered by AI |
| segments[].talk | recommended for talk | object | { system_prompt?, goals? } |
| segments[].instruction | required for observe | string | Task instruction shown to participant |
| segments[].conductor_context | required for observe | string | AI-only context for interpretation and later debrief |

A speak segment without speak_text, or an observe segment missing both instruction and conductor_context, fails validation.

Production scripts support only deterministic advancement: max_duration_s, user Done / step_done, url:<substring>, action:<selector-or-pattern>, complete_segment for talk segments, and scripted speak completion.
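The required-field rules above can be checked locally before a script is submitted. The following is a minimal sketch, not the platform's validator: validate_script is a hypothetical helper, and it assumes the stricter reading of the table, in which instruction and conductor_context are each required on observe segments. The real validator may be more lenient.

```python
from typing import Any


def validate_script(script: dict[str, Any]) -> list[str]:
    """Return human-readable problems; an empty list means none were found."""
    errors: list[str] = []

    if script.get("version") != 2:
        errors.append("version must be exactly 2")
    if not isinstance(script.get("goals"), list):
        errors.append("goals must be an array")
    segments = script.get("segments")
    if not isinstance(segments, list):
        errors.append("segments must be an array")
        return errors

    seen_ids: set[str] = set()
    for index, seg in enumerate(segments):
        seg_id = seg.get("id")
        has_id = isinstance(seg_id, str) and bool(seg_id)
        label = seg_id if has_id else f"segments[{index}]"
        if not has_id:
            errors.append(f"{label}: id is required")
        elif seg_id in seen_ids:
            errors.append(f"{label}: id is not unique")
        else:
            seen_ids.add(seg_id)

        if not seg.get("title"):
            errors.append(f"{label}: title is required")

        mode = seg.get("mode")
        if mode not in ("talk", "speak", "observe"):
            errors.append(f"{label}: mode must be talk, speak, or observe")
        elif mode == "speak" and not seg.get("speak_text"):
            errors.append(f"{label}: speak_text is required for speak")
        elif mode == "observe":
            # Assumption: each field is checked separately, following the table.
            for field in ("instruction", "conductor_context"):
                if not seg.get(field):
                    errors.append(f"{label}: {field} is required for observe")
    return errors
```

Running this over a draft script before upload surfaces the most common omissions in one pass; it does not replace the platform's own validation.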


Modes

Every segment runs in one of three modes. Picking the right mode is the single most important decision per segment.

talk — Conversational Interview

The AI conducts a natural voice conversation: asks questions, listens, follows up.

Use when you need:

  • Open-ended discovery (triggers, motivations, decision criteria)
  • Follow-up probing on specific answers
  • Rapport building at interview start/end
  • Debrief after observation

Key fields: talk.system_prompt, talk.goals

The system_prompt shapes the interviewer's personality, question style, and focus. Goals tell the AI which research objectives this segment should pursue.

speak — Scripted Transition

The AI delivers a scripted one-way transition message, then advances when the spoken line completes. No back-and-forth.

Use when you need:

  • Task instructions before an observe segment
  • Welcome/intro messages
  • Consent language or disclaimers
  • Transitions between study phases

Key fields: speak_text, speak_submode

Keep speak_text concise. Participants tune out after ~30 seconds of monologue. Speak mode is output-only: microphone VAD is not used to interrupt the assistant because it cannot reliably distinguish participant speech from assistant playback echo. If interruption is needed later, use an explicit participant control rather than automatic voice barge-in.
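To apply the ~30 second guideline while drafting, a rough duration estimate from word count is usually enough. This sketch assumes a speaking rate of 150 words per minute, which is an illustrative figure and not a property of the UserTold.ai voices; estimate_speak_seconds is a hypothetical helper.

```python
def estimate_speak_seconds(speak_text: str, words_per_minute: float = 150.0) -> float:
    """Estimate how long speak_text takes to say, in seconds.

    Assumption: a constant speaking rate; actual TTS pacing will vary.
    """
    word_count = len(speak_text.split())
    return word_count * 60.0 / words_per_minute
```

Any speak_text that estimates above 30 seconds is a candidate for trimming or for splitting across two segments.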

observe — Silent Observation

The AI watches the participant use your product. It stays quiet and preserves what the participant says and does.

Use when you need:

  • Usability testing (watch a task end-to-end)
  • Workflow observation (see how they actually work)
  • Any scenario where interruption would bias behavior

Key fields: instruction, conductor_context, max_duration_s, deterministic advance_when

The instruction is shown to the participant. The conductor_context is AI-only background knowledge: it describes expected behavior and likely friction so that later interpretation and planned talk debriefs can make sense of what was observed.


Segment Sequencing

Order matters. The right sequence produces richer data than any individual segment.

The Core Pattern: speak → observe → talk → speak/end

Most usability studies follow this arc:

  1. speak — Set up the task ("I'd like you to complete a purchase...")
  2. observe — Watch them do it (silent, no leading)
  3. talk — Debrief on what happened ("What were you thinking when you paused on the payment page?")
  4. speak/end — Thank the participant and close the interview

Why this order works:

  • Speak gives clear instructions without a conversation that might bias behavior
  • Observe captures natural behavior before you ask about it
  • Talk references concrete moments the participant just experienced
  • The closing speak segment ends cleanly without starting a new conversation

Talk-Only Pattern: talk → talk → talk → talk

Pure interview. Each segment narrows focus:

  1. Rapport & context — Establish the recent event
  2. Trigger & need — What happened, what they needed
  3. Friction & workarounds — Where things broke, what they did instead
  4. Wrap-up — Confirm understanding, capture the key moment

Exploration Pattern: talk → observe → talk

Start with context, then watch, then probe:

  1. Context — Understand their routine and tools
  2. Demo — Watch them do the thing
  3. Probe — Dig into what you observed

Principles

  • Never start with observe. Participants need context first — at minimum a speak segment with instructions.
  • Never end with observe. Always debrief. The richest insights come from asking "why did you do X?" after watching X happen.
  • Use speak for transitions, not conversations. If you need back-and-forth, use talk.
  • Limit observe segments to 5-7 minutes. Beyond that, participants lose focus and data quality drops. Use max_duration_s.
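The ordering and duration principles above lend themselves to a small lint over the segments array. This is a sketch under stated assumptions: lint_sequence is a hypothetical helper, and the 420-second ceiling is simply the upper end of the 5-7 minute guideline, not a limit enforced by the platform.

```python
from typing import Any

MAX_OBSERVE_SECONDS = 420  # upper end of the 5-7 minute guideline


def lint_sequence(segments: list[dict[str, Any]]) -> list[str]:
    """Warn about observe segments that start or end a study, or run too long."""
    warnings: list[str] = []
    if not segments:
        return warnings

    if segments[0].get("mode") == "observe":
        warnings.append("first segment is observe; add a speak or talk segment before it")
    if segments[-1].get("mode") == "observe":
        warnings.append("last segment is observe; add a talk debrief after it")

    for seg in segments:
        if seg.get("mode") != "observe":
            continue
        limit = seg.get("max_duration_s")
        if limit is None:
            warnings.append(f"{seg.get('id')}: observe segment has no max_duration_s")
        elif limit > MAX_OBSERVE_SECONDS:
            warnings.append(f"{seg.get('id')}: max_duration_s {limit} exceeds {MAX_OBSERVE_SECONDS}")
    return warnings
```

The warnings are advisory: a script that triggers them may still pass validation, but it departs from the sequencing guidance in this section.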

Field-by-Field Guidance

system_prompt (talk mode)

The system_prompt defines the interviewer's behavior for a talk segment. It's the most impactful field in the entire script.

Good system_prompt traits:

  • Specifies question style (one question at a time, concrete, no leading)
  • Names the evidence to pursue (behaviors, not opinions)
  • Sets boundaries (no solution selling, no roadmap talk)
  • Includes 2-3 example follow-up questions

Example:

Ask only about concrete recent behavior, not hypothetical futures.
For each friction point mentioned, ask:
- "What did you click or type next?"
- "What did you expect to happen?"
- "What happened instead?"
Do not move on until you capture behavior + consequence.

Avoid:

  • Generic prompts ("Ask good questions about the user experience")
  • Long preambles that dilute the core instruction
  • Contradictory rules ("Be concise" + "Always ask 3 follow-ups")

conductor_context (observe mode)

Background knowledge for interpretation and planned talk debriefs — NOT shown to the participant.

Good conductor_context traits:

  • Describes expected vs. stuck behavior
  • Mentions UI elements that are commonly missed
  • Provides domain-specific context

Example:

The "Submit" button is below the fold on mobile. Users frequently scroll past it.
Expected flow: fill form → scroll down → tap Submit → see confirmation.
If the user scrolls up and down repeatedly, they are likely stuck.

Avoid:

  • Empty string (wastes an opportunity to preserve useful interpretation context)
  • Participant-facing language (this is AI-only)

instruction (observe mode)

Shown to the participant. Tells them what to do.

Good instruction traits:

  • One clear task
  • Concrete start and end points
  • No hints about how to complete it

Example: "Complete a purchase of any item, from product page through to the confirmation screen."

Avoid:

  • Multiple tasks in one instruction
  • Hints: "Click the blue button to check out" (biases behavior)
  • Vague: "Use the product" (no clear success criteria)

advance_when

Tells the conductor when to auto-advance to the next segment. Production scripts only accept deterministic rules.

Supported rules:

url:https://example.com/confirmation
action:#submit-order

URL rules advance when the participant navigates to a URL containing the value. Action rules advance when a matching participant action is observed. Goals guide analysis and planned debriefs, not live advancement.
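The rule syntax can be illustrated with a short parser. This is a sketch only: AdvanceRule, parse_advance_when, and url_rule_matches are hypothetical names, and the URL check implements just the substring behavior described above. Action matching is left out because this guide does not specify how selectors or patterns are compared.

```python
from typing import NamedTuple


class AdvanceRule(NamedTuple):
    kind: str   # "url" or "action"
    value: str  # URL substring, or selector/pattern for actions


def parse_advance_when(rule: str) -> AdvanceRule:
    """Split 'url:<substring>' or 'action:<selector-or-pattern>' into its parts."""
    kind, sep, value = rule.partition(":")  # split on the first colon only
    if not sep or kind not in ("url", "action") or not value:
        raise ValueError(f"unsupported advance_when rule: {rule!r}")
    return AdvanceRule(kind, value)


def url_rule_matches(rule: AdvanceRule, current_url: str) -> bool:
    """A URL rule matches when the current URL contains the rule's value."""
    return rule.kind == "url" and rule.value in current_url
```

Because matching is by substring, a rule such as url:https://example.com/confirmation would also match a confirmation URL that carries query parameters. Choose a value specific enough that earlier pages in the flow do not contain it.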

Tips:

  • Prefer URL-based or action-based rules when possible.
  • For talk segments, prefer the built-in complete_segment tool over advance_when.

goals (study-level and talk-level)

Study-level goals define what the entire study should learn. Talk-level goals (talk.goals) tell a specific segment which study goals to pursue.

Good goals:

  • Observable and specific: "Capture the exact trigger event and context"
  • Outcome-oriented: "Identify workarounds used when the primary flow fails"

Avoid:

  • Vague: "Understand the user" (understand what, specifically?)
  • Too many: 3-5 goals per study is ideal. More than 7 dilutes focus.
  • Duplicated: Don't repeat the same goal in every segment. Assign goals to the segments where they're most relevant.
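The goal guidance above can be turned into a quick check over a draft script. This is a sketch, with lint_goals as a hypothetical helper. It flags more than seven study goals, talk.goals entries that do not match a study goal ID, and goals repeated in every talk segment. The thresholds come from this section and are not enforced by the platform.

```python
from collections import Counter
from typing import Any


def lint_goals(script: dict[str, Any]) -> list[str]:
    """Warn about too many goals, unknown goal IDs, and goals repeated everywhere."""
    warnings: list[str] = []
    goal_ids = [goal["id"] for goal in script.get("goals", [])]
    if len(goal_ids) > 7:
        warnings.append(f"{len(goal_ids)} study goals; more than 7 dilutes focus")

    usage = Counter()
    talk_count = 0
    for seg in script.get("segments", []):
        if seg.get("mode") == "talk":
            talk_count += 1
        # dict.fromkeys removes duplicates within one segment, keeping order
        for goal_id in dict.fromkeys(seg.get("talk", {}).get("goals", [])):
            if goal_id not in goal_ids:
                warnings.append(f"{seg.get('id')}: unknown goal {goal_id}")
            usage[goal_id] += 1

    for goal_id in goal_ids:
        # Only meaningful when there is more than one talk segment
        if talk_count > 1 and usage[goal_id] == talk_count:
            warnings.append(f"{goal_id}: repeated in every talk segment")
    return warnings
```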

max_duration_s

Safety valve for observe segments. Auto-advances after this many seconds.

Recommendations:

  • Observe segments: 300-420s (5-7 minutes)
  • Speak segments: rarely needed (they're short by nature)
  • Talk segments: rarely needed (conversation has natural endings)

skip_if

Natural language condition for skipping this segment entirely.

Example: "The participant already described their trigger event in the previous segment."


Common Anti-Patterns

1. All-talk studies with no observation

Problem: Five talk segments in a row. You're collecting opinions, not behavior.

Fix: Add at least one observe segment. Watch them do the thing, then talk about it.

2. Missing conductor_context

Problem: Observe segment with empty conductor_context. The AI has no idea what "stuck" looks like for your specific task.

Fix: Always describe expected behavior, common failure points, and what stuck looks like.

3. Vague goals

Problem: Goals like "Understand the user experience" or "Learn about needs."

Fix: Make goals concrete and observable: "Capture the exact workaround used when export fails."

4. Starting with observe

Problem: First segment is observe. Participant has no idea what they're supposed to do.

Fix: Always precede observe with speak (task instructions) or talk (context gathering).

5. No debrief after observe

Problem: Observe segment followed by interview end. You captured behavior but never asked why.

Fix: Always follow observe with a talk debrief that references what you just watched.

6. Giant system_prompt

Problem: 500-word system_prompt that tries to cover every scenario. The AI loses focus.

Fix: Keep prompts to 3-5 concrete rules. Use talk.goals to direct focus rather than long prompts.

7. Too many goals

Problem: 10 goals across 3 segments. None get adequate coverage.

Fix: 3-5 goals per study. Each goal assigned to 1-2 segments where it's most relevant.

8. No max_duration_s on observe

Problem: Observe segment runs indefinitely. Participant wanders for 15 minutes.

Fix: Set max_duration_s to 300-420 for observe segments. Use a deterministic advance_when rule for early completion.


Annotated Examples

Talk Interview

Why this works: A pure talk study is appropriate when you are asking about past behavior rather than observing current behavior. Each segment narrows the aperture from context to trigger to friction to alternatives.

{
  "version": 2,
  "defaults": {
    "system_prompt": "You are a product interviewer. Ask one concrete question at a time. Prefer behavior evidence over opinions."
  },
  "goals": [
    { "id": "g_trigger", "description": "Capture the exact trigger event and context" },
    { "id": "g_outcome", "description": "Capture desired outcome and success criteria" },
    { "id": "g_friction", "description": "Capture specific friction points during execution" },
    { "id": "g_workaround", "description": "Capture workarounds or alternate paths used" },
    { "id": "g_decision", "description": "Capture tradeoffs and stop/go decisions" }
  ],
  "segments": [
    {
      "id": "seg_rapport",
      "title": "Rapport & Context",
      "mode": "talk",
      "talk": {
        "system_prompt": "Set a 1-sentence boundary and ask for the last real time the participant did the target task end-to-end.",
        "goals": ["g_trigger"]
      }
    },
    {
      "id": "seg_trigger",
      "title": "Trigger & Job",
      "mode": "talk",
      "talk": {
        "system_prompt": "Ask: what happened right before they started, what they needed done, what success meant. Follow with one strict probe per answer.",
        "goals": ["g_trigger", "g_outcome"]
      }
    },
    {
      "id": "seg_friction",
      "title": "Friction & Workarounds",
      "mode": "talk",
      "talk": {
        "system_prompt": "For every friction mention, ask: what did you click next, what obstacle appeared, what did you do to keep going, what did it cost? Do not move on until you capture behavior + consequence.",
        "goals": ["g_friction", "g_workaround", "g_decision"]
      }
    },
    {
      "id": "seg_compare",
      "title": "Alternatives",
      "mode": "talk",
      "talk": {
        "system_prompt": "Ask only about alternatives already used, not hypothetical futures. Push for one concrete example per alternative.",
        "goals": ["g_outcome", "g_decision"]
      }
    },
    {
      "id": "seg_wrap",
      "title": "Wrap Up",
      "mode": "talk",
      "talk": {
        "system_prompt": "Summarize what you heard. Ask: is there one concrete moment that shows what matters most?",
        "goals": ["g_decision"]
      }
    }
  ]
}

Design notes:

  • defaults.system_prompt sets the baseline interviewer persona — individual segments override with specific focus areas.
  • Goals are distributed across segments. g_trigger is covered in rapport and trigger segments; g_decision spans friction, compare, and wrap-up.
  • Each segment's talk.system_prompt includes concrete example questions, not just topic descriptions.
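The relationship between defaults.system_prompt and a segment's talk.system_prompt can be sketched as a simple fallback lookup. This assumes the plain fallback described in the field reference, where the default is used only when a talk segment does not specify its own prompt. Whether the runtime also layers the two prompts is not stated here, and effective_system_prompt is a hypothetical helper.

```python
from typing import Any, Optional


def effective_system_prompt(script: dict[str, Any], segment: dict[str, Any]) -> Optional[str]:
    """Return the prompt a talk segment would use under a plain-fallback assumption."""
    if segment.get("mode") != "talk":
        return None  # system_prompt only applies to talk segments
    own_prompt = segment.get("talk", {}).get("system_prompt")
    if own_prompt:
        return own_prompt
    return script.get("defaults", {}).get("system_prompt")
```

Printing the effective prompt for every talk segment is a quick way to spot a segment that silently falls back to the study-wide default.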

Usability Test

Why this works: The speak → observe → talk → speak arc captures natural behavior (observe) with clear setup, focused reflection, and a scripted close. The opening speak segment prevents bias by delivering instructions without conversation.

{
  "version": 2,
  "goals": [
    { "id": "g_completion", "description": "Evaluate task completion and ease of use" },
    { "id": "g_friction", "description": "Identify points of confusion or friction" }
  ],
  "segments": [
    {
      "id": "seg_intro",
      "title": "Task Instructions",
      "mode": "speak",
      "speak_text": "Hi there — thanks for joining. I'll give you a task to complete. While you work, speak out loud about what you're doing and anything that feels confusing. There are no right or wrong answers.",
      "speak_submode": "speak_balanced"
    },
    {
      "id": "seg_task",
      "title": "Complete the Task",
      "mode": "observe",
      "instruction": "Complete a purchase from product page through to confirmation.",
      "conductor_context": "Expected flow: browse → add to cart → checkout → payment → confirmation. The payment form requires scrolling on mobile. Users who tap 'Back' from payment often cannot find their cart again.",
      "max_duration_s": 420,
      "advance_when": "url:https://example.com/confirmation"
    },
    {
      "id": "seg_debrief",
      "title": "Debrief",
      "mode": "talk",
      "talk": {
        "system_prompt": "Ask what was easy, what was confusing, and what outcome they expected. Reference specific moments you observed. Follow with one concrete improvement question.",
        "goals": ["g_completion", "g_friction"]
      }
    },
    {
      "id": "seg_thanks",
      "title": "Thanks",
      "mode": "speak",
      "speak_text": "Thanks for walking through that task and sharing your thoughts."
    }
  ]
}

Design notes:

  • conductor_context tells interpretation and debrief what friction can mean in this specific task.
  • advance_when uses URL matching for deterministic advancement when the task is done.
  • max_duration_s: 420 (7 minutes) prevents indefinite observation.
  • The debrief prompt says "reference specific moments" — the AI has the observation transcript and can ask about concrete behavior.

Exploration Study

Why this works: Starts with talk to understand context, moves to observe to see reality (not just what they say), then probes the gap between what they described and what you observed.

{
  "version": 2,
  "goals": [
    { "id": "g_workflow", "description": "Understand daily routines and workflows" },
    { "id": "g_needs", "description": "Discover unmet needs and workarounds" }
  ],
  "segments": [
    {
      "id": "seg_context",
      "title": "Context & Routine",
      "mode": "talk",
      "talk": {
        "system_prompt": "Explore the participant's daily routine, tools, habits. Ask about the last time they did the target task. Get concrete, recent examples — not general descriptions.",
        "goals": ["g_workflow"]
      }
    },
    {
      "id": "seg_demo",
      "title": "Show Current Workflow",
      "mode": "observe",
      "instruction": "Show how you currently do this task from start to finish.",
      "conductor_context": "We are watching their current workflow to identify friction and workarounds. Note any copy-paste between tools, manual steps that could be automated, or moments of hesitation.",
      "max_duration_s": 360
    },
    {
      "id": "seg_probe",
      "title": "Pain Points & Needs",
      "mode": "talk",
      "talk": {
        "system_prompt": "Dig into frustrations and workarounds observed in the demo. Ask: why do you do it that way? What breaks? What would you change? Look for unmet needs behind stated preferences.",
        "goals": ["g_needs"]
      }
    }
  ]
}

Design notes:

  • The talk → observe → talk pattern lets you compare what people say (context) with what they do (demo), then probe the differences.
  • conductor_context in the demo segment primes the AI to watch for specific signals (copy-paste, manual steps, hesitation).
  • The probe segment explicitly references the demo: "Dig into frustrations and workarounds observed in the demo."

SegmentV2 Field Reference

Required Fields

| Field | Type | Description |
| --- | --- | --- |
| id | string | Unique segment identifier (e.g. "seg_intro"). Used in deterministic advancement, skip conditions, and logging. |
| title | string | Human-readable segment name. Shown in the dashboard and logs. |
| mode | "talk" \| "speak" \| "observe" | Interaction mode for this segment. |

Talk Mode Fields

| Field | Type | Default | Description |
| --- | --- | --- | --- |
| talk.system_prompt | string | Inherits from defaults.system_prompt | Interviewer persona and question strategy for this segment. |
| talk.goals | string[] | [] | IDs of study goals this segment should pursue. |
| talk.tools | LLMTool[] | [] | Custom tools available to the interviewer LLM. |

Speak Mode Fields

| Field | Type | Default | Description |
| --- | --- | --- | --- |
| speak_text | string | | Text for the AI to speak. Required for speak segments. |
| speak_submode | "speak_fast" \| "speak_balanced" \| "speak_rich" | "speak_balanced" | Voice quality/speed tradeoff. |
| speak_interruptible | boolean | false | Deprecated legacy field. Ignored by the runtime; speak segments are output-only. |

Observe Mode Fields

| Field | Type | Default | Description |
| --- | --- | --- | --- |
| instruction | string | | Task instruction shown to the participant. |
| conductor_context | string | "" | AI-only background knowledge for interpretation and planned debriefs. |

Segment Flow Control

| Field | Type | Default | Description |
| --- | --- | --- | --- |
| advance_when | string | | Deterministic auto-advance rule: url:<substring> or action:<selector-or-pattern>. |
| skip_if | string | | Condition to skip this segment entirely. |
| max_duration_s | number | | Auto-advance after this many seconds. |

Host Actions

| Field | Type | Default | Description |
| --- | --- | --- | --- |
| host_actions_on_enter | HostAction[] | [] | Scripted host-page actions executed when entering this segment: highlight, navigate, or scroll_to. |
| host_actions_on_exit | HostAction[] | [] | Scripted host-page actions executed when leaving this segment. |

Host actions are deterministic segment-boundary steps. Scripted highlight and scroll_to actions use the selectors in their action payloads; do not rely on allowed_selectors as widget-runtime enforcement for those actions.

Study-level allowed_origins allows conductor runtime requests from those origins; it does not authorize cross-origin host navigation. When allowed_origins is non-empty, include the embed origin that runs the widget. Scripted navigate actions should target the current host origin.

Observation still stays silent; advancement uses max_duration_s, user Done / step_done, url:<substring>, action:<selector-or-pattern>, complete_segment for talk, or scripted speak completion.
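The guidance that scripted navigate actions should target the current host origin can be checked before a script ships. The sketch below compares scheme, host, and port using the Python standard library. It assumes both URLs are absolute and does not assume any particular HostAction payload shape, which this guide does not define; origin_of and is_same_origin are hypothetical helpers.

```python
from urllib.parse import urlsplit

DEFAULT_PORTS = {"http": 80, "https": 443}


def origin_of(url: str) -> tuple[str, str, int]:
    """Return (scheme, host, port), filling in the default port when omitted."""
    parts = urlsplit(url)
    scheme = parts.scheme.lower()
    host = (parts.hostname or "").lower()
    port = parts.port or DEFAULT_PORTS.get(scheme, 0)
    return scheme, host, port


def is_same_origin(target_url: str, host_page_url: str) -> bool:
    """True when a navigate target shares the host page's origin."""
    return origin_of(target_url) == origin_of(host_page_url)
```

A relative target such as /checkout has no scheme or host and is reported as a different origin by this sketch, so resolve relative paths against the host page URL first.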

StudyScriptV2 Top-Level Fields

| Field | Type | Description |
| --- | --- | --- |
| version | 2 | Always 2. |
| segments | SegmentV2[] | Ordered list of segments. |
| goals | StudyGoal[] | Study-level goals: { id: string, description: string }. |
| defaults.system_prompt | string | Fallback system_prompt for talk segments that don't specify one. |
| defaults.voice | string | Default TTS voice. |
| defaults.speak_submode | SpeakSubmode | Default speak quality/speed. |
| defaults.language | string | Interview language hint. |

See also

  • Studies — create and manage studies
  • Quickstart — end-to-end setup including study creation