Framework evaluation

A score only counts if it's anchored in the competency framework.

Every roleplay template declares which competencies each scenario tests. The AI scores exactly those criteria: no keyword heuristics, no platform-wide catalogue imposed on you.

Session report

Medical Visit, Sceptical Cardiologist

Trainee: Marcela R. · Channel: Voice · 12 min

passed

Competencies pinned when the session starts

PROD-001 · Product mastery

OBJ-003 · Objection handling

COMP-014 · Label compliance

Evaluated criteria (rubric)

Arguments grounded in clinical evidence 92

Understanding of the HCP routine 85

Recovery after a strong objection 78

Closing with a clear next step 88

Label compliance RDC 658 (compliance blocker) 95

AI insights · Strengths

Anchored the pitch in the HCP's hypertensive patient profile by 1:15. Cited a phase-3 study when challenged on efficacy.

Areas to improve

At 4:32 the HCP asked about interaction with beta-blockers and the response was vague ("I'll check and get back to you"). Recommendation: targeted training on drug interactions.

Your company's framework

Every company has its own catalogue of competencies and criteria. It is cloned from the central catalogues for your vertical at onboarding and is fully editable from then on, so you can add competencies specific to your business that don't exist in any catalogue.

The AI scores. Code decides.

The AI owns scoring. The pass/fail rule is auditable code, including "compliance blockers" that fail the session even with a high score (e.g. violating the label → failed, even with 95 overall).
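As a rough illustration of what such a rule could look like, here is a minimal sketch. The names `CriterionScore`, `decide`, `PASS_MARK` and `BLOCKER_MIN`, and both threshold values, are assumptions made for this example, not the platform's actual code or configuration.

```python
from dataclasses import dataclass

# Assumed thresholds, chosen only for illustration.
PASS_MARK = 70     # minimum overall score to pass
BLOCKER_MIN = 60   # minimum score a compliance-blocker criterion must reach


@dataclass(frozen=True)
class CriterionScore:
    name: str
    score: int               # 0-100, as returned by the AI
    is_blocker: bool = False  # True for compliance blockers


def decide(overall: int, scores: list[CriterionScore]) -> tuple[str, list[str]]:
    """Return (verdict, tripped_blockers). Plain code, no AI involved."""
    tripped = [c.name for c in scores if c.is_blocker and c.score < BLOCKER_MIN]
    if tripped:
        # A tripped blocker fails the session regardless of the overall score.
        return "failed", tripped
    return ("passed" if overall >= PASS_MARK else "failed"), []


print(decide(95, [CriterionScore("Label compliance", 40, is_blocker=True)]))
# ('failed', ['Label compliance'])
```

Because the verdict is computed by a small pure function, the same inputs always give the same result, which is what makes the rule reviewable.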

Frozen for audit

Criteria pinned when the session starts. Prompt pinned to a specific version. Transcript, audio and report stored with configurable retention. Audit comes out of the box.

From framework to report.

The entire chain is deterministic and auditable.

Framework curation

The tenant admin edits competencies, criteria, and scenario contexts. Add, edit, deactivate: everything is versioned.

Template declares

In the wizard, the author picks which competencies each scenario of the template tests. Each criterion's weight is configurable.
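The page does not say how weights are combined into an overall score. One plausible reading is a weighted mean, sketched below. The function name `weighted_overall` and the weights are hypothetical; the scores are the ones shown in the sample report.

```python
def weighted_overall(scores: dict[str, float], weights: dict[str, float]) -> float:
    """Weighted mean of criterion scores (an assumed aggregation rule)."""
    if scores.keys() != weights.keys():
        raise ValueError("scores and weights must cover the same criteria")
    total_weight = sum(weights.values())
    if total_weight <= 0:
        raise ValueError("weights must sum to a positive number")
    return sum(scores[name] * weights[name] for name in scores) / total_weight


scores = {
    "Arguments grounded in clinical evidence": 92,
    "Understanding of the HCP routine": 85,
    "Recovery after a strong objection": 78,
    "Closing with a clear next step": 88,
    "Label compliance": 95,
}
# Hypothetical weights: compliance and clinical evidence count more.
weights = {
    "Arguments grounded in clinical evidence": 2,
    "Understanding of the HCP routine": 1,
    "Recovery after a strong objection": 1,
    "Closing with a clear next step": 1,
    "Label compliance": 3,
}
print(weighted_overall(scores, weights))  # 90.0
```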

Roleplay freezes

At dispatch, the criteria are snapshotted on the roleplay. Even if the template is edited later, the session runs against the snapshot.
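The snapshot idea can be sketched as copying the template's criteria into an immutable record at dispatch time. The `Template` and `Roleplay` classes and the `dispatch` function below are hypothetical names used for illustration only.

```python
from dataclasses import dataclass
from types import MappingProxyType


@dataclass
class Template:
    name: str
    criteria: dict[str, float]  # criterion name -> weight, editable by the author
    version: int = 1


@dataclass(frozen=True)
class Roleplay:
    template_name: str
    template_version: int
    criteria: MappingProxyType  # read-only view over a private copy


def dispatch(template: Template) -> Roleplay:
    # Copy first, then wrap: later template edits cannot reach the copy,
    # and the read-only wrapper rejects writes to the snapshot itself.
    frozen = MappingProxyType(dict(template.criteria))
    return Roleplay(template.name, template.version, frozen)


template = Template("Medical Visit", {"Objection handling": 1.0})
roleplay = dispatch(template)

# Editing the template afterwards leaves the dispatched session untouched.
template.criteria["Closing with a clear next step"] = 2.0
template.version = 2
print(dict(roleplay.criteria))  # {'Objection handling': 1.0}
```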

AI scores, code decides

Async job: builds the prompt + transcript, asks the AI for structured JSON, parses it, applies pass/fail rules, persists the full aggregate.
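The parsing step is where model output meets the pinned criteria. A minimal sketch of that check follows, assuming the AI returns JSON shaped like `{"scores": {"<criterion>": <0-100 integer>}}`. That shape and the name `parse_scores` are assumptions; the actual schema is not described here.

```python
import json


def parse_scores(raw: str, pinned: set[str]) -> dict[str, int]:
    """Parse the model's JSON and accept it only if it matches the pinned criteria."""
    data = json.loads(raw)  # raises json.JSONDecodeError (a ValueError) on bad JSON
    if not isinstance(data, dict) or not isinstance(data.get("scores"), dict):
        raise ValueError('expected an object with a "scores" object')
    scores = data["scores"]
    if set(scores) != pinned:
        # Reject both missing criteria and criteria the session never pinned.
        raise ValueError(f"criteria mismatch: {sorted(set(scores) ^ pinned)}")
    for name, value in scores.items():
        # bool is a subclass of int in Python, so exclude it explicitly.
        if isinstance(value, bool) or not isinstance(value, int) or not 0 <= value <= 100:
            raise ValueError(f"invalid score for {name!r}: {value!r}")
    return scores


pinned = {"Recovery after a strong objection", "Label compliance"}
raw = '{"scores": {"Recovery after a strong objection": 78, "Label compliance": 95}}'
print(parse_scores(raw, pinned))
# {'Recovery after a strong objection': 78, 'Label compliance': 95}
```

Rejecting any response whose criteria differ from the pinned set keeps the AI from scoring something the session did not declare.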

Framework-driven evaluation

Scoring starts with your company's competencies.

Each company defines its own competency framework: abilities, criteria, weights. Every evaluation is scored against that framework, not a generic rubric.

Every scenario declares which competencies it tests, with a versioned prompt vetted against the rubric. Reproducible, debuggable, comparable across sessions.

Off-the-shelf generic criteria