Framework evaluation

A score only counts if it's anchored in the competency framework.

Every roleplay template declares which competencies each scenario tests. The AI scores exactly those criteria: no keyword heuristics, and no platform-wide catalogue imposed on you.

Session report

Medical Visit, Sceptical Cardiologist

Trainee: Marcela R. · Channel: Voice · 12 min

87

passed

Competencies pinned when the session starts

PROD-001

Product mastery

92

OBJ-003

Objection handling

78

COMP-014

Label compliance

95

Evaluated criteria (rubric)

Arguments grounded in clinical evidence 92
Understanding of the HCP routine 85
Recovery after a strong objection 78
Closing with a clear next step 88
Label compliance RDC 658 (compliance blocker) 95

AI insights · Strengths

Anchored the pitch in the HCP's hypertensive patient profile by 1:15. Cited a phase-3 study when challenged on efficacy.

Areas to improve

At 4:32 the HCP asked about interaction with beta-blockers and the response was vague ("I'll check and get back to you"). Recommendation: targeted training on drug interactions.

Your company's framework

Every company has its own catalogue of competencies and criteria. It is cloned from the central catalogue for your vertical at onboarding and is fully editable from then on, so you can add competencies specific to your business that don't exist in any catalogue.

The AI scores. Code decides.

The AI owns scoring. The pass/fail rule is auditable code, including "compliance blockers" that fail the session even with a high score (e.g. violating the label → failed, even with 95 overall).
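To make the split concrete, here is a minimal sketch of what such a pass/fail rule could look like. It is an illustration under stated assumptions, not the platform's actual code: the names `CriterionScore` and `decide`, the overall pass mark, and the idea that a blocker trips when its own score falls below a floor are all hypothetical.

```python
from __future__ import annotations

from dataclasses import dataclass

# Both thresholds are hypothetical values chosen for this sketch.
PASS_THRESHOLD = 70    # assumed overall pass mark
BLOCKER_MINIMUM = 80   # assumed floor for any compliance-blocker criterion


@dataclass(frozen=True)
class CriterionScore:
    name: str
    score: int                        # 0-100, as returned by the AI
    compliance_blocker: bool = False  # True if this criterion can fail the session alone


def decide(overall: int, criteria: list[CriterionScore]) -> tuple[str, list[str]]:
    """Return the verdict and the names of any blockers that were tripped."""
    tripped = [
        c.name
        for c in criteria
        if c.compliance_blocker and c.score < BLOCKER_MINIMUM
    ]
    if tripped:
        # A tripped blocker fails the session regardless of the overall score.
        return "failed", tripped
    verdict = "passed" if overall >= PASS_THRESHOLD else "failed"
    return verdict, []
```

Because the rule is plain code, an auditor can read it and replay it on stored scores: a session with 95 overall still comes back as failed if a blocker criterion was tripped.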

Frozen for audit

Criteria pinned when the session starts. Prompt pinned to a specific version. Transcript, audio and report stored with configurable retention. Audit comes out of the box.

From framework to report.

The entire chain is deterministic and auditable.

01

Framework curation

The tenant admin edits competencies, criteria, and scenario contexts. Add, edit, deactivate: everything is versioned.

02

Template declares

In the wizard, the author picks which competencies each scenario of the template tests. Each criterion's weight is configurable.
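One plausible way configurable weights could feed into a score is a weighted mean. The page does not state the actual aggregation formula, so the sketch below is an assumption for illustration only, and `weighted_score` is a hypothetical name.

```python
from __future__ import annotations


def weighted_score(scores: dict[str, float], weights: dict[str, float]) -> float:
    """Weighted mean of criterion scores; weights need not sum to 1."""
    if scores.keys() != weights.keys():
        raise ValueError("every scored criterion needs exactly one weight")
    total_weight = sum(weights.values())
    if total_weight <= 0:
        raise ValueError("weights must sum to a positive number")
    return sum(scores[name] * weights[name] for name in scores) / total_weight
```

With this shape, raising one criterion's weight pulls the overall score towards that criterion's result without touching the others.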

03

Roleplay freezes

At dispatch, the criteria are snapshotted on the roleplay. Even if the template is edited later, the session runs against the snapshot.
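The freeze can be pictured as copying the template's criteria into an immutable structure at dispatch. This is a minimal sketch under assumptions: the `Criterion`, `Template`, and `Roleplay` types and the `dispatch` function are hypothetical names, not the platform's data model.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class Criterion:
    code: str
    label: str
    weight: float


@dataclass
class Template:
    name: str
    criteria: list[Criterion]  # editable by the template author


@dataclass(frozen=True)
class Roleplay:
    template_name: str
    criteria: tuple[Criterion, ...]  # immutable snapshot taken at dispatch


def dispatch(template: Template) -> Roleplay:
    # Copy the list into a tuple of frozen records, so later edits to the
    # template cannot reach the roleplay's snapshot.
    return Roleplay(template.name, tuple(template.criteria))
```

Editing or extending `template.criteria` after `dispatch` leaves the roleplay's snapshot unchanged, which is the property an audit relies on.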

04

AI scores, code decides

An async job assembles the prompt and transcript, asks the AI for structured JSON, parses it, applies the pass/fail rules, and persists the full aggregate.
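The parsing step is where the pinned criteria earn their keep: the job can reject any AI response that scores something other than what was frozen. The sketch below assumes a hypothetical JSON shape (`{"criteria": [{"code": ..., "score": ...}]}`) and a hypothetical function name, `parse_scores`; the real response schema is not described on this page.

```python
from __future__ import annotations

import json


def parse_scores(raw: str, pinned_codes: set[str]) -> dict[str, int]:
    """Parse the AI's JSON reply and check it against the pinned criteria."""
    payload = json.loads(raw)  # raises json.JSONDecodeError on malformed JSON
    scores = {item["code"]: item["score"] for item in payload["criteria"]}

    # The reply must cover exactly the criteria frozen on the roleplay.
    if set(scores) != pinned_codes:
        raise ValueError("AI response does not match the pinned criteria")

    # Each score must be an integer from 0 to 100.
    for code, score in scores.items():
        if not isinstance(score, int) or not 0 <= score <= 100:
            raise ValueError(f"invalid score for {code}: {score!r}")
    return scores
```

Only a response that survives these checks would move on to the pass/fail rules and be persisted.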

Framework-driven evaluation

Scoring starts with your company's competencies.

Each company defines its own competency framework: abilities, criteria, weights. Every evaluation is scored against that framework, not a generic rubric.

Every scenario declares which competencies it tests, with a versioned prompt vetted against the rubric. Reproducible, debuggable, comparable across sessions.

Off-the-shelf generic criteria

  • ✗ Don't reflect your company's language
  • ✗ The same scoring across every industry
  • ✗ Can't adjust weights or add new criteria
  • ✗ Auditors don't know how the score was decided

Your company's competency framework

  • ✓ Uses your own abilities and vocabulary
  • ✓ Tunable weights by industry and role
  • ✓ Versioned and auditable prompt
  • ✓ Consistent comparison across sessions

Ready to transform how your team trains?

For organisations with 50+ employees. Book 45 minutes and we'll think the setup through with you.