Competency framework evaluation

A score only counts when it's anchored
in the competency framework.

Every roleplay template declares which competencies each scenario tests. The AI scores exactly those criteria: no keyword guessing, and no one-size-fits-all catalog imposed on you.

Session report

Medical Visit, Skeptical Cardiologist

Trainee: Marcela R. · Channel: Voice · 12 min

Overall score: 87 · Passed

Competencies locked when the session started

PROD-001 · Product mastery · 92
OBJ-003 · Objection handling · 78
COMP-014 · Label compliance · 95

Evaluated criteria

Arguments grounded in clinical evidence 92
Understanding of the HCP routine 85
Recovery after a strong objection 78
Closing with a clear next step 88
Label compliance RDC 658 (compliance blocker) 95

AI insights · Strengths

Anchored the pitch in the HCP's hypertensive patient profile by 1:15. Cited a phase-3 study when challenged on efficacy.

Areas to improve

At 4:32 the HCP asked about interaction with beta-blockers and the response was vague ("I'll check and get back to you"). Recommendation: targeted training on drug interactions.

Your company's competency framework

Every company has its own catalog of competencies and criteria. You start from ready-made industry templates at onboarding, and from there the catalog is fully editable: you can add competencies specific to your business that don't exist in any standard catalog.

The AI scores. Code decides.

The AI owns scoring. The pass-or-fail rule is auditable code, including "compliance blockers" that fail the session regardless of the overall score (for example, a label violation fails the session even at 95 overall).
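As a minimal sketch of what such a rule could look like, the snippet below takes per-criterion scores and applies a deterministic decision. The class name, the `blocker` flag, both thresholds, and the sample scores are assumptions made for illustration, not the product's actual rule.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CriterionScore:
    name: str
    score: int             # 0-100, as returned by the AI
    blocker: bool = False  # hypothetical flag marking a compliance blocker


def decide(scores, pass_mark=70, blocker_mark=70):
    """Return (passed, overall, failed_blockers). Both thresholds are assumed values."""
    overall = round(sum(s.score for s in scores) / len(scores), 1)
    # A blocker below its own mark fails the session regardless of the overall score.
    failed_blockers = [s.name for s in scores if s.blocker and s.score < blocker_mark]
    passed = overall >= pass_mark and not failed_blockers
    return passed, overall, failed_blockers


# Hypothetical session: strong overall score, but the blocker criterion fails.
report = [
    CriterionScore("Product mastery", 100),
    CriterionScore("Objection handling", 100),
    CriterionScore("Label compliance", 40, blocker=True),
]
print(decide(report))
```

Here the overall score clears the pass mark, yet the session fails because the blocker criterion does not. The decision lives in plain code that can be read and versioned, which is the point of separating it from the scoring model.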

Everything locked for audit

Criteria are locked when the session starts. AI instructions are pinned to a specific version. Transcript, audio and report are stored with configurable retention. Every session is audit-ready.

From the framework to the report.

The entire chain is predictable and auditable.

01

Framework curation

The company's admin edits competencies, criteria and scenario contexts. Add, edit, deactivate: every change is versioned.

02

The template declares

In the step-by-step setup, the template author picks which competencies each scenario tests. Each criterion's weight is configurable.
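One way configurable weights could feed into a single score is a weighted average. This is a sketch under assumptions: the function name and the weights are hypothetical, and the page does not say how the overall score is actually aggregated.

```python
def weighted_score(scores, weights):
    """Weighted average of criterion scores; weights are relative, not percentages."""
    total_weight = sum(weights[name] for name in scores)
    if total_weight <= 0:
        raise ValueError("weights must sum to a positive number")
    return sum(scores[name] * weights[name] for name in scores) / total_weight


# Scores taken from the sample report; the weights are invented for illustration.
scores = {"Product mastery": 92, "Objection handling": 78, "Label compliance": 95}
weights = {"Product mastery": 2, "Objection handling": 1, "Label compliance": 1}
print(weighted_score(scores, weights))  # 89.25
```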

03

Roleplay locks the snapshot

When the session starts, the criteria are snapshotted on the roleplay. Even if the template is edited later, the session runs against that original version.
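The snapshot behavior can be illustrated with a deep copy taken at session start. The class and field names below are hypothetical; this sketch shows only the idea, not the product's storage model.

```python
import copy


class Template:
    def __init__(self, criteria):
        self.criteria = criteria  # list of dicts, editable by the author


class RoleplaySession:
    def __init__(self, template):
        # Deep copy so later edits to the template cannot reach this session.
        self.criteria_snapshot = copy.deepcopy(template.criteria)


template = Template([{"id": "OBJ-003", "name": "Objection handling", "weight": 1}])
session = RoleplaySession(template)

# The template is edited after the session has started.
template.criteria[0]["weight"] = 5
template.criteria.append({"id": "PROD-001", "name": "Product mastery", "weight": 1})

print(session.criteria_snapshot)  # still the single original criterion, weight 1
```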

04

AI scores, code decides

The AI receives the transcript and instructions and returns a structured score per criterion; code then applies the pass-or-fail rules. The full result is persisted for audit.
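A sketch of how a structured response might be checked before any rule runs, assuming the model returns JSON. The payload shape and the field names `scores`, `criterion_id` and `score` are hypothetical.

```python
import json


def parse_scores(raw_json, locked_ids):
    """Parse the model output and require one valid score per locked criterion."""
    data = json.loads(raw_json)
    scores = {item["criterion_id"]: item["score"] for item in data["scores"]}
    # The model must score exactly the criteria locked at session start.
    if set(scores) != set(locked_ids):
        raise ValueError("scored criteria do not match the locked snapshot")
    for criterion_id, score in scores.items():
        if not isinstance(score, int) or not 0 <= score <= 100:
            raise ValueError(f"invalid score for {criterion_id}: {score!r}")
    return scores


raw = '{"scores": [{"criterion_id": "PROD-001", "score": 92}, {"criterion_id": "OBJ-003", "score": 78}]}'
print(parse_scores(raw, {"PROD-001", "OBJ-003"}))
```

Rejecting a response that skips or adds a criterion keeps the persisted result comparable with the snapshot it was scored against.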

Why not run several AIs in parallel

Several AIs together don't add up; they diverge.

We tried running four models in parallel and averaging the results. The problem: each model has a different systematic bias, and the average dilutes the signal from whichever one got it right.
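The dilution effect is easy to see with made-up numbers. In the sketch below every value is hypothetical: a reference score of 60 for the session, one model that matches it, and three models that each lean high by a different amount.

```python
# All numbers are invented for illustration; they are not measured results.
reference = 60
model_scores = {"model_a": 60, "model_b": 75, "model_c": 80, "model_d": 85}

average = sum(model_scores.values()) / len(model_scores)
average_error = abs(average - reference)
best_error = min(abs(score - reference) for score in model_scores.values())

print(average)        # 75.0
print(average_error)  # 15.0
print(best_error)     # 0
```

Because the biases do not cancel, the average lands 15 points from the reference even though one model was exactly right.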

Instead, we use a single model picked for each function, with versioned instructions measured against the rubric. The result is predictable, easy to investigate, and comparable across sessions.

Running several AIs in parallel

  • ✗ 4x the cost without 4x the confidence
  • ✗ Averaging dilutes the signal across diverging biases
  • ✗ Hard to investigate a single score
  • ✗ Inconsistent comparison across sessions

One model per function

  • ✓ Cost controlled per call
  • ✓ Versioned and auditable instructions
  • ✓ Reproducible result
  • ✓ Consistent comparison across sessions

Ready to transform how your team trains?

For organizations with 50+ employees. Book 45 minutes and we'll work through the setup with you.