North Star in Learning Analytics: 12 Metrics and a Decision Guide for L&D (Beyond Completion Rate)

A training isn’t good just because it’s “90% completed,” and another isn’t bad just because it sits at “40%”: once a metric loses its context, it is just a number. In corporate learning, this is the most common blind spot I see: reports get squeezed into three numbers, and then everyone fights over those three numbers.

I find something interesting about people: the same manager can say, in the same week, “if completion is low, the training failed,” and the next day say, “if nobody watches it, let’s shorten the duration.” The first is an outcome metric; the second is a design decision. Both can be true—but not on the same dashboard, in the same sentence.

In this article, I’ll group metrics into 4 layers: operations, engagement/experience, evidence of learning, business impact. Then I’ll connect 12 metrics one by one to “which decision does this support?” Because the North Star in learning analytics isn’t a single metric; it’s decision quality.

“Not everything that can be counted counts, and not everything that counts can be counted.” [William Bruce Cameron, 1963]

1) Why completion rate is misleading on its own

Completion rate is the easiest thing to measure—and also the easiest to misinterpret.

In mandatory HSE/GDPR training, high completion is often not “learning” but the success of the tracking mechanism.
In dynamic teams like sales, low completion sometimes doesn’t mean “lack of interest” but operational friction (bad timing, long module, poor device compatibility).
If a course has 95% completion + low scores, you get a sad picture: “engagement exists but learning doesn’t.”

For me, completion rate is only meaningful together with these questions:

Who completed? (segment)
How long did it take to complete? (speed/delay)
Where did they struggle? (click/answer/time traces)
What happened next? (behavior/performance)

So I’m not throwing “completion” away. I’m just placing it inside a bigger decision set.

2) The four-layer metric model: Operations → Experience → Evidence → Impact

A training program is four things at once: an operation, an experience, a learning claim, and (hopefully) a business outcome.

I think of the table below as a “dashboard architecture”: each layer feeds the next one, but on its own it does not prove the next one.

Layer	What does it measure?	Typical question	Risk of misuse
Operations	Process flow and tracking	“Who is late, where are they stuck?”	Blaming people for being “late”
Engagement/Experience	Behavior and friction	“Where do they drop off, why don’t they return?”	Mistaking entertainment for learning
Evidence of learning	Knowledge/decision quality	“Did they actually understand?”	Teaching to the test
Business impact	Performance/KPI linkage	“What did this training change?”	Treating correlation as causation

What I like about this model is this: L&D’s day-to-day operational decisions (reminders, flow, content revisions) and executive questions (investment, risk, performance) can be discussed in the same frame.

3) 12 metrics: Definition + what decision does it enable?

Read the 12 metrics below not as “one list,” but as a decision guide. For each metric: what it measures, how to interpret it, what action it connects to.

A) Operations layer (1–4)

1) Delay (deadline slip / overdue rate)

Definition: The share of people who complete after the deadline, or the average number of days overdue.
Decision: Reminder timing, escalation, workload conflicts.
Tip: In compliance trainings like HSE/GDPR, this metric is a “risk radar.” If delay rises, it’s often not the content—it’s the calendar.
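
A minimal sketch of how the two readings of this metric could be computed. The record layout and the sample dates are hypothetical, and I am assuming the share is taken over people who have completed (not over everyone assigned), which the definition leaves open.

```python
from datetime import date

# Hypothetical records: (learner_id, deadline, completed_on or None if still open).
records = [
    ("u1", date(2024, 3, 1), date(2024, 2, 27)),
    ("u2", date(2024, 3, 1), date(2024, 3, 5)),
    ("u3", date(2024, 3, 1), date(2024, 3, 2)),
    ("u4", date(2024, 3, 1), None),
]

def overdue_stats(records):
    """Return (share of completers who finished late, mean days late among them)."""
    completed = [(d, c) for _, d, c in records if c is not None]
    late_days = [(c - d).days for d, c in completed if c > d]
    late_share = len(late_days) / len(completed) if completed else 0.0
    mean_late = sum(late_days) / len(late_days) if late_days else 0.0
    return late_share, mean_late

late_share, mean_late = overdue_stats(records)
print(f"late share: {late_share:.2f}, mean days late: {mean_late:.1f}")
```

Open enrollments (like u4) are left out here on purpose; they belong to the at-risk view rather than the delay view.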

2) Time-to-competency

Definition: Time until reaching the target level for a role (e.g., passing a specific assessment threshold).
Decision: Onboarding design, role-based journey length, prerequisites.
Caution: Misreading this as “faster is better” is wrong. Some competencies should be learned slowly (especially in risky operations).

3) Journey step drop-off rate

Definition: In a multi-step program, where participants are lost.
Decision: Which step should be redesigned? Which step needs preparation placed before it?
Interpretation: Drop-off alone doesn’t mean “bad step”; sometimes that step serves as a natural filter (gate).
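
As a rough illustration, drop-off can be read from the number of participants who reached each step. The step names and counts below are invented, and the rate is attributed to the step after which people were lost, which is one possible convention.

```python
# Hypothetical counts of participants who reached each step, in journey order.
reached = [("Intro", 200), ("Module 1", 180), ("Checkpoint", 120), ("Module 2", 114)]

def step_drop_off(reached):
    """Share of participants lost between each step and the next one."""
    rates = {}
    for (name, n), (_, n_next) in zip(reached, reached[1:]):
        rates[name] = (n - n_next) / n if n else 0.0
    return rates

drop = step_drop_off(reached)
worst = max(drop, key=drop.get)  # step after which the largest share is lost
for name, rate in drop.items():
    print(f"after {name}: {rate:.0%} lost")
```

The step with the largest loss is only a candidate for redesign; whether it is a bad step or a deliberate gate still has to be judged by a person.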

4) At-risk courses / at-risk participants (operational risk flag)

Definition: Participants (or courses) that are in progress but far from completion and close to the deadline, which signals trouble ahead.
Decision: Who to intervene with, and which course to intervene on.
Note: This needs a systematic approach rather than “one-by-one chasing,” otherwise L&D turns into a call center.
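
One way to make this systematic is a simple rule-based flag. This is only a sketch: the field layout, the 50% progress cut-off and the 7-day warning window are assumptions chosen for illustration, not recommended values.

```python
from datetime import date

# Hypothetical enrollment rows: (learner_id, progress from 0 to 1, deadline).
enrollments = [
    ("u1", 0.20, date(2024, 3, 8)),   # barely started, deadline in 3 days
    ("u2", 0.90, date(2024, 3, 8)),   # almost done
    ("u3", 0.10, date(2024, 4, 30)),  # barely started, but plenty of time
    ("u4", 0.00, date(2024, 3, 6)),   # not started: a different problem
]

def at_risk(enrollments, today, max_progress=0.5, warn_days=7):
    """IDs that are in progress, at or below max_progress, with at most warn_days left."""
    flagged = []
    for learner_id, progress, deadline in enrollments:
        in_progress = 0.0 < progress < 1.0
        days_left = (deadline - today).days
        if in_progress and progress <= max_progress and days_left <= warn_days:
            flagged.append(learner_id)
    return flagged

flagged = at_risk(enrollments, today=date(2024, 3, 5))
print(flagged)
```

Because the rule is explicit, the same list can be produced for every course at once instead of chasing people one by one.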

B) Engagement / experience layer (5–7)

5) Content friction (content friction index – practical definition)

Definition: Traces showing users are struggling unnecessarily in a module: excessive time, rewatching, getting stuck on a specific screen, multi-click loops.
Decision: Usually restructuring rather than shortening (add examples, clarify wording, change step order).
What’s interesting: People sometimes like “hard” content; they don’t like “blurry” content. Friction is not the same as difficulty.

6) Rewatch rate (rewatch / retry rate)

Definition: The rate of rewatching/retrying the same section.
Decision: Does this section need reinforcement, or is the content unclear?
Interpretation: High rewatch + high success = reinforcement. High rewatch + low success = design problem.
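
The reading above can be written as a small rule. The thresholds for “high” are hypothetical placeholders, and the case of low rewatch is left as “no signal” because this reading only covers the high-rewatch cases.

```python
def read_rewatch(rewatch_rate, success_rate, high_rewatch=0.30, high_success=0.70):
    """Classify a section from its rewatch rate and success rate (both between 0 and 1)."""
    if rewatch_rate < high_rewatch:
        return "no rewatch signal"
    if success_rate >= high_success:
        return "reinforcement"
    return "design problem"

# Hypothetical sections: (name, rewatch rate, success rate).
sections = [("Risk assessment", 0.45, 0.85), ("Incident form", 0.45, 0.40), ("Welcome", 0.05, 0.95)]
for name, rewatch, success in sections:
    print(name, "->", read_rewatch(rewatch, success))
```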

7) Active learners rate

Definition: The share of users who actually perform learning activity on the platform in a given period.
Decision: Campaign design, communication channel, timing, motivation mechanisms.
Caution: Being “active” doesn’t mean “learned”; but if they’re not active, you can’t even make a learning claim.

C) Evidence of learning layer (8–10)

8) Gate success rate (checkpoint / gate pass rate)

Definition: The share of people who pass the success threshold at checkpoints.
Decision: Is the threshold right, is the content sufficient, which subtopic is collapsing?
Fine-tuning: If gates are too easy, they create false confidence; too hard, and the system feels like a “punishment machine.”

9) First-attempt accuracy

Definition: Success on the first attempt at questions/decision points.
Decision: Is this the true knowledge level, or guessing?
Interpretation: If first-attempt accuracy is low but rises after retries, the training may be “teaching.” The reverse—high first attempt, then decline—can sometimes be a question quality issue.
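
To compare the first attempt with what happens after retries, both numbers can be taken from the same attempt log. The log format below is an assumption made for the example.

```python
# Hypothetical attempt log: (learner, question, attempt_number, correct).
attempts = [
    ("u1", "q1", 1, False), ("u1", "q1", 2, True),
    ("u2", "q1", 1, True),
    ("u3", "q1", 1, False), ("u3", "q1", 2, False), ("u3", "q1", 3, True),
    ("u4", "q1", 1, False),
]

def accuracy_first_vs_last(attempts):
    """Return (accuracy on first attempts, accuracy on latest attempts)."""
    first, last = {}, {}
    for learner, question, _, correct in sorted(attempts, key=lambda a: a[2]):
        key = (learner, question)
        first.setdefault(key, correct)  # keeps the earliest attempt only
        last[key] = correct             # overwritten until the latest attempt
    total = len(first)
    if total == 0:
        return 0.0, 0.0
    return sum(first.values()) / total, sum(last.values()) / total

first_acc, last_acc = accuracy_first_vs_last(attempts)
print(f"first attempt: {first_acc:.0%}, after retries: {last_acc:.0%}")
```

A large gap between the two numbers is the pattern described above: low at first, higher after retries.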

10) Forgetting signal (spaced decay proxy)

Definition: Performance dropping on the same concept as time passes (via re-measurement).
Decision: Reinforcement interval, periodic refreshers, micro-repetition.
Science note: The forgetting curve idea says memory weakens over time (Ebbinghaus, 1885). Organizations act like they know this, but don’t build calendars around it—which is a small contradiction.
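
A simple proxy for this signal is the drop between the first and the latest measurement of the same concept. The concept names, scores and the 15-point tolerance are made up for the sketch; this is not a fitted forgetting curve, just a difference between two re-measurements.

```python
# Hypothetical re-measurements: concept -> list of (days since training, share correct).
measurements = {
    "lockout-tagout": [(0, 0.90), (30, 0.72), (90, 0.55)],
    "data-breach-reporting": [(0, 0.85), (30, 0.83), (90, 0.80)],
}

def decay(series):
    """Drop in share correct between the earliest and the latest measurement."""
    ordered = sorted(series)  # sorts by days since training
    return ordered[0][1] - ordered[-1][1]

def needs_refresher(measurements, max_drop=0.15):
    """Concepts whose performance dropped by more than max_drop."""
    return sorted(c for c, s in measurements.items() if decay(s) > max_drop)

print(needs_refresher(measurements))
```

Concepts that come out of this list are the ones where a reinforcement interval or a periodic refresher is worth scheduling first.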

D) Business impact layer (11–12)

11) Relationship with performance indicators (KPI correlation, segment-based)

Definition: Co-movement between training metrics and business metrics.
Decision: Which programs “speak the language of the business”?
Warning: Correlation is not causation. I come back to this in section 5, because most mistakes happen here.

12) Compliance risk indicator (compliance risk posture)

Definition: In mandatory trainings like HSE/GDPR: delay + non-completion + breaks in renewal cycles.
Decision: Audit readiness, manager visibility, periodic planning.
Clarity: In compliance training, the goal is sometimes not “learning” but a provable process. That’s not a bad thing; it’s just a different purpose.

4) Segmentation: escaping the “average” trap

The average is the most dangerous fairy tale in corporate life. Because it tells a story where everyone is a bit good and a bit bad—while in real life there are usually two separate worlds.

I insist on segmentation along these cuts:

Role
Location / branch / region
Seniority (new–mid–senior)
Team / manager
Period (campaign wave, quarter, season)

An example pattern (hypothetical but very familiar):

Average completion: 70%
Segments:
- New hires: 92%
- Seniors: 41%

In that case, saying “the content is bad” is premature. Maybe seniors start with an “I already know this” attitude, and the content then wastes their time. Or the opposite: the content is clear for new hires but feels short on detail, and therefore irritating, to seniors.

Without segmentation, you don’t optimize content design—you optimize the ghost of the average.
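
The arithmetic behind the example pattern can be checked in a few lines. The headcounts are hypothetical, picked only so that the segment rates of 92% and 41% blend into an overall 70%.

```python
# Hypothetical headcounts per segment: (assigned, completed).
segments = {
    "new_hires": (1450, 1334),  # 92%
    "seniors": (1100, 451),     # 41%
}

def completion_rates(segments):
    """Return (overall completion rate, per-segment completion rates)."""
    per_segment = {name: done / assigned for name, (assigned, done) in segments.items()}
    total_assigned = sum(assigned for assigned, _ in segments.values())
    total_done = sum(done for _, done in segments.values())
    return total_done / total_assigned, per_segment

overall, per_segment = completion_rates(segments)
spread = max(per_segment.values()) - min(per_segment.values())
print(f"overall: {overall:.0%}, spread between segments: {spread:.0%}")
```

An overall figure with a spread this wide between segments describes neither group, which is the reason to report the spread next to the average.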

5) Causality warnings: Correlation, pilots, and A/B trials

When I reach the business impact layer, an automatic brake kicks in. Because training data is intertwined with human behavior; and human behavior is like Borges’ labyrinths: when you enter the same door twice, you don’t end up in the same corridor. (I don’t find this analogy “perfect”; in a labyrinth the corridor is fixed, in humans it isn’t. But the analogy still works.)

I see these three mistakes a lot:

“People who took the training perform better → the training worked.”
Maybe high performers were simply the ones who completed the training (and completed it sooner) anyway.
“Scores went up → behavior changed in the field.”
Improving on a test is not the same as improving at work.
“There’s a drop in one region → the content is bad.”
Maybe shift schedules changed there, device access dropped, or the manager changed.

A more robust approach:

Controlled pilot: Roll out in Unit A, hold back a similar Unit B for a short time; observe the difference.
A/B trial: Same goal, two different content/flows; which design produces better “evidence”?
Before-after + segment: Don’t put everyone in the same bag.

These methods aren’t for “academic rigor”; they’re necessary because the cost of a wrong decision is high.
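
One common way to read a controlled pilot is a difference-in-differences comparison: the change in the unit that received the training minus the change in the unit that was held back. That technique is my choice for this sketch, not the only option, and the KPI values are invented (think of weekly error counts, where lower is better).

```python
from statistics import mean

# Hypothetical KPI readings before and after the rollout.
unit_a = {"before": [12, 14, 13, 15], "after": [9, 10, 8, 11]}    # received the training
unit_b = {"before": [13, 12, 14, 13], "after": [12, 13, 12, 13]}  # held back

def diff_in_diff(treated, control):
    """Change in the treated unit minus change in the control unit."""
    treated_change = mean(treated["after"]) - mean(treated["before"])
    control_change = mean(control["after"]) - mean(control["before"])
    return treated_change - control_change

effect = diff_in_diff(unit_a, unit_b)
print(f"estimated effect on the KPI: {effect:+.1f}")
```

Subtracting Unit B’s change removes whatever moved both units at the same time (season, workload, a new tool). It still assumes the two units would have moved alike without the training, so “similar” in the pilot design carries real weight.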

6) Analytics automation in Nextrain: write the question, get closer to insight

My job is to turn data from “something waiting on a dashboard” into something that approaches a decision.

In Nextrain, I do this with three practical behaviors:

Natural-language querying: Without setting filters, you ask the question as a sentence. For example, when you ask “Who are the employees in the Istanbul branch who haven’t completed their training?”, I present the result clearly—and you can save and reuse that query.
Course health view: Instead of digging through reports one by one, you see whether trainings are problematic on a color-coded health map; then you drill down.
Deepening with breakdowns: In course analysis and participant lists, you break down by corporate fields like branch/region/department and “split” the average.

Here, I hear the same sentence that Saadet hears most often in the field: “I want the report, but my real problem isn’t the report; tomorrow morning my manager will ask ‘what are we doing?’” Saadet’s job is to calm that question; my job is to tie that question to data. Both happen on the same day, at the same customer—sometimes five minutes apart.

A short note on GDPR: when I produce analytics, I don’t see personal data by name; I work with behavioral patterns. This keeps the line between “decisioning with data” and “surveillance with data” clearer—at least architecturally.

7) Quick decision guide: Which metric, which action?

I wrote this section so you can open it before a meeting. It matches “what’s the problem?” → “which metric?” → “which action?”

If the problem is "not completing":
  - Delay + drop-off + content friction + active learners rate
  - Action: timing/reminders, simplify steps, restructure modules

If the problem is "completing but not learning":
  - Gate success rate + first-attempt accuracy + rewatch rate
  - Action: add examples/feedback, adjust gate threshold, build branching based on mistakes

If the problem is "learning but not translating to work":
  - KPI relationship (segment-based) + controlled pilot/A-B
  - Action: clarify target behavior, design field transfer, tie measurement to the workflow

If the problem is "audit risk":
  - Compliance risk indicator + delay + breaks in periodic renewals
  - Action: renewal calendar, manager visibility, intervene with the critical population

The North Star here is not “looking good” on a single metric but connecting metrics to a decision chain. Completion rate is only one link in that chain.

Notes

Hermann Ebbinghaus, Über das Gedächtnis (1885) — early experimental memory studies on the forgetting curve and the effect of repetition.
William Bruce Cameron, Informal Sociology: A Casual Introduction to Sociological Thinking (1963) — a frequently quoted line on measurement and meaning.