Assessment Pipeline¶
The assessment pipeline is a LangGraph state machine that adaptively evaluates a candidate's knowledge across multiple topics and Bloom taxonomy levels. This page covers every node, routing decision, and data structure in the pipeline.
Source: backend/app/graph/pipeline.py, backend/app/graph/router.py, backend/app/graph/state.py
Pipeline Overview¶
The pipeline has 12 nodes organized into three phases:
graph TD
START([Start]) --> BA[build_agenda]
BA --> GQ[generate_question]
GQ --> AR[await_response]
AR --> ER[evaluate_response]
ER --> UG[update_graph]
UG -->|deeper| HD[handle_deeper]
UG -->|probe| PQ[probe_question]
UG -->|pivot| HP[handle_pivot]
UG -->|conclude| AG[analyze_gaps]
HD --> GQ
HP --> GQ
PQ --> APR[await_probe_response]
APR --> ER
AG --> EG[enrich_gaps]
EG --> GP[generate_plan]
GP --> END([End])
Phase 1: Agenda Building¶
The pipeline begins by building the topic agenda and generating the first assessment question directly.
Nodes¶
| Node | Agent | Description |
|---|---|---|
build_agenda |
pipeline.build_agenda_node |
Builds the topic agenda from the knowledge base using the target_level and selects the first topic |
Target Level → Starting Bloom Mapping¶
The target_level provided by the user determines the starting Bloom level for the assessment:
| Target Level | Starting Bloom Level |
|---|---|
| Junior | Understand |
| Mid | Apply |
| Senior | Analyze |
| Staff | Evaluate |
Phase 2: Assessment Loop¶
The core loop generates questions, evaluates responses, updates the knowledge graph, and routes to the next action.
Nodes¶
| Node | Agent | Description |
|---|---|---|
generate_question |
question_generator.generate_question |
Generates a question for the current topic and Bloom level |
await_response |
— | Interrupts for user input |
evaluate_response |
response_evaluator.evaluate_response |
Evaluates accuracy, depth, and demonstrated Bloom level |
update_graph |
knowledge_mapper.update_knowledge_graph |
Updates knowledge graph node with weighted confidence |
The assessment loop also uses routing nodes (
handle_deeper,handle_pivot) and probe nodes (probe_question,await_probe_response) described in Phase 3 below.
Question Generation¶
The question generator receives:
- Current topic and Bloom level
- Target level (for difficulty targeting)
- Number of questions already asked on this topic
- Previously used question types (to avoid repetition)
- Last 5 questions (for context)
It returns a question with one of four types:
| Type | Description |
|---|---|
conceptual |
Explain a concept, definition, or relationship |
scenario |
Apply knowledge to a real-world situation |
debugging |
Identify and fix issues in a described system |
design |
Design a solution or architecture |
Response Evaluation¶
The evaluator scores each response on:
confidence(0.0–1.0) — How well the candidate demonstrated understandingbloom_level— The actual Bloom level demonstrated (may differ from the target)evidence— Specific observations supporting the score
Knowledge Graph Update¶
When a response is evaluated, the knowledge graph is updated:
- Existing node: Weighted merge —
new_confidence = 0.7 * old + 0.3 * new - New node: Created with evaluation confidence, prerequisites from target graph
- Bloom level: Takes the higher of existing vs. newly demonstrated level
- Evidence: Appended to existing evidence list
- Edges: Prerequisite edges are added for new nodes
Phase 3: Routing¶
After each knowledge graph update, the router determines the next action.
Source: backend/app/graph/router.py
Nodes¶
| Node | Description |
|---|---|
handle_deeper |
Advances Bloom level before generating next question on the same topic |
handle_pivot |
Marks topic as evaluated and switches to the next unevaluated topic |
probe_question |
Generates a probing follow-up question on the same topic at the same Bloom level |
await_probe_response |
Interrupts for user response to the probe question |
Decision Tree¶
graph TD
A{Topics >= 10 OR<br/>Questions >= max?}
A -->|Yes| CONCLUDE[conclude]
A -->|No| B{Confidence > 0.7 AND<br/>Bloom < Create AND<br/>Questions on topic < 4?}
B -->|Yes| DEEPER[deeper]
B -->|No| C{Confidence > 0.4 AND<br/>Evidence < 2 AND<br/>Questions on topic < 4?}
C -->|Yes| PROBE[probe]
C -->|No| D{Node confidence > 0.7 OR<br/>Questions on topic >= 4?}
D -->|Yes| PIVOT[pivot]
D -->|No| E{Confidence <= 0.4?}
E -->|Yes| PIVOT2[pivot]
E -->|No| PROBE2[probe]
Routing Constants¶
| Constant | Value | Purpose |
|---|---|---|
MAX_TOPICS |
10 | Maximum topics to evaluate before concluding |
MAX_TOTAL_QUESTIONS |
dynamic | min(agenda_length, MAX_TOPICS) * max_questions_per_topic |
HIGH_CONFIDENCE |
0.7 | Threshold for "high confidence" routing |
MAX_QUESTIONS_PER_TOPIC |
4 | Maximum questions on a single topic |
MIN_EVIDENCE_FOR_CONFIDENCE |
2 | Minimum evidence items before trusting confidence |
Route Actions¶
| Route | What Happens | Next Node |
|---|---|---|
deeper |
Advance to next Bloom level on same topic | handle_deeper → generate_question |
probe |
Generate follow-up question on same topic at same level | probe_question → await_probe_response → evaluate_response |
pivot |
Mark topic as evaluated, select next unevaluated topic | handle_pivot → generate_question |
conclude |
Assessment complete, proceed to analysis | analyze_gaps |
Topic Selection (Pivot)¶
When pivoting, the router selects the next topic by:
- Getting all topics from the knowledge base for the candidate's domain and target level
- Filtering out already-evaluated topics
- Picking the first unevaluated topic (preserving prerequisite order from the YAML)
The starting Bloom level for the new topic is determined by the target level (same mapping as Phase 1).
Conclusion¶
Gap Analysis¶
Source: backend/app/agents/gap_analyzer.py → analyze_gaps()
Pure Python (no LLM call). Compares the current knowledge graph against the target graph:
- A gap exists where
current_confidence < target_confidence - Gaps are topologically sorted by prerequisites so foundational concepts come first
Gap Enrichment¶
Source: backend/app/agents/gap_enricher.py → enrich_gaps()
Enriches the raw gap nodes with priority tiers, overall readiness, and LLM-generated recommendations:
overall_readiness(0–100) — Weighted average of current/target confidence across all conceptspriority— Per-gap tier based on confidence gap:critical(>0.6),high(>0.4),medium(>0.2),lowrecommendation— LLM-generated actionable learning recommendation per gap (viaGapEnrichmentOutputschema)- Gaps are sorted by priority (critical first), then by gap size
Returns an EnrichedGapAnalysis stored in the enriched_gap_analysis state field.
Learning Plan Generation¶
Source: backend/app/agents/plan_generator.py → generate_plan()
Claude generates a phased learning plan from the gap nodes:
- 3–5 phases respecting prerequisite order
- Mixed resource types: articles, videos, projects, exercises
- Realistic hour estimates per phase
- Rationale for grouping and ordering
Bloom Taxonomy Integration¶
The six Bloom levels form the core difficulty progression:
remember → understand → apply → analyze → evaluate → create
Defined in: backend/app/graph/state.py as BloomLevel(StrEnum)
The pipeline uses Bloom levels in three ways:
- Question targeting — Questions are generated at a specific Bloom level
- Response evaluation — The evaluator identifies the demonstrated Bloom level
- Difficulty progression — The "deeper" route advances to the next Bloom level
Human-in-the-Loop¶
The pipeline uses LangGraph's interrupt() at every point where user input is needed:
await_responsenode (main assessment)await_probe_responsenode (follow-up questions)
Each interrupt sends metadata to the frontend:
Assessment interrupt (type: "assessment", used by both await_response and await_probe_response):
{
"type": "assessment",
"question": { "id": "...", "topic": "...", "bloomLevel": "...", "text": "...", "questionType": "..." },
"topics_evaluated": 3,
"total_questions": 12,
"max_questions": 25
}
The pipeline state is persisted to PostgreSQL via AsyncPostgresSaver, so assessments survive server restarts.
LLM Integration¶
All agent LLM calls go through a shared integration layer that provides structured output, automatic retry, and observability.
Source: backend/app/services/ai.py, backend/app/agents/schemas.py
Structured Output¶
Agents use LangChain's with_structured_output() instead of regex-based JSON parsing. Each agent defines a Pydantic schema in backend/app/agents/schemas.py that the LLM must conform to:
| Schema | Agent | Purpose |
|---|---|---|
QuestionOutput |
Question Generator | Assessment question |
EvaluationOutput |
Response Evaluator | Response confidence, Bloom level, evidence |
PlanOutput |
Plan Generator | Full learning plan with phases and resources |
GapEnrichmentOutput |
Gap Enricher | Summary and per-gap recommendations |
GapRecommendationOutput |
Gap Enricher | Single concept recommendation (nested in GapEnrichmentOutput) |
Agents call ainvoke_structured(schema, prompt), which returns a validated Pydantic instance. The agent then maps this to the pipeline's CamelModel state types.
Retry with Exponential Backoff¶
Transient LLM failures (timeouts, rate limits, 5xx errors) are retried automatically:
- Max attempts: 3
- Backoff: Exponential with jitter (
wait_exponential_jitter=True) - Retryable errors:
APITimeoutError,RateLimitError,InternalServerError - Non-retryable:
AuthenticationErrorand validation errors fail immediately
Structured Logging¶
Every LLM call emits structured log entries via the openlearning.llm logger:
- Success: agent name, duration (ms), schema name
- Retry: agent name, attempt number, error type and message
- Failure: agent name, duration (ms), schema name, error type and message