Learning Content Pipeline¶
The learning content pipeline is a LangGraph StateGraph that automatically generates personalized learning material from assessment results. It consumes the structured output of the assessment pipeline — knowledge gaps, Bloom levels, confidence scores, and evidence — and produces validated, Bloom-aligned content for each gap.
Source: backend/app/graph/content_pipeline.py, backend/app/graph/content_state.py, backend/app/agents/content_nodes.py
Pipeline Overview¶
The pipeline has seven nodes organized into four phases:
graph LR
A[Load Assessment Result] --> B[Prioritize Gaps]
B --> C[Generate Objectives]
C --> P[Plan Content]
P --> D[Generate Content]
D --> E{Validate Quality}
E -->|Pass| F[Save Materials]
E -->|Retry| P
E -->|Max Retries| G[Flag & Save]
Nodes 1–3 (input_reader, gap_prioritizer, objective_generator) run sequentially per session. Nodes 4–5 (content_planner, rag_content_generator) run in parallel across all gaps via asyncio.gather, bounded by a semaphore of PARALLEL_GAP_LIMIT (default 10). Nodes 6–7 (bloom_validator, quality_gate) run per gap with a retry loop of up to 3 iterations.
Components¶
| Component | Technology | Role |
|---|---|---|
| State management | LangGraph StateGraph + PostgreSQL checkpointer | Persists pipeline state across nodes |
| Gap input | `AssessmentResult` (JSONB) | Source of gap_nodes, knowledge graph, evidence |
| Taxonomy lookup | `TaxonomyIndex` (in-memory singleton from YAML) | bloom_target, target_confidence, prerequisites |
| IRT config | `concept_config` (PostgreSQL table) | Per-concept difficulty weights |
| Knowledge retrieval | pgvector | Domain knowledge for RAG generation |
| LLM calls | Claude (claude-sonnet-4-6) via Anthropic API | Content generation and Bloom validation |
| Output storage | `material_result` (PostgreSQL JSONB) | Final generated material per gap |
This pipeline does not replace the assessment system. It is a downstream consumer of AssessmentResult records. The two systems share a PostgreSQL database and the domain taxonomy YAML, but operate on separate LangGraph threads.
Scientific Foundations¶
Each node is grounded in a specific body of educational or ML research:
| Framework | Where Used | Purpose |
|---|---|---|
| Bloom's Taxonomy (Anderson & Krathwohl, 2001) | Objective Generator, Bloom Validator | Action verb selection, cognitive level targeting, alignment validation |
| Knowledge Space Theory (Doignon & Falmagne, 1985) | Objective Generator | Prerequisite topological sort via Kahn's algorithm |
| Cognitive Load Theory (Sweller, 1988) | Content Planner | Chunk sizing, scaffolding depth, worked example density |
| Zone of Proximal Development (Vygotsky, 1978) | Content Planner | Generates material one Bloom level above current, not jumping to target |
| Item Response Theory (Lord, 1980) | Gap Prioritizer | Difficulty weighting via irt_weight from concept_config table |
| Retrieval-Augmented Generation (Lewis et al., 2020) | RAG Content Generator | Vector store retrieval prevents hallucination in domain content |
| RLAIF (Bai et al., 2022) | Bloom Validator | LLM-as-judge quality gating with structured rubric scoring |
The priority_score formula in Node 2 combines three of these frameworks into a single ranking metric: IRT (irt_weight), Bloom distance (ZPD), and gap analysis (gap_severity).
Phase 1: Input & Prioritization¶
Nodes¶
| Node | Function | Description |
|---|---|---|
| `input_reader` | `load_assessment_result` | Loads AssessmentResult from PostgreSQL, initializes TaxonomyIndex, validates concept_ids |
| `gap_prioritizer` | `prioritize_gaps` | Computes priority_score per gap, sorts descending |
Input Validation¶
The Input Reader loads the AssessmentResult for a given session_id and the in-memory TaxonomyIndex singleton (cached from domain YAML at app startup). It verifies that every concept_id in the result's gap_nodes maps to a known taxonomy entry. Mismatches raise a ValueError and abort the pipeline.
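A minimal sketch of the validation check (the helper name and dict shape are illustrative, not the actual code in content_nodes.py):

```python
def validate_gap_concepts(gap_nodes: list[dict], known_concepts: set[str]) -> None:
    """Abort the pipeline if any gap references a concept_id absent from the taxonomy."""
    unknown = [g["concept_id"] for g in gap_nodes
               if g["concept_id"] not in known_concepts]
    if unknown:
        raise ValueError(f"Unknown concept_ids in assessment result: {unknown}")
```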
Priority Score¶
Each gap is scored to determine processing order:
priority_score = gap_severity * bloom_distance * irt_weight
| Factor | Source | Description |
|---|---|---|
| `gap_severity` | Assessment result | `target_confidence - current_confidence` (0.0–1.0) |
| `bloom_distance` | Taxonomy | `BLOOM_INT[target_bloom] - BLOOM_INT[current_bloom]` (1–5) |
| `irt_weight` | `concept_config` table | Item Response Theory difficulty weight (default 1.0) |
Gaps are sorted by priority_score descending. The IRT weight is loaded from the concept_config PostgreSQL table, with a tier-based fallback if absent: junior=0.9, mid=1.2, senior=1.5, staff=1.9.
Phase 2: Objective Generation¶
Nodes¶
| Node | Function | Description |
|---|---|---|
| `objective_generator` | `generate_objectives` | Topological sort, Bloom verb selection, learning objective text generation |
Prerequisite Ordering¶
Gap concept IDs are topologically sorted using Kahn's algorithm based on prerequisite edges from the TaxonomyIndex. This ensures material for foundational concepts is always generated and presented before dependent concepts.
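A self-contained sketch of the ordering step, assuming `prereqs` maps each concept to its prerequisite concept_ids (edges outside the gap set are ignored):

```python
from collections import deque

def prereq_order(concept_ids: list[str], prereqs: dict[str, list[str]]) -> list[str]:
    """Kahn's algorithm: emit prerequisite concepts before their dependents."""
    ids = set(concept_ids)
    # in-degree = number of prerequisites that are themselves gaps
    indeg = {c: sum(1 for p in prereqs.get(c, []) if p in ids) for c in ids}
    dependents: dict[str, list[str]] = {c: [] for c in ids}
    for c in ids:
        for p in prereqs.get(c, []):
            if p in ids:
                dependents[p].append(c)
    queue = deque(sorted(c for c in ids if indeg[c] == 0))  # sorted for determinism
    order: list[str] = []
    while queue:
        c = queue.popleft()
        order.append(c)
        for d in dependents[c]:
            indeg[d] -= 1
            if indeg[d] == 0:
                queue.append(d)
    if len(order) != len(ids):
        raise ValueError("Cycle in prerequisite graph")
    return order
```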
Bloom Action Verbs¶
The generator selects a canonical action verb for each Bloom level. For gaps spanning multiple levels, one objective is generated per intermediate level — a learner at Remember (1) targeting Analyze (4) gets three sequential pieces of material (Understand, Apply, Analyze) rather than jumping directly to the target.
| Bloom Level | Int | Action Verbs |
|---|---|---|
| Remember | 1 | define, list, recall, identify, name, state |
| Understand | 2 | explain, describe, summarize, paraphrase, classify |
| Apply | 3 | implement, use, demonstrate, execute, solve, write |
| Analyze | 4 | compare, differentiate, examine, deconstruct, trace |
| Evaluate | 5 | assess, critique, justify, argue, appraise, defend |
| Create | 6 | design, construct, formulate, architect, compose |
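The per-level expansion described above can be sketched like this (using the first verb of each row as the canonical verb; the real selection logic may differ):

```python
# Canonical verb per Bloom level (first verb from the table above).
BLOOM_VERBS = {1: "define", 2: "explain", 3: "implement",
               4: "compare", 5: "assess", 6: "design"}

def objective_levels(current_bloom: int, target_bloom: int) -> list[tuple[int, str]]:
    """One (level, verb) pair per intermediate level above current, up to target."""
    return [(lvl, BLOOM_VERBS[lvl])
            for lvl in range(current_bloom + 1, target_bloom + 1)]
```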
Objective Format¶
Each objective specifies what the learner will do, not what they will know:
{
  "concept_id": "event_loop",
  "bloom_level": 2,
  "verb": "explain",
  "objective_text": "Explain the ordering of microtasks vs macrotasks in the JS event loop",
  "prereq_concept_ids": []
}
Phase 3: Content Planning & Generation¶
Nodes¶
| Node | Function | Description |
|---|---|---|
| `content_planner` | `plan_content` | Computes Cognitive Load Theory parameters per gap |
| `rag_content_generator` | `generate_all_content` | Vector store retrieval + LLM generation, parallel via `asyncio.gather` |
Cognitive Load Parameters¶
The Content Planner applies CLT to calibrate material difficulty. Senior and staff-level concepts receive smaller chunks and more scaffolding because their intrinsic cognitive load is higher.
| Parameter | Logic | Purpose |
|---|---|---|
| `chunk_count` | `ceil(bloom_distance * CLT_CHUNK_FACTOR[tier])` | Number of content sub-chunks |
| `example_count` | 2 for junior/mid, 3 for senior/staff | Worked examples before practice |
| `scaffolding_depth` | high for senior/staff, medium otherwise | Step-by-step guidance depth |
| `format_hints` | Always explanation; add code_example if implementation-oriented; add analogy if bloom_distance > 2; add quiz as final format | Content format types |
CLT chunk factors by tier:
| Tier | Factor |
|---|---|
| Junior | 1.0 |
| Mid | 1.2 |
| Senior | 1.5 |
| Staff | 2.0 |
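Putting the parameter table and the chunk factors together, the planner's logic can be sketched as (the function name and `implementation_oriented` flag are illustrative):

```python
from math import ceil

CLT_CHUNK_FACTOR = {"junior": 1.0, "mid": 1.2, "senior": 1.5, "staff": 2.0}

def plan_clt_params(tier: str, bloom_distance: int,
                    implementation_oriented: bool = False) -> dict:
    """Derive Cognitive Load Theory parameters for one gap (sketch)."""
    hints = ["explanation"]              # always present
    if implementation_oriented:
        hints.append("code_example")
    if bloom_distance > 2:
        hints.append("analogy")
    hints.append("quiz")                 # always the final format
    return {
        "chunk_count": ceil(bloom_distance * CLT_CHUNK_FACTOR[tier]),
        "example_count": 3 if tier in ("senior", "staff") else 2,
        "scaffolding_depth": "high" if tier in ("senior", "staff") else "medium",
        "format_hints": hints,
    }
```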
RAG Content Generation¶
For each gap, the generator queries the pgvector store for the RAG_TOP_K (default 5) most relevant domain chunks using the query {concept_id} {bloom_verb} {level_tier}. Retrieved chunks are injected into the generation prompt alongside the learning objective, CLT parameters, and assessment evidence anchors. Claude (claude-sonnet-4-6) generates the content.
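A sketch of the query construction and a plausible pgvector similarity query; the table and column names in the SQL are assumptions, not the actual schema:

```python
def build_rag_query(concept_id: str, bloom_verb: str, level_tier: str) -> str:
    """Retrieval query string in the form '{concept_id} {bloom_verb} {level_tier}'."""
    return f"{concept_id} {bloom_verb} {level_tier}"

# Hypothetical pgvector retrieval using the cosine-distance operator (<=>);
# domain_knowledge / chunk_text / embedding are illustrative names.
RETRIEVE_SQL = """
SELECT chunk_text
FROM domain_knowledge
WHERE domain = %(domain)s
ORDER BY embedding <=> %(query_embedding)s
LIMIT %(top_k)s
"""
```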
All gaps are processed in parallel via asyncio.gather with a concurrency limit of PARALLEL_GAP_LIMIT (default 10).
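The fan-out pattern can be sketched as follows, assuming `generate_one` is the per-gap generation coroutine (both names are illustrative):

```python
import asyncio

PARALLEL_GAP_LIMIT = 10  # matches the configuration default

async def generate_all_content(gaps: list[dict], generate_one) -> dict:
    """Run generate_one(gap) for every gap, at most PARALLEL_GAP_LIMIT at a time."""
    sem = asyncio.Semaphore(PARALLEL_GAP_LIMIT)

    async def bounded(gap: dict):
        async with sem:                      # cap concurrent LLM calls
            return await generate_one(gap)

    results = await asyncio.gather(*(bounded(g) for g in gaps))
    return {g["concept_id"]: r for g, r in zip(gaps, results)}
```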
Generation Prompt¶
SYSTEM:
You are an expert instructional designer and software engineer.
You generate precise, technically accurate learning material.
You always ground explanations in concrete code examples and real evidence.
USER:
Generate learning material for the following gap.
CONCEPT: {concept_id}
DOMAIN: {domain} / {level_tier} level
LEARNING OBJECTIVE: {objective_text}
TARGET BLOOM LEVEL: {target_bloom_label} ({target_bloom_int}/6)
LEARNER CONTEXT (from assessment evidence):
{evidence_anchors joined by newline}
RETRIEVED DOMAIN KNOWLEDGE:
{retrieved_chunks joined by separator}
CONTENT PLAN:
- Sections: {chunk_count}
- Worked examples: {example_count}
- Scaffolding depth: {scaffolding_depth}
- Required formats: {format_hints joined by comma}
{if iteration > 0}
PREVIOUS CRITIQUE (address these issues in this version):
{critique}
{end if}
Generate the material as structured JSON matching the ContentSection schema.
Each section must include: type, title, body, and (if code) a code_block field.
Do not include material that falls below the target Bloom level.
Phase 4: Validation & Quality Gate¶
Nodes¶
| Node | Function | Description |
|---|---|---|
| `bloom_validator` | `validate_bloom_alignment` | Critic LLM scores generated material on four rubric dimensions |
| `quality_gate` | `route_quality` | Conditional routing: pass, retry with feedback, or emit with flag |
Bloom Validator Rubric¶
A separate Claude call acts as an RLAIF judge, scoring the generated material against a structured rubric:
| Dimension | Role | Description |
|---|---|---|
| `bloom_alignment` | Primary | Does the material require the learner to operate at the target Bloom level? |
| `accuracy` | Secondary | Is the technical content factually correct? |
| `clarity` | Secondary | Is the material well-structured and clearly written? |
| `evidence_alignment` | Secondary | Does the material address the specific gaps from assessment evidence? |
Scores are 0.0–1.0 per dimension. bloom_score = bloom_alignment. quality_score = mean(accuracy, clarity, evidence_alignment).
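The aggregation is simple enough to state directly (the function name is illustrative):

```python
def aggregate_scores(rubric: dict[str, float]) -> tuple[float, float]:
    """bloom_score = bloom_alignment; quality_score = mean of the secondary dimensions."""
    bloom_score = rubric["bloom_alignment"]
    secondary = [rubric["accuracy"], rubric["clarity"], rubric["evidence_alignment"]]
    quality_score = sum(secondary) / len(secondary)
    return bloom_score, quality_score
```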
Validator Prompt¶
SYSTEM:
You are a strict educational quality assessor. You evaluate learning material
against Bloom's Taxonomy levels and instructional quality criteria.
Respond ONLY with valid JSON. No preamble or explanation outside the JSON.
USER:
Evaluate the following learning material.
TARGET BLOOM LEVEL: {target_bloom_label} ({target_bloom_int}/6)
CONCEPT: {concept_id}
LEARNING OBJECTIVE: {objective_text}
MATERIAL TO EVALUATE:
{generated_material}
Score each criterion from 0.0 to 1.0.
bloom_alignment: Does engaging with this material REQUIRE the learner to
operate at {target_bloom_label} level? (1.0 = fully requires it,
0.0 = requires only lower levels)
accuracy: Is the technical content factually correct for the domain?
clarity: Is the material clearly written and well-structured?
evidence_alignment: Does the material address the specific gaps identified
in the learner's assessment evidence?
Respond with:
{
  "bloom_alignment": 0.0-1.0,
  "accuracy": 0.0-1.0,
  "clarity": 0.0-1.0,
  "evidence_alignment": 0.0-1.0,
  "critique": "specific actionable critique if any score < 0.75"
}
Quality Gate Routing¶
graph TD
A{bloom_score >= 0.75 AND<br/>quality_score >= 0.70?}
A -->|Yes| PASS[Emit final material]
A -->|No| B{iteration_count < 3?}
B -->|Yes| RETRY[Retry with critique feedback]
B -->|No| FLAG["Emit with quality_flag = 'max_iterations_reached'"]
On retry, the validator's critique text is appended to the generation prompt, giving the generator specific feedback to address. The iteration_count is incremented and the pipeline routes back to the Content Planner.
After 3 iterations without passing, the material is emitted with quality_flag = 'max_iterations_reached' and flagged for human review. The material is still usable — it simply did not meet the automated quality bar.
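The routing decision maps directly onto the thresholds from the configuration section (the constants match the documented defaults; the function name mirrors the `route_quality` node function):

```python
BLOOM_PASS_THRESHOLD = 0.75
QUALITY_PASS_THRESHOLD = 0.70
MAX_ITERATIONS = 3

def route_quality(bloom_score: float, quality_score: float, iteration_count: int) -> str:
    """Return the next edge for the quality gate: 'pass', 'retry', or 'flag'."""
    if bloom_score >= BLOOM_PASS_THRESHOLD and quality_score >= QUALITY_PASS_THRESHOLD:
        return "pass"
    if iteration_count < MAX_ITERATIONS:
        return "retry"   # re-generate with the critique appended to the prompt
    return "flag"        # emit with quality_flag = 'max_iterations_reached'
```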
State Schema¶
The shared state flows through every node in the pipeline and is persisted via the PostgreSQL checkpointer:
from typing import TypedDict

class LearningMaterialState(TypedDict):
    # Input (Node 1: Input Reader)
    session_id: str
    assessment_result: AssessmentResult
    taxonomy: TaxonomyIndex

    # Node 2: Gap Prioritizer
    prioritized_gaps: list[PrioritizedGap]

    # Node 3: Objective Generator
    objectives: list[LearningObjective]
    prereq_order: list[str]  # topologically sorted concept_ids

    # Node 4: Content Planner
    content_plan: ContentPlan

    # Node 5: RAG Content Generator
    raw_content: dict[str, GeneratedContent]

    # Node 6: Bloom Validator
    bloom_score: float
    quality_score: float
    critique: str

    # Node 7: Quality Gate
    iteration_count: int
    final_material: dict[str, LearningMaterial] | None
Data Contracts¶
The following Pydantic models define the data contracts between nodes:
from datetime import datetime
from pydantic import BaseModel

class PrioritizedGap(BaseModel):
    concept_id: str
    current_bloom: int        # 1-6
    target_bloom: int         # 1-6
    bloom_distance: int       # target - current
    gap_severity: float       # target_confidence - current_confidence
    irt_weight: float         # from concept_config table
    priority_score: float     # gap_severity * bloom_distance * irt_weight
    evidence: list[str]       # from assessment knowledge graph
    prerequisites: list[str]  # from taxonomy

class LearningObjective(BaseModel):
    concept_id: str
    bloom_level: int
    verb: str                 # Bloom action verb for this level
    objective_text: str       # Full objective statement
    prereq_concept_ids: list[str]

class ContentPlan(BaseModel):
    concept_id: str
    target_bloom: int
    chunk_count: int          # Number of sub-chunks (CLT-derived)
    example_count: int        # Worked examples before practice
    scaffolding_depth: str    # 'high' | 'medium' | 'low'
    format_hints: list[str]   # e.g. ['code_example', 'analogy', 'quiz']
    evidence_anchors: list[str]  # From assessment evidence, for grounding

class GeneratedContent(BaseModel):
    concept_id: str
    bloom_level: int
    sections: list[ContentSection]
    raw_prompt: str           # For debugging/audit

class LearningMaterial(BaseModel):
    concept_id: str
    target_bloom: int
    bloom_score: float
    quality_score: float
    sections: list[ContentSection]
    iteration_count: int
    generated_at: datetime
Data Layer¶
Two new PostgreSQL tables are required. The assessment system's schema does not change — all additions are additive.
concept_config¶
Stores per-concept IRT difficulty weights, editable without code deploys:
| Column | Type | Description |
|---|---|---|
| `concept_id` | `TEXT PK` | Concept identifier (matches taxonomy YAML) |
| `domain` | `TEXT` | Knowledge domain |
| `irt_weight` | `FLOAT` | IRT difficulty weight (default 1.0) |
| `notes` | `TEXT` | Optional operator notes |
| `updated_at` | `TIMESTAMPTZ` | Last modification timestamp |
Initial seed values use a tier-based heuristic (junior: 0.9, mid: 1.2, senior: 1.5, staff: 1.9) and should be refined over time using real learner completion data:
INSERT INTO concept_config (concept_id, domain, irt_weight) VALUES
('event_loop', 'backend_engineering', 1.4),
('promises', 'backend_engineering', 1.3),
('distributed_systems', 'backend_engineering', 1.5),
('system_design', 'backend_engineering', 1.9);
TaxonomyIndex¶
The taxonomy YAML is loaded once at application startup as a module-level singleton initialized in the FastAPI lifespan context manager. All pipeline invocations share the same instance:
import yaml

BLOOM_INT = {
    'remember': 1, 'understand': 2, 'apply': 3,
    'analyze': 4, 'evaluate': 5, 'create': 6
}

CLT_CHUNK_FACTOR = {
    'junior': 1.0, 'mid': 1.2, 'senior': 1.5, 'staff': 2.0
}

class TaxonomyIndex:
    def __init__(self, yaml_path: str):
        with open(yaml_path) as f:
            raw = yaml.safe_load(f)
        self._concepts: dict[str, dict] = {}
        self._level_map: dict[str, str] = {}
        for level, data in raw['levels'].items():
            for c in data['concepts']:
                self._concepts[c['concept']] = c | {'level': level}
                self._level_map[c['concept']] = level

    def get(self, concept_id: str) -> dict: ...

    def bloom_target_int(self, concept_id: str) -> int: ...

    def gap_severity(self, concept_id: str, current_confidence: float) -> float:
        target_confidence = self._concepts[concept_id]['target_confidence']
        return max(0.0, target_confidence - current_confidence)

    def irt_weight(self, concept_id: str, db_weight: float | None) -> float:
        # Falls back to the tier heuristic if db_weight is None
        ...

    def prereqs(self, concept_id: str) -> list[str]: ...

    def clt_params(self, concept_id: str, bloom_distance: int) -> dict:
        # Returns {chunk_count, scaffolding_depth, example_count}
        ...
material_result¶
Stores final generated learning material per assessment session and concept:
| Column | Type | Description |
|---|---|---|
| `id` | `SERIAL PK` | Auto-incrementing ID |
| `session_id` | `TEXT FK` | References `assessment_session.session_id` |
| `concept_id` | `TEXT` | Concept the material addresses |
| `domain` | `TEXT` | Knowledge domain |
| `bloom_score` | `FLOAT` | Bloom alignment score from validator |
| `quality_score` | `FLOAT` | Composite quality score |
| `iteration_count` | `INT` | Number of generation iterations |
| `quality_flag` | `TEXT` | Set if emitted without passing quality gate |
| `material` | `JSONB` | Full generated learning material (see output structure below) |
| `generated_at` | `TIMESTAMPTZ` | Creation timestamp |
Constraints: UNIQUE (session_id, concept_id). Indexes: idx_material_session(session_id), idx_material_concept(concept_id, domain).
Output Structure¶
Each material JSONB field follows this structure:
{
  "concept_id": "event_loop",
  "domain": "backend_engineering",
  "target_bloom": 2,
  "target_bloom_label": "Understand",
  "objective": "Explain the ordering of microtasks vs macrotasks in the JS event loop",
  "sections": [
    {
      "type": "explanation",
      "title": "What the event loop actually does",
      "body": "..."
    },
    {
      "type": "code_example",
      "title": "Tracing execution order: setTimeout vs Promise",
      "body": "...",
      "code_block": "console.log('1'); setTimeout(...); Promise.resolve()..."
    },
    {
      "type": "quiz",
      "title": "Check your understanding",
      "body": "What will this code output and why?",
      "code_block": "...",
      "answer": "..."
    }
  ],
  "bloom_score": 0.91,
  "quality_score": 0.88,
  "iteration_count": 1
}
Execution Model¶
Trigger¶
The pipeline is triggered automatically when an AssessmentSession transitions to status = 'completed'. The triggering payload contains only the session_id — all other data is loaded from the database.
Parallelism¶
A single pipeline run is created per session; within it, work fans out per gap:
- Nodes 1–3 (Input Reader, Gap Prioritizer, Objective Generator) — sequential per session
- Nodes 4–5 (Content Planner, RAG Content Generator) — parallel across all gaps via asyncio.gather
- Nodes 6–7 (Bloom Validator, Quality Gate) — per gap, with retry loop
All runs share the same TaxonomyIndex singleton and database connection pool.
Failure Handling¶
| Failure | Handling |
|---|---|
| Vector store unreachable | Fallback to prompt-only generation with `fallback_flag` annotation |
| LLM timeout / rate limit | Exponential backoff, up to `LLM_RETRY_ATTEMPTS` (default 3) |
| Quality gate exhaustion | Emit with `quality_flag = 'max_iterations_reached'` after `MAX_ITERATIONS` |
| Invalid `concept_id` | Abort pipeline with `ValueError` |
All node failures are logged to a pipeline_run_log table with the LangGraph thread_id, node name, error message, and timestamp.
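The backoff behaviour can be sketched as follows. This is a minimal sketch, not the actual retry code: the delay schedule (`base ** (attempt + 1)` seconds) is one plausible reading of `LLM_RETRY_BACKOFF_BASE`, and the `backoff_base` parameter exists only to make the sketch testable:

```python
import asyncio

LLM_RETRY_ATTEMPTS = 3
LLM_RETRY_BACKOFF_BASE = 2.0

async def call_with_backoff(llm_call, *args,
                            backoff_base: float = LLM_RETRY_BACKOFF_BASE, **kwargs):
    """Retry an async LLM call with exponential backoff (sketch)."""
    for attempt in range(LLM_RETRY_ATTEMPTS):
        try:
            return await llm_call(*args, **kwargs)
        except Exception:
            if attempt == LLM_RETRY_ATTEMPTS - 1:
                raise                                  # out of retries: surface the error
            await asyncio.sleep(backoff_base ** (attempt + 1))
```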
Configuration¶
The following constants are defined in a pipeline configuration module:
| Constant | Default | Description |
|---|---|---|
| `BLOOM_PASS_THRESHOLD` | 0.75 | Minimum bloom_alignment score to pass quality gate |
| `QUALITY_PASS_THRESHOLD` | 0.70 | Minimum composite quality_score to pass |
| `MAX_ITERATIONS` | 3 | Maximum retry attempts before emitting with quality flag |
| `RAG_TOP_K` | 5 | Number of vector store chunks to retrieve per concept |
| `PARALLEL_GAP_LIMIT` | 10 | Max concurrent gap node runs per session |
| `LLM_RETRY_ATTEMPTS` | 3 | LLM call retries on rate limit or timeout |
| `LLM_RETRY_BACKOFF_BASE` | 2.0 | Exponential backoff base (seconds) |
These constants are separated from node logic to allow tuning without code changes.