Learning Content Pipeline

The learning content pipeline is a LangGraph StateGraph that automatically generates personalized learning material from assessment results. It consumes the structured output of the assessment pipeline — knowledge gaps, Bloom levels, confidence scores, and evidence — and produces validated, Bloom-aligned content for each gap.

Source: backend/app/graph/content_pipeline.py, backend/app/graph/content_state.py, backend/app/agents/content_nodes.py

Pipeline Overview

The pipeline has 5 nodes organized into four phases:

graph LR
    A[Load Assessment Result] --> B[Prioritize Gaps]
    B --> C[Generate Objectives]
    C --> D[Generate Content]
    D --> E{Validate Quality}
    E -->|Pass| F[Save Materials]
    E -->|Retry| D
    E -->|Max Retries| G[Flag & Save]

Nodes 1–3 (input_reader, gap_prioritizer, objective_generator) run sequentially, once per session. Node 4 (generate_all_content) runs in parallel across all gaps via asyncio.gather, bounded by a semaphore sized by PARALLEL_GAP_LIMIT (default 10). Node 5 (validate_all_content) runs per gap with a retry loop of up to MAX_ITERATIONS (default 3) iterations.

Components

| Component | Technology | Role |
| --- | --- | --- |
| State management | LangGraph StateGraph + PostgreSQL checkpointer | Persists pipeline state across nodes |
| Gap input | AssessmentResult (JSONB) | Source of gap_nodes, knowledge graph, evidence |
| Taxonomy lookup | TaxonomyIndex (in-memory singleton from YAML) | bloom_target, target_confidence, prerequisites |
| IRT config | concept_config (PostgreSQL table) | Per-concept difficulty weights |
| Knowledge retrieval | pgvector | Domain knowledge for RAG generation |
| LLM calls | Claude claude-sonnet-4-6 via Anthropic API | Content generation and Bloom validation |
| Output storage | material_result (PostgreSQL JSONB) | Final generated material per gap |

This pipeline does not replace the assessment system. It is a downstream consumer of AssessmentResult records. The two systems share a PostgreSQL database and the domain taxonomy YAML, but operate on separate LangGraph threads.

Scientific Foundations

Each node is grounded in a specific body of educational or ML research:

| Framework | Where Used | Purpose |
| --- | --- | --- |
| Bloom's Taxonomy (Anderson & Krathwohl, 2001) | Objective Generator, Bloom Validator | Action verb selection, cognitive level targeting, alignment validation |
| Knowledge Space Theory (Doignon & Falmagne, 1985) | Objective Generator | Prerequisite topological sort via Kahn's algorithm |
| Cognitive Load Theory (Sweller, 1988) | Content Planner | Chunk sizing, scaffolding depth, worked example density |
| Zone of Proximal Development (Vygotsky, 1978) | Content Planner | Generates material one Bloom level above current, not jumping to target |
| Item Response Theory (Lord, 1980) | Gap Prioritizer | Difficulty weighting via irt_weight from concept_config table |
| Retrieval-Augmented Generation (Lewis et al., 2020) | RAG Content Generator | Vector store retrieval prevents hallucination in domain content |
| RLAIF (Bai et al., 2022) | Bloom Validator | LLM-as-judge quality gating with structured rubric scoring |

The priority_score formula in Node 2 combines three of these frameworks into a single ranking metric: IRT (irt_weight), Bloom distance (ZPD), and gap analysis (gap_severity).

Phase 1: Input & Prioritization

Nodes

| Node | Function | Description |
| --- | --- | --- |
| input_reader | load_assessment_result | Loads AssessmentResult from PostgreSQL, initializes TaxonomyIndex, validates concept_ids |
| gap_prioritizer | prioritize_gaps | Computes priority_score per gap, sorts descending |

Input Validation

The Input Reader loads the AssessmentResult for a given session_id and the in-memory TaxonomyIndex singleton (cached from domain YAML at app startup). It verifies that every concept_id in the result's gap_nodes maps to a known taxonomy entry. Mismatches raise a ValueError and abort the pipeline.
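
The concept_id check described above might look roughly like the following. This is a minimal sketch, not the project's implementation: the function name `validate_gap_concepts` and the plain-collection arguments are assumptions made for illustration.

```python
from typing import Iterable


def validate_gap_concepts(gap_concept_ids: Iterable[str],
                          known_concept_ids: Iterable[str]) -> None:
    """Raise ValueError if any gap concept is missing from the taxonomy.

    Hypothetical helper: the real Input Reader works on AssessmentResult
    and TaxonomyIndex objects rather than plain collections.
    """
    unknown = sorted(set(gap_concept_ids) - set(known_concept_ids))
    if unknown:
        # Aborting here stops the pipeline before any LLM call is made.
        raise ValueError(
            f"Unknown concept_id(s) in gap_nodes: {', '.join(unknown)}"
        )
```

Collecting every unknown ID before raising means a single failed run reports all mismatches at once.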

Priority Score

Each gap is scored to determine processing order:

priority_score = gap_severity * bloom_distance * irt_weight

| Factor | Source | Description |
| --- | --- | --- |
| gap_severity | Assessment result | target_confidence - current_confidence (0.0–1.0) |
| bloom_distance | Taxonomy | BLOOM_INT[target_bloom] - BLOOM_INT[current_bloom] (1–5) |
| irt_weight | concept_config table | Item Response Theory difficulty weight (default 1.0) |

Gaps are sorted by priority_score in descending order. The IRT weight is read from the concept_config PostgreSQL table; when a concept has no weight stored there, a tier-based fallback applies: junior = 0.9, mid = 1.2, senior = 1.5, staff = 1.9.
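
As a rough illustration of the formula and the fallback, the scoring could be sketched as below. The `Gap` dataclass, its field names, and the `TIER_IRT_FALLBACK` constant name are assumptions for this example; only the formula and the numeric values come from the text above.

```python
from dataclasses import dataclass
from typing import List, Optional

BLOOM_INT = {"remember": 1, "understand": 2, "apply": 3,
             "analyze": 4, "evaluate": 5, "create": 6}

# Tier-based fallback used when concept_config has no weight for a concept.
TIER_IRT_FALLBACK = {"junior": 0.9, "mid": 1.2, "senior": 1.5, "staff": 1.9}


@dataclass
class Gap:  # hypothetical input shape, for illustration only
    concept_id: str
    tier: str
    current_bloom: str
    target_bloom: str
    current_confidence: float
    target_confidence: float
    db_irt_weight: Optional[float] = None


def priority_score(gap: Gap) -> float:
    gap_severity = max(0.0, gap.target_confidence - gap.current_confidence)
    bloom_distance = BLOOM_INT[gap.target_bloom] - BLOOM_INT[gap.current_bloom]
    irt_weight = (gap.db_irt_weight if gap.db_irt_weight is not None
                  else TIER_IRT_FALLBACK[gap.tier])
    return gap_severity * bloom_distance * irt_weight


def prioritize(gaps: List[Gap]) -> List[Gap]:
    # Highest priority first.
    return sorted(gaps, key=priority_score, reverse=True)
```

For example, a gap with severity 0.5, Bloom distance 3, and a stored weight of 1.4 scores about 2.1, which ranks it ahead of a gap with severity 0.3, distance 1, and the mid-tier fallback of 1.2 (about 0.36).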

Phase 2: Objective Generation

Nodes

| Node | Function | Description |
| --- | --- | --- |
| objective_generator | generate_objectives | Topological sort, Bloom verb selection, learning objective text generation |

Prerequisite Ordering

Gap concept IDs are topologically sorted using Kahn's algorithm based on prerequisite edges from the TaxonomyIndex. This ensures material for foundational concepts is always generated and presented before dependent concepts.
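
A compact sketch of Kahn's algorithm applied to gap concepts follows. It assumes prerequisites are available as a plain `dict` from concept ID to prerequisite IDs (a stand-in for `TaxonomyIndex.prereqs`), that gap IDs are unique, and that prerequisites outside the gap set can be ignored for ordering; the function name is hypothetical.

```python
from collections import deque
from typing import Dict, List


def topo_sort_gaps(gap_ids: List[str],
                   prereqs: Dict[str, List[str]]) -> List[str]:
    """Order gap concepts so every prerequisite precedes its dependents."""
    gap_set = set(gap_ids)
    indegree = {c: 0 for c in gap_ids}
    dependents: Dict[str, List[str]] = {c: [] for c in gap_ids}

    for concept in gap_ids:
        for prereq in set(prereqs.get(concept, [])):
            if prereq in gap_set:  # ignore prerequisites that are not gaps
                indegree[concept] += 1
                dependents[prereq].append(concept)

    # Start with concepts that have no unmet prerequisites.
    queue = deque(c for c in gap_ids if indegree[c] == 0)
    order: List[str] = []
    while queue:
        node = queue.popleft()
        order.append(node)
        for dep in dependents[node]:
            indegree[dep] -= 1
            if indegree[dep] == 0:
                queue.append(dep)

    if len(order) != len(gap_ids):
        raise ValueError("Prerequisite cycle detected among gap concepts")
    return order
```

The final length check is what detects cycles: any concept stuck with a non-zero in-degree never enters the queue.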

Bloom Action Verbs

The generator selects a canonical action verb for each Bloom level. For gaps spanning multiple levels, one objective is generated per intermediate level — a learner at Remember (1) targeting Analyze (4) gets three sequential pieces of material (Understand, Apply, Analyze) rather than jumping directly to the target.

| Bloom Level | Int | Action Verbs |
| --- | --- | --- |
| Remember | 1 | define, list, recall, identify, name, state |
| Understand | 2 | explain, describe, summarize, paraphrase, classify |
| Apply | 3 | implement, use, demonstrate, execute, solve, write |
| Analyze | 4 | compare, differentiate, examine, deconstruct, trace |
| Evaluate | 5 | assess, critique, justify, argue, appraise, defend |
| Create | 6 | design, construct, formulate, architect, compose |
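
The stepping through intermediate levels can be illustrated as follows. This is a sketch under assumptions: treating the first verb in each row as the canonical one is a guess, and `objective_stubs` is a hypothetical helper that omits the LLM-written objective_text.

```python
from typing import Dict, List

# Assumption: the first verb listed for each level is the canonical one.
CANONICAL_VERB: Dict[int, str] = {
    1: "define", 2: "explain", 3: "implement",
    4: "compare", 5: "assess", 6: "design",
}


def intermediate_levels(current_bloom: int, target_bloom: int) -> List[int]:
    """Every Bloom level strictly above current, up to and including target."""
    return list(range(current_bloom + 1, target_bloom + 1))


def objective_stubs(concept_id: str, current_bloom: int,
                    target_bloom: int) -> List[dict]:
    # One stub per intermediate level; objective_text is filled in later.
    return [
        {"concept_id": concept_id, "bloom_level": level,
         "verb": CANONICAL_VERB[level]}
        for level in intermediate_levels(current_bloom, target_bloom)
    ]
```

A learner at Remember (1) targeting Analyze (4) yields levels 2, 3, and 4, matching the three sequential pieces of material described above.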

Objective Format

Each objective specifies what the learner will do, not what they will know:

{
  "concept_id": "event_loop",
  "bloom_level": 2,
  "verb": "explain",
  "objective_text": "Explain the ordering of microtasks vs macrotasks in the JS event loop",
  "prereq_concept_ids": []
}

Phase 3: Content Planning & Generation

Nodes

| Node | Function | Description |
| --- | --- | --- |
| content_planner | plan_content | Computes Cognitive Load Theory parameters per gap |
| rag_content_generator | generate_all_content | Vector store retrieval + LLM generation, parallel via asyncio.gather |

Cognitive Load Parameters

The Content Planner applies CLT to calibrate material difficulty. Senior and staff-level concepts receive smaller chunks and more scaffolding because their intrinsic cognitive load is higher.

| Parameter | Logic | Purpose |
| --- | --- | --- |
| chunk_count | ceil(bloom_distance * CLT_CHUNK_FACTOR[tier]) | Number of content sub-chunks |
| example_count | 2 for junior/mid, 3 for senior/staff | Worked examples before practice |
| scaffolding_depth | high for senior/staff, medium otherwise | Step-by-step guidance depth |
| format_hints | Always explanation. Add code_example if implementation-oriented. Add analogy if bloom_distance > 2. Add quiz as final format. | Content format types |

CLT chunk factors by tier:

| Tier | Factor |
| --- | --- |
| Junior | 1.0 |
| Mid | 1.2 |
| Senior | 1.5 |
| Staff | 2.0 |
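
Taken together, the parameter rules and chunk factors could be computed along these lines. This is an illustrative sketch: the function signatures and the `implementation_oriented` flag are assumptions, since the text does not say how a concept is classified as implementation-oriented.

```python
import math
from typing import Dict, List

CLT_CHUNK_FACTOR = {"junior": 1.0, "mid": 1.2, "senior": 1.5, "staff": 2.0}


def clt_params(tier: str, bloom_distance: int) -> Dict[str, object]:
    """Derive chunking and scaffolding parameters for one gap."""
    high_load = tier in ("senior", "staff")
    return {
        "chunk_count": math.ceil(bloom_distance * CLT_CHUNK_FACTOR[tier]),
        "example_count": 3 if high_load else 2,
        "scaffolding_depth": "high" if high_load else "medium",
    }


def format_hints(bloom_distance: int,
                 implementation_oriented: bool) -> List[str]:
    hints = ["explanation"]            # always present
    if implementation_oriented:
        hints.append("code_example")
    if bloom_distance > 2:
        hints.append("analogy")
    hints.append("quiz")               # quiz is always the final format
    return hints
```

For a senior-tier concept with a Bloom distance of 3, this gives 3 × 1.5 = 4.5, rounded up to 5 chunks, with 3 worked examples and high scaffolding.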

RAG Content Generation

For each gap, the generator queries the pgvector store for the top-RAG_TOP_K (default 5) most relevant domain chunks using the query {concept_id} {bloom_verb} {level_tier}. Retrieved chunks are injected into the generation prompt alongside the learning objective, CLT parameters, and assessment evidence anchors. Claude claude-sonnet-4-6 generates the content.

All gaps are processed in parallel via asyncio.gather with a concurrency limit of PARALLEL_GAP_LIMIT (default 10).
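
The bounded fan-out can be sketched with `asyncio.Semaphore` as below. The `generate_one` coroutine is a placeholder for the per-gap retrieval and LLM call, and `generate_all` is a hypothetical name rather than the pipeline's actual function.

```python
import asyncio
from typing import Awaitable, Callable, Dict, List

PARALLEL_GAP_LIMIT = 10


async def generate_all(
    gap_ids: List[str],
    generate_one: Callable[[str], Awaitable[str]],  # placeholder per-gap call
    limit: int = PARALLEL_GAP_LIMIT,
) -> Dict[str, str]:
    """Run generate_one for every gap, at most `limit` at a time."""
    semaphore = asyncio.Semaphore(limit)

    async def bounded(gap_id: str):
        # Tasks beyond the limit wait here until a slot frees up.
        async with semaphore:
            return gap_id, await generate_one(gap_id)

    pairs = await asyncio.gather(*(bounded(g) for g in gap_ids))
    return dict(pairs)
```

All tasks are scheduled immediately by `asyncio.gather`; the semaphore only limits how many are inside the generation call at once, which keeps pressure on the LLM API and the connection pool predictable.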

Generation Prompt

SYSTEM:
You are an expert instructional designer and software engineer.
You generate precise, technically accurate learning material.
You always ground explanations in concrete code examples and real evidence.

USER:
Generate learning material for the following gap.

CONCEPT: {concept_id}
DOMAIN: {domain} / {level_tier} level
LEARNING OBJECTIVE: {objective_text}
TARGET BLOOM LEVEL: {target_bloom_label} ({target_bloom_int}/6)

LEARNER CONTEXT (from assessment evidence):
{evidence_anchors joined by newline}

RETRIEVED DOMAIN KNOWLEDGE:
{retrieved_chunks joined by separator}

CONTENT PLAN:
- Sections: {chunk_count}
- Worked examples: {example_count}
- Scaffolding depth: {scaffolding_depth}
- Required formats: {format_hints joined by comma}

{if iteration > 0}
PREVIOUS CRITIQUE (address these issues in this version):
{critique}
{end if}

Generate the material as structured JSON matching the ContentSection schema.
Each section must include: type, title, body, and (if code) a code_block field.
Do not include material that falls below the target Bloom level.

Phase 4: Validation & Quality Gate

Nodes

| Node | Function | Description |
| --- | --- | --- |
| bloom_validator | validate_bloom_alignment | Critic LLM scores generated material on four rubric dimensions |
| quality_gate | route_quality | Conditional routing: pass, retry with feedback, or emit with flag |

Bloom Validator Rubric

A separate Claude call acts as an RLAIF judge, scoring the generated material against a structured rubric:

| Dimension | Role | Description |
| --- | --- | --- |
| bloom_alignment | Primary | Does the material require the learner to operate at the target Bloom level? |
| accuracy | Secondary | Is the technical content factually correct? |
| clarity | Secondary | Is the material well-structured and clearly written? |
| evidence_alignment | Secondary | Does the material address the specific gaps from assessment evidence? |

Scores are 0.0–1.0 per dimension. bloom_score = bloom_alignment. quality_score = mean(accuracy, clarity, evidence_alignment).
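
Deriving the two scores from the validator's JSON response might look like this. It is a sketch only: `scores_from_validator` is a hypothetical helper, and it assumes the response has already been confirmed to be valid JSON with all four rubric keys present.

```python
import json
from statistics import mean
from typing import Tuple

SECONDARY_DIMENSIONS = ("accuracy", "clarity", "evidence_alignment")


def scores_from_validator(raw_json: str) -> Tuple[float, float, str]:
    """Return (bloom_score, quality_score, critique) from the judge output."""
    data = json.loads(raw_json)
    bloom_score = float(data["bloom_alignment"])
    # quality_score is the unweighted mean of the three secondary dimensions.
    quality_score = mean(float(data[key]) for key in SECONDARY_DIMENSIONS)
    return bloom_score, quality_score, data.get("critique", "")
```

Because the mean is unweighted, a low clarity score can be offset by high accuracy; bloom_alignment is kept separate so it cannot be averaged away.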

Validator Prompt

SYSTEM:
You are a strict educational quality assessor. You evaluate learning material
against Bloom's Taxonomy levels and instructional quality criteria.
Respond ONLY with valid JSON. No preamble or explanation outside the JSON.

USER:
Evaluate the following learning material.

TARGET BLOOM LEVEL: {target_bloom_label} ({target_bloom_int}/6)
CONCEPT: {concept_id}
LEARNING OBJECTIVE: {objective_text}

MATERIAL TO EVALUATE:
{generated_material}

Score each criterion from 0.0 to 1.0.

bloom_alignment: Does engaging with this material REQUIRE the learner to
  operate at {target_bloom_label} level? (1.0 = fully requires it,
  0.0 = requires only lower levels)

accuracy: Is the technical content factually correct for the domain?

clarity: Is the material clearly written and well-structured?

evidence_alignment: Does the material address the specific gaps identified
  in the learner's assessment evidence?

Respond with:
{
  "bloom_alignment": 0.0-1.0,
  "accuracy": 0.0-1.0,
  "clarity": 0.0-1.0,
  "evidence_alignment": 0.0-1.0,
  "critique": "specific actionable critique if any score < 0.75"
}

Quality Gate Routing

graph TD
    A{bloom_score >= 0.75 AND<br/>quality_score >= 0.70?}
    A -->|Yes| PASS[Emit final material]
    A -->|No| B{iteration_count < 3?}
    B -->|Yes| RETRY[Retry with critique feedback]
    B -->|No| FLAG["Emit with quality_flag = 'max_iterations_reached'"]

On retry, the validator's critique text is added to the generation prompt under PREVIOUS CRITIQUE, giving the generator specific feedback to address. The iteration_count is incremented and the pipeline routes back to content generation.

After 3 iterations without passing, the material is emitted with quality_flag = 'max_iterations_reached' and flagged for human review. The material is still usable — it simply did not meet the automated quality bar.
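
The routing decision reduces to a small pure function, sketched below. The return labels `"pass"`, `"retry"`, and `"flag"` are hypothetical, and the sketch assumes iteration_count has already been incremented for the attempt being judged.

```python
BLOOM_PASS_THRESHOLD = 0.75
QUALITY_PASS_THRESHOLD = 0.70
MAX_ITERATIONS = 3


def route_quality(bloom_score: float, quality_score: float,
                  iteration_count: int) -> str:
    """Decide the next step after validation (labels are illustrative)."""
    if (bloom_score >= BLOOM_PASS_THRESHOLD
            and quality_score >= QUALITY_PASS_THRESHOLD):
        return "pass"    # emit final material
    if iteration_count < MAX_ITERATIONS:
        return "retry"   # regenerate with the critique attached
    return "flag"        # emit with quality_flag = 'max_iterations_reached'
```

Both thresholds must be met to pass: a bloom_score of 0.80 with a quality_score of 0.65 still triggers a retry.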

State Schema

The shared state flows through every node in the pipeline and is persisted via the PostgreSQL checkpointer:

class LearningMaterialState(TypedDict):
    # Input
    session_id: str
    assessment_result: AssessmentResult
    taxonomy: TaxonomyIndex

    # Node 2: Gap Prioritizer
    prioritized_gaps: list[PrioritizedGap]

    # Node 3: Objective Generator
    objectives: list[LearningObjective]
    prereq_order: list[str]              # topologically sorted concept_ids

    # Node 4: Content Planner
    content_plan: ContentPlan

    # Node 5: RAG Content Generator
    raw_content: dict[str, GeneratedContent]

    # Node 6: Bloom Validator
    bloom_score: float
    quality_score: float
    critique: str

    # Node 7: Quality Gate
    iteration_count: int
    final_material: dict[str, LearningMaterial] | None

Data Contracts

The following Pydantic models define the data contracts between nodes:

class PrioritizedGap(BaseModel):
    concept_id: str
    current_bloom: int        # 1-6
    target_bloom: int         # 1-6
    bloom_distance: int       # target - current
    gap_severity: float       # target_confidence - current_confidence
    irt_weight: float         # from concept_config table
    priority_score: float     # gap_severity * bloom_distance * irt_weight
    evidence: list[str]       # from assessment knowledge graph
    prerequisites: list[str]  # from taxonomy

class LearningObjective(BaseModel):
    concept_id: str
    bloom_level: int
    verb: str                 # Bloom action verb for this level
    objective_text: str       # Full objective statement
    prereq_concept_ids: list[str]

class ContentPlan(BaseModel):
    concept_id: str
    target_bloom: int
    chunk_count: int          # Number of sub-chunks (CLT-derived)
    example_count: int        # Worked examples before practice
    scaffolding_depth: str    # 'high' | 'medium' | 'low'
    format_hints: list[str]   # e.g. ['code_example', 'analogy', 'quiz']
    evidence_anchors: list[str]  # From assessment evidence, for grounding

class GeneratedContent(BaseModel):
    concept_id: str
    bloom_level: int
    sections: list[ContentSection]
    raw_prompt: str           # For debugging/audit

class LearningMaterial(BaseModel):
    concept_id: str
    target_bloom: int
    bloom_score: float
    quality_score: float
    sections: list[ContentSection]
    iteration_count: int
    generated_at: datetime

Data Layer

Two new PostgreSQL tables, concept_config and material_result, are required. The assessment system's existing schema is left untouched; every change is additive.

concept_config

Stores per-concept IRT difficulty weights, editable without code deploys:

| Column | Type | Description |
| --- | --- | --- |
| concept_id | TEXT PK | Concept identifier (matches taxonomy YAML) |
| domain | TEXT | Knowledge domain |
| irt_weight | FLOAT | IRT difficulty weight (default 1.0) |
| notes | TEXT | Optional operator notes |
| updated_at | TIMESTAMPTZ | Last modification timestamp |

Initial seed values use a tier-based heuristic (junior: 0.9, mid: 1.2, senior: 1.5, staff: 1.9) and should be refined over time using real learner completion data:

INSERT INTO concept_config (concept_id, domain, irt_weight) VALUES
  ('event_loop', 'backend_engineering', 1.4),
  ('promises', 'backend_engineering', 1.3),
  ('distributed_systems', 'backend_engineering', 1.5),
  ('system_design', 'backend_engineering', 1.9);

TaxonomyIndex

The taxonomy YAML is loaded once at application startup as a module-level singleton initialized in the FastAPI lifespan context manager. All pipeline invocations share the same instance:

import yaml

BLOOM_INT = {
    'remember': 1, 'understand': 2, 'apply': 3,
    'analyze': 4, 'evaluate': 5, 'create': 6
}

CLT_CHUNK_FACTOR = {
    'junior': 1.0, 'mid': 1.2, 'senior': 1.5, 'staff': 2.0
}

class TaxonomyIndex:
    def __init__(self, yaml_path: str):
        # Use a context manager so the file handle is closed after parsing.
        with open(yaml_path, encoding='utf-8') as f:
            raw = yaml.safe_load(f)
        self._concepts: dict[str, dict] = {}
        self._level_map: dict[str, str] = {}
        for level, data in raw['levels'].items():
            for c in data['concepts']:
                # Store each concept entry with its tier attached.
                self._concepts[c['concept']] = c | {'level': level}
                self._level_map[c['concept']] = level

    def get(self, concept_id: str) -> dict: ...
    def bloom_target_int(self, concept_id: str) -> int: ...
    def gap_severity(self, concept_id: str, current_confidence: float) -> float:
        # Assumes each taxonomy entry carries a target_confidence field.
        target_confidence = self._concepts[concept_id]['target_confidence']
        return max(0.0, target_confidence - current_confidence)
    def irt_weight(self, concept_id: str, db_weight: float | None) -> float:
        # Falls back to tier heuristic if db_weight is None
        ...
    def prereqs(self, concept_id: str) -> list[str]: ...
    def clt_params(self, concept_id: str, bloom_distance: int) -> dict:
        # Returns {chunk_count, scaffolding_depth, example_count}
        ...

material_result

Stores final generated learning material per assessment session and concept:

| Column | Type | Description |
| --- | --- | --- |
| id | SERIAL PK | Auto-incrementing ID |
| session_id | TEXT FK | References assessment_session.session_id |
| concept_id | TEXT | Concept the material addresses |
| domain | TEXT | Knowledge domain |
| bloom_score | FLOAT | Bloom alignment score from validator |
| quality_score | FLOAT | Composite quality score |
| iteration_count | INT | Number of generation iterations |
| quality_flag | TEXT | Set if emitted without passing quality gate |
| material | JSONB | Full generated learning material (see output structure below) |
| generated_at | TIMESTAMPTZ | Creation timestamp |

Constraints: UNIQUE (session_id, concept_id). Indexes: idx_material_session(session_id), idx_material_concept(concept_id, domain).

Output Structure

Each material JSONB field follows this structure:

{
  "concept_id": "event_loop",
  "domain": "backend_engineering",
  "target_bloom": 2,
  "target_bloom_label": "Understand",
  "objective": "Explain the ordering of microtasks vs macrotasks in the JS event loop",
  "sections": [
    {
      "type": "explanation",
      "title": "What the event loop actually does",
      "body": "..."
    },
    {
      "type": "code_example",
      "title": "Tracing execution order: setTimeout vs Promise",
      "body": "...",
      "code_block": "console.log('1'); setTimeout(...); Promise.resolve()..."
    },
    {
      "type": "quiz",
      "title": "Check your understanding",
      "body": "What will this code output and why?",
      "code_block": "...",
      "answer": "..."
    }
  ],
  "bloom_score": 0.91,
  "quality_score": 0.88,
  "iteration_count": 1
}

Execution Model

Trigger

The pipeline is triggered automatically when an AssessmentSession transitions to status = 'completed'. The triggering payload contains only the session_id — all other data is loaded from the database.

Parallelism

For a given session, one pipeline run is created per gap node:

  • Nodes 1–3 (Input Reader, Gap Prioritizer, Objective Generator) — sequential per session
  • Nodes 4–5 (Content Planner, RAG Content Generator) — parallel across all gaps via asyncio.gather
  • Nodes 6–7 (Bloom Validator, Quality Gate) — per gap, with retry loop

All runs share the same TaxonomyIndex singleton and database connection pool.

Failure Handling

| Failure | Handling |
| --- | --- |
| Vector store unreachable | Fallback to prompt-only generation with fallback_flag annotation |
| LLM timeout / rate limit | Exponential backoff, up to LLM_RETRY_ATTEMPTS (default 3) |
| Quality gate exhaustion | Emit with quality_flag = 'max_iterations_reached' after MAX_ITERATIONS |
| Invalid concept_id | Abort pipeline with ValueError |

All node failures are logged to a pipeline_run_log table with the LangGraph thread_id, node name, error message, and timestamp.
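
The exponential backoff used for LLM timeouts and rate limits could be sketched as follows. Several details are assumptions: `RetryableLLMError` stands in for whatever rate-limit exception the client raises, LLM_RETRY_ATTEMPTS is read as the number of retries after the first call, and the delay before retry n is taken to be base ** n seconds.

```python
import asyncio
from typing import Awaitable, Callable, TypeVar

T = TypeVar("T")

LLM_RETRY_ATTEMPTS = 3
LLM_RETRY_BACKOFF_BASE = 2.0


class RetryableLLMError(Exception):
    """Hypothetical stand-in for a rate-limit error from the LLM client."""


async def call_with_backoff(
    call: Callable[[], Awaitable[T]],
    retries: int = LLM_RETRY_ATTEMPTS,
    base: float = LLM_RETRY_BACKOFF_BASE,
    sleep: Callable[[float], Awaitable[None]] = asyncio.sleep,
) -> T:
    """Await call(), retrying with exponentially growing delays."""
    for attempt in range(retries + 1):
        try:
            return await call()
        except (RetryableLLMError, TimeoutError, asyncio.TimeoutError):
            if attempt == retries:
                raise  # retries exhausted: surface the error to the node
            await sleep(base ** (attempt + 1))
    raise RuntimeError("unreachable")  # loop always returns or raises
```

With the defaults, a call that keeps failing waits 2, 4, and 8 seconds before the error propagates. Passing `sleep` as a parameter makes the delay schedule easy to check in tests without waiting.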

Configuration

The following constants are defined in a pipeline configuration module:

| Constant | Default | Description |
| --- | --- | --- |
| BLOOM_PASS_THRESHOLD | 0.75 | Minimum bloom_alignment score to pass quality gate |
| QUALITY_PASS_THRESHOLD | 0.70 | Minimum composite quality_score to pass |
| MAX_ITERATIONS | 3 | Maximum retry attempts before emitting with quality flag |
| RAG_TOP_K | 5 | Number of vector store chunks to retrieve per concept |
| PARALLEL_GAP_LIMIT | 10 | Max concurrent gap node runs per session |
| LLM_RETRY_ATTEMPTS | 3 | LLM call retries on rate limit or timeout |
| LLM_RETRY_BACKOFF_BASE | 2.0 | Exponential backoff base (seconds) |

These constants are separated from node logic to allow tuning without code changes.