Data Models¶
This page covers all data structures in the system — Pydantic API models, LangGraph state, database schema, and knowledge graph structures.
Pydantic API Models¶
All API models extend CamelModel, which serializes field names to camelCase for the frontend.
Source: backend/app/models/
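`CamelModel` itself is not defined on this page. In Pydantic v2 the usual pattern is an alias generator; a minimal sketch of what it likely looks like (not the project's actual definition):

```python
from pydantic import BaseModel, ConfigDict
from pydantic.alias_generators import to_camel

class CamelModel(BaseModel):
    """Base model that serializes snake_case fields as camelCase."""
    model_config = ConfigDict(alias_generator=to_camel, populate_by_name=True)

# Hypothetical model to show the round trip
class Skill(CamelModel):
    skill_id: str
    sub_skills: list[str]

skill = Skill(skill_id="python", sub_skills=["asyncio"])
print(skill.model_dump(by_alias=True))
# {'skillId': 'python', 'subSkills': ['asyncio']}
```

With `populate_by_name=True`, both `skill_id=` and `skillId=` are accepted on input, while `by_alias=True` dumps produce the camelCase shape the frontend consumes.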
Skills¶
```python
class Skill(CamelModel):
    id: str
    name: str
    category: str
    icon: str
    description: str
    sub_skills: list[str]

class SkillsResponse(CamelModel):
    skills: list[Skill]
    categories: list[str]
```
Assessment¶
```python
class ProficiencyScore(CamelModel):
    skill_id: str
    skill_name: str
    score: int         # 0-100
    confidence: float  # 0.0-1.0
    reasoning: str
```
User Dashboard¶
Source: backend/app/models/user.py
```python
class UserAssessmentSummary(CamelModel):
    session_id: str
    status: str  # "active", "completed", or "timed_out"
    skill_ids: list[str]
    target_level: str
    role_id: str | None = None    # knowledge base domain ID
    role_name: str | None = None  # resolved display name from KB
    created_at: datetime
    completed_at: datetime | None = None
    overall_readiness: int | None = None  # 0-100, only set if completed
    skill_count: int
```
Gap Analysis¶
```python
class GapItem(CamelModel):
    skill_id: str
    skill_name: str
    current_level: int
    target_level: int
    gap: int
    priority: Literal["critical", "high", "medium", "low"]
    recommendation: str

class GapAnalysis(CamelModel):
    overall_readiness: int
    summary: str
    gaps: list[GapItem]

class GapAnalysisRequest(CamelModel):
    proficiency_scores: list[ProficiencyScore]
```
Learning Plan¶
```python
class LearningModule(CamelModel):
    id: str
    title: str
    description: str
    type: Literal["theory", "quiz", "lab"]
    phase: int
    skill_ids: list[str]
    duration_hours: int
    objectives: list[str]
    resources: list[str]

class Phase(CamelModel):
    phase: int
    name: str
    description: str
    modules: list[LearningModule]

class LearningPlan(CamelModel):
    title: str
    summary: str
    total_hours: int
    total_weeks: int
    phases: list[Phase]

class LearningPlanRequest(CamelModel):
    gap_analysis: GapAnalysis
```
Roles¶
```python
class RoleLevelSummary(CamelModel):
    name: str
    concept_count: int

class RoleSummary(CamelModel):
    id: str
    name: str
    description: str
    skill_count: int
    levels: list[str]

class RoleDetail(CamelModel):
    id: str
    name: str
    description: str
    mapped_skill_ids: list[str]
    levels: list[RoleLevelSummary]
```
Source: backend/app/models/roles.py
Assessment API Models¶
Simplified output projections for the assessment API. They differ from the pipeline state types in graph/state.py.
Source: backend/app/models/assessment_api.py
```python
# Request/Response for /assessment/start
class AssessmentStartRequest(CamelModel):
    skill_ids: list[str]
    target_level: str = "mid"
    role_id: str | None = None      # Validated: must be in list_domains() or None
    thoroughness: str = "standard"  # "quick", "standard", or "thorough"

class AssessmentRespondRequest(CamelModel):
    response: str

class AssessmentStartResponse(CamelModel):
    session_id: str
    question: str
    estimated_questions: int | None = None

# Response for /assessment/{id}/graph
class KnowledgeNodeOut(CamelModel):
    concept: str
    confidence: float
    bloom_level: str
    prerequisites: list[str]

class KnowledgeGraphOut(CamelModel):
    nodes: list[KnowledgeNodeOut]

class ProficiencyScore(CamelModel):  # shared with gap_analysis module
    skill_id: str
    skill_name: str
    score: int
    confidence: float
    reasoning: str

class ResourceOut(CamelModel):
    type: str
    title: str
    url: str | None = None

class LearningPhaseOut(CamelModel):
    phase_number: int
    title: str
    concepts: list[str]
    rationale: str
    resources: list[ResourceOut]
    estimated_hours: float

# Response for /assessment/{id}/report
class LearningPlanOut(CamelModel):
    summary: str
    total_hours: float
    phases: list[LearningPhaseOut]

class AssessmentReportResponse(CamelModel):
    knowledge_graph: KnowledgeGraphOut
    gap_analysis: GapAnalysis  # reuses GapAnalysis from gap_analysis module
    learning_plan: LearningPlanOut
    proficiency_scores: list[ProficiencyScore]
```
Auth Models¶
Source: backend/app/models/auth.py
```python
class AuthMeResponse(CamelModel):
    user_id: str
    display_name: str
    avatar_url: str
    has_api_key: bool
    email: str | None = None

class RegisterRequest(CamelModel):
    email: EmailStr
    password: str

class LoginRequest(CamelModel):
    email: EmailStr
    password: str

class ApiKeySetRequest(CamelModel):
    api_key: str

class ApiKeyResponse(CamelModel):
    api_key_preview: str

class ValidateKeyResponse(CamelModel):
    valid: bool
    error: str | None = None

class OkResponse(CamelModel):
    ok: bool
```
Knowledge Base Schema¶
These models validate YAML knowledge base files on load. They use BaseModel (not CamelModel) because they are internal, not API-facing.
```python
LEVEL_ORDER: list[str] = ["junior", "mid", "senior", "staff"]

class ConceptSchema(BaseModel):
    concept: str
    target_confidence: float
    bloom_target: str
    prerequisites: list[str] = []

class LevelSchema(BaseModel):
    concepts: list[ConceptSchema]

class KnowledgeBaseSchema(BaseModel):
    domain: str
    display_name: str
    description: str
    mapped_skill_ids: list[str]
    levels: dict[str, LevelSchema]

    @field_validator("levels")
    @classmethod
    def must_have_all_levels(cls, v: dict) -> dict:
        """Validates that all four levels (junior, mid, senior, staff) are present."""
        required = set(LEVEL_ORDER)
        missing = required - v.keys()
        if missing:
            raise ValueError(f"Missing levels: {missing}")
        return v
```
Source: backend/app/knowledge_base/schema.py
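A quick way to see the validator fire, trimmed to the fields involved (a sketch assuming Pydantic v2; `concepts` is simplified to a plain list):

```python
from pydantic import BaseModel, ValidationError, field_validator

LEVEL_ORDER: list[str] = ["junior", "mid", "senior", "staff"]

class LevelSchema(BaseModel):
    concepts: list = []

class KnowledgeBaseSchema(BaseModel):
    domain: str
    levels: dict[str, LevelSchema]

    @field_validator("levels")
    @classmethod
    def must_have_all_levels(cls, v: dict) -> dict:
        missing = set(LEVEL_ORDER) - v.keys()
        if missing:
            raise ValueError(f"Missing levels: {missing}")
        return v

# A YAML file that defines only two of the four levels is rejected on load:
try:
    KnowledgeBaseSchema(domain="python", levels={"junior": {}, "mid": {}})
except ValidationError as exc:
    print(f"rejected: {exc.error_count()} error(s)")
```

Because the check runs at load time, a malformed knowledge base file fails fast at startup rather than mid-assessment.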
Health¶
```python
class HealthResponse(BaseModel):  # Note: extends BaseModel, not CamelModel
    status: str
    database: str | None = None
```
Source: backend/app/models/health.py
LangGraph State¶
The assessment pipeline state is a TypedDict that flows through all nodes.
Source: backend/app/graph/state.py
```python
class AssessmentState(TypedDict, total=False):
    # Session
    candidate_id: str
    skill_ids: list[str]
    skill_domain: str
    target_level: str

    # Assessment loop
    question_history: list[Question]
    response_history: list[Response]
    current_topic: str
    current_bloom_level: BloomLevel
    topics_evaluated: list[str]
    questions_on_current_topic: int
    assessment_complete: bool

    # Evaluation
    latest_evaluation: EvaluationResult

    # Knowledge
    knowledge_graph: KnowledgeGraph
    target_graph: KnowledgeGraph
    gap_nodes: list[KnowledgeNode]
    enriched_gap_analysis: EnrichedGapAnalysis
    learning_plan: LearningPlan  # state.py LearningPlan (see below)

    # Human-in-the-loop
    pending_question: Question | None
```
Pipeline vs API LearningPlan
The learning_plan field in AssessmentState uses the pipeline LearningPlan defined in graph/state.py, which has different fields from the API model in models/learning_plan.py:
```python
# graph/state.py — used inside the pipeline
class LearningPlan(CamelModel):
    phases: list[LearningPhase]
    total_hours: float
    summary: str

class LearningPhase(CamelModel):
    phase_number: int
    title: str
    concepts: list[str]
    rationale: str
    resources: list[Resource]
    estimated_hours: float

class Resource(CamelModel):
    type: str  # "video", "article", "project", "exercise"
    title: str
    url: str | None = None
```
The API LearningPlan (in models/learning_plan.py) has title, total_weeks, total_hours: int, and phases: list[Phase]. The pipeline version omits title/total_weeks, uses float for hours, and structures phases differently.
EnrichedGapAnalysis¶
Populated by gap_enricher.enrich_gaps() between gap analysis and plan generation:
```python
class EnrichedGapItem(CamelModel):
    skill_id: str
    skill_name: str
    current_level: int  # 0-100
    target_level: int   # 0-100
    gap: int            # target - current
    priority: Literal["critical", "high", "medium", "low"]
    recommendation: str

class EnrichedGapAnalysis(CamelModel):
    overall_readiness: int  # 0-100
    summary: str
    gaps: list[EnrichedGapItem]
```
Source: backend/app/graph/state.py
State Data Types¶
Question¶
```python
class Question(CamelModel):
    id: str
    topic: str
    bloom_level: BloomLevel
    text: str
    question_type: str  # "conceptual", "scenario", "debugging", "design"
```
Response¶
```python
class Response(CamelModel):
    question_id: str
    text: str
```
EvaluationResult¶
```python
class EvaluationResult(CamelModel):
    question_id: str
    confidence: float  # 0.0-1.0
    bloom_level: BloomLevel
    evidence: list[str]
```
BloomLevel¶
```python
class BloomLevel(StrEnum):
    remember = "remember"
    understand = "understand"
    apply = "apply"
    analyze = "analyze"
    evaluate = "evaluate"
    create = "create"
```
Knowledge Graph¶
The knowledge graph tracks what the candidate knows (and doesn't know) about each concept.
Source: backend/app/graph/state.py
```python
class KnowledgeNode(CamelModel):
    concept: str
    confidence: float  # 0.0-1.0
    bloom_level: BloomLevel
    prerequisites: list[str] = []
    evidence: list[str] = []

class KnowledgeGraph(CamelModel):
    nodes: list[KnowledgeNode] = []
    edges: list[tuple[str, str]] = []  # (prerequisite, dependent)
```
Graph Operations¶
| Method | Description |
|---|---|
| `get_node(concept)` | Find a node by concept name; returns `None` if not found |
| `upsert_node(node)` | Update an existing node or append a new one |
Two Graphs¶
The pipeline maintains two knowledge graphs:
- `knowledge_graph`: The candidate's current understanding, built from evaluation results
- `target_graph`: The expected understanding for their target level, loaded from the knowledge base YAML
Gap analysis diffs these two graphs.
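In this two-graph model, gap analysis amounts to a per-concept diff: every target concept where the candidate's confidence falls short becomes a gap. A minimal sketch (plain `concept -> confidence` dicts stand in for the graph models; the exact diff rule is an assumption about the implementation):

```python
def diff_graphs(current: dict[str, float], target: dict[str, float]) -> dict[str, float]:
    """Return concept -> confidence shortfall for every target concept the
    candidate has not reached. Concepts never assessed count as 0.0."""
    gaps: dict[str, float] = {}
    for concept, target_conf in target.items():
        shortfall = target_conf - current.get(concept, 0.0)
        if shortfall > 0:
            gaps[concept] = round(shortfall, 2)
    return gaps

current = {"decorators": 0.9, "generators": 0.3}
target = {"decorators": 0.7, "generators": 0.8, "metaclasses": 0.6}
print(diff_graphs(current, target))
# {'generators': 0.5, 'metaclasses': 0.6}
```

Concepts the candidate has exceeded (here, `decorators`) drop out entirely; unseen concepts (here, `metaclasses`) surface with the full target confidence as the gap.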
LLM Output Schemas¶
These Pydantic models define the contract between the LLM and the assessment agents. They use plain BaseModel (not CamelModel) because they are internal to the LLM integration — the LLM returns data in this shape, and agents map it to the CamelModel state types.
Source: backend/app/agents/schemas.py
Question Generation¶
```python
class QuestionOutput(BaseModel):
    topic: str          # Technical concept being tested
    bloom_level: str    # Target Bloom taxonomy level
    text: str           # The question text
    question_type: str  # "conceptual", "scenario", "debugging", "design"
```
Response Evaluation¶
```python
class EvaluationOutput(BaseModel):
    confidence: float    # 0.0 = wrong, 0.5 = partial, 1.0 = excellent
    bloom_level: str     # Bloom level actually demonstrated
    evidence: list[str]  # Specific observations
    reasoning: str       # Brief overall assessment
```
Learning Plan¶
```python
class PlanResourceOutput(BaseModel):
    type: str  # "video", "article", "project", "exercise"
    title: str
    url: str | None

class PlanPhaseOutput(BaseModel):
    phase_number: int
    title: str
    concepts: list[str]
    rationale: str
    resources: list[PlanResourceOutput]
    estimated_hours: float

class PlanOutput(BaseModel):
    summary: str
    total_hours: float
    phases: list[PlanPhaseOutput]
```
Content Generation¶
```python
class ContentSectionOutput(BaseModel):
    type: str  # "explanation", "code_example", "analogy", "quiz"
    title: str
    body: str
    code_block: str | None
    answer: str | None

class ContentGeneratorOutput(BaseModel):
    sections: list[ContentSectionOutput]

class BloomValidatorOutput(BaseModel):
    bloom_alignment: float     # 0.0-1.0
    accuracy: float            # 0.0-1.0
    clarity: float             # 0.0-1.0
    evidence_alignment: float  # 0.0-1.0
    critique: str              # Actionable feedback if any score < 0.75
```
Database Schema¶
A PostgreSQL database with six tables persists users, authentication methods, assessment sessions, results, concept configuration, and generated learning materials.
Source: backend/app/db.py
users¶
| Column | Type | Description |
|---|---|---|
| `id` | `String(36)` PK | UUID user identifier |
| `display_name` | `String(100)` | User display name |
| `avatar_url` | `String(500)` | Avatar URL |
| `encrypted_api_key` | `String(500)` NULL | Fernet-encrypted Anthropic API key |
| `created_at` | `DateTime` | Creation timestamp |
| `updated_at` | `DateTime` | Last update timestamp |
auth_methods¶
| Column | Type | Description |
|---|---|---|
| `id` | `Integer` PK | Auto-incrementing ID |
| `user_id` | `String(36)` FK | References `users.id` |
| `provider` | `String(20)` | Auth provider: "github" or "email" |
| `provider_id` | `String(320)` | Provider-specific ID (GitHub user ID or email) |
| `credential` | `String(500)` NULL | Hashed password (email provider only) |
| `created_at` | `DateTime` | Creation timestamp |
Unique constraint: (provider, provider_id)
assessment_sessions¶
| Column | Type | Description |
|---|---|---|
| `session_id` | `String(36)` PK | UUID session identifier |
| `thread_id` | `String(36)` UNIQUE | LangGraph thread identifier |
| `skill_ids` | `JSONB` | List of selected skill IDs |
| `target_level` | `String(20)` | Target career level (default: "mid") |
| `status` | `String(20)` | Session status: "active", "completed", or "timed_out" (default: "active") |
| `user_id` | `String(36)` FK NULL | References `users.id` |
| `created_at` | `DateTime` | Creation timestamp |
| `updated_at` | `DateTime` | Last update timestamp |
assessment_results¶
| Column | Type | Description |
|---|---|---|
| `id` | `Integer` PK | Auto-incrementing ID |
| `session_id` | `String(36)` FK | References `assessment_sessions.session_id` |
| `knowledge_graph` | `JSONB` | Final knowledge graph snapshot |
| `gap_nodes` | `JSONB` | Identified knowledge gaps |
| `learning_plan` | `JSONB` | Generated learning plan |
| `proficiency_scores` | `JSONB` | Per-skill proficiency scores |
| `enriched_gap_analysis` | `JSONB` NULL | Enriched gap analysis with readiness, priorities, and recommendations |
| `completed_at` | `DateTime` | Completion timestamp |
concept_config¶
| Column | Type | Description |
|---|---|---|
| `concept_id` | `String` PK | Concept identifier (matches taxonomy YAML) |
| `domain` | `String` | Knowledge domain |
| `irt_weight` | `Float` | IRT difficulty weight (default: 1.0) |
| `notes` | `String` NULL | Optional operator notes |
| `updated_at` | `DateTime` | Last modification timestamp |
material_results¶
| Column | Type | Description |
|---|---|---|
| `id` | `Integer` PK | Auto-incrementing ID |
| `session_id` | `String(36)` FK | References `assessment_sessions.session_id` |
| `concept_id` | `String` | Concept the material addresses |
| `domain` | `String` | Knowledge domain |
| `bloom_score` | `Float` | Bloom alignment score from validator |
| `quality_score` | `Float` | Composite quality score |
| `iteration_count` | `Integer` | Number of generation iterations |
| `quality_flag` | `String` NULL | Set if emitted without passing quality gate |
| `material` | `JSONB` | Full generated learning material |
| `generated_at` | `DateTime` | Creation timestamp |
Unique constraint: (session_id, concept_id)
Relationships¶
- `User` → `AuthMethod`: one-to-many via `user_id` (cascade delete-orphan)
- `User` → `AssessmentSession`: one-to-many via `user_id`
- `AssessmentSession` → `AssessmentResult`: one-to-one via `session_id`
Repository Layer¶
Database query logic is abstracted from route handlers into dedicated repository modules in backend/app/repositories/. Each function takes an AsyncSession as its first parameter and provides a clean separation between data access and request handling.
| Module | Key Functions |
|---|---|
| `session_repo` | `get_session`, `get_session_with_ownership`, `list_user_sessions`, `delete_session_cascade`, `timeout_stale_sessions` |
| `result_repo` | `get_result_by_session` |
| `user_repo` | `get_user_by_id`, `get_user_or_404`, `get_auth_method`, `get_auth_method_by_user`, `get_user_by_auth_method` |
| `material_repo` | `get_materials_by_session` |
Convenience variants (e.g. get_session_with_ownership, get_user_or_404) raise HTTPException directly, eliminating repeated 404/403 boilerplate in route handlers.
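The ownership-checking variant presumably layers 404/403 checks over the plain lookup. A self-contained sketch of that pattern, with a toy in-memory store and a minimal `HTTPException` stand-in (the real function queries PostgreSQL through an `AsyncSession`):

```python
class HTTPException(Exception):
    """Minimal stand-in for fastapi.HTTPException, to keep the sketch self-contained."""
    def __init__(self, status_code: int, detail: str):
        self.status_code = status_code
        self.detail = detail
        super().__init__(detail)

# Toy store in place of the database
SESSIONS = {"s1": {"session_id": "s1", "user_id": "u1", "status": "active"}}

def get_session_with_ownership(session_id: str, user_id: str) -> dict:
    """Return the session; raise 404 if missing, 403 if owned by another user."""
    session = SESSIONS.get(session_id)
    if session is None:
        raise HTTPException(404, "Assessment session not found")
    if session["user_id"] != user_id:
        raise HTTPException(403, "Not authorized for this session")
    return session

assert get_session_with_ownership("s1", "u1")["status"] == "active"
```

Route handlers can then call the variant and let FastAPI's exception handling turn the raise into the appropriate HTTP response, instead of repeating the two checks inline.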
LangGraph Checkpoints¶
Pipeline state is persisted in the same PostgreSQL database via AsyncPostgresSaver. LangGraph manages its own checkpoint tables and stores the full state at each interrupt point, enabling resumption of assessments across server restarts.