Text-to-Animation Engine (TTA)
A production-grade multi-agent system that transforms raw academic topics into fully narrated, animated educational videos with a 99.4% success rate — deployed at Edza.ai.
Client
Edza.ai
Year
2024
Category
Generative AI
Built at
NatrajX

Impact
99.4% animation success rate in production
P95 end-to-end latency of 85 seconds
<100ms audio-video synchronisation
Covers Physics, Chemistry, Mathematics (PCM)
Key Metrics
Success Rate
99.4%
Latency (P95)
85s end-to-end
Audio Sync
<100ms
Subjects
PCM (Physics, Chemistry, Maths)
Tech Stack
1. The Problem
Creating a single high-quality educational animation requires subject matter experts, structured scriptwriting, animation engineering, and voice + post-production — a largely sequential, human-intensive workflow that is impossible to scale.
2. The Insight: LLMs Lack Spatial Awareness
Early experiments asking LLMs to write Manim scripts directly failed ~60% of the time. Models called non-existent functions, placed text labels over diagrams, and produced invalid syntax. Pure generation is unreliable for structured animation code.
3. Architecture: Neuro-Symbolic Pipeline
The solution is a hybrid approach: use LLMs for high-level reasoning and content synthesis, but confine them within strict, deterministic code scaffolds.
- Subject Classifier — routes the topic to a domain-specific template engine when classification confidence ≥ 85%
- Template Engine — subject-specific Manim scaffolds for Maths, Physics, Chemistry, Organic Chemistry, CS
- Wikipedia Fallback Agent — grounded generation for low-confidence or unsupported topics
- Orchestration Layer — manages routing, validation, regeneration, rendering, and storage (sketched after this list)
- TTS + Sync Layer — audio narration with <100ms video synchronisation
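To make the orchestration layer concrete, the sketch below shows the shape of its validate-regenerate loop. The helper names (validate_script, render_manim_script, regenerate_with_feedback, attach_narration) and the retry budget are illustrative assumptions, not the production API; route_topic is the gate shown in section 4.

MAX_RETRIES = 3  # illustrative retry budget, not the production value

def orchestrate(topic: str, subject: str, confidence: float) -> str:
    # Route to a domain template or the Wikipedia fallback (section 4).
    script = route_topic(topic, subject, confidence)
    for _ in range(MAX_RETRIES):
        errors = validate_script(script)               # static checks before any rendering
        if not errors:
            video_path = render_manim_script(script)   # deterministic Manim render
            return attach_narration(video_path, topic) # TTS + <100ms sync layer
        # Feed validation errors back for a constrained regeneration pass.
        script = regenerate_with_feedback(script, errors)
    raise RuntimeError(f"Animation generation failed for topic: {topic}")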
4. Confidence-Gated Routing
def route_topic(topic: str, subject: str, confidence: float):
    # High-confidence classifications use the subject's deterministic scaffold;
    # everything else goes through the Wikipedia-grounded fallback pipeline.
    if confidence >= 0.85:
        return use_domain_template(subject, topic)
    else:
        return wikipedia_fallback_pipeline(topic)
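For context, a hypothetical call site: classify_subject stands in for the subject classifier from section 3 and returns a (subject, confidence) pair that feeds the gate above.

subject, confidence = classify_subject("Projectile motion on an inclined plane")
script = route_topic("Projectile motion on an inclined plane", subject, confidence)
# confidence >= 0.85 -> domain-specific template; otherwise Wikipedia-grounded fallback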
5. Template Architecture
def _get_template_for_subject(self, subject: str) -> str:
    # Map each classified subject to its deterministic Manim scaffold;
    # unknown subjects default to the maths template.
    template_map = {
        "mathematics": MATH_TEMPLATE,
        "physics": PHYSICS_TEMPLATE,
        "chemistry": PHYSICAL_CHEMISTRY_TEMPLATE,
        "organic_chemistry": ORGANIC_CHEMISTRY_TEMPLATE,
        "computer_science": COMPUTER_SCIENCE_TEMPLATE,
    }
    return template_map.get(subject, MATH_TEMPLATE)
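For illustration only, a heavily simplified scaffold in the spirit of MATH_TEMPLATE might look like the following: the Manim structure is authored by hand, and the LLM only supplies validated values for the named slots. The production templates are richer than this sketch.

# Hypothetical, trimmed-down scaffold: the structure is fixed, and only the
# {title} and {equation_tex} slots are filled from validated LLM output.
MATH_TEMPLATE_SKETCH = '''
from manim import Scene, Text, MathTex, Write, FadeIn, UP

class GeneratedScene(Scene):
    def construct(self):
        title = Text({title!r}).to_edge(UP)
        self.play(Write(title))
        equation = MathTex({equation_tex!r})
        self.play(FadeIn(equation))
        self.wait(2)
'''

def fill_math_template(title: str, equation_tex: str) -> str:
    # Only plain, validated strings reach the format call.
    return MATH_TEMPLATE_SKETCH.format(title=title, equation_tex=equation_tex)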
6. Results
- 99.4% success rate (up from ~40% with pure generation)
- P95 latency: 85 seconds for a fully rendered, narrated video
- Deployed across PCM subjects at Edza.ai
7. Key Learnings
- A neuro-symbolic approach is more robust than pure LLM generation for structured outputs
- Confidence-gated routing prevents template mismatch failures
- Isolating failure boundaries enables independent component testing
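As an example of the last point, the routing boundary can be exercised on its own. This is a hypothetical pytest-style sketch, not the project's actual test suite; the pipeline module name and the stubbed helpers are assumptions.

import pipeline  # hypothetical module exposing route_topic and its helpers

def test_high_confidence_uses_domain_template(monkeypatch):
    # Stub the template path so only the confidence gate is under test.
    monkeypatch.setattr(pipeline, "use_domain_template", lambda subject, topic: "template")
    assert pipeline.route_topic("Bayes' theorem", "mathematics", 0.92) == "template"

def test_low_confidence_falls_back_to_wikipedia(monkeypatch):
    # Stub the fallback path to verify low-confidence topics are rerouted.
    monkeypatch.setattr(pipeline, "wikipedia_fallback_pipeline", lambda topic: "fallback")
    assert pipeline.route_topic("obscure topic", "mathematics", 0.40) == "fallback"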
This project was built at NatrajX — an AI/IT engineering agency.
Full engineering write-up, system architecture, and production metrics available on the agency site.