AI Notes Generator
A production-grade pipeline that converts raw curriculum data into structured, visually rich PDF textbooks using multi-layer caching and layout-aware rendering.
Client: EdTech Platform
Year: 2024
Category: Generative AI
Built at: NatrajX

Impact
Automated textbook generation from raw curriculum data
Multi-layer caching reduces repeat generation cost
Layout-aware PDF rendering with visual hierarchy
Generated PDFs stored in Google Cloud Storage and delivered via signed URLs
Key Metrics
Output: Structured PDF textbooks
Caching: Multi-layer (in-memory + GCS)
Rendering: Layout-aware, visual hierarchy
1. Problem
Creating structured, print-ready study notes from raw curriculum data requires significant editorial effort. The goal was to automate this end-to-end with consistent formatting, visual hierarchy, and layout quality.
2. Pipeline Architecture
- Content Ingestion — parse raw curriculum data via BeautifulSoup4
- Template Rendering — Jinja2 HTML templates with layout-aware structure
- PDF Engine — WeasyPrint for pixel-perfect HTML-to-PDF conversion
- Cache Layer — multi-layer caching (in-memory + GCS) to avoid redundant generation
- Storage — Google Cloud Storage with signed URL delivery
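The cache layer in the list above can be sketched as a two-level lookup: process memory first, then a slower shared store. This is only an illustration under stated assumptions, not the project's actual implementation. The names `RemoteStore`, `DictStore`, `TwoLayerCache`, and `demo` are hypothetical, and `DictStore` is an in-process stand-in for whatever GCS-backed layer the real system uses.

```python
import asyncio
from typing import Optional, Protocol


class RemoteStore(Protocol):
    """Hypothetical interface for the slower, shared layer (e.g. GCS-backed)."""

    async def get(self, key: str) -> Optional[str]: ...

    async def set(self, key: str, value: str) -> None: ...


class DictStore:
    """In-process stand-in for the remote layer, used only for this demo."""

    def __init__(self) -> None:
        self.data: dict[str, str] = {}
        self.reads = 0  # counts how often the "remote" layer is queried

    async def get(self, key: str) -> Optional[str]:
        self.reads += 1
        return self.data.get(key)

    async def set(self, key: str, value: str) -> None:
        self.data[key] = value


class TwoLayerCache:
    """Check process memory first, then fall back to the remote store."""

    def __init__(self, remote: RemoteStore) -> None:
        self._memory: dict[str, str] = {}
        self._remote = remote

    async def get(self, key: str) -> Optional[str]:
        if key in self._memory:
            return self._memory[key]
        value = await self._remote.get(key)
        if value is not None:
            # Promote remote hits so the next lookup stays in memory.
            self._memory[key] = value
        return value

    async def set(self, key: str, value: str) -> None:
        # Write through to both layers so other processes can see the entry.
        self._memory[key] = value
        await self._remote.set(key, value)


async def demo() -> tuple:
    remote = DictStore()
    await remote.set("k", "https://example.com/notes.pdf")
    cache = TwoLayerCache(remote)
    first = await cache.get("k")   # misses memory, reads the remote layer
    second = await cache.get("k")  # served from memory
    return first, second, remote.reads
```

In this sketch the second lookup never touches the remote layer. The in-memory dictionary is unbounded here; a long-running service would likely want a size limit or expiry on it.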
3. Async Generation
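import asyncio
import weasyprint

# build_cache_key, cache, gcs, and render_template are project helpers defined elsewhere.
async def generate_notes(topic: str, curriculum: dict) -> str:
    # Return the stored URL if this topic/curriculum pair was already rendered.
    cache_key = build_cache_key(topic, curriculum)
    if cached := await cache.get(cache_key):
        return cached
    html = render_template("notes.html", curriculum=curriculum)
    # WeasyPrint rendering is blocking, so run it in a worker thread
    # to keep the event loop responsive.
    pdf_bytes = await asyncio.to_thread(
        lambda: weasyprint.HTML(string=html).write_pdf()
    )
    url = await gcs.upload(pdf_bytes, cache_key)
    await cache.set(cache_key, url)
    return url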
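The write-up does not show how `build_cache_key` is implemented. One plausible approach, sketched here as an assumption, is to hash a canonical JSON serialization of the inputs so that equal inputs always map to the same key. The `notes/` prefix, the `.pdf` suffix, and the requirement that `curriculum` be JSON-serializable are illustrative choices, not details from the project.

```python
import hashlib
import json


def build_cache_key(topic: str, curriculum: dict) -> str:
    """Hypothetical key builder: a deterministic hash of the generation inputs."""
    # Sorted keys and fixed separators give identical bytes for logically
    # equal dicts, regardless of insertion order.
    payload = json.dumps(
        {"topic": topic, "curriculum": curriculum},
        sort_keys=True,
        separators=(",", ":"),
        ensure_ascii=False,
    )
    digest = hashlib.sha256(payload.encode("utf-8")).hexdigest()
    # The prefix and extension are assumptions made for this sketch.
    return f"notes/{digest}.pdf"


key_a = build_cache_key("algebra", {"unit": 1, "title": "Linear equations"})
key_b = build_cache_key("algebra", {"title": "Linear equations", "unit": 1})
key_c = build_cache_key("geometry", {"unit": 1, "title": "Linear equations"})
```

Here `key_a` and `key_b` are equal because only the key order differs, while `key_c` differs because the topic changed. Hashing the full curriculum means any content change produces a new key, so a stale PDF is not served after the source data is edited.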
4. Results
- Fully automated, consistent textbook output
- Multi-layer caching avoids re-rendering and re-uploading PDFs for repeated requests, reducing repeat generation cost
- Layout-aware rendering matches professional editorial quality
This project was built at NatrajX — an AI/IT engineering agency.