From 10,000 daily crawled pages to a streaming AI answer in under a second — with 70% lower API costs.
FundsForNGOs needed to let NGO professionals ask natural-language questions and get precise, sourced answers from a living corpus of 10,000+ grant pages updated daily.
A Rust/Axum backend with a Tokio-powered async crawler, content-hash-based deduplication, pgvector HNSW semantic search, and a WordPress→JWT→Rust auth bridge. Two platforms — Next.js web and Flutter mobile — served from a single API.
Build a RAG pipeline that stays accurate as grants expire daily, scales to 1,000+ concurrent users, enforces subscription tiers at the API layer, and does it at a fraction of what a naive LLM integration would cost.
Tokio-powered async crawler ingesting 10,000+ grant pages daily. Content-based hashing skips unchanged pages entirely.
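The skip logic can be sketched as follows. This is a minimal stand-in, not the production crawler: it uses the stdlib `DefaultHasher` (production would want a stable digest such as SHA-256, since `DefaultHasher` output is only stable within one process run), and the hash store is an in-memory map rather than the real database.

```rust
use std::collections::hash_map::DefaultHasher;
use std::collections::HashMap;
use std::hash::{Hash, Hasher};

/// Hash the normalized page body; identical content yields an identical hash,
/// so an unchanged page is detected without re-chunking or re-embedding it.
fn content_hash(body: &str) -> u64 {
    let mut h = DefaultHasher::new();
    body.trim().hash(&mut h);
    h.finish()
}

/// Returns true if the page is new or changed since the last crawl
/// (and records the new hash); false means the crawler can skip it entirely.
fn should_process(store: &mut HashMap<String, u64>, url: &str, body: &str) -> bool {
    let h = content_hash(body);
    match store.insert(url.to_string(), h) {
        Some(prev) if prev == h => false, // unchanged: skip downstream work
        _ => true,                        // new or changed: chunk and embed
    }
}
```

A second crawl of the same URL with identical content returns `false` from `should_process`, which is what lets the crawler visit 10,000+ pages while only paying embedding costs for the pages that actually changed.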
Chunks fingerprinted and compared before embedding. Semantically duplicate content collapsed to one canonical embedding, cutting API calls ~45%.
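A sketch of that fingerprint-then-collapse step, under the assumption that "semantically duplicate" is approximated by whitespace- and case-insensitive equality (the real pipeline may use a stronger normalization):

```rust
use std::collections::hash_map::DefaultHasher;
use std::collections::HashMap;
use std::hash::{Hash, Hasher};

/// Normalize whitespace and case so near-identical boilerplate chunks
/// (shared footers, repeated eligibility text) collapse to one fingerprint.
fn fingerprint(chunk: &str) -> u64 {
    let norm = chunk
        .split_whitespace()
        .collect::<Vec<_>>()
        .join(" ")
        .to_lowercase();
    let mut h = DefaultHasher::new();
    norm.hash(&mut h);
    h.finish()
}

/// Keep only chunks whose fingerprint has not been embedded yet; duplicates
/// map to the canonical embedding id already recorded in `seen`.
fn dedupe_chunks<'a>(seen: &mut HashMap<u64, usize>, chunks: &[&'a str]) -> Vec<&'a str> {
    let mut fresh = Vec::new();
    for &c in chunks {
        let fp = fingerprint(c);
        if !seen.contains_key(&fp) {
            let canonical_id = seen.len(); // slot of the canonical embedding
            seen.insert(fp, canonical_id);
            fresh.push(c);
        }
    }
    fresh
}
```

Only the chunks returned by `dedupe_chunks` hit the embedding API; everything else reuses an existing canonical vector.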
HNSW approximate nearest-neighbour search with re-ranking by recency and source authority before LLM context assembly.
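The re-ranking stage can be illustrated like this. The scoring weights below (30-day recency decay, a floor of 0.1, a 0.5–1.0 authority band) are assumptions for the sketch, not the production values:

```rust
struct Hit {
    id: u32,
    similarity: f32, // cosine similarity returned by the ANN index
    age_days: f32,   // days since the grant page was last updated
    authority: f32,  // 0.0..=1.0 source-authority prior
}

/// Blend ANN similarity with a recency decay and a source-authority boost,
/// then sort descending so the freshest authoritative hits lead the context.
fn rerank(mut hits: Vec<Hit>) -> Vec<Hit> {
    let score = |h: &Hit| {
        h.similarity
            * (-h.age_days / 30.0).exp().max(0.1) // stale grants decay, floored
            * (0.5 + 0.5 * h.authority)           // authority scales 0.5..=1.0
    };
    // All scores are finite, so partial_cmp never fails here.
    hits.sort_by(|a, b| score(b).partial_cmp(&score(a)).unwrap());
    hits
}
```

With this blend, a slightly-less-similar but fresh, authoritative page outranks a stale near-duplicate, which matters for a corpus where grants expire daily.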
Custom WP plugin issues short-lived JWTs. Rust /auth/callback validates, creates sessions, enforces subscription tiers server-side.
Rust API streams completions via SSE through Next.js API routes. Flutter mobile uses the same API with native streaming UI.
~70% lower OpenAI API spend versus a naive RAG pipeline, achieved through content-hash skipping at crawl time, chunk-fingerprint deduplication before embedding, and a Redis embedding cache — validated month over month in production billing dashboards.
NGO professionals use the AI assistant daily for grant discovery and drafting; autoscaling Rust + connection pooling kept p95 latency flat during campaign spikes without customer-visible outages.
Rust Axum streams completion tokens over SSE through Next.js API routes (and analogous Flutter streaming clients), so users see answers begin almost immediately after hitting send.
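The wire format behind that streaming is simple: each completion token travels as one Server-Sent Events frame (`data: <payload>` lines terminated by a blank line, with multi-line payloads repeating the `data:` prefix). A minimal formatter:

```rust
/// Format one completion token as an SSE frame. Per the SSE event-stream
/// format, each payload line is prefixed with `data: ` and the frame ends
/// with a blank line; browsers rejoin multi-line payloads with '\n'.
fn sse_frame(token: &str) -> String {
    let body: String = token.lines().map(|l| format!("data: {l}\n")).collect();
    format!("{body}\n")
}
```

In the real service, Axum writes frames like these onto the response body as tokens arrive from the LLM, which is why the first words of an answer appear before generation has finished.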
Fully automated crawl pipeline with sitemap and WordPress REST awareness — zero manual re-index jobs — so the corpus stays aligned with expiring grants without a content ops team babysitting uploads.
Chunk deduplication and content-hash skips eliminate redundant OpenAI embedding calls when pages or paragraphs are unchanged, directly lowering variable cost per crawl cycle.
Next.js web and Flutter mobile both consume the same Rust Axum API, JWT/session bridge, and rate limits — one auth and billing story instead of divergent backends per client.
Memory safety without GC pauses matters when streaming LLM responses to 1,000+ concurrent users. Axum's tower middleware gives per-route rate limiting with negligible per-request overhead.
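In production this is tower middleware, but the underlying idea is a token bucket per route. A stdlib-only sketch of that mechanism (capacity and refill rate are illustrative):

```rust
use std::time::Instant;

/// Minimal per-route token bucket; a stand-in for tower's rate-limit layer.
struct TokenBucket {
    capacity: f64,
    tokens: f64,
    refill_per_sec: f64,
    last: Instant,
}

impl TokenBucket {
    fn new(capacity: f64, refill_per_sec: f64) -> Self {
        Self { capacity, tokens: capacity, refill_per_sec, last: Instant::now() }
    }

    /// Returns true if the request is admitted, false if it should get a 429.
    fn try_acquire(&mut self) -> bool {
        let now = Instant::now();
        let elapsed = now.duration_since(self.last).as_secs_f64();
        // Refill proportionally to elapsed time, capped at capacity.
        self.tokens = (self.tokens + elapsed * self.refill_per_sec).min(self.capacity);
        self.last = now;
        if self.tokens >= 1.0 {
            self.tokens -= 1.0;
            true
        } else {
            false
        }
    }
}
```

One bucket per (route, subscription tier) pair is enough to keep a burst on `/chat` from starving the rest of the API.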
The Rust backend handles auth, crawling, embedding pipeline, vector search, and streaming — all from a single binary.
pgvector lives in the same PostgreSQL instance as user and subscription data. JOIN-based filtering is trivial. Cross-service calls eliminated.
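The shape of such a query, sketched below. Table and column names (`chunks`, `users`, `tier`, `min_tier`) are assumptions for illustration, not the real schema; the pattern is what matters — tier filtering and vector search in one round trip:

```rust
/// Illustrative pgvector query: HNSW-backed semantic search filtered by the
/// caller's subscription tier via an ordinary JOIN, in a single statement.
/// `<=>` is pgvector's cosine-distance operator; $1 is the query embedding,
/// $2 the authenticated user id.
const SEARCH_SQL: &str = r#"
SELECT c.id, c.content
FROM chunks c
JOIN users u ON u.id = $2
WHERE u.tier >= c.min_tier        -- tier gate via plain JOIN, no extra service
ORDER BY c.embedding <=> $1        -- pgvector cosine distance, HNSW index
LIMIT 10
"#;
```

Because subscription data and vectors share one PostgreSQL instance, there is no second network hop to a dedicated vector store before assembling LLM context.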
HNSW gives ~95% recall at 10× the query speed of exact kNN at this corpus size.
Content-hash crawl skipping, chunk fingerprinting deduplication, and Redis embedding cache (24h TTL) for popular query vectors.
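The cache layer of that trio behaves like the sketch below — an in-process stand-in for the Redis cache (production uses Redis with a 24h TTL; here the TTL is injectable so the expiry path is testable):

```rust
use std::collections::HashMap;
use std::time::{Duration, Instant};

/// In-memory stand-in for the Redis embedding cache: query text maps to its
/// embedding vector, and entries expire after `ttl` (24h in production).
struct EmbeddingCache {
    ttl: Duration,
    entries: HashMap<String, (Instant, Vec<f32>)>,
}

impl EmbeddingCache {
    fn new(ttl: Duration) -> Self {
        Self { ttl, entries: HashMap::new() }
    }

    /// A hit only counts if the entry is still within its TTL.
    fn get(&self, key: &str) -> Option<&Vec<f32>> {
        self.entries
            .get(key)
            .and_then(|(stored, v)| (stored.elapsed() < self.ttl).then_some(v))
    }

    fn put(&mut self, key: &str, embedding: Vec<f32>) {
        self.entries.insert(key.to_string(), (Instant::now(), embedding));
    }
}
```

A cache hit means a popular query skips the embedding API entirely; a miss falls through to OpenAI and repopulates the entry.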
Most RAG pipelines re-embed everything every crawl. Incremental approach cut our bill from ~$2,400/mo to ~$720/mo.
Rather than migrating auth — a 6-month project — a custom WP plugin issues a JWT, which Rust validates before creating a server-side session and mapping the WP subscription tier to API permissions.
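The tier-to-permissions mapping at the end of that bridge can be sketched like this. Tier names and limits are illustrative, not the real plan structure; the key property is that unknown claims fail closed:

```rust
/// Subscription tiers carried in the WordPress-issued JWT claim.
#[derive(Debug, PartialEq, Clone, Copy)]
enum Tier {
    Free,
    Pro,
    Enterprise,
}

/// API-layer permissions derived from the tier (illustrative values).
struct Limits {
    queries_per_day: u32,
    streaming: bool,
}

fn limits_for(tier: Tier) -> Limits {
    match tier {
        Tier::Free => Limits { queries_per_day: 10, streaming: false },
        Tier::Pro => Limits { queries_per_day: 200, streaming: true },
        Tier::Enterprise => Limits { queries_per_day: u32::MAX, streaming: true },
    }
}

/// Parse the tier claim from the validated JWT; anything unrecognized
/// fails closed to the lowest tier rather than erroring open.
fn tier_from_claim(claim: &str) -> Tier {
    match claim {
        "pro" => Tier::Pro,
        "enterprise" => Tier::Enterprise,
        _ => Tier::Free,
    }
}
```

Because this mapping lives in Rust, a tampered or stale claim can never grant more than the server decides, which is what makes the frontend's ignorance of subscription limits safe.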
Session cookies are HttpOnly and bound to the Rust session store. Next.js frontend has no knowledge of subscription limits.
We take on a small number of projects at a time. If the problem is hard, we're interested.