From 10,000 daily crawled pages to a streaming AI answer in under a second — with 70% lower API costs.
FundsForNGOs needed to let NGO professionals ask natural-language questions and get precise, sourced answers from a living corpus of 10,000+ grant pages updated daily.
A Rust/Axum backend with a Tokio-powered async crawler, content-hash-based deduplication, pgvector HNSW semantic search, and a WordPress→JWT→Rust auth bridge. Two platforms — Next.js web and Flutter mobile — served from a single API.
Build a RAG pipeline that stays accurate as grants expire daily, scales to 1,000+ concurrent users, enforces subscription tiers at the API layer, and does it at a fraction of what a naive LLM integration would cost.
Tokio-powered async crawler ingesting 10,000+ grant pages daily. Content-based hashing skips unchanged pages entirely.
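The skip logic can be sketched as follows. This is a minimal stand-in, not the production crawler: it uses the stdlib `DefaultHasher` (production would want a stable digest such as SHA-256, since `DefaultHasher` output is only stable within one process run), and the hash store is an in-memory map rather than the real database.

```rust
use std::collections::hash_map::DefaultHasher;
use std::collections::HashMap;
use std::hash::{Hash, Hasher};

/// Hash the normalized page body; identical content yields an identical hash,
/// so an unchanged page is detected without re-chunking or re-embedding it.
fn content_hash(body: &str) -> u64 {
    let mut h = DefaultHasher::new();
    body.trim().hash(&mut h);
    h.finish()
}

/// Returns true if the page is new or changed since the last crawl
/// (and records the new hash); false means the crawler can skip it entirely.
fn should_process(store: &mut HashMap<String, u64>, url: &str, body: &str) -> bool {
    let h = content_hash(body);
    match store.insert(url.to_string(), h) {
        Some(prev) if prev == h => false, // unchanged: skip downstream work
        _ => true,                        // new or changed: chunk and embed
    }
}
```

A second crawl of the same URL with identical content returns `false` from `should_process`, which is what lets the crawler visit 10,000+ pages while only paying embedding costs for the pages that actually changed.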
Chunks fingerprinted and compared before embedding. Semantically duplicate content collapsed to one canonical embedding, cutting API calls ~45%.
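A sketch of that fingerprint-then-collapse step, under the assumption that "semantically duplicate" is approximated by whitespace- and case-insensitive equality (the real pipeline may use a stronger normalization):

```rust
use std::collections::hash_map::DefaultHasher;
use std::collections::HashMap;
use std::hash::{Hash, Hasher};

/// Normalize whitespace and case so near-identical boilerplate chunks
/// (shared footers, repeated eligibility text) collapse to one fingerprint.
fn fingerprint(chunk: &str) -> u64 {
    let norm = chunk
        .split_whitespace()
        .collect::<Vec<_>>()
        .join(" ")
        .to_lowercase();
    let mut h = DefaultHasher::new();
    norm.hash(&mut h);
    h.finish()
}

/// Keep only chunks whose fingerprint has not been embedded yet; duplicates
/// map to the canonical embedding id already recorded in `seen`.
fn dedupe_chunks<'a>(seen: &mut HashMap<u64, usize>, chunks: &[&'a str]) -> Vec<&'a str> {
    let mut fresh = Vec::new();
    for &c in chunks {
        let fp = fingerprint(c);
        if !seen.contains_key(&fp) {
            let canonical_id = seen.len(); // slot of the canonical embedding
            seen.insert(fp, canonical_id);
            fresh.push(c);
        }
    }
    fresh
}
```

Only the chunks returned by `dedupe_chunks` hit the embedding API; everything else reuses an existing canonical vector.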
HNSW approximate nearest-neighbour search with re-ranking by recency and source authority before LLM context assembly.
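The re-ranking stage can be illustrated like this. The scoring weights below (30-day recency decay, a floor of 0.1, a 0.5–1.0 authority band) are assumptions for the sketch, not the production values:

```rust
struct Hit {
    id: u32,
    similarity: f32, // cosine similarity returned by the ANN index
    age_days: f32,   // days since the grant page was last updated
    authority: f32,  // 0.0..=1.0 source-authority prior
}

/// Blend ANN similarity with a recency decay and a source-authority boost,
/// then sort descending so the freshest authoritative hits lead the context.
fn rerank(mut hits: Vec<Hit>) -> Vec<Hit> {
    let score = |h: &Hit| {
        h.similarity
            * (-h.age_days / 30.0).exp().max(0.1) // stale grants decay, floored
            * (0.5 + 0.5 * h.authority)           // authority scales 0.5..=1.0
    };
    // All scores are finite, so partial_cmp never fails here.
    hits.sort_by(|a, b| score(b).partial_cmp(&score(a)).unwrap());
    hits
}
```

With this blend, a slightly-less-similar but fresh, authoritative page outranks a stale near-duplicate, which matters for a corpus where grants expire daily.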
Custom WP plugin issues short-lived JWTs. Rust /auth/callback validates, creates sessions, enforces subscription tiers server-side.
Rust API streams completions via SSE through Next.js API routes. Flutter mobile uses the same API with native streaming UI.
~70% lower OpenAI API spend versus a naive RAG pipeline, achieved through content-hash skipping at crawl time, chunk-fingerprint deduplication before embedding, and a Redis embedding cache — validated month over month in production billing dashboards.
NGO professionals use the AI assistant daily for grant discovery and drafting; autoscaling Rust + connection pooling kept p95 latency flat during campaign spikes without customer-visible outages.
Rust Axum streams completion tokens over SSE through Next.js API routes (and analogous Flutter streaming clients), so users see answers begin almost immediately after hitting send.
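The wire format behind that streaming is simple: each completion token travels as one Server-Sent Events frame (`data: <payload>` lines terminated by a blank line, with multi-line payloads repeating the `data:` prefix). A minimal formatter:

```rust
/// Format one completion token as an SSE frame. Per the SSE event-stream
/// format, each payload line is prefixed with `data: ` and the frame ends
/// with a blank line; browsers rejoin multi-line payloads with '\n'.
fn sse_frame(token: &str) -> String {
    let body: String = token.lines().map(|l| format!("data: {l}\n")).collect();
    format!("{body}\n")
}
```

In the real service, Axum writes frames like these onto the response body as tokens arrive from the LLM, which is why the first words of an answer appear before generation has finished.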
Fully automated crawl pipeline with sitemap and WordPress REST awareness — zero manual re-index jobs — so the corpus stays aligned with expiring grants without a content ops team babysitting uploads.
Chunk deduplication and content-hash skips eliminate redundant OpenAI embedding calls when pages or paragraphs are unchanged, directly lowering variable cost per crawl cycle.
Next.js web and Flutter mobile both consume the same Rust Axum API, JWT/session bridge, and rate limits — one auth and billing story instead of divergent backends per client.
Memory safety without GC pauses matters when streaming LLM responses to 1,000+ concurrent users. Axum's tower middleware gives per-route rate limiting with negligible per-request overhead.
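In production this is tower middleware, but the underlying idea is a token bucket per route. A stdlib-only sketch of that mechanism (capacity and refill rate are illustrative):

```rust
use std::time::Instant;

/// Minimal per-route token bucket; a stand-in for tower's rate-limit layer.
struct TokenBucket {
    capacity: f64,
    tokens: f64,
    refill_per_sec: f64,
    last: Instant,
}

impl TokenBucket {
    fn new(capacity: f64, refill_per_sec: f64) -> Self {
        Self { capacity, tokens: capacity, refill_per_sec, last: Instant::now() }
    }

    /// Returns true if the request is admitted, false if it should get a 429.
    fn try_acquire(&mut self) -> bool {
        let now = Instant::now();
        let elapsed = now.duration_since(self.last).as_secs_f64();
        // Refill proportionally to elapsed time, capped at capacity.
        self.tokens = (self.tokens + elapsed * self.refill_per_sec).min(self.capacity);
        self.last = now;
        if self.tokens >= 1.0 {
            self.tokens -= 1.0;
            true
        } else {
            false
        }
    }
}
```

One bucket per (route, subscription tier) pair is enough to keep a burst on `/chat` from starving the rest of the API.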
The Rust backend handles auth, crawling, embedding pipeline, vector search, and streaming — all from a single binary.
pgvector lives in the same PostgreSQL instance as user and subscription data. JOIN-based filtering is trivial. Cross-service calls eliminated.
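The shape of such a query, sketched below. Table and column names (`chunks`, `users`, `tier`, `min_tier`) are assumptions for illustration, not the real schema; the pattern is what matters — tier filtering and vector search in one round trip:

```rust
/// Illustrative pgvector query: HNSW-backed semantic search filtered by the
/// caller's subscription tier via an ordinary JOIN, in a single statement.
/// `<=>` is pgvector's cosine-distance operator; $1 is the query embedding,
/// $2 the authenticated user id.
const SEARCH_SQL: &str = r#"
SELECT c.id, c.content
FROM chunks c
JOIN users u ON u.id = $2
WHERE u.tier >= c.min_tier        -- tier gate via plain JOIN, no extra service
ORDER BY c.embedding <=> $1        -- pgvector cosine distance, HNSW index
LIMIT 10
"#;
```

Because subscription data and vectors share one PostgreSQL instance, there is no second network hop to a dedicated vector store before assembling LLM context.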
HNSW gives ~95% recall at 10× the query speed of exact kNN at this corpus size.
Content-hash crawl skipping, chunk fingerprinting deduplication, and Redis embedding cache (24h TTL) for popular query vectors.
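The cache layer of that trio behaves like the sketch below — an in-process stand-in for the Redis cache (production uses Redis with a 24h TTL; here the TTL is injectable so the expiry path is testable):

```rust
use std::collections::HashMap;
use std::time::{Duration, Instant};

/// In-memory stand-in for the Redis embedding cache: query text maps to its
/// embedding vector, and entries expire after `ttl` (24h in production).
struct EmbeddingCache {
    ttl: Duration,
    entries: HashMap<String, (Instant, Vec<f32>)>,
}

impl EmbeddingCache {
    fn new(ttl: Duration) -> Self {
        Self { ttl, entries: HashMap::new() }
    }

    /// A hit only counts if the entry is still within its TTL.
    fn get(&self, key: &str) -> Option<&Vec<f32>> {
        self.entries
            .get(key)
            .and_then(|(stored, v)| (stored.elapsed() < self.ttl).then_some(v))
    }

    fn put(&mut self, key: &str, embedding: Vec<f32>) {
        self.entries.insert(key.to_string(), (Instant::now(), embedding));
    }
}
```

A cache hit means a popular query skips the embedding API entirely; a miss falls through to OpenAI and repopulates the entry.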
Most RAG pipelines re-embed everything every crawl. Incremental approach cut our bill from ~$2,400/mo to ~$720/mo.
Rather than migrating auth — a 6-month project — a custom WP plugin issues a JWT, which Rust validates before creating a server-side session and mapping the WP subscription tier to API permissions.
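The tier-to-permissions mapping at the end of that bridge can be sketched like this. Tier names and limits are illustrative, not the real plan structure; the key property is that unknown claims fail closed:

```rust
/// Subscription tiers carried in the WordPress-issued JWT claim.
#[derive(Debug, PartialEq, Clone, Copy)]
enum Tier {
    Free,
    Pro,
    Enterprise,
}

/// API-layer permissions derived from the tier (illustrative values).
struct Limits {
    queries_per_day: u32,
    streaming: bool,
}

fn limits_for(tier: Tier) -> Limits {
    match tier {
        Tier::Free => Limits { queries_per_day: 10, streaming: false },
        Tier::Pro => Limits { queries_per_day: 200, streaming: true },
        Tier::Enterprise => Limits { queries_per_day: u32::MAX, streaming: true },
    }
}

/// Parse the tier claim from the validated JWT; anything unrecognized
/// fails closed to the lowest tier rather than erroring open.
fn tier_from_claim(claim: &str) -> Tier {
    match claim {
        "pro" => Tier::Pro,
        "enterprise" => Tier::Enterprise,
        _ => Tier::Free,
    }
}
```

Because this mapping lives in Rust, a tampered or stale claim can never grant more than the server decides, which is what makes the frontend's ignorance of subscription limits safe.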
Session cookies are HttpOnly and bound to the Rust session store. Next.js frontend has no knowledge of subscription limits.
We take on a small number of projects at a time. If the problem is hard, we're interested.