85% · validation accuracy · manually reviewed samples
Node+React · frontend/backend · async architecture
Gemini+TF · ai models · multi-model pipeline
MongoDB · persistence · document store
architecture evolution
V1 → V2: how the design changed
V1 — INITIAL DESIGN
Synchronous Validation
Single HTTP request through the full AI pipeline. Simple to implement. Gemini + TF inference took 3–8s per receipt — worked on single uploads, broke under concurrent load.
WHAT BROKE
Timeout Failures
Concurrent uploads exhausted HTTP workers. Requests queued behind AI processing. Frontend timed out with no recovery path. Inconsistent completion under realistic load conditions.
V2 — CURRENT
Async Job Orchestration
Upload returns job_id immediately. Validation runs in async worker. Frontend polls for completion. HTTP response decoupled from AI processing — stable regardless of inference duration.
system architecture
How the system is structured
The system separates concerns across three distinct layers: a React frontend for document capture and results display, a Node.js API layer handling routing, session management, and async job orchestration, and a Python validation backend that runs the Gemini + TensorFlow pipeline.
The key architectural decision was treating validation as an async job rather than a synchronous request. Long-running AI inference would time out in a synchronous HTTP model — the queue-based approach lets the frontend poll for results while the backend processes at its own pace.
presentation layer
React UI → upload_receipt() → job_id returned → poll_result()
api / orchestration layer
Node.js API → job_queue → validation_worker → result_store
ai validation pipeline
receipt_image → Gemini vision → TF classifier → confidence_score → validation_result
persistence layer
MongoDB ← stores → receipts · job_results · validation_history · user_sessions
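The hand-off between the API layer and the validation worker can be sketched in a few lines. This is a minimal in-process illustration using asyncio, not the project's actual Node.js orchestration code: `JobStore`, `worker`, `fake_validate`, and `demo` are hypothetical names, and the in-memory dict stands in for both the queue and the result store.

```python
import asyncio
import itertools
from typing import Any, Awaitable, Callable, Dict


class JobStore:
    """In-memory job registry plus queue (hypothetical stand-in for job_queue and result_store)."""

    def __init__(self) -> None:
        self.jobs: Dict[str, Dict[str, Any]] = {}
        self.queue: "asyncio.Queue[str]" = asyncio.Queue()
        self._ids = itertools.count(1)

    def submit(self, receipt_ref: str) -> Dict[str, Any]:
        """Register a job and return immediately, before any AI work runs."""
        job_id = f"job_{next(self._ids)}"
        self.jobs[job_id] = {"job_id": job_id, "status": "queued", "receipt": receipt_ref}
        self.queue.put_nowait(job_id)
        return {"job_id": job_id, "status": "queued", "poll_url": f"/jobs/{job_id}"}

    def get(self, job_id: str) -> Dict[str, Any]:
        """What a polling endpoint would return for this job."""
        return self.jobs[job_id]


async def worker(store: JobStore, validate: Callable[[str], Awaitable[Dict[str, Any]]]) -> None:
    """Pull jobs off the queue and record their outcome, one at a time."""
    while True:
        job_id = await store.queue.get()
        job = store.jobs[job_id]
        job["status"] = "processing"
        try:
            job.update(await validate(job["receipt"]))
            job["status"] = "complete"
        except Exception as exc:  # keep the worker alive; surface the failure on the job
            job["status"] = "failed"
            job["error"] = str(exc)
        finally:
            store.queue.task_done()


async def fake_validate(receipt_ref: str) -> Dict[str, Any]:
    """Placeholder for the slow Gemini + TF pipeline."""
    await asyncio.sleep(0.01)
    return {"valid": True, "score": 0.91}


async def demo() -> Dict[str, Any]:
    store = JobStore()
    task = asyncio.create_task(worker(store, fake_validate))
    response = store.submit("receipts/example.jpg")
    assert response["status"] == "queued"  # the caller is unblocked here
    await store.queue.join()  # wait until the worker has finished the job
    task.cancel()
    return store.get(response["job_id"])
```

Running `asyncio.run(demo())` returns the finished job record. The point of the shape is that `submit` never awaits the pipeline, so response time is independent of inference time.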
request lifecycle
Step-by-step validation flow
STEP 01
Document Capture
User uploads receipt via React UI. Frontend validates file type and size client-side, then sends a multipart form to the Node.js API, which returns a job_id immediately.
STEP 02
Queue + Worker Dispatch
Node.js API writes job to the queue with receipt reference. Validation worker picks it up, decoupling HTTP response time from AI processing time.
STEP 03
Gemini Vision Extraction
Google Gemini Vision extracts structured data from the receipt image: vendor, date, amounts, line items. Output is a typed JSON object for downstream validation.
STEP 04
TensorFlow Classification
TF classifier validates the extracted data against known fraud patterns. Outputs a confidence score [0–1] alongside flagged anomalies and reasoning.
STEP 05
Result Persistence
Validation result, confidence score, extracted fields, and raw receipt reference are written to MongoDB. Job status updated. TTL index cleans up raw uploads.
STEP 06
Frontend Poll → Display
React frontend polls the job endpoint every 2s. On completion, renders the validation result: confidence score, extracted fields, any flagged anomalies.
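The polling step can be sketched as a small loop. This is an illustration in Python rather than the React code itself: `poll_until_done` and its `fetch` callback are hypothetical, and the 60-second timeout is an assumption (the source only specifies the 2s interval).

```python
import time
from typing import Any, Callable, Dict

TERMINAL_STATES = {"complete", "failed"}


def poll_until_done(
    fetch: Callable[[str], Dict[str, Any]],
    job_id: str,
    interval_s: float = 2.0,   # polling interval from the write-up
    timeout_s: float = 60.0,   # assumed upper bound, not from the source
    sleep: Callable[[float], None] = time.sleep,
    clock: Callable[[], float] = time.monotonic,
) -> Dict[str, Any]:
    """Call fetch(job_id) until the job reaches a terminal state or the timeout passes."""
    deadline = clock() + timeout_s
    while True:
        job = fetch(job_id)
        if job["status"] in TERMINAL_STATES:
            return job
        if clock() >= deadline:
            raise TimeoutError(f"{job_id} still {job['status']} after {timeout_s}s")
        sleep(interval_s)
```

Injecting `sleep` and `clock` keeps the loop testable without real waiting. In a real client, `fetch` would wrap the GET request to the job endpoint.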
api payload examples
What the backend actually returns
# POST /validate — upload receipt, get job_id immediately
{ "job_id": "job_1847", "status": "queued", "poll_url": "/jobs/job_1847" }
# GET /jobs/:id — mid-processing
{
  "job_id": "job_1847",
  "status": "processing",
  "stage": "tf_classify",
  "progress": 0.72
}
# GET /jobs/:id — on completion
{
  "job_id": "job_1847",
  "status": "complete",
  "valid": true,
  "score": 0.91,
  "extracted": { "vendor": "Shoppers Drug Mart", "total": 24.99, "date": "2026-05-12" },
  "flags": []
}
validation pipeline core
How the pipeline is orchestrated
async def validate_receipt(image_path: str) -> ValidationResult:
    # Stage 1: extract structured data via Gemini vision
    extracted = await gemini_extract(image_path)
    # returns: { vendor, date, total, line_items, currency }

    # Stage 2: classify against fraud patterns
    confidence, flags = tf_classify(extracted)
    # trained on real-world receipt samples

    # Stage 3: business rule validation
    rule_result = validate_rules(extracted)
    # checks: date sanity, total == sum(items), currency consistent

    return ValidationResult(
        valid=confidence > 0.75 and rule_result.passed,
        score=confidence,
        extracted=extracted,
        flags=flags,
    )
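The three checks named in Stage 3 could look roughly like the sketch below. It is one possible reading of "date sanity, total == sum(items), currency consistent", not the project's actual `validate_rules`. The `amount` key on line items, the `RuleResult.failures` list, the failure labels, and the one-cent tolerance are all assumptions.

```python
from dataclasses import dataclass, field
from datetime import date
from typing import Any, Dict, List, Optional


@dataclass
class RuleResult:
    passed: bool
    failures: List[str] = field(default_factory=list)


def validate_rules(
    extracted: Dict[str, Any],
    today: Optional[date] = None,
    tolerance: float = 0.01,  # assumed rounding tolerance
) -> RuleResult:
    today = today or date.today()
    failures: List[str] = []

    # Date sanity: must parse as ISO (YYYY-MM-DD) and must not be in the future.
    try:
        if date.fromisoformat(extracted["date"]) > today:
            failures.append("date_in_future")
    except (KeyError, TypeError, ValueError):
        failures.append("date_unreadable")

    # Total must equal the sum of line items (skipped when no items were extracted).
    items = [i for i in (extracted.get("line_items") or []) if isinstance(i, dict)]
    try:
        item_sum = sum(float(i["amount"]) for i in items)
        if items and abs(item_sum - float(extracted["total"])) > tolerance:
            failures.append("total_mismatch")
    except (KeyError, TypeError, ValueError):
        failures.append("amounts_unreadable")

    # Currency: any per-item currency must match the receipt-level currency.
    currency = extracted.get("currency")
    if any(i.get("currency", currency) != currency for i in items):
        failures.append("currency_mismatch")

    return RuleResult(passed=not failures, failures=failures)
```

Returning named failures rather than a bare boolean makes it easier to surface them next to the classifier's flags.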
pipeline execution log
What observability looks like in practice
# structured log output — single validation job
[2026-05-12T14:33:01Z] job_id=job_1847 event=queued
[2026-05-12T14:33:01Z] job_id=job_1847 event=worker_pickup queue_wait=43ms
[2026-05-12T14:33:02Z] job_id=job_1847 event=preprocess_start
[2026-05-12T14:33:02Z] job_id=job_1847 event=preprocess_complete duration=94ms
[2026-05-12T14:33:02Z] job_id=job_1847 event=gemini_start
[2026-05-12T14:33:04Z] job_id=job_1847 event=gemini_complete duration=1842ms fields=6
[2026-05-12T14:33:04Z] job_id=job_1847 event=tf_classify_start
[2026-05-12T14:33:04Z] job_id=job_1847 event=tf_classify_complete duration=312ms score=0.91
[2026-05-12T14:33:04Z] job_id=job_1847 event=rule_check_complete passed=true
[2026-05-12T14:33:04Z] job_id=job_1847 event=result_written total=2291ms valid=true
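Because every line follows the same `[timestamp] key=value ...` shape, per-stage timings can be pulled out with a few lines of parsing. The sketch below assumes exactly the format shown in the log above. `parse_log_line` and `stage_durations_ms` are hypothetical helper names, not part of the project.

```python
import re
from typing import Dict, Iterable

LINE_RE = re.compile(r"^\[(?P<ts>[^\]]+)\]\s+(?P<rest>.*)$")


def parse_log_line(line: str) -> Dict[str, str]:
    """Split one structured log line into a dict of string fields (plus 'ts')."""
    match = LINE_RE.match(line.strip())
    if match is None:
        raise ValueError(f"unrecognised log line: {line!r}")
    fields = {"ts": match.group("ts")}
    for token in match.group("rest").split():
        key, sep, value = token.partition("=")
        if sep:  # ignore tokens that are not key=value pairs
            fields[key] = value
    return fields


def stage_durations_ms(lines: Iterable[str]) -> Dict[str, int]:
    """Map stage name to duration in ms for every line carrying a duration= field."""
    durations: Dict[str, int] = {}
    for line in lines:
        fields = parse_log_line(line)
        if "duration" not in fields:
            continue
        value = fields["duration"]
        if not value.endswith("ms"):
            raise ValueError(f"expected a millisecond duration, got {value!r}")
        event = fields.get("event", "")
        stage = event[: -len("_complete")] if event.endswith("_complete") else event
        durations[stage] = int(value[:-2])
    return durations
```

Applied to the job above, this yields the preprocess, gemini, and tf_classify timings, which makes it obvious that extraction dominates the total.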
architecture tradeoffs
Design decisions and why
decision
Polling vs. WebSockets for result delivery
Chose HTTP polling (every 2s) over WebSockets. For a low-concurrency validation tool, maintaining persistent connections adds infrastructure complexity without meaningful UX benefit. Polling is simpler to implement, debug, and scale independently from the validation backend.
↳ Trade-off: minor result latency vs. operational simplicity — acceptable at this scale
decision
MongoDB vs. Postgres for document persistence
Chose MongoDB because the validation result schema evolved during development — flexible document storage avoided migrations while the data shape was still changing. Postgres would be preferable for a stable schema with relational queries, but schema iteration speed during prototyping made MongoDB the better fit.
↳ Trade-off: schema flexibility over relational guarantees — schema versioning used to mitigate
decision
Gemini extraction before TF classification
Gemini runs first to extract structured data (vendor, date, totals, line items) from the raw image. The TF classifier then operates on structured JSON fields rather than raw pixels — a smaller, faster model running on typed data reduces inference cost and makes flagged anomalies easier to reason about.
↳ Result: clean stage separation, structured intermediate representation, faster classification
constraints faced
What the system had to work around
constraint
Gemini API latency variability
Extraction time ranged from ~800ms to 6s depending on image complexity, OCR difficulty, and server load. This variance made synchronous HTTP completely impractical and was the direct driver of the async architecture decision.
constraint
No labeled dataset — manual validation agreement
The ~85% figure represents agreement measured against manually reviewed receipt samples, not a pre-labeled benchmark. Limited data and no ground-truth labeling pipeline meant the metric reflects observed consistency under real-world conditions, not academic benchmark performance.
constraint
Local compute — no distributed inference
The inference service ran locally throughout development. Concurrent upload testing was limited by available compute. Current design optimized for correctness and architecture clarity; worker scaling and queue persistence are next-iteration problems.
engineering challenges
Problems solved
problem
AI extraction and validation pipeline exceeding synchronous request limits
Combined AI pipeline took 3–8 seconds per receipt. Synchronous HTTP would exceed client timeouts and create poor UX. Solution: async job queue — Node.js returns job_id immediately, React polls for completion, backend processes at its own pace.
↳ HTTP response time: <100ms. AI pipeline: runs async, no timeout risk.
problem
Receipt image quality variance degrading OCR accuracy
Receipts vary wildly: thermal paper, photos, scans, low contrast. Implemented a preprocessing step before Gemini — contrast normalisation, deskew, and resolution upscale — which improved OCR reliability and moved baseline agreement from ~62% to ~85%.
↳ Validation agreement: ~62% → ~85% after preprocessing layer addition.
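To make the preprocessing idea concrete, here is a dependency-free sketch of two of the three steps, contrast normalisation and upscaling, on a grayscale image held as nested lists. It is illustrative only: the real layer presumably used an imaging library, deskew is omitted because it needs angle estimation, and `stretch_contrast` and `upscale_nearest` are hypothetical names.

```python
from typing import List

Image = List[List[int]]  # grayscale pixels, 0-255, row-major


def stretch_contrast(img: Image) -> Image:
    """Linearly rescale pixel values so the darkest maps to 0 and the brightest to 255."""
    lo = min(min(row) for row in img)
    hi = max(max(row) for row in img)
    if hi == lo:  # flat image: nothing to stretch
        return [row[:] for row in img]
    scale = 255 / (hi - lo)
    return [[round((p - lo) * scale) for p in row] for row in img]


def upscale_nearest(img: Image, factor: int) -> Image:
    """Enlarge the image by an integer factor using nearest-neighbour repetition."""
    if factor < 1:
        raise ValueError("factor must be >= 1")
    out: Image = []
    for row in img:
        wide = [p for p in row for _ in range(factor)]
        out.extend(list(wide) for _ in range(factor))
    return out
```

A faded thermal receipt typically occupies a narrow band of gray values. Stretching that band across the full range is the kind of normalisation that gives the extraction model more signal to work with.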
problem
MongoDB document schema evolving mid-development
Validation result schema changed twice during development. Used explicit version fields on every MongoDB document and wrote a migration function run on startup — zero downtime schema evolution without breaking existing stored results.
↳ Zero data loss across 3 schema versions. Backward-compatible reads.
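The version-field approach can be sketched as a chain of small upgrade functions. The concrete field changes below (renaming `confidence` to `score`, adding `flags`) are invented purely for illustration, since the write-up does not say what changed between versions. `schema_version`, `MIGRATIONS`, and `upgrade` are hypothetical names.

```python
from typing import Any, Callable, Dict

CURRENT_VERSION = 3  # the write-up mentions three schema versions

Doc = Dict[str, Any]


def _v1_to_v2(doc: Doc) -> Doc:
    out = dict(doc)
    out["score"] = out.pop("confidence", None)  # illustrative rename
    out["schema_version"] = 2
    return out


def _v2_to_v3(doc: Doc) -> Doc:
    out = dict(doc)
    out.setdefault("flags", [])  # illustrative new field with a safe default
    out["schema_version"] = 3
    return out


MIGRATIONS: Dict[int, Callable[[Doc], Doc]] = {1: _v1_to_v2, 2: _v2_to_v3}


def upgrade(doc: Doc) -> Doc:
    """Return a copy of doc upgraded to CURRENT_VERSION, one step at a time."""
    version = doc.get("schema_version", 1)  # unversioned documents are treated as v1
    if version > CURRENT_VERSION:
        raise ValueError(f"document is newer than this code: v{version}")
    while version < CURRENT_VERSION:
        doc = MIGRATIONS[version](doc)
        version = doc["schema_version"]
    return doc
```

Each step copies the document instead of mutating it, so a failed migration leaves the stored original untouched. Running `upgrade` on an already-current document is a no-op, which is what makes it safe to run on every startup.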
deployment layout & edge cases
How it runs and what it handles
DEPLOYMENT
Service breakdown
/frontend → React UI
/api → Node.js orchestration
/inference → Python validation service
/workers → async job processors
docker-compose.yml
EDGE CASES HANDLED
What the system accounts for
corrupt or unsupported upload formats · duplicate receipt submissions · malformed Gemini extraction output · missing vendor or total fields · TTL-based cleanup of raw upload files · schema version mismatch on stored results
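Two of these cases lend themselves to a short sketch: spotting duplicate submissions and guarding against malformed or incomplete extraction output. The write-up does not say how either is implemented, so the content-hash approach, the flag names, and the helpers `receipt_fingerprint` and `parse_extraction` are all assumptions.

```python
import hashlib
import json
from typing import Any, Dict, List, Optional, Tuple

REQUIRED_FIELDS = ("vendor", "total")


def receipt_fingerprint(data: bytes) -> str:
    """Stable fingerprint of the uploaded bytes; byte-identical uploads collide on purpose."""
    return hashlib.sha256(data).hexdigest()


def parse_extraction(raw: str) -> Tuple[Optional[Dict[str, Any]], List[str]]:
    """Parse the model's JSON output and flag structural problems instead of raising."""
    try:
        data = json.loads(raw)
    except json.JSONDecodeError:
        return None, ["malformed_extraction"]
    if not isinstance(data, dict):
        return None, ["malformed_extraction"]
    flags = [f"missing_{name}" for name in REQUIRED_FIELDS if data.get(name) in (None, "")]
    return data, flags
```

A byte-level hash only catches exact re-uploads; a re-photographed receipt would need comparison on the extracted fields instead.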
next iteration
What I'd change at the next scale
Current design handles low-volume async orchestration well. Scaling to real production concurrency would require a different set of decisions.
INFRASTRUCTURE
Redis-backed persistent queue
Replace in-memory queue with Redis. Survives restarts, enables distributed workers, supports back-pressure handling and queue depth monitoring natively.
OBSERVABILITY
Distributed tracing via OpenTelemetry
Add spans across the pipeline boundary. Currently structured logging is per-service — not traceable end-to-end across the Node.js → Python worker boundary.
FRONTEND
WebSocket result streaming
Replace polling with WebSocket push. Current 2s polling adds unnecessary latency and creates chatty HTTP load. WebSockets would enable real-time stage progress updates.
SCALING
Worker autoscaling on queue depth
Containerize the inference service and scale workers based on queue depth. Current single-worker model creates a throughput bottleneck under concurrent upload load.
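As a rough illustration of the queue-depth scaling idea, a scaling rule can be as simple as the function below. Nothing here exists in the current system: `desired_workers` is a hypothetical helper, and the per-worker target and the bounds are placeholder values.

```python
import math


def desired_workers(
    queue_depth: int,
    jobs_per_worker: int = 5,  # placeholder: backlog one worker is allowed to carry
    min_workers: int = 1,
    max_workers: int = 8,      # placeholder ceiling
) -> int:
    """Target worker count for the current backlog, clamped to [min_workers, max_workers]."""
    if queue_depth < 0 or jobs_per_worker < 1 or min_workers < 0 or max_workers < min_workers:
        raise ValueError("invalid scaling parameters")
    needed = math.ceil(queue_depth / jobs_per_worker)
    return max(min_workers, min(max_workers, needed))
```

With these defaults, an empty queue keeps one worker warm, a backlog of 12 asks for three, and anything past 40 is capped at eight.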
what I learned
Technical reflection
lesson
Async systems require lifecycle visibility from the start
Moving to async workers made job state invisible by default. Adding explicit states (queued → processing → complete, or failed from either earlier state) and structured logging was necessary before the system was debuggable, not after. Observability infrastructure belongs in v1.
↳ Build the instrumentation before you need to debug it
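One way to make the lifecycle explicit is a small transition table that rejects illegal moves and logs every legal one. This is a sketch under assumptions: the four state names come from the write-up, but `JobStatus`, `ALLOWED`, `transition`, and the `status_change` log event are hypothetical.

```python
from enum import Enum
from typing import Callable, Dict, Set


class JobStatus(str, Enum):
    QUEUED = "queued"
    PROCESSING = "processing"
    COMPLETE = "complete"
    FAILED = "failed"


# Legal moves only; complete and failed are terminal.
ALLOWED: Dict[JobStatus, Set[JobStatus]] = {
    JobStatus.QUEUED: {JobStatus.PROCESSING, JobStatus.FAILED},
    JobStatus.PROCESSING: {JobStatus.COMPLETE, JobStatus.FAILED},
    JobStatus.COMPLETE: set(),
    JobStatus.FAILED: set(),
}


def transition(
    current: JobStatus,
    target: JobStatus,
    log: Callable[[str], None] = print,
) -> JobStatus:
    """Validate a state change, emit a structured log line, and return the new state."""
    if target not in ALLOWED[current]:
        raise ValueError(f"illegal transition {current.value} -> {target.value}")
    log(f"event=status_change from={current.value} to={target.value}")
    return target
```

Routing every status write through one function means a job can never silently jump from queued to complete, and the log line comes for free.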
lesson
Preprocessing quality drives accuracy more than model choice
Adjusting Gemini model parameters moved accuracy by ~3 percentage points. Adding the preprocessing normalization layer moved it by ~23 points (~62% to ~85%). Input quality was the dominant variable throughout; the model was not the bottleneck.
↳ Fix the data pipeline before tuning the model
lesson
Schema versioning is not optional
The validation result document schema changed twice mid-development. Without explicit version fields on each MongoDB document, old stored results would have been unreadable. Retrofitting versioning mid-project was painful; building it first would have been free.
↳ Version every persisted document shape from the beginning