
Architecture

Charles Dana · Monce SAS · May 2026

Two boxes. One bearer token. The rest is rendering.

1. The split

Two AWS-hosted surfaces. Each does one thing — but the add-in box now also runs a small dispatcher, so it is no longer "compute = none."

Surface | Box | Purpose | Compute
excel.aws.monce.ai | t4g.micro (this) | Add-in chrome and Snake dispatcher: routes =PREDICT & siblings to local or cloud based on row count. | In-process algorithmeai for <500 rows; HTTPS forward to snakebatch for the rest.
snakebatch.aws.monce.ai | t3.small + Lambdas | Distributed v6 Snake training + predict. Authoritative model storage. | Lambda fan-out (v6-worker, 10 GB).

The Excel add-in calls same-origin /v6/* on this box. The dispatcher (under app/routes/snake.py) inspects the payload and decides: under 500 rows the call runs in-process here (no Lambda fee, ~30 ms–3 s wall-clock); at or above 500 rows it forwards to snakebatch.aws.monce.ai/v5/train with test_items inline (the only path the SDK currently supports end-to-end — see §8). The threshold is one env var: CLOUD_THRESHOLD.
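
A minimal sketch of that rule in Python, assuming the payload field names from §2 and a default of 500 matching the current value; the real dispatcher in app/routes/snake.py is the authority:

    import os

    # One env var controls the split; 500 is the current value per the text.
    CLOUD_THRESHOLD = int(os.environ.get("CLOUD_THRESHOLD", "500"))

    def choose_mode(payload: dict) -> str:
        """Strictly under the threshold trains in-process (no Lambda fee);
        at or above it, the call forwards to snakebatch.aws.monce.ai/v5/train."""
        return "local" if len(payload["data"]) < CLOUD_THRESHOLD else "cloud"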

2. Request flow — one cell, end to end

║  User types =PREDICT(A1:E120, "Délai", G1:K30) in cell L1
║
║  Office.js runtime calls our registered custom function
║       functions.js hosted at excel.aws.monce.ai/functions.js
║
║  function awaits Office.onReady, reads workbook setting
║       monceai_api_key  →  sk-monceai-...
║
║  POST https://excel.aws.monce.ai/v6/train           # same-origin, no CORS preflight
║       Authorization: Bearer sk-monceai-...
║       body: { data: [...], config: {target_index, n_layers}, test_items: [...] }
║
║  dispatcher (this box) reads len(data):
║       # <500 rows  →  in-process algorithmeai.Snake
║       # ≥500 rows  →  POST snakebatch.aws.monce.ai/v5/train (test_items inline)
║                                |
║                                v   (cloud path only)
║  snakebatch's api Lambda (eu-west-3) authenticates, fans out:
║       v6-worker(n=2000) splits binary → 2 children → ... → leaves call Snake()
║
║  predictions returned + telemetry:
║       { predictions: [...], _mode: "local"|"cloud", _elapsed_ms: 1623 }
║
║  custom function returns matrix; Excel spills L1:L30
║
║  taskpane "Last call" tile shows mode + ms
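
The same call is easy to reproduce outside Excel for testing. A sketch assuming only the request and response shapes shown above; httpx is my choice of client, and the config values are illustrative:

    import httpx

    def predict(rows, test_rows, api_key):
        """POST the payload the custom function sends and unpack the reply."""
        resp = httpx.post(
            "https://excel.aws.monce.ai/v6/train",
            headers={"Authorization": f"Bearer {api_key}"},
            json={
                "data": rows,             # the training range, e.g. A1:E120
                "config": {"target_index": 1, "n_layers": 2},  # illustrative
                "test_items": test_rows,  # the rows to predict, e.g. G1:K30
            },
            timeout=120.0,  # matches the server-side worker timeout
        )
        resp.raise_for_status()
        body = resp.json()
        # body["_mode"] is "local" or "cloud"; body["_elapsed_ms"] feeds the
        # taskpane "Last call" tile.
        return body["predictions"]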

3. excel.aws.monce.ai — the box itself

Layer | Choice | Why
OS | Ubuntu 22.04 LTS arm64 | Free, supported; ARM = cheaper compute.
Hardware | t4g.micro (2 burst vCPU, 1 GB RAM) | ~$7/mo. The "potato" target. 5 concurrent users fit.
App | FastAPI + uvicorn workers (gunicorn supervisor) | Async, type-safe, identical pattern to snakebatch.
Workers | 2 (capped at 1000 reqs/worker, 120 s timeout) | Reduced from 4 to leave RAM for in-process Snake training; the 120 s timeout absorbs cloud cold starts.
Snake runtimes | algorithmeai 5.4.4 (local) + monceai 1.2.0 (cloud SDK), both in the venv | Local mode trains in-process; the SDK forwards to snakebatch above the threshold.
Reverse proxy | nginx + certbot (Let's Encrypt) | HTTPS, gzip, rate limiting.
DNS | Route53 A record in zone aws.monce.ai | Same hosted zone as snakebatch.
IaC | Terraform (hashicorp/aws ~> 5.0) | One main.tf: EC2 + SG + IAM + Route53.
Deploy | rsync + systemctl restart, ~30 s | No CodeDeploy, no Docker. Fewest moving parts.
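
The Workers row maps directly onto a gunicorn config file, which is itself Python. A sketch under the numbers above; the bind address is an assumption:

    # gunicorn.conf.py
    bind = "127.0.0.1:8000"  # nginx proxies to this port (assumed)
    workers = 2  # reduced from 4 to leave RAM for in-process Snake training
    worker_class = "uvicorn.workers.UvicornWorker"  # FastAPI under gunicorn
    max_requests = 1000  # recycle each worker after 1000 requests
    timeout = 120  # long enough to absorb snakebatch cold starts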

4. Routes served

Path | Returns | Auth
/ | Landing + install CTA | public
/install | Win/Mac/Web platform buttons | public
/manifest.xml | Office Add-in manifest | public
/functions.json | Custom Functions metadata | public
/functions.js | Custom Functions JS; calls same-origin /v6/* | public
/taskpane.html | Add-in sidebar (key paste, status, formula tips) | public
/v6/{potential, train, candle, fill, audit} | Snake dispatcher; local or cloud per row count | Bearer (personal key)
/account | Token balance, model storage, key rotation (v0.6) | public (UI), Bearer for API
/dashboard | Live usage stats from the snake-batch-usage DDB table | public
/api/dashboard?period= | JSON for the dashboard | public
/auth/{signup,verify,poll,balance} | Magic-link flow | stubbed pending the v0.6 Lambda
/paper · /economics · /architecture | Documentation pages | public
/health | Service status JSON | public
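
Every /v6/* route sits behind the same bearer check. A hypothetical FastAPI dependency showing the shape of that guard; the function name and the prefix-only validation are illustrative, and real key validation happens against the user store:

    from fastapi import Depends, HTTPException
    from fastapi.security import HTTPAuthorizationCredentials, HTTPBearer

    bearer = HTTPBearer(auto_error=False)

    async def require_key(
        creds: HTTPAuthorizationCredentials | None = Depends(bearer),
    ) -> str:
        # Reject missing or malformed keys before any compute is spent.
        if creds is None or not creds.credentials.startswith("sk-monceai-"):
            raise HTTPException(status_code=401, detail="invalid API key")
        return creds.credentials

A /v6 route then declares key: str = Depends(require_key) and passes the validated key downstream.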

5. Where the auth + billing Lambdas will live

v0.5 ships these in the snakebatch Terraform module — not here — because they touch snake-batch-* resources (DynamoDB tables, S3 buckets, IAM scopes) that already live there. This EC2 only reads balances and renders UIs.

Lambda | Memory / Timeout | Trigger | Purpose
snake-batch-auth | 512 MB / 10 s | HTTP via API Gateway | Signup, verify, poll, rotate. Calls SES + DDB.
snake-batch-billing | 256 MB / 5 s | Inline from the api Lambda | Atomic balance decrement (DDB ConditionExpression).
snake-batch-meter | 1 GB / 60 s | EventBridge cron, daily 00:30 UTC | Scan S3 model storage, charge storage tokens.
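
The billing row is the one worth spelling out: DynamoDB evaluates the condition and the decrement atomically, so two concurrent calls can never overdraw a balance. A boto3 sketch; the key schema and attribute names are assumptions, only the ConditionExpression technique comes from the text:

    import boto3
    from botocore.exceptions import ClientError

    table = boto3.resource("dynamodb", region_name="eu-west-3").Table(
        "snake-batch-users"
    )

    def charge(user_id: str, tokens: int) -> bool:
        """Decrement the balance only if it covers the charge."""
        try:
            table.update_item(
                Key={"user_id": user_id},  # key schema assumed
                UpdateExpression="SET balance = balance - :t",
                ConditionExpression="balance >= :t",
                ExpressionAttributeValues={":t": tokens},
            )
            return True
        except ClientError as err:
            if err.response["Error"]["Code"] == "ConditionalCheckFailedException":
                return False  # insufficient balance: reject the call
            raise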

6. Security model in one diagram

                              api_key
                                 |
                                 |  bearer
                                 v
         +--------------------- HTTPS ---------------------+
         |                                                 |
   excel.aws                                          snakebatch.aws
   (chrome)                                            (compute)
         |                                                 |
         |  GET only                                       |  POST /v6/*
         |  (no DB writes)                                 |  with bearer
         v                                                 v
   DDB read                                          DDB read+write
   snake-batch-usage                                snake-batch-{users,usage}
        (dashboard)                                      (auth, billing)
                                                           |
                                                           v
                                                  S3 per-user prefix
                                              jobs/<sha256(email)>/...
                                                  IAM-scoped per key

The EC2 has no write access to user records or model storage — on purpose. If this box is ever compromised the blast radius is "dashboard reads stop"; balance and models stay safe behind the Lambda IAM boundary.

7. S3 model permanency

Every trained model lands in S3 under s3://snake-batch-monce/jobs/<sha256(email)>/<model_id>/model_stripped.json the moment training finishes. The retention policy is indefinite by default. We do not garbage-collect.

Property | Behavior
Retention | Forever, until the user explicitly deletes via POST /v6/models/<id>/delete.
Workbook deleted on disk | Model still in S3; re-fetchable by model_id.
Workbook XML part stripped (Document Inspector) | Add-in falls back to S3 via the model_id stored in a workbook custom property; predictions resume.
Account paused | Storage retained, charges suspended. Re-activate → instant access.
Account closed (user request) | 30-day soft-delete window, then permanent S3 prefix wipe.
List endpoint | GET /v6/models → every model the bearer key owns, with size + trained_at + last_used.
Rehydrate endpoint | POST /v6/models/<id>/rehydrate → returns the JSON; the add-in re-embeds it in the workbook.
Cross-user isolation | The S3 IAM policy on the api Lambda restricts access to jobs/<sha256(this user)>/*; a user cannot read another user's prefix.
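
The isolation hinges on the hashed e-mail in the prefix. A sketch of the derivation; whether the address is lowercased or otherwise normalized before hashing is an assumption:

    import hashlib

    def user_prefix(email: str) -> str:
        """S3 prefix a key's IAM policy is scoped to: jobs/<sha256(email)>/"""
        digest = hashlib.sha256(email.strip().lower().encode("utf-8")).hexdigest()
        return f"jobs/{digest}/"  # 64 lowercase hex chars between the slashes
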
Why permanent. A model is the encoded version of a factory's history. Re-training is fast but not free, and because Snake's local search is stochastic it produces a different model each time, so predictions on the same row may shift. Permanent storage = stable predictions over time. That's what makes a workbook auditable a year after training.

8. The monceai SDK and the v6/batch story

The published pip install monceai SDK exposes Snake.get_batch_prediction(items) which posts to snakebatch.aws.monce.ai/v6/batch/<model_id>. As of May 2026 that endpoint is not yet shipped — it 404s. The only working cloud predict path is /v5/train with test_items inline, which re-trains every call (correct results, suboptimal cost).

This box's dispatcher works around it with a thin CloudSnake shim (app/routes/snake.py) that calls /v5/train directly. When /v6/batch ships, that shim is deleted and replaced with a one-line from monceai import Snake. Until then, cloud-mode predictions cost an extra training pass per call — baked into the economics table as the cloud row.
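
A hedged sketch of what the shim amounts to; the class and method names here are illustrative, and only the endpoint, bearer header, and inline test_items come from the text:

    import httpx

    class CloudSnake:
        """Interim cloud path: one /v5/train call per predict, test_items inline."""

        def __init__(self, api_key: str):
            self.headers = {"Authorization": f"Bearer {api_key}"}

        def predict(self, data, config, test_items):
            resp = httpx.post(
                "https://snakebatch.aws.monce.ai/v5/train",
                headers=self.headers,
                json={"data": data, "config": config, "test_items": test_items},
                timeout=120.0,  # cold starts fit inside the worker timeout
            )
            resp.raise_for_status()
            return resp.json()["predictions"]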

9. What we deliberately did not build

No Docker, no CodeDeploy, no write path from this box to user records or models. The architectural bet: compute is on Lambda, chrome is on a $7/mo potato, and the two are bound by HTTPS and a bearer token. Everything else is just rendering, and rendering doesn't deserve a beefy server.