
Architecture

Charles Dana · Monce SAS · May 2026

Two boxes. One bearer token. The rest is rendering.

1. The split

Two AWS-hosted surfaces. Each does one thing — but the add-in box now also runs a small dispatcher, so it is no longer "compute = none."

Surface | Box | Purpose | Compute
excel.aws.monce.ai | t4g.micro (this) | Add-in chrome and Snake dispatcher: routes =PREDICT & siblings to local or cloud based on row count. | In-process algorithmeai for <500 rows; HTTPS forward to snakebatch for the rest.
snakebatch.aws.monce.ai | t3.small + Lambdas | Distributed v6 Snake training + predict. Authoritative model storage. | Lambda fan-out (v6-worker, 10 GB).

The Excel add-in calls same-origin /v6/* on this box. The dispatcher (under app/routes/snake.py) inspects the payload and decides: under 500 rows the call runs in-process here (no Lambda fee, ~30 ms–3 s wall-clock); at or above 500 rows it forwards to snakebatch.aws.monce.ai/v5/train with test_items inline (the only path the SDK currently supports end-to-end — see §8). The threshold is one env var: CLOUD_THRESHOLD.
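
A minimal sketch of that rule in Python, assuming the payload field names from §2 and a default of 500 matching the current value; the real dispatcher in app/routes/snake.py is the authority:

    import os

    # One env var controls the split; 500 is the current value per the text.
    CLOUD_THRESHOLD = int(os.environ.get("CLOUD_THRESHOLD", "500"))

    def choose_mode(payload: dict) -> str:
        """Strictly under the threshold trains in-process (no Lambda fee);
        at or above it, the call forwards to snakebatch.aws.monce.ai/v5/train."""
        return "local" if len(payload["data"]) < CLOUD_THRESHOLD else "cloud"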

2. Request flow — one cell, end to end

║  User types =PREDICT(A1:E120, "Délai", G1:K30) in cell L1
║
║  Office.js runtime calls our registered custom function
║       functions.js hosted at excel.aws.monce.ai/functions.js
║
║  function awaits Office.onReady, reads workbook setting
║       monceai_api_key  →  sk-monceai-...
║
║  POST https://excel.aws.monce.ai/v6/train           # same-origin, no CORS preflight
║       Authorization: Bearer sk-monceai-...
║       body: { data: [...], config: {target_index, n_layers}, test_items: [...] }
║
║  dispatcher (this box) reads len(data):
║       # <500 rows  →  in-process algorithmeai.Snake
║       # ≥500 rows  →  POST snakebatch.aws.monce.ai/v5/train (test_items inline)
║                                |
║                                v   (cloud path only)
║  snakebatch's api Lambda (eu-west-3) authenticates, fans out:
║       v6-worker(n=2000) splits binary → 2 children → ... → leaves call Snake()
║
║  predictions returned + telemetry:
║       { predictions: [...], _mode: "local"|"cloud", _elapsed_ms: 1623 }
║
║  custom function returns matrix; Excel spills L1:L30
║
║  taskpane "Last call" tile shows mode + ms
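
The same call is easy to reproduce outside Excel for testing. A sketch assuming only the request and response shapes shown above; httpx is my choice of client, and the config values are illustrative:

    import httpx

    def predict(rows, test_rows, api_key):
        """POST the payload the custom function sends and unpack the reply."""
        resp = httpx.post(
            "https://excel.aws.monce.ai/v6/train",
            headers={"Authorization": f"Bearer {api_key}"},
            json={
                "data": rows,             # the training range, e.g. A1:E120
                "config": {"target_index": 1, "n_layers": 2},  # illustrative
                "test_items": test_rows,  # the rows to predict, e.g. G1:K30
            },
            timeout=120.0,  # matches the server-side worker timeout
        )
        resp.raise_for_status()
        body = resp.json()
        # body["_mode"] is "local" or "cloud"; body["_elapsed_ms"] feeds the
        # taskpane "Last call" tile.
        return body["predictions"]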

3. excel.aws.monce.ai — the box itself

Layer | Choice | Why
OS | Ubuntu 22.04 LTS arm64 | Free, supported; ARM = cheaper compute.
Hardware | t4g.micro (2 burst vCPU, 1 GB RAM) | ~$7/mo. The "potato" target. 5 concurrent users fit.
App | FastAPI + uvicorn workers (gunicorn supervisor) | Async, type-safe, identical pattern to snakebatch.
Workers | 2 (capped at 1000 reqs/worker, 120 s timeout) | Reduced from 4 to leave RAM for in-process Snake training; the 120 s timeout absorbs cloud cold starts.
Snake runtimes | algorithmeai 5.4.4 (local) + monceai 1.2.0 (cloud SDK), both in the venv | Local mode trains in-process; the SDK forwards to snakebatch above the threshold.
Reverse proxy | nginx + certbot (Let's Encrypt) | HTTPS, gzip, rate limiting.
DNS | Route53 A record in zone aws.monce.ai | Same hosted zone as snakebatch.
IaC | Terraform (hashicorp/aws ~> 5.0) | One main.tf: EC2 + SG + IAM + Route53.
Deploy | rsync + systemctl restart, ~30 s | No CodeDeploy, no Docker. Fewest moving parts.
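
The Workers row maps directly onto a gunicorn config file, which is itself Python. A sketch under the numbers above; the bind address is an assumption:

    # gunicorn.conf.py
    bind = "127.0.0.1:8000"  # nginx proxies to this port (assumed)
    workers = 2  # reduced from 4 to leave RAM for in-process Snake training
    worker_class = "uvicorn.workers.UvicornWorker"  # FastAPI under gunicorn
    max_requests = 1000  # recycle each worker after 1000 requests
    timeout = 120  # long enough to absorb snakebatch cold starts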

4. Routes served

Path | Returns | Auth
/ | Landing + install CTA | public
/install | Win/Mac/Web platform buttons | public
/manifest.xml | Office Add-in manifest | public
/functions.json | Custom Functions metadata | public
/functions.js | Custom Functions JS; calls same-origin /v6/* | public
/taskpane.html | Add-in sidebar (key paste, status, formula tips) | public
/v6/{potential, train, candle, fill, audit} | Snake dispatcher; local or cloud per row count | Bearer (personal key)
/account | Token balance, model storage, key rotation (v0.6) | public (UI), Bearer for API
/dashboard | Live usage stats from the snake-batch-usage DDB table | public
/api/dashboard?period= | JSON for the dashboard | public
/auth/{signup,verify,poll,balance} | Magic-link flow | stubbed pending the v0.6 Lambda
/paper · /economics · /architecture | Documentation pages | public
/health | Service status JSON | public
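
Every /v6/* route sits behind the same bearer check. A hypothetical FastAPI dependency showing the shape of that guard; the function name and the prefix-only validation are illustrative, and real key validation happens against the user store:

    from fastapi import Depends, HTTPException
    from fastapi.security import HTTPAuthorizationCredentials, HTTPBearer

    bearer = HTTPBearer(auto_error=False)

    async def require_key(
        creds: HTTPAuthorizationCredentials | None = Depends(bearer),
    ) -> str:
        # Reject missing or malformed keys before any compute is spent.
        if creds is None or not creds.credentials.startswith("sk-monceai-"):
            raise HTTPException(status_code=401, detail="invalid API key")
        return creds.credentials

A /v6 route then declares key: str = Depends(require_key) and passes the validated key downstream.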

5. Where the auth + billing Lambdas will live

v0.5 ships these in the snakebatch Terraform module — not here — because they touch snake-batch-* resources (DynamoDB tables, S3 buckets, IAM scopes) that already live there. This EC2 only reads balances and renders UIs.

Lambda | Memory / Timeout | Trigger | Purpose
snake-batch-auth | 512 MB / 10 s | HTTP via API Gateway | Signup, verify, poll, rotate. Calls SES + DDB.
snake-batch-billing | 256 MB / 5 s | Inline from the api Lambda | Atomic balance decrement (DDB ConditionExpression).
snake-batch-meter | 1 GB / 60 s | EventBridge cron, daily 00:30 UTC | Scan S3 model storage, charge storage tokens.
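
The billing row is the one worth spelling out: DynamoDB evaluates the condition and the decrement atomically, so two concurrent calls can never overdraw a balance. A boto3 sketch; the key schema and attribute names are assumptions, only the ConditionExpression technique comes from the text:

    import boto3
    from botocore.exceptions import ClientError

    table = boto3.resource("dynamodb", region_name="eu-west-3").Table(
        "snake-batch-users"
    )

    def charge(user_id: str, tokens: int) -> bool:
        """Decrement the balance only if it covers the charge."""
        try:
            table.update_item(
                Key={"user_id": user_id},  # key schema assumed
                UpdateExpression="SET balance = balance - :t",
                ConditionExpression="balance >= :t",
                ExpressionAttributeValues={":t": tokens},
            )
            return True
        except ClientError as err:
            if err.response["Error"]["Code"] == "ConditionalCheckFailedException":
                return False  # insufficient balance: reject the call
            raise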

6. Security model in one diagram

                              api_key
                                 |
                                 |  bearer
                                 v
         +--------------------- HTTPS ---------------------+
         |                                                 |
   excel.aws                                          snakebatch.aws
   (chrome)                                            (compute)
         |                                                 |
         |  GET only                                       |  POST /v6/*
         |  (no DB writes)                                 |  with bearer
         v                                                 v
   DDB read                                          DDB read+write
   snake-batch-usage                                snake-batch-{users,usage}
        (dashboard)                                      (auth, billing)
                                                           |
                                                           v
                                                  S3 per-user prefix
                                              jobs/<sha256(email)>/...
                                                  IAM-scoped per key

The EC2 has no write access to user records or model storage — on purpose. If this box is ever compromised the blast radius is "dashboard reads stop"; balance and models stay safe behind the Lambda IAM boundary.

7. S3 model permanency

Every trained model lands in S3 under s3://snake-batch-monce/jobs/<sha256(email)>/<model_id>/model_stripped.json the moment training finishes. The retention policy is indefinite by default. We do not garbage-collect.

Property | Behavior
Retention | Forever, until the user explicitly deletes via POST /v6/models/<id>/delete.
Workbook deleted on disk | Model still in S3; re-fetchable by model_id.
Workbook XML part stripped (Document Inspector) | Add-in falls back to S3 via the model_id stored in a workbook custom property; predictions resume.
Account paused | Storage retained, charges suspended. Re-activate → instant access.
Account closed (user request) | 30-day soft-delete window, then permanent S3 prefix wipe.
List endpoint | GET /v6/models → every model the bearer key owns, with size + trained_at + last_used.
Rehydrate endpoint | POST /v6/models/<id>/rehydrate → returns the JSON; the add-in re-embeds it in the workbook.
Cross-user isolation | The S3 IAM policy on the api Lambda restricts access to jobs/<sha256(this user)>/*; a user cannot read another user's prefix.
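
The isolation hinges on the hashed e-mail in the prefix. A sketch of the derivation; whether the address is lowercased or otherwise normalized before hashing is an assumption:

    import hashlib

    def user_prefix(email: str) -> str:
        """S3 prefix a key's IAM policy is scoped to: jobs/<sha256(email)>/"""
        digest = hashlib.sha256(email.strip().lower().encode("utf-8")).hexdigest()
        return f"jobs/{digest}/"  # 64 lowercase hex chars between the slashes
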
Why permanent. A model is the encoded version of a factory's history. Re-training is fast but not free, and because Snake's local search is stochastic it produces a different model each time, so predictions on the same row may shift. Permanent storage = stable predictions over time. That's what makes a workbook auditable a year after training.

8. The monceai SDK and the v6/batch story

The published pip install monceai SDK exposes Snake.get_batch_prediction(items) which posts to snakebatch.aws.monce.ai/v6/batch/<model_id>. As of May 2026 that endpoint is not yet shipped — it 404s. The only working cloud predict path is /v5/train with test_items inline, which re-trains every call (correct results, suboptimal cost).

This box's dispatcher works around it with a thin CloudSnake shim (app/routes/snake.py) that calls /v5/train directly. When /v6/batch ships, that shim is deleted and replaced with a one-line from monceai import Snake. Until then, cloud-mode predictions cost an extra training pass per call — baked into the economics table as the cloud row.
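
A hedged sketch of what the shim amounts to; the class and method names here are illustrative, and only the endpoint, bearer header, and inline test_items come from the text:

    import httpx

    class CloudSnake:
        """Interim cloud path: one /v5/train call per predict, test_items inline."""

        def __init__(self, api_key: str):
            self.headers = {"Authorization": f"Bearer {api_key}"}

        def predict(self, data, config, test_items):
            resp = httpx.post(
                "https://snakebatch.aws.monce.ai/v5/train",
                headers=self.headers,
                json={"data": data, "config": config, "test_items": test_items},
                timeout=120.0,  # cold starts fit inside the worker timeout
            )
            resp.raise_for_status()
            return resp.json()["predictions"]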

9. What we deliberately did not build

No Docker, no CodeDeploy, no write path from this box to user records or models. The architectural bet: compute is on Lambda, chrome is on a $7/mo potato, and the two are bound by HTTPS and a bearer token. Everything else is just rendering, and rendering doesn't deserve a beefy server.