Privacy and security

LifeFile holds documents that you'd rather a stranger didn't read. Here's what we do, and where we still have work to do.

What we store

  • Documents you upload. Files live on a Fly.io persistent volume in London. Today they're stored in plaintext on the volume — anyone with fly ssh access to the running container can read them. Encrypted-at-rest object storage (Cloudflare R2 with envelope encryption per household) is on the deferred list.
  • Document metadata — title, provider, dates, document type, the property or vehicle it's filed against, the topic. SQLite database on the same volume.
  • Your account — username, email, hashed password (Django's default PBKDF2-SHA256). If you signed in with Google, we also store your Google sub (subject identifier) so we can match you to your account on the next sign-in.
  • Gmail OAuth tokens, if you connect Gmail: encrypted at rest with Fernet (AES-128-CBC + HMAC-SHA256) using a server-side key. If that key were lost, the stored tokens could not be decrypted and you'd have to reconnect Gmail.
  • Activity log — a timeline of what was filed, accepted, dismissed, auto-filed, and confirmed-as-reviewed. Used for the Undo affordances.
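
To make "Django's default PBKDF2-SHA256" in the account bullet above concrete, here is a minimal standard-library sketch of the encoded format Django stores (algorithm$iterations$salt$hash). The function names and the iteration count are illustrative assumptions; the app itself relies on Django's own hashers, and the real iteration count depends on the Django version in use.

```python
import base64
import hashlib
import hmac
import secrets
from typing import Optional


def hash_password(password: str, salt: Optional[str] = None,
                  iterations: int = 600_000) -> str:
    """Return a Django-style encoded PBKDF2-SHA256 hash.

    The iteration count here is an assumption for illustration only.
    """
    # A fresh random salt per password; hex output never contains "$".
    salt = salt or secrets.token_hex(8)
    digest = hashlib.pbkdf2_hmac(
        "sha256", password.encode("utf-8"), salt.encode("utf-8"), iterations
    )
    encoded = base64.b64encode(digest).decode("ascii")
    return f"pbkdf2_sha256${iterations}${salt}${encoded}"


def verify_password(password: str, stored: str) -> bool:
    """Re-derive the hash with the stored salt and iterations, then compare."""
    try:
        algorithm, iterations, salt, _ = stored.split("$", 3)
    except ValueError:
        return False  # not in algorithm$iterations$salt$hash form
    if algorithm != "pbkdf2_sha256":
        return False
    candidate = hash_password(password, salt, int(iterations))
    # Constant-time comparison avoids leaking how many characters matched.
    return hmac.compare_digest(candidate, stored)
```

The point of the format is that only the salt, the iteration count, and the derived hash are kept; the password itself is never stored and cannot be read back from the database.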

What we don't store

  • Gmail email bodies. We classify on subject + sender + body snippet + attachment, but we don't persist the body.
  • Attachment bytes from Gmail Triage. Today the Triage row records metadata only; the file is fetched lazily during classification. ("Path B" — store-on-accept — is on the roadmap.)
  • Anything you'd file under HR, payslips, or medical unless you upload it explicitly. The Gmail scan is gated to attachments matching the property / admin domain heuristics.

Who can see your data

  • You. And anyone you invite to your household via the invite link in the account dropdown. Invites expire after 14 days.
  • The LifeFile operator (that's me, today). I can fly ssh into the running container and read the volume. I commit not to do this unless you explicitly ask me to.
  • Anthropic — when a document is classified, the bytes are sent to Claude (Anthropic's API). Anthropic's commercial API terms say data is not used for training and is retained for 30 days for abuse-monitoring before deletion.
  • Google — when you connect Gmail, Google sees the API calls we make. Same scope as the OAuth grant; nothing more.
  • DVLA + DVSA — if you supply a vehicle registration, the plate is sent to the DVLA Vehicle Enquiry Service and DVSA MOT History API. They see the request, you see the response, we cache it on your record.

That's the entire list of human or organisational entities with potential access.

What you can control

  • Disconnect Gmail: use the "Get documents in" panel on the dashboard. Existing triage items stay; new scans stop. The OAuth token is deleted from our database immediately.
  • Delete documents — individually from any document page, or in bulk from the topic / search pages.
  • Delete your account — there's no UI button yet; email me and I'll do it manually. Deletion removes the documents, Gmail OAuth, profile, schedule, vehicle and property profiles, activity log, and household membership.
  • Export your data: also a manual request today. A UI export is planned.

Multi-tenant isolation

Every model in the database carries a household foreign key. Every Django manager auto-scopes to the current request's household. Cross-household reads are tested at the model layer and the view layer — there are dedicated isolation tests in apps/common/tests/ and per-app test files.

That's defence-in-depth: even a bug in a view that forgets to scope won't leak across households because the manager is scoped by default. The escape hatch (Document.all_tenants) is reserved for explicit cross-household admin / audit operations.
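
The scoped-by-default idea can be sketched without Django at all. The following is a framework-free illustration, not the app's actual manager code: `current_household`, `ScopedManager`, and the fail-closed behaviour when no household is set are all assumptions made for the example. Only the name `all_tenants` comes from the text above.

```python
from contextvars import ContextVar
from dataclasses import dataclass

# Hypothetical: request middleware would set this to the signed-in
# user's household before any query runs.
current_household = ContextVar("current_household", default=None)


@dataclass(frozen=True)
class Document:
    title: str
    household_id: int


class ScopedManager:
    """Stand-in for a manager whose default query is tenant-scoped."""

    def __init__(self, rows):
        self._rows = list(rows)

    def all(self):
        household_id = current_household.get()
        if household_id is None:
            # Assumption: with no household in context, return nothing
            # rather than everything (fail closed).
            return []
        return [r for r in self._rows if r.household_id == household_id]

    def all_tenants(self):
        # Explicit escape hatch for admin / audit operations.
        return list(self._rows)


documents = ScopedManager([
    Document("Boiler service", household_id=1),
    Document("MOT certificate", household_id=2),
])

token = current_household.set(1)
visible = documents.all()        # only household 1's rows
current_household.reset(token)
```

Because a view calling `documents.all()` never passes the household itself, forgetting to filter is not possible in the default path; crossing households requires typing `all_tenants` on purpose.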

When we migrate to Postgres, we'll add row-level security on top — same logic, enforced by the database itself.

Known gaps

These are real and we're not pretending otherwise:

  • No cloud storage / encryption at rest for files yet. Files are on a Fly volume in plaintext.
  • Manual SQLite backups. The Fly volume is persistent across deploys but isn't automatically replicated. We take periodic dumps; litestream to R2 is planned.
  • No rate limiting on login or upload. This needs django-ratelimit, or Cloudflare in front of the app, before going public.
  • No account lockout on repeated failed logins.
  • No MFA — single password protects access today.
  • No DPIA (Data Protection Impact Assessment). One is required before any cross-household aggregate-learning work, but not for single-user use.
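
As a rough illustration of what closing the rate-limiting and lockout gaps involves, here is a minimal in-memory sliding-window limiter. It is a sketch of the general technique, not what LifeFile runs and not how django-ratelimit is implemented; the class name, thresholds, and keying by IP address are all hypothetical.

```python
import time
from collections import defaultdict, deque


class SlidingWindowLimiter:
    """Allow at most max_attempts per key within window_seconds."""

    def __init__(self, max_attempts=5, window_seconds=60.0,
                 clock=time.monotonic):
        self.max_attempts = max_attempts
        self.window_seconds = window_seconds
        self._clock = clock  # injectable so tests can control time
        self._attempts = defaultdict(deque)

    def allow(self, key: str) -> bool:
        now = self._clock()
        attempts = self._attempts[key]
        # Drop timestamps that have aged out of the window.
        while attempts and now - attempts[0] >= self.window_seconds:
            attempts.popleft()
        if len(attempts) >= self.max_attempts:
            return False  # over the limit: reject without recording
        attempts.append(now)
        return True


# Hypothetical usage in a login view: key on the client IP or username.
login_limiter = SlidingWindowLimiter(max_attempts=5, window_seconds=60.0)
```

State held in process memory like this is lost on restart and is not shared between instances, which is one reason a cache-backed library or an edge service is the more realistic fix.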

These are documented and tracked. For a personal or trusted-friends deployment the current security story is acceptable; for a public launch it isn't.