
Gmail Triage

Most UK property documents arrive by email. Connecting Gmail lets LifeFile find what's already there and surface it for review — without you having to forward, save, or re-upload anything.

What we read, and what we don't

The OAuth scope is read-only, and we only ever fetch messages that have attachments. We never:

  • Send mail from your account.
  • Modify, label, archive, or delete anything in your inbox.
  • Read messages that have no attachments.
  • Look at messages outside the scan window (the most recent 100 messages with attachments by default; we'll widen this once asynchronous scanning ships).
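To make the scan window concrete, here is a sketch of the read-only list request it corresponds to. It assumes the Gmail REST API's `users.messages.list` endpoint with its `q` and `maxResults` parameters, plus the `gmail.readonly` scope. This page doesn't say which endpoint or scope LifeFile actually uses, so treat both as assumptions; `scan_window_request` is a hypothetical helper.

```python
from urllib.parse import urlencode

# Assumption: the standard read-only Gmail scope; this page only says "read-only".
GMAIL_READONLY_SCOPE = "https://www.googleapis.com/auth/gmail.readonly"

# Gmail API v1 endpoint that lists message IDs in the signed-in mailbox.
LIST_URL = "https://gmail.googleapis.com/gmail/v1/users/me/messages"


def scan_window_request(max_messages: int = 100) -> str:
    """Hypothetical helper: build the list URL for the scan window."""
    params = {
        "q": "has:attachment",       # Gmail search operator: skip messages without attachments
        "maxResults": max_messages,  # size of the scan window (100 by default)
    }
    return f"{LIST_URL}?{urlencode(params)}"


print(scan_window_request())
# https://gmail.googleapis.com/gmail/v1/users/me/messages?q=has%3Aattachment&maxResults=100
```

A list call only returns message and thread IDs, so each message would then be fetched individually. Widening the window past a single page would mean following `pageToken`.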

When you connect Gmail, the only things we keep server-side are your OAuth refresh token (encrypted at rest with Fernet, which pairs AES-128-CBC encryption with an HMAC-SHA256 signature) and the metadata of the candidate documents we surface: sender, subject, date, and attachment filename. Document bytes are fetched lazily during classification and are not currently stored.
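A minimal sketch of the per-candidate record this implies. The class and field names are hypothetical; what matters is what's absent: there is no field for document bytes.

```python
from dataclasses import asdict, dataclass
from datetime import datetime


@dataclass(frozen=True)
class CandidateMetadata:
    """Hypothetical record persisted for each surfaced candidate."""
    sender: str
    subject: str
    date: datetime
    attachment_filename: str
    # No bytes field: attachment content is fetched on demand for
    # classification and not kept, as described above.


candidate = CandidateMetadata(
    sender="noreply@example-insurer.co.uk",
    subject="Your home insurance schedule",
    date=datetime(2024, 3, 1, 9, 30),
    attachment_filename="schedule.pdf",
)
print(sorted(asdict(candidate)))
# ['attachment_filename', 'date', 'sender', 'subject']
```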

You can disconnect at any time from the "Get documents in" panel on the dashboard. Existing triage items stay; new scans stop.

How a scan works

Once connected, LifeFile runs a background scan immediately. The pipeline is:

  1. Fetch the most recent ~100 messages with attachments.
  2. Heuristic pre-filter — score each attachment by sender domain, filename keywords, household-context tokens (your registered house names + postcodes). Skip inline images, deduplicate by Gmail thread ID, deduplicate by SHA-1 hash of attachment bytes, deduplicate recurring marketing emails (same sender + normalised subject).
  3. Classify the survivors with Claude Sonnet — the same model that processes uploads. Claude sees the email subject + sender + body snippet for context; the file bytes for ≥10 KB attachments are fetched lazily so the classifier can read PDFs, not just guess from the filename.
  4. Bucket by lifecycle state.
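Step 2 is the part most worth sketching. The version below assumes each attachment arrives with its thread ID, sender, subject, filename, bytes, and an inline flag. The domain list, keywords, weights, and threshold are all hypothetical: the page names the signals but not how they're weighted. One interpretation call: "deduplicate by Gmail thread ID" is read here as keeping the first qualifying attachment per thread.

```python
import hashlib
import re
from dataclasses import dataclass

# Hypothetical signals and weights; the page names the signals, not the numbers.
TRUSTED_DOMAINS = ("gov.uk",)
FILENAME_KEYWORDS = ("schedule", "policy", "tenancy", "council", "completion")
MIN_SCORE = 1


@dataclass
class Attachment:
    thread_id: str
    sender: str
    subject: str
    filename: str
    data: bytes
    is_inline: bool = False


def normalise_subject(subject: str) -> str:
    """Drop Re:/Fwd: prefixes and digits so a recurring mailer's subjects collide."""
    s = re.sub(r"^\s*((re|fwd?)\s*:\s*)+", "", subject, flags=re.IGNORECASE)
    s = re.sub(r"\d+", "", s)
    return " ".join(s.lower().split())


def score(att: Attachment, household_tokens: set[str]) -> int:
    """Sum the sender-domain, filename-keyword and household-context signals."""
    domain = att.sender.rsplit("@", 1)[-1].lower()
    points = 0
    if any(domain == d or domain.endswith("." + d) for d in TRUSTED_DOMAINS):
        points += 2
    name = att.filename.lower()
    points += sum(1 for kw in FILENAME_KEYWORDS if kw in name)
    context = f"{att.subject} {att.filename}".lower()
    points += sum(2 for tok in household_tokens if tok.lower() in context)
    return points


def prefilter(attachments: list[Attachment], household_tokens: set[str]) -> list[Attachment]:
    seen_threads: set[str] = set()
    seen_hashes: set[str] = set()
    seen_mailers: set[tuple[str, str]] = set()
    survivors = []
    for att in attachments:
        if att.is_inline:  # e.g. image001.png email signatures
            continue
        if score(att, household_tokens) < MIN_SCORE:
            continue
        digest = hashlib.sha1(att.data).hexdigest()
        mailer = (att.sender.lower(), normalise_subject(att.subject))
        if att.thread_id in seen_threads or digest in seen_hashes or mailer in seen_mailers:
            continue  # duplicate by thread, by content, or recurring mailer
        seen_threads.add(att.thread_id)
        seen_hashes.add(digest)
        seen_mailers.add(mailer)
        survivors.append(att)
    return survivors


batch = [
    Attachment("t1", "council@hackney.gov.uk", "Council tax bill", "bill.pdf", b"A"),
    Attachment("t2", "x@insurer.co.uk", "Your documents", "policy.pdf", b"A"),  # same bytes as t1
    Attachment("t3", "ceo@example.com", "Sig", "image001.png", b"B", is_inline=True),
    Attachment("t4", "deals@shop.example", "Weekly deals", "flyer.pdf", b"C"),  # scores 0
    Attachment("t5", "agent@estates.example", "16 High Street completion", "statement.pdf", b"D"),
]
kept = prefilter(batch, {"16 High Street"})
print([a.thread_id for a in kept])  # ['t1', 't5']
```

Scoring runs before deduplication so that a low-scoring attachment can't claim a thread or hash slot and shadow a stronger one later in the scan.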

You'll usually see 10–40 candidates from a year of typical UK family email. Some will be obvious wins (signed insurance schedules, council letters, HMRC). Some will be in-progress drafts that aren't ready to file. Some will be noise.

The four buckets

The Triage page groups findings by lifecycle state:

  • Signed and final — DocuSigned contracts, completed letters, and any other document the classifier is confident is the final version. You can accept the whole bucket in one click with Accept all.
  • Needs your eye — anything ambiguous. Click each row to review the email + attachment side by side, then accept or dismiss.
  • Drafts — interim or unsigned versions, often superseded by something later in the same thread.
  • Low signal — auto-flagged junk (image001 inline images that slipped through, marketing newsletters, irrelevant attachments). Hidden by default; expand if you want to verify nothing important is in there.
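The bucket rules read naturally as a routing function. This sketch assumes the classifier returns a lifecycle state, a confidence score, and a junk flag. Those keys and the 0.9 threshold are hypothetical stand-ins for whatever the prompt actually emits.

```python
from enum import Enum

FINAL_CONFIDENCE = 0.9  # hypothetical threshold for "sure it's the final version"


class Bucket(Enum):
    SIGNED_AND_FINAL = "Signed and final"
    NEEDS_YOUR_EYE = "Needs your eye"
    DRAFTS = "Drafts"
    LOW_SIGNAL = "Low signal"


def bucket_for(result: dict) -> Bucket:
    """Route one classifier result (hypothetical keys) to a Triage bucket."""
    if result.get("is_junk"):
        return Bucket.LOW_SIGNAL
    state = result.get("lifecycle_state")
    if state == "final" and result.get("confidence", 0.0) >= FINAL_CONFIDENCE:
        return Bucket.SIGNED_AND_FINAL
    if state == "draft":
        return Bucket.DRAFTS
    # Anything else, including a "final" the classifier isn't sure about,
    # goes to a human.
    return Bucket.NEEDS_YOUR_EYE


def group(results: list[dict], show_low_signal: bool = False) -> dict[Bucket, list[dict]]:
    """Group results for the Triage page; Low signal is hidden unless expanded."""
    groups: dict[Bucket, list[dict]] = {b: [] for b in Bucket}
    for r in results:
        groups[bucket_for(r)].append(r)
    if not show_low_signal:
        del groups[Bucket.LOW_SIGNAL]
    return groups


results = [
    {"lifecycle_state": "final", "confidence": 0.97},
    {"lifecycle_state": "final", "confidence": 0.6},
    {"lifecycle_state": "draft", "confidence": 0.8},
    {"is_junk": True},
]
print({b.value: len(rs) for b, rs in group(results).items()})
# {'Signed and final': 1, 'Needs your eye': 1, 'Drafts': 1}
```

Defaulting to Needs your eye keeps the one-click Accept all bucket conservative: a low-confidence "final" never gets swept in unreviewed.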

Accepting

Accepting a document files it through the same flow as a direct upload. The classifier's proposed metadata becomes the default; if anything looks off, click into the row and edit it before confirming.

If a document has a proposed_named_item (e.g. "16 High Street"), it auto-files against that property when you accept. Otherwise it lands in the Unassigned block of the Home topic, and the dashboard prompts you to auto-file it or move it manually.
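The accept rules reduce to a short decision. `Finding`, `accept`, and the destination strings are hypothetical, and the membership check against your registered properties is an assumption this page doesn't state.

```python
from dataclasses import dataclass, field
from typing import Optional

UNASSIGNED = "Home / Unassigned"  # hypothetical label for the Home topic's Unassigned block


@dataclass
class Finding:
    proposed_metadata: dict = field(default_factory=dict)
    proposed_named_item: Optional[str] = None  # e.g. "16 High Street"


def accept(finding: Finding, named_items: set[str], edits: Optional[dict] = None) -> tuple[str, dict]:
    """Return (destination, metadata) for an accepted finding.

    The classifier's proposal is the default; any user edits win.
    """
    metadata = {**finding.proposed_metadata, **(edits or {})}
    item = finding.proposed_named_item
    # Assumption: only auto-file when the proposal matches a registered property.
    if item is not None and item in named_items:
        return item, metadata
    return UNASSIGNED, metadata


schedule = Finding({"doc_type": "insurance schedule"}, "16 High Street")
print(accept(schedule, {"16 High Street"}))
# ('16 High Street', {'doc_type': 'insurance schedule'})
print(accept(Finding({"doc_type": "council letter"}), {"16 High Street"})[0])
# Home / Unassigned
```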

Dismissing

Dismissed items stay marked dismissed forever — re-running the scan won't re-surface them. The Gmail message itself is untouched.
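One way to get "dismissed forever" is to keep a set of dismissed finding keys and filter every rescan against it. The key shape (Gmail message ID plus attachment filename) and the function name are assumptions for illustration.

```python
from typing import Iterable

# Hypothetical finding identity: (Gmail message ID, attachment filename).
FindingKey = tuple[str, str]


def new_candidates(scanned: Iterable[FindingKey], dismissed: set[FindingKey]) -> list[FindingKey]:
    """Return rescan results minus anything the user has already dismissed.

    Only LifeFile's own records are consulted; nothing is written back to Gmail.
    """
    return [key for key in scanned if key not in dismissed]


dismissed = {("msg-1", "newsletter.pdf")}
rescan = [("msg-1", "newsletter.pdf"), ("msg-2", "schedule.pdf")]
print(new_candidates(rescan, dismissed))
# [('msg-2', 'schedule.pdf')]
```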

Re-running

The scan re-runs automatically each time you reconnect Gmail. There isn't a "rescan now" button yet (planned).

If you want a clean slate — say you've changed property names or want to test the pipeline — your developer can wipe your findings via:

fly ssh console -a lifefile -C 'python manage.py wipe_findings --user <username>'

Add --include-connection to also force a fresh OAuth flow.