Gmail Triage
Most UK property documents arrive by email. Connecting Gmail lets LifeFile find what's already there and surface it for review — without you having to forward, save, or re-upload anything.
What we read, and what we don't
The OAuth scope grants read-only access. We never:
- Send mail from your account.
- Modify, label, archive, or delete anything in your inbox.
- Read messages that have no attachments.
- Look at messages older than the scan window (last 100 messages by default; we'll widen this once async scanning is in).
When you connect Gmail, the only things we keep server-side are your OAuth refresh token (encrypted at rest with Fernet, which uses AES-128-CBC for encryption and HMAC-SHA256 for authentication) and the metadata of the candidate documents we surface: sender, subject, date, and attachment filename. Document bytes are fetched lazily during classification and are not currently stored.
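A minimal Python sketch of that split between stored metadata and lazily fetched bytes. It is an illustration, not LifeFile's code: the record type, its field names (including the message_id and attachment_id used to re-fetch), and the fetch and classify callables are all hypothetical. The point is only that the bytes are pulled at classification time and never become part of the stored record.

```python
from dataclasses import asdict, dataclass
from datetime import date
from typing import Callable

@dataclass(frozen=True)
class CandidateMetadata:
    """Hypothetical stored record for one surfaced candidate: metadata only."""
    message_id: str      # hypothetical: lets us re-fetch the attachment later
    attachment_id: str   # hypothetical: lets us re-fetch the attachment later
    sender: str
    subject: str
    sent: date
    filename: str

# Hypothetical callables standing in for the Gmail fetch and the classifier.
Fetcher = Callable[[str, str], bytes]
Classifier = Callable[[CandidateMetadata, bytes], str]

def classify_lazily(meta: CandidateMetadata, fetch: Fetcher, classify: Classifier) -> str:
    # The bytes exist only for the duration of this call: they are handed to
    # the classifier and never attached to the stored record.
    data = fetch(meta.message_id, meta.attachment_id)
    return classify(meta, data)
```

Nothing is fetched when the record is created; a fetch happens only when classify_lazily runs.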
You can disconnect at any time from the dashboard's Get documents in panel. Existing triage items stay; new scans stop.
How a scan works
Once connected, LifeFile runs a background scan immediately. The pipeline is:
- Fetch the most recent ~100 messages with attachments.
- Heuristic pre-filter: score each attachment by sender domain, filename keywords, and household-context tokens (your registered house names and postcodes). Skip inline images, then deduplicate by Gmail thread ID, by SHA-1 hash of the attachment bytes, and by recurring marketing email (same sender plus normalised subject).
- Classify the survivors with Claude Sonnet, the same model that processes uploads. Claude sees the email subject, sender, and a body snippet for context; for attachments of 10 KB or more, the file bytes are fetched lazily so the classifier can read the PDF itself rather than guess from the filename.
- Bucket by lifecycle state.
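The pre-filter step can be sketched in Python as below. This is an illustration under stated assumptions, not LifeFile's implementation: the Attachment shape, the trusted domains, filename keywords, point weights, and min_score threshold are all made up, and thread-level deduplication is interpreted here as one candidate per filename within a thread.

```python
import hashlib
import re
from dataclasses import dataclass
from typing import Iterable, List

@dataclass(frozen=True)
class Attachment:
    """Hypothetical shape of one attachment taken from a Gmail message."""
    thread_id: str
    sender: str          # e.g. "Council <tax@camden.gov.uk>"
    subject: str
    filename: str
    data: bytes
    is_inline: bool = False

# Illustrative signals and weights only; the real ones aren't documented.
TRUSTED_DOMAINS = ("gov.uk", "docusign.net")
FILENAME_KEYWORDS = ("schedule", "policy", "tenancy", "certificate", "statement")
REPLY_PREFIX = re.compile(r"^\s*((re|fwd?)\s*:\s*)+", re.IGNORECASE)

def sender_address(sender: str) -> str:
    # "Name <addr@host>" -> "addr@host"; bare addresses pass through.
    match = re.search(r"<([^>]+)>", sender)
    return (match.group(1) if match else sender).strip().lower()

def sender_domain(sender: str) -> str:
    return sender_address(sender).rpartition("@")[2]

def normalise_subject(subject: str) -> str:
    # Drop Re:/Fwd: chains and mask numbers so "Issue 12" matches "Issue 13".
    s = REPLY_PREFIX.sub("", subject).lower()
    s = re.sub(r"\d+", "#", s)
    return " ".join(s.split())

def score(att: Attachment, household_tokens: List[str]) -> int:
    domain = sender_domain(att.sender)
    name = att.filename.lower()
    text = f"{att.subject} {name}".lower()
    points = 0
    if any(domain == d or domain.endswith("." + d) for d in TRUSTED_DOMAINS):
        points += 2
    if any(k in name for k in FILENAME_KEYWORDS):
        points += 2
    if any(t.lower() in text for t in household_tokens):
        points += 3  # a registered house name or postcode is mentioned
    return points

def prefilter(attachments: Iterable[Attachment], household_tokens: List[str],
              min_score: int = 2) -> List[Attachment]:
    seen_thread_files, seen_hashes, seen_recurring = set(), set(), set()
    survivors = []
    for att in attachments:
        if att.is_inline:
            continue  # inline images such as image001.png
        thread_key = (att.thread_id, att.filename.lower())
        digest = hashlib.sha1(att.data).hexdigest()
        recurring_key = (sender_address(att.sender), normalise_subject(att.subject))
        if (thread_key in seen_thread_files or digest in seen_hashes
                or recurring_key in seen_recurring):
            continue  # duplicate by thread, by content hash, or by sender+subject
        seen_thread_files.add(thread_key)
        seen_hashes.add(digest)
        seen_recurring.add(recurring_key)
        if score(att, household_tokens) >= min_score:
            survivors.append(att)
    return survivors
```

Only the survivors would go on to the classifier, which is the expensive step; the cheap string and hash checks run first.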
You'll usually see 10–40 candidates from a year of typical UK family email. Some will be obvious wins (signed insurance schedules, council letters, HMRC correspondence). Some will be in-progress drafts that aren't ready to file. Some will be noise.
The four buckets
The Triage page groups findings by lifecycle state:
- Signed and final — DocuSigned contracts, completed letters, and any other document the classifier is confident is the final version. Sweep the whole bucket in one click with Accept all.
- Needs your eye — anything ambiguous. Click each row to review the email + attachment side by side, then accept or dismiss.
- Drafts — interim or unsigned versions, often superseded by something later in the same thread.
- Low signal — auto-flagged junk (image001 inline images that slipped through, marketing newsletters, irrelevant attachments). Hidden by default; expand if you want to verify nothing important is in there.
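One way to express the bucketing rule, as a hedged Python sketch. The Classification fields (lifecycle, confidence, is_noise) and the 0.8 "sure" threshold are assumptions, since the classifier's output format isn't described here. Anything that isn't confidently final, confidently a draft, or flagged as noise falls through to Needs your eye.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Dict, Iterable, List, Tuple

class Bucket(Enum):
    SIGNED_AND_FINAL = "Signed and final"
    NEEDS_YOUR_EYE = "Needs your eye"
    DRAFTS = "Drafts"
    LOW_SIGNAL = "Low signal"

@dataclass(frozen=True)
class Classification:
    """Hypothetical classifier output; field names are illustrative."""
    lifecycle: str        # "final", "draft" or "unknown"
    confidence: float     # 0.0 to 1.0
    is_noise: bool = False

def bucket_for(c: Classification, sure: float = 0.8) -> Bucket:
    if c.is_noise:
        return Bucket.LOW_SIGNAL
    if c.lifecycle == "final" and c.confidence >= sure:
        return Bucket.SIGNED_AND_FINAL
    if c.lifecycle == "draft" and c.confidence >= sure:
        return Bucket.DRAFTS
    # Anything the classifier isn't sure about goes to a human.
    return Bucket.NEEDS_YOUR_EYE

def group(findings: Iterable[Tuple[str, Classification]]) -> Dict[Bucket, List[str]]:
    groups: Dict[Bucket, List[str]] = {b: [] for b in Bucket}
    for item_id, c in findings:
        groups[bucket_for(c)].append(item_id)
    return groups
```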
Accepting
Accepting a finding files the document through the same flow as a direct upload. The classifier's proposed metadata becomes the default; if anything looks off, click into the row and edit it before confirming.
If a document has a proposed_named_item (e.g. "16 High Street"), it auto-files against that property when you accept. Otherwise it lands in the Home topic's Unassigned block and the dashboard prompts you to auto-file or move manually.
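The routing on accept might look like the sketch below. Only proposed_named_item and the Home topic's Unassigned block come from the description above; the Finding record, the case- and whitespace-insensitive name match, and the fallback to Unassigned when a proposed name matches no registered property are assumptions for illustration.

```python
from dataclasses import dataclass
from typing import List, Optional

UNASSIGNED = "Home/Unassigned"  # hypothetical label for the Home topic's Unassigned block

@dataclass(frozen=True)
class Finding:
    """Hypothetical finding record; proposed_named_item comes from the classifier."""
    finding_id: str
    proposed_named_item: Optional[str] = None

def _norm(name: str) -> str:
    # Case- and whitespace-insensitive comparison (an assumption).
    return " ".join(name.lower().split())

def filing_destination(finding: Finding, properties: List[str]) -> str:
    """Return the registered property to file against, else UNASSIGNED."""
    if finding.proposed_named_item:
        wanted = _norm(finding.proposed_named_item)
        for prop in properties:
            if _norm(prop) == wanted:
                return prop
    return UNASSIGNED
```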
Dismissing
Dismissed items stay marked dismissed forever — re-running the scan won't re-surface them. The Gmail message itself is untouched.
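A sketch of why dismissals survive re-scans: if each finding is keyed by something stable and a scan only inserts keys it hasn't seen before, a dismissed key is never reset to pending. The key shape (Gmail message ID plus attachment SHA-1) and the TriageStore class are hypothetical.

```python
from typing import Dict, List, Tuple

Key = Tuple[str, str]  # hypothetical: (Gmail message ID, attachment SHA-1)

class TriageStore:
    """Hypothetical in-memory stand-in for the persisted triage table."""

    def __init__(self) -> None:
        self.status: Dict[Key, str] = {}

    def record_scan(self, keys: List[Key]) -> List[Key]:
        """Add newly found items as pending; return only the new ones."""
        new = [k for k in dict.fromkeys(keys) if k not in self.status]
        for k in new:
            self.status[k] = "pending"
        return new

    def dismiss(self, key: Key) -> None:
        # Only the triage record changes; nothing is written back to Gmail.
        self.status[key] = "dismissed"
```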
Re-running
The scan re-runs automatically each time you reconnect Gmail. There isn't a "rescan now" button yet (planned).
If you want a clean slate — say you've changed property names or want to test the pipeline — your developer can wipe your findings via:
Add --include-connection to also force a fresh OAuth flow.