Scribe Docs — offline sync design

Scope. How the client reconciles a scribe doc's content from two local sources (y-indexeddb cache + PowerSync local SQLite) and one remote source (Hocuspocus), across all combinations of online/offline and cache freshness. Companion to the Slice 1 spec — final design after POC validation. Cadence-based diffs (per-checkpoint, not per-onChange). scribe_doc_versions and scribe_doc_contents collapsed into scribe_doc_updates via kind / label / created_by columns.

1. Architecture

flowchart TB
  subgraph Server
    HP["scribe-collab-server<br/>(Node + Hocuspocus)"]
    SB["scribe-backend<br/>(Java)"]
    SDU[("scribe_doc_updates<br/>append-only diffs")]
    MJ["Merge job (TODO)"]
  end
  subgraph Client
    SQ[("PowerSync local SQLite")]
    IDB[("y-indexeddb<br/>Y.Doc state +<br/>lastSeqApplied")]
    YD["Y.Doc"]
    HC["Hocuspocus client"]
    Editor["Editor"]
  end
  HP -->|"per-checkpoint diff"| SB
  SB -->|"INSERT"| SDU
  MJ -.->|"coalesces (post-slice-1)"| SDU
  SDU -->|"PowerSync replication"| SQ
  IDB -->|"phase 1"| YD
  SQ -->|"phase 2"| YD
  HC <-->|"phase 3 + 4"| YD
  YD -->|"persists"| IDB
  YD --> Editor
  HC <==>|"WebSocket"| HP

scribe_doc_updates (append-only): one row per server-side checkpoint cadence (~500 ops or 60s idle, plus on-last-disconnect). Body is a base64-encoded Yjs diff against the previous row's state vector. Replaces both the legacy scribe_doc_contents.snapshot and the separate scribe_doc_versions table — version-history facets fold in via kind / label / created_by columns.
scribe-collab-server: Node service, depends on @hocuspocus/server. Holds the in-memory Y.Doc, computes diffs (Y.encodeStateAsUpdate(ydoc, prevSV)), POSTs them to scribe-backend's /internal/scribe-doc-updates. Single instance for slice 1.
scribe-backend: Java service, persists what the sidecar sends. Treats update_b64 and state_vector_b64 as opaque text — never imports Yjs. Server-side dedup on state_vector_b64 equality skips duplicate flushes.
PowerSync: delivers scribe_doc_updates.update_b64 to client SQLite via the existing team_data / non_member_tasket_data streams. state_vector_b64 is excluded — server-only.
y-indexeddb: client-side cache of merged Y.Doc state plus a lastSeqApplied cursor used by phase-2 catch-up.

2. Schema

CREATE TABLE scribe.scribe_doc_updates (
  id                UUID PRIMARY KEY NOT NULL,
  seq               BIGSERIAL        NOT NULL UNIQUE,
  folder_item_id    UUID             NOT NULL REFERENCES scribe.folder_items(id) ON DELETE CASCADE,
  org_id            UUID             NOT NULL REFERENCES public.organizations(id),
  team_id           UUID                      REFERENCES public.teams(id),
  root_folder_id    UUID             NOT NULL REFERENCES scribe.folders(id),
  update_b64        TEXT             NOT NULL,
  state_vector_b64  TEXT             NOT NULL,       -- server-only; excluded from sync rules
  origin            VARCHAR          NOT NULL,       -- 'hocuspocus' | 'merge_job' | 'version_restore'
  kind              VARCHAR          NOT NULL DEFAULT 'auto',  -- 'auto' | 'session_end' | 'manual'
  label             VARCHAR,
  created_by        UUID,
  audit_info        JSONB,
  created_at        TIMESTAMPTZ      NOT NULL DEFAULT now()
);

CREATE INDEX ON scribe.scribe_doc_updates (folder_item_id, seq);
CREATE INDEX ON scribe.scribe_doc_updates (team_id) WHERE team_id IS NOT NULL;
CREATE INDEX ON scribe.scribe_doc_updates (root_folder_id);
CREATE INDEX ON scribe.scribe_doc_updates (folder_item_id, created_at) WHERE kind != 'auto';

id (UUID) is the PowerSync-required text-typed PK; seq is the monotonic ordering cursor for client phase-2 catch-up.
org_id / team_id / root_folder_id are percolated from folder_items (resolved server-side on append) — keeps PowerSync predicates join-free.
state_vector_b64 is always present: server uses it to compute the next diff and to dedup duplicate flushes. Excluded from PowerSync sync rules — clients never see it.
kind carries version-history facets — 'session_end' rows are user-visible save-points; 'auto' rows are machine checkpoints. Slice 1 only writes 'auto' and 'session_end'; 'manual' is reserved for an explicit "save version" UI.
scribe_doc_contents.snapshot and scribe_doc_versions from earlier drafts are removed — scribe_doc_updates is the only content table.

Scribe Docs — offline sync design

1. Architecture

2. Schema

3. Server-side diff computation