i18n — Content Translation & Versioning
Proposal for translating pages, widgets, and other content types with version tracking.
Status Quo
| What exists | Where |
|---|---|
| i18n_translations: flat src_text → dst_text cache | db-i18n.ts |
| i18n_glossaries / i18n_glossary_terms: DeepL glossary sync | db-i18n.ts |
| DeepL server-side translate: translate + cache in one call | i18n-deepl.ts |
| @polymech/i18n: shared clean() helper etc. | monorepo package |
The existing system translates arbitrary text blobs. It has no awareness of:
- Which page / widget a translation belongs to
- Which version of the source content was translated
- Structural identity — if a widget moves or is deleted, orphaned translations linger
Goals
- Page-level translations — a translated "snapshot" of an entire page
- Widget-level translations — translate individual widget text props independently
- Content versioning — track which source version a translation was produced from, detect drift
- Reuse existing infra: i18n_translations stays as the text cache, DeepL stays as the engine
Proposed Database Schema
1. content_versions
Tracks every published snapshot of any content entity (pages, posts, collections, …).
create table content_versions (
id uuid primary key default gen_random_uuid(),
entity_type text not null, -- 'page' | 'post' | 'collection'
entity_id uuid not null, -- pages.id / posts.id / …
version int not null default 1, -- monotonic per entity
content_hash text not null, -- sha256 of JSON content
content jsonb, -- snapshot of content at this version (optional, for rollback)
meta jsonb default '{}', -- { author, change_note, … }
created_at timestamptz default now(),
created_by uuid references auth.users(id),
unique (entity_type, entity_id, version)
);
create index idx_cv_entity on content_versions (entity_type, entity_id);
Why a separate table?
The pages table stores the current working state.
content_versions stores immutable snapshots you can diff, roll back to, or translate against.
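A minimal TypeScript sketch of what a publish step could compute for a new content_versions row. The names canonicalJson, contentHash and nextVersionRow are hypothetical, and hashing a key-sorted serialization (so that key order alone never changes the hash) is an assumption; the proposal only states that content_hash is a sha256 of the JSON content.

```typescript
import { createHash } from "node:crypto";

// Serialize JSON-safe data with object keys sorted, so equal content hashes equally.
function canonicalJson(value: unknown): string {
  if (Array.isArray(value)) {
    return `[${value.map((item) => canonicalJson(item)).join(",")}]`;
  }
  if (typeof value === "object" && value !== null) {
    const record = value as Record<string, unknown>;
    const fields = Object.keys(record)
      .sort()
      .map((key) => `${JSON.stringify(key)}:${canonicalJson(record[key])}`);
    return `{${fields.join(",")}}`;
  }
  return JSON.stringify(value); // string, number, boolean or null
}

function contentHash(content: unknown): string {
  return createHash("sha256").update(canonicalJson(content)).digest("hex");
}

// Hypothetical in-memory mirror of a content_versions row.
interface ContentVersionRow {
  entityType: string;
  entityId: string;
  version: number;
  contentHash: string;
  content: unknown;
}

// Build the next immutable snapshot; version is monotonic per entity.
function nextVersionRow(
  history: ContentVersionRow[],
  entityType: string,
  entityId: string,
  content: unknown,
): ContentVersionRow {
  let latest = 0;
  for (const row of history) {
    if (row.entityType === entityType && row.entityId === entityId && row.version > latest) {
      latest = row.version;
    }
  }
  return { entityType, entityId, version: latest + 1, contentHash: contentHash(content), content };
}
```

In a real database the unique (entity_type, entity_id, version) constraint would still be the guard against two concurrent publishes picking the same version number.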
2. content_translations
Links a translated content blob to a specific source version + language.
create type translation_status as enum ('draft', 'machine', 'reviewed', 'published');
create table content_translations (
id uuid primary key default gen_random_uuid(),
entity_type text not null,
entity_id uuid not null,
source_version int not null, -- FK-like ref to content_versions.version
source_lang text not null default 'de',
target_lang text not null,
status translation_status default 'draft',
-- Translated payload (same shape as source content)
translated_content jsonb, -- full page JSON with translated strings
-- Drift detection
source_hash text, -- hash of source at translation time
is_stale boolean default false, -- set true when source gets a newer version
meta jsonb default '{}', -- { translator, provider, cost, … }
created_at timestamptz default now(),
updated_at timestamptz default now(),
translated_by uuid references auth.users(id),
unique (entity_type, entity_id, source_version, target_lang)
);
create index idx_ct_entity on content_translations (entity_type, entity_id, target_lang);
3. widget_translations (optional — granular level)
For widget-by-widget translation without duplicating the whole page JSON.
create table widget_translations (
id uuid primary key default gen_random_uuid(),
entity_type text not null default 'page',
entity_id uuid not null,
widget_id text not null, -- WidgetInstance.id from the JSON tree
prop_path text not null default 'content', -- e.g. 'content', 'label', 'placeholder'
source_lang text not null,
target_lang text not null,
source_text text not null,
translated_text text not null,
source_version int, -- which content_version this was derived from
status translation_status default 'machine',
meta jsonb default '{}',
created_at timestamptz default now(),
updated_at timestamptz default now(),
unique (entity_type, entity_id, widget_id, prop_path, target_lang)
);
create index idx_wt_entity on widget_translations (entity_type, entity_id, target_lang);
Why both content_translations and widget_translations?
- content_translations = "give me the whole page in French" (fast serve)
- widget_translations = "give me just widget X in French" (granular edit, partial retranslation)
When serving, we prefer content_translations (single read). When editing, we use widget_translations for surgical updates.
Translatable Widget Props
Not every widget property needs translation. Here's the map of translatable text:
| Widget Type | Translatable Props |
|---|---|
| html-widget | content |
| markdown-text | content |
| tabs-widget | tabs[].label |
| layout-container-widget | nestedPageName |
| photo-card | none (title/description come from the pictures table) |
| gallery-widget | none |
| file-browser | none |
| Container (settings) | settings.title |
The shared function iterateWidgets() from @polymech/shared can walk the full content tree to extract translatable strings per widget.
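As a rough illustration of the extraction step, the sketch below applies the prop map from the table to a flat list of widgets. It does not use iterateWidgets(), whose signature is not shown in this document; the WidgetNode shape, the type field and the helper names are assumptions made for the example.

```typescript
// Hypothetical widget shape; the real WidgetInstance type lives in the app.
interface WidgetNode {
  id: string;
  type: string; // e.g. "markdown-text"
  props: Record<string, unknown>;
}

interface TranslatableString {
  widgetId: string;
  propPath: string; // concrete path, e.g. "tabs.0.label"
  text: string;
}

// Prop paths per widget type, copied from the table above.
const TRANSLATABLE_PROPS: Record<string, string[]> = {
  "html-widget": ["content"],
  "markdown-text": ["content"],
  "tabs-widget": ["tabs[].label"],
  "layout-container-widget": ["nestedPageName"],
};

// Expand a pattern such as "tabs[].label" into concrete [path, text] pairs.
function resolvePath(value: unknown, segments: string[], prefix: string[]): Array<[string, string]> {
  if (segments.length === 0) {
    return typeof value === "string" && value.trim() !== "" ? [[prefix.join("."), value]] : [];
  }
  const [head, ...rest] = segments;
  if (head === undefined || typeof value !== "object" || value === null) return [];
  const record = value as Record<string, unknown>;
  if (!head.endsWith("[]")) return resolvePath(record[head], rest, [...prefix, head]);
  const name = head.slice(0, -2);
  const list = record[name];
  const found: Array<[string, string]> = [];
  if (Array.isArray(list)) {
    list.forEach((item, index) => {
      found.push(...resolvePath(item, rest, [...prefix, name, String(index)]));
    });
  }
  return found;
}

function extractTranslatable(widgets: WidgetNode[]): TranslatableString[] {
  const out: TranslatableString[] = [];
  for (const widget of widgets) {
    for (const pattern of TRANSLATABLE_PROPS[widget.type] ?? []) {
      for (const [propPath, text] of resolvePath(widget.props, pattern.split("."), [])) {
        out.push({ widgetId: widget.id, propPath, text });
      }
    }
  }
  return out;
}
```

Widget types without an entry in the map (gallery-widget, file-browser, photo-card) simply yield no strings.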
Content Versioning Flow
flowchart TD
A["Page Editor"] -->|save| B["pages.content — working draft"]
B -->|publish / snapshot| C["content_versions — immutable v1, v2, ..."]
C -->|translate via DeepL / manual| D["content_translations — per version + lang"]
Version Lifecycle
- Author saves → pages.content updated (working state, no version bump)
- Author publishes → new row in content_versions (hash of content JSON, version++)
- Translation triggered → walks content tree, translates per widget, stores widget_translations + assembles a full content_translations row
- Source changes → next publish creates version N+1, all content_translations for version N get is_stale = true
- Retranslation → only re-translates widgets whose source_text changed (compare hashes)
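The retranslation step could be planned roughly as follows. This is a sketch under assumptions: the row shapes and planRetranslation are hypothetical, and source strings are compared directly, which gives the same result as comparing their hashes.

```typescript
interface StoredTranslation {
  widgetId: string;
  propPath: string;
  sourceText: string; // source text the stored translation was produced from
  translatedText: string;
}

interface SourceString {
  widgetId: string;
  propPath: string;
  text: string; // text in the newly published version
}

interface RetranslationPlan {
  toTranslate: SourceString[]; // new or changed strings
  reusable: StoredTranslation[]; // source unchanged, translation can be kept
  orphaned: StoredTranslation[]; // widget or prop no longer exists in the source
}

function planRetranslation(stored: StoredTranslation[], current: SourceString[]): RetranslationPlan {
  const keyOf = (widgetId: string, propPath: string): string => `${widgetId}\u0000${propPath}`;
  const storedByKey = new Map<string, StoredTranslation>();
  for (const row of stored) storedByKey.set(keyOf(row.widgetId, row.propPath), row);

  const plan: RetranslationPlan = { toTranslate: [], reusable: [], orphaned: [] };
  const seen = new Set<string>();
  for (const source of current) {
    const key = keyOf(source.widgetId, source.propPath);
    seen.add(key);
    const row = storedByKey.get(key);
    if (row !== undefined && row.sourceText === source.text) {
      plan.reusable.push(row);
    } else {
      plan.toTranslate.push(source);
    }
  }
  storedByKey.forEach((row, key) => {
    if (!seen.has(key)) plan.orphaned.push(row);
  });
  return plan;
}
```

The orphaned bucket also addresses the lingering-translation problem described under Status Quo: those rows can be deleted or archived after each publish.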
Serving Translated Pages
When a page is requested with ?lang=fr:
1. Look up content_translations WHERE entity_id = ? AND target_lang = 'fr' AND status = 'published'
2. If found → serve translated_content directly (no extra processing)
3. If not found → serve source content (fallback)
4. If is_stale = true → serve but add X-Translation-Stale: true header
Add lang to the enrichment / cache key in getPagesState() or create a parallel getTranslatedPagesState().
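The lookup and fallback rules above might be expressed as a pure function like the one below. The row shape and resolvePageContent are hypothetical. Picking the published row with the highest source_version when several exist is an assumption, since the lookup in step 1 does not say which version wins.

```typescript
interface ContentTranslationRow {
  targetLang: string;
  status: "draft" | "machine" | "reviewed" | "published";
  sourceVersion: number;
  isStale: boolean;
  translatedContent: unknown;
}

interface ServedPage {
  content: unknown;
  lang: string;
  headers: Record<string, string>;
}

function resolvePageContent(
  sourceContent: unknown,
  sourceLang: string,
  rows: ContentTranslationRow[],
  requestedLang: string | undefined,
): ServedPage {
  const fallback: ServedPage = { content: sourceContent, lang: sourceLang, headers: {} };
  if (requestedLang === undefined || requestedLang === sourceLang) return fallback;

  // Step 1: published rows for the requested language (assumption: newest source version wins).
  let best: ContentTranslationRow | undefined;
  for (const row of rows) {
    if (row.targetLang !== requestedLang || row.status !== "published") continue;
    if (best === undefined || row.sourceVersion > best.sourceVersion) best = row;
  }

  // Step 3: nothing published yet, serve the source content.
  if (best === undefined) return fallback;

  // Steps 2 and 4: serve the translation, flagging it when the source has moved on.
  const headers: Record<string, string> = best.isStale ? { "X-Translation-Stale": "true" } : {};
  return { content: best.translatedContent, lang: requestedLang, headers };
}
```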
Integration with Existing i18n
The existing i18n_translations table continues to serve as the text-level translation cache (src → dst lookup). The new tables add structural awareness on top:
i18n_translations → text cache (DeepL results, any text)
widget_translations → maps widget+prop → translation pair
content_translations → full translated content snapshot
content_versions → immutable source snapshots
translateTextServer() (from i18n-deepl.ts) remains the engine. The new translation logic calls it per widget prop, then assembles results.
External Translation Services (Crowdin, Phrase, Lokalise)
The Problem
Our page content is deeply nested JSON (RootLayoutData → pages → containers → widgets → props). External TMS platforms don't understand this structure — they work with flat key→value files in standard formats.
We need an extract/inject pipeline that converts between our JSON tree and industry-standard formats.
Exchange Format Strategy
| Format | Best For | Crowdin | Phrase | Lokalise |
|---|---|---|---|---|
| XLIFF 2.0 | Industry standard, rich metadata, tool support | ✅ | ✅ | ✅ |
| Flat JSON | Simple key→value, easy to diff | ✅ | ✅ | ✅ |
| ICU MessageFormat | Plurals, gender, variables | ✅ | ✅ | ✅ |
Recommended primary format: XLIFF 2.0 — it carries source + target in one file, supports notes/context for translators, and every TMS speaks it natively.
Secondary: Flat JSON — for scripting, quick diffs, and lightweight integrations.
Key Design — Stable Translation Keys
Every translatable string gets a stable key derived from its position in the content tree:
page.<page_id>.widget.<widget_id>.<prop_path>
Examples:
page.a1b2c3.widget.w-markdown-1.content
page.a1b2c3.widget.w-tabs-1.tabs.0.label
page.a1b2c3.widget.w-tabs-1.tabs.1.label
page.a1b2c3.container.c-hero.settings.title
page.a1b2c3.meta.title ← page title itself
These keys are widget-ID-based, not position-based. If a widget moves within the page, its key stays the same. If a widget is deleted, its key disappears from the next export.
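A small sketch of building and parsing these keys. The TranslationKey type and both function names are hypothetical, and the parser assumes that page, widget and container IDs never contain a dot.

```typescript
type TranslationKey =
  | { pageId: string; scope: "meta"; propPath: string }
  | { pageId: string; scope: "widget" | "container"; nodeId: string; propPath: string };

function buildKey(key: TranslationKey): string {
  if (key.scope === "meta") return `page.${key.pageId}.meta.${key.propPath}`;
  return `page.${key.pageId}.${key.scope}.${key.nodeId}.${key.propPath}`;
}

// Returns null for anything that does not follow the key scheme.
function parseKey(raw: string): TranslationKey | null {
  const [root = "", pageId = "", scope = "", ...rest] = raw.split(".");
  if (root !== "page" || pageId === "" || rest.length === 0) return null;
  if (scope === "meta") {
    return { pageId, scope: "meta", propPath: rest.join(".") };
  }
  if (scope === "widget" || scope === "container") {
    const [nodeId = "", ...prop] = rest;
    if (nodeId === "" || prop.length === 0) return null;
    return { pageId, scope, nodeId, propPath: prop.join(".") };
  }
  return null;
}
```

Because the prop path is everything after the node ID, array indices such as tabs.0.label survive a round trip unchanged.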
XLIFF Export Example
<?xml version="1.0" encoding="UTF-8"?>
<xliff xmlns="urn:oasis:names:tc:xliff:document:2.0" version="2.0" srcLang="de" trgLang="en">
<file id="page-a1b2c3" original="page/a1b2c3">
<unit id="page.a1b2c3.meta.title">
<notes>
<note category="context">Page title</note>
<note category="max-length">255</note>
</notes>
<segment>
<source>Kunststoff-Recycling Übersicht</source>
<target>Plastic Recycling Overview</target>
</segment>
</unit>
<unit id="page.a1b2c3.widget.w-md-1.content">
<notes>
<note category="context">Markdown text widget — supports markdown formatting</note>
<note category="widget-type">markdown-text</note>
</notes>
<segment>
<source>## Einleitung\n\nDiese Seite beschreibt...</source>
<target/>
</segment>
</unit>
<unit id="page.a1b2c3.widget.w-tabs-1.tabs.0.label">
<notes>
<note category="context">Tab label</note>
<note category="max-length">50</note>
</notes>
<segment>
<source>Übersicht</source>
<target/>
</segment>
</unit>
</file>
</xliff>
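A file like the one above could be produced by a serializer along these lines. This is a sketch only: XliffUnit, escapeXml and toXliff are hypothetical names, it emits the minimal subset of XLIFF 2.0 used in the example, and it does not handle whitespace preservation or inline markup.

```typescript
interface XliffNote {
  category: string;
  text: string;
}

interface XliffUnit {
  id: string; // stable translation key
  source: string;
  target?: string; // omitted when not translated yet
  notes?: XliffNote[];
}

// Escape characters that are not allowed raw in XML text or attribute values.
function escapeXml(text: string): string {
  return text
    .replace(/&/g, "&amp;")
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;")
    .replace(/"/g, "&quot;");
}

function toXliff(fileId: string, srcLang: string, trgLang: string, units: XliffUnit[]): string {
  const out: string[] = [
    '<?xml version="1.0" encoding="UTF-8"?>',
    `<xliff xmlns="urn:oasis:names:tc:xliff:document:2.0" version="2.0" srcLang="${escapeXml(srcLang)}" trgLang="${escapeXml(trgLang)}">`,
    `  <file id="${escapeXml(fileId)}">`,
  ];
  for (const unit of units) {
    out.push(`    <unit id="${escapeXml(unit.id)}">`);
    const notes = unit.notes ?? [];
    if (notes.length > 0) {
      out.push("      <notes>");
      for (const note of notes) {
        out.push(`        <note category="${escapeXml(note.category)}">${escapeXml(note.text)}</note>`);
      }
      out.push("      </notes>");
    }
    out.push("      <segment>");
    out.push(`        <source>${escapeXml(unit.source)}</source>`);
    out.push(unit.target === undefined ? "        <target/>" : `        <target>${escapeXml(unit.target)}</target>`);
    out.push("      </segment>", "    </unit>");
  }
  out.push("  </file>", "</xliff>");
  return out.join("\n");
}
```

Escaping matters most for html-widget and markdown-text content, which routinely contains angle brackets and ampersands.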
Flat JSON Export Example
{
"_meta": {
"entity_type": "page",
"entity_id": "a1b2c3",
"source_version": 3,
"source_lang": "de",
"exported_at": "2026-02-17T10:00:00Z"
},
"page.a1b2c3.meta.title": "Kunststoff-Recycling Übersicht",
"page.a1b2c3.widget.w-md-1.content": "## Einleitung\n\nDiese Seite beschreibt...",
"page.a1b2c3.widget.w-tabs-1.tabs.0.label": "Übersicht",
"page.a1b2c3.widget.w-tabs-1.tabs.1.label": "Details",
"page.a1b2c3.container.c-hero.settings.title": "Willkommen"
}
Extract → Export → Translate → Import → Inject Pipeline
flowchart LR
subgraph OUR_SYSTEM["Our System"]
CV["content_versions v3"] -->|"1 EXTRACT\niterateWidgets"| KV["Flat key-value map"]
KV -->|"2 EXPORT\nserialize to XLIFF or JSON"| FILE_OUT[".xliff / .json file"]
FILE_IN["Translated .xliff / .json"] -->|"3 IMPORT\nparse to key-value map"| KV_TR["Translated key-value map"]
KV_TR -->|"4 INJECT\nwalk tree, replace strings"| CT["content_translations"]
KV_TR -->|"4 INJECT"| WT["widget_translations"]
end
subgraph TMS["External TMS"]
CROWDIN["Crowdin / Phrase / Lokalise"]
HUMAN["Human translators + MT review"]
CROWDIN --> HUMAN
HUMAN --> CROWDIN
end
FILE_OUT --> CROWDIN
CROWDIN --> FILE_IN
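Steps 3 and 4 of the pipeline (import, then inject) could look roughly like this for the flat JSON format. The Widget shape and function names are hypothetical, only widget keys are handled (container and page-level keys would need their own branch), and a string is only replaced where the source tree already has a string at that path.

```typescript
interface Widget {
  id: string;
  props: Record<string, unknown>;
}

// Replace an existing string at the given path; returns false if the path does not match.
function setByPath(root: unknown, segments: string[], value: string): boolean {
  let node: unknown = root;
  for (let i = 0; i < segments.length - 1; i++) {
    if (typeof node !== "object" || node === null) return false;
    node = (node as Record<string, unknown>)[segments[i] as string];
  }
  const last = segments[segments.length - 1];
  if (last === undefined || typeof node !== "object" || node === null) return false;
  const holder = node as Record<string, unknown>;
  if (typeof holder[last] !== "string") return false;
  holder[last] = value;
  return true;
}

function injectTranslations(
  pageId: string,
  widgets: Widget[],
  flat: Record<string, unknown>,
): { widgets: Widget[]; skipped: string[] } {
  // Work on a deep copy so the source snapshot stays immutable.
  const copy = JSON.parse(JSON.stringify(widgets)) as Widget[];
  const byId = new Map<string, Widget>();
  for (const widget of copy) byId.set(widget.id, widget);

  const prefix = `page.${pageId}.widget.`;
  const skipped: string[] = [];
  for (const key of Object.keys(flat)) {
    if (key.indexOf(prefix) !== 0) continue; // _meta and non-widget keys are ignored here
    const [widgetId = "", ...path] = key.slice(prefix.length).split(".");
    const widget = byId.get(widgetId);
    const value = flat[key];
    if (widget === undefined || typeof value !== "string" || !setByPath(widget.props, path, value)) {
      skipped.push(key); // widget deleted or structure changed since export
    }
  }
  return { widgets: copy, skipped };
}
```

Keys that end up in skipped point at widgets that were deleted or restructured after the export, which is a useful signal that the imported file was produced from an older source version.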
How Human Translation Fits the Status Flow
flowchart TD
A["Machine translate via DeepL"] --> B["status = machine"]
B --> C["Export to TMS"]
C --> D["Human review and edit"]
D --> E["Import back"]
E --> F["status = reviewed"]
F --> G["Editor approves"]
G --> H["status = published"]
- Machine pre-fill: DeepL translates all strings → stored with status = 'machine'
- Export to TMS: export the machine-translated file (with source + target pre-filled) so human translators only need to review and fix, not translate from scratch
- Import from TMS: translated file comes back → status = 'reviewed'
- Publish: editor approves → status = 'published', content_translations assembled
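The status flow above can be guarded with a small transition table. This is a sketch: NEXT_STATUS and advanceStatus are hypothetical, and the transitions out of draft are an assumption, because the flow described here starts at machine.

```typescript
type TranslationStatus = "draft" | "machine" | "reviewed" | "published";

// Allowed next states per status.
const NEXT_STATUS: Record<TranslationStatus, TranslationStatus[]> = {
  draft: ["machine", "reviewed"], // assumption: a draft can be machine-filled or reviewed by hand
  machine: ["reviewed"], // import from TMS after human review
  reviewed: ["published"], // editor approval
  published: [],
};

function advanceStatus(current: TranslationStatus, next: TranslationStatus): TranslationStatus {
  if (NEXT_STATUS[current].indexOf(next) === -1) {
    throw new Error(`Illegal status change: ${current} -> ${next}`);
  }
  return next;
}
```

With this table a machine translation cannot be published without passing through reviewed, which matches the intent of the flowchart.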
API Additions for TMS Interop
| Method | Endpoint | Description |
|---|---|---|
| GET | /api/pages/:id/export/:lang?format=xliff | Export translatable strings as XLIFF or JSON |
| POST | /api/pages/:id/import/:lang | Import translated XLIFF or JSON file |
| GET | /api/pages/:id/export/:lang?format=json | Export as flat JSON |
| POST | /api/i18n/webhook/crowdin | Crowdin webhook for auto-import on completion |
Crowdin-Specific Integration Notes
- Source files: upload the flat JSON export as a "source file" per page
- File naming: page-{slug}-v{version}.json (Crowdin tracks versions by filename)
- Branches: use Crowdin branches to match content_versions (branch = version)
- Webhooks: Crowdin fires file.translated / file.approved → our webhook imports
- In-Context: Crowdin's in-context editing can work via our ?lang=pseudo mode that renders keys instead of text
Glossary Sync
The existing i18n_glossaries / i18n_glossary_terms tables can be:
- Exported as TBX (TermBase eXchange) or Crowdin-compatible CSV
- Synced bidirectionally: terms added in Crowdin → imported to our DB → pushed to DeepL glossary
This keeps DeepL machine translations and human translations using the same terminology.
API Surface (Proposed)
| Method | Endpoint | Description |
|---|---|---|
| POST | /api/pages/:id/publish | Snapshot current content → content_versions |
| GET | /api/pages/:id/versions | List versions for a page |
| GET | /api/pages/:id/versions/:v | Get specific version snapshot |
| POST | /api/pages/:id/translate | Translate page to target lang(s) |
| GET | /api/pages/:id/translations | List available translations |
| GET | /api/pages/:id/translations/:lang | Get translated content for lang |
| PATCH | /api/pages/:id/translations/:lang/widgets/:wid | Update single widget translation |
| POST | /api/content/:type/:id/publish | Generic publish for any entity type |
| POST | /api/content/:type/:id/translate | Generic translate for any entity type |
Open Questions / Decisions Needed
- Publish-on-save vs explicit publish?
  Do we auto-version on every save, or require an explicit "Publish" action?
  Recommend: explicit publish to avoid version spam.
- Widget-level table: now or later?
  widget_translations adds complexity. We could start with page-level only (content_translations) and add widget-level later.
  Recommend: start with both, since widget-level is needed for partial retranslation.
- Store full content in content_versions or just the hash?
  Storing full JSON enables rollback but costs storage.
  Recommend: store it. Pages are small (< 100 KB each) and rollback is high value.
- Which entity types beyond pages?
  Posts? Collections? Categories?
  Recommend: start with pages only; the schema is generic enough to extend.
- UI for translation management?
  A side-by-side translation editor? Or just an "auto-translate" button?
  This doc covers the backend schema only; UI TBD.
Migration Priority
| Phase | Scope | Tables |
|---|---|---|
| Phase 1 | Content versioning for pages | content_versions |
| Phase 2 | Page-level translations | content_translations |
| Phase 3 | Widget-level translations | widget_translations ✅ |
| Phase 4 | Extend to posts / collections | Same tables, new entity_type values |
Implemented Features
Client i18n Loading (src/i18n.tsx)
Translations are loaded from src/i18n/*.json using Vite's import.meta.glob with eager: true. This ensures:
- All JSON files are statically included at build time
- Vite HMR pushes updates instantly when a JSON file changes on disk
- No stale module cache issues (unlike dynamic import())
const langModules = import.meta.glob('./i18n/*.json', { eager: true });
Requested terms (keys seen in the app but not yet translated) are cached in localStorage under i18n-requested-terms. These are merged with the loaded JSON translations, with JSON taking priority.
Glossary Term Editing (DeepL v3 API)
API Endpoints
| Method | Endpoint | Description |
|---|---|---|
| GET | /api/i18n/glossaries/:id/terms | Fetch all terms for a glossary |
| PUT | /api/i18n/glossaries/:id/terms | Replace all terms (syncs with DeepL v3, updates DB, flushes cache) |
The PUT endpoint uses the DeepL v3 API (PUT /v3/glossaries/{id}/dictionaries) to replace the entire glossary dictionary in TSV format. It then syncs the local DB (i18n_glossary_terms) and updates entry_count.
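Building the TSV payload from a term map might look like the sketch below. toGlossaryTsv is a hypothetical helper, not the deployed code; skipping empty pairs and rejecting tabs or line breaks inside a term are assumptions about sensible validation.

```typescript
// Convert { source: target } term pairs into tab-separated lines, one pair per line.
function toGlossaryTsv(entries: Record<string, string>): string {
  const lines: string[] = [];
  for (const source of Object.keys(entries)) {
    const term = source.trim();
    const translation = (entries[source] ?? "").trim();
    if (term === "" || translation === "") continue; // skip incomplete pairs
    if (/[\t\r\n]/.test(term) || /[\t\r\n]/.test(translation)) {
      throw new Error(`Glossary term contains a tab or line break: ${JSON.stringify(source)}`);
    }
    lines.push(`${term}\t${translation}`);
  }
  return lines.join("\n");
}
```

The number of emitted lines is also a natural value for the entry_count column that the endpoint updates.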
Client Functions
- fetchGlossaryTerms(glossaryId): fetches term pairs as Record<string, string>
- updateGlossaryTerms(glossaryId, entries): replaces all terms
Playground UI
Glossaries in the management section are expandable — click to load and inline-edit terms. Each glossary row shows:
- Add/delete individual terms
- "Save" button (enabled only when there are unsaved changes via dirty-state detection)
Glossary Selection Improvements
- Bidirectional filter: The glossary dropdown in the Translation section shows glossaries matching the language pair in either direction (e.g. when translating en→de, both en→de and de→en glossaries appear)
- Direction label: Each glossary option shows its direction: osr (de→en, 2 entries)
- DeepL target lang normalization: en/EN → en-GB, pt/PT → pt-PT (DeepL rejects bare en/pt target codes)
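The three behaviours above can be sketched as small pure functions. The Glossary shape and all function names are hypothetical, and comparing only the base language code (so that de-DE matches a de glossary) is an assumption.

```typescript
interface Glossary {
  name: string;
  sourceLang: string;
  targetLang: string;
  entryCount: number;
}

// Map bare codes to the regional variants named above; other codes pass through unchanged.
function normalizeTargetLang(lang: string): string {
  const lower = lang.toLowerCase();
  if (lower === "en") return "en-GB";
  if (lower === "pt") return "pt-PT";
  return lang;
}

// "de-DE" and "DE" both reduce to "de".
function baseLang(code: string): string {
  return code.toLowerCase().replace(/-.*$/, "");
}

// Keep glossaries that match the language pair in either direction.
function matchingGlossaries(all: Glossary[], sourceLang: string, targetLang: string): Glossary[] {
  const src = baseLang(sourceLang);
  const trg = baseLang(targetLang);
  return all.filter((glossary) => {
    const gs = baseLang(glossary.sourceLang);
    const gt = baseLang(glossary.targetLang);
    return (gs === src && gt === trg) || (gs === trg && gt === src);
  });
}

function glossaryLabel(glossary: Glossary): string {
  return `${glossary.name} (${glossary.sourceLang}→${glossary.targetLang}, ${glossary.entryCount} entries)`;
}
```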
Widget Translations
Schema (Actual — Deployed)
CREATE TABLE widget_translations (
id uuid PRIMARY KEY DEFAULT gen_random_uuid(),
entity_type text NOT NULL DEFAULT 'page',
entity_id text, -- nullable for system translations
widget_id text, -- nullable for system translations
prop_path text NOT NULL DEFAULT 'content',
source_lang text NOT NULL,
target_lang text NOT NULL,
source_text text,
translated_text text,
source_version int,
status text DEFAULT 'draft',
meta jsonb DEFAULT '{}',
created_at timestamptz DEFAULT now(),
updated_at timestamptz DEFAULT now(),
CONSTRAINT uq_widget_translation
UNIQUE NULLS NOT DISTINCT (entity_type, entity_id, widget_id, prop_path, target_lang)
);
Uses NULLS NOT DISTINCT (available from PostgreSQL 15) so system translations (with NULL entity_id / widget_id) are still properly deduplicated. The unique constraint is required by PostgREST for upsert conflict resolution.
API Endpoints
| Method | Endpoint | Description |
|---|---|---|
| GET | /api/i18n/widget-translations | Query with filters: entity_type, entity_id, widget_id, target_lang |
| PUT | /api/i18n/widget-translations | Upsert single translation |
| PUT | /api/i18n/widget-translations/batch | Upsert multiple translations |
| DELETE | /api/i18n/widget-translations/:id | Delete by ID |
| DELETE | /api/i18n/widget-translations/entity/:type/:id | Delete all translations for an entity (optional ?target_lang=) |
Client Functions
- fetchWidgetTranslations(filters): query with optional entity/widget/lang filters
- upsertWidgetTranslation(input): upsert a single translation
- upsertWidgetTranslationsBatch(inputs): upsert multiple (used by "Update Database")
- deleteWidgetTranslation(id): delete by ID
- deleteWidgetTranslationsByEntity(type, id, lang?): bulk delete
Update i18n Language Files
API Endpoint
| Method | Endpoint | Description |
|---|---|---|
| PUT | /api/i18n/update-lang-file | Merge translations into src/i18n/{lang}.json |
Request body: { lang: string, entries: Record<string, string> }
Behavior:
- Reads CLIENT_SRC_PATH from server .env (set to ../)
- Resolves ${CLIENT_SRC_PATH}/src/i18n/${lang}.json
- Reads existing file, merges new entries (skips empty values)
- Sorts alphabetically by key
- Writes back with JSON.stringify(sorted, null, 2)
- Returns { success, total, added, updated }
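The merge and sort behaviour listed above could be implemented along these lines, leaving file I/O aside. mergeLangEntries is a hypothetical name; counting an entry as updated only when its value actually changes, and sorting with the default string order, are assumptions about the deployed endpoint.

```typescript
interface MergeResult {
  merged: Record<string, string>; // sorted by key, ready for JSON.stringify(merged, null, 2)
  total: number;
  added: number;
  updated: number;
}

function mergeLangEntries(existing: Record<string, string>, entries: Record<string, string>): MergeResult {
  const combined: Record<string, string> = { ...existing };
  let added = 0;
  let updated = 0;
  for (const key of Object.keys(entries)) {
    const value = entries[key];
    if (value === undefined || value.trim() === "") continue; // skip empty values
    if (!Object.prototype.hasOwnProperty.call(combined, key)) {
      added++;
    } else if (combined[key] !== value) {
      updated++;
    }
    combined[key] = value;
  }
  // Rebuild the object in sorted key order so the written file diffs cleanly.
  const merged: Record<string, string> = {};
  for (const key of Object.keys(combined).sort()) {
    merged[key] = combined[key] as string;
  }
  return { merged, total: Object.keys(merged).length, added, updated };
}
```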
Client function: updateLangFile(lang, entries)
Playground UI — Widget Translations Section
The i18n Playground (/playground → i18n tab) provides a full management UI:
Search & Filter
- Entity type / Entity ID / Widget ID / Target lang — server-side filters for querying
- Client-side search — filter loaded results by source or translation text (case-insensitive)
- Show missing toggle — filter to untranslated entries only
Row Selection
- Checkbox per row with select-all in header
- Selected rows get a subtle highlight
- Selection affects: batch translate, Update Database, and Update i18n
Batch Translation
- Glossary picker — select a glossary for batch translation (shows all glossaries with direction labels)
- Translate All Missing / Translate Selected — batch-translates via DeepL
- Progress indicator during batch translation
Persistence Actions
- 🟠 Update Database: batch-upserts translated entries to Supabase via upsertWidgetTranslationsBatch
- 🟢 Update i18n: merges translations into src/i18n/{lang}.json files (groups by target_lang, uses source_text as key)
Both buttons respect checkbox selection: if rows are selected, only those are processed; otherwise all translated entries.
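The grouping behind "Update i18n", including the selection rule, might be sketched as follows. The row shape and buildLangFileUpdates are hypothetical; each resulting entry map has the shape that updateLangFile(lang, entries) expects.

```typescript
interface WidgetTranslationRow {
  id: string;
  targetLang: string;
  sourceText: string | null;
  translatedText: string | null;
}

// Group translated rows by target language, keyed by source text.
// If any rows are selected, only those are used; otherwise all translated rows.
function buildLangFileUpdates(
  rows: WidgetTranslationRow[],
  selectedIds: ReadonlySet<string>,
): Record<string, Record<string, string>> {
  const byLang: Record<string, Record<string, string>> = {};
  for (const row of rows) {
    if (selectedIds.size > 0 && !selectedIds.has(row.id)) continue;
    if (!row.sourceText || !row.translatedText) continue; // untranslated rows are skipped
    const bucket = byLang[row.targetLang] ?? {};
    bucket[row.sourceText] = row.translatedText;
    byLang[row.targetLang] = bucket;
  }
  return byLang;
}
```

One call to the update-lang-file endpoint per key of the result would then cover every affected language file.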
Import from i18n
- Import from app: loads terms from the localStorage requested-terms cache, cross-references with existing translations, and populates the list
Inline Editing
- Click any row to expand and edit source text, translated text, status, and metadata
- Single-row translate button (DeepL) in edit mode
Environment Variables
| Variable | Location | Value | Purpose |
|---|---|---|---|
| CLIENT_SRC_PATH | server/.env | ../ | Path to client source root (for writing src/i18n/*.json) |
| CLIENT_DIST_PATH | server/.env | ../dist | Path to client build output |
E2E Tests (i18n.e2e.test.ts)
Tests cover:
- Glossary CRUD (create, list, get terms, update terms via DeepL v3, delete)
- Translation with glossary
- Widget translation CRUD (upsert, batch upsert, query, delete)
- Authentication checks (401 for unauthorized requests)