i18n — Content Translation & Versioning
Proposal for translating pages, widgets, and other content types with version tracking.
Status Quo
| What exists | Where |
|---|---|
| i18n_translations: flat src_text → dst_text cache | db-i18n.ts |
| i18n_glossaries / i18n_glossary_terms: DeepL glossary sync | db-i18n.ts |
| DeepL server-side translate: translate + cache in one call | i18n-deepl.ts |
| @polymech/i18n: shared clean() helper etc. | monorepo package |
The existing system translates arbitrary text blobs. It has no awareness of:
- Which page / widget a translation belongs to
- Which version of the source content was translated
- Structural identity — if a widget moves or is deleted, orphaned translations linger
Goals
- Page-level translations — a translated "snapshot" of an entire page
- Widget-level translations — translate individual widget text props independently
- Content versioning — track which source version a translation was produced from, detect drift
- Reuse existing infra: i18n_translations stays as the text cache, DeepL stays as the engine
Proposed Database Schema
1. content_versions
Tracks every published snapshot of any content entity (pages, posts, collections, …).
create table content_versions (
id uuid primary key default gen_random_uuid(),
entity_type text not null, -- 'page' | 'post' | 'collection'
entity_id uuid not null, -- pages.id / posts.id / …
version int not null default 1, -- monotonic per entity
content_hash text not null, -- sha256 of JSON content
content jsonb, -- snapshot of content at this version (optional, for rollback)
meta jsonb default '{}', -- { author, change_note, … }
created_at timestamptz default now(),
created_by uuid references auth.users(id),
unique (entity_type, entity_id, version)
);
create index idx_cv_entity on content_versions (entity_type, entity_id);
Why a separate table?
The pages table stores the current working state. content_versions stores immutable snapshots you can diff, roll back to, or translate against.
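How content_hash and the version counter might be derived at publish time can be sketched as follows. This is a minimal sketch, assuming Node's built-in crypto module; canonicalize, contentHash, and nextSnapshot are hypothetical helper names, and skipping a publish whose hash is unchanged is an assumption, not something this proposal specifies.

```typescript
import { createHash } from "node:crypto";

type Json = null | boolean | number | string | Json[] | { [key: string]: Json };

// Serialize with sorted object keys so that key order never changes the hash.
function canonicalize(value: Json): string {
  if (value === null || typeof value !== "object") return JSON.stringify(value);
  if (Array.isArray(value)) return `[${value.map((item) => canonicalize(item)).join(",")}]`;
  const members = Object.entries(value)
    .sort(([a], [b]) => (a < b ? -1 : a > b ? 1 : 0))
    .map(([k, v]) => `${JSON.stringify(k)}:${canonicalize(v)}`);
  return `{${members.join(",")}}`;
}

// sha256 of the canonical JSON, hex-encoded (matches the content_hash column).
function contentHash(content: Json): string {
  return createHash("sha256").update(canonicalize(content), "utf8").digest("hex");
}

interface Snapshot {
  version: number;
  content_hash: string;
  content: Json;
}

// Returns the next content_versions row, or null when nothing changed (assumption).
function nextSnapshot(latest: Snapshot | null, content: Json): Snapshot | null {
  const content_hash = contentHash(content);
  if (latest !== null && latest.content_hash === content_hash) return null;
  return { version: (latest?.version ?? 0) + 1, content_hash, content };
}
```

Hashing a canonical form rather than the raw JSON.stringify output avoids spurious version bumps when the editor reorders object keys without changing content.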
2. content_translations
Links a translated content blob to a specific source version + language.
create type translation_status as enum ('draft', 'machine', 'reviewed', 'published');
create table content_translations (
id uuid primary key default gen_random_uuid(),
entity_type text not null,
entity_id uuid not null,
source_version int not null, -- FK-like ref to content_versions.version
source_lang text not null default 'de',
target_lang text not null,
status translation_status default 'draft',
-- Translated payload (same shape as source content)
translated_content jsonb, -- full page JSON with translated strings
-- Drift detection
source_hash text, -- hash of source at translation time
is_stale boolean default false, -- set true when source gets a newer version
meta jsonb default '{}', -- { translator, provider, cost, … }
created_at timestamptz default now(),
updated_at timestamptz default now(),
translated_by uuid references auth.users(id),
unique (entity_type, entity_id, source_version, target_lang)
);
create index idx_ct_entity on content_translations (entity_type, entity_id, target_lang);
3. widget_translations (optional — granular level)
For widget-by-widget translation without duplicating the whole page JSON.
create table widget_translations (
id uuid primary key default gen_random_uuid(),
entity_type text not null default 'page',
entity_id uuid not null,
widget_id text not null, -- WidgetInstance.id from the JSON tree
prop_path text not null default 'content', -- e.g. 'content', 'label', 'placeholder'
source_lang text not null,
target_lang text not null,
source_text text not null,
translated_text text not null,
source_version int, -- which content_version this was derived from
status translation_status default 'machine',
meta jsonb default '{}',
created_at timestamptz default now(),
updated_at timestamptz default now(),
unique (entity_type, entity_id, widget_id, prop_path, target_lang)
);
create index idx_wt_entity on widget_translations (entity_type, entity_id, target_lang);
Why both content_translations and widget_translations?
- content_translations = "give me the whole page in French" (fast serve)
- widget_translations = "give me just widget X in French" (granular edit, partial retranslation)
When serving, we prefer content_translations (single read). When editing, we use widget_translations for surgical updates.
Translatable Widget Props
Not every widget property needs translation. Here's the map of translatable text:
| Widget Type | Translatable Props |
|---|---|
| html-widget | content |
| markdown-text | content |
| tabs-widget | tabs[].label |
| layout-container-widget | nestedPageName |
| photo-card | none (title/description come from the pictures table) |
| gallery-widget | none |
| file-browser | none |
| Container (settings) | settings.title |
The shared function iterateWidgets() from @polymech/shared can walk the full content tree to extract translatable strings per widget.
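The extraction step could look roughly like the sketch below. The signature of iterateWidgets() is not shown in this document, so the sketch uses its own recursive walk instead; the Widget shape, the TRANSLATABLE map, and the function names are hypothetical and simplified, with the prop list copied from the table above.

```typescript
// Hypothetical, simplified widget shape; the real WidgetInstance type may differ.
interface Widget {
  id: string;
  type: string;
  props: Record<string, unknown>;
  children?: Widget[];
}

interface SourceString {
  widget_id: string;
  prop_path: string;
  source_text: string;
}

// Translatable props per widget type ("[]" means "every array item").
const TRANSLATABLE: Record<string, string[]> = {
  "html-widget": ["content"],
  "markdown-text": ["content"],
  "tabs-widget": ["tabs[].label"],
  "layout-container-widget": ["nestedPageName"],
};

// Follow one path spec through a props object and collect [concrete path, text] pairs.
function collect(value: unknown, segments: string[], prefix: string[]): Array<[string, string]> {
  if (segments.length === 0) {
    if (typeof value !== "string" || value.trim() === "") return [];
    return [[prefix.join("."), value]];
  }
  const head = segments[0] as string;
  const rest = segments.slice(1);
  if (head === "[]") {
    if (!Array.isArray(value)) return [];
    return value.flatMap((item, i) => collect(item, rest, [...prefix, String(i)]));
  }
  if (typeof value !== "object" || value === null) return [];
  return collect((value as Record<string, unknown>)[head], rest, [...prefix, head]);
}

// Walk the widget tree depth-first and return every translatable string.
function extractStrings(widgets: Widget[]): SourceString[] {
  const out: SourceString[] = [];
  for (const widget of widgets) {
    for (const spec of TRANSLATABLE[widget.type] ?? []) {
      const segments = spec.replace(/\[\]/g, ".[]").split(".");
      for (const [prop_path, source_text] of collect(widget.props, segments, [])) {
        out.push({ widget_id: widget.id, prop_path, source_text });
      }
    }
    out.push(...extractStrings(widget.children ?? []));
  }
  return out;
}
```

Array wildcards expand to concrete indices (tabs[].label becomes tabs.0.label, tabs.1.label), which is the prop_path form a widget_translations row would need.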
Content Versioning Flow
flowchart TD
A["Page Editor"] -->|save| B["pages.content — working draft"]
B -->|publish / snapshot| C["content_versions — immutable v1, v2, ..."]
C -->|translate via DeepL / manual| D["content_translations — per version + lang"]
Version Lifecycle
- Author saves → pages.content updated (working state, no version bump)
- Author publishes → new row in content_versions (hash of content JSON, version++)
- Translation triggered → walks content tree, translates per widget, stores widget_translations + assembles a full content_translations row
- Source changes → next publish creates version N+1, all content_translations for version N get is_stale = true
- Retranslation → only re-translates widgets whose source_text changed (compare hashes)
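The retranslation step can be sketched as a diff between the strings extracted from the new version and the existing widget_translations rows. This is a minimal sketch with hypothetical names; it compares source_text directly, which gives the same answer as comparing hashes of those strings.

```typescript
interface SourceString {
  widget_id: string;
  prop_path: string;
  source_text: string;
}

interface WidgetTranslation extends SourceString {
  translated_text: string;
}

// Identity of a string inside one entity: widget + prop path.
function rowKey(s: SourceString): string {
  return `${s.widget_id}\u0000${s.prop_path}`;
}

interface RetranslationPlan {
  reuse: WidgetTranslation[];    // source unchanged, keep the existing translation
  retranslate: SourceString[];   // new or changed source text
  orphaned: WidgetTranslation[]; // widget or prop no longer exists in the source
}

function planRetranslation(current: SourceString[], existing: WidgetTranslation[]): RetranslationPlan {
  const byKey = new Map<string, WidgetTranslation>();
  for (const row of existing) byKey.set(rowKey(row), row);

  const reuse: WidgetTranslation[] = [];
  const retranslate: SourceString[] = [];
  for (const source of current) {
    const previous = byKey.get(rowKey(source));
    if (previous !== undefined && previous.source_text === source.source_text) reuse.push(previous);
    else retranslate.push(source);
  }

  const currentKeys = new Set(current.map((s) => rowKey(s)));
  const orphaned = existing.filter((row) => !currentKeys.has(rowKey(row)));
  return { reuse, retranslate, orphaned };
}
```

The orphaned list also addresses the lingering-translation problem named in the status quo: rows whose widget no longer exists can be deleted or archived at this point.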
Serving Translated Pages
When a page is requested with ?lang=fr:
1. Look up content_translations WHERE entity_type = 'page' AND entity_id = ? AND target_lang = 'fr' AND status = 'published', taking the row with the highest source_version
2. If found and is_stale = false → serve translated_content directly (no extra processing)
3. If found and is_stale = true → serve translated_content but add an X-Translation-Stale: true header
4. If not found → serve source content (fallback)
Add lang to the enrichment / cache key in getPagesState() or create a parallel getTranslatedPagesState().
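The lookup-and-fallback logic above can be sketched as a pure function over rows already fetched from content_translations. The names are hypothetical, and preferring the newest source_version when several published rows exist is an assumption.

```typescript
type TranslationStatus = "draft" | "machine" | "reviewed" | "published";

interface TranslationRow {
  source_version: number;
  target_lang: string;
  status: TranslationStatus;
  is_stale: boolean;
  translated_content: unknown;
}

interface ServedPage {
  content: unknown;
  lang: string;
  headers: Record<string, string>;
}

function resolveContent(
  source: { lang: string; content: unknown },
  rows: TranslationRow[],
  requestedLang: string,
): ServedPage {
  // Only published translations for the requested language are eligible.
  const candidates = rows
    .filter((row) => row.target_lang === requestedLang && row.status === "published")
    .sort((a, b) => b.source_version - a.source_version);
  const best = candidates[0];

  // No translation: fall back to the source content in the source language.
  if (best === undefined) return { content: source.content, lang: source.lang, headers: {} };

  // Stale translations are still served, but flagged for the client.
  const headers: Record<string, string> = {};
  if (best.is_stale) headers["X-Translation-Stale"] = "true";
  return { content: best.translated_content, lang: requestedLang, headers };
}
```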
Integration with Existing i18n
The existing i18n_translations table continues to serve as the text-level translation cache (src → dst lookup). The new tables add structural awareness on top:
i18n_translations → text cache (DeepL results, any text)
widget_translations → maps widget+prop → translation pair
content_translations → full translated content snapshot
content_versions → immutable source snapshots
translateTextServer() (from i18n-deepl.ts) remains the engine. The new translation logic calls it per widget prop, then assembles results.
External Translation Services (Crowdin, Phrase, Lokalise)
The Problem
Our page content is deeply nested JSON (RootLayoutData → pages → containers → widgets → props). External TMS platforms don't understand this structure — they work with flat key→value files in standard formats.
We need an extract/inject pipeline that converts between our JSON tree and industry-standard formats.
Exchange Format Strategy
| Format | Best For | Crowdin | Phrase | Lokalise |
|---|---|---|---|---|
| XLIFF 2.0 | Industry standard, rich metadata, tool support | ✅ | ✅ | ✅ |
| Flat JSON | Simple key→value, easy to diff | ✅ | ✅ | ✅ |
| ICU MessageFormat | Plurals, gender, variables | ✅ | ✅ | ✅ |
Recommended primary format: XLIFF 2.0 — it carries source + target in one file, supports notes/context for translators, and every TMS speaks it natively.
Secondary: Flat JSON — for scripting, quick diffs, and lightweight integrations.
Key Design — Stable Translation Keys
Every translatable string gets a stable key derived from its position in the content tree:
page.<page_id>.widget.<widget_id>.<prop_path>
Examples:
page.a1b2c3.widget.w-markdown-1.content
page.a1b2c3.widget.w-tabs-1.tabs.0.label
page.a1b2c3.widget.w-tabs-1.tabs.1.label
page.a1b2c3.container.c-hero.settings.title
page.a1b2c3.meta.title ← page title itself
These keys are widget-ID-based, not position-based. If a widget moves within the page, its key stays the same. If a widget is deleted, its key disappears from the next export.
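A small sketch of building and parsing these keys follows. The helper names are hypothetical, and the parser assumes that page, widget, and container IDs never contain a dot; if they can, the key format would need an escaping rule.

```typescript
type KeyScope = "widget" | "container" | "meta";

interface ParsedKey {
  pageId: string;
  scope: KeyScope;
  id: string | null; // widget or container ID; null for page-level meta keys
  propPath: string;
}

function widgetKey(pageId: string, widgetId: string, propPath: string): string {
  return `page.${pageId}.widget.${widgetId}.${propPath}`;
}

function metaKey(pageId: string, field: string): string {
  return `page.${pageId}.meta.${field}`;
}

// page.<page_id>.(widget|container).<id>.<prop_path>  or  page.<page_id>.meta.<field>
const KEY_PATTERN = /^page\.([^.]+)\.(?:(widget|container)\.([^.]+)\.(.+)|meta\.(.+))$/;

function parseKey(key: string): ParsedKey | null {
  const match = KEY_PATTERN.exec(key);
  if (match === null) return null;
  const pageId = match[1] as string;
  const metaField = match[5];
  if (metaField !== undefined) return { pageId, scope: "meta", id: null, propPath: metaField };
  return {
    pageId,
    scope: match[2] as KeyScope,
    id: match[3] as string,
    propPath: match[4] as string,
  };
}
```

Because the prop path is the last component, it may itself contain dots (tabs.0.label) without ambiguity.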
XLIFF Export Example
<?xml version="1.0" encoding="UTF-8"?>
<xliff xmlns="urn:oasis:names:tc:xliff:document:2.0" version="2.0" srcLang="de" trgLang="en">
<file id="page-a1b2c3" original="page/a1b2c3">
<unit id="page.a1b2c3.meta.title">
<notes>
<note category="context">Page title</note>
<note category="max-length">255</note>
</notes>
<segment>
<source>Kunststoff-Recycling Übersicht</source>
<target>Plastic Recycling Overview</target>
</segment>
</unit>
<unit id="page.a1b2c3.widget.w-md-1.content">
<notes>
<note category="context">Markdown text widget — supports markdown formatting</note>
<note category="widget-type">markdown-text</note>
</notes>
<segment>
<source>## Einleitung\n\nDiese Seite beschreibt...</source>
<target/>
</segment>
</unit>
<unit id="page.a1b2c3.widget.w-tabs-1.tabs.0.label">
<notes>
<note category="context">Tab label</note>
<note category="max-length">50</note>
</notes>
<segment>
<source>Übersicht</source>
<target/>
</segment>
</unit>
</file>
</xliff>
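Producing a file of this shape from a key-value map can be sketched with plain string building. The XliffUnit type and function names are hypothetical; the sketch emits only a context note per unit and XML-escapes all text, since page content may contain &, < or > characters.

```typescript
interface XliffUnit {
  key: string;
  source: string;
  target?: string;  // omitted when the string is not yet translated
  context?: string; // free-text hint for translators
}

function escapeXml(text: string): string {
  return text
    .replace(/&/g, "&amp;")
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;")
    .replace(/"/g, "&quot;");
}

function toXliff(pageId: string, srcLang: string, trgLang: string, units: XliffUnit[]): string {
  const lines: string[] = [
    '<?xml version="1.0" encoding="UTF-8"?>',
    `<xliff xmlns="urn:oasis:names:tc:xliff:document:2.0" version="2.0" srcLang="${escapeXml(srcLang)}" trgLang="${escapeXml(trgLang)}">`,
    `  <file id="page-${escapeXml(pageId)}" original="page/${escapeXml(pageId)}">`,
  ];
  for (const unit of units) {
    lines.push(`    <unit id="${escapeXml(unit.key)}">`);
    if (unit.context !== undefined) {
      lines.push("      <notes>");
      lines.push(`        <note category="context">${escapeXml(unit.context)}</note>`);
      lines.push("      </notes>");
    }
    lines.push("      <segment>");
    lines.push(`        <source>${escapeXml(unit.source)}</source>`);
    lines.push(
      unit.target !== undefined
        ? `        <target>${escapeXml(unit.target)}</target>`
        : "        <target/>",
    );
    lines.push("      </segment>");
    lines.push("    </unit>");
  }
  lines.push("  </file>", "</xliff>");
  return lines.join("\n");
}
```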
Flat JSON Export Example
{
"_meta": {
"entity_type": "page",
"entity_id": "a1b2c3",
"source_version": 3,
"source_lang": "de",
"exported_at": "2026-02-17T10:00:00Z"
},
"page.a1b2c3.meta.title": "Kunststoff-Recycling Übersicht",
"page.a1b2c3.widget.w-md-1.content": "## Einleitung\n\nDiese Seite beschreibt...",
"page.a1b2c3.widget.w-tabs-1.tabs.0.label": "Übersicht",
"page.a1b2c3.widget.w-tabs-1.tabs.1.label": "Details",
"page.a1b2c3.container.c-hero.settings.title": "Willkommen"
}
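On the way back in, a translated flat JSON file can be validated against the keys of the exported source version before anything is written. The sketch below uses hypothetical names and treats empty strings as untranslated, which is an assumption.

```typescript
interface FlatImport {
  meta: Record<string, unknown>;
  entries: Record<string, string>;
  missing: string[];    // expected keys that are absent or empty in the file
  unexpected: string[]; // keys in the file that the source version does not have
}

function parseFlatJson(raw: string, expectedKeys: string[]): FlatImport {
  const parsed: unknown = JSON.parse(raw);
  if (typeof parsed !== "object" || parsed === null || Array.isArray(parsed)) {
    throw new Error("flat JSON import must be a JSON object");
  }
  const record = parsed as Record<string, unknown>;

  // _meta carries export metadata and is never a translatable entry.
  const rawMeta = record["_meta"];
  const meta =
    typeof rawMeta === "object" && rawMeta !== null ? (rawMeta as Record<string, unknown>) : {};

  const entries: Record<string, string> = {};
  for (const [key, value] of Object.entries(record)) {
    if (key === "_meta") continue;
    if (typeof value === "string" && value !== "") entries[key] = value;
  }

  const expected = new Set(expectedKeys);
  const has = (key: string): boolean => Object.prototype.hasOwnProperty.call(entries, key);
  const missing = expectedKeys.filter((key) => !has(key));
  const unexpected = Object.keys(entries).filter((key) => !expected.has(key));
  return { meta, entries, missing, unexpected };
}
```

A non-empty unexpected list usually means the file was exported from a different source_version, which can be cross-checked against meta.source_version.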
Extract → Export → Translate → Import → Inject Pipeline
flowchart LR
subgraph OUR_SYSTEM["Our System"]
CV["content_versions v3"] -->|"1 EXTRACT\niterateWidgets"| KV["Flat key-value map"]
KV -->|"2 EXPORT\nserialize to XLIFF or JSON"| FILE_OUT[".xliff / .json file"]
FILE_IN["Translated .xliff / .json"] -->|"3 IMPORT\nparse to key-value map"| KV_TR["Translated key-value map"]
KV_TR -->|"4 INJECT\nwalk tree, replace strings"| CT["content_translations"]
KV_TR -->|"4 INJECT"| WT["widget_translations"]
end
subgraph TMS["External TMS"]
CROWDIN["Crowdin / Phrase / Lokalise"]
HUMAN["Human translators + MT review"]
CROWDIN --> HUMAN
HUMAN --> CROWDIN
end
FILE_OUT --> CROWDIN
CROWDIN --> FILE_IN
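The INJECT step of the pipeline above can be sketched as a copy-and-replace over the widget tree. The Widget shape and function names are hypothetical and simplified; the sketch handles widget keys only and deliberately overwrites nothing but existing string values, so a malformed key cannot reshape the content.

```typescript
// Hypothetical, simplified widget shape; the real content tree may differ.
interface Widget {
  id: string;
  props: Record<string, unknown>;
  children?: Widget[];
}

// Replace the string at `path` inside `target`; returns false if the path does not lead to a string.
function setPath(target: unknown, path: string[], value: string): boolean {
  let node: unknown = target;
  for (const segment of path.slice(0, -1)) {
    if (typeof node !== "object" || node === null) return false;
    node = (node as Record<string, unknown>)[segment];
  }
  const last = path[path.length - 1];
  if (last === undefined || typeof node !== "object" || node === null) return false;
  const holder = node as Record<string, unknown>;
  if (typeof holder[last] !== "string") return false;
  holder[last] = value;
  return true;
}

function indexWidgets(widgets: Widget[], byId: Map<string, Widget>): void {
  for (const widget of widgets) {
    byId.set(widget.id, widget);
    indexWidgets(widget.children ?? [], byId);
  }
}

// Returns a translated deep copy; the source tree is left untouched.
function injectTranslations(pageId: string, widgets: Widget[], translated: Record<string, string>): Widget[] {
  const copy = JSON.parse(JSON.stringify(widgets)) as Widget[];
  const byId = new Map<string, Widget>();
  indexWidgets(copy, byId);

  const prefix = `page.${pageId}.widget.`;
  for (const [key, text] of Object.entries(translated)) {
    if (!key.startsWith(prefix)) continue;
    const parts = key.slice(prefix.length).split(".");
    const widget = byId.get(parts[0] ?? "");
    if (widget !== undefined) setPath(widget.props, parts.slice(1), text);
  }
  return copy;
}
```

Keys for widgets that no longer exist are skipped silently here; a real import would probably report them.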
How Human Translation Fits the Status Flow
flowchart TD
A["Machine translate via DeepL"] --> B["status = machine"]
B --> C["Export to TMS"]
C --> D["Human review and edit"]
D --> E["Import back"]
E --> F["status = reviewed"]
F --> G["Editor approves"]
G --> H["status = published"]
- Machine pre-fill: DeepL translates all strings → stored with status = 'machine'
- Export to TMS: export the machine-translated file (with source + target pre-filled) so human translators only need to review and fix, not translate from scratch
- Import from TMS: translated file comes back → status = 'reviewed'
- Publish: editor approves → status = 'published', content_translations assembled
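The status flow can be enforced with a small transition table. This is a sketch under assumptions: the event names are hypothetical, and treating draft as the starting point (and allowing a TMS import directly from draft) is not stated in this proposal.

```typescript
type TranslationStatus = "draft" | "machine" | "reviewed" | "published";
type TranslationEvent = "machine_translated" | "tms_imported" | "editor_approved";

// Which statuses each event may start from, and where it leads.
const TRANSITIONS: Record<TranslationEvent, { from: TranslationStatus[]; to: TranslationStatus }> = {
  machine_translated: { from: ["draft"], to: "machine" },
  tms_imported: { from: ["draft", "machine"], to: "reviewed" },
  editor_approved: { from: ["reviewed"], to: "published" },
};

function applyEvent(status: TranslationStatus, event: TranslationEvent): TranslationStatus {
  const rule = TRANSITIONS[event];
  if (!rule.from.includes(status)) {
    throw new Error(`cannot apply ${event} to a translation in status ${status}`);
  }
  return rule.to;
}
```

Keeping the rules in one table makes it hard to publish a machine translation that no human has reviewed, which is the point of the flow above.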
API Additions for TMS Interop
| Method | Endpoint | Description |
|---|---|---|
| GET | /api/pages/:id/export/:lang?format=xliff | Export translatable strings as XLIFF or JSON |
| POST | /api/pages/:id/import/:lang | Import translated XLIFF or JSON file |
| GET | /api/pages/:id/export/:lang?format=json | Export as flat JSON |
| POST | /api/i18n/webhook/crowdin | Crowdin webhook for auto-import on completion |
Crowdin-Specific Integration Notes
- Source files: upload the flat JSON export as a "source file" per page
- File naming: page-{slug}-v{version}.json (Crowdin tracks versions by filename)
- Branches: use Crowdin branches to match content_versions (branch = version)
- Webhooks: Crowdin fires file.translated / file.approved → our webhook imports
- In-Context: Crowdin's in-context editing can work via our ?lang=pseudo mode that renders keys instead of text
Glossary Sync
The existing i18n_glossaries / i18n_glossary_terms tables can be:
- Exported as TBX (TermBase eXchange) or Crowdin-compatible CSV
- Synced bidirectionally: terms added in Crowdin → imported to our DB → pushed to DeepL glossary
This keeps DeepL machine translations and human translations using the same terminology.
API Surface (Proposed)
| Method | Endpoint | Description |
|---|---|---|
| POST | /api/pages/:id/publish | Snapshot current content → content_versions |
| GET | /api/pages/:id/versions | List versions for a page |
| GET | /api/pages/:id/versions/:v | Get specific version snapshot |
| POST | /api/pages/:id/translate | Translate page to target lang(s) |
| GET | /api/pages/:id/translations | List available translations |
| GET | /api/pages/:id/translations/:lang | Get translated content for lang |
| PATCH | /api/pages/:id/translations/:lang/widgets/:wid | Update single widget translation |
| POST | /api/content/:type/:id/publish | Generic publish for any entity type |
| POST | /api/content/:type/:id/translate | Generic translate for any entity type |
Open Questions / Decisions Needed
- Publish-on-save vs explicit publish?
  Do we auto-version on every save, or require an explicit "Publish" action?
  Recommend: explicit publish to avoid version spam.
- Widget-level table: now or later?
  widget_translations adds complexity. We could start with page-level only (content_translations) and add widget-level later.
  Recommend: start with both; widget-level is needed for partial retranslation.
- Store full content in content_versions or just the hash?
  Storing full JSON enables rollback but costs storage.
  Recommend: store it; pages are small (< 100 KB each), rollback is high value.
- Which entity types beyond pages?
  Posts? Collections? Categories?
  Recommend: start with pages only, the schema is generic enough to extend.
- UI for translation management?
  A side-by-side translation editor? Or just an "auto-translate" button?
  This doc covers the backend schema only; UI TBD.
Migration Priority
| Phase | Scope | Tables |
|---|---|---|
| Phase 1 | Content versioning for pages | content_versions |
| Phase 2 | Page-level translations | content_translations |
| Phase 3 | Widget-level translations | widget_translations |
| Phase 4 | Extend to posts / collections | Same tables, new entity_type values |