AI Page Generation Wizard
1. Overview
The AI Page Generation Wizard introduces a high-level, AI-driven workflow for creating complete pages, not just individual images. It builds on the existing voice input and image generation capabilities to offer a seamless "voice-to-page" experience. Users dictate an idea, and the AI generates a fully formed page containing rich text content, embedded images, and appropriate metadata such as tags and a title.
This feature will be accessed through a new, unified creation popup in the header, which will serve as a central starting point for both the existing Image Wizard and the new Page Wizard.
2. User Interface & Flow
New Entry Popup
A new button will be added to the Header, triggering a "Creation Wizard" popup. This popup will be the primary entry point for AI-assisted content creation.
Popup Design:
- Title: What would you like to create?
- Two Main Options:
  - Generate Image: For creating standalone images.
    - Fast & Direct: Opens the standard Image Wizard.
    - Smart & Optimized: Opens the Image Wizard in Agent mode.
    - Voice + AI: Opens the Image Wizard's voice agent popup directly.
  - Create Page: For generating entire pages.
    - From Scratch: Opens a new page editor.
    - AI Agent: A future feature for more complex, multi-step page generation.
    - Voice + AI: This is the primary new flow. It opens a voice recording UI to start the voice-to-page process.
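The option tree above could be kept as plain data so the popup renders from a single definition. The following is a minimal sketch under that assumption; the type names, the ids, and the enabled flag are illustrative and do not come from the existing codebase.

```typescript
// Hypothetical data model for the Creation Wizard popup; all names are illustrative.
type WizardAction = {
  id: string;
  label: string;
  description: string;
  enabled: boolean; // false for options that are planned but not yet built
};

type WizardGroup = {
  id: "image" | "page";
  label: string;
  description: string;
  actions: WizardAction[];
};

const creationWizardGroups: WizardGroup[] = [
  {
    id: "image",
    label: "Generate Image",
    description: "For creating standalone images.",
    actions: [
      { id: "image-direct", label: "Fast & Direct", description: "Opens the standard Image Wizard.", enabled: true },
      { id: "image-agent", label: "Smart & Optimized", description: "Opens the Image Wizard in Agent mode.", enabled: true },
      { id: "image-voice", label: "Voice + AI", description: "Opens the Image Wizard's voice agent popup.", enabled: true },
    ],
  },
  {
    id: "page",
    label: "Create Page",
    description: "For generating entire pages.",
    actions: [
      { id: "page-scratch", label: "From Scratch", description: "Opens a new page editor.", enabled: true },
      { id: "page-agent", label: "AI Agent", description: "Future multi-step page generation.", enabled: false },
      { id: "page-voice", label: "Voice + AI", description: "Starts the voice-to-page flow.", enabled: true },
    ],
  },
];

// Look up one action by group id and action id; returns undefined if either is unknown.
function findAction(groupId: WizardGroup["id"], actionId: string): WizardAction | undefined {
  const group = creationWizardGroups.find((g) => g.id === groupId);
  return group ? group.actions.find((a) => a.id === actionId) : undefined;
}
```

Marking the future "AI Agent" option as disabled in the data lets the popup show it greyed out without any special-case rendering code.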
Voice-to-Page Flow
- Initiation: The user clicks the "Voice + AI" button under "Create Page".
- Recording: A voice recording modal appears (reusing the component from ImageWizard). The user describes the page they want to create (e.g., "Write a tutorial on how to brew the perfect cup of green tea, include an image of a serene tea setup").
- Processing: The UI shows a status progression: Transcribing... -> Generating content... -> Creating page...
- Completion: Once the page is created, the user is automatically redirected to the new page in view mode. A success toast notification confirms the creation.
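The status progression in the flow above can be modeled as a small union type with a label lookup. This is a sketch only: the three in-progress labels come from the flow description, while the status names and the idle, done, and error entries are assumptions added for completeness.

```typescript
// Assumed status values for the voice-to-page flow.
type PageGenStatus = "idle" | "transcribing" | "generating" | "creating" | "done" | "error";

const STATUS_LABELS: Record<PageGenStatus, string> = {
  idle: "",
  transcribing: "Transcribing...",
  generating: "Generating content...",
  creating: "Creating page...",
  done: "Page created", // assumed wording for the success toast
  error: "Something went wrong", // assumed wording
};

// Ordered steps shown to the user while work is in progress.
const PROGRESS_STEPS: PageGenStatus[] = ["transcribing", "generating", "creating"];

// Returns the label to display plus a 1-based step index (0 when not in progress).
function describeStatus(status: PageGenStatus): { label: string; step: number; total: number } {
  return {
    label: STATUS_LABELS[status],
    step: PROGRESS_STEPS.indexOf(status) + 1,
    total: PROGRESS_STEPS.length,
  };
}
```

For example, describeStatus("generating") yields the label "Generating content..." at step 2 of 3, which is enough to drive both the text and a simple progress indicator.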
3. Dependencies
This feature will leverage many existing parts of the application and introduce a few new components.
Existing Components & Modules to Reuse:
- Header.tsx: To add the new wizard trigger button.
- ImageWizard.tsx / VoiceRecordingPopup.tsx: The UI for voice recording and transcription.
- lib/openai.ts: The core runTools function, zodFunction helper, and existing tool definitions (transcribeAudioTool). We will add a new preset and tools here.
- lib/markdownImageTools.ts: The generateTextWithImagesTool will be crucial for the AI to generate the main content of the page.
- integrations/supabase/client.ts: For database interactions within the new page creation tool.
- pages/UserPage.tsx: The destination view for the newly created page.
New Components & Modules to Create:
- components/CreationWizardPopup.tsx: The new modal that serves as the entry point.
- hooks/usePageGenerator.ts: A new hook to orchestrate the multi-step voice-to-page generation process.
- lib/pageTools.ts: A new file to house the AI tool(s) responsible for page creation, keeping concerns separated.
4. Implementation Plan
The implementation can be broken down into the following tasks:
Task 1: Create New AI Tools for Page Management
- Create a new file: src/lib/pageTools.ts.
- Define a new tool createPageTool using zodFunction.
- Schema: ({ title: string, content: string, tags: string[], slug: string, is_public?: boolean, visible?: boolean }).
- Functionality:
  - It will accept the page title, markdown content, and tags.
  - It must format the markdown content into the required page JSON structure (with containers, widgets, and widgetId: "markdown-text").
  - It will insert a new row into the pages table in Supabase.
  - It will return the slug of the newly created page so the UI can navigate to it.
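The data-shaping part of createPageTool can be sketched as a pure function, separate from the Supabase call, which keeps it easy to test. The argument fields mirror the schema above. Everything else is an assumption: the document names only containers, widgets, and widgetId "markdown-text", so the id and props fields, the default values, and the slug normalization step are hypothetical.

```typescript
// Arguments mirror the schema listed for createPageTool.
interface CreatePageArgs {
  title: string;
  content: string; // markdown
  tags: string[];
  slug: string;
  is_public?: boolean;
  visible?: boolean;
}

// ASSUMPTION: the exact page JSON layout is not specified in this document.
// The `id` and `props` fields below are hypothetical.
interface PageContent {
  containers: Array<{
    id: string;
    widgets: Array<{ id: string; widgetId: "markdown-text"; props: { content: string } }>;
  }>;
}

// Wrap a markdown string in a single container holding one markdown-text widget.
function markdownToPageContent(markdown: string): PageContent {
  return {
    containers: [
      {
        id: "container-1",
        widgets: [{ id: "widget-1", widgetId: "markdown-text", props: { content: markdown } }],
      },
    ],
  };
}

// Normalize a slug to lowercase letters, digits, and single hyphens.
function normalizeSlug(raw: string): string {
  return raw
    .toLowerCase()
    .trim()
    .replace(/[^a-z0-9]+/g, "-")
    .replace(/^-+|-+$/g, "");
}

// Build the row that would be inserted into the pages table.
// ASSUMPTION: column names match the tool schema; the defaults are illustrative.
function buildPageRow(args: CreatePageArgs) {
  return {
    title: args.title.trim(),
    slug: normalizeSlug(args.slug || args.title),
    tags: args.tags,
    content: markdownToPageContent(args.content),
    is_public: args.is_public === undefined ? false : args.is_public,
    visible: args.visible === undefined ? true : args.visible,
  };
}
```

The tool body would then pass the result of buildPageRow to the Supabase client and return the stored slug. Normalizing the slug here, rather than trusting the model's output, is a defensive choice and not something the plan requires.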
Task 2: Define a New runTools Preset
- In src/lib/openai.ts, create a new preset called 'page-generator'.
- Tools: This preset will include generateTextWithImagesTool (from markdownImageTools.ts) and the new createPageTool (from pageTools.ts).
- System Prompt: A detailed system prompt will guide the LLM through the process:
  - First, understand the user's request from the transcribed text.
  - Use the generateTextWithImagesTool to create rich markdown content, including one or more relevant images.
  - From the generated content, derive a concise title and a list of relevant tags.
  - Generate a URL-friendly slug from the title.
  - Finally, call the createPageTool with the title, slug, tags, and the full markdown content to save the page.
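One way to keep the system prompt in step with the list above is to store the steps as data and render them into the prompt. This is a sketch under assumptions: the actual preset shape in lib/openai.ts is not shown in this document, so the ToolPreset interface, the tool-name strings, and the prompt wording are hypothetical stand-ins.

```typescript
// ASSUMPTION: hypothetical preset shape; the real one in lib/openai.ts may differ.
interface ToolPreset {
  name: string;
  toolNames: string[];
  systemPrompt: string;
}

// The steps the system prompt should walk the model through, in order.
const PAGE_GENERATOR_STEPS: string[] = [
  "Understand the user's request from the transcribed text.",
  "Call generateTextWithImagesTool to create rich markdown content with one or more relevant images.",
  "Derive a concise title and a list of relevant tags from the generated content.",
  "Generate a URL-friendly slug from the title.",
  "Call createPageTool with the title, slug, tags, and the full markdown content to save the page.",
];

// Render the steps as a numbered list inside the system prompt.
function buildSystemPrompt(steps: string[]): string {
  const numbered = steps.map((step, i) => `${i + 1}. ${step}`).join("\n");
  return `You are a page-generation assistant. Follow these steps in order:\n${numbered}`;
}

const pageGeneratorPreset: ToolPreset = {
  name: "page-generator",
  toolNames: ["generateTextWithImagesTool", "createPageTool"],
  systemPrompt: buildSystemPrompt(PAGE_GENERATOR_STEPS),
};
```

Numbering the steps explicitly makes the expected tool order (content first, page creation last) visible to the model and easy to adjust later.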
Task 3: Develop the Orchestration Logic
- Create a new hook usePageGenerator (src/hooks/usePageGenerator.ts).
- This hook will manage the state of the voice-to-page flow (isTranscribing, isGenerating, isCreating).
- It will contain a function, e.g., generatePageFromVoice(audioFile), which:
  - Calls transcribeAudio.
  - Calls runTools with the 'page-generator' preset and the transcribed text.
  - Processes the result from createPageTool to get the new page slug.
  - Uses the navigate function from react-router-dom to redirect the user.
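The core of generatePageFromVoice can be sketched as a plain async function with injected dependencies, which the hook would wrap with React state. All signatures here are assumptions for illustration: the real transcribeAudio and runTools signatures are not given in this document, the onToolCall callback is a hypothetical way to detect when page creation starts, and pagePath is injected because the final URL pattern is not fully specified.

```typescript
type FlowStatus = "idle" | "transcribing" | "generating" | "creating" | "done" | "error";

// ASSUMPTION: hypothetical dependency signatures, injected so the flow is testable.
interface PageGeneratorDeps<TAudio> {
  transcribeAudio: (audio: TAudio) => Promise<string>;
  runTools: (
    preset: string,
    input: string,
    onToolCall: (toolName: string) => void,
  ) => Promise<{ slug: string }>;
  navigate: (path: string) => void;
  pagePath: (slug: string) => string;
}

// Runs transcription, generation, and navigation in order, reporting status as it goes.
// Resolves with the new slug, or null if any step failed.
async function generatePageFromVoice<TAudio>(
  audio: TAudio,
  deps: PageGeneratorDeps<TAudio>,
  onStatus: (status: FlowStatus) => void,
): Promise<string | null> {
  try {
    onStatus("transcribing");
    const transcript = await deps.transcribeAudio(audio);
    if (!transcript.trim()) throw new Error("Empty transcription");

    onStatus("generating");
    const result = await deps.runTools("page-generator", transcript, (toolName) => {
      // Switch to the final status once the model starts saving the page.
      if (toolName === "createPageTool") onStatus("creating");
    });

    onStatus("done");
    deps.navigate(deps.pagePath(result.slug));
    return result.slug;
  } catch (_err) {
    onStatus("error");
    return null;
  }
}
```

Inside the hook, onStatus would map to the isTranscribing, isGenerating, and isCreating flags, and navigate would be the function returned by useNavigate from react-router-dom.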
Task 4: Build the UI Components
- Create the CreationWizardPopup.tsx component with the layout described in the UI section.
- Add a new state and button to Header.tsx to open this popup.
- The "Voice + AI" button for page creation will trigger the usePageGenerator logic and display the status to the user.
5. Sequence Diagram (Mermaid)
This diagram illustrates the full voice-to-page workflow.
sequenceDiagram
participant User
participant HeaderUI
participant CreationWizardPopup
participant VoiceUI
participant PageGeneratorHook
participant OpenAIApi as OpenAI API
participant SupabaseDB as Supabase DB
User->>HeaderUI: Clicks "Create" button
HeaderUI->>CreationWizardPopup: Opens popup
User->>CreationWizardPopup: Selects "Create Page" -> "Voice + AI"
CreationWizardPopup->>VoiceUI: Opens voice recorder
User->>VoiceUI: Records voice command
VoiceUI-->>PageGeneratorHook: onTranscriptionComplete(audioBlob)
PageGeneratorHook->>OpenAIApi: transcribeAudio(audioBlob)
OpenAIApi-->>PageGeneratorHook: Returns transcribed text
PageGeneratorHook->>OpenAIApi: runTools('page-generator', transcribedText)
note right of OpenAIApi: System prompt instructs AI to:<br/>1. Call generateTextWithImagesTool<br/>2. Call createPageTool
OpenAIApi->>OpenAIApi: 1. generateTextWithImagesTool(prompt)
note right of OpenAIApi: This internally calls<br/>createImage and uploads to storage
OpenAIApi-->>OpenAIApi: Returns markdown with image URLs
OpenAIApi->>OpenAIApi: 2. createPageTool(title, content, ...)
OpenAIApi-->>PageGeneratorHook: Tool call to create page
PageGeneratorHook->>SupabaseDB: INSERT INTO pages (title, content, ...)
SupabaseDB-->>PageGeneratorHook: Returns new page data (incl. slug)
PageGeneratorHook->>CreationWizardPopup: Page creation successful (returns slug)
CreationWizardPopup->>User: Navigates to new page URL (/user/.../pages/new-slug)