Most content teams produce their article first and then scramble for visuals. A designer receives the finished article, skims it for visual cues, and produces a hero image that may or may not illustrate the article's actual argument. The video team gets a separate brief entirely. The result: visuals that look professional but feel disconnected from the content they accompany.
The IO Platform eliminates this disconnect by running the Image and Video libraries from the same Context Brief that produces the article. When the Article Library writes about “parallel dispatch architecture,” the Image Library knows to produce a diagram of that architecture. The Video Library knows to reference the same concept in its hook. Coherence is architectural, not editorial.
This article breaks down both libraries: how the Image Library's 8-prompt chain translates natural language style descriptions into structured image generation parameters, and how the Video Library's 6-prompt chain produces platform-optimized concept scripts with hooks calibrated to the article's key insights.
Visual Architecture
The visual pipeline reads three Context Brief fields that the text-focused libraries largely ignore: Visual Style (the aesthetic vocabulary), Core Thesis (what the visual must communicate), and Brand Identity (whose visual language to use). From these three fields, the Image Library derives a complete style directive for DALL-E, and the Video Library derives platform-specific visual hooks.
What sets this apart from tools that generate generic images from article text is the structural relationship between the visual and the argument. The Image Library does not generate “a picture about content pipelines.” It generates a specific diagram of the 9-library hub-and-spoke architecture described in the article, using the visual vocabulary specified in the brief. The visual illustrates the argument, not just the topic.
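As a minimal sketch of that read path, the Python below models a Context Brief as a dataclass and projects it onto the three visual fields. The field names (`visual_style`, `core_thesis`, `brand_identity`), the catch-all `other_fields` dict, and the `visual_inputs` helper are hypothetical; the article does not describe the brief's actual schema, and the sample values are illustrative only.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class ContextBrief:
    visual_style: str    # the aesthetic vocabulary
    core_thesis: str     # what the visual must communicate
    brand_identity: str  # whose visual language to use
    # Fields used only by the text-focused libraries (hypothetical catch-all).
    other_fields: dict = field(default_factory=dict)


VISUAL_FIELDS = ("visual_style", "core_thesis", "brand_identity")


def visual_inputs(brief: ContextBrief) -> dict[str, str]:
    """Project the brief onto the three fields the visual libraries read."""
    inputs = {name: getattr(brief, name).strip() for name in VISUAL_FIELDS}
    missing = [name for name, value in inputs.items() if not value]
    if missing:
        raise ValueError(f"brief lacks visual fields: {missing}")
    return inputs


brief = ContextBrief(
    visual_style="Dark editorial. Playfair + DM Sans. Electric blue primary.",
    core_thesis="One brief keeps every asset on the same argument.",  # illustrative
    brand_identity="IO Platform",
    other_fields={"audience": "content teams"},
)
# The same dict would feed both the Image and the Video library.
shared = visual_inputs(brief)
```

Rejecting a brief with an empty visual field up front is a design choice of this sketch: it stops both visual libraries from improvising a style when the brief gives them nothing to work from.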
The Image Library — 8 Prompts
The Image Library runs 8 sequential prompts, each producing a specific visual asset or parameter set. The chain begins with brief analysis (extracting the visual vocabulary from the Context Brief), moves through style directive generation (translating natural language into structured generation parameters), and ends with variant production (generating 3 hero image concepts, social-sized variants, and thumbnails).
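The chain's defining property is that each prompt sees the outputs of every prompt before it. The sketch below assumes a simple runner (`run_chain`, hypothetical) that threads a shared context dict through an ordered list of steps. Only the three stages the article names are wired in, and the stub functions stand in for model calls; the remaining prompts of the 8-prompt chain would slot into the same list.

```python
from typing import Callable

# A step reads the accumulated context and returns new keys to merge in.
Step = tuple[str, Callable[[dict], dict]]


def run_chain(steps: list[Step], brief: dict) -> dict:
    """Run prompts strictly in order; each step sees every earlier output."""
    context = {"brief": brief}
    for name, step in steps:
        output = step(context)
        overlap = output.keys() & context.keys()
        if overlap:
            raise KeyError(f"step {name!r} would overwrite {sorted(overlap)}")
        context.update(output)
    return context


# Stub steps standing in for model calls (hypothetical, for illustration).
def analyze_brief(ctx: dict) -> dict:
    # Split the natural-language style description into vocabulary fragments.
    return {"vocabulary": ctx["brief"]["visual_style"].rstrip(".").split(". ")}


def style_directive(ctx: dict) -> dict:
    return {"directive": {"keywords": ctx["vocabulary"]}}


def produce_variants(ctx: dict) -> dict:
    kw = ", ".join(ctx["directive"]["keywords"])
    return {
        "hero_concepts": [f"hero concept {i + 1}: {kw}" for i in range(3)],
        "social_variants": [f"social: {kw}"],
        "thumbnails": [f"thumbnail: {kw}"],
    }


result = run_chain(
    [("brief_analysis", analyze_brief),
     ("style_directive", style_directive),
     ("variant_production", produce_variants)],
    {"visual_style": "Dark editorial. Electric blue primary. Precision over decoration."},
)
```

The overwrite check is an assumption of this sketch; it keeps a later prompt from silently replacing an earlier prompt's output, such as the style directive.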
The Video Library — 6 Prompts
The Video Library produces concept scripts, not finished videos. Each script includes a platform-optimized hook (the first 3 seconds), a body that references the article's key insight, and a call to action. It generates 3 concept variants: one for short-form (TikTok/Reels), one for mid-form (YouTube Shorts), and one for long-form (YouTube).
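One way to make the script contract concrete is a validator that checks a generated package before it ships. This is a hypothetical sketch: `ConceptScript`, `validate_package`, and the substring test for the key insight are illustrative stand-ins, not the Video Library's actual interface, and a real check for whether a body references an insight would likely need to be looser than exact text matching.

```python
from dataclasses import dataclass
from enum import Enum


class Format(Enum):
    SHORT = "short-form (TikTok/Reels)"
    MID = "mid-form (YouTube Shorts)"
    LONG = "long-form (YouTube)"


HOOK_MAX_SECONDS = 3.0  # the hook is "the first 3 seconds"


@dataclass(frozen=True)
class ConceptScript:
    fmt: Format
    hook: str
    hook_seconds: float
    body: str            # should reference the article's key insight
    call_to_action: str


def validate_package(scripts: list[ConceptScript], key_insight: str) -> None:
    """Raise ValueError unless the package has one valid script per format."""
    got = sorted(s.fmt.name for s in scripts)
    want = sorted(f.name for f in Format)
    if got != want:
        raise ValueError(f"expected one script per format {want}, got {got}")
    for s in scripts:
        if s.hook_seconds > HOOK_MAX_SECONDS:
            raise ValueError(f"{s.fmt.name} hook runs {s.hook_seconds}s")
        if key_insight.lower() not in s.body.lower():
            raise ValueError(f"{s.fmt.name} body omits the key insight")
        if not s.call_to_action.strip():
            raise ValueError(f"{s.fmt.name} script has no call to action")


insight = "one brief drives every asset"
package = [
    ConceptScript(fmt, "Why do your visuals feel off-topic?", 2.5,
                  f"The fix: {insight}.", "Read the full article.")
    for fmt in Format
]
validate_package(package, insight)  # passes silently
```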
Style Transfer from Brief to Asset
The most technically interesting step in both libraries is the style transfer: translating a natural language description like “Dark editorial. Playfair + DM Sans. Electric blue primary. Animated network art. Precision over decoration” into structured parameters that an image generation model can consume and a video script can follow.
The Image Library does this in prompt 2 (Style Directive), which outputs a JSON object containing: color palette (primary, secondary, accent, background), typography references, composition rules (grid-based vs. organic, symmetric vs. asymmetric), texture vocabulary (clean vs. gritty, flat vs. dimensional), and lighting parameters (high-key vs. low-key, directional vs. ambient). This structured directive is then consumed by prompts 3–7, ensuring all image variants share the same visual vocabulary.
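A typed parse of the Style Directive output makes the hand-off to prompts 3–7 explicit. The sketch below assumes a JSON shape for the five groups the article lists; the nested key names (`layout`, `balance`, `finish`, `depth`, `key`, `direction`) and the sample values, including the hex colors chosen for "electric blue primary", are hypothetical. Each binary choice is restricted to the two options the article names.

```python
import json
from dataclasses import dataclass
from typing import Literal, get_args

# Allowed values for each binary choice named in the article.
Layout = Literal["grid-based", "organic"]
Balance = Literal["symmetric", "asymmetric"]
Finish = Literal["clean", "gritty"]
Depth = Literal["flat", "dimensional"]
LightKey = Literal["high-key", "low-key"]
Direction = Literal["directional", "ambient"]

PALETTE_ROLES = {"primary", "secondary", "accent", "background"}


@dataclass(frozen=True)
class StyleDirective:
    palette: dict[str, str]       # role -> color (hypothetical hex values)
    typography: list[str]         # typeface references
    composition: tuple[str, str]  # (Layout, Balance)
    texture: tuple[str, str]      # (Finish, Depth)
    lighting: tuple[str, str]     # (LightKey, Direction)


def _pick(value: str, allowed: object) -> str:
    """Return value if it is one of the Literal's options, else raise."""
    options = get_args(allowed)
    if value not in options:
        raise ValueError(f"{value!r} is not one of {options}")
    return value


def parse_directive(raw: str) -> StyleDirective:
    """Validate prompt 2's JSON before prompts 3-7 consume it."""
    data = json.loads(raw)
    if set(data["palette"]) != PALETTE_ROLES:
        raise ValueError(f"palette roles must be {sorted(PALETTE_ROLES)}")
    comp, tex, light = data["composition"], data["texture"], data["lighting"]
    return StyleDirective(
        palette=dict(data["palette"]),
        typography=list(data["typography"]),
        composition=(_pick(comp["layout"], Layout), _pick(comp["balance"], Balance)),
        texture=(_pick(tex["finish"], Finish), _pick(tex["depth"], Depth)),
        lighting=(_pick(light["key"], LightKey), _pick(light["direction"], Direction)),
    )


# Hypothetical prompt-2 output for the "Dark editorial ..." description.
raw = json.dumps({
    "palette": {"primary": "#1F6BFF", "secondary": "#8A94A6",
                "accent": "#00D1FF", "background": "#0B0D12"},
    "typography": ["Playfair", "DM Sans"],
    "composition": {"layout": "grid-based", "balance": "asymmetric"},
    "texture": {"finish": "clean", "depth": "flat"},
    "lighting": {"key": "low-key", "direction": "directional"},
})
directive = parse_directive(raw)
print(directive.lighting)  # ('low-key', 'directional')
```

Failing fast on an out-of-vocabulary value (for example, a `"glossy"` finish) is the point of the schema: a malformed directive is caught once, rather than spreading a drifting style across every downstream image prompt.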
The Video Library performs a similar translation in its brief analysis step, deriving: pacing (fast-cut vs. long-hold), tone (urgent vs. contemplative), visual reference style (talking head vs. kinetic typography vs. screen capture), and hook structure (question-led vs. statement-led vs. visual surprise). These parameters ensure that all three format variants — short, mid, and long — feel like they belong to the same content package.
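The coherence guarantee can be sketched by deriving all three format variants from one immutable style object. The enum values below come from the four parameters the article lists; the class and function names (`VideoStyle`, `VariantSpec`, `build_variants`, `is_coherent`) are hypothetical, as is the example parameter choice.

```python
from dataclasses import dataclass, replace
from enum import Enum


class Pacing(Enum):
    FAST_CUT = "fast-cut"
    LONG_HOLD = "long-hold"


class Tone(Enum):
    URGENT = "urgent"
    CONTEMPLATIVE = "contemplative"


class VisualReference(Enum):
    TALKING_HEAD = "talking head"
    KINETIC_TYPOGRAPHY = "kinetic typography"
    SCREEN_CAPTURE = "screen capture"


class HookStructure(Enum):
    QUESTION_LED = "question-led"
    STATEMENT_LED = "statement-led"
    VISUAL_SURPRISE = "visual surprise"


@dataclass(frozen=True)
class VideoStyle:
    pacing: Pacing
    tone: Tone
    visual_reference: VisualReference
    hook_structure: HookStructure


@dataclass(frozen=True)
class VariantSpec:
    fmt: str  # "short", "mid", or "long"
    style: VideoStyle


def build_variants(style: VideoStyle) -> list[VariantSpec]:
    """Derive all three format variants from one shared style object."""
    return [VariantSpec(fmt, style) for fmt in ("short", "mid", "long")]


def is_coherent(variants: list[VariantSpec]) -> bool:
    """True when every variant carries identical style parameters."""
    return len({v.style for v in variants}) == 1


style = VideoStyle(Pacing.FAST_CUT, Tone.URGENT,
                   VisualReference.KINETIC_TYPOGRAPHY, HookStructure.QUESTION_LED)
variants = build_variants(style)
# A variant built with a different tone breaks coherence.
drifted = variants[:2] + [VariantSpec("long", replace(style, tone=Tone.CONTEMPLATIVE))]
```

Because `VideoStyle` is frozen and shared, the three variants cannot drift apart unless a variant is explicitly built with a different style, which `is_coherent` then flags.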