Introduction
Creating videos traditionally involves multiple steps: sourcing materials, writing scripts, storyboarding, animating, transitioning, adding music, and exporting. This process can take a considerable amount of time, even for non-professional projects.
In a recent example, I provided Codex with a simple description: create a video themed around “elderly people learning AI” using Hermes in conjunction with HyperFrames. Codex autonomously handled project setup, image asset generation, HTML video engineering, animation and sound effects, previewing, checking, and rendering.
I spent about 1 minute, while Codex took approximately 59 minutes to generate a 32.02-second video at a resolution of 1920×1080 and 60fps.
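To put those figures in perspective, here is the back-of-the-envelope arithmetic. It uses only the numbers quoted above; the variable names are mine.

```python
# Figures quoted in the article.
duration_s = 32.02          # video length in seconds
fps = 60                    # frames per second
width, height = 1920, 1080  # output resolution

# Total frames that have to be captured and encoded.
total_frames = round(duration_s * fps)

# Pixels in a single frame.
pixels_per_frame = width * height

# Share of the hour that needed a human.
human_min, agent_min = 1, 59
human_share = human_min / (human_min + agent_min)

print(total_frames)           # 1921
print(pixels_per_frame)       # 2073600
print(f"{human_share:.1%}")   # 1.7%
```

So a half-minute clip at 60fps is close to two thousand individually rendered frames.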

What is most striking is not that AI can create videos, but that an agent can organize images, scripts, animations, transitions, music, and sound effects from a single simple objective. With a little more detail in the description, each of these can be adjusted precisely.
This article discusses the new approach of Codex + HyperFrames.
Note: Tools like Hermes and Openclaw can also be utilized.
What is HyperFrames?
In simple terms:
HyperFrames is an open-source framework that allows AI agents to create videos using HTML, CSS, and JavaScript.
Its official description is straightforward: Write HTML. Render video. Built for agents. The GitHub README clarifies that HyperFrames is an open-source video rendering framework designed to create, preview, and render HTML-based video compositions, with first-class support for AI agents.
You can think of it as:
Previously, when creating web pages, you wrote HTML, CSS, and JS. Now, for video production, you can also write visuals, animations, subtitles, transitions, cards, and sound effects as a web-style project. HyperFrames captures each frame and renders it into MP4 or WebM.
The official HyperFrames Skill documentation states that HTML is the “source of truth” for videos. A composition is an HTML file with data-* time attributes, combined with GSAP timeline animations and CSS styles. The HyperFrames engine captures the page frame by frame and encodes it into MP4/WebM using FFmpeg.
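The frame-by-frame model can be illustrated without the real engine. The sketch below is not HyperFrames code. It assumes each clip carries a start time and a duration, standing in for the data-* time attributes; the `Clip` class and its field names are hypothetical. It computes the timestamp of every frame and which clips are on screen at a given instant. A real renderer would seek the page to each timestamp, capture it, and pass the images to FFmpeg.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Clip:
    """A timed element; field names are illustrative, not HyperFrames' schema."""
    name: str
    start: float     # seconds from the beginning of the video
    duration: float  # seconds the clip stays on screen


def frame_times(total_duration: float, fps: int) -> list[float]:
    """Timestamp (in seconds) of every frame in the video."""
    frame_count = round(total_duration * fps)
    return [n / fps for n in range(frame_count)]


def visible_clips(clips: list[Clip], t: float) -> list[str]:
    """Names of the clips on screen at time t (the end time is exclusive)."""
    return [c.name for c in clips if c.start <= t < c.start + c.duration]


clips = [
    Clip("title", start=0.0, duration=3.0),
    Clip("scene-1", start=2.0, duration=5.0),
]

times = frame_times(total_duration=7.0, fps=30)
print(len(times))                 # 210
print(visible_clips(clips, 2.5))  # ['title', 'scene-1']
print(visible_clips(clips, 3.0))  # ['scene-1']
```

Because every frame is a pure function of its timestamp, rendering the same composition twice should give the same video, which is what makes small edits predictable.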
This matters because it explains why such videos can be adjusted precisely. The output does not come from a video model whose results are hard to control. Instead, every detail is specified explicitly, such as:
- Where visual elements appear
- When they slide in
- How transparency changes
- Text size
- Duration of card displays
- Transition effects
- Timing of music
- Camera movements
All of these are expressed in code, which makes adjustments far more manageable than with prompt-driven AI video models.
Differences from Traditional AI Video Generation
When people think of AI video, they often imagine inputting a prompt and having a model generate a video directly. While this method is impressive, it has a significant drawback: controlling details is challenging.
For instance, if you want a title to appear at the 3-second mark and fade out at 5 seconds, align subtitles with music rhythm, or animate a button to flash at the 8-second mark, a pure video generation model may struggle. Achieving precise adjustments can be costly in terms of time and resources.
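With a code-driven timeline, the title example becomes a few lines of data. The sketch below is plain Python rather than GSAP, and the half-second fade lengths are my assumption, chosen only for illustration. It shows opacity as a deterministic function of time that can be retimed by changing one number.

```python
from bisect import bisect_right

# (time in seconds, opacity) keyframes: fully visible at 3 s, gone by 5 s.
# The 0.5 s fade lengths are assumed, not taken from the article.
TITLE_OPACITY = [(2.5, 0.0), (3.0, 1.0), (4.5, 1.0), (5.0, 0.0)]


def value_at(keyframes: list[tuple[float, float]], t: float) -> float:
    """Linearly interpolate a keyframed value at time t, clamping at both ends."""
    times = [k[0] for k in keyframes]
    if t <= times[0]:
        return keyframes[0][1]
    if t >= times[-1]:
        return keyframes[-1][1]
    i = bisect_right(times, t)  # index of the first keyframe after t
    (t0, v0), (t1, v1) = keyframes[i - 1], keyframes[i]
    return v0 + (v1 - v0) * (t - t0) / (t1 - t0)


print(value_at(TITLE_OPACITY, 2.0))   # 0.0 (not yet visible)
print(value_at(TITLE_OPACITY, 2.75))  # 0.5 (halfway through the fade-in)
print(value_at(TITLE_OPACITY, 4.0))   # 1.0 (fully visible)
print(value_at(TITLE_OPACITY, 5.0))   # 0.0 (faded out)
```

Moving the fade-out half a second later means editing the last two keyframes, not regenerating the whole video.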
HyperFrames takes a different approach, resembling “web animation + video rendering.” According to the official homepage, HyperFrames enables AI agents to compose videos using HTML, CSS, and JS, and it is open-source under the Apache 2.0 license.
This makes it particularly suitable for videos such as:
- Product introductions
- Knowledge cards
- Title animations
- Short social media videos
- Explanatory videos
- Data visualizations
- Subtitle synchronization
- Text-image combinations
- Web-to-video conversions
- Rhythm-based motion videos
It does not replace all video generation models but is better suited for videos with clear structures, defined elements, and controllable animations.
Why Codex is Suitable for HyperFrames
The core of HyperFrames is code. It requires writing HTML, CSS, and JS, organizing projects, running lint checks, previewing, rendering, and debugging. This is where coding agents like Codex excel.
The recommended method in the HyperFrames official Quickstart documentation is to use it alongside an AI coding agent. After installing HyperFrames skills, you simply describe the desired video, and the agent learns to write the correct composition, GSAP timelines, Tailwind v4 browser runtime styles, and first-class adapter animations. Supported agents include Claude Code, Cursor, Gemini CLI, and Codex.
The GitHub README specifically mentions Codex: the same set of skills is also available as an OpenAI Codex plugin, which can be installed using the command codex plugin marketplace add heygen-com/hyperframes --sparse .codex-plugin --sparse skills --sparse assets.
This is why the recent case was successful. I did not write a complex video project by hand; I simply communicated the goal to Codex. Using the context provided by the HyperFrames skills, Codex autonomously built the project, wrote the page, generated assets, then checked, previewed, and rendered the result. It was akin to having a coding assistant for video production: you provide the requirements, and it breaks them down into a project.
Hermes Can Also Use HyperFrames
It is important to note that HyperFrames is not exclusive to Codex or Claude Code. The Hermes official Skills page lists HyperFrames as an optional skill. The documentation states that it can create HTML-based video compositions, animated title cards, social overlays, talking-head videos with subtitles, audio-responsive visual effects, and shader transitions. It is suitable for users wanting to render MP4/WebM from HTML compositions, add text and animated charts, synchronize subtitles with audio, or convert websites into videos.
The installation command for Hermes is:
hermes skills install official/creative/hyperframes
Its metadata indicates that the source is optional, the author is heygen-com, the license is Apache-2.0, and tags include creative, video, animation, html, gsap, and motion-graphics.
Thus, while this article focuses on Codex + HyperFrames, it does not imply that only Codex can utilize this approach. Any agent capable of using this skill can follow the same methodology, including Hermes, Claude Code, Cursor, and Codex. The differences lie mainly in:
- Installation methods
- How agents call skills
- Tool permissions and local environments
- Personal preference for iteration within a specific agent
Value of This Case Study
The most insightful aspect of this case study is not the complexity of the prompts I wrote. On the contrary, I simply provided a natural language objective:
Create a video themed around “elderly people learning AI” using Hermes and HyperFrames, with cartoon-style visuals and characters generated by Image Gen, and video produced by HyperFrames, featuring natural movements, special effects for scene transitions, pleasant music, and no voice.
This is not a traditional “super prompt” but more like assigning a task to a production assistant. Once Codex received the objective, it autonomously performed several tasks:
- Created a HyperFrames project
- Established visual guidelines
- Generated character, scene, and interface image assets
- Developed a non-narrated storyboard animation and sound effects
- Ran lint checks, inspected, rendered, and validated the final product
- Launched a preview and reported the output file paths
From the screenshots, you can see it completed a progress list, including project creation and visual guidelines, asset generation, animation and sound effect development, lint/inspect/render validation, and preview initiation.
This illustrates the value of AI + Skills: you do not need to instruct it on every step. You provide the goal, and it advances based on the established processes within the skills.
Of course, this does not mean every video will be perfect on the first try. However, it demonstrates that current agents can decompose “video production” into manageable project tasks.
Why Simple Prompts Can Yield Good Results
The reason simple prompts can still produce satisfactory results is that skills provide agents with methods. Many users place all their hopes on prompts when using AI tools, believing that longer prompts yield better results.
However, the logic of Agent + Skill differs. Skills already encapsulate many professional processes. For instance, the first step in the Hermes HyperFrames Skill documentation is “Plan before writing HTML,” which requires the agent to clarify the narrative arc, key moments, emotional beats, structure, tracks, durations, visual identity, motion character, and hero frame.
In layman’s terms:
- First, think about the storyline
- Next, consider key moments
- Then, outline the visual structure
- Determine video duration
- Decide on colors and fonts
- Consider motion characteristics
- Identify the most crucial frame for each scene
Thus, you do not need to embed all these processes into the prompt. The skill will prompt the agent to handle them independently. This is why a simple objective can still yield impressive results: the prompt specifies “what I want,” while the skill instructs the agent on “how to do it.”
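One way to picture the planning step is as a small data structure that gets filled in before any HTML is written. The sketch below is my own illustration, not the skill's format: the `Scene` and `VideoPlan` classes, their field names, and the scene contents are all hypothetical. It records a few of the planning items and checks that the scene durations add up to the intended length.

```python
from dataclasses import dataclass, field


@dataclass
class Scene:
    title: str
    duration: float   # seconds
    hero_frame: str   # the most important image in this scene


@dataclass
class VideoPlan:
    narrative_arc: str
    palette: list[str]
    font: str
    target_duration: float  # seconds
    scenes: list[Scene] = field(default_factory=list)

    def total_duration(self) -> float:
        return sum(scene.duration for scene in self.scenes)

    def problems(self, tolerance: float = 0.5) -> list[str]:
        """List inconsistencies; an empty list means the plan hangs together."""
        issues = []
        if not self.scenes:
            issues.append("plan has no scenes")
        gap = self.total_duration() - self.target_duration
        if abs(gap) > tolerance:
            issues.append(f"scenes run {gap:+.1f}s against the target")
        return issues


# Invented example content, for illustration only.
plan = VideoPlan(
    narrative_arc="curiosity -> first attempt -> confidence",
    palette=["#f4a261", "#2a9d8f"],
    font="sans-serif",
    target_duration=32.0,
    scenes=[
        Scene("Meeting the assistant", 10.0, "grandmother holding a tablet"),
        Scene("Trying a first question", 12.0, "chat bubble appearing"),
        Scene("Sharing with friends", 10.0, "group around a table"),
    ],
)

print(plan.total_duration())  # 32.0
print(plan.problems())        # []
```

The point is that the structure comes from the skill, so the prompt only has to supply the theme.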
Installation and Getting Started
If you are using a general AI coding agent, you can install HyperFrames as per the official Quickstart:
npx skills add heygen-com/hyperframes
The official documentation states that this will teach the agent how to write the correct composition, GSAP timeline, Tailwind v4 browser runtime styles, and adapter animations.
For Codex, you can follow the Codex plugin installation method outlined in the GitHub README:
codex plugin marketplace add heygen-com/hyperframes --sparse .codex-plugin --sparse skills --sparse assets
For Hermes, you can install the official optional skill:
hermes skills install official/creative/hyperframes
You will also need to set up the HyperFrames runtime environment. The Hermes HyperFrames Skill documentation states that a one-time setup script will check for Node.js >= 22 and FFmpeg, install the global hyperframes CLI, pre-cache Puppeteer’s chrome-headless-shell, and run npx hyperframes doctor.
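If you want to check the two prerequisites yourself before running the setup, something like the following would do. This is a hedged sketch, not the official setup script: it only looks for `node` and `ffmpeg` on the PATH and compares the Node.js major version against the minimum of 22 quoted above. The function names are mine.

```python
import re
import shutil
import subprocess

MIN_NODE_MAJOR = 22  # minimum stated in the skill documentation


def parse_node_major(version_text: str) -> int:
    """Extract the major version from `node --version` output such as 'v22.3.0'."""
    match = re.match(r"v?(\d+)\.", version_text.strip())
    if match is None:
        raise ValueError(f"unrecognised Node.js version: {version_text!r}")
    return int(match.group(1))


def check_environment() -> list[str]:
    """Return a list of problems; an empty list means both prerequisites were found."""
    problems = []
    node = shutil.which("node")
    if node is None:
        problems.append("Node.js not found on PATH")
    else:
        result = subprocess.run([node, "--version"], capture_output=True, text=True)
        try:
            major = parse_node_major(result.stdout)
            if major < MIN_NODE_MAJOR:
                problems.append(f"Node.js {major} found, need >= {MIN_NODE_MAJOR}")
        except ValueError as error:
            problems.append(str(error))
    if shutil.which("ffmpeg") is None:
        problems.append("FFmpeg not found on PATH")
    return problems


if __name__ == "__main__":
    issues = check_environment()
    for issue in issues:
        print(issue)
    if not issues:
        print("Node.js and FFmpeg look fine")
```

It does not install anything and does not replace `npx hyperframes doctor`, which remains the authoritative check.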
Key commands to remember include:
npx hyperframes init my-video
cd my-video
npx hyperframes lint
npx hyperframes preview
npx hyperframes render --output final.mp4
npx hyperframes doctor
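An agent typically chains these commands, linting before it spends time on a render. The sketch below shows that gate in Python. It assumes that each command signals failure with a non-zero exit code and that preview keeps running until stopped, which is why I leave it out; both are my assumptions rather than documented behaviour. The function names are hypothetical.

```python
import subprocess
from typing import Callable, Sequence

# Commands copied from the list above, run from inside the project directory.
STEPS: list[list[str]] = [
    ["npx", "hyperframes", "lint"],
    ["npx", "hyperframes", "render", "--output", "final.mp4"],
]


def run_step(cmd: Sequence[str]) -> int:
    """Run one command in the current directory and return its exit code."""
    return subprocess.run(list(cmd)).returncode


def run_pipeline(
    steps: Sequence[Sequence[str]],
    runner: Callable[[Sequence[str]], int] = run_step,
) -> bool:
    """Run the steps in order, stopping at the first non-zero exit code."""
    for cmd in steps:
        print("$", " ".join(cmd))
        if runner(cmd) != 0:
            print("step failed; stopping")
            return False
    return True
```

Calling `run_pipeline(STEPS)` inside `my-video` would lint and then render. The `runner` parameter exists so the logic can be exercised with a stand-in function instead of the real CLI.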
The Hermes official Skill documentation also lists these in its quick reference: init, lint, preview, render, and doctor. The render command supports parameters such as --quality draft|standard|high, --fps 24|30|60, --format mp4|webm, --docker, and --strict.
These commands illustrate one thing: HyperFrames is not a black box where you just click to generate a video.
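Because the render options are ordinary command-line flags, they are easy to generate and validate from a script. The helper below is a sketch: `build_render_command` is a hypothetical function of mine, its default values are not the CLI's defaults, and I am assuming each flag takes its value as a separate space-separated argument. The allowed values are the ones listed in the quick reference.

```python
# Allowed values as listed in the Hermes skill's quick reference.
QUALITIES = ("draft", "standard", "high")
FRAME_RATES = (24, 30, 60)
FORMATS = ("mp4", "webm")


def build_render_command(
    output: str,
    quality: str = "standard",  # helper default, not necessarily the CLI's
    fps: int = 30,              # helper default, not necessarily the CLI's
    fmt: str = "mp4",
    strict: bool = False,
) -> list[str]:
    """Build an argument list for `npx hyperframes render`, rejecting unknown values."""
    if quality not in QUALITIES:
        raise ValueError(f"quality must be one of {QUALITIES}")
    if fps not in FRAME_RATES:
        raise ValueError(f"fps must be one of {FRAME_RATES}")
    if fmt not in FORMATS:
        raise ValueError(f"format must be one of {FORMATS}")
    cmd = [
        "npx", "hyperframes", "render",
        "--output", output,
        "--quality", quality,
        "--fps", str(fps),
        "--format", fmt,
    ]
    if strict:
        cmd.append("--strict")
    return cmd


print(" ".join(build_render_command("final.mp4", quality="high", fps=60)))
# npx hyperframes render --output final.mp4 --quality high --fps 60 --format mp4
```

A draft-quality render for quick checks and a high-quality render for delivery then differ by a single argument.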