Exploring AI Video Creation with Codex and HyperFrames

Introduction

Creating videos traditionally involves multiple steps: sourcing materials, writing scripts, storyboarding, animating, transitioning, adding music, and exporting. This process can take a considerable amount of time, even for non-professional projects.

In a recent example, I provided Codex with a simple description: create a video themed around “elderly people learning AI” using Hermes in conjunction with HyperFrames. Codex autonomously handled project setup, image asset generation, HTML video engineering, animation and sound effects, previewing, checking, and rendering.

My own input took about 1 minute. Codex then worked for approximately 59 minutes and produced a 32.02-second video at a resolution of 1920×1080 and 60 fps.

The most astonishing aspect is not just that AI can create videos, but that images, scripts, animations, transitions, music, and sound effects can all be organized by the agent from a single simple objective. Add a bit more detail to the description, and every element can be adjusted precisely.

This article discusses the new approach of Codex + HyperFrames.

Note: Other agent tools, such as Hermes and Openclaw, can also be used.

What is HyperFrames?

In simple terms:

HyperFrames is an open-source framework that allows AI agents to create videos using HTML, CSS, and JavaScript.

Its official tagline is straightforward: “Write HTML. Render video. Built for agents.” The GitHub README describes HyperFrames as an open-source video rendering framework for creating, previewing, and rendering HTML-based video compositions, with first-class support for AI agents.

You can think of it as:

Previously, when creating web pages, you wrote HTML, CSS, and JS. Now, for video production, you can also write visuals, animations, subtitles, transitions, cards, and sound effects as a web-style project. HyperFrames captures each frame and renders it into MP4 or WebM.

The official HyperFrames Skill documentation states that HTML is the “source of truth” for videos. A composition is an HTML file with data-* time attributes, combined with GSAP timeline animations and CSS styles. The HyperFrames engine captures the page frame by frame and encodes it into MP4/WebM using FFmpeg.
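To make the frame-by-frame idea concrete, here is a minimal Python sketch of the arithmetic involved. It is not HyperFrames code: it only assumes a fixed frame rate and reuses the 32.02-second, 60 fps figures from the example in the introduction. The function names are hypothetical.

```python
def frame_count(duration_s: float, fps: int) -> int:
    """Nearest whole number of frames for a clip of the given duration."""
    return round(duration_s * fps)


def frame_time(index: int, fps: int) -> float:
    """Timestamp in seconds at which frame `index` is captured."""
    return index / fps


total = frame_count(32.02, 60)
print(total)                # how many page captures the renderer needs
print(frame_time(180, 60))  # frame 180 falls on the 3-second mark
```

Because every frame corresponds to an exact timestamp, anything scheduled in the HTML timeline lands on a predictable frame.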

This explanation matters because it shows why such videos can be adjusted precisely. The output does not come from a video model whose results are hard to control. Instead, each detail is written in code and can be specified exactly, such as:

Where visual elements appear
When they slide in
How transparency changes
Text size
Duration of card displays
Transition effects
Timing of music
Camera movements

All of these are expressed in code, which makes adjustments far more manageable than with traditional AI video models.

Differences from Traditional AI Video Generation

When people think of AI video, they often imagine inputting a prompt and having a model generate a video directly. While this method is impressive, it has a significant drawback: controlling details is challenging.

For instance, if you want a title to appear at the 3-second mark and fade out at 5 seconds, align subtitles with music rhythm, or animate a button to flash at the 8-second mark, a pure video generation model may struggle. Achieving precise adjustments can be costly in terms of time and resources.
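The title example above can be written down as a small timing function. This is a plain Python sketch of the idea, not GSAP or HyperFrames API; in a real composition the same timing would live in the HTML and its animation timeline. The function name and the 0.5-second fade length are assumptions made for illustration.

```python
def title_opacity(t: float, appear: float = 3.0, fade_start: float = 5.0,
                  fade_len: float = 0.5) -> float:
    """Opacity of the title at time t (seconds): hidden, visible, then fading."""
    if t < appear:
        return 0.0  # not on screen yet
    if t < fade_start:
        return 1.0  # fully visible
    # Linear fade-out, clamped so the value never drops below zero.
    return max(0.0, 1.0 - (t - fade_start) / fade_len)


for t in (2.9, 4.0, 5.25, 6.0):
    print(t, title_opacity(t))
```

Changing when the title appears means editing one number, which is the kind of control a prompt-only video model does not offer.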

HyperFrames takes a different approach, resembling “web animation + video rendering.” According to the official homepage, HyperFrames enables AI agents to compose videos using HTML, CSS, and JS, and it is open-source under the Apache 2.0 license.

This makes it particularly suitable for videos such as:

Product introductions
Knowledge cards
Title animations
Short social media videos
Explanatory videos
Data visualizations
Subtitle synchronization
Text-image combinations
Web-to-video conversions
Rhythm-based motion videos

It does not replace all video generation models but is better suited for videos with clear structures, defined elements, and controllable animations.

Why Codex is Suitable for HyperFrames

The core of HyperFrames is code. It requires writing HTML, CSS, and JS, organizing projects, running lint checks, previewing, rendering, and debugging. This is where coding agents like Codex excel.

The HyperFrames official Quickstart documentation recommends using it alongside an AI coding agent. After you install the HyperFrames skills, you simply describe the desired video; the skills teach the agent to write correct compositions, GSAP timelines, Tailwind v4 browser runtime styles, and first-class adapter animations. Supported agents include Claude Code, Cursor, Gemini CLI, and Codex.

The GitHub README specifically mentions Codex: the same set of skills is also available as an OpenAI Codex plugin, which can be installed with the following command:

codex plugin marketplace add heygen-com/hyperframes --sparse .codex-plugin --sparse skills --sparse assets

This is why the recent case was successful. I did not write a complex video project by hand; I simply communicated the goal to Codex. Using the context provided by the HyperFrames skills, Codex autonomously built the project, wrote the page, generated assets, and then checked, previewed, and rendered the result. It was akin to having a coding assistant for video production: you provide the requirements, and it breaks them down into a project.

Hermes Can Also Use HyperFrames

It is important to note that HyperFrames is not exclusive to Codex or Claude Code. The Hermes official Skills page lists HyperFrames as an optional skill. The documentation states that it can create HTML-based video compositions, animated title cards, social overlays, talking-head videos with subtitles, audio-responsive visual effects, and shader transitions. It is suitable for users wanting to render MP4/WebM from HTML compositions, add text and animated charts, synchronize subtitles with audio, or convert websites into videos.

The installation command for Hermes is:

hermes skills install official/creative/hyperframes

Its metadata indicates that the source is optional, the author is heygen-com, the license is Apache-2.0, and tags include creative, video, animation, html, gsap, and motion-graphics.

Thus, while this article focuses on Codex + HyperFrames, it does not imply that only Codex can utilize this approach. Any agent capable of using this skill can follow the same methodology, including Hermes, Claude Code, Cursor, and Codex. The differences lie mainly in:

Installation methods
How agents call skills
Tool permissions and local environments
Which agent you personally prefer to iterate in

Value of This Case Study

The most insightful aspect of this case study is not the complexity of the prompts I wrote. On the contrary, I simply provided a natural language objective:

Create a video themed around “elderly people learning AI” using Hermes and HyperFrames, with cartoon-style visuals and characters generated by Image Gen, and video produced by HyperFrames, featuring natural movements, special effects for scene transitions, pleasant music, and no voice.

This is not a traditional “super prompt” but more like assigning a task to a production assistant. Once Codex received the objective, it autonomously performed several tasks:

Created a HyperFrames project
Established visual guidelines
Generated character, scene, and interface image assets
Developed the storyboard animation (without narration) and sound effects
Ran lint and inspect checks, rendered the video, and validated the final output
Launched a preview and reported the output paths

From the screenshots, you can see it completed a progress list, including project creation and visual guidelines, asset generation, animation and sound effect development, lint/inspect/render validation, and preview initiation.

This illustrates the value of AI + Skills: you do not need to instruct it on every step. You provide the goal, and it advances based on the established processes within the skills.

Of course, this does not mean every video will be perfect on the first try. However, it demonstrates that current agents can decompose “video production” into manageable project tasks.

Why Simple Prompts Can Yield Good Results

The reason simple prompts can still produce satisfactory results is that skills provide agents with methods. Many users place all their hopes on prompts when using AI tools, believing that longer prompts yield better results.

However, the logic of Agent + Skill differs. Skills already encapsulate many professional processes. For instance, the first step in the Hermes HyperFrames Skill documentation is “Plan before writing HTML,” which requires the agent to clarify the narrative arc, key moments, emotional beats, structure, tracks, durations, visual identity, motion character, and hero frame.

In layman’s terms:

First, think about the storyline
Next, consider key moments
Then, outline the visual structure
Determine video duration
Decide on colors and fonts
Consider motion characteristics
Identify the most crucial frame for each scene

Thus, you do not need to embed all these processes into the prompt. The skill will prompt the agent to handle them independently. This is why a simple objective can still yield impressive results: the prompt specifies “what I want,” while the skill instructs the agent on “how to do it.”
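One way to picture the "plan before writing HTML" step is as a small data structure that is filled in before any markup exists. The sketch below is a hypothetical illustration in Python, not the format the skill actually uses; the class, field names, and scene values are placeholders and are not taken from the case study's storyboard.

```python
from dataclasses import dataclass


@dataclass
class Scene:
    name: str
    duration_s: float
    hero_frame: str  # one-line description of the scene's most important frame


def scene_start_times(scenes: list[Scene]) -> list[float]:
    """Start time of each scene when scenes play back to back."""
    starts, t = [], 0.0
    for scene in scenes:
        starts.append(t)
        t += scene.duration_s
    return starts


# Placeholder plan: three scenes with made-up durations.
plan = [
    Scene("opening", 8.0, "title card over the main character"),
    Scene("lesson", 16.0, "character using the interface"),
    Scene("closing", 8.0, "final message with logo"),
]

print(scene_start_times(plan))
print(sum(scene.duration_s for scene in plan))  # total planned duration
```

With durations fixed up front, the agent knows where each scene begins before it writes a single animation.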

Installation and Getting Started

If you are using a general AI coding agent, you can install HyperFrames as per the official Quickstart:

npx skills add heygen-com/hyperframes

The official documentation states that this will teach the agent how to write the correct composition, GSAP timeline, Tailwind v4 browser runtime styles, and adapter animations.

For Codex, you can follow the Codex plugin installation method outlined in the GitHub README:

codex plugin marketplace add heygen-com/hyperframes --sparse .codex-plugin --sparse skills --sparse assets

For Hermes, you can install the official optional skill:

hermes skills install official/creative/hyperframes

You will also need to set up the HyperFrames runtime environment. The Hermes HyperFrames Skill documentation states that a one-time setup script will check for Node.js >= 22 and FFmpeg, install the global hyperframes CLI, pre-cache Puppeteer’s chrome-headless-shell, and run npx hyperframes doctor.
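If you want to verify the two prerequisites yourself before running anything, a check along the following lines would do. This is a minimal Python sketch and not the official setup script: it only looks for `node` and `ffmpeg` on the PATH and compares the Node.js major version against 22. The function names are hypothetical.

```python
import re
import shutil
import subprocess


def parse_node_major(version_text: str) -> int:
    """Extract the major version from output such as 'v22.3.0'."""
    match = re.match(r"v?(\d+)\.", version_text.strip())
    if match is None:
        raise ValueError(f"unrecognized Node.js version: {version_text!r}")
    return int(match.group(1))


def check_environment(min_node_major: int = 22) -> list[str]:
    """Return a list of problems; an empty list means both tools were found."""
    problems = []
    node = shutil.which("node")
    if node is None:
        problems.append("Node.js not found on PATH")
    else:
        out = subprocess.run([node, "--version"],
                             capture_output=True, text=True).stdout
        if parse_node_major(out) < min_node_major:
            problems.append(f"Node.js {out.strip()} is older than {min_node_major}")
    if shutil.which("ffmpeg") is None:
        problems.append("FFmpeg not found on PATH")
    return problems


if __name__ == "__main__":
    for problem in check_environment():
        print(problem)
```

The `npx hyperframes doctor` command mentioned above remains the authoritative check; this sketch only covers the two requirements named in the documentation.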

Key commands to remember include:

npx hyperframes init my-video
cd my-video
npx hyperframes lint
npx hyperframes preview
npx hyperframes render --output final.mp4
npx hyperframes doctor

These commands illustrate one thing: HyperFrames is not a black box where you just click to generate a video.
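Because each stage is an ordinary command, the stages can be chained in a script. The following Python sketch runs the lint and render commands listed above in order and stops at the first failure. The `run_steps` helper and its `runner` parameter are hypothetical, added so the logic can be exercised without invoking the real CLI.

```python
import subprocess

# Commands as listed above; lint runs first so problems surface
# before the slower render step.
STEPS = [
    ["npx", "hyperframes", "lint"],
    ["npx", "hyperframes", "render", "--output", "final.mp4"],
]


def run_steps(steps, cwd=".", runner=subprocess.run):
    """Run each command in order; return the failing command, or None."""
    for cmd in steps:
        result = runner(cmd, cwd=cwd)
        if result.returncode != 0:
            return cmd  # stop here so a broken project is never rendered
    return None
```

Calling `run_steps(STEPS, cwd="my-video")` inside an initialized project would lint and then render; a non-`None` return value tells you which step to fix.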