This is page 1 of 2. Use http://codebase.md/gyoridavid/short-video-maker?page={x} to view the full context. # Directory Structure ``` ├── __mocks__ │ └── pexels-response.json ├── .dockerignore ├── .editorconfig ├── .env.example ├── .gitignore ├── .prettierrc ├── CONTRIBUTING.md ├── docker-compose.yml ├── eslint.config.mjs ├── LICENSE ├── main-cuda.Dockerfile ├── main-tiny.Dockerfile ├── main.Dockerfile ├── package.json ├── pnpm-lock.yaml ├── postcss.config.js ├── postcss.config.mjs ├── README.md ├── remotion.config.ts ├── rest.http ├── src │ ├── components │ │ ├── root │ │ │ ├── index.ts │ │ │ └── Root.tsx │ │ ├── types.ts │ │ ├── utils.ts │ │ └── videos │ │ ├── LandscapeVideo.tsx │ │ ├── PortraitVideo.tsx │ │ └── Test.tsx │ ├── config.ts │ ├── index.ts │ ├── logger.ts │ ├── scripts │ │ ├── install.ts │ │ └── normalizeMusic.ts │ ├── server │ │ ├── routers │ │ │ ├── mcp.ts │ │ │ └── rest.ts │ │ ├── server.ts │ │ └── validator.ts │ ├── short-creator │ │ ├── libraries │ │ │ ├── FFmpeg.ts │ │ │ ├── Kokoro.ts │ │ │ ├── Pexels.test.ts │ │ │ ├── Pexels.ts │ │ │ ├── Remotion.ts │ │ │ └── Whisper.ts │ │ ├── music.ts │ │ ├── ShortCreator.test.ts │ │ └── ShortCreator.ts │ ├── types │ │ └── shorts.ts │ └── ui │ ├── App.tsx │ ├── components │ │ └── Layout.tsx │ ├── index.html │ ├── index.tsx │ ├── pages │ │ ├── VideoCreator.tsx │ │ ├── VideoDetails.tsx │ │ └── VideoList.tsx │ ├── public │ │ └── index.html │ └── styles │ └── index.css ├── static │ └── music │ ├── Aurora on the Boulevard - National Sweetheart.mp3 │ ├── Baby Animals Playing - Joel Cummins.mp3 │ ├── Banjo Doops - Joel Cummins.mp3 │ ├── Buckle Up - Jeremy Korpas.mp3 │ ├── Cafecito por la Manana - Cumbia Deli.mp3 │ ├── Champion - Telecasted.mp3 │ ├── Crystaline - Quincas Moreira.mp3 │ ├── Curse of the Witches - Jimena Contreras.mp3 │ ├── Delayed Baggage - Ryan Stasik.mp3 │ ├── Final Soliloquy - Asher Fulero.mp3 │ ├── Heartbeat Of The Wind - Asher Fulero.mp3 │ ├── Honey, I Dismembered The Kids - Ezra Lipp.mp3 │ ├── Hopeful - Nat Keefe.mp3 │ ├── Hopeful Freedom - Asher Fulero.mp3 │ ├── Hopeless - Jimena Contreras.mp3 │ ├── Jetski - Telecasted.mp3 │ ├── Like It Loud - Dyalla.mp3 │ ├── Name The Time And Place - Telecasted.mp3 │ ├── Night Hunt - Jimena Contreras.mp3 │ ├── No.2 Remembering Her - Esther Abrami.mp3 │ ├── Oh Please - Telecasted.mp3 │ ├── On The Hunt - Andrew Langdon.mp3 │ ├── Organic Guitar House - Dyalla.mp3 │ ├── Phantom - Density & Time.mp3 │ ├── README.md │ ├── Restless Heart - Jimena Contreras.mp3 │ ├── Seagull - Telecasted.mp3 │ ├── Sinister - Anno Domini Beats.mp3 │ ├── Sly Sky - Telecasted.mp3 │ ├── Touch - Anno Domini Beats.mp3 │ ├── Traversing - Godmode.mp3 │ └── Twin Engines - Jeremy Korpas.mp3 ├── tailwind.config.js ├── tsconfig.build.json ├── tsconfig.json ├── vite.config.ts └── vitest.config.ts ``` # Files -------------------------------------------------------------------------------- /.gitignore: -------------------------------------------------------------------------------- ``` node_modules dist .DS_Store .env ``` -------------------------------------------------------------------------------- /.dockerignore: -------------------------------------------------------------------------------- ``` node_modules .git .gitignore *.md dist ``` -------------------------------------------------------------------------------- /.prettierrc: -------------------------------------------------------------------------------- ``` { "useTabs": false, "bracketSpacing": true, "tabWidth": 2 } ``` 
--------------------------------------------------------------------------------
/.editorconfig:
--------------------------------------------------------------------------------

```
root = true

[*]
end_of_line = crlf
charset = utf-8
trim_trailing_whitespace = true
insert_final_newline = true
indent_style = space
indent_size = 2
```

--------------------------------------------------------------------------------
/.env.example:
--------------------------------------------------------------------------------

```
PEXELS_API_KEY= # crucial for the project to work
LOG_LEVEL=trace # trace, debug, info, warn, error, fatal, silent
WHISPER_VERBOSE=true
PORT=3123
DEV=true # local development mode
DATA_DIR_PATH= # only for docker, otherwise leave empty
```

--------------------------------------------------------------------------------
/static/music/README.md:
--------------------------------------------------------------------------------

```markdown
# Music Library for Shorts Creator

This directory contains background music tracks for use in the shorts creator project. All music files are sourced from the YouTube Audio Library and are free to use under its license. You can use these audio tracks in any of your videos, including videos that you monetize. No attribution is required.

## Music Collection

The music is categorized by mood to match the `MusicMoodEnum` in the project.

## Mood Categories

The following moods are defined in the project's `MusicMoodEnum`:

- sad
- melancholic
- happy
- euphoric/high
- excited
- chill
- uneasy
- angry
- dark
- hopeful
- contemplative
- funny/quirky

## How to Add New Music

To add new music to the project:

1. Add your MP3 file to this directory (`static/music/`)
2. Update the `src/short-creator/music.ts` file by adding a new record to the `musicList` array:

```typescript
{
  file: "your-new-music-file.mp3", // Filename of your MP3
  start: 5, // Start time in seconds (when to begin playing)
  end: 30, // End time in seconds (when to stop playing)
  mood: MusicMoodEnum.happy, // Mood tag for the music
}
```

## Usage

The shorts creator uses these mood tags to filter and match appropriate music with video content. Choose tags carefully to ensure a proper match between the music mood and the video content.
```

--------------------------------------------------------------------------------
/README.md:
--------------------------------------------------------------------------------

```markdown
## [📚 Join our Skool community for support, premium content and more!](https://www.skool.com/ai-agents-az/about?s1m)

### Be part of a growing community and help us create more content like this

# Description

An open source automated video creation tool for generating short-form video content. Short Video Maker combines text-to-speech, automatic captions, background videos, and music to create engaging short videos from simple text inputs.

This project is meant to provide a free alternative to heavy, GPU-hungry video generation (and a free alternative to expensive, third-party API calls). It doesn't generate a video from scratch based on an image or an image prompt.

The repository was open-sourced by the [AI Agents A-Z Youtube Channel](https://www.youtube.com/channel/UCloXqLhp_KGhHBe1kwaL2Tg). We encourage you to check out the channel for more AI-related content and tutorials.

The server exposes an [MCP](https://github.com/modelcontextprotocol) server and a REST API. While the MCP server can be used with an AI agent (like n8n), the REST endpoints provide more flexibility for video generation.
You can find example n8n workflows created with the REST/MCP server [in this repository](https://github.com/gyoridavid/ai_agents_az/tree/main/episode_7).

# TOC

## Getting started

- [Requirements](#general-requirements)
- [How to run the server](#getting-started-1)
- [Web UI](#web-ui)
- [Tutorial](#tutorial-with-n8n)
- [Examples](#examples)

## Usage

- [Environment variables](#environment-variables)
- [REST API](#rest-api)
- [Configuration options](#configuration-options)
- [MCP](#mcp-server)

## Info

- [Features](#features)
- [How it works](#how-it-works)
- [Limitations](#limitations)
- [Concepts](#concepts)
- [Troubleshooting](#troubleshooting)
- [Deploying to the cloud](#deploying-to-the-cloud)
- [FAQ](#faq)
- [Dependencies](#dependencies-for-the-video-generation)
- [Contributing](#how-to-contribute)
- [License](#license)
- [Acknowledgements](#acknowledgments)

# Tutorial with n8n

[Watch the video tutorial on YouTube](https://www.youtube.com/watch?v=jzsQpn-AciM)

# Examples

<table>
  <tr>
    <td>
      <video src="https://github.com/user-attachments/assets/1b488e7d-1b40-439d-8767-6ab51dbc0922" width="480" height="270"></video>
    </td>
    <td>
      <video src="https://github.com/user-attachments/assets/bb7ce80f-e6e1-44e5-ba4e-9b13d917f55b" width="270" height="480"></video>
    </td>
  </tr>
</table>

# Features

- Generate complete short videos from text prompts
- Text-to-speech conversion
- Automatic caption generation and styling
- Background video search and selection via Pexels
- Background music with genre/mood selection
- Serves as both a REST API and a Model Context Protocol (MCP) server

# How It Works

Shorts Creator takes simple text inputs and search terms, then:

1. Converts text to speech using Kokoro TTS
2. Generates accurate captions via Whisper
3. Finds relevant background videos from Pexels
4. Composes all elements with Remotion
5. Renders a professional-looking short video with perfectly timed captions

# Limitations

- The project is only capable of generating videos with an English voiceover (kokoro-js doesn't support other languages at the moment)
- The background videos are sourced from Pexels

# General Requirements

- internet connection
- a free Pexels API key
- ≥ 3 GB free RAM, my recommendation is 4 GB
- ≥ 2 vCPU
- ≥ 5 GB disk space

# Concepts

## Scene

Each video is assembled from multiple scenes. Each scene consists of:

1. Text: the narration - the text the TTS will read and create the captions from.
2. Search terms: the keywords the server should use to find videos via the Pexels API. If none can be found, joker terms are used (`nature`, `globe`, `space`, `ocean`).

# Getting started

## Docker (recommended)

There are three Docker images, for three different use cases. Generally speaking, most of the time you want to spin up the `tiny` one.
### Tiny

- Uses the `tiny.en` whisper.cpp model
- Uses the `q4` quantized kokoro model
- `CONCURRENCY=1` to overcome OOM errors coming from Remotion with limited resources
- `VIDEO_CACHE_SIZE_IN_BYTES=2097152000` (2 GB) to overcome OOM errors coming from Remotion with limited resources

```bash
docker run -it --rm --name short-video-maker -p 3123:3123 -e LOG_LEVEL=debug -e PEXELS_API_KEY= gyoridavid/short-video-maker:latest-tiny
```

### Normal

- Uses the `base.en` whisper.cpp model
- Uses the `fp32` kokoro model
- `CONCURRENCY=1` to overcome OOM errors coming from Remotion with limited resources
- `VIDEO_CACHE_SIZE_IN_BYTES=2097152000` (2 GB) to overcome OOM errors coming from Remotion with limited resources

```bash
docker run -it --rm --name short-video-maker -p 3123:3123 -e LOG_LEVEL=debug -e PEXELS_API_KEY= gyoridavid/short-video-maker:latest
```

### Cuda

If you own an Nvidia GPU and you want to use a larger whisper model with GPU acceleration, you can use the CUDA-optimised Docker image.

- Uses the `medium.en` whisper.cpp model (with GPU acceleration)
- Uses the `fp32` kokoro model
- `CONCURRENCY=1` to overcome OOM errors coming from Remotion with limited resources
- `VIDEO_CACHE_SIZE_IN_BYTES=2097152000` (2 GB) to overcome OOM errors coming from Remotion with limited resources

```bash
docker run -it --rm --name short-video-maker -p 3123:3123 -e LOG_LEVEL=debug -e PEXELS_API_KEY= --gpus=all gyoridavid/short-video-maker:latest-cuda
```

## Docker compose

You might use Docker Compose to run n8n or other services and want to combine them. Make sure you add the shared network to the service configuration.

```yaml
version: "3"
services:
  short-video-maker:
    image: gyoridavid/short-video-maker:latest-tiny
    environment:
      - LOG_LEVEL=debug
      - PEXELS_API_KEY=
    ports:
      - "3123:3123"
    volumes:
      - ./videos:/app/data/videos # expose the generated videos
```

If you are using the [Self-hosted AI starter kit](https://github.com/n8n-io/self-hosted-ai-starter-kit), you want to add `networks: ['demo']` to the `short-video-maker` service so you can reach it at http://short-video-maker:3123 in n8n.

# NPM

While Docker is the recommended way to run the project, you can run it with npm or npx. On top of the general requirements, the following are necessary to run the server.

## Supported platforms

- Ubuntu ≥ 22.04 (libc 2.5 for Whisper.cpp)
  - Required packages: `git wget cmake ffmpeg curl make libsdl2-dev libnss3 libdbus-1-3 libatk1.0-0 libgbm-dev libasound2 libxrandr2 libxkbcommon-dev libxfixes3 libxcomposite1 libxdamage1 libatk-bridge2.0-0 libpango-1.0-0 libcairo2 libcups2`
- macOS
  - ffmpeg (`brew install ffmpeg`)
  - node.js (tested on 22+)

Windows is **NOT** supported at the moment (whisper.cpp installation fails occasionally).

# Web UI

@mushitori made a Web UI to generate the videos from your browser.
<table> <tr> <td> <img width="1088" alt="Screenshot 2025-05-12 at 1 45 11 PM" src="https://github.com/user-attachments/assets/2ab64aea-f639-41b0-bd19-2fcf73bb1a3d" /> </td> <td> <img width="1075" alt="Screenshot 2025-05-12 at 1 45 44 PM" src="https://github.com/user-attachments/assets/0ff568fe-ddcb-4dad-ae62-2640290aef1e" /> </td> <td> <img width="1083" alt="Screenshot 2025-05-12 at 1 45 51 PM" src="https://github.com/user-attachments/assets/d3c1c826-3cb3-4313-b17c-605ff612fb63" /> </td> <td> <img width="1070" alt="Screenshot 2025-05-12 at 1 46 42 PM" src="https://github.com/user-attachments/assets/18edb1a0-9fc2-48b3-8896-e919e7dc57ff" /> </td> </tr> </table> You can load it on http://localhost:3123 # Environment variables ## 🟢 Configuration | key | description | default | | --------------- | --------------------------------------------------------------- | ------- | | PEXELS_API_KEY | [your (free) Pexels API key](https://www.pexels.com/api/) | | | LOG_LEVEL | pino log level | info | | WHISPER_VERBOSE | whether the output of whisper.cpp should be forwarded to stdout | false | | PORT | the port the server will listen on | 3123 | ## ⚙️ System configuration | key | description | default | | ------------------------- | ------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- | ----------------------------------------------------------- | | KOKORO_MODEL_PRECISION | The size of the Kokoro model to use. Valid options are `fp32`, `fp16`, `q8`, `q4`, `q4f16` | depends, see the descriptions of the docker images above ^^ | | CONCURRENCY | [concurrency refers to how many browser tabs are opened in parallel during a render. Each Chrome tab renders web content and then screenshots it.](https://www.remotion.dev/docs/terminology/concurrency). Tweaking this value helps with running the project with limited resources. | depends, see the descriptions of the docker images above ^^ | | VIDEO_CACHE_SIZE_IN_BYTES | Cache for [<OffthreadVideo>](https://remotion.dev/docs/offthreadvideo) frames in Remotion. Tweaking this value helps with running the project with limited resources. | depends, see the descriptions of the docker images above ^^ | ## ⚠️ Danger zone | key | description | default | | ------------- | ---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- | ---------------------------------------------------------------------------------------------------- | | WHISPER_MODEL | Which whisper.cpp model to use. Valid options are `tiny`, `tiny.en`, `base`, `base.en`, `small`, `small.en`, `medium`, `medium.en`, `large-v1`, `large-v2`, `large-v3`, `large-v3-turbo` | Depends, see the descriptions of the docker images above. For npm, the default option is `medium.en` | | DATA_DIR_PATH | the data directory of the project | `~/.ai-agents-az-video-generator` with npm, `/app/data` in the Docker images | | DOCKER | whether the project is running in a Docker container | `true` for the docker images, otherwise `false` | | DEV | guess! 
:) | `false` |

# Configuration options

| key | description | default |
| --- | --- | --- |
| paddingBack | The end screen: how long the video should keep playing after the narration has finished (in milliseconds). | 0 |
| music | The mood of the background music. Get the available options from the GET `/api/music-tags` endpoint. | random |
| captionPosition | The position where the captions should be rendered. Possible options: `top`, `center`, `bottom`. | `bottom` |
| captionBackgroundColor | The background color of the active caption item. | `blue` |
| voice | The Kokoro voice. | `af_heart` |
| orientation | The video orientation. Possible options are `portrait` and `landscape`. | `portrait` |
| musicVolume | Set the volume of the background music. Possible options are `low`, `medium`, `high` and `muted`. | `high` |

# Usage

## MCP server

### Server URLs

- `/mcp/sse`
- `/mcp/messages`

### Available tools

- `create-short-video` Creates a short video - the LLM will figure out the right configuration. If you want to use a specific configuration, you need to specify it in your prompt.
- `get-video-status` Somewhat useless, it's meant for checking the status of the video, but since AI agents aren't really good with the concept of time, you'll probably end up using the REST API for that anyway.

# REST API

### GET `/health`

Healthcheck endpoint

```bash
curl --location 'localhost:3123/health'
```

```json
{ "status": "ok" }
```

### POST `/api/short-video`

```bash
curl --location 'localhost:3123/api/short-video' \
--header 'Content-Type: application/json' \
--data '{
  "scenes": [
    {
      "text": "Hello world!",
      "searchTerms": ["river"]
    }
  ],
  "config": {
    "paddingBack": 1500,
    "music": "chill"
  }
}'
```

```json
{ "videoId": "cma9sjly700020jo25vwzfnv9" }
```

### GET `/api/short-video/{id}/status`

```bash
curl --location 'localhost:3123/api/short-video/cm9ekme790000hysi5h4odlt1/status'
```

```json
{ "status": "ready" }
```

### GET `/api/short-video/{id}`

```bash
curl --location 'localhost:3123/api/short-video/cm9ekme790000hysi5h4odlt1'
```

Response: the binary data of the video.

### GET `/api/short-videos`

```bash
curl --location 'localhost:3123/api/short-videos'
```

```json
{
  "videos": [
    { "id": "cma9wcwfc0000brsi60ur4lib", "status": "processing" }
  ]
}
```

### DELETE `/api/short-video/{id}`

```bash
curl --location --request DELETE 'localhost:3123/api/short-video/cma9wcwfc0000brsi60ur4lib'
```

```json
{ "success": true }
```

### GET `/api/voices`

```bash
curl --location 'localhost:3123/api/voices'
```

```json
[
  "af_heart", "af_alloy", "af_aoede", "af_bella", "af_jessica", "af_kore",
  "af_nicole", "af_nova", "af_river", "af_sarah", "af_sky",
  "am_adam", "am_echo", "am_eric", "am_fenrir", "am_liam", "am_michael",
  "am_onyx", "am_puck", "am_santa",
  "bf_emma", "bf_isabella", "bm_george", "bm_lewis",
  "bf_alice", "bf_lily", "bm_daniel", "bm_fable"
]
```

### GET `/api/music-tags`

```bash
curl --location 'localhost:3123/api/music-tags'
```

```json
[
  "sad", "melancholic", "happy", "euphoric/high", "excited", "chill",
  "uneasy", "angry", "dark", "hopeful", "contemplative", "funny/quirky"
]
```

# Troubleshooting

## Docker

The server needs at least 3 GB of free memory. Make sure to allocate enough RAM to Docker.
If you are running the server on Windows via WSL 2, you need to set the resource limits through the [WSL 2 configuration](https://learn.microsoft.com/en-us/windows/wsl/wsl-config#configure-global-options-with-wslconfig) - otherwise, set them from Docker Desktop. (Ubuntu does not restrict the resources unless they are specified with the run command.)

## NPM

Make sure all the necessary packages are installed.

# n8n

Setting up the MCP (or REST) server depends on how you run n8n and the server. Please follow the examples from the matrix below.

| | n8n is running locally, using `n8n start` | n8n is running locally using Docker | n8n is running in the cloud |
| --- | --- | --- | --- |
| `short-video-maker` is running in Docker, locally | `http://localhost:3123` | It depends. You can technically use `http://host.docker.internal:3123` as it points to the host, but you could also configure both containers to use the same network and use the service name to communicate, e.g. `http://short-video-maker:3123` | won't work - deploy `short-video-maker` to the cloud |
| `short-video-maker` is running with npm/npx | `http://localhost:3123` | `http://host.docker.internal:3123` | won't work - deploy `short-video-maker` to the cloud |
| `short-video-maker` is running in the cloud | You should use your IP address `http://{YOUR_IP}:3123` | You should use your IP address `http://{YOUR_IP}:3123` | You should use your IP address `http://{YOUR_IP}:3123` |

# Deploying to the cloud

While each VPS provider is different, and it's impossible to provide a configuration for all of them, here are some tips.

- Use Ubuntu ≥ 22.04
- Have ≥ 4 GB RAM, ≥ 2 vCPUs and ≥ 5 GB storage
- Use [pm2](https://pm2.keymetrics.io/) to run/manage the server
- Put the environment variables in the `.bashrc` file (or similar)

# FAQ

## Can I use other languages? (French, German etc.)

Unfortunately, it's not possible at the moment. Kokoro-js only supports English.

## Can I pass in images and videos, and can it stitch them together?

No

## Should I run the project with `npm` or `docker`?

Docker is the recommended way to run the project.

## How much GPU is being used for the video generation?

Honestly, not a lot - only whisper.cpp can be accelerated. Remotion is CPU-heavy, and [Kokoro-js](https://github.com/hexgrad/kokoro) runs on the CPU.

## Is there a UI that I can use to generate the videos?

Yes - see the [Web UI](#web-ui) section.

## Can I select a different source for the videos than Pexels, or provide my own videos?

No

## Can the project generate videos from images?
No

## Dependencies for the video generation

| Dependency | Version | License | Purpose |
| --- | --- | --- | --- |
| [Remotion](https://remotion.dev/) | ^4.0.286 | [Remotion License](https://github.com/remotion-dev/remotion/blob/main/LICENSE.md) | Video composition and rendering |
| [Whisper CPP](https://github.com/ggml-org/whisper.cpp) | v1.5.5 | MIT | Speech-to-text for captions |
| [FFmpeg](https://ffmpeg.org/) | ^2.1.3 | LGPL/GPL | Audio/video manipulation |
| [Kokoro.js](https://www.npmjs.com/package/kokoro-js) | ^1.2.0 | MIT | Text-to-speech generation |
| [Pexels API](https://www.pexels.com/api/) | N/A | [Pexels Terms](https://www.pexels.com/license/) | Background videos |

## How to contribute?

PRs are welcome. See the [CONTRIBUTING.md](CONTRIBUTING.md) file for instructions on setting up a local development environment.

## License

This project is licensed under the [MIT License](LICENSE).

## Acknowledgments

- ❤️ [Remotion](https://remotion.dev/) for programmatic video generation
- ❤️ [Whisper](https://github.com/ggml-org/whisper.cpp) for speech-to-text
- ❤️ [Pexels](https://www.pexels.com/) for video content
- ❤️ [FFmpeg](https://ffmpeg.org/) for audio/video processing
- ❤️ [Kokoro](https://github.com/hexgrad/kokoro) for TTS
```

--------------------------------------------------------------------------------
/CONTRIBUTING.md:
--------------------------------------------------------------------------------

```markdown
# Contributing to Shorts Creator

## How to set up the development environment

1. Clone the repository

```bash
git clone git@github.com:gyoridavid/short-video-maker.git
cd short-video-maker
```

2. Install dependencies

```bash
pnpm install
```

3. Copy `.env.example` to `.env` and set the right environment variables.

4. Start the server

```bash
pnpm dev
```

## How to preview the videos and debug the rendering process

You can use Remotion Studio to preview the videos. Make sure to update the template if the underlying data structure changes.
```bash npx remotion studio ``` ``` -------------------------------------------------------------------------------- /postcss.config.mjs: -------------------------------------------------------------------------------- ``` export default { plugins: { "@tailwindcss/postcss": {}, }, }; ``` -------------------------------------------------------------------------------- /eslint.config.mjs: -------------------------------------------------------------------------------- ``` import { config } from "@remotion/eslint-config-flat"; export default config; ``` -------------------------------------------------------------------------------- /postcss.config.js: -------------------------------------------------------------------------------- ```javascript module.exports = { plugins: { tailwindcss: {}, autoprefixer: {}, }, } ``` -------------------------------------------------------------------------------- /src/logger.ts: -------------------------------------------------------------------------------- ```typescript import { logger } from "./config"; export default logger; export { logger }; ``` -------------------------------------------------------------------------------- /src/components/root/index.ts: -------------------------------------------------------------------------------- ```typescript import { registerRoot } from "remotion"; import { RemotionRoot } from "./Root"; registerRoot(RemotionRoot); ``` -------------------------------------------------------------------------------- /vitest.config.ts: -------------------------------------------------------------------------------- ```typescript import { defineConfig } from "vitest/config"; export default defineConfig({ test: { // ... }, }); ``` -------------------------------------------------------------------------------- /tsconfig.build.json: -------------------------------------------------------------------------------- ```json { "extends": "./tsconfig.json", "compilerOptions": { "outDir": "./dist" }, "include": ["src/**/*"], "exclude": ["**/*.test.ts", "src/ui"] } ``` -------------------------------------------------------------------------------- /tailwind.config.js: -------------------------------------------------------------------------------- ```javascript /** @type {import('tailwindcss').Config} */ module.exports = { content: [ "./src/ui/**/*.{js,jsx,ts,tsx}", ], theme: { extend: {}, }, plugins: [], } ``` -------------------------------------------------------------------------------- /src/components/types.ts: -------------------------------------------------------------------------------- ```typescript export enum AvailableComponentsEnum { PortraitVideo = "ShortVideo", LandscapeVideo = "LandscapeVideo", } export type OrientationConfig = { width: number; height: number; component: AvailableComponentsEnum; }; ``` -------------------------------------------------------------------------------- /docker-compose.yml: -------------------------------------------------------------------------------- ```yaml version: "3" services: short-creator: build: context: . 
dockerfile: main.Dockerfile env_file: - .env environment: - DEV=false ports: - "3123:3123" entrypoint: ["node", "dist/index.js"] ``` -------------------------------------------------------------------------------- /src/ui/index.tsx: -------------------------------------------------------------------------------- ```typescript import React from 'react'; import ReactDOM from 'react-dom/client'; import App from './App'; import './styles/index.css'; const root = ReactDOM.createRoot( document.getElementById('root') as HTMLElement ); root.render( <React.StrictMode> <App /> </React.StrictMode> ); ``` -------------------------------------------------------------------------------- /src/components/videos/Test.tsx: -------------------------------------------------------------------------------- ```typescript import { AbsoluteFill, Sequence } from "remotion"; export const TestVideo: React.FC = () => { return ( <AbsoluteFill> <AbsoluteFill> <AbsoluteFill> <h1>Hello</h1> </AbsoluteFill> <Sequence from={10}> <h1 style={{ marginTop: "60px" }}>World</h1> </Sequence> </AbsoluteFill> </AbsoluteFill> ); }; ``` -------------------------------------------------------------------------------- /tsconfig.json: -------------------------------------------------------------------------------- ```json { "compilerOptions": { "target": "ES2022", "module": "NodeNext", "moduleResolution": "NodeNext", "esModuleInterop": true, "strict": true, "skipLibCheck": true, "forceConsistentCasingInFileNames": true, "outDir": "dist", "rootDir": "src", "declaration": true, "jsx": "react-jsx" }, "exclude": [ "remotion.config.ts", "node_modules", "dist", "vitest.config.ts", "src/ui" ] } ``` -------------------------------------------------------------------------------- /src/ui/public/index.html: -------------------------------------------------------------------------------- ```html <!DOCTYPE html> <html lang="en"> <head> <meta charset="utf-8" /> <meta name="viewport" content="width=device-width, initial-scale=1" /> <meta name="theme-color" content="#000000" /> <meta name="description" content="Short Video Maker - Create amazing short videos" /> <title>Short Video Maker</title> </head> <body> <noscript>You need to enable JavaScript to run this app.</noscript> <div id="root"></div> </body> </html> ``` -------------------------------------------------------------------------------- /remotion.config.ts: -------------------------------------------------------------------------------- ```typescript // See all configuration options: https://remotion.dev/docs/config // Each option also is available as a CLI flag: https://remotion.dev/docs/cli // Note: When using the Node.JS APIs, the config file doesn't apply. 
Instead, pass options directly to the APIs import { Config } from "@remotion/cli/config"; Config.setVideoImageFormat("jpeg"); Config.setOverwriteOutput(true); Config.setPublicDir("static/music"); Config.setEntryPoint("src/components/root/index.ts"); ``` -------------------------------------------------------------------------------- /src/ui/index.html: -------------------------------------------------------------------------------- ```html <!DOCTYPE html> <html lang="en"> <head> <meta charset="utf-8" /> <meta name="viewport" content="width=device-width, initial-scale=1" /> <meta name="theme-color" content="#000000" /> <meta name="description" content="Short Video Maker - Create amazing short videos" /> <title>Short Video Maker</title> </head> <body> <noscript>You need to enable JavaScript to run this app.</noscript> <div id="root"></div> <script type="module" src="./index.tsx"></script> </body> </html> ``` -------------------------------------------------------------------------------- /src/ui/App.tsx: -------------------------------------------------------------------------------- ```typescript import React from 'react'; import { BrowserRouter as Router, Routes, Route } from 'react-router-dom'; import VideoList from './pages/VideoList'; import VideoCreator from './pages/VideoCreator'; import VideoDetails from './pages/VideoDetails'; import Layout from './components/Layout'; const App: React.FC = () => { return ( <Router> <Layout> <Routes> <Route path="/" element={<VideoList />} /> <Route path="/create" element={<VideoCreator />} /> <Route path="/video/:videoId" element={<VideoDetails />} /> </Routes> </Layout> </Router> ); }; export default App; ``` -------------------------------------------------------------------------------- /vite.config.ts: -------------------------------------------------------------------------------- ```typescript import { defineConfig } from 'vite'; import react from '@vitejs/plugin-react'; import path from 'path'; export default defineConfig({ plugins: [react()], root: 'src/ui', build: { outDir: path.resolve(__dirname, 'dist/ui'), emptyOutDir: true, rollupOptions: { input: { main: path.resolve(__dirname, 'src/ui/index.html'), }, }, }, resolve: { alias: { '@': path.resolve(__dirname, './src/ui'), }, }, server: { port: 3000, proxy: { '/api': { target: 'http://localhost:3123', changeOrigin: true, }, '/mcp': { target: 'http://localhost:3123', changeOrigin: true, }, }, }, }); ``` -------------------------------------------------------------------------------- /src/ui/styles/index.css: -------------------------------------------------------------------------------- ```css @tailwind base; @tailwind components; @tailwind utilities; /* Base styles */ body { margin: 0; padding: 0; font-family: 'Roboto', 'Helvetica', 'Arial', sans-serif; -webkit-font-smoothing: antialiased; -moz-osx-font-smoothing: grayscale; } a { color: inherit; text-decoration: none; } /* Scrollbar styles */ ::-webkit-scrollbar { width: 8px; height: 8px; } ::-webkit-scrollbar-track { background: #f1f1f1; } ::-webkit-scrollbar-thumb { background: #888; border-radius: 4px; } ::-webkit-scrollbar-thumb:hover { background: #555; } /* Focus styles for accessibility */ :focus-visible { outline: 2px solid #1976d2; } /* Transitions */ .fade-enter { opacity: 0; } .fade-enter-active { opacity: 1; transition: opacity 300ms; } .fade-exit { opacity: 1; } .fade-exit-active { opacity: 0; transition: opacity 300ms; } ``` -------------------------------------------------------------------------------- 
/src/scripts/install.ts: -------------------------------------------------------------------------------- ```typescript import { ensureBrowser } from "@remotion/renderer"; import { logger } from "../logger"; import { Kokoro } from "../short-creator/libraries/Kokoro"; import { MusicManager } from "../short-creator/music"; import { Config } from "../config"; import { Whisper } from "../short-creator/libraries/Whisper"; // runs in docker export async function install() { const config = new Config(); logger.info("Installing dependencies..."); logger.info("Installing Kokoro..."); await Kokoro.init(config.kokoroModelPrecision); logger.info("Installing browser shell..."); await ensureBrowser(); logger.info("Installing whisper.cpp"); await Whisper.init(config); logger.info("Installing dependencies complete"); logger.info("Ensuring the music files exist..."); const musicManager = new MusicManager(config); try { musicManager.ensureMusicFilesExist(); } catch (error: unknown) { logger.error(error, "Missing music files"); process.exit(1); } } install() .then(() => { logger.info("Installation complete"); }) .catch((error: unknown) => { logger.error(error, "Installation failed"); }); ``` -------------------------------------------------------------------------------- /src/server/validator.ts: -------------------------------------------------------------------------------- ```typescript import { createShortInput, CreateShortInput } from "../types/shorts"; import { logger } from "../logger"; import { ZodError } from "zod"; export interface ValidationErrorResult { message: string; missingFields: Record<string, string>; } export function validateCreateShortInput(input: object): CreateShortInput { const validated = createShortInput.safeParse(input); logger.info({ validated }, "Validated input"); if (validated.success) { return validated.data; } // Process the validation errors const errorResult = formatZodError(validated.error); throw new Error( JSON.stringify({ message: errorResult.message, missingFields: errorResult.missingFields, }), ); } function formatZodError(error: ZodError): ValidationErrorResult { const missingFields: Record<string, string> = {}; // Extract all the errors into a human-readable format error.errors.forEach((err) => { const path = err.path.join("."); missingFields[path] = err.message; }); // Create a human-readable message const errorPaths = Object.keys(missingFields); let message = `Validation failed for ${errorPaths.length} field(s): `; message += errorPaths.join(", "); return { message, missingFields, }; } ``` -------------------------------------------------------------------------------- /src/short-creator/libraries/Pexels.test.ts: -------------------------------------------------------------------------------- ```typescript process.env.LOG_LEVEL = "debug"; import nock from "nock"; import { PexelsAPI } from "./Pexels"; import { test, assert, expect } from "vitest"; import fs from "fs-extra"; import path from "path"; import { OrientationEnum } from "../../types/shorts"; test("test pexels", async () => { const mockResponse = fs.readFileSync( path.resolve("__mocks__/pexels-response.json"), "utf-8", ); nock("https://api.pexels.com") .get(/videos\/search/) .reply(200, mockResponse); const pexels = new PexelsAPI("asdf"); const video = await pexels.findVideo(["dog"], 2.4, []); console.log(video); assert.isObject(video, "Video should be an object"); }); test("should time out", async () => { nock("https://api.pexels.com") .get(/videos\/search/) .delay(1000) .times(30) .reply(200, {}); 
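// Editor's note (not in the original source): the trailing 100 passed to findVideo below presumably caps the request time in ms; with every mocked response delayed by 1s, the call is expected to reject with a TimeoutError.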
expect(async () => { const pexels = new PexelsAPI("asdf"); await pexels.findVideo(["dog"], 2.4, [], OrientationEnum.portrait, 100); }).rejects.toThrow( expect.objectContaining({ name: "TimeoutError", }), ); }); test("should retry 3 times", async () => { nock("https://api.pexels.com") .get(/videos\/search/) .delay(1000) .times(2) .reply(200, {}); const mockResponse = fs.readFileSync( path.resolve("__mocks__/pexels-response.json"), "utf-8", ); nock("https://api.pexels.com") .get(/videos\/search/) .reply(200, mockResponse); const pexels = new PexelsAPI("asdf"); const video = await pexels.findVideo(["dog"], 2.4, []); console.log(video); assert.isObject(video, "Video should be an object"); }); ``` -------------------------------------------------------------------------------- /src/scripts/normalizeMusic.ts: -------------------------------------------------------------------------------- ```typescript import ffmpeg from "fluent-ffmpeg"; import path from "path"; import("@ffmpeg-installer/ffmpeg"); import fs from "fs-extra"; import { logger } from "../logger"; import { MusicManager } from "../short-creator/music"; import { Config } from "../config"; async function normalize(inputPath: string, outputPath: string) { return new Promise((resolve, reject) => { ffmpeg() .input(inputPath) .audioCodec("libmp3lame") .audioBitrate(96) .audioChannels(2) .audioFrequency(44100) .audioFilter("loudnorm,volume=0.1") .toFormat("mp3") .on("error", (err) => { logger.error(err, "Error normalizing audio:"); reject(err); }) .save(outputPath) .on("end", () => { logger.debug("Audio normalization complete"); resolve(outputPath); }); }); } export async function normalizeMusic() { const config = new Config(); const musicManager = new MusicManager(config); try { musicManager.ensureMusicFilesExist(); } catch (error: unknown) { logger.error(error, "Missing music files"); process.exit(1); } const musicFiles = musicManager.musicList(); const normalizedDir = path.join(config.musicDirPath, "normalized"); fs.ensureDirSync(normalizedDir); for (const musicFile of musicFiles) { const inputPath = path.join(config.musicDirPath, musicFile.file); const outputPath = path.join(normalizedDir, musicFile.file); logger.debug({ inputPath, outputPath }, "Normalizing music file"); await normalize(inputPath, outputPath); } } normalizeMusic() .then(() => { logger.info( "Music normalization completed successfully - make sure to replace the original files with the normalized ones", ); }) .catch((error: unknown) => { logger.error(error, "Error normalizing music files"); }); ``` -------------------------------------------------------------------------------- /src/server/server.ts: -------------------------------------------------------------------------------- ```typescript import http from "http"; import express from "express"; import type { Request as ExpressRequest, Response as ExpressResponse, } from "express"; import path from "path"; import { ShortCreator } from "../short-creator/ShortCreator"; import { APIRouter } from "./routers/rest"; import { MCPRouter } from "./routers/mcp"; import { logger } from "../logger"; import { Config } from "../config"; export class Server { private app: express.Application; private config: Config; constructor(config: Config, shortCreator: ShortCreator) { this.config = config; this.app = express(); // add healthcheck endpoint this.app.get("/health", (req: ExpressRequest, res: ExpressResponse) => { res.status(200).json({ status: "ok" }); }); const apiRouter = new APIRouter(config, shortCreator); const mcpRouter = new 
MCPRouter(shortCreator); this.app.use("/api", apiRouter.router); this.app.use("/mcp", mcpRouter.router); // Serve static files from the UI build this.app.use(express.static(path.join(__dirname, "../../dist/ui"))); this.app.use( "/static", express.static(path.join(__dirname, "../../static")), ); // Serve the React app for all other routes (must be last) this.app.get("*", (req: ExpressRequest, res: ExpressResponse) => { res.sendFile(path.join(__dirname, "../../dist/ui/index.html")); }); } public start(): http.Server { const server = this.app.listen(this.config.port, () => { logger.info( { port: this.config.port, mcp: "/mcp", api: "/api" }, "MCP and API server is running", ); logger.info( `UI server is running on http://localhost:${this.config.port}`, ); }); server.on("error", (error: Error) => { logger.error(error, "Error starting server"); }); return server; } public getApp() { return this.app; } } ``` -------------------------------------------------------------------------------- /src/ui/components/Layout.tsx: -------------------------------------------------------------------------------- ```typescript import React from 'react'; import { useNavigate } from 'react-router-dom'; import { AppBar, Box, Container, CssBaseline, Toolbar, Typography, Button, ThemeProvider, createTheme } from '@mui/material'; import VideoIcon from '@mui/icons-material/VideoLibrary'; import AddIcon from '@mui/icons-material/Add'; interface LayoutProps { children: React.ReactNode; } const theme = createTheme({ palette: { mode: 'light', primary: { main: '#1976d2', }, secondary: { main: '#f50057', }, }, typography: { fontFamily: '"Roboto", "Helvetica", "Arial", sans-serif', }, }); const Layout: React.FC<LayoutProps> = ({ children }) => { const navigate = useNavigate(); return ( <ThemeProvider theme={theme}> <CssBaseline /> <Box sx={{ display: 'flex', flexDirection: 'column', minHeight: '100vh' }}> <AppBar position="static"> <Toolbar> <VideoIcon sx={{ mr: 2 }} /> <Typography variant="h6" component="div" sx={{ flexGrow: 1, cursor: 'pointer' }} onClick={() => navigate('/')} > Short Video Maker </Typography> <Button color="inherit" startIcon={<AddIcon />} onClick={() => navigate('/create')} > Create Video </Button> </Toolbar> </AppBar> <Container component="main" sx={{ flexGrow: 1, py: 4 }}> {children} </Container> <Box component="footer" sx={{ py: 3, mt: 'auto', backgroundColor: (theme) => theme.palette.grey[200], textAlign: 'center' }} > <Typography variant="body2" color="text.secondary"> Short Video Maker © {new Date().getFullYear()} </Typography> </Box> </Box> </ThemeProvider> ); }; export default Layout; ``` -------------------------------------------------------------------------------- /src/short-creator/libraries/Kokoro.ts: -------------------------------------------------------------------------------- ```typescript import { KokoroTTS, TextSplitterStream } from "kokoro-js"; import { VoiceEnum, type kokoroModelPrecision, type Voices, } from "../../types/shorts"; import { KOKORO_MODEL, logger } from "../../config"; export class Kokoro { constructor(private tts: KokoroTTS) {} async generate( text: string, voice: Voices, ): Promise<{ audio: ArrayBuffer; audioLength: number; }> { const splitter = new TextSplitterStream(); const stream = this.tts.stream(splitter, { voice, }); splitter.push(text); splitter.close(); const output = []; for await (const audio of stream) { output.push(audio); } const audioBuffers: ArrayBuffer[] = []; let audioLength = 0; for (const audio of output) { audioBuffers.push(audio.audio.toWav()); 
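// track the total narration duration in seconds (samples / sample rate) alongside the WAV chunks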
audioLength += audio.audio.audio.length / audio.audio.sampling_rate; } const mergedAudioBuffer = Kokoro.concatWavBuffers(audioBuffers); logger.debug({ text, voice, audioLength }, "Audio generated with Kokoro"); return { audio: mergedAudioBuffer, audioLength: audioLength, }; } static concatWavBuffers(buffers: ArrayBuffer[]): ArrayBuffer { const header = Buffer.from(buffers[0].slice(0, 44)); let totalDataLength = 0; const dataParts = buffers.map((buf) => { const b = Buffer.from(buf); const data = b.slice(44); totalDataLength += data.length; return data; }); header.writeUInt32LE(36 + totalDataLength, 4); header.writeUInt32LE(totalDataLength, 40); return Buffer.concat([header, ...dataParts]); } static async init(dtype: kokoroModelPrecision): Promise<Kokoro> { const tts = await KokoroTTS.from_pretrained(KOKORO_MODEL, { dtype, device: "cpu", // only "cpu" is supported in node }); return new Kokoro(tts); } listAvailableVoices(): Voices[] { const voices = Object.values(VoiceEnum) as Voices[]; return voices; } } ``` -------------------------------------------------------------------------------- /main.Dockerfile: -------------------------------------------------------------------------------- ```dockerfile FROM ubuntu:22.04 AS install-whisper ENV DEBIAN_FRONTEND=noninteractive RUN apt update # whisper install dependencies RUN apt install -y \ git \ build-essential \ wget \ cmake \ && apt-get clean \ && rm -rf /var/lib/apt/lists/* WORKDIR /whisper RUN git clone https://github.com/ggml-org/whisper.cpp.git . RUN git checkout v1.7.1 RUN make WORKDIR /whisper/models RUN sh ./download-ggml-model.sh base.en FROM node:22-bookworm-slim AS base ENV DEBIAN_FRONTEND=noninteractive WORKDIR /app RUN apt update RUN apt install -y \ # whisper dependencies git \ wget \ cmake \ ffmpeg \ curl \ make \ libsdl2-dev \ # remotion dependencies libnss3 \ libdbus-1-3 \ libatk1.0-0 \ libgbm-dev \ libasound2 \ libxrandr2 \ libxkbcommon-dev \ libxfixes3 \ libxcomposite1 \ libxdamage1 \ libatk-bridge2.0-0 \ libpango-1.0-0 \ libcairo2 \ libcups2 \ && apt-get clean \ && rm -rf /var/lib/apt/lists/* # setup pnpm ENV PNPM_HOME="/pnpm" ENV PATH="$PNPM_HOME:$PATH" ENV COREPACK_ENABLE_DOWNLOAD_PROMPT=0 RUN corepack enable FROM base AS prod-deps COPY package.json pnpm-lock.yaml* /app/ RUN --mount=type=cache,id=pnpm,target=/pnpm/store pnpm install --prod --frozen-lockfile RUN pnpm install --prefer-offline --no-cache --prod FROM prod-deps AS build COPY tsconfig.json /app COPY tsconfig.build.json /app COPY vite.config.ts /app COPY src /app/src RUN --mount=type=cache,id=pnpm,target=/pnpm/store pnpm install --frozen-lockfile RUN pnpm build FROM base COPY static /app/static COPY --from=install-whisper /whisper /app/data/libs/whisper COPY --from=prod-deps /app/node_modules /app/node_modules COPY --from=build /app/dist /app/dist COPY package.json /app/ # app configuration via environment variables ENV DATA_DIR_PATH=/app/data ENV DOCKER=true ENV WHISPER_MODEL=base.en # number of chrome tabs to use for rendering ENV CONCURRENCY=1 # video cache - 2000MB ENV VIDEO_CACHE_SIZE_IN_BYTES=2097152000 # install kokoro, headless chrome and ensure music files are present RUN node dist/scripts/install.js CMD ["pnpm", "start"] ``` -------------------------------------------------------------------------------- /main-tiny.Dockerfile: -------------------------------------------------------------------------------- ```dockerfile FROM ubuntu:22.04 AS install-whisper ENV DEBIAN_FRONTEND=noninteractive RUN apt update # whisper install dependencies RUN apt install 
-y \ git \ build-essential \ wget \ cmake \ && apt-get clean \ && rm -rf /var/lib/apt/lists/* WORKDIR /whisper RUN git clone https://github.com/ggml-org/whisper.cpp.git . RUN git checkout v1.7.1 RUN make WORKDIR /whisper/models RUN sh ./download-ggml-model.sh tiny.en FROM node:22-bookworm-slim AS base ENV DEBIAN_FRONTEND=noninteractive WORKDIR /app RUN apt update RUN apt install -y \ # whisper dependencies git \ wget \ cmake \ ffmpeg \ curl \ make \ libsdl2-dev \ # remotion dependencies libnss3 \ libdbus-1-3 \ libatk1.0-0 \ libgbm-dev \ libasound2 \ libxrandr2 \ libxkbcommon-dev \ libxfixes3 \ libxcomposite1 \ libxdamage1 \ libatk-bridge2.0-0 \ libpango-1.0-0 \ libcairo2 \ libcups2 \ && apt-get clean \ && rm -rf /var/lib/apt/lists/* # setup pnpm ENV PNPM_HOME="/pnpm" ENV PATH="$PNPM_HOME:$PATH" ENV COREPACK_ENABLE_DOWNLOAD_PROMPT=0 RUN corepack enable FROM base AS prod-deps COPY package.json pnpm-lock.yaml* /app/ RUN --mount=type=cache,id=pnpm,target=/pnpm/store pnpm install --prod --frozen-lockfile RUN pnpm install --prefer-offline --no-cache --prod FROM prod-deps AS build COPY tsconfig.json /app COPY tsconfig.build.json /app COPY vite.config.ts /app COPY src /app/src RUN --mount=type=cache,id=pnpm,target=/pnpm/store pnpm install --frozen-lockfile RUN pnpm build FROM base COPY static /app/static COPY --from=install-whisper /whisper /app/data/libs/whisper COPY --from=prod-deps /app/node_modules /app/node_modules COPY --from=build /app/dist /app/dist COPY package.json /app/ # app configuration via environment variables ENV DATA_DIR_PATH=/app/data ENV DOCKER=true ENV WHISPER_MODEL=tiny.en ENV KOKORO_MODEL_PRECISION=q4 # number of chrome tabs to use for rendering ENV CONCURRENCY=1 # video cache - 2000MB ENV VIDEO_CACHE_SIZE_IN_BYTES=2097152000 # install kokoro, headless chrome and ensure music files are present RUN node dist/scripts/install.js CMD ["pnpm", "start"] ``` -------------------------------------------------------------------------------- /src/short-creator/libraries/FFmpeg.ts: -------------------------------------------------------------------------------- ```typescript import ffmpeg from "fluent-ffmpeg"; import { Readable } from "node:stream"; import { logger } from "../../logger"; export class FFMpeg { static async init(): Promise<FFMpeg> { return import("@ffmpeg-installer/ffmpeg").then((ffmpegInstaller) => { ffmpeg.setFfmpegPath(ffmpegInstaller.path); logger.info("FFmpeg path set to:", ffmpegInstaller.path); return new FFMpeg(); }); } async saveNormalizedAudio( audio: ArrayBuffer, outputPath: string, ): Promise<string> { logger.debug("Normalizing audio for Whisper"); const inputStream = new Readable(); inputStream.push(Buffer.from(audio)); inputStream.push(null); return new Promise((resolve, reject) => { ffmpeg() .input(inputStream) .audioCodec("pcm_s16le") .audioChannels(1) .audioFrequency(16000) .toFormat("wav") .on("end", () => { logger.debug("Audio normalization complete"); resolve(outputPath); }) .on("error", (error: unknown) => { logger.error(error, "Error normalizing audio:"); reject(error); }) .save(outputPath); }); } async createMp3DataUri(audio: ArrayBuffer): Promise<string> { const inputStream = new Readable(); inputStream.push(Buffer.from(audio)); inputStream.push(null); return new Promise((resolve, reject) => { const chunk: Buffer[] = []; ffmpeg() .input(inputStream) .audioCodec("libmp3lame") .audioBitrate(128) .audioChannels(2) .toFormat("mp3") .on("error", (err) => { reject(err); }) .pipe() .on("data", (data: Buffer) => { chunk.push(data); }) .on("end", () => { 
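// all MP3 chunks have been received - join them and expose the audio as a base64 data URI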
const buffer = Buffer.concat(chunk); resolve(`data:audio/mp3;base64,${buffer.toString("base64")}`); }) .on("error", (err) => { reject(err); }); }); } async saveToMp3(audio: ArrayBuffer, filePath: string): Promise<string> { const inputStream = new Readable(); inputStream.push(Buffer.from(audio)); inputStream.push(null); return new Promise((resolve, reject) => { ffmpeg() .input(inputStream) .audioCodec("libmp3lame") .audioBitrate(128) .audioChannels(2) .toFormat("mp3") .save(filePath) .on("end", () => { logger.debug("Audio conversion complete"); resolve(filePath); }) .on("error", (err) => { reject(err); }); }); } } ``` -------------------------------------------------------------------------------- /src/short-creator/libraries/Remotion.ts: -------------------------------------------------------------------------------- ```typescript import z from "zod"; import { bundle } from "@remotion/bundler"; import { renderMedia, selectComposition } from "@remotion/renderer"; import path from "path"; import { ensureBrowser } from "@remotion/renderer"; import { Config } from "../../config"; import { shortVideoSchema } from "../../components/utils"; import { logger } from "../../logger"; import { OrientationEnum } from "../../types/shorts"; import { getOrientationConfig } from "../../components/utils"; export class Remotion { constructor( private bundled: string, private config: Config, ) {} static async init(config: Config): Promise<Remotion> { await ensureBrowser(); const bundled = await bundle({ entryPoint: path.join( config.packageDirPath, config.devMode ? "src" : "dist", "components", "root", `index.${config.devMode ? "ts" : "js"}`, ), }); return new Remotion(bundled, config); } async render( data: z.infer<typeof shortVideoSchema>, id: string, orientation: OrientationEnum, ) { const { component } = getOrientationConfig(orientation); const composition = await selectComposition({ serveUrl: this.bundled, id: component, inputProps: data, }); logger.debug({ component, videoID: id }, "Rendering video with Remotion"); const outputLocation = path.join(this.config.videosDirPath, `${id}.mp4`); await renderMedia({ codec: "h264", composition, serveUrl: this.bundled, outputLocation, inputProps: data, onProgress: ({ progress }) => { logger.debug(`Rendering ${id} ${Math.floor(progress * 100)}% complete`); }, // preventing memory issues with docker concurrency: this.config.concurrency, offthreadVideoCacheSizeInBytes: this.config.videoCacheSizeInBytes, }); logger.debug( { outputLocation, component, videoID: id, }, "Video rendered with Remotion", ); } async testRender(outputLocation: string) { const composition = await selectComposition({ serveUrl: this.bundled, id: "TestVideo", }); await renderMedia({ codec: "h264", composition, serveUrl: this.bundled, outputLocation, onProgress: ({ progress }) => { logger.debug( `Rendering test video: ${Math.floor(progress * 100)}% complete`, ); }, // preventing memory issues with docker concurrency: this.config.concurrency, offthreadVideoCacheSizeInBytes: this.config.videoCacheSizeInBytes, }); } } ``` -------------------------------------------------------------------------------- /src/server/routers/mcp.ts: -------------------------------------------------------------------------------- ```typescript import express from "express"; import { McpServer } from "@modelcontextprotocol/sdk/server/mcp.js"; import { SSEServerTransport } from "@modelcontextprotocol/sdk/server/sse.js"; import z from "zod"; import { ShortCreator } from "../../short-creator/ShortCreator"; import { logger } 
from "../../logger"; import { renderConfig, sceneInput } from "../../types/shorts"; export class MCPRouter { router: express.Router; shortCreator: ShortCreator; transports: { [sessionId: string]: SSEServerTransport } = {}; mcpServer: McpServer; constructor(shortCreator: ShortCreator) { this.router = express.Router(); this.shortCreator = shortCreator; this.mcpServer = new McpServer({ name: "Short Creator", version: "0.0.1", capabilities: { resources: {}, tools: {}, }, }); this.setupMCPServer(); this.setupRoutes(); } private setupMCPServer() { this.mcpServer.tool( "get-video-status", "Get the status of a video (ready, processing, failed)", { videoId: z.string().describe("The ID of the video"), }, async ({ videoId }) => { const status = this.shortCreator.status(videoId); return { content: [ { type: "text", text: status, }, ], }; }, ); this.mcpServer.tool( "create-short-video", "Create a short video from a list of scenes", { scenes: z.array(sceneInput).describe("Each scene to be created"), config: renderConfig.describe("Configuration for rendering the video"), }, async ({ scenes, config }) => { const videoId = await this.shortCreator.addToQueue(scenes, config); return { content: [ { type: "text", text: videoId, }, ], }; }, ); } private setupRoutes() { this.router.get("/sse", async (req, res) => { logger.info("SSE GET request received"); const transport = new SSEServerTransport("/mcp/messages", res); this.transports[transport.sessionId] = transport; res.on("close", () => { delete this.transports[transport.sessionId]; }); await this.mcpServer.connect(transport); }); this.router.post("/messages", async (req, res) => { logger.info("SSE POST request received"); const sessionId = req.query.sessionId as string; const transport = this.transports[sessionId]; if (transport) { await transport.handlePostMessage(req, res); } else { res.status(400).send("No transport found for sessionId"); } }); } } ``` -------------------------------------------------------------------------------- /main-cuda.Dockerfile: -------------------------------------------------------------------------------- ```dockerfile ARG UBUNTU_VERSION=22.04 ARG CUDA_VERSION=12.3.1 ARG BASE_CUDA_DEV_CONTAINER=nvidia/cuda:${CUDA_VERSION}-devel-ubuntu${UBUNTU_VERSION} ARG BASE_CUDA_RUN_CONTAINER=nvidia/cuda:${CUDA_VERSION}-runtime-ubuntu${UBUNTU_VERSION} # Ref: https://github.com/ggml-org/whisper.cpp FROM ${BASE_CUDA_DEV_CONTAINER} AS install-whisper ENV DEBIAN_FRONTEND=noninteractive RUN apt-get update && \ apt-get install --fix-missing --no-install-recommends -y bash git make vim wget g++ ffmpeg curl WORKDIR /app/data/libs/whisper RUN git clone https://github.com/ggerganov/whisper.cpp.git -b v1.7.1 --depth 1 . 
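# build whisper.cpp from a clean tree with CUDA kernels enabled (GGML_CUDA=1), then fetch the medium.en model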
RUN make clean RUN GGML_CUDA=1 make -j RUN sh ./models/download-ggml-model.sh medium.en FROM ${BASE_CUDA_RUN_CONTAINER} AS base # install node RUN apt-get update && apt-get install -y \ curl \ ca-certificates \ gnupg \ lsb-release \ && rm -rf /var/lib/apt/lists/* RUN curl -fsSL https://deb.nodesource.com/setup_22.x | bash - \ && apt-get update && apt-get install -y nodejs \ && rm -rf /var/lib/apt/lists/* RUN node -v && npm -v # install dependencies ENV DEBIAN_FRONTEND=noninteractive WORKDIR /app RUN apt update RUN apt install -y \ # whisper dependencies git \ wget \ cmake \ ffmpeg \ curl \ build-essential \ make \ # remotion dependencies libnss3 \ libdbus-1-3 \ libatk1.0-0 \ libgbm-dev \ libasound2 \ libxrandr2 \ libxkbcommon-dev \ libxfixes3 \ libxcomposite1 \ libxdamage1 \ libatk-bridge2.0-0 \ libpango-1.0-0 \ libcairo2 \ libcups2 \ && apt-get clean \ && rm -rf /var/lib/apt/lists/* # setup pnpm ENV PNPM_HOME="/pnpm" ENV PATH="$PNPM_HOME:$PATH" ENV COREPACK_ENABLE_DOWNLOAD_PROMPT=0 RUN corepack enable FROM base AS prod-deps COPY package.json pnpm-lock.yaml* /app/ RUN --mount=type=cache,id=pnpm,target=/pnpm/store pnpm install --prod --frozen-lockfile RUN pnpm install --prefer-offline --no-cache --prod FROM prod-deps AS build COPY tsconfig.json /app COPY tsconfig.build.json /app COPY vite.config.ts /app COPY src /app/src RUN --mount=type=cache,id=pnpm,target=/pnpm/store pnpm install --frozen-lockfile RUN pnpm build FROM base COPY static /app/static COPY --from=install-whisper /app/data/libs/whisper /app/data/libs/whisper COPY --from=prod-deps /app/node_modules /app/node_modules COPY --from=build /app/dist /app/dist COPY package.json /app/ # app configuration via environment variables ENV DATA_DIR_PATH=/app/data ENV DOCKER=true # number of chrome tabs to use for rendering ENV CONCURRENCY=1 # video cache - 2000MB ENV VIDEO_CACHE_SIZE_IN_BYTES=2097152000 # install kokoro, headless chrome and ensure music files are present RUN node dist/scripts/install.js CMD ["pnpm", "start"] ``` -------------------------------------------------------------------------------- /src/short-creator/libraries/Whisper.ts: -------------------------------------------------------------------------------- ```typescript import { downloadWhisperModel, installWhisperCpp, transcribe, } from "@remotion/install-whisper-cpp"; import path from "path"; import { Config } from "../../config"; import type { Caption } from "../../types/shorts"; import { logger } from "../../logger"; export const ErrorWhisper = new Error("There was an error with WhisperCpp"); export class Whisper { constructor(private config: Config) {} static async init(config: Config): Promise<Whisper> { if (!config.runningInDocker) { logger.debug("Installing WhisperCpp"); await installWhisperCpp({ to: config.whisperInstallPath, version: config.whisperVersion, printOutput: true, }); logger.debug("WhisperCpp installed"); logger.debug("Downloading Whisper model"); await downloadWhisperModel({ model: config.whisperModel, folder: path.join(config.whisperInstallPath, "models"), printOutput: config.whisperVerbose, onProgress: (downloadedBytes, totalBytes) => { const progress = `${Math.round((downloadedBytes / totalBytes) * 100)}%`; logger.debug( { progress, model: config.whisperModel }, "Downloading Whisper model", ); }, }); // todo run the jfk command to check if everything is ok logger.debug("Whisper model downloaded"); } return new Whisper(config); } // todo shall we extract it to a Caption class? 
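  // Transcribes the audio file with whisper.cpp using token-level timestamps and flattens the result into Caption objects (startMs/endMs in milliseconds). Special "[_TT..." tokens are skipped, and a token that does not start with a space is merged into the previous caption (when that caption does not end with a space) instead of starting a new one.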
async CreateCaption(audioPath: string): Promise<Caption[]> { logger.debug({ audioPath }, "Starting to transcribe audio"); const { transcription } = await transcribe({ model: this.config.whisperModel, whisperPath: this.config.whisperInstallPath, modelFolder: path.join(this.config.whisperInstallPath, "models"), whisperCppVersion: this.config.whisperVersion, inputPath: audioPath, tokenLevelTimestamps: true, printOutput: this.config.whisperVerbose, onProgress: (progress) => { logger.debug({ audioPath }, `Transcribing is ${progress} complete`); }, }); logger.debug({ audioPath }, "Transcription finished, creating captions"); const captions: Caption[] = []; transcription.forEach((record) => { if (record.text === "") { return; } record.tokens.forEach((token) => { if (token.text.startsWith("[_TT")) { return; } // if token starts without space and the previous node didn't have space either, merge them if ( captions.length > 0 && !token.text.startsWith(" ") && !captions[captions.length - 1].text.endsWith(" ") ) { captions[captions.length - 1].text += record.text; captions[captions.length - 1].endMs = record.offsets.to; return; } captions.push({ text: token.text, startMs: record.offsets.from, endMs: record.offsets.to, }); }); }); logger.debug({ audioPath, captions }, "Captions created"); return captions; } } ``` -------------------------------------------------------------------------------- /src/index.ts: -------------------------------------------------------------------------------- ```typescript /* eslint-disable @typescript-eslint/no-unused-vars */ import path from "path"; import fs from "fs-extra"; import { Kokoro } from "./short-creator/libraries/Kokoro"; import { Remotion } from "./short-creator/libraries/Remotion"; import { Whisper } from "./short-creator/libraries/Whisper"; import { FFMpeg } from "./short-creator/libraries/FFmpeg"; import { PexelsAPI } from "./short-creator/libraries/Pexels"; import { Config } from "./config"; import { ShortCreator } from "./short-creator/ShortCreator"; import { logger } from "./logger"; import { Server } from "./server/server"; import { MusicManager } from "./short-creator/music"; async function main() { const config = new Config(); try { config.ensureConfig(); } catch (err: unknown) { logger.error(err, "Error in config"); process.exit(1); } const musicManager = new MusicManager(config); try { logger.debug("checking music files"); musicManager.ensureMusicFilesExist(); } catch (error: unknown) { logger.error(error, "Missing music files"); process.exit(1); } logger.debug("initializing remotion"); const remotion = await Remotion.init(config); logger.debug("initializing kokoro"); const kokoro = await Kokoro.init(config.kokoroModelPrecision); logger.debug("initializing whisper"); const whisper = await Whisper.init(config); logger.debug("initializing ffmpeg"); const ffmpeg = await FFMpeg.init(); const pexelsApi = new PexelsAPI(config.pexelsApiKey); logger.debug("initializing the short creator"); const shortCreator = new ShortCreator( config, remotion, kokoro, whisper, ffmpeg, pexelsApi, musicManager, ); if (!config.runningInDocker) { // the project is running with npm - we need to check if the installation is correct if (fs.existsSync(config.installationSuccessfulPath)) { logger.info("the installation is successful - starting the server"); } else { logger.info( "testing if the installation was successful - this may take a while...", ); try { const audioBuffer = (await kokoro.generate("hi", "af_heart")).audio; await ffmpeg.createMp3DataUri(audioBuffer); await 
pexelsApi.findVideo(["dog"], 2.4); const testVideoPath = path.join(config.tempDirPath, "test.mp4"); await remotion.testRender(testVideoPath); fs.rmSync(testVideoPath, { force: true }); fs.writeFileSync(config.installationSuccessfulPath, "ok", { encoding: "utf-8", }); logger.info("the installation was successful - starting the server"); } catch (error: unknown) { logger.fatal( error, "The environment is not set up correctly - please follow the instructions in the README.md file https://github.com/gyoridavid/short-video-maker", ); process.exit(1); } } } logger.debug("initializing the server"); const server = new Server(config, shortCreator); const app = server.start(); // todo add shutdown handler } main().catch((error: unknown) => { logger.error(error, "Error starting server"); }); ``` -------------------------------------------------------------------------------- /package.json: -------------------------------------------------------------------------------- ```json { "name": "short-video-maker", "version": "1.3.4", "description": "Creates short videos for TikTok, Instagram Reels, and YouTube Shorts using the Model Context Protocol (MCP) and a REST API.", "main": "index.js", "bugs": "https://github.com/gyoridavid/short-video-maker/issues", "homepage": "https://github.com/gyoridavid/short-video-maker", "scripts": { "build": "rimraf dist && tsc --project tsconfig.build.json && vite build", "dev": "vite build --watch | node --watch -r ts-node/register src/index.ts ", "start": "node dist/index.js", "test": "vitest", "prepublishOnly": "npm run build && echo \"#!/usr/bin/env node\n$(cat dist/index.js)\" > dist/index.js && chmod +x dist/index.js", "publish:docker": "npm run publish:docker:normal && npm run publish:docker:cuda && npm run publish:docker:tiny", "publish:docker:cuda": "docker buildx build --platform linux/amd64 -t gyoridavid/short-video-maker:latest-cuda -t gyoridavid/short-video-maker:${npm_package_version}-cuda -f main-cuda.Dockerfile --push ./", "publish:docker:normal": "docker buildx build --platform linux/amd64,linux/arm64 -t gyoridavid/short-video-maker:latest -t gyoridavid/short-video-maker:${npm_package_version} -f main.Dockerfile --push ./", "publish:docker:tiny": "docker buildx build --platform linux/amd64,linux/arm64 -t gyoridavid/short-video-maker:latest-tiny -t gyoridavid/short-video-maker:${npm_package_version}-tiny -f main-tiny.Dockerfile --push ./", "ui:dev": "vite", "ui:build": "vite build", "ui:preview": "vite preview" }, "bin": { "short-video-maker": "dist/index.js" }, "files": [ "dist", "static" ], "keywords": [ "shorts", "mcp", "model context protocol", "reels", "tiktok", "youtube shorts", "youtube", "short video", "video creation", "instagram", "video", "generator", "remotion", "faceless video" ], "author": "David Gyori", "license": "MIT", "dependencies": { "@emotion/react": "^11.11.3", "@emotion/styled": "^11.11.0", "@ffmpeg-installer/ffmpeg": "^1.1.0", "@modelcontextprotocol/sdk": "^1.9.0", "@mui/icons-material": "^5.15.10", "@mui/material": "^5.15.10", "@remotion/bundler": "^4.0.286", "@remotion/cli": "^4.0.286", "@remotion/google-fonts": "^4.0.286", "@remotion/install-whisper-cpp": "^4.0.286", "@remotion/renderer": "^4.0.286", "@remotion/zod-types": "^4.0.286", "@tanstack/react-query": "^5.18.0", "@types/react-dom": "^19.1.3", "@types/react-router-dom": "^5.3.3", "axios": "^1.9.0", "content-type": "^1.0.5", "cuid": "^3.0.0", "dotenv": "^16.4.7", "express": "^4.18.2", "fluent-ffmpeg": "^2.1.3", "fs-extra": "^11.3.0", "kokoro-js": "^1.2.0", "nock": "^14.0.3", 
"pino": "^9.6.0", "react": "^19.1.0", "react-dom": "^19.1.0", "react-router-dom": "^7.5.3", "remotion": "^4.0.286", "zod": "^3.24.2", "zod-to-json-schema": "^3.24.5" }, "devDependencies": { "@remotion/eslint-config-flat": "^4.0.286", "@types/content-type": "^1.1.8", "@types/express": "^4.17.21", "@types/fluent-ffmpeg": "^2.1.27", "@types/fs-extra": "^11.0.4", "@types/nock": "^11.1.0", "@types/node": "^22.14.0", "@types/react": "^19.1.0", "@vitejs/plugin-react": "^4.4.1", "autoprefixer": "^10.4.16", "eslint": "^9.24.0", "postcss": "^8.4.31", "prettier": "^3.5.3", "rimraf": "^6.0.1", "tailwindcss": "^3.3.0", "ts-node": "^10.9.2", "typescript": "^5.8.3", "vite": "^6.3.4", "vitest": "^3.1.1" } } ``` -------------------------------------------------------------------------------- /src/config.ts: -------------------------------------------------------------------------------- ```typescript import path from "path"; import "dotenv/config"; import os from "os"; import fs from "fs-extra"; import pino from "pino"; import { kokoroModelPrecision, whisperModels } from "./types/shorts"; const defaultLogLevel: pino.Level = "info"; const defaultPort = 3123; const whisperVersion = "1.7.1"; const defaultWhisperModel: whisperModels = "medium.en"; // possible options: "tiny", "tiny.en", "base", "base.en", "small", "small.en", "medium", "medium.en", "large-v1", "large-v2", "large-v3", "large-v3-turbo" // Create the global logger const versionNumber = process.env.npm_package_version; export const logger = pino({ level: process.env.LOG_LEVEL || defaultLogLevel, timestamp: pino.stdTimeFunctions.isoTime, formatters: { level: (label) => { return { level: label }; }, }, base: { pid: process.pid, version: versionNumber, }, }); export class Config { private dataDirPath: string; private libsDirPath: string; private staticDirPath: string; public installationSuccessfulPath: string; public whisperInstallPath: string; public videosDirPath: string; public tempDirPath: string; public packageDirPath: string; public musicDirPath: string; public pexelsApiKey: string; public logLevel: pino.Level; public whisperVerbose: boolean; public port: number; public runningInDocker: boolean; public devMode: boolean; public whisperVersion: string = whisperVersion; public whisperModel: whisperModels = defaultWhisperModel; public kokoroModelPrecision: kokoroModelPrecision = "fp32"; // docker-specific, performance-related settings to prevent memory issues public concurrency?: number; public videoCacheSizeInBytes: number | null = null; constructor() { this.dataDirPath = process.env.DATA_DIR_PATH || path.join(os.homedir(), ".ai-agents-az-video-generator"); this.libsDirPath = path.join(this.dataDirPath, "libs"); this.whisperInstallPath = path.join(this.libsDirPath, "whisper"); this.videosDirPath = path.join(this.dataDirPath, "videos"); this.tempDirPath = path.join(this.dataDirPath, "temp"); this.installationSuccessfulPath = path.join( this.dataDirPath, "installation-successful", ); fs.ensureDirSync(this.dataDirPath); fs.ensureDirSync(this.libsDirPath); fs.ensureDirSync(this.videosDirPath); fs.ensureDirSync(this.tempDirPath); this.packageDirPath = path.join(__dirname, ".."); this.staticDirPath = path.join(this.packageDirPath, "static"); this.musicDirPath = path.join(this.staticDirPath, "music"); this.pexelsApiKey = process.env.PEXELS_API_KEY as string; this.logLevel = (process.env.LOG_LEVEL || defaultLogLevel) as pino.Level; this.whisperVerbose = process.env.WHISPER_VERBOSE === "true"; this.port = process.env.PORT ? 
parseInt(process.env.PORT) : defaultPort; this.runningInDocker = process.env.DOCKER === "true"; this.devMode = process.env.DEV === "true"; if (process.env.WHISPER_MODEL) { this.whisperModel = process.env.WHISPER_MODEL as whisperModels; } if (process.env.KOKORO_MODEL_PRECISION) { this.kokoroModelPrecision = process.env .KOKORO_MODEL_PRECISION as kokoroModelPrecision; } this.concurrency = process.env.CONCURRENCY ? parseInt(process.env.CONCURRENCY) : undefined; if (process.env.VIDEO_CACHE_SIZE_IN_BYTES) { this.videoCacheSizeInBytes = parseInt( process.env.VIDEO_CACHE_SIZE_IN_BYTES, ); } } public ensureConfig() { if (!this.pexelsApiKey) { throw new Error( "PEXELS_API_KEY environment variable is missing. Get your free API key: https://www.pexels.com/api/key/ - see how to run the project: https://github.com/gyoridavid/short-video-maker", ); } } } export const KOKORO_MODEL = "onnx-community/Kokoro-82M-v1.0-ONNX"; ``` -------------------------------------------------------------------------------- /src/components/utils.ts: -------------------------------------------------------------------------------- ```typescript import { z } from "zod"; import { type Caption, type CaptionPage, type CaptionLine, type OrientationEnum, MusicVolumeEnum, } from "../types/shorts"; import { AvailableComponentsEnum, type OrientationConfig } from "./types"; export const shortVideoSchema = z.object({ scenes: z.array( z.object({ captions: z.custom<Caption[]>(), audio: z.object({ url: z.string(), duration: z.number(), }), video: z.string(), }), ), config: z.object({ paddingBack: z.number().optional(), captionPosition: z.enum(["top", "center", "bottom"]).optional(), captionBackgroundColor: z.string().optional(), durationMs: z.number(), musicVolume: z.nativeEnum(MusicVolumeEnum).optional(), }), music: z.object({ file: z.string(), url: z.string(), start: z.number(), end: z.number(), }), }); export function createCaptionPages({ captions, lineMaxLength, lineCount, maxDistanceMs, }: { captions: Caption[]; lineMaxLength: number; lineCount: number; maxDistanceMs: number; }) { const pages = []; let currentPage: CaptionPage = { startMs: 0, endMs: 0, lines: [], }; let currentLine: CaptionLine = { texts: [], }; captions.forEach((caption, i) => { // Check if we need to start a new page due to time gap if (i > 0 && caption.startMs - currentPage.endMs > maxDistanceMs) { // Add current line if not empty if (currentLine.texts.length > 0) { currentPage.lines.push(currentLine); } // Add current page if not empty if (currentPage.lines.length > 0) { pages.push(currentPage); } // Start new page currentPage = { startMs: caption.startMs, endMs: caption.endMs, lines: [], }; currentLine = { texts: [], }; } // Check if adding this caption exceeds the line length const currentLineText = currentLine.texts.map((t) => t.text).join(" "); if ( currentLine.texts.length > 0 && currentLineText.length + 1 + caption.text.length > lineMaxLength ) { // Line is full, add it to current page currentPage.lines.push(currentLine); currentLine = { texts: [], }; // Check if page is full if (currentPage.lines.length >= lineCount) { // Page is full, add it to pages pages.push(currentPage); // Start new page currentPage = { startMs: caption.startMs, endMs: caption.endMs, lines: [], }; } } // Add caption to current line currentLine.texts.push({ text: caption.text, startMs: caption.startMs, endMs: caption.endMs, }); // Update page timing currentPage.endMs = caption.endMs; if (i === 0 || currentPage.startMs === 0) { currentPage.startMs = caption.startMs; } else { 
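      // keep the earliest caption start time seen for this page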
currentPage.startMs = Math.min(currentPage.startMs, caption.startMs); } }); // Don't forget to add the last line and page if (currentLine.texts.length > 0) { currentPage.lines.push(currentLine); } if (currentPage.lines.length > 0) { pages.push(currentPage); } return pages; } export function getOrientationConfig(orientation: OrientationEnum) { const config: Record<OrientationEnum, OrientationConfig> = { portrait: { width: 1080, height: 1920, component: AvailableComponentsEnum.PortraitVideo, }, landscape: { width: 1920, height: 1080, component: AvailableComponentsEnum.LandscapeVideo, }, }; return config[orientation]; } export function calculateVolume( level: MusicVolumeEnum = MusicVolumeEnum.high, ): [number, boolean] { switch (level) { case "muted": return [0, true]; case "low": return [0.2, false]; case "medium": return [0.45, false]; case "high": return [0.7, false]; default: return [0.7, false]; } } ``` -------------------------------------------------------------------------------- /src/types/shorts.ts: -------------------------------------------------------------------------------- ```typescript import z from "zod"; export enum MusicMoodEnum { sad = "sad", melancholic = "melancholic", happy = "happy", euphoric = "euphoric/high", excited = "excited", chill = "chill", uneasy = "uneasy", angry = "angry", dark = "dark", hopeful = "hopeful", contemplative = "contemplative", funny = "funny/quirky", } export enum CaptionPositionEnum { top = "top", center = "center", bottom = "bottom", } export type Scene = { captions: Caption[]; video: string; audio: { url: string; duration: number; }; }; export const sceneInput = z.object({ text: z.string().describe("Text to be spoken in the video"), searchTerms: z .array(z.string()) .describe( "Search term for video, 1 word, and at least 2-3 search terms should be provided for each scene. Make sure to match the overall context with the word - regardless what the video search result would be.", ), }); export type SceneInput = z.infer<typeof sceneInput>; export enum VoiceEnum { af_heart = "af_heart", af_alloy = "af_alloy", af_aoede = "af_aoede", af_bella = "af_bella", af_jessica = "af_jessica", af_kore = "af_kore", af_nicole = "af_nicole", af_nova = "af_nova", af_river = "af_river", af_sarah = "af_sarah", af_sky = "af_sky", am_adam = "am_adam", am_echo = "am_echo", am_eric = "am_eric", am_fenrir = "am_fenrir", am_liam = "am_liam", am_michael = "am_michael", am_onyx = "am_onyx", am_puck = "am_puck", am_santa = "am_santa", bf_emma = "bf_emma", bf_isabella = "bf_isabella", bm_george = "bm_george", bm_lewis = "bm_lewis", bf_alice = "bf_alice", bf_lily = "bf_lily", bm_daniel = "bm_daniel", bm_fable = "bm_fable", } export enum OrientationEnum { landscape = "landscape", portrait = "portrait", } export enum MusicVolumeEnum { muted = "muted", low = "low", medium = "medium", high = "high", } export const renderConfig = z.object({ paddingBack: z .number() .optional() .describe( "For how long the video should be playing after the speech is done, in milliseconds. 
1500 is a good value.", ), music: z .nativeEnum(MusicMoodEnum) .optional() .describe("Music tag to be used to find the right music for the video"), captionPosition: z .nativeEnum(CaptionPositionEnum) .optional() .describe("Position of the caption in the video"), captionBackgroundColor: z .string() .optional() .describe( "Background color of the caption, a valid css color, default is blue", ), voice: z .nativeEnum(VoiceEnum) .optional() .describe("Voice to be used for the speech, default is af_heart"), orientation: z .nativeEnum(OrientationEnum) .optional() .describe("Orientation of the video, default is portrait"), musicVolume: z .nativeEnum(MusicVolumeEnum) .optional() .describe("Volume of the music, default is high"), }); export type RenderConfig = z.infer<typeof renderConfig>; export type Voices = `${VoiceEnum}`; export type Video = { id: string; url: string; width: number; height: number; }; export type Caption = { text: string; startMs: number; endMs: number; }; export type CaptionLine = { texts: Caption[]; }; export type CaptionPage = { startMs: number; endMs: number; lines: CaptionLine[]; }; export const createShortInput = z.object({ scenes: z.array(sceneInput).describe("Each scene to be created"), config: renderConfig.describe("Configuration for rendering the video"), }); export type CreateShortInput = z.infer<typeof createShortInput>; export type VideoStatus = "processing" | "ready" | "failed"; export type Music = { file: string; start: number; end: number; mood: string; }; export type MusicForVideo = Music & { url: string; }; export type MusicTag = `${MusicMoodEnum}`; export type kokoroModelPrecision = "fp32" | "fp16" | "q8" | "q4" | "q4f16"; export type whisperModels = | "tiny" | "tiny.en" | "base" | "base.en" | "small" | "small.en" | "medium" | "medium.en" | "large-v1" | "large-v2" | "large-v3" | "large-v3-turbo"; ``` -------------------------------------------------------------------------------- /src/short-creator/music.ts: -------------------------------------------------------------------------------- ```typescript import path from "path"; import fs from "fs-extra"; import { type Music, MusicForVideo, MusicMoodEnum } from "../types/shorts"; import { Config } from "../config"; export class MusicManager { private static musicList: Music[] = [ { file: "Sly Sky - Telecasted.mp3", start: 0, end: 152, mood: MusicMoodEnum.melancholic, }, { file: "No.2 Remembering Her - Esther Abrami.mp3", start: 2, end: 134, mood: MusicMoodEnum.melancholic, }, { file: "Champion - Telecasted.mp3", start: 0, end: 142, mood: MusicMoodEnum.chill, }, { file: "Oh Please - Telecasted.mp3", start: 0, end: 154, mood: MusicMoodEnum.chill, }, { file: "Jetski - Telecasted.mp3", start: 0, end: 142, mood: MusicMoodEnum.uneasy, }, { file: "Phantom - Density & Time.mp3", start: 0, end: 178, mood: MusicMoodEnum.uneasy, }, { file: "On The Hunt - Andrew Langdon.mp3", start: 0, end: 95, mood: MusicMoodEnum.uneasy, }, { file: "Name The Time And Place - Telecasted.mp3", start: 0, end: 142, mood: MusicMoodEnum.excited, }, { file: "Delayed Baggage - Ryan Stasik.mp3", start: 3, end: 108, mood: MusicMoodEnum.euphoric, }, { file: "Like It Loud - Dyalla.mp3", start: 4, end: 160, mood: MusicMoodEnum.euphoric, }, { file: "Organic Guitar House - Dyalla.mp3", start: 2, end: 160, mood: MusicMoodEnum.euphoric, }, { file: "Honey, I Dismembered The Kids - Ezra Lipp.mp3", start: 2, end: 144, mood: MusicMoodEnum.dark, }, { file: "Night Hunt - Jimena Contreras.mp3", start: 0, end: 88, mood: MusicMoodEnum.dark, }, { file: "Curse of the 
Witches - Jimena Contreras.mp3", start: 0, end: 102, mood: MusicMoodEnum.dark, }, { file: "Restless Heart - Jimena Contreras.mp3", start: 0, end: 94, mood: MusicMoodEnum.sad, }, { file: "Heartbeat Of The Wind - Asher Fulero.mp3", start: 0, end: 124, mood: MusicMoodEnum.sad, }, { file: "Hopeless - Jimena Contreras.mp3", start: 0, end: 250, mood: MusicMoodEnum.sad, }, { file: "Touch - Anno Domini Beats.mp3", start: 0, end: 165, mood: MusicMoodEnum.happy, }, { file: "Cafecito por la Manana - Cumbia Deli.mp3", start: 0, end: 184, mood: MusicMoodEnum.happy, }, { file: "Aurora on the Boulevard - National Sweetheart.mp3", start: 0, end: 130, mood: MusicMoodEnum.happy, }, { file: "Buckle Up - Jeremy Korpas.mp3", start: 0, end: 128, mood: MusicMoodEnum.angry, }, { file: "Twin Engines - Jeremy Korpas.mp3", start: 0, end: 120, mood: MusicMoodEnum.angry, }, { file: "Hopeful - Nat Keefe.mp3", start: 0, end: 175, mood: MusicMoodEnum.hopeful, }, { file: "Hopeful Freedom - Asher Fulero.mp3", start: 1, end: 172, mood: MusicMoodEnum.hopeful, }, { file: "Crystaline - Quincas Moreira.mp3", start: 0, end: 140, mood: MusicMoodEnum.contemplative, }, { file: "Final Soliloquy - Asher Fulero.mp3", start: 1, end: 178, mood: MusicMoodEnum.contemplative, }, { file: "Seagull - Telecasted.mp3", start: 0, end: 123, mood: MusicMoodEnum.funny, }, { file: "Banjo Doops - Joel Cummins.mp3", start: 0, end: 98, mood: MusicMoodEnum.funny, }, { file: "Baby Animals Playing - Joel Cummins.mp3", start: 0, end: 124, mood: MusicMoodEnum.funny, }, { file: "Sinister - Anno Domini Beats.mp3", start: 0, end: 215, mood: MusicMoodEnum.dark, }, { file: "Traversing - Godmode.mp3", start: 0, end: 95, mood: MusicMoodEnum.dark, }, ]; constructor(private config: Config) {} public musicList(): MusicForVideo[] { return MusicManager.musicList.map((music: Music) => ({ ...music, url: `http://localhost:${this.config.port}/api/music/${encodeURIComponent(music.file)}`, })); } private musicFileExist(music: Music): boolean { return fs.existsSync(path.join(this.config.musicDirPath, music.file)); } public ensureMusicFilesExist(): void { for (const music of this.musicList()) { if (!this.musicFileExist(music)) { throw new Error(`Music file not found: ${music.file}`); } } } } ``` -------------------------------------------------------------------------------- /src/components/videos/PortraitVideo.tsx: -------------------------------------------------------------------------------- ```typescript import { AbsoluteFill, Sequence, useCurrentFrame, useVideoConfig, Audio, OffthreadVideo, } from "remotion"; import { z } from "zod"; import { loadFont } from "@remotion/google-fonts/BarlowCondensed"; import { calculateVolume, createCaptionPages, shortVideoSchema, } from "../utils"; const { fontFamily } = loadFont(); // "Barlow Condensed" export const PortraitVideo: React.FC<z.infer<typeof shortVideoSchema>> = ({ scenes, music, config, }) => { const frame = useCurrentFrame(); const { fps } = useVideoConfig(); const captionBackgroundColor = config.captionBackgroundColor ?? "blue"; const activeStyle = { backgroundColor: captionBackgroundColor, padding: "10px", marginLeft: "-10px", marginRight: "-10px", borderRadius: "10px", }; const captionPosition = config.captionPosition ?? 
"center"; let captionStyle = {}; if (captionPosition === "top") { captionStyle = { top: 100 }; } if (captionPosition === "center") { captionStyle = { top: "50%", transform: "translateY(-50%)" }; } if (captionPosition === "bottom") { captionStyle = { bottom: 100 }; } const [musicVolume, musicMuted] = calculateVolume(config.musicVolume); return ( <AbsoluteFill style={{ backgroundColor: "white" }}> <Audio loop src={music.url} startFrom={music.start * fps} endAt={music.end * fps} volume={() => musicVolume} muted={musicMuted} /> {scenes.map((scene, i) => { const { captions, audio, video } = scene; const pages = createCaptionPages({ captions, lineMaxLength: 20, lineCount: 1, maxDistanceMs: 1000, }); // Calculate the start and end time of the scene const startFrame = scenes.slice(0, i).reduce((acc, curr) => { return acc + curr.audio.duration; }, 0) * fps; let durationInFrames = scenes.slice(0, i + 1).reduce((acc, curr) => { return acc + curr.audio.duration; }, 0) * fps; if (config.paddingBack && i === scenes.length - 1) { durationInFrames += (config.paddingBack / 1000) * fps; } return ( <Sequence from={startFrame} durationInFrames={durationInFrames} key={`scene-${i}`} > <OffthreadVideo src={video} muted /> <Audio src={audio.url} /> {pages.map((page, j) => { return ( <Sequence key={`scene-${i}-page-${j}`} from={Math.round((page.startMs / 1000) * fps)} durationInFrames={Math.round( ((page.endMs - page.startMs) / 1000) * fps, )} > <div style={{ position: "absolute", left: 0, width: "100%", ...captionStyle, }} > {page.lines.map((line, k) => { return ( <p style={{ fontSize: "6em", fontFamily: fontFamily, fontWeight: "black", color: "white", WebkitTextStroke: "2px black", WebkitTextFillColor: "white", textShadow: "0px 0px 10px black", textAlign: "center", width: "100%", // uppercase textTransform: "uppercase", }} key={`scene-${i}-page-${j}-line-${k}`} > {line.texts.map((text, l) => { const active = frame >= startFrame + (text.startMs / 1000) * fps && frame <= startFrame + (text.endMs / 1000) * fps; return ( <> <span style={{ fontWeight: "bold", ...(active ? activeStyle : {}), }} key={`scene-${i}-page-${j}-line-${k}-text-${l}`} > {text.text} </span> {l < line.texts.length - 1 ? " " : ""} </> ); })} </p> ); })} </div> </Sequence> ); })} </Sequence> ); })} </AbsoluteFill> ); }; ``` -------------------------------------------------------------------------------- /src/components/videos/LandscapeVideo.tsx: -------------------------------------------------------------------------------- ```typescript import { AbsoluteFill, Sequence, useCurrentFrame, useVideoConfig, Audio, OffthreadVideo, } from "remotion"; import { z } from "zod"; import { loadFont } from "@remotion/google-fonts/BarlowCondensed"; import { calculateVolume, createCaptionPages, shortVideoSchema, } from "../utils"; const { fontFamily } = loadFont(); // "Barlow Condensed" export const LandscapeVideo: React.FC<z.infer<typeof shortVideoSchema>> = ({ scenes, music, config, }) => { const frame = useCurrentFrame(); const { fps } = useVideoConfig(); const captionBackgroundColor = config.captionBackgroundColor ?? "blue"; const activeStyle = { backgroundColor: captionBackgroundColor, padding: "10px", marginLeft: "-10px", marginRight: "-10px", borderRadius: "10px", }; const captionPosition = config.captionPosition ?? 
"center"; let captionStyle = {}; if (captionPosition === "top") { captionStyle = { top: 100 }; } if (captionPosition === "center") { captionStyle = { top: "50%", transform: "translateY(-50%)" }; } if (captionPosition === "bottom") { captionStyle = { bottom: 100 }; } const [musicVolume, musicMuted] = calculateVolume(config.musicVolume); return ( <AbsoluteFill style={{ backgroundColor: "white" }}> <Audio loop src={music.url} startFrom={music.start * fps} endAt={music.end * fps} volume={() => musicVolume} muted={musicMuted} /> {scenes.map((scene, i) => { const { captions, audio, video } = scene; const pages = createCaptionPages({ captions, lineMaxLength: 30, lineCount: 1, maxDistanceMs: 1000, }); // Calculate the start and end time of the scene const startFrame = scenes.slice(0, i).reduce((acc, curr) => { return acc + curr.audio.duration; }, 0) * fps; let durationInFrames = scenes.slice(0, i + 1).reduce((acc, curr) => { return acc + curr.audio.duration; }, 0) * fps; if (config.paddingBack && i === scenes.length - 1) { durationInFrames += (config.paddingBack / 1000) * fps; } return ( <Sequence from={startFrame} durationInFrames={durationInFrames} key={`scene-${i}`} > <OffthreadVideo src={video} muted /> <Audio src={audio.url} /> {pages.map((page, j) => { return ( <Sequence key={`scene-${i}-page-${j}`} from={Math.round((page.startMs / 1000) * fps)} durationInFrames={Math.round( ((page.endMs - page.startMs) / 1000) * fps, )} > <div style={{ position: "absolute", left: 0, width: "100%", ...captionStyle, }} > {page.lines.map((line, k) => { return ( <p style={{ fontSize: "8em", fontFamily: fontFamily, fontWeight: "black", color: "white", WebkitTextStroke: "2px black", WebkitTextFillColor: "white", textShadow: "0px 0px 10px black", textAlign: "center", width: "100%", // uppercase textTransform: "uppercase", }} key={`scene-${i}-page-${j}-line-${k}`} > {line.texts.map((text, l) => { const active = frame >= startFrame + (text.startMs / 1000) * fps && frame <= startFrame + (text.endMs / 1000) * fps; return ( <> <span style={{ fontWeight: "bold", ...(active ? activeStyle : {}), }} key={`scene-${i}-page-${j}-line-${k}-text-${l}`} > {text.text} </span> {l < line.texts.length - 1 ? 
" " : ""} </> ); })} </p> ); })} </div> </Sequence> ); })} </Sequence> ); })} </AbsoluteFill> ); }; ``` -------------------------------------------------------------------------------- /src/ui/pages/VideoList.tsx: -------------------------------------------------------------------------------- ```typescript import React, { useState, useEffect } from 'react'; import { useNavigate } from 'react-router-dom'; import axios from 'axios'; import { Box, Typography, Paper, Button, CircularProgress, Alert, List, ListItem, ListItemText, ListItemSecondaryAction, IconButton, Divider } from '@mui/material'; import AddIcon from '@mui/icons-material/Add'; import PlayArrowIcon from '@mui/icons-material/PlayArrow'; import DeleteIcon from '@mui/icons-material/Delete'; interface VideoItem { id: string; status: string; } const VideoList: React.FC = () => { const navigate = useNavigate(); const [videos, setVideos] = useState<VideoItem[]>([]); const [loading, setLoading] = useState(true); const [error, setError] = useState<string | null>(null); const fetchVideos = async () => { try { const response = await axios.get('/api/short-videos'); setVideos(response.data.videos || []); setLoading(false); } catch (err) { setError('Failed to fetch videos'); setLoading(false); console.error('Error fetching videos:', err); } }; useEffect(() => { fetchVideos(); }, []); const handleCreateNew = () => { navigate('/create'); }; const handleVideoClick = (id: string) => { navigate(`/video/${id}`); }; const handleDeleteVideo = async (id: string, event: React.MouseEvent<HTMLButtonElement>) => { event.stopPropagation(); try { await axios.delete(`/api/short-video/${id}`); fetchVideos(); } catch (err) { setError('Failed to delete video'); console.error('Error deleting video:', err); } }; const capitalizeFirstLetter = (str: string) => { if (!str || typeof str !== 'string') return 'Unknown'; return str.charAt(0).toUpperCase() + str.slice(1); }; if (loading) { return ( <Box display="flex" justifyContent="center" alignItems="center" height="80vh"> <CircularProgress /> </Box> ); } return ( <Box maxWidth="md" mx="auto" py={4}> <Box display="flex" justifyContent="space-between" alignItems="center" mb={4}> <Typography variant="h4" component="h1"> Your Videos </Typography> <Button variant="contained" color="primary" startIcon={<AddIcon />} onClick={handleCreateNew} > Create New Video </Button> </Box> {error && ( <Alert severity="error" sx={{ mb: 3 }}>{error}</Alert> )} {videos.length === 0 ? ( <Paper sx={{ p: 4, textAlign: 'center' }}> <Typography variant="body1" color="text.secondary" gutterBottom> You haven't created any videos yet. </Typography> <Button variant="outlined" startIcon={<AddIcon />} onClick={handleCreateNew} sx={{ mt: 2 }} > Create Your First Video </Button> </Paper> ) : ( <Paper> <List> {videos.map((video, index) => { const videoId = video?.id || ''; const videoStatus = video?.status || 'unknown'; return ( <div key={videoId}> {index > 0 && <Divider />} <ListItem button onClick={() => handleVideoClick(videoId)} sx={{ py: 2, '&:hover': { backgroundColor: 'rgba(0, 0, 0, 0.04)' } }} > <ListItemText primary={`Video ${videoId.substring(0, 8)}...`} secondary={ <Typography component="span" variant="body2" color={ videoStatus === 'ready' ? 'success.main' : videoStatus === 'processing' ? 'info.main' : videoStatus === 'failed' ? 
'error.main' : 'text.secondary' } > {capitalizeFirstLetter(videoStatus)} </Typography> } /> <ListItemSecondaryAction> {videoStatus === 'ready' && ( <IconButton edge="end" aria-label="play" onClick={() => handleVideoClick(videoId)} color="primary" > <PlayArrowIcon /> </IconButton> )} <IconButton edge="end" aria-label="delete" onClick={(e) => handleDeleteVideo(videoId, e)} color="error" sx={{ ml: 1 }} > <DeleteIcon /> </IconButton> </ListItemSecondaryAction> </ListItem> </div> ); })} </List> </Paper> )} </Box> ); }; export default VideoList; ``` -------------------------------------------------------------------------------- /src/short-creator/libraries/Pexels.ts: -------------------------------------------------------------------------------- ```typescript /* eslint-disable @remotion/deterministic-randomness */ import { getOrientationConfig } from "../../components/utils"; import { logger } from "../../logger"; import { OrientationEnum, type Video } from "../../types/shorts"; const jokerTerms: string[] = ["nature", "globe", "space", "ocean"]; const durationBufferSeconds = 3; const defaultTimeoutMs = 5000; const retryTimes = 3; export class PexelsAPI { constructor(private API_KEY: string) {} private async _findVideo( searchTerm: string, minDurationSeconds: number, excludeIds: string[], orientation: OrientationEnum, timeout: number, ): Promise<Video> { if (!this.API_KEY) { throw new Error("API key not set"); } logger.debug( { searchTerm, minDurationSeconds, orientation }, "Searching for video in Pexels API", ); const headers = new Headers(); headers.append("Authorization", this.API_KEY); const response = await fetch( `https://api.pexels.com/videos/search?orientation=${orientation}&size=medium&per_page=80&query=${encodeURIComponent(searchTerm)}`, { method: "GET", headers, redirect: "follow", signal: AbortSignal.timeout(timeout), }, ) .then((res) => { if (!res.ok) { if (res.status === 401) { throw new Error( "Invalid Pexels API key - please make sure you get a valid key from https://www.pexels.com/api and set it in the environment variable PEXELS_API_KEY", ); } throw new Error(`Pexels API error: ${res.status} ${res.statusText}`); } return res.json(); }) .catch((error: unknown) => { logger.error(error, "Error fetching videos from Pexels API"); throw error; }); const videos = response.videos as { id: string; duration: number; video_files: { fps: number; quality: string; width: number; height: number; id: string; link: string; }[]; }[]; const { width: requiredVideoWidth, height: requiredVideoHeight } = getOrientationConfig(orientation); if (!videos || videos.length === 0) { logger.error( { searchTerm, orientation }, "No videos found in Pexels API", ); throw new Error("No videos found"); } // find all the videos that fits the criteria, then select one randomly const filteredVideos = videos .map((video) => { if (excludeIds.includes(video.id)) { return; } if (!video.video_files.length) { return; } // calculate the real duration of the video by converting the FPS to 25 const fps = video.video_files[0].fps; const duration = fps < 25 ? 
video.duration * (fps / 25) : video.duration; if (duration >= minDurationSeconds + durationBufferSeconds) { for (const file of video.video_files) { if ( file.quality === "hd" && file.width === requiredVideoWidth && file.height === requiredVideoHeight ) { return { id: video.id, url: file.link, width: file.width, height: file.height, }; } } } }) .filter(Boolean); if (!filteredVideos.length) { logger.error({ searchTerm }, "No videos found in Pexels API"); throw new Error("No videos found"); } const video = filteredVideos[ Math.floor(Math.random() * filteredVideos.length) ] as Video; logger.debug( { searchTerm, video: video, minDurationSeconds, orientation }, "Found video from Pexels API", ); return video; } async findVideo( searchTerms: string[], minDurationSeconds: number, excludeIds: string[] = [], orientation: OrientationEnum = OrientationEnum.portrait, timeout: number = defaultTimeoutMs, retryCounter: number = 0, ): Promise<Video> { // shuffle the search terms to randomize the search order const shuffledJokerTerms = jokerTerms.sort(() => Math.random() - 0.5); const shuffledSearchTerms = searchTerms.sort(() => Math.random() - 0.5); for (const searchTerm of [...shuffledSearchTerms, ...shuffledJokerTerms]) { try { return await this._findVideo( searchTerm, minDurationSeconds, excludeIds, orientation, timeout, ); } catch (error: unknown) { if ( error instanceof Error && error instanceof DOMException && error.name === "TimeoutError" ) { if (retryCounter < retryTimes) { logger.warn( { searchTerm, retryCounter }, "Timeout error, retrying...", ); return await this.findVideo( searchTerms, minDurationSeconds, excludeIds, orientation, timeout, retryCounter + 1, ); } logger.error( { searchTerm, retryCounter }, "Timeout error, retry limit reached", ); throw error; } logger.error(error, "Error finding video in Pexels API for term"); } } logger.error( { searchTerms }, "No videos found in Pexels API for the given terms", ); throw new Error("No videos found in Pexels API"); } } ``` -------------------------------------------------------------------------------- /src/ui/pages/VideoDetails.tsx: -------------------------------------------------------------------------------- ```typescript import React, { useState, useEffect, useRef } from 'react'; import { useParams, useNavigate } from 'react-router-dom'; import axios from 'axios'; import { Box, Typography, Paper, Button, CircularProgress, Alert, Grid } from '@mui/material'; import ArrowBackIcon from '@mui/icons-material/ArrowBack'; import DownloadIcon from '@mui/icons-material/Download'; import { VideoStatus } from '../../types/shorts'; const VideoDetails: React.FC = () => { const { videoId } = useParams<{ videoId: string }>(); const navigate = useNavigate(); const [loading, setLoading] = useState(true); const [error, setError] = useState<string | null>(null); const [status, setStatus] = useState<VideoStatus>('processing'); const intervalRef = useRef<NodeJS.Timeout | null>(null); const isMounted = useRef(true); const checkVideoStatus = async () => { try { const response = await axios.get(`/api/short-video/${videoId}/status`); const videoStatus = response.data.status; if (isMounted.current) { setStatus(videoStatus || 'unknown'); console.log("videoStatus", videoStatus); if (videoStatus !== 'processing') { console.log("video is not processing"); console.log("interval", intervalRef.current); if (intervalRef.current) { console.log("clearing interval"); clearInterval(intervalRef.current); intervalRef.current = null; } } setLoading(false); } } catch (error) { if 
(isMounted.current) { setError('Failed to fetch video status'); setStatus('failed'); setLoading(false); console.error('Error fetching video status:', error); if (intervalRef.current) { clearInterval(intervalRef.current); intervalRef.current = null; } } } }; useEffect(() => { checkVideoStatus(); intervalRef.current = setInterval(() => { checkVideoStatus(); }, 5000); return () => { isMounted.current = false; if (intervalRef.current) { clearInterval(intervalRef.current); intervalRef.current = null; } }; }, [videoId]); const handleBack = () => { navigate('/'); }; const renderContent = () => { if (loading) { return ( <Box display="flex" justifyContent="center" alignItems="center" minHeight="30vh"> <CircularProgress /> </Box> ); } if (error) { return <Alert severity="error">{error}</Alert>; } if (status === 'processing') { return ( <Box textAlign="center" py={4}> <CircularProgress size={60} sx={{ mb: 2 }} /> <Typography variant="h6">Your video is being created...</Typography> <Typography variant="body1" color="text.secondary"> This may take a few minutes. Please wait. </Typography> </Box> ); } if (status === 'ready') { return ( <Box> <Box mb={3} textAlign="center"> <Typography variant="h6" color="success.main" gutterBottom> Your video is ready! </Typography> </Box> <Box sx={{ position: 'relative', paddingTop: '56.25%', mb: 3, backgroundColor: '#000' }}> <video controls autoPlay style={{ position: 'absolute', top: 0, left: 0, width: '100%', height: '100%', }} src={`/api/short-video/${videoId}`} /> </Box> <Box textAlign="center"> <Button component="a" href={`/api/short-video/${videoId}`} download variant="contained" color="primary" startIcon={<DownloadIcon />} sx={{ textDecoration: 'none' }} > Download Video </Button> </Box> </Box> ); } if (status === 'failed') { return ( <Alert severity="error" sx={{ mb: 3 }}> Video processing failed. Please try again with different settings. </Alert> ); } return ( <Alert severity="info" sx={{ mb: 3 }}> Unknown video status. Please try refreshing the page. </Alert> ); }; const capitalizeFirstLetter = (str: string) => { if (!str || typeof str !== 'string') return 'Unknown'; return str.charAt(0).toUpperCase() + str.slice(1); }; return ( <Box maxWidth="md" mx="auto" py={4}> <Box display="flex" alignItems="center" mb={3}> <Button startIcon={<ArrowBackIcon />} onClick={handleBack} sx={{ mr: 2 }} > Back to videos </Button> <Typography variant="h4" component="h1"> Video Details </Typography> </Box> <Paper sx={{ p: 3 }}> <Grid container spacing={2} mb={3}> <Grid item xs={12} sm={6}> <Typography variant="body2" color="text.secondary"> Video ID </Typography> <Typography variant="body1"> {videoId || 'Unknown'} </Typography> </Grid> <Grid item xs={12} sm={6}> <Typography variant="body2" color="text.secondary"> Status </Typography> <Typography variant="body1" color={ status === 'ready' ? 'success.main' : status === 'processing' ? 'info.main' : status === 'failed' ? 
'error.main' : 'text.primary' } > {capitalizeFirstLetter(status)} </Typography> </Grid> </Grid> {renderContent()} </Paper> </Box> ); }; export default VideoDetails; ``` -------------------------------------------------------------------------------- /src/server/routers/rest.ts: -------------------------------------------------------------------------------- ```typescript import express from "express"; import type { Request as ExpressRequest, Response as ExpressResponse, } from "express"; import fs from "fs-extra"; import path from "path"; import { validateCreateShortInput } from "../validator"; import { ShortCreator } from "../../short-creator/ShortCreator"; import { logger } from "../../logger"; import { Config } from "../../config"; // todo abstract class export class APIRouter { public router: express.Router; private shortCreator: ShortCreator; private config: Config; constructor(config: Config, shortCreator: ShortCreator) { this.config = config; this.router = express.Router(); this.shortCreator = shortCreator; this.router.use(express.json()); this.setupRoutes(); } private setupRoutes() { this.router.post( "/short-video", async (req: ExpressRequest, res: ExpressResponse) => { try { const input = validateCreateShortInput(req.body); logger.info({ input }, "Creating short video"); const videoId = this.shortCreator.addToQueue( input.scenes, input.config, ); res.status(201).json({ videoId, }); } catch (error: unknown) { logger.error(error, "Error validating input"); // Handle validation errors specifically if (error instanceof Error && error.message.startsWith("{")) { try { const errorData = JSON.parse(error.message); res.status(400).json({ error: "Validation failed", message: errorData.message, missingFields: errorData.missingFields, }); return; } catch (parseError: unknown) { logger.error(parseError, "Error parsing validation error"); } } // Fallback for other errors res.status(400).json({ error: "Invalid input", message: error instanceof Error ? 
error.message : "Unknown error", }); } }, ); this.router.get( "/short-video/:videoId/status", async (req: ExpressRequest, res: ExpressResponse) => { const { videoId } = req.params; if (!videoId) { res.status(400).json({ error: "videoId is required", }); return; } const status = this.shortCreator.status(videoId); res.status(200).json({ status, }); }, ); this.router.get( "/music-tags", (req: ExpressRequest, res: ExpressResponse) => { res.status(200).json(this.shortCreator.ListAvailableMusicTags()); }, ); this.router.get("/voices", (req: ExpressRequest, res: ExpressResponse) => { res.status(200).json(this.shortCreator.ListAvailableVoices()); }); this.router.get( "/short-videos", (req: ExpressRequest, res: ExpressResponse) => { const videos = this.shortCreator.listAllVideos(); res.status(200).json({ videos, }); }, ); this.router.delete( "/short-video/:videoId", (req: ExpressRequest, res: ExpressResponse) => { const { videoId } = req.params; if (!videoId) { res.status(400).json({ error: "videoId is required", }); return; } this.shortCreator.deleteVideo(videoId); res.status(200).json({ success: true, }); }, ); this.router.get( "/tmp/:tmpFile", (req: ExpressRequest, res: ExpressResponse) => { const { tmpFile } = req.params; if (!tmpFile) { res.status(400).json({ error: "tmpFile is required", }); return; } const tmpFilePath = path.join(this.config.tempDirPath, tmpFile); if (!fs.existsSync(tmpFilePath)) { res.status(404).json({ error: "tmpFile not found", }); return; } if (tmpFile.endsWith(".mp3")) { res.setHeader("Content-Type", "audio/mpeg"); } if (tmpFile.endsWith(".wav")) { res.setHeader("Content-Type", "audio/wav"); } if (tmpFile.endsWith(".mp4")) { res.setHeader("Content-Type", "video/mp4"); } const tmpFileStream = fs.createReadStream(tmpFilePath); tmpFileStream.on("error", (error) => { logger.error(error, "Error reading tmp file"); res.status(500).json({ error: "Error reading tmp file", tmpFile, }); }); tmpFileStream.pipe(res); }, ); this.router.get( "/music/:fileName", (req: ExpressRequest, res: ExpressResponse) => { const { fileName } = req.params; if (!fileName) { res.status(400).json({ error: "fileName is required", }); return; } const musicFilePath = path.join(this.config.musicDirPath, fileName); if (!fs.existsSync(musicFilePath)) { res.status(404).json({ error: "music file not found", }); return; } const musicFileStream = fs.createReadStream(musicFilePath); musicFileStream.on("error", (error) => { logger.error(error, "Error reading music file"); res.status(500).json({ error: "Error reading music file", fileName, }); }); musicFileStream.pipe(res); }, ); this.router.get( "/short-video/:videoId", (req: ExpressRequest, res: ExpressResponse) => { try { const { videoId } = req.params; if (!videoId) { res.status(400).json({ error: "videoId is required", }); return; } const video = this.shortCreator.getVideo(videoId); res.setHeader("Content-Type", "video/mp4"); res.setHeader( "Content-Disposition", `inline; filename=${videoId}.mp4`, ); res.send(video); } catch (error: unknown) { logger.error(error, "Error getting video"); res.status(404).json({ error: "Video not found", }); } }, ); } } ``` -------------------------------------------------------------------------------- /src/short-creator/ShortCreator.test.ts: -------------------------------------------------------------------------------- ```typescript process.env.LOG_LEVEL = "debug"; import { test, expect, vi } from "vitest"; import fs from "fs-extra"; import { ShortCreator } from "./ShortCreator"; import { Kokoro } from "./libraries/Kokoro"; 
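// fs-extra, fluent-ffmpeg, kokoro-js and the Remotion/Whisper packages are mocked below, so this test drives ShortCreator's queue without rendering real media.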
import { Remotion } from "./libraries/Remotion"; import { Whisper } from "./libraries/Whisper"; import { FFMpeg } from "./libraries/FFmpeg"; import { PexelsAPI } from "./libraries/Pexels"; import { Config } from "../config"; import { MusicManager } from "./music"; // mock fs-extra vi.mock("fs-extra", async () => { const { createFsFromVolume, Volume } = await import("memfs"); const vol = Volume.fromJSON({ "/Users/gyoridavid/.ai-agents-az-video-generator/videos/video-1.mp4": "mock video content 1", "/Users/gyoridavid/.ai-agents-az-video-generator/videos/video-2.mp4": "mock video content 2", "/Users/gyoridavid/.ai-agents-az-video-generator/temp": null, "/Users/gyoridavid/.ai-agents-az-video-generator/libs": null, "/static/music/happy-music.mp3": "mock music content", "/static/music/sad-music.mp3": "mock music content", "/static/music/chill-music.mp3": "mock music content", }); const memfs = createFsFromVolume(vol); const fsExtra = { ...memfs, // fs-extra specific methods ensureDirSync: vi.fn((path) => { try { memfs.mkdirSync(path, { recursive: true }); } catch (error) {} }), removeSync: vi.fn((path) => { try { const stats = memfs.statSync(path); if (stats.isDirectory()) { // This is simplified and won't handle nested directories memfs.rmdirSync(path); } else { memfs.unlinkSync(path); } } catch (error) {} }), createWriteStream: vi.fn(() => ({ on: vi.fn(), write: vi.fn(), end: vi.fn(), })), readFileSync: vi.fn((path) => { return memfs.readFileSync(path); }), }; return { ...fsExtra, default: fsExtra, }; }); // Mock fluent-ffmpeg vi.mock("fluent-ffmpeg", () => { const mockOn = vi.fn().mockReturnThis(); const mockSave = vi.fn().mockReturnThis(); const mockPipe = vi.fn().mockReturnThis(); const ffmpegMock = vi.fn(() => ({ input: vi.fn().mockReturnThis(), audioCodec: vi.fn().mockReturnThis(), audioBitrate: vi.fn().mockReturnThis(), audioChannels: vi.fn().mockReturnThis(), audioFrequency: vi.fn().mockReturnThis(), toFormat: vi.fn().mockReturnThis(), on: mockOn, save: mockSave, pipe: mockPipe, })); ffmpegMock.setFfmpegPath = vi.fn(); return { default: ffmpegMock }; }); // mock kokoro-js vi.mock("kokoro-js", () => { return { KokoroTTS: { from_pretrained: vi.fn().mockResolvedValue({ generate: vi.fn().mockResolvedValue({ toWav: vi.fn().mockReturnValue(new ArrayBuffer(8)), audio: new ArrayBuffer(8), sampling_rate: 44100, }), }), }, }; }); // mock remotion vi.mock("@remotion/bundler", () => { return { bundle: vi.fn().mockResolvedValue("mocked-bundled-url"), }; }); vi.mock("@remotion/renderer", () => { return { renderMedia: vi.fn().mockResolvedValue(undefined), selectComposition: vi.fn().mockResolvedValue({ width: 1080, height: 1920, fps: 30, durationInFrames: 300, }), ensureBrowser: vi.fn().mockResolvedValue(undefined), }; }); // mock whisper vi.mock("@remotion/install-whisper-cpp", () => { return { downloadWhisperModel: vi.fn().mockResolvedValue(undefined), installWhisperCpp: vi.fn().mockResolvedValue(undefined), transcribe: vi.fn().mockResolvedValue({ transcription: [ { text: "This is a mock transcription.", offsets: { from: 0, to: 2000 }, tokens: [ { text: "This", timestamp: { from: 0, to: 500 } }, { text: " is", timestamp: { from: 500, to: 800 } }, { text: " a", timestamp: { from: 800, to: 1000 } }, { text: " mock", timestamp: { from: 1000, to: 1500 } }, { text: " transcription.", timestamp: { from: 1500, to: 2000 } }, ], }, ], }), }; }); test("test me", async () => { const kokoro = await Kokoro.init("fp16"); const ffmpeg = await FFMpeg.init(); vi.spyOn(ffmpeg, 
"saveNormalizedAudio").mockResolvedValue("mocked-path.wav"); vi.spyOn(ffmpeg, "saveToMp3").mockResolvedValue("mocked-path.mp3"); const pexelsAPI = new PexelsAPI("mock-api-key"); vi.spyOn(pexelsAPI, "findVideo").mockResolvedValue({ id: "mock-video-id-1", url: "https://example.com/mock-video-1.mp4", width: 1080, height: 1920, }); const config = new Config(); const remotion = await Remotion.init(config); // control the render promise resolution let resolveRenderPromise: () => void; const renderPromiseMock: Promise<void> = new Promise((resolve) => { resolveRenderPromise = resolve; }); vi.spyOn(remotion, "render").mockReturnValue(renderPromiseMock); const whisper = await Whisper.init(config); vi.spyOn(whisper, "CreateCaption").mockResolvedValue([ { text: "This", startMs: 0, endMs: 500 }, { text: " is", startMs: 500, endMs: 800 }, { text: " a", startMs: 800, endMs: 1000 }, { text: " mock", startMs: 1000, endMs: 1500 }, { text: " transcription.", startMs: 1500, endMs: 2000 }, ]); const musicManager = new MusicManager(config); const shortCreator = new ShortCreator( config, remotion, kokoro, whisper, ffmpeg, pexelsAPI, musicManager, ); const videoId = shortCreator.addToQueue( [ { text: "test", searchTerms: ["test"], }, ], {}, ); // list videos while the video is being processed let videos = shortCreator.listAllVideos(); expect(videos.find((v) => v.id === videoId)?.status).toBe("processing"); // create the video file on the file system and check the status again fs.writeFileSync(shortCreator.getVideoPath(videoId), "mock video content"); videos = shortCreator.listAllVideos(); expect(videos.find((v) => v.id === videoId)?.status).toBe("processing"); // resolve the render promise to simulate the video being processed, and check the status again resolveRenderPromise(); await new Promise((resolve) => setTimeout(resolve, 100)); // let the queue process the video videos = shortCreator.listAllVideos(); expect(videos.find((v) => v.id === videoId)?.status).toBe("ready"); // check the status of the video directly const status = shortCreator.status(videoId); expect(status).toBe("ready"); }); ``` -------------------------------------------------------------------------------- /src/components/root/Root.tsx: -------------------------------------------------------------------------------- ```typescript import { CalculateMetadataFunction, Composition } from "remotion"; import { shortVideoSchema } from "../utils"; import { PortraitVideo } from "../videos/PortraitVideo"; import { LandscapeVideo } from "../videos/LandscapeVideo"; import { TestVideo } from "../videos/Test"; import z from "zod"; import { AvailableComponentsEnum } from "../types"; const FPS = 25; export const calculateMetadata: CalculateMetadataFunction< z.infer<typeof shortVideoSchema> > = async ({ props }) => { const durationInFrames = Math.floor((props.config.durationMs / 1000) * FPS); return { ...props, durationInFrames, }; }; export const RemotionRoot: React.FC = () => { return ( <> <Composition id={AvailableComponentsEnum.PortraitVideo} component={PortraitVideo} durationInFrames={30} fps={FPS} width={1080} height={1920} defaultProps={{ music: { url: "http://localhost:3123/api/music/" + encodeURIComponent( "Aurora on the Boulevard - National Sweetheart.mp3", ), file: "mellow-smooth-rap-beat-20230107-132480.mp3", start: 0, end: 175, }, scenes: [ { captions: [ { text: " Hello", startMs: 390, endMs: 990 }, { text: " World.", startMs: 990, endMs: 2000 }, ], video: "https://videos.pexels.com/video-files/4625747/4625747-hd_1080_1920_24fps.mp4", audio: { 
url: "http://localhost:3123/api/tmp/cma1lgean0001rlsi52b8h3n3.mp3", duration: 3.15, }, }, ], config: { durationMs: 4650, paddingBack: 1500, captionBackgroundColor: "blue", captionPosition: "bottom", }, }} calculateMetadata={calculateMetadata} /> <Composition id={AvailableComponentsEnum.LandscapeVideo} component={LandscapeVideo} durationInFrames={30} fps={FPS} width={1920} height={1080} defaultProps={{ music: { url: "http://localhost:3123/api/music/" + encodeURIComponent( "Aurora on the Boulevard - National Sweetheart.mp3", ), file: "mellow-smooth-rap-beat-20230107-132480.mp3", start: 0, end: 175, }, scenes: [ { captions: [ { text: " A", startMs: 110, endMs: 320, }, { text: " week", startMs: 320, endMs: 590, }, { text: " ago,", startMs: 590, endMs: 1220, }, { text: " a", startMs: 1220, endMs: 1280, }, { text: " friend", startMs: 1280, endMs: 1490, }, { text: " invited", startMs: 1490, endMs: 1820, }, { text: " a", startMs: 1820, endMs: 1880, }, { text: " couple", startMs: 1880, endMs: 2310, }, { text: " of", startMs: 2310, endMs: 2350, }, { text: " other", startMs: 2350, endMs: 2640, }, { text: " couples", startMs: 2640, endMs: 3080, }, { text: " over", startMs: 3080, endMs: 3400, }, { text: " for", startMs: 3400, endMs: 3620, }, { text: " dinner.", startMs: 3620, endMs: 4340, }, { text: " Eventually,", startMs: 4340, endMs: 5520, }, { text: " the", startMs: 5520, endMs: 5550, }, { text: " food,", startMs: 5550, endMs: 6300, }, { text: " but", startMs: 6300, endMs: 6360, }, { text: " not", startMs: 6360, endMs: 6540, }, { text: " the", startMs: 6540, endMs: 6780, }, { text: " wine,", startMs: 6780, endMs: 7210, }, { text: " was", startMs: 7210, endMs: 7400, }, { text: " cleared", startMs: 7400, endMs: 7870, }, { text: " off", startMs: 7870, endMs: 7980, }, { text: " the", startMs: 7980, endMs: 8180, }, { text: " table", startMs: 8180, endMs: 8480, }, { text: " for", startMs: 8480, endMs: 8770, }, { text: " what", startMs: 8770, endMs: 8880, }, { text: " turned", startMs: 8880, endMs: 9230, }, { text: " out", startMs: 9230, endMs: 9390, }, { text: " to", startMs: 9390, endMs: 9510, }, { text: " be", startMs: 9510, endMs: 9620, }, { text: " some", startMs: 9620, endMs: 9850, }, { text: " fierce", startMs: 9850, endMs: 10200, }, { text: " scrabbling.", startMs: 10200, endMs: 11000, }, ], video: "https://videos.pexels.com/video-files/1168989/1168989-hd_1920_1080_30fps.mp4", audio: { url: "http://localhost:3123/api/tmp/cma9ctvpo0001aqsia12i82db.mp3", duration: 12.8, }, }, ], config: { durationMs: 14300, paddingBack: 1500, captionBackgroundColor: "#ff0000", captionPosition: "center", }, }} calculateMetadata={calculateMetadata} /> <Composition id="TestVideo" component={TestVideo} durationInFrames={14} fps={23} width={100} height={100} /> </> ); }; ``` -------------------------------------------------------------------------------- /src/short-creator/ShortCreator.ts: -------------------------------------------------------------------------------- ```typescript import { OrientationEnum } from "./../types/shorts"; /* eslint-disable @remotion/deterministic-randomness */ import fs from "fs-extra"; import cuid from "cuid"; import path from "path"; import https from "https"; import http from "http"; import { Kokoro } from "./libraries/Kokoro"; import { Remotion } from "./libraries/Remotion"; import { Whisper } from "./libraries/Whisper"; import { FFMpeg } from "./libraries/FFmpeg"; import { PexelsAPI } from "./libraries/Pexels"; import { Config } from "../config"; import { logger } from "../logger"; 
import { MusicManager } from "./music";
import type {
  SceneInput,
  RenderConfig,
  Scene,
  VideoStatus,
  MusicMoodEnum,
  MusicTag,
  MusicForVideo,
} from "../types/shorts";

export class ShortCreator {
  private queue: {
    sceneInput: SceneInput[];
    config: RenderConfig;
    id: string;
  }[] = [];

  constructor(
    private config: Config,
    private remotion: Remotion,
    private kokoro: Kokoro,
    private whisper: Whisper,
    private ffmpeg: FFMpeg,
    private pexelsApi: PexelsAPI,
    private musicManager: MusicManager,
  ) {}

  public status(id: string): VideoStatus {
    const videoPath = this.getVideoPath(id);
    if (this.queue.find((item) => item.id === id)) {
      return "processing";
    }
    if (fs.existsSync(videoPath)) {
      return "ready";
    }
    return "failed";
  }

  public addToQueue(sceneInput: SceneInput[], config: RenderConfig): string {
    // todo add mutex lock
    const id = cuid();
    this.queue.push({
      sceneInput,
      config,
      id,
    });
    if (this.queue.length === 1) {
      this.processQueue();
    }
    return id;
  }

  private async processQueue(): Promise<void> {
    // todo add a semaphore
    if (this.queue.length === 0) {
      return;
    }
    const { sceneInput, config, id } = this.queue[0];
    logger.debug(
      { sceneInput, config, id },
      "Processing video item in the queue",
    );
    try {
      await this.createShort(id, sceneInput, config);
      logger.debug({ id }, "Video created successfully");
    } catch (error: unknown) {
      logger.error(error, "Error creating video");
    } finally {
      this.queue.shift();
      this.processQueue();
    }
  }

  private async createShort(
    videoId: string,
    inputScenes: SceneInput[],
    config: RenderConfig,
  ): Promise<string> {
    logger.debug(
      {
        inputScenes,
        config,
      },
      "Creating short video",
    );
    const scenes: Scene[] = [];
    let totalDuration = 0;
    const excludeVideoIds = [];
    const tempFiles = [];
    const orientation: OrientationEnum =
      config.orientation || OrientationEnum.portrait;

    let index = 0;
    for (const scene of inputScenes) {
      const audio = await this.kokoro.generate(
        scene.text,
        config.voice ?? "af_heart",
      );
      let { audioLength } = audio;
      const { audio: audioStream } = audio;

      // add the paddingBack in seconds to the last scene
      if (index + 1 === inputScenes.length && config.paddingBack) {
        audioLength += config.paddingBack / 1000;
      }

      const tempId = cuid();
      const tempWavFileName = `${tempId}.wav`;
      const tempMp3FileName = `${tempId}.mp3`;
      const tempVideoFileName = `${tempId}.mp4`;
      const tempWavPath = path.join(this.config.tempDirPath, tempWavFileName);
      const tempMp3Path = path.join(this.config.tempDirPath, tempMp3FileName);
      const tempVideoPath = path.join(
        this.config.tempDirPath,
        tempVideoFileName,
      );
      tempFiles.push(tempVideoPath);
      tempFiles.push(tempWavPath, tempMp3Path);

      await this.ffmpeg.saveNormalizedAudio(audioStream, tempWavPath);
      const captions = await this.whisper.CreateCaption(tempWavPath);
      await this.ffmpeg.saveToMp3(audioStream, tempMp3Path);

      const video = await this.pexelsApi.findVideo(
        scene.searchTerms,
        audioLength,
        excludeVideoIds,
        orientation,
      );

      logger.debug(`Downloading video from ${video.url} to ${tempVideoPath}`);
      await new Promise<void>((resolve, reject) => {
        const fileStream = fs.createWriteStream(tempVideoPath);
        https
          .get(video.url, (response: http.IncomingMessage) => {
            if (response.statusCode !== 200) {
              reject(
                new Error(`Failed to download video: ${response.statusCode}`),
              );
              return;
            }
            response.pipe(fileStream);
            fileStream.on("finish", () => {
              fileStream.close();
              logger.debug(`Video downloaded successfully to ${tempVideoPath}`);
              resolve();
            });
          })
          .on("error", (err: Error) => {
            fs.unlink(tempVideoPath, () => {}); // Delete the file if download failed
            logger.error(err, "Error downloading video:");
            reject(err);
          });
      });

      excludeVideoIds.push(video.id);

      scenes.push({
        captions,
        video: `http://localhost:${this.config.port}/api/tmp/${tempVideoFileName}`,
        audio: {
          url: `http://localhost:${this.config.port}/api/tmp/${tempMp3FileName}`,
          duration: audioLength,
        },
      });

      totalDuration += audioLength;
      index++;
    }
    if (config.paddingBack) {
      totalDuration += config.paddingBack / 1000;
    }

    const selectedMusic = this.findMusic(totalDuration, config.music);
    logger.debug({ selectedMusic }, "Selected music for the video");

    await this.remotion.render(
      {
        music: selectedMusic,
        scenes,
        config: {
          durationMs: totalDuration * 1000,
          paddingBack: config.paddingBack,
          ...{
            captionBackgroundColor: config.captionBackgroundColor,
            captionPosition: config.captionPosition,
          },
          musicVolume: config.musicVolume,
        },
      },
      videoId,
      orientation,
    );

    for (const file of tempFiles) {
      fs.removeSync(file);
    }

    return videoId;
  }

  public getVideoPath(videoId: string): string {
    return path.join(this.config.videosDirPath, `${videoId}.mp4`);
  }

  public deleteVideo(videoId: string): void {
    const videoPath = this.getVideoPath(videoId);
    fs.removeSync(videoPath);
    logger.debug({ videoId }, "Deleted video file");
  }

  public getVideo(videoId: string): Buffer {
    const videoPath = this.getVideoPath(videoId);
    if (!fs.existsSync(videoPath)) {
      throw new Error(`Video ${videoId} not found`);
    }
    return fs.readFileSync(videoPath);
  }

  private findMusic(videoDuration: number, tag?: MusicMoodEnum): MusicForVideo {
    const musicFiles = this.musicManager.musicList().filter((music) => {
      if (tag) {
        return music.mood === tag;
      }
      return true;
    });
    return musicFiles[Math.floor(Math.random() * musicFiles.length)];
  }

  public ListAvailableMusicTags(): MusicTag[] {
    const tags = new Set<MusicTag>();
    this.musicManager.musicList().forEach((music) => {
      tags.add(music.mood as MusicTag);
    });
    return Array.from(tags.values());
  }

  public listAllVideos(): { id: string; status: VideoStatus }[] {
    const videos: { id: string; status: VideoStatus }[] = [];

    // Check if videos directory exists
    if (!fs.existsSync(this.config.videosDirPath)) {
      return videos;
    }

    // Read all files in the videos directory
    const files = fs.readdirSync(this.config.videosDirPath);

    // Filter for MP4 files and extract video IDs
    for (const file of files) {
      if (file.endsWith(".mp4")) {
        const videoId = file.replace(".mp4", "");
        let status: VideoStatus = "ready";
        const inQueue = this.queue.find((item) => item.id === videoId);
        if (inQueue) {
          status = "processing";
        }
        videos.push({ id: videoId, status });
      }
    }

    // Add videos that are in the queue but not yet rendered
    for (const queueItem of this.queue) {
      const existingVideo = videos.find((v) => v.id === queueItem.id);
      if (!existingVideo) {
        videos.push({ id: queueItem.id, status: "processing" });
      }
    }

    return videos;
  }

  public ListAvailableVoices(): string[] {
    return this.kokoro.listAvailableVoices();
  }
}
```

--------------------------------------------------------------------------------
/src/ui/pages/VideoCreator.tsx:
--------------------------------------------------------------------------------

```typescript
import React, { useState, useEffect } from "react";
import axios from "axios";
import { useNavigate } from "react-router-dom";
import {
  Box,
  Button,
  TextField,
  Typography,
  Paper,
  Grid,
  FormControl,
  InputLabel,
  Select,
  MenuItem,
  CircularProgress,
  Alert,
  IconButton,
  Divider,
  InputAdornment,
} from "@mui/material";
import AddIcon from "@mui/icons-material/Add";
import DeleteIcon from "@mui/icons-material/Delete";
import {
  SceneInput,
  RenderConfig,
  MusicMoodEnum,
  CaptionPositionEnum,
  VoiceEnum,
  OrientationEnum,
  MusicVolumeEnum,
} from "../../types/shorts";

interface SceneFormData {
  text: string;
  searchTerms: string; // Changed to string
}

const VideoCreator: React.FC = () => {
  const navigate = useNavigate();
  const [scenes, setScenes] = useState<SceneFormData[]>([
    { text: "", searchTerms: "" },
  ]);
  const [config, setConfig] = useState<RenderConfig>({
    paddingBack: 1500,
    music: MusicMoodEnum.chill,
    captionPosition: CaptionPositionEnum.bottom,
    captionBackgroundColor: "blue",
    voice: VoiceEnum.af_heart,
    orientation: OrientationEnum.portrait,
    musicVolume: MusicVolumeEnum.high,
  });
  const [loading, setLoading] = useState(false);
  const [error, setError] = useState<string | null>(null);
  const [voices, setVoices] = useState<VoiceEnum[]>([]);
  const [musicTags, setMusicTags] = useState<MusicMoodEnum[]>([]);
  const [loadingOptions, setLoadingOptions] = useState(true);

  useEffect(() => {
    const fetchOptions = async () => {
      try {
        const [voicesResponse, musicResponse] = await Promise.all([
          axios.get("/api/voices"),
          axios.get("/api/music-tags"),
        ]);
        setVoices(voicesResponse.data);
        setMusicTags(musicResponse.data);
      } catch (err) {
        console.error("Failed to fetch options:", err);
        setError(
          "Failed to load voices and music options.
Please refresh the page.", ); } finally { setLoadingOptions(false); } }; fetchOptions(); }, []); const handleAddScene = () => { setScenes([...scenes, { text: "", searchTerms: "" }]); }; const handleRemoveScene = (index: number) => { if (scenes.length > 1) { const newScenes = [...scenes]; newScenes.splice(index, 1); setScenes(newScenes); } }; const handleSceneChange = ( index: number, field: keyof SceneFormData, value: string, ) => { const newScenes = [...scenes]; newScenes[index] = { ...newScenes[index], [field]: value }; setScenes(newScenes); }; const handleConfigChange = (field: keyof RenderConfig, value: any) => { setConfig({ ...config, [field]: value }); }; const handleSubmit = async (e: React.FormEvent) => { e.preventDefault(); setLoading(true); setError(null); try { // Convert scenes to the expected API format const apiScenes: SceneInput[] = scenes.map((scene) => ({ text: scene.text, searchTerms: scene.searchTerms .split(",") .map((term) => term.trim()) .filter((term) => term.length > 0), })); const response = await axios.post("/api/short-video", { scenes: apiScenes, config, }); navigate(`/video/${response.data.videoId}`); } catch (err) { setError("Failed to create video. Please try again."); console.error(err); } finally { setLoading(false); } }; if (loadingOptions) { return ( <Box display="flex" justifyContent="center" alignItems="center" height="80vh" > <CircularProgress /> </Box> ); } return ( <Box maxWidth="md" mx="auto" py={4}> <Typography variant="h4" component="h1" gutterBottom> Create New Video </Typography> {error && ( <Alert severity="error" sx={{ mb: 3 }}> {error} </Alert> )} <form onSubmit={handleSubmit}> <Typography variant="h5" component="h2" gutterBottom> Scenes </Typography> {scenes.map((scene, index) => ( <Paper key={index} sx={{ p: 3, mb: 3 }}> <Box display="flex" justifyContent="space-between" alignItems="center" mb={2} > <Typography variant="h6">Scene {index + 1}</Typography> {scenes.length > 1 && ( <IconButton onClick={() => handleRemoveScene(index)} color="error" size="small" > <DeleteIcon /> </IconButton> )} </Box> <Grid container spacing={3}> <Grid item xs={12}> <TextField fullWidth label="Text" multiline rows={4} value={scene.text} onChange={(e) => handleSceneChange(index, "text", e.target.value) } required /> </Grid> <Grid item xs={12}> <TextField fullWidth label="Search Terms (comma-separated)" value={scene.searchTerms} onChange={(e) => handleSceneChange(index, "searchTerms", e.target.value) } helperText="Enter keywords for background video, separated by commas" required /> </Grid> </Grid> </Paper> ))} <Box display="flex" justifyContent="center" mb={4}> <Button variant="outlined" startIcon={<AddIcon />} onClick={handleAddScene} > Add Scene </Button> </Box> <Divider sx={{ mb: 4 }} /> <Typography variant="h5" component="h2" gutterBottom> Video Configuration </Typography> <Paper sx={{ p: 3, mb: 3 }}> <Grid container spacing={3}> <Grid item xs={12} sm={6}> <TextField fullWidth type="number" label="End Screen Padding (ms)" value={config.paddingBack} onChange={(e) => handleConfigChange("paddingBack", parseInt(e.target.value)) } InputProps={{ endAdornment: ( <InputAdornment position="end">ms</InputAdornment> ), }} helperText="Duration to keep playing after narration ends" required /> </Grid> <Grid item xs={12} sm={6}> <FormControl fullWidth> <InputLabel>Music Mood</InputLabel> <Select value={config.music} onChange={(e) => handleConfigChange("music", e.target.value)} label="Music Mood" required > {Object.values(MusicMoodEnum).map((tag) => ( <MenuItem key={tag} 
value={tag}> {tag} </MenuItem> ))} </Select> </FormControl> </Grid> <Grid item xs={12} sm={6}> <FormControl fullWidth> <InputLabel>Caption Position</InputLabel> <Select value={config.captionPosition} onChange={(e) => handleConfigChange("captionPosition", e.target.value) } label="Caption Position" required > {Object.values(CaptionPositionEnum).map((position) => ( <MenuItem key={position} value={position}> {position} </MenuItem> ))} </Select> </FormControl> </Grid> <Grid item xs={12} sm={6}> <TextField fullWidth label="Caption Background Color" value={config.captionBackgroundColor} onChange={(e) => handleConfigChange("captionBackgroundColor", e.target.value) } helperText="Any valid CSS color (name, hex, rgba)" required /> </Grid> <Grid item xs={12} sm={6}> <FormControl fullWidth> <InputLabel>Default Voice</InputLabel> <Select value={config.voice} onChange={(e) => handleConfigChange("voice", e.target.value)} label="Default Voice" required > {Object.values(VoiceEnum).map((voice) => ( <MenuItem key={voice} value={voice}> {voice} </MenuItem> ))} </Select> </FormControl> </Grid> <Grid item xs={12} sm={6}> <FormControl fullWidth> <InputLabel>Orientation</InputLabel> <Select value={config.orientation} onChange={(e) => handleConfigChange("orientation", e.target.value) } label="Orientation" required > {Object.values(OrientationEnum).map((orientation) => ( <MenuItem key={orientation} value={orientation}> {orientation} </MenuItem> ))} </Select> </FormControl> </Grid> <Grid item xs={12} sm={6}> <FormControl fullWidth> <InputLabel>Volume of the background audio</InputLabel> <Select value={config.musicVolume} onChange={(e) => handleConfigChange("musicVolume", e.target.value) } label="Volume of the background audio" required > {Object.values(MusicVolumeEnum).map((voice) => ( <MenuItem key={voice} value={voice}> {voice} </MenuItem> ))} </Select> </FormControl> </Grid> </Grid> </Paper> <Box display="flex" justifyContent="center"> <Button type="submit" variant="contained" color="primary" size="large" disabled={loading} sx={{ minWidth: 200 }} > {loading ? ( <CircularProgress size={24} color="inherit" /> ) : ( "Create Video" )} </Button> </Box> </form> </Box> ); }; export default VideoCreator; ```
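
The form in `VideoCreator.tsx` ultimately issues a single POST request with a `{ scenes, config }` payload. The sketch below shows the equivalent call made directly over HTTP. It is a minimal sketch, not the canonical client: it assumes the server runs on its default port 3123 (the port used in the URLs in `Root.tsx`), that the REST router in `src/server/routers/rest.ts` (shown on page 2) exposes the same `/api/short-video` route the UI calls, and that the enum values in `src/types/shorts.ts` are plain strings matching their names (e.g. `"chill"`, `"portrait"`, `"af_heart"`).

```typescript
// Sketch: queue a short video over HTTP, mirroring the payload the UI sends.
// Assumptions: server on localhost:3123, POST /api/short-video route, and
// string enum values ("chill", "bottom", "portrait", "high") — verify against
// src/server/routers/rest.ts and src/types/shorts.ts.
async function createShortVideo(): Promise<string> {
  const response = await fetch("http://localhost:3123/api/short-video", {
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify({
      scenes: [
        {
          text: "Hello world, this is a test narration.",
          searchTerms: ["nature", "mountains"],
        },
      ],
      config: {
        paddingBack: 1500,
        music: "chill",
        captionPosition: "bottom",
        captionBackgroundColor: "blue",
        voice: "af_heart",
        orientation: "portrait",
        musicVolume: "high",
      },
    }),
  });
  if (!response.ok) {
    throw new Error(`Failed to queue video: ${response.status}`);
  }
  // The UI reads response.data.videoId, so the body is assumed to be { videoId }.
  const { videoId } = (await response.json()) as { videoId: string };
  return videoId;
}

createShortVideo().then((videoId) => console.log(`Queued video ${videoId}`));
```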
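For server-side use without the HTTP layer, the dependencies can also be wired up directly, following the setup that `ShortCreator.test.ts` uses above. This is a minimal sketch under assumptions: the file is placed next to `ShortCreator.ts` in `src/short-creator/` (so the relative import paths match the test file), and `PEXELS_API_KEY` is provided via the environment as in `.env.example`.

```typescript
// Sketch: programmatic wiring of ShortCreator, modeled on ShortCreator.test.ts.
// Import paths assume this file lives in src/short-creator/.
import { Config } from "../config";
import { ShortCreator } from "./ShortCreator";
import { Kokoro } from "./libraries/Kokoro";
import { Remotion } from "./libraries/Remotion";
import { Whisper } from "./libraries/Whisper";
import { FFMpeg } from "./libraries/FFmpeg";
import { PexelsAPI } from "./libraries/Pexels";
import { MusicManager } from "./music";

async function main() {
  const config = new Config();
  const kokoro = await Kokoro.init("fp16");
  const ffmpeg = await FFMpeg.init();
  const remotion = await Remotion.init(config);
  const whisper = await Whisper.init(config);
  const pexelsApi = new PexelsAPI(process.env.PEXELS_API_KEY ?? "");
  const musicManager = new MusicManager(config);

  const shortCreator = new ShortCreator(
    config,
    remotion,
    kokoro,
    whisper,
    ffmpeg,
    pexelsApi,
    musicManager,
  );

  // Queue a one-scene video; rendering happens asynchronously in the queue.
  const videoId = shortCreator.addToQueue(
    [{ text: "Hello world", searchTerms: ["nature"] }],
    {},
  );
  console.log(videoId, shortCreator.status(videoId)); // "processing" until rendered
}

main();
```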