
Course 2 · Ch 8

AI Video Generation Tools

Sora, Runway, Pika, Kling — what they can generate, what they cost, and how to use them in a real YouTube workflow

AI video generation moved from research curiosity to production-usable tool faster than almost anyone expected. You can now generate seconds to minutes of photorealistic or stylised video from a text prompt, extend existing footage, animate still images, and apply cinematic effects — entirely without a camera. This chapter covers the four tools that matter most for YouTube creators right now, what each one is actually good at, and how to integrate them without letting AI become a crutch that makes your content generic.

This space moves fast

AI video generation is evolving faster than any other tool category in this course. Pricing, quality benchmarks, and available features will have changed since this chapter was written. Treat the capability comparisons here as directional — always run your own tests before committing to a paid plan.

The Four Tools That Matter for YouTube Creators

🌐

Sora (OpenAI)

Text-to-video · Image-to-video · Video extension

ChatGPT Pro / Plus

OpenAI's video model and the most-hyped entry in this category. On the Pro plan it generates clips of up to 20 seconds at 1080p from text or image prompts, with strong physical realism and cinematic composition. Plus is limited to shorter, lower-resolution clips. Available inside ChatGPT on Plus (~£18/mo) and Pro (~£180/mo) plans, with usage credits determining how many generations you get per month.

Best for: physically realistic scenes, cinematic establishing shots, documentary-style B-roll, slow-motion nature footage.

Credit limits on Plus are tight for production use. Hands, complex motion, and text in-scene are still unreliable.

Text-to-video · Image-to-video · 1080p output (Pro) · 20s max (Pro) · No API yet

🎬

Runway Gen-3 Alpha

Text-to-video · Image-to-video · Video-to-video · Effects

Free trial / from ~£12/mo

The most complete production toolkit in this category. Runway goes beyond raw video generation: it includes motion brush (animate specific regions of an image), act-one (drive a character's face with a recorded video of your own performance), green-screen background removal, and lip sync. The workflow is designed to integrate with real editing projects rather than just generate standalone clips.

Best for: creators who want to integrate AI video into an existing production pipeline. Motion brush and act-one are the standout features; act-one in particular has no direct equivalent among the other three tools covered here.

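Generation credits are consumed quickly. The Standard plan (~£12/mo) gives 625 credits. At Gen-3 Alpha's rate of 10 credits per second of video, that is about 62 seconds of footage, or roughly 12 five-second clips (about 25 with the cheaper Gen-3 Alpha Turbo).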

Motion brush · Act-one face · BG removal · Lip sync · API access
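To sanity-check what a credit allowance buys, convert credits into clips. This is a minimal sketch that assumes credits are billed per second of generated video, at 10 credits per second for Gen-3 Alpha and 5 for Gen-3 Alpha Turbo. Treat both rates as assumptions to confirm against Runway's current pricing; the model keys and function name are illustrative only.

```python
# Convert a monthly credit allowance into whole clips.
# ASSUMPTION: credits are billed per second of generated video, at the
# illustrative rates below; confirm them against current pricing.
CREDITS_PER_SECOND = {
    "gen3_alpha": 10,       # assumed rate
    "gen3_alpha_turbo": 5,  # assumed rate
}


def clips_per_month(credits: int, clip_seconds: int, model: str) -> int:
    """Number of whole clips of `clip_seconds` that `credits` can pay for."""
    cost_per_clip = CREDITS_PER_SECOND[model] * clip_seconds
    return credits // cost_per_clip  # partial clips can't be generated


STANDARD_PLAN_CREDITS = 625  # the Standard plan allowance quoted above
for model in CREDITS_PER_SECOND:
    n = clips_per_month(STANDARD_PLAN_CREDITS, clip_seconds=5, model=model)
    print(f"{model}: {n} five-second clips per month")
```

Under these assumed rates the allowance covers roughly 12 to 25 five-second clips before you generate any extra variations, which is why credits disappear quickly.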

✨

Pika Labs

Text-to-video · Image-to-video · Style transfer

Free / from ~£8/mo

The most accessible entry point in the category — quick to generate, generous free tier, and particularly strong at stylised and artistic outputs (anime, painterly, cel-shaded). Pikaffects add physics-based animations (explode, melt, deflate, cake-ify) to still images — quirky but genuinely fun for creative transitions and comedic B-roll. Lower realism ceiling than Runway or Kling for photorealistic scenes.

Best for: stylised content, animated B-roll, comedic/creative channels, budget-conscious creators who want to experiment.

Photorealistic outputs lag behind Runway and Kling. Best thought of as a creative effect tool rather than a cinematic realism tool.

Free tier · Pikaffects · Anime style · Lower realism · Fast gen

🐉

Kling AI (Kuaishou)

Text-to-video · Image-to-video · Long-form generation

Free tier / from ~£8/mo

Developed by Kuaishou (a Chinese short-video platform and TikTok competitor), Kling consistently rivals or exceeds Runway in photorealism benchmarks at a lower price point. Uniquely, it can produce clips up to 3 minutes long, far beyond the 5–20 second limits of most competitors. Motion coherence is strong, especially for human subjects and hands, and the model is improving quickly.

Best for: photorealistic B-roll at lower cost, longer AI-generated sequences, human subject animation where hand accuracy matters.

Data residency on Chinese infrastructure — consider whether this is an issue for your use case.

Up to 3 min · High realism · Good hands · Generous free tier · China servers

Head-to-Head Capability Comparison

Feature	Sora	Runway Gen-3	Pika	Kling
Max clip length	20s (Pro)	10s	10s	3 min
Max resolution	1080p	1080p	720p–1080p	1080p
Photorealism	Excellent	Excellent	Good (stylised)	Excellent
Human/hand accuracy	Moderate	Moderate	Weak	Strong
Stylised / anime output	Moderate	Moderate	Excellent	Good
Image-to-video	✓	✓	✓	✓
Video-to-video effects	✗	✓	Limited	Limited
Motion brush (selective)	✗	✓	✗	✗
Face animation (act-one)	✗	✓	✗	✗
Free tier	✗	Trial only	✓	✓
Entry paid price	~£18/mo	~£12/mo	~£8/mo	~£8/mo
API access	TBA	✓	TBA	Limited

How to Actually Use These in a YouTube Video

The real question isn't which tool is technically best — it's what role AI-generated footage plays in your videos. Almost no successful creator uses AI video as the main footage. The value is in specific roles within a human-led production.

🎥

B-roll for inaccessible scenes

Runway · Sora · Kling

Footage you couldn't realistically film — space, historical events, extreme weather, abstract concepts. AI generates it in seconds; you cut it as B-roll over your voiceover.

🖼️

Animating still images

Pika · Runway · Kling

Upload a photo or AI-generated image and bring it to life — camera pan, subtle movement, environmental motion (rain, fire, leaves). Turns static graphics into engaging B-roll.

🌅

Establishing shots & intros

Sora · Runway · Kling

Cinematic wide shots, time-lapses, slow-motion nature scenes. AI handles the scenes that would need a film crew or drone licence to capture practically.

😂

Comedic effects & transitions

Pika (Pikaffects)

Explode, melt, deflate, cake-ify. Applied to your own face or a relevant object, Pikaffects create the kind of absurd visual gag that gets clipped and shared independently.

🎞️

Visualising abstract concepts

Any tool

Economics, philosophy, psychology — topics with no obvious visual. AI can generate impressionistic or abstract footage that gives the eye something to follow while the voice carries the content.

🌍

Location footage you can't visit

Sora · Kling · Runway

Talking about a city, landmark, or landscape you can't film yourself? Generate a plausible visual instead of reusing Creative Commons stock for the hundredth time.

👤

Animated fictional characters

Runway Act-One · Kling

Animate a character or illustration with your own facial expressions. Storytelling, educational content, and fiction channels can put a face on their content without filming themselves.

🖥️

Product mock-ups & concept visuals

Pika · Runway · Sora

Tech, gadget, and review channels can generate concept imagery of unreleased products or visualise features that don't yet exist in consumer form.

Writing Prompts That Get Usable Results

AI video generation is extremely sensitive to prompt structure. Vague prompts produce generic, unusable clips. Structured prompts with specific cinematographic language produce clips that actually cut into a real video.

Weak prompt

a city at night

Result: generic stock-footage-style clip with no consistent style, framing, or atmosphere. Could have come from anywhere.

Strong prompt — cinematographic structure

Slow push-in on a rain-soaked Tokyo street at 2am. Neon reflections on wet asphalt. Empty except for one figure with an umbrella walking away from camera. Anamorphic lens flare from a red traffic light. Cinematic, moody, desaturated colour grade. No camera shake.

Specifies: shot movement, location, time of day, atmosphere, subject, camera perspective, lens character, colour treatment, and what to avoid. The resulting clip is far more likely to match your video's tone.

Prompt structure cheat sheet

[Shot type + movement] of [subject] [doing what] in [location/environment]. [Time of day / lighting]. [Atmosphere / mood]. [Lens / camera style]. [Colour / grade]. [What to avoid or exclude].

Not every element is needed for every clip, but including more specifics usually outperforms leaving things to the model's defaults.
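The cheat sheet maps naturally onto a small template. The sketch below is a hypothetical helper (the `ShotPrompt` class and its field names are our own, not part of any tool's interface) that assembles the slots in cheat-sheet order and skips any you leave empty. You would paste the resulting text into whichever generator you use.

```python
from dataclasses import dataclass


@dataclass
class ShotPrompt:
    """Hypothetical prompt template following the cheat-sheet slot order."""
    shot: str            # shot type + movement, e.g. "Slow push-in"
    subject: str         # subject and what it is doing
    location: str = ""   # location / environment
    lighting: str = ""   # time of day / lighting
    mood: str = ""       # atmosphere / mood
    lens: str = ""       # lens / camera style
    grade: str = ""      # colour / grade
    avoid: str = ""      # what to avoid or exclude

    def render(self) -> str:
        # Opening clause: "[shot] of [subject] in [location]"
        opening = f"{self.shot} of {self.subject}"
        if self.location:
            opening += f" in {self.location}"
        parts = [opening, self.lighting, self.mood, self.lens, self.grade, self.avoid]
        # Drop empty slots so unused fields don't leave stray punctuation.
        sentences = [p.strip().rstrip(".") for p in parts if p.strip()]
        return ". ".join(sentences) + "."


tokyo = ShotPrompt(
    shot="Slow push-in",
    subject="a lone figure with an umbrella walking away from camera",
    location="a rain-soaked Tokyo side street at 2am",
    lighting="Neon reflections on wet asphalt",
    mood="Cinematic, moody",
    lens="Anamorphic lens flare from a red traffic light",
    grade="Desaturated colour grade",
    avoid="No camera shake",
)
print(tokyo.render())
```

Keeping the slots as named fields makes it easy to hold most of them fixed across a series of clips and vary only the subject or shot, which helps the clips feel like they belong to the same video.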

Useful cinematographic terms to include in prompts

Shot movement: slow push-in, dolly back, crane shot, static locked-off, aerial descending, handheld follow
Lens character: anamorphic lens flare, shallow depth of field, 35mm film grain, tilt-shift, fisheye
Lighting: golden hour, overcast diffused, neon-lit, candlelit, harsh midday sun, blue-hour, studio three-point
Colour: desaturated, high-contrast, warm tones, teal and orange grade, monochrome, vibrant saturated
Atmosphere: cinematic, documentary style, dreamlike, gritty, ethereal, hyper-real

Current Limitations — Know Before You Rely On It

Text in scene. Every current model struggles with readable text inside generated video. Logos, signs, and captions will be garbled. Add text in your editing software (NLE) after generation, not via the prompt.
Hands and fingers. Still a weak point across the board, though Kling has the strongest performance. Close-ups of hands are a lottery — use medium shots or wider framing.
Consistent characters across clips. If you generate multiple clips featuring the same character, they won't look identical unless the platform has character consistency features (Runway has limited support). For multi-clip storylines, generate a reference image first and use image-to-video for all subsequent clips.
Physics for complex interactions. Liquid pouring, cloth simulation, and crowd scenes are improving but still produce artefacts on close inspection. Wide shots forgive more than close-ups.
Duration. Most tools cap at 5–20 seconds per generation (Kling's 3-minute capability is exceptional). Long sequences require multiple generations, which must be cut together — not always seamless.
Cost at scale. Entry-level credit allowances are workable for occasional B-roll. If you plan to use AI video heavily, map out your monthly generation needs against plan costs before committing; it adds up faster than expected.
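If you write many prompts, a quick pre-flight check can catch the weak spots above before you spend credits. This is a hypothetical helper: `lint_prompt` and its keyword patterns are our own invention, not a feature of any of these tools, and simple keyword matching will both miss cases and raise false alarms (a crowd in a wide shot, for example, is usually fine).

```python
import re

# Hypothetical keyword checks based on the limitations listed above.
# Plain regex matching: expect some false positives and some misses.
RISK_PATTERNS = {
    "readable text in scene": r"\b(sign|signs|logo|logos|caption|captions|text|headline|lettering)\b",
    "close-up of hands": r"close[- ]?up\b.*\b(hand|hands|finger|fingers)\b",
    "complex physics": r"\b(pouring|liquid|cloth|crowd|crowds)\b",
}


def lint_prompt(prompt: str) -> list[str]:
    """Return the names of known weak spots mentioned in a video prompt."""
    lowered = prompt.lower()
    return [name for name, pattern in RISK_PATTERNS.items()
            if re.search(pattern, lowered)]


print(lint_prompt("Extreme close-up of hands typing on a keyboard"))
print(lint_prompt("A shop sign reading OPEN above the door"))
```

A flag is a prompt to reconsider framing (go wider, or add the text in the edit), not a hard rule.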

Content policy and disclosure

All four tools prohibit generating realistic depictions of real people without consent, deepfakes, and harmful or illegal content. YouTube's own policy requires creators to disclose realistic altered or synthetic content that viewers could mistake for a real person, place, or event, using the altered-content disclosure in YouTube Studio's upload flow. This matters most for AI video in news, political, or documentary contexts, but photorealistic footage of a real location can also qualify. A label like "AI-generated B-roll" in your description is good practice regardless. Clearly unrealistic or fictional AI B-roll (sci-fi visuals, abstract backgrounds, fictional scenes) carries no disclosure obligation, but transparency always builds more trust than it costs.

Recommended Starter Workflow

Identify your B-roll gaps first. Edit your main footage and note everywhere you need coverage you don't have — then generate specifically for those gaps. Don't generate speculatively and try to find a use for it afterwards.
Start with Kling or Pika free tiers. Both let you generate without a credit card. Get comfortable with prompt-writing before spending anything.
Use image-to-video for more control. Generate a still image with Midjourney or Ideogram first to nail the look, then animate it. This gives you far more control over the visual result than pure text-to-video.
Generate 3–5 variations of each clip. AI generation is non-deterministic — the same prompt produces different results each time. Always run multiple generations and pick the best one.
Keep clips short and cut frequently. AI artefacts are less visible in 2–4 second cuts than in 8–10 second holds. Faster editing rhythm also keeps AI footage from drawing too much attention to itself.
Grade to match your existing footage. AI video has its own default look. Apply the same LUT or grade you use on your camera footage so AI B-roll doesn't feel visually disconnected from the rest of the video.
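To map generation needs against a plan's allowance, it helps to count backwards from the gaps in your edit. The sketch below is hypothetical: `plan_broll` is our own helper, and it assumes you cut B-roll into roughly 3-second shots, generate 4 variations per shot, that each generation is a 5-second clip billed in full, and a rate of 10 credits per second. Substitute your own tool's numbers.

```python
import math
from dataclasses import dataclass


@dataclass
class GenerationPlan:
    cuts: int         # short B-roll shots needed in the edit
    generations: int  # total generations, including variations
    credits: int      # estimated credit cost


def plan_broll(gap_seconds: list[float], *, credits_per_second: int,
               cut_seconds: float = 3.0, variations: int = 4,
               generated_clip_seconds: int = 5) -> GenerationPlan:
    """Estimate the credits needed to fill B-roll gaps in one video.

    Each gap is split into short cuts, each cut gets several generated
    variations, and every generation is assumed to be billed at full clip
    length even though only part of it makes the final edit.
    """
    cuts = sum(math.ceil(gap / cut_seconds) for gap in gap_seconds)
    generations = cuts * variations
    credits = generations * generated_clip_seconds * credits_per_second
    return GenerationPlan(cuts, generations, credits)


# Hypothetical example: three gaps (in seconds) noted while editing one video.
plan = plan_broll([12, 7.5, 20], credits_per_second=10)  # assumed rate
videos_per_month = 4
print(plan)
print(f"Monthly need: {plan.credits * videos_per_month} credits")
```

Compare the monthly figure with your plan's allowance. Under these assumptions it is many times an entry-level allowance, so the number of variations per shot and the per-second rate of the model you pick are the levers that matter most.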

The two-tool setup that covers most needs

If you want to start with AI video and not overthink the tool choice: Kling for photorealistic B-roll (best quality per pound, generous free tier, longest clips), and Pika for stylised or comedic clips and animating stills. Between those two free tiers, you can experiment extensively before committing to a paid plan. Add Runway if you need motion brush or act-one specifically.

Chapter 8 Quick Reference

Sora: Best realism, 20s max on Pro, requires ChatGPT Plus/Pro (~£18–£180/mo), no free tier
Runway Gen-3: Most complete toolkit (motion brush, act-one, lip sync, BG removal) — from ~£12/mo
Pika: Best for stylised/anime, Pikaffects (explode/melt), fastest to start — free tier available
Kling: Best photorealism per pound, up to 3-min clips, strong on hands — free tier + ~£8/mo
Best starter combo: Kling (realism) + Pika (style) on free tiers
Add Runway if you need: motion brush, face animation (act-one), or API access
Prompt structure: [shot movement] + [subject + action] + [location] + [lighting] + [lens/grade] + [exclude]
Avoid in prompts: readable text in scene, extreme close-ups of hands, long unbroken takes
For character consistency: generate still image first, then image-to-video for all clips
Grade AI footage to match your camera LUT — raw AI video has a distinctive default look
Disclosure rule: Disclose AI B-roll if it could be mistaken for real people, places, or events; clearly fictional or abstract B-roll is fine without it