You've watched the demos. The AI creates stunning videos from simple descriptions. So you type your prompt, hit generate, and get... something that looks nothing like what you imagined.
Welcome to the prompt engineering gap.
The difference between typing "a beautiful sunset video" and getting an actually beautiful sunset video is prompt engineering: the skill of translating your creative vision into language AI can interpret correctly.
This guide teaches you the systematic approach professionals use to write prompts that work consistently.
The Anatomy of an Effective Prompt
Every strong AI video prompt contains these components:
1. Subject (Who/What)
The main focus of your video. Be specific.
- ❌ "A person walking"
- ✅ "A young woman in a red coat walking"
2. Action (What's happening)
The motion, activity, or change over time. Video needs movement.
- ❌ "A coffee shop"
- ✅ "Steam rising from a coffee cup on a wooden table"
3. Environment (Where)
The setting that contains your subject. Context matters.
- ❌ "Outside"
- ✅ "Narrow cobblestone alley in evening rain"
4. Lighting (How it's lit)
Light defines mood more than any other element.
- ❌ (omitted entirely)
- ✅ "Warm golden hour sunlight through window blinds"
5. Camera (How we see it)
Perspective, movement, and framing.
- ❌ (omitted entirely)
- ✅ "Close-up shot, slow push-in, shallow depth of field"
6. Style (What it looks like)
Aesthetic, reference, quality level.
- ❌ "Make it look good"
- ✅ "Cinematic film look, 35mm film grain, muted color palette"
The Prompt Formula
Combine the elements into a structure:
[Camera] of [Subject] [Action] in [Environment], [Lighting], [Style]
Example:
Slow tracking shot of a vintage red bicycle leaning against a stone wall
as autumn leaves fall gently around it, soft overcast natural lighting,
nostalgic film photography aesthetic, 35mm grain, muted warm tones
This single sentence gives the AI:
- Camera: Slow tracking shot
- Subject: Vintage red bicycle
- Action: Autumn leaves falling gently around it
- Environment: Stone wall, autumn setting
- Lighting: Soft overcast natural
- Style: Film photography, grain, muted warm
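If you write many prompts, it can help to treat the formula as a small data structure. The following is a minimal sketch in Python; `PromptParts` and `build_prompt` are illustrative names invented for this example, not part of any model's API.

```python
from dataclasses import dataclass


@dataclass
class PromptParts:
    """The six components of the formula (hypothetical helper type)."""
    camera: str
    subject: str
    action: str
    environment: str
    lighting: str
    style: str


def build_prompt(p: PromptParts) -> str:
    # [Camera] of [Subject] [Action] in [Environment], [Lighting], [Style]
    return (
        f"{p.camera} of {p.subject} {p.action} in {p.environment}, "
        f"{p.lighting}, {p.style}"
    )


parts = PromptParts(
    camera="Slow tracking shot",
    subject="a vintage red bicycle",
    action="leaning against a stone wall as autumn leaves fall gently around it",
    environment="a quiet autumn setting",
    lighting="soft overcast natural lighting",
    style="nostalgic film photography aesthetic, 35mm grain, muted warm tones",
)
print(build_prompt(parts))
```

Keeping each component in its own field makes it obvious when one is empty, which is usually the lighting or the camera.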
Camera Language That Works
AI models understand cinematography terms. Use them.
Shot Types:
| Term | What it means |
|---|---|
| Extreme close-up (ECU) | Fill frame with detail (eyes, texture) |
| Close-up | Face or object fills frame |
| Medium shot | Waist up, person-sized framing |
| Wide shot | Full body, environment visible |
| Extreme wide shot | Vast landscape, subject small |
| Over-the-shoulder (OTS) | Behind one subject, looking at another |
| POV shot | First-person perspective |
| Bird's eye | Looking straight down from above |
| Low angle | Looking up at subject |
| Dutch angle | Tilted frame for tension |
Camera Movements:
| Term | What it means |
|---|---|
| Static/locked off | No camera movement |
| Pan | Camera rotates left/right on tripod |
| Tilt | Camera rotates up/down on tripod |
| Dolly in/push in | Camera physically moves toward subject |
| Pull back | Camera moves away from subject |
| Tracking shot | Camera moves alongside subject |
| Crane shot | Camera rises or lowers |
| Handheld | Subtle organic movement |
| Steadicam | Smooth floating movement |
| Orbit | Camera circles around subject |
Depth of Field:
| Term | What it means |
|---|---|
| Shallow DOF | Subject sharp, background blurred |
| Deep focus | Everything in focus |
| Rack focus | Focus shifts between subjects |
| Bokeh | Soft, rounded out-of-focus highlights in background |
Lighting Terms That Transform Results
Lighting is where amateur prompts fail. Learn these:
Natural Light:
- Golden hour — Warm, orange, dramatic shadows
- Blue hour — Cool, twilight atmosphere
- Overcast — Soft, even, diffused
- Harsh midday sun — Strong shadows, high contrast
- Dappled light — Light through leaves/patterns
- Backlit — Light behind the subject, giving a rim glow or silhouette
Artificial Light:
- Tungsten — Warm orange indoor light
- Fluorescent — Cool greenish tint
- Neon — Colorful city night aesthetic
- Candlelight — Warm, flickering, intimate
- Practical lighting — Visible light sources in frame
- Studio lighting — Clean, professional, controlled
Lighting Setups:
- Key light only — Single dramatic source
- Three-point lighting — Classic balanced setup
- Rim lighting — Edge light separating subject from background
- Chiaroscuro — High contrast, dramatic shadows
- High key — Bright, minimal shadows
- Low key — Dark, moody, heavy shadows
Style Anchors: Reference What Works
Instead of vague adjectives, anchor to known references:
Film Stock References:
- "Kodak Portra 400 color palette" — Warm, slightly muted
- "Kodak Ektar 100" — Vivid, saturated
- "Fuji Velvia" — Punchy, high contrast
- "CineStill 800T" — Tungsten-balanced, halation around lights
- "35mm film grain texture"
Era/Decade References:
- "1970s film aesthetic" — Warm, organic, textured
- "1980s VHS look" — Scan lines, color bleed
- "1990s indie film" — Natural, documentary feel
- "2000s digital video" — Clean, slightly flat
Director/Cinematographer References:
- "Roger Deakins cinematography" — Precise, naturalistic
- "Emmanuel Lubezki natural light" — Organic, atmospheric
- "Wes Anderson symmetry" — Centered, pastel, precise
- "David Fincher aesthetic" — Dark, desaturated, precise
Genre References:
- "Film noir lighting" — High contrast, shadows, Venetian blinds
- "Documentary style" — Natural, observational
- "Music video aesthetic" — Stylized, dynamic
- "Commercial photography" — Clean, polished
- "Indie film look" — Natural, authentic
Model-Specific Prompting
Different models interpret prompts differently. Here's what works:
Veo 3.2:
- Responds well to cinematography terms
- Excels with naturalistic lighting descriptions
- Handles complex environments well
- Tip: Be specific about camera behavior
Sora 2:
- Strong with narrative descriptions
- Understands temporal language ("begins with... then...")
- Good at physics and causality
- Tip: Describe the story, not just the image
Kling 2.0:
- Faster generation, good for iteration
- Responds to simple, clear prompts
- Good motion quality with action verbs
- Tip: Focus on one clear action
Runway Gen-4:
- Excellent style transfer from references
- Strong character consistency
- Good with abstract concepts
- Tip: Include style references
With 160+ models on aiVideo.fm:
Test the same prompt across 4-5 models. You'll quickly learn which models interpret your specific style of prompting best. Some creators find one model "gets them" and becomes their default.
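If you script your tests, the cross-model comparison can be a simple loop. The sketch below assumes nothing about any platform's API: `generate` is a placeholder function you would supply yourself, and `fake_generate` only returns a made-up file name so the example runs on its own.

```python
from typing import Callable, Dict, List


def run_across_models(
    prompt: str,
    models: List[str],
    generate: Callable[[str, str], str],
) -> Dict[str, str]:
    """Send one prompt to every model and collect results keyed by model name.

    `generate(model, prompt)` is a placeholder you supply; it should return
    something you can review later, such as a file path or URL.
    """
    results = {}
    for model in models:
        results[model] = generate(model, prompt)
    return results


def fake_generate(model: str, prompt: str) -> str:
    # Stand-in for a real generation call; returns a made-up file name.
    return f"{model.lower().replace(' ', '_')}.mp4"


models = ["Veo 3.2", "Sora 2", "Kling 2.0", "Runway Gen-4"]
outputs = run_across_models(
    "Close-up of coffee being poured, morning light", models, fake_generate
)
for model, path in outputs.items():
    print(model, "->", path)
```

Holding the prompt constant and varying only the model is what makes the results comparable.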
The Iteration Method
Professional prompt engineering is iterative, not one-shot.
Step 1: Start broad
Begin with a simple prompt covering the basics.
A woman walking through a busy city street at night
Step 2: Evaluate the output
What did the AI get right? What's missing? What's wrong?
"The woman looks good, but the city feels generic and the lighting is flat."
Step 3: Add specificity to weak areas
Don't rewrite everything—target what's missing.
A woman in a long coat walking through a busy Tokyo street at night,
neon signs reflecting on wet pavement, shallow depth of field
Step 4: Refine style and mood
Once the basics work, add atmosphere.
A woman in a long coat walking through a busy Tokyo street at night,
neon signs reflecting on wet pavement, shallow depth of field,
cinematic film look, Wong Kar-wai color palette, moody atmosphere
Step 5: Lock and iterate
Found a prompt that works? Create variations:
- Change one variable at a time
- Generate 3-4 versions and pick the best
- Build a library of working prompts for your style
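One way to keep iterations honest is to record each version instead of overwriting the last one. The sketch below is a simplified illustration: `refine` is a hypothetical helper that only appends details, so in-place edits (such as swapping "city" for "Tokyo") would still be done by hand.

```python
def refine(prompt: str, *additions: str) -> str:
    """Append targeted details to a working prompt without rewriting it."""
    return ", ".join([prompt, *additions])


history = []

# Step 1: start broad.
v1 = "A woman walking through a busy city street at night"
history.append(v1)

# Step 3: add specificity only where the output was weak.
v2 = refine(v1, "neon signs reflecting on wet pavement", "shallow depth of field")
history.append(v2)

# Step 4: layer style and mood once the basics work.
v3 = refine(v2, "cinematic film look", "moody atmosphere")
history.append(v3)

for i, p in enumerate(history, start=1):
    print(f"v{i}: {p}")
```

Because every version stays in `history`, you can go back to the last prompt that worked when a change makes things worse.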
Common Prompt Mistakes
❌ Too vague
"A beautiful sunset" names a subject but gives the AI no action, setting detail, lighting, or camera direction to work with.
❌ Too long and contradictory
"Bright sunny day with dark moody shadows in a happy sad melancholy scene" confuses the model.
❌ Listing everything
"A cat, dog, bird, fish, horse, and elephant in a park" usually produces a mess.
❌ Impossible physics
"Camera pushes in while pulling back and panning left" asks for opposing moves in the same shot.
❌ Text and logos
Most models struggle with readable text and specific brand elements.
❌ Forgetting motion
Static description = static video. Include action, change, or camera movement.
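Some of these mistakes can be caught before you spend a generation on them. The sketch below is a rough heuristic, not a real validator: the word lists, the five-word threshold, and the `lint_prompt` name are all assumptions made for this example, and substring matching will produce occasional false positives.

```python
import re
from typing import List

# Hypothetical word lists; extend them for your own prompts.
CONFLICTS = [
    ("bright", "dark"),
    ("happy", "sad"),
    ("push in", "pull back"),
    ("static", "tracking"),
]
MOTION_WORDS = {
    "walking", "rising", "falling", "pouring", "drifting", "swaying",
    "pan", "tilt", "dolly", "tracking", "orbit", "push", "pull",
}


def lint_prompt(prompt: str) -> List[str]:
    """Return a list of warnings for common prompt mistakes."""
    text = prompt.lower()
    words = set(re.findall(r"[a-z']+", text))
    warnings = []
    if len(text.split()) < 5:
        warnings.append("too vague: fewer than 5 words")
    for a, b in CONFLICTS:
        if a in text and b in text:
            warnings.append(f"possible contradiction: '{a}' and '{b}'")
    if not words & MOTION_WORDS:
        warnings.append("no motion: add an action or camera movement")
    return warnings


print(lint_prompt("A beautiful sunset"))
print(lint_prompt("Bright sunny day with dark moody shadows in a happy sad scene"))
print(lint_prompt("Steam rising from a coffee cup on a wooden table"))
```

A check like this will not tell you whether a prompt is good, only whether it trips one of the obvious failure modes listed above.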
Prompt Templates You Can Use
Cinematic Scene Template:
[Shot type] of [subject with details] [action verb-ing] in/at [specific location],
[time of day] [lighting type], [camera movement],
[film stock/style reference], [mood adjectives]
Product Shot Template:
[Product] [action: rotating/floating/sliding] against [background],
[lighting setup], professional commercial photography,
clean composition, [color palette]
Atmospheric B-Roll Template:
[Environmental detail] [subtle motion: swaying/falling/drifting],
[natural lighting description], [depth of field],
[style: documentary/cinematic/artistic], ambient mood
Character Moment Template:
[Shot type] of [character description] [emotion/action],
[environment context], [lighting], [camera behavior],
[style reference], intimate/dramatic/contemplative atmosphere
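If you reuse templates often, a small fill-in function can stop you from sending a prompt with an empty slot. This is a minimal sketch: the slot names are shortened versions of the Cinematic Scene Template, and both `CINEMATIC` and `fill` are hypothetical names for this example.

```python
import re

# Shortened slot names standing in for the Cinematic Scene Template.
CINEMATIC = (
    "[shot] of [subject] [action] in [location], "
    "[lighting], [movement], [style], [mood]"
)


def fill(template: str, **values: str) -> str:
    """Replace each [slot] with its value; raise if a slot is left unfilled."""
    def substitute(match):
        key = match.group(1)
        if key not in values:
            raise KeyError(f"missing value for [{key}]")
        return values[key]

    return re.sub(r"\[(\w+)\]", substitute, template)


prompt = fill(
    CINEMATIC,
    shot="Close-up shot",
    subject="a young woman in a red coat",
    action="walking",
    location="a narrow cobblestone alley",
    lighting="evening rain with warm practical lighting",
    movement="slow push-in",
    style="35mm film grain",
    mood="contemplative",
)
print(prompt)
```

Raising an error on a missing slot is a deliberate choice here: a half-filled template with a literal "[lighting]" in it tends to produce confusing results.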
Testing Framework
When developing a new prompt style, use this framework:
Test across models
Same prompt on 4-5 different models shows you how interpretations vary.
Test with one change
Create variations changing only one element. This teaches you what moves the needle.
Base: Close-up of coffee being poured, morning light
Variations:
- Change subject: Close-up of tea being poured, morning light
- Change lighting: Close-up of coffee being poured, neon bar lighting
- Change camera: Wide shot of coffee being poured, morning light
- Change style: Close-up of coffee being poured, morning light, 1970s film stock
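The one-change rule is easy to automate. The sketch below rebuilds the coffee variations from a base dictionary; the field names and the `render` and `one_change_variations` helpers are assumptions made for this example.

```python
base = {
    "camera": "Close-up",
    "subject": "coffee being poured",
    "lighting": "morning light",
    "style": "",
}

# One replacement value per field to test.
changes = {
    "subject": "tea being poured",
    "lighting": "neon bar lighting",
    "camera": "Wide shot",
    "style": "1970s film stock",
}


def render(parts: dict) -> str:
    prompt = f"{parts['camera']} of {parts['subject']}, {parts['lighting']}"
    if parts["style"]:
        prompt += f", {parts['style']}"
    return prompt


def one_change_variations(base: dict, changes: dict) -> dict:
    """Build one variation per changed field, leaving the rest at base values."""
    return {
        field: render({**base, field: value})
        for field, value in changes.items()
    }


print("Base:", render(base))
for field, prompt in one_change_variations(base, changes).items():
    print(f"Change {field}: {prompt}")
```

Each variation differs from the base in exactly one field, so any difference in the output can be traced to that field (allowing for the usual run-to-run randomness).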
Build a prompt library
Save prompts that work. Organize by:
- Style (cinematic, documentary, commercial)
- Subject (people, products, environments)
- Mood (dramatic, peaceful, energetic)
- Model (which AI this prompt works best with)
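A prompt library does not need special tooling; a list of tagged records is enough to start. The sketch below is one possible layout: the `SavedPrompt` fields mirror the four categories above, and the model assigned to each entry is an invented example value, not a recommendation.

```python
import json
from dataclasses import asdict, dataclass
from typing import List


@dataclass
class SavedPrompt:
    text: str
    style: str
    subject: str
    mood: str
    model: str


def find(library: List[SavedPrompt], **filters: str) -> List[SavedPrompt]:
    """Return entries whose fields match every given filter exactly."""
    return [
        p for p in library
        if all(getattr(p, key) == value for key, value in filters.items())
    ]


library = [
    SavedPrompt(
        "Steam rising from a coffee cup on a wooden table",
        style="cinematic", subject="products", mood="peaceful",
        model="Veo 3.2",  # example value only
    ),
    SavedPrompt(
        "A woman in a long coat walking through a busy Tokyo street at night",
        style="cinematic", subject="people", mood="dramatic",
        model="Sora 2",  # example value only
    ),
]

for entry in find(library, style="cinematic", subject="people"):
    print(entry.text)

# Serialize to JSON so the library can be written to a file.
print(json.dumps([asdict(p) for p in library], indent=2))
```

Storing the library as JSON keeps it readable in any text editor and easy to move between tools.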
FAQ
How long should my prompt be?
Generally 2-4 sentences. Long enough to be specific, short enough to not contradict yourself. Most models handle 75-150 words well.
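If you want a quick check against that guideline, counting words and sentences takes a few lines. This is a rough sketch: `prompt_stats` is a hypothetical helper, the 150-word ceiling is taken from the answer above, and the sentence split is only an approximation.

```python
import re


def prompt_stats(prompt: str, max_words: int = 150) -> dict:
    """Count words and sentences and flag prompts above a word limit."""
    words = prompt.split()
    # Treat runs of ., ! or ? as sentence boundaries (a rough approximation).
    sentences = [s for s in re.split(r"[.!?]+", prompt) if s.strip()]
    return {
        "words": len(words),
        "sentences": len(sentences),
        "too_long": len(words) > max_words,
    }


example = (
    "Slow tracking shot of a vintage red bicycle leaning against a stone wall "
    "as autumn leaves fall gently around it, soft overcast natural lighting, "
    "nostalgic film photography aesthetic, 35mm grain, muted warm tones"
)
print(prompt_stats(example))
```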
Should I use commas or periods?
Commas for listing related attributes, periods for separating distinct concepts. Both work—consistency matters more than which you choose.
Why does the same prompt give different results?
Generation is stochastic: most video models start each run from different random noise, so the same prompt can produce different outputs. Generate 3-4 times and pick the best.
How do I get the same character in multiple shots?
Use an image-to-video workflow: generate the character image you want first, then animate it. For pure text-to-video, use extremely specific character descriptions and accept some variation.
Start engineering better prompts
Prompt engineering is a skill—it improves with practice. The fastest way to improve is to:
- Generate often — Volume teaches patterns
- Compare across models — See how interpretations differ
- Iterate systematically — Change one thing at a time
- Build your library — Save what works
aiVideo.fm gives you:
- 160+ models to test prompts across
- Side-by-side comparison to see which interprets best
- Fast generation for high-volume iteration
- Director Studio to sequence your best results
