There's a trick that fixes this, and almost nobody is using it: start frame + end frame.
You generate the first frame of your video. You generate the last frame. Then you hand both to a video model that supports keyframe interpolation, and it builds a coherent motion between the two anchors. The model doesn't drift, because it can't — it has to land on the end frame you gave it.
I used this technique to make a 10-second reel of a feminine cyborg bursting out of a computer monitor. Two keyframes, one video gen, one audio mux, one CTA overlay. About 10 minutes of work, about $0.80 in credits. Here's the full setup.
Watch it on YouTube Shorts: youtu.be/-3LRsLB32z8
The stack
- Higgsfield (image gen via MCP) — nano_banana for the keyframes
- Higgsfield seedance_2_0 — video model with start+end frame support
- Apify Instagram Scraper — to download the original audio from a reference reel
- ffmpeg — to mux audio onto the rendered video and overlay the CTA
- Playwright — to render the Nanoflow CTA pill as a transparent PNG
Total third-party cost: roughly $0.80 in Higgsfield credits (two stills + one 10s 1080p video at standard mode). Apify, ffmpeg, and Playwright are free.
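To make the ffmpeg item in the stack concrete, here is a minimal sketch of how the mux and CTA overlay could be combined in one ffmpeg call. It is not the exact command used for the reel. The file names, the overlay position, the margin, and the codec choices are all assumptions for illustration.

```python
from typing import List


def build_mux_command(video: str, audio: str, cta_png: str, out: str,
                      margin_bottom: int = 120) -> List[str]:
    """Build an ffmpeg argv that overlays a PNG and swaps in an audio track.

    Assumed layout: the CTA is centred horizontally and sits
    `margin_bottom` pixels above the bottom edge of the frame.
    """
    # W/H are the video's dimensions, w/h are the overlay image's dimensions.
    overlay = f"[0:v][2:v]overlay=(W-w)/2:H-h-{margin_bottom}[v]"
    return [
        "ffmpeg", "-y",
        "-i", video,      # input 0: rendered video
        "-i", audio,      # input 1: audio from the reference reel
        "-i", cta_png,    # input 2: transparent CTA image
        "-filter_complex", overlay,
        "-map", "[v]",    # use the composited video stream
        "-map", "1:a:0",  # use the first audio stream of input 1
        "-c:v", "libx264", "-pix_fmt", "yuv420p",
        "-c:a", "aac",
        "-shortest",      # stop when the shorter stream ends
        out,
    ]
```

Passing the returned list to `subprocess.run(cmd, check=True)` would run it, assuming ffmpeg is on your PATH. Building the argument list as a plain function keeps the command easy to inspect before you spend time on a render.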
Step 1: Generate the start frame
The start frame establishes everything: the character, the room, the framing, the lighting. Spend your prompt budget here.
For a cyborg-trapped-in-monitor shot, I gave nano_banana this prompt at 9:16:
Cinematic photoreal 9:16 vertical photograph, 35mm with shallow
depth of field, moody high-contrast lighting. Bottom foreground:
a cluttered home desk seen from a low slightly-tilted angle —
black mechanical keyboard with red-tinted keycaps, circuit-pattern
mousepad, coiled black cable. Center mid-ground: a wide ultrawide
monitor sitting on the desk, screen showing a feminine cyborg
trapped inside — slender chrome-plated armor with Nanoflow
pink-magenta-cyan gradient accents at shoulders and collar, long
silver-pink hair, a faint dark crown of thorn-vines around her
head, glossy slim visor, mysterious and rebellious. Background:
dim apartment, cyan and violet ambient light, thin LED light bar
above the screen, dark curtain. Right-edge foreground: a man's
index finger reaching in from off-frame, fingertip pressed flush
against the monitor glass at the cyborg's chest level. Ultra-
detailed, photographic realism, no text.
A few rules I follow with nano_banana:
- 9:16 explicitly. If you skip the aspect ratio param it defaults to 1:1 and you'll have to crop.
- Lock the prop list early. Describing the desk objects in concrete nouns (mechanical keyboard, coiled cable, circuit-pattern mousepad) gives the model anchors. Vague descriptions ("a cluttered desk") drift wildly between gens.
- Say "no text" in the prompt. Adding it cuts down on the burned-in captions and watermarks the model occasionally hallucinates.
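The three rules above are easy to forget on the tenth regeneration, so a small pre-flight check can help. The sketch below is a hypothetical helper, not part of Higgsfield or nano_banana: the function name, its arguments, and the warning strings are all made up for illustration.

```python
from typing import List, Optional, Sequence


def lint_start_prompt(prompt: str, aspect_ratio: Optional[str],
                      required_props: Sequence[str]) -> List[str]:
    """Return a list of warnings for a start-frame prompt (empty means OK)."""
    # Lowercase and collapse whitespace so hard-wrapped prompts still match.
    text = " ".join(prompt.lower().split())
    warnings = []

    # Rule 1: the aspect ratio must be set explicitly to 9:16.
    if aspect_ratio != "9:16":
        warnings.append(f"aspect ratio is {aspect_ratio!r}, expected '9:16'")

    # Rule 2: every prop you want locked must be named as a concrete noun.
    for prop in required_props:
        if prop.lower() not in text:
            warnings.append(f"prop not named in prompt: {prop}")

    # Rule 3: the prompt should say "no text".
    if "no text" not in text:
        warnings.append('prompt does not say "no text"')

    return warnings
```

For the cyborg prompt, `required_props` might be `["mechanical keyboard", "coiled black cable", "circuit-pattern mousepad"]`. An empty return list means all three rules pass.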
Step 2: Generate the end frame (using the start frame as a reference)
This is the step that locks identity. You're going to ask the model to render the same character in the same room with the same camera angle, but in a different state. The only way the model can preserve the character is if you feed the start frame in as a reference image and keep your prompt short.
For the end frame I used nano_banana_flash with the start frame attached as medias[] and this deliberately short prompt (about 90 words):
Same desk, same room, same camera angle and framing. The cyborg
has now burst out of the monitor and leans forward over the desk,
both hands gripping the wooden desk edge. Glass shards float
around the broken screen. Her eyes flare bright white-pink,
glowing intensely. The entire room is flooded in deep red light.
The monitor behind her is dark and faintly fractured, showing a
tiny recursive copy of the same scene. No human hand from
off-frame. Crown of thorns clearly visible. Same character
identity, hair, armor, gradient accents.
The rule that matters: describe what's changing, not what's the same. The reference image holds the identity. Re-describing the character's features in the prompt actively fights the reference and produces drift. "Same character identity" + "same room" + a short list of what's different is enough.
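One way to enforce that rule is to assemble the end-frame prompt from a list of changes and reject it if it grows too long. This is a rough sketch under stated assumptions: the 100-word default budget is arbitrary, and the request dict is a guess at a payload shape. Only the `medias` field name comes from the setup described here; `model`, `aspect_ratio`, and `prompt` are hypothetical keys, not a documented Higgsfield schema.

```python
from typing import Dict, Sequence

SAME_PREFIX = "Same desk, same room, same camera angle and framing."
SAME_SUFFIX = "Same character identity."


def build_end_frame_prompt(changes: Sequence[str], max_words: int = 100) -> str:
    """Wrap the 'what changed' sentences in two short 'same as reference' anchors."""
    cleaned = [c.strip() for c in changes if c.strip()]
    sentences = [SAME_PREFIX, *cleaned, SAME_SUFFIX]
    # Make sure every sentence ends with a full stop before joining.
    prompt = " ".join(s if s.endswith(".") else s + "." for s in sentences)
    n_words = len(prompt.split())
    if n_words > max_words:
        raise ValueError(f"prompt is {n_words} words, budget is {max_words}")
    return prompt


def build_end_frame_request(start_frame_ref: str,
                            changes: Sequence[str]) -> Dict[str, object]:
    """Assemble a hypothetical request dict with the start frame as reference."""
    return {
        "model": "nano_banana_flash",   # model named in this step
        "aspect_ratio": "9:16",         # assumed key name
        "medias": [start_frame_ref],    # start frame attached as the reference
        "prompt": build_end_frame_prompt(changes),
    }
```

The point of the design is that the builder never asks for hair, armor, or face details. The only free-form input is the list of changes, which nudges you toward describing what is different and letting the reference image carry the rest.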