Kling 3.0 Initial Test

The tests produced mixed results with the Kling Video 3.0 models, revealing strengths in character consistency but weaknesses in prompt adherence and shot continuity.

This is a test I ran on the Kling Video 3.0 and Video 3.0 Omni models. I've attached the start frames and character elements I used. Since this wasn't an exhaustive test, keep in mind that my hypotheses and assessments may be wrong.


Fail 01

(Video 3.0) [Start Frame + Element (Character), 720p]:

Multishot failed in this example. The output was a single continuous take that followed only the first instruction: a rotating camera pushing in.

Prompt Fail 01

The scene opens with an extreme high-angle, birds-eye view of the figure lying motionless on the brutalist concrete ground, bathed in diffuse, stark light. The camera begins a slow, descending crane shot combined with a gentle counter-clockwise rotation, gradually pushing in towards the central figure. A low, resonant hum of distant ventilation systems subtly permeates the profound, almost suffocating silence.
At the 00:02 mark, a quick cut transitions to a tight close-up on the figure’s face. Their eyelids flicker, then eyes snap wide open with a sudden, sharp gasp.
[SFX: Sudden, sharp intake of breath].
Immediately, the camera executes a smooth reverse tracking shot, pulling back from the figure’s face. The figure slowly and laboriously pushes up from a prone position into a seated, upright stance, legs still bent, revealing a dazed, confused expression. They raise a hand to their temple, rubbing it gently as if fighting dizziness.
[Character: Disoriented Figure, strained, raspy voice]: “Where am I…?” [SFX: Faint, ringing tinnitus fades].
The camera then cuts back to the extreme high-angle, God’s Eye view, slowly pushing in as the figure finishes their slow, deliberate rise to a fully standing position on the vast concrete expanse.
Finally, the camera executes a fluid circular tracking shot, settling on a cowboy shot of the figure. The camera slowly orbits around them, emphasizing their isolation against the towering, empty brutalist concrete structures. The wind whistles faintly through the concrete corridors, a chilling reminder of solitude.

Fail 02

(Video 3.0) [Start Frame + Element (Character), 1080p]:

Same prompt as Fail 01, and the same assessment: multishot failed, or prompt adherence was low.

Prompt Fail 02

The scene opens with an extreme high-angle, birds-eye view of the figure lying motionless on the brutalist concrete ground, bathed in diffuse, stark light. The camera begins a slow, descending crane shot combined with a gentle counter-clockwise rotation, gradually pushing in towards the central figure. A low, resonant hum of distant ventilation systems subtly permeates the profound, almost suffocating silence.
At the 00:02 mark, a quick cut transitions to a tight close-up on the figure’s face. Their eyelids flicker, then eyes snap wide open with a sudden, sharp gasp.
[SFX: Sudden, sharp intake of breath].
Immediately, the camera executes a smooth reverse tracking shot, pulling back from the figure’s face. The figure slowly and laboriously pushes up from a prone position into a seated, upright stance, legs still bent, revealing a dazed, confused expression. They raise a hand to their temple, rubbing it gently as if fighting dizziness.
[Character: Disoriented Figure, strained, raspy voice]: “Where am I…?” [SFX: Faint, ringing tinnitus fades].
The camera then cuts back to the extreme high-angle, God’s Eye view, slowly pushing in as the figure finishes their slow, deliberate rise to a fully standing position on the vast concrete expanse.
Finally, the camera executes a fluid circular tracking shot, settling on a cowboy shot of the figure. The camera slowly orbits around them, emphasizing their isolation against the towering, empty brutalist concrete structures. The wind whistles faintly through the concrete corridors, a chilling reminder of solitude.

Fail 03

(Video 3.0) [Start Frame + Element (Character), 1080p]:

A similar prompt to Fail 01/02, but using custom multishot. Same failed result. After three tries, I suspect that binding a Character element is the main cause of the low prompt adherence.

Prompt Fail 03

Shot 1 4s: An extreme high-angle, God’s Eye view shot of the figure lying prone on the brutalist concrete. The camera slowly pushes in while performing a subtle counter-clockwise rotation, descending towards the figure. The scene is quiet.
Shot 2 2s: A rapid cut to a close-up on the figure’s face as their eyes snap open with a sharp gasp.
Shot 3 5s: The camera pulls back in a slow reverse tracking shot as the figure slowly pushes up to a seated, upright position, clutching their head. [Character: Figure, confused voice]: “Where am I?”
Shot 4 2s: A quick cut back to the initial extreme high-angle, God’s Eye view, slowly zooming in as the figure slowly rises to a standing position.
Shot 5 2s: A cowboy shot where the camera slowly orbits the now-standing figure, revealing the empty brutalist buildings around them.

Success 04

(Video 3.0 Omni) [Start Frame + Character, 720p]:

This experiment suggests that Video 3.0 Omni adapts its editing and pacing to the instructions provided. The clothing isn't always consistent, but I think adding specific clothing descriptions to the prompt would fix that. The cinematic quality and prompt adherence match my vision well. It isn't 100% there yet, but I'm very happy with the outcome.

Note: *The image was not pinned as the start frame.

Prompt Success 04

[Image*] [Character] A disorienting cinematic sequence opens with an extreme high-angle crane shot, slowly pushing in on a lone figure lying on desolate brutalist concrete, the frame subtly rotating amidst a silent, echoing environment with a faint, distant hum. At the 2-second mark, a quick cut reveals a tight close-up as the character’s eyes snap open with an abrupt, desperate gasp, their head still on the cold ground. Immediately, a slow reverse tracking shot pulls back as the character disorientedly pushes into a seated position, clutching their head, uttering “Where am I?” in a weak, confused voice. The scene then cuts to an extreme high-angle wide shot, mirroring the opening, as the character slowly rises to their feet. Finally, a medium-long cowboy shot executes a slow, sweeping arc around the now-standing character, starkly revealing the vast, isolating brutalist architecture, as the oppressive silence returns to amplify their profound disorientation.

Success 05

(Video 3.0) [Start Frame + Element (Character), 720p]:

As I suspected, binding elements pushes the model toward a single continuous take. In this test, the character matches its element well, which is a strength of Kling 3.0. However, it also supports my hypothesis that prompt adherence drops when you bind elements.

Prompt Success 05

A high-intensity one-shot tracking take. The camera starts in a Medium Shot, perfectly synchronized to the man’s sprinting pace with a raw handheld shake. [Audio: Deep bass rumble of a collapsing building, crackling fire, and heavy, rapid boots].
As he runs, he begins to wheeze with exhaustion. [Character: The Survivor, raspy trembling voice]: “Come on… just a little further!”
Immediately, a massive steel girder slams into the ground in front of him with a deafening [Sound Effect: Metallic crash]. The camera orbits 180 degrees around his side as he performs a desperate sliding maneuver to avoid the debris. His face is illuminated by the orange glow of the fire.
Without cutting, the camera pulls back (Dolly-out) into a Wide Shot, revealing the scale of the collapsing warehouse behind him as he reaches the blinding light of the exit. The soundscape transitions from the roar of the fire to the sudden, muffled silence of the outdoors.

Good 06

(Video 3.0 Omni) [Image + Character, 720p]:

I’ve categorized this as “good” because, although the result didn’t match the prompt at all, the output is great and worth keeping. I suspect the phrase “15-second cinematic” pushed the model into generating a continuous shot (note that this was the Omni version). I have to say the character consistency is very impressive. Also, keep in mind that the character’s face is usually not very clear in the first-frame image.

Prompt Good 06

A 15-second cinematic narrative sequence in @Image. Shot 1: the camera slowly dollies to the right. ro @Robert man leans against the pillar, his thumb brushing against the paper he holds. In the background, the train emits a steady hiss of white steam, and the crowd moves with a blurred, busy rhythm. The audio is a rich soundscape of station murmurs, the distant clanking of tracks, and the heavy thrum of the idling locomotive. Shot 2: The camera cuts to a tight close-up of the man’s face. He squint-eyes at the paper, his expression shifting from confusion to a subtle, weary smile. He exhales a visible breath and mutters softly to himself, “Finally, the right track.” The audio focuses on his low, gravelly voice and the rustle of the paper against his jacket. Shot 3: The camera cuts to a low-angle medium shot. A local man carrying a large cardboard box walks briskly past the protagonist, momentarily obscuring the frame. As the passerby clears, the protagonist pushes himself off the pillar, picking up the heavy tan backpack at his feet. The audio features the heavy ‘thump’ of the backpack being lifted and the rapid footsteps of the crowd on the concrete. Shot 4: The camera pulls back into an ultra-wide tracking shot from behind the man. The train lets out a deafening, high-pitched steam whistle. The man begins walking toward the open train door.

Fail 07

(Video 3.0) [Start Frame + Element (Character), 720p]:

There’s a lot of hallucination here. On top of producing a continuous shot, the model generates dialogue and sounds that have nothing to do with the prompt. The lip sync is also broken: the mouth movements don’t match the audio.

Prompt Fail 07

Shot 1 3s: The scene opens in a medium-wide shot; the camera slowly dollies to the right. The Western man leans against the pillar, his thumb brushing against the paper he holds. In the background, the train emits a steady hiss of white steam, and the crowd moves with a blurred, busy rhythm. The audio is a rich soundscape of station murmurs, the distant clanking of tracks, and the heavy thrum of the idling locomotive.
Shot 2 3s: The camera cuts to a tight close-up of the man’s face. He squint-eyes at the paper, his expression shifting from confusion to a subtle, weary smile. He exhales a visible breath and mutters softly to himself, “Finally, the right track.” The audio focuses on his low, gravelly voice and the rustle of the paper against his jacket.
Shot 3 2s: The camera cuts to a low-angle medium shot. A local man carrying a large cardboard box walks briskly past the protagonist, momentarily obscuring the frame. As the passerby clears, the protagonist pushes himself off the pillar, picking up the heavy tan backpack at his feet. The audio features the heavy ‘thump’ of the backpack being lifted and the rapid footsteps of the crowd on the concrete.
Shot 4 2s: The camera pulls back into an ultra-wide tracking shot from behind the man. The train lets out a deafening, high-pitched steam whistle. The man begins walking toward the open train door, his figure silhouetted against the golden sunset light. The screen fades slightly as the train starts to chug forward with a rhythmic metallic grinding sound, leaving the pillar and the empty bench in the foreground.

Success 08

(Video 3.0) [Start Frame, 720p]:

I’m impressed with the result because it followed the prompt exactly. It seems my hypothesis was correct. Skipping character binding and using a narrative-style prompt with “then” and “Finally” worked well. The output is cinematic; the downside is that, without the binding feature, the character wasn’t consistent.

Prompt Success 08

Steady tracking shot maintaining a consistent distance as the tactical man runs toward the camera. he glances over his shoulder with a determined yet panicked expression. the camera then smoothly transitions into a god’s eye view, making the character look small against the vast, rain-slicked urban landscape. Finally, the scene cuts to a medium close-up of his face as he continues to run. Audio features the rhythmic metallic clinking of his tactical vest and his ragged, heavy breathing.


In my opinion, Kling Video 3.0 and Video 3.0 Omni represent a major step forward. The key features are character consistency through element binding and the multishot capability. I can’t give a detailed review of the audio yet because my projects usually use a narrator instead of dialogue, but the foley and ambience are great so far. Keep in mind that this wasn’t a thorough test. With limited credits and time, this is all the feedback I can provide for now, but I hope it’s useful.
