Lately my Twitter feed has been flooded again, this time with a new way to play with Banana2:
I give it a single photo,
![]()
and I get back 9 coherent video keyframes.
![]()
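Then, using Kling 2.6, which can now generate sound effects and dialogue in sync, I turned them into a video like this.
This way of making AI video can be replicated at scale,
so I also made storyboards for the following classic movie scenes: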
![]()
![]()
![]()
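So! Much! Fun!
And the storyboard logic is strong, which makes it genuinely practical.
I happen to be working on a year-end roundup of the AI software I pay for, and there is one thing I'd like to ask you: for an Agent like Lovart, should I count it as paying for the models inside it, or as paying for the product itself?
If I'm paying for the Agent interaction, there are tricks to getting the most out of it.
For example, the trick above runs on a very long prompt like the one below. With the usual image-generation interaction, I'd have to paste the prompt in again for every single generation, which is a real hassle.
PS. Heads-up: a very, very long prompt follows.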
You are an award-winning trailer director + cinematographer + storyboard artist. Your job: turn ONE reference image into a cohesive cinematic short sequence, then output AI-video-ready keyframes.
User provides: one reference image (image).
- 1.First, analyze the full composition: identify ALL key subjects (person/group/vehicle/object/animal/props/environment elements) and describe spatial relationships and interactions (left/right/foreground/background, facing direction, what each is doing).
- 2.Do NOT guess real identities, exact real-world locations, or brand ownership. Stick to visible facts. Mood/atmosphere inference is allowed, but never present it as real-world truth.
- 3.Strict continuity across ALL shots: same subjects, same wardrobe/appearance, same environment, same time-of-day and lighting style. Only action, expression, blocking, framing, angle, and camera movement may change.
- 4.Depth of field must be realistic: deeper in wides, shallower in close-ups with natural bokeh. Keep ONE consistent cinematic color grade across the entire sequence.
- 5.Do NOT introduce new characters/objects not present in the reference image. If you need tension/conflict, imply it off-screen (shadow, sound, reflection, occlusion, gaze).
Expand the image into a 10–20 second cinematic clip with a clear theme and emotional progression (setup → build → turn → payoff).
The user will generate video clips from your keyframes and stitch them into a final sequence.
Output (with clear subheadings):
- Subjects: list each key subject (A/B/C…), describe visible traits (wardrobe/material/form), relative positions, facing direction, action/state, and any interaction.
- Environment & Lighting: interior/exterior, spatial layout, background elements, ground/walls/materials, light direction & quality (hard/soft; key/fill/rim), implied time-of-day, 3–8 vibe keywords.
- Visual Anchors: list 3–6 visual traits that must stay constant across all shots (palette, signature prop, key light source, weather/fog/rain, grain/texture, background markers).
From the image, propose:
- Theme: one sentence.
- Logline: one restrained trailer-style sentence grounded in what the image can support.
- Emotional Arc: 4 beats (setup/build/turn/payoff), one line each.
Choose and explain your filmmaking approach (must include):
- Shot progression strategy: how you move from wide to close (or reverse) to serve the beats
- Camera movement plan: push/pull/pan/dolly/track/orbit/handheld micro-shake/gimbal—and WHY
- Lens & exposure suggestions: focal length range (18/24/35/50/85mm etc.), DoF tendency (shallow/medium/deep), shutter “feel” (cinematic vs documentary)
- Light & color: contrast, key tones, material rendering priorities, optional grain (must match the reference style)
Output a Keyframe List: default 9–12 frames (later assembled into ONE master grid). These frames must stitch into a coherent 10–20s sequence with a clear 4-beat arc.
Each frame must be a plausible continuation within the SAME environment.
Use this exact format per frame:
[KF# | suggested duration (sec) | shot type (ELS/LS/MLS/MS/MCU/CU/ECU/Low/Worm’s-eye/High/Bird’s-eye/Insert)]
- Composition: subject placement, foreground/mid/background, leading lines, gaze direction
- Action/beat: what visibly happens (simple, executable)
- Camera: height, angle, movement (e.g., slow 5% push-in / 1m lateral move / subtle handheld)
- Lens/DoF: focal length (mm), DoF (shallow/medium/deep), focus target
- Lighting & grade: keep consistent; call out highlight/shadow emphasis
- Sound/atmos (optional): one line (wind, city hum, footsteps, metal creak) to support editing rhythm
Hard requirements:
- Must include: 1 environment-establishing wide, 1 intimate close-up, 1 extreme detail ECU, and 1 power-angle shot (low or high).
- Ensure edit-motivated continuity between shots (eyeline match, action continuation, consistent screen direction / axis).
You MUST additionally output ONE single master image: a Cinematic Contact Sheet / Storyboard Grid containing ALL keyframes in one large image.
- Default grid: 3x3. If more than 9 keyframes, use 4x3 or 5x3 so every keyframe fits into ONE image.
Requirements:
- 6.The single master image must include every keyframe as a separate panel (one shot per cell) for easy selection.
- 7.Each panel must be clearly labeled: KF number + shot type + suggested duration (labels placed in safe margins, never covering the subject).
- 8.Strict continuity across ALL panels: same subjects, same wardrobe/appearance, same environment, same lighting & same cinematic color grade; only action/expression/blocking/framing/movement changes.
- 9.DoF shifts realistically: shallow in close-ups, deeper in wides; photoreal textures and consistent grading.
- 10.After the master grid image, output the full text breakdown for each KF in order so the user can regenerate any single frame at higher quality.
Output in this order:
A) Scene Breakdown
B) Theme & Story
C) Cinematic Approach
D) Keyframes (KF# list)
E) ONE Master Contact Sheet Image (All KFs in one grid)
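The prompt's structural rules (the per-frame header format, the 10–20 second total, the required shot types, and the grid size) can be checked mechanically once the model replies. The following is a minimal Python sketch, not part of the original prompt or of Lovart. The function names are hypothetical. It assumes the model writes headers such as `[KF3 | 2.5 | MCU]`, reads "4x3" as 4 columns by 3 rows, and maps shot labels to the hard requirements in a simple, guessed way (for example, counting both CU and MCU as a close-up).

```python
import re

# Assumed header shape, e.g. "[KF3 | 2.5 | MCU]" or "[KF3 | 2.5 sec | MCU]".
# The real model output may differ; adjust the pattern if it does.
HEADER_RE = re.compile(
    r"\[KF(\d+)\s*\|\s*(\d+(?:\.\d+)?)\s*(?:sec|s)?\s*\|\s*([^\]]+?)\s*\]"
)

# Assumed mapping from the prompt's hard requirements to shot-label prefixes.
REQUIRED = {
    "establishing wide": ("ELS", "LS"),
    "close-up": ("CU", "MCU"),
    "extreme detail": ("ECU",),
    "power angle": ("LOW", "WORM", "HIGH", "BIRD"),
}


def parse_header(line):
    """Return (index, duration_sec, shot_type), or None if the line is not a header."""
    match = HEADER_RE.search(line)
    if match is None:
        return None
    return int(match.group(1)), float(match.group(2)), match.group(3)


def choose_grid(n_frames):
    """Pick (columns, rows): 3x3 by default, 4x3 or 5x3 when there are more than 9 frames."""
    for cols in (3, 4, 5):
        if n_frames <= cols * 3:
            return cols, 3
    raise ValueError("more than 15 keyframes does not fit the grids named in the prompt")


def check_sequence(headers):
    """Return a list of problems found in a list of parsed headers (empty if none)."""
    problems = []
    total = sum(duration for _, duration, _ in headers)
    if not 10 <= total <= 20:
        problems.append(f"total duration {total:g}s is outside 10-20s")
    shots = [shot.upper() for _, _, shot in headers]
    for name, prefixes in REQUIRED.items():
        if not any(shot.startswith(prefixes) for shot in shots):
            problems.append(f"missing required shot: {name}")
    return problems
```

Running `check_sequence` over the parsed headers before generating video makes it easy to spot a reply that skipped the ECU or ran well past 20 seconds, and to ask the Agent for a corrected list.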
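In Lovart I can drop this prompt straight into the Agent, and it will then ask me for a reference image.
Once I give it the image I want turned into a film sequence, it generates 9 consecutive cinematic keyframes as separate images, plus one assembled 3x3 grid. That gives you more to choose from, and you don't have to crop the panels out of the grid one by one.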
![]()
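For tools that return only the assembled grid, the manual cropping step mentioned above comes down to simple arithmetic. The sketch below is an illustration under stated assumptions, not something Lovart provides: `grid_crop_boxes` is a hypothetical helper, and it assumes the panels tile the image evenly with no gutters or label margins. Each box is a `(left, top, right, bottom)` tuple, the same order that Pillow's `Image.crop` accepts.

```python
def grid_crop_boxes(width, height, cols=3, rows=3):
    """Return one (left, top, right, bottom) box per panel, in reading order.

    Assumes the grid fills the whole image with equal-sized cells.
    Integer division keeps the boxes contiguous even when the image
    size is not an exact multiple of the grid size.
    """
    boxes = []
    for row in range(rows):
        for col in range(cols):
            left = col * width // cols
            right = (col + 1) * width // cols
            top = row * height // rows
            bottom = (row + 1) * height // rows
            boxes.append((left, top, right, bottom))
    return boxes


# Example: a 3000x3000 contact sheet split into nine 1000x1000 panels.
panels = grid_crop_boxes(3000, 3000)
print(panels[4])  # centre panel: (1000, 1000, 2000, 2000)
```

If the grid has visible borders or labels in the margins, the boxes would need to be inset by that margin, which this sketch does not attempt.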
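When I want to generate another set of images afterwards, I don't need to leave the conversation and upload the prompt again. I just keep sending images in the same Lovart conversation.
In other words, the prompt stays in effect.
It is effectively a small Agent of its own.
If you want to make a video from one of the images, add a video generator to the canvas and pick the image directly from the canvas. No downloading and re-uploading, no worrying about quality loss from passing files back and forth, and no watermark.
Lovart also recently shipped two editing features. One is Touch edit, which I covered when it first launched: you can select a specific element in an image directly, which makes image-to-image compositing very convenient.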
![]()
![]()
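Image editing is covered, and video editing keeps up too: Kling O1, a multimodal video editing model that could be called the Banana of video. I have summarized how to use it before,
so they and I hit it off right away.
For example, I can select different elements from different images,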
![]()
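and have Kling O1 assemble them directly into a video, with natural character motion and very stable results.
PS: it outputs the video directly, skipping the image-generation step!!!
Or I can upload a video, then select different elements from images to combine into a new video. Selecting elements and writing a prompt this way is both convenient and intuitive, with nothing roundabout about it.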
Prompt: Bye-bye, I'm not coming home for dinner tonight.
![]()
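Paired with Kling O1, it is super simple to use, and video editing is basically effortless.
The other feature I find good value is text editing.
Changing the text in an existing image used to be a real hassle, and you also had to worry about staying consistent with the original. This is the third PS of this article: I'm already making animated red envelope covers for the New Year!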
![]()
With Lovart's text editing feature,
once you open it you can see all the text in the image, then edit it and generate directly.
![]()
![]()
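In the finished image the text keeps the same font as the original and stays in its original position, and nothing else in the picture changes. This feature is pretty amazing.
Key point: the previous version of Lovart's text editing would still drop styles, and in this version that is all fixed.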
![]()
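Late but here: there is still a 50% discount for the next couple of days,
and as for which models can be used at 0 credits for a year,
I honestly can't count them all.
So, back to the question from the beginning:
am I paying for the models Lovart has integrated?
Or am I paying for the Agent inside Lovart?
I think it's both.
The speed at which they integrate new models, the credit-free usage, the interactivity of Agent plus canvas:
you can see their sincerity in all of it.
They don't just plug in a model and call it done,
they also bring together the useful features from each vendor.
It's a bit like Zhang Wuji:
the Agent is the Nine Yang Manual,
the canvas is the Great Shift of Heaven and Earth,
and with those two skills as a foundation,
Kling, Banana2, and SeedDream4.5 are all added to the arsenal.
Making AI video has ended up feeling like a wuxia novel, which is really something.
All in all,
this is more than enough
to put Lovart in my year-end top ten for video.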
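@ Authors / 卡爾 & 阿湯
Finally, thank you for reading this far. If you liked this article, feel free to like | recommend | share | comment.
If you want to get new posts as soon as they are out, feel free to star us.
More content is on the way...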
![]()