Grok Imagine v2

One prompt, every dimension. Combine text, images, clips, and audio into cinematic AI video in one studio.

Featured Workflow

Lead with Grok Imagine

Keep Grok front and center, then decide whether this run starts from a prompt or from reference imagery.

Grok Input Mode

Prompt (up to 5,000 characters)

Resolution

Duration: 8s (range 5s to 60s)

Aspect Ratio

Cost: 6 credits

Loved by Creators

Creators, marketers, and filmmakers use Grok Imagine v2 every day.

We replaced three separate tools with one workflow and cut turnaround time for ad tests dramatically.

Marcus Rivera

Video Producer

The reference-image workflow makes it much easier to keep a campaign visually consistent across many short videos.

Priya Sharma

Social Media Manager

I can move from idea to polished clip in minutes, and that changes how often I publish.

Sarah Chen

Content Creator

Fast model switching lets me test multiple aesthetics before I commit to a final direction for a song.

Alex Turner

Music Video Director

The motion quality is strong enough for storyboard exploration, style tests, and quick concept proofs.

Jessica Liu

Animation Director

The studio feels practical instead of experimental. My team can brief, generate, review, and iterate in one place.

Tom Brennan

Creative Director

It is a useful teaching tool because students can compare how different models interpret the same shot description.

Dr. Linda Park

Film Professor

Reference-driven generation helps preserve facial feel and staging better than most AI video tools I have tried.

Robert Chen

Character Animator

For mood pieces and lyric visuals, the speed is good enough that I can prototype ideas while the track is still evolving.

Maya Okonkwo

Music Artist

It is great for previz. I can explore shot rhythm, camera energy, and lighting direction before a full production pass.

Daniel Yamamoto

VFX Artist

Keeping motion readable matters in my niche, and the generated clips stay clear enough for product and coaching content.

Amanda Foster

Fitness Creator

I use it to test styling concepts, campaign moods, and editorial pacing before we book a full shoot.

Zara Williams

Fashion Creative

Frequently Asked Questions

Everything you need to know before you generate.

You can switch between leading text-to-video and image-to-video models, including Grok Imagine, Kling, Seedance, Sora, Veo, Wan, Runway, and more.

Reference images help lock composition, characters, styling, and overall visual direction so the generated video stays closer to your intent.

Generated videos are exported as MP4. You can choose 480p, 720p, or 1080p and common aspect ratios such as 16:9, 9:16, 4:3, and 1:1.
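
To make the resolution and aspect ratio options concrete, here is a minimal sketch of how a resolution label and a ratio could map to pixel dimensions. It assumes the "p" value is the length of the shorter side, which is a common convention but is not confirmed for this studio. The `frame_size` function and the `SHORT_SIDE` table are hypothetical names used only for illustration, and the actual exported dimensions may differ by model.

```python
from fractions import Fraction

# Assumption: the "p" value is the length of the shorter side in pixels.
SHORT_SIDE = {"480p": 480, "720p": 720, "1080p": 1080}


def frame_size(resolution: str, aspect_ratio: str) -> tuple[int, int]:
    """Return (width, height) in pixels for a resolution label and a W:H ratio."""
    short = SHORT_SIDE[resolution]
    w, h = (int(part) for part in aspect_ratio.split(":"))
    ratio = Fraction(w, h)
    if ratio >= 1:
        # Landscape or square: the height is the short side.
        width, height = round(short * ratio), short
    else:
        # Portrait: the width is the short side.
        width, height = short, round(short / ratio)
    # Bump odd values up to the next even number, since H.264 encoding
    # with 4:2:0 chroma subsampling needs even dimensions.
    return width + width % 2, height + height % 2


for ratio in ("16:9", "9:16", "4:3", "1:1"):
    print(ratio, frame_size("720p", ratio))
```

Under this assumption, a 9:16 clip at 720p is 720 pixels wide and 1280 pixels tall, the portrait mirror of the 16:9 case.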

Most jobs finish within a few minutes. Timing depends on the selected model, the target duration, queue load, and output quality.

Credits are charged by task type. Text-to-video costs fewer credits than image-to-video, and your remaining balance updates after each job.

In general, yes, you can use generated videos commercially. You should still review the policy of the underlying model provider before using outputs in ads, client work, or commercial campaigns.