
Two AI Video Generation Models in 1 Week

PLUS: Google DeepMind's video-to-audio, 4D scene reconstruction AI model

Today’s top AI Highlights:

  1. Runway’s latest AI model generates videos more realistic than ever

  2. Luma Labs announces new features in Dream Machine

  3. Google DeepMind’s technology breathes life into AI-generated silent videos

  4. Wayve’s new AI model for 4D scenes to train self-driving cars

& so much more!

Read time: 3 mins

Latest Developments 🌍

Runway has released Gen-3 Alpha, a new AI model for generating high-quality videos. Gen-3 can create hyperrealistic videos, up to 10 seconds long, with vivid expressions. The model is the first in a series trained on new infrastructure built for large-scale multimodal training, and it improves significantly on Gen-2 in fidelity, consistency, and motion.

Key Highlights:

  1. Improved temporal control: Gen-3 Alpha was trained on highly descriptive, temporally dense captions, enabling smooth transitions and precise adjustment of elements within the scene.

  2. Photorealistic humans: The model generates expressive human characters with a wide range of actions, gestures, and emotions, opening up new possibilities for storytelling.

  3. Comparison and Availability: The samples may well have been cherry-picked by Runway, but when we ran the same prompts through Luma Labs' Dream Machine, the difference in quality and realism was significant. However, Dream Machine is free to try, while Runway has not yet made Gen-3 publicly available.

Prompt: Dragon-toucan walking through the Serengeti.

Gen-3 Output

Dream Machine Output

Luma Labs is fiercely competing with Runway in AI video generation. They have released a new Extend feature in Dream Machine that lengthens AI-generated videos from 5 seconds to more than 10 seconds. They also gave a sneak peek at upcoming editing features that promise greater control and a more intuitive creative experience.

As we can see above, AI video generation models are advancing every day, bringing our imaginations to reality. But most of these models generate silent videos. Without audio, these videos lack the emotional depth, richness of storytelling, and immersive experience that sound provides.

Google DeepMind is developing a new technology called V2A (video-to-audio) to fill this gap by generating synchronized soundtracks for silent videos. V2A uses video pixels and text prompts to create realistic sound effects, dialogue, and music that match the scene.

Key Highlights:

  1. Flexibility and Control: V2A can generate an unlimited number of soundtracks for any video input. Users can also use positive and negative prompts to guide the generated audio toward or away from specific sounds, giving them more creative control.

  2. Diffusion-Based Approach: V2A uses a diffusion-based approach to audio generation, which is more effective than autoregressive models at synchronizing video and audio. Generation starts with random noise and iteratively refines the audio until it closely aligns with the visual input and prompts (a toy sketch of this loop follows the prompt example below).

  3. Potential for Creative Applications: V2A can be used to create soundtracks for a wide range of videos, including generated movies, archival footage, and silent films. When paired with AI video generation models, this opens up new possibilities for filmmakers and other creatives to bring their visions to life.

Prompt for Audio: A drummer on a stage at a concert surrounded by flashing lights and a cheering crowd.
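To make the diffusion idea above concrete, here is a minimal, hypothetical Python sketch of the reverse-diffusion loop. DeepMind has not released V2A code, so everything here (the generate_audio function, the denoiser callable, and the conditioning arguments) is an assumption for illustration only: the loop starts from random noise and repeatedly subtracts the noise a conditioned model predicts.

```python
# Hypothetical sketch of the V2A diffusion idea (not DeepMind's actual code):
# start from pure noise and iteratively refine it into audio, conditioned on
# video features and a text-prompt embedding.
import torch

def generate_audio(denoiser, video_features, prompt_embedding,
                   steps=50, audio_shape=(1, 16000)):
    """Toy reverse-diffusion loop. `denoiser` is any callable that predicts
    the noise present in `noisy_audio` given the conditioning signals."""
    noisy_audio = torch.randn(audio_shape)            # start from random noise
    for t in reversed(range(steps)):                  # walk the noise level down
        t_frac = torch.full((audio_shape[0],), t / steps)
        predicted_noise = denoiser(noisy_audio, t_frac, video_features, prompt_embedding)
        noisy_audio = noisy_audio - predicted_noise / steps  # remove a little noise each step
    return noisy_audio                                # waveform aligned to the video

# Placeholder denoiser just to make the sketch executable; a real model would be
# a large network trained on video-audio pairs (and negative prompts would steer
# it away from unwanted sounds).
dummy_denoiser = lambda audio, t, video, prompt: 0.1 * audio
waveform = generate_audio(dummy_denoiser, video_features=None, prompt_embedding=None)
print(waveform.shape)  # torch.Size([1, 16000])
```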

London-based Wayve is pioneering technology to train and develop reliable autonomous driving. They have released a new scene reconstruction model called PRISM-1 that creates 4D simulations of real-world environments, combining 3D spatial information with time, which is crucial for training and testing autonomous driving systems. PRISM-1 is specifically designed to handle complex, dynamic urban scenes, capturing the movements of vehicles, pedestrians, and other objects.

Key Highlights:

  1. Self-Supervised Disentanglement: PRISM-1 can separate static and dynamic elements within a scene without needing explicit labels, making it adaptable to different camera setups. This means more natural and intuitive scene representation without complex annotations.

  2. Flexible Framework: PRISM-1 handles the varied elements found in urban environments, such as vehicles with details like brake lights and windshield wipers, cyclists, pedestrians, traffic lights, and even transient details like roadside debris and changing light conditions, creating a highly comprehensive simulation experience.

  3. Scalable Representation: PRISM-1 is efficient even with complex scenes, minimizing engineering effort while reducing error propagation. This ensures that the model scales well as urban environments become more intricate.

  4. Novel View Synthesis: PRISM-1 can reconstruct scenes from viewpoints that were never captured in the original sensor data. This is crucial for simulating and testing safety-critical scenarios.

😍 Enjoying it so far? Share it with your friends!

Tools of the Trade ⚒️

  1. Zenes AI: AI tool for automating software testing. It generates test cases from your product documentation, converts them into user stories, and creates test scripts, saving you time and effort.

  2. SuperKalam: AI mentor to give you personalized guidance and resources for UPSC exam preparation, including daily news analysis, editorial summaries, quizzes, and practice MCQs. It also gives instant evaluation of handwritten Mains answers with detailed feedback and model answers.

  3. OptimusFlow: Integrate AI into your app quickly via API and automate tasks using an intuitive no-code platform. You can upload documents to create AI assistants, automate workflows, and configure complex actions with non-linear decision-making.

  4. Awesome LLM Apps: Build awesome LLM apps using RAG for interacting with data sources like GitHub, Gmail, PDFs, and YouTube videos through simple text prompts. These apps let you retrieve information, chat, and extract insights directly from content on these platforms (a generic sketch of the RAG pattern follows this list).
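For readers new to the pattern, here is a minimal, generic RAG sketch in Python. It is not code from the Awesome LLM Apps repo: the retrieve and answer functions, the crude similarity heuristic, and the placeholder llm callable are all assumptions standing in for a real embedding index and LLM client. The idea is simply to retrieve the most relevant chunks and hand them to the model alongside the question.

```python
# Generic RAG sketch (illustrative only, not the Awesome LLM Apps code):
# 1) retrieve the chunks most relevant to the question, 2) build a prompt
# that contains only that context, 3) ask the LLM to answer from it.
from difflib import SequenceMatcher

def retrieve(question, chunks, k=2):
    """Rank chunks by rough textual similarity to the question; a real app
    would use an embedding model and a vector store instead."""
    score = lambda c: SequenceMatcher(None, question.lower(), c.lower()).ratio()
    return sorted(chunks, key=score, reverse=True)[:k]

def answer(question, chunks, llm):
    """`llm` is any callable mapping a prompt string to a completion string."""
    context = "\n".join(retrieve(question, chunks))
    prompt = f"Answer using only this context:\n{context}\n\nQuestion: {question}"
    return llm(prompt)

# Usage with a placeholder model; swap in a real LLM client in practice.
docs = ["The PDF's appendix covers quarterly revenue.",
        "The README explains installation with pip."]
print(answer("How do I install it?", docs,
             llm=lambda p: f"[model answer based on {len(p)}-char prompt]"))
```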

Hot Takes 🔥

  1. A lot of folks are negative on the application layer of AI. I keep hearing that the "LLM wrapper applications" will get disrupted by the LLMs. I think you could argue that back in 2006, when HubSpot was founded and SaaS was exploding, all of us were "database wrapper applications." The industry blossomed by adding data, workflow, and deep domain expertise on top of those databases. I think LLM wrapper application companies will thrive by adding data, workflow, and deep domain expertise on top of the LLMs. ~Brian Halligan

  2. It bothers me that schools—as well as parents—rarely tell kids about careers with uncapped upside, like tech or real estate. And instead, kids end up entering college thinking the best outcome is that they become a lawyer, a consultant or—even worse—thinking it would be cool to get a job at a government agency. ~Nikita Bier

Meme of the Day 🤡

Video Credits: Eduardo Ordax

That’s all for today! See you tomorrow with more such AI-filled content.

Real-time AI Updates 🚨

⚡️ Follow me on Twitter @Saboo_Shubham for lightning-fast AI updates and never miss what’s trending!

PS: I curate this AI newsletter every day for FREE; your support is what keeps me going. If you find value in what you read, share it with your friends by clicking the share button below!
