Era of Hyper-Real AI Videos is here 🤯
PLUS: Boston Dynamics’ unreal robot flex, Mistral AI releases details of 8x22B MoE
Today’s top AI Highlights:
Microsoft unveils AI that transforms portraits into real-time talking videos
Boston Dynamics replaces hydraulic Atlas robot with new electric model
OpenAI updates its Assistants API to handle 10,000 files per assistant
Mistral AI reveals details of new cost-effective, superior MoE model
Organize notes just from your voice with this new AI tool
& so much more!
Read time: 3 mins
Exciting Opportunity: Share how you use AI and get featured in Unwind AI! Details below.
Latest Developments 🌍
Portraits Come to Life with Lip Sync and Expressions 🙆♀️
Imagine a portrait coming to life and talking with natural expressions, just like a human! Microsoft’s VASA-1 can turn any portrait photo into a hyper-realistic talking head video from just a speech audio clip. It syncs lip movements with the spoken audio in real time while capturing subtle facial expressions and natural head movements. This technology opens doors for creating interactive virtual assistants, realistic digital characters for movies or games, and much more. It could even bring portraits of deceased artists or rulers back to life!
Key Highlights:
Facial Expressions and Movements: VASA-1 doesn’t just animate the lips to match the audio; it also generates realistic head movements, eye gaze, and a variety of emotional expressions, making the digital person incredibly lifelike.
Controllable Generation: You can adjust various aspects of the video, such as the direction of the eyes, perceived distance of the head, and even specific emotions like happiness or anger on the face.
Versatility: Whether you provide an artistic photo, a singing audio clip, or even speech in a language other than English, VASA-1 can handle it and create a realistic talking head video.
High-Quality Real-Time Performance: The model generates high-resolution (512x512) videos at an impressive speed of up to 40 frames per second, all running on a single NVIDIA RTX 4090 GPU. You get both stunning visuals and smooth, real-time performance.
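To put the real-time figure in perspective, here is a quick back-of-the-envelope calculation of what 40 frames per second at 512x512 means per frame. It is plain arithmetic on the numbers quoted above; the function names are made up for illustration and have nothing to do with VASA-1's actual code.

```python
def frame_budget_ms(fps: float) -> float:
    """Time available to generate a single frame, in milliseconds."""
    if fps <= 0:
        raise ValueError("fps must be positive")
    return 1000.0 / fps


def pixels_per_second(width: int, height: int, fps: float) -> float:
    """Raw number of pixels that must be produced every second."""
    return width * height * fps


# Figures quoted in the article: 512x512 video at up to 40 frames per second.
budget = frame_budget_ms(40)
throughput = pixels_per_second(512, 512, 40)

print(f"{budget:.1f} ms per frame")
print(f"{throughput / 1e6:.1f} million pixels per second")
```

In other words, the model has roughly 25 ms to produce each frame, which is why running on a single consumer GPU is notable.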
Humanoid Robot Masters Movements that Humans Can’t 🕺
It seems Boston Dynamics has realized that a humanoid robot built to closely match humans inherits the limitations of human anatomy. Last week, the company officially retired its hydraulic robot “Atlas.” It has now unveiled the next generation of Atlas, which is fully electric and designed for real-world applications.
This new Atlas boasts a sleeker design and even greater agility. In an amazing (if a little scary) demo, Atlas performs a series of impressive gymnastic and parkour-like movements. It executes a full 180-degree flip and shows remarkable whole-body coordination, balance, and strength, all more smoothly and quietly than the previous version.
Key Highlights:
Aesthetics & Design: The new Atlas sports a more streamlined and compact build. The absence of external hydraulic lines gives it a cleaner appearance, while the LiDAR sensor added to its head hints at enhanced navigation capabilities. It also features a new gripper to handle a variety of objects.
Movement & Agility: Atlas comes with more strength, a broader range of motion, and greater dexterity. It performs fluid parkour maneuvers and walks more smoothly. The most notable change is the way it turns around: while the old Atlas would jump to turn, the new version rotates its body parts 180 degrees for a faster turn, overcoming the limitations of human anatomy.
Under the Hood: The new Atlas benefits from Boston Dynamics’ advancements in software and AI. The company’s expertise in simulation, model predictive control, reinforcement learning, and computer vision ensures that Atlas can adapt and operate efficiently in complex real-world situations.
Why the Change? The switch from hydraulic to electric actuation brings several advantages. The new Atlas operates more quietly and is more energy-efficient, which suits it to a wider range of environments and longer operating times. The electric actuators also offer finer control, resulting in smoother and more precise movements.
Hyundai Partnership: Boston Dynamics is partnering with Hyundai to explore real-world applications for the new Atlas. Hyundai’s advanced manufacturing facilities will serve as a testing ground for the robot's capabilities.
OpenAI Introduces Major Updates to Assistants API ✨
OpenAI has rolled out a series of updates to its Assistants API, released as version 2 of the beta (selected with the OpenAI-Beta: assistants=v2 header). The update introduces a revamped file search tool that can manage up to 10,000 files per assistant, a leap from the earlier limit of 20 files. These changes improve functionality and ease of use for developers across various sectors.
Key Highlights:
Enhanced file search: The file_search capability has been expanded to handle up to 10,000 files per assistant. It also features faster, multi-threaded searches with advanced reranking and query rewriting for improved accuracy and speed in retrieval.
Vector Store Integration: With the introduction of vector store objects, files added are automatically parsed, chunked, and embedded, making them ready for search. This streamlines file management across different assistants and threads, simplifying usage and billing.
Customizable Token Usage and Runs: Developers can now control the maximum number of tokens used per run, allowing for better management of associated costs. Limits can also be set on the number of previous messages considered in each run to boost performance and relevance.
Streaming and Fine-Tuned Models Support: The API now supports streaming, enhancing its utility for real-time applications. Additionally, it allows for integrating fine-tuned models, initially including versions of gpt-3.5-turbo-0125, to provide more accurate and specific responses tailored to particular data sets.
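The highlights above map onto a handful of request fields. Below is a minimal sketch that only builds the JSON bodies and sends nothing over the network. The field names (`tool_resources`, `max_prompt_tokens`, `max_completion_tokens`, `truncation_strategy`, `stream`) reflect the v2 beta as documented at launch and may have changed since, so check the current API reference before relying on them. The helper functions and the IDs `vs_example` and `asst_example` are hypothetical placeholders.

```python
import json

# Header that opts a request into v2 of the Assistants API beta.
ASSISTANTS_V2_HEADERS = {"OpenAI-Beta": "assistants=v2"}


def build_assistant_body(model: str, vector_store_id: str) -> dict:
    """Body for creating an assistant that searches one vector store."""
    return {
        "model": model,
        "tools": [{"type": "file_search"}],
        "tool_resources": {
            "file_search": {"vector_store_ids": [vector_store_id]}
        },
    }


def build_run_body(
    assistant_id: str,
    max_prompt_tokens: int,
    max_completion_tokens: int,
    last_messages: int,
) -> dict:
    """Body for starting a streamed run with token and history limits."""
    return {
        "assistant_id": assistant_id,
        # Caps on tokens consumed over the course of the run.
        "max_prompt_tokens": max_prompt_tokens,
        "max_completion_tokens": max_completion_tokens,
        # Only the most recent N thread messages are considered.
        "truncation_strategy": {
            "type": "last_messages",
            "last_messages": last_messages,
        },
        "stream": True,
    }


# Placeholder IDs; real ones are returned by the API when objects are created.
assistant_body = build_assistant_body("gpt-3.5-turbo-0125", "vs_example")
run_body = build_run_body("asst_example", 2000, 500, 10)

print(json.dumps(assistant_body, indent=2))
print(json.dumps(run_body, indent=2))
```

Keeping the token caps and the message window in the run body, rather than on the assistant, lets you tune cost per request without recreating the assistant.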
Mixtral 8x22B Surpasses Peers with Fewer Parameters 🤌
Last week, Mistral AI casually dropped a new Mixture-of-Experts model via a link on X. It has finally posted a blog revealing more information on the model. Mixtral 8x22B is a sparse MoE model that uses only 39B active parameters out of 141B, offering strong cost efficiency for its size. It is open-sourced under Apache 2.0, a permissive open-source license.
Key Highlights:
Cost Efficiency: The model’s sparse activation patterns make it faster than any dense 70B model, while being more capable than any other open-weight model. It offers unmatched cost efficiency considering its size, delivering the best performance-to-cost ratio.
Multilingual: With native multilingual capabilities, it strongly outperforms LLaMA 2 70B on reasoning and common sense benchmarks in French, German, Spanish, and Italian.
Function calling: It is natively capable of function calling; along with the constrained output mode implemented on la Plateforme, this enables application development and tech stack modernization at scale.
Performance: It shows extremely impressive performance across knowledge, reasoning, common sense, math, and coding, outperforming models such as LLaMA 2 and Command R that have almost double or triple the active parameters.
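For readers curious how a sparse MoE keeps only part of the model active, here is a toy sketch of top-k expert routing. It assumes 8 experts with 2 chosen per token, which is how Mixtral models are commonly described but is not stated in this article. The experts here are trivial scalar functions and the router scores are made up; this is an illustration of the idea, not Mistral's implementation.

```python
import math


def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def route_top_k(router_logits, k=2):
    """Pick the k highest-scoring experts and renormalise their weights."""
    order = sorted(
        range(len(router_logits)),
        key=lambda i: router_logits[i],
        reverse=True,
    )
    top = order[:k]
    weights = softmax([router_logits[i] for i in top])
    return list(zip(top, weights))


def moe_forward(x, experts, router_logits, k=2):
    """Weighted sum of the selected experts' outputs; the rest never run."""
    return sum(w * experts[i](x) for i, w in route_top_k(router_logits, k))


# Toy setup (assumption): 8 experts, each just scales its input.
experts = [lambda x, s=s: s * x for s in range(1, 9)]
# Hypothetical router scores for a single token.
logits = [0.1, 2.0, -1.0, 0.3, 1.0, 0.0, -0.5, 0.2]

chosen = route_top_k(logits)
output = moe_forward(1.0, experts, logits)
active_fraction = 39 / 141  # active vs. total parameters, from the article

print("experts used:", [i for i, _ in chosen])
print(f"output: {output:.3f}")
print(f"active parameter share: {active_fraction:.1%}")
```

Note that the active share (about 28%) is a little higher than 2 out of 8 experts would suggest, presumably because layers shared by all tokens, such as attention, are always active.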
😍 Enjoying it so far? Share it with your friends!
Tools of the Trade ⚒️
Voxio: Record audio and convert it into organized, formatted text. You can use it to create notes from voice recordings, lectures, or any spoken content, and even customize the output with templates to suit different needs such as emails or formal documents.
Insight7: Extract insights from interviews at scale. It automatically analyzes and visualizes data from conversations in video, audio, or text format, allows integrating these insights into existing workflows through APIs, and triggers actions in your CRM, marketing automation, and collaboration tools, helping organizations make data-driven decisions and improve experiences.
RecurseChat: Chat with LLMs locally and securely. It lets you interact with PDFs, markdown, and text files without an internet connection. It supports full-text search, importing ChatGPT history, multimodal input, and customization of the AI’s appearance and personality, all secured by macOS App Sandbox.
Sachiv.AI: An AI-powered program manager and secretary that assists in meetings efficiently. It can join video-conferencing platforms like Google Meet, Zoom, and Slack, take notes, update platforms like Jira and Notion with action items, book follow-up meetings, and chase up your tasks.
🌟 Spotlight on You: Share Your AI Use Case and Get Featured!
At Unwind AI, we’re all about real-world applications of AI tools. Whether you’re simplifying daily tasks, enhancing your projects, or exploring new possibilities, we want to celebrate how you use AI.
Participate in just a few easy steps. Send the following to Unwind AI’s email: [email protected]
We will feature your story in our newsletter as detailed tutorials. It’s a great way to share your insights and get recognized.
We are eager to showcase your experiences and expand our collective understanding of practical AI applications. Let’s learn from each other and grow together!
Hot Takes 🔥
Humanoid robots will exceed the supply of iPhones in the next decade. Gradually, then suddenly. ~Jim Fan
the global optimum of human language preference is lists and couplet poetry unfortunately ~roon
People who don't want to learn how to program can always find a reason why not to. This time it's AI, last time it was that tech was over because the Internet Bubble burst, the time before that it was that all the programming jobs were going to be outsourced to India. ~Paul Graham
Meme of the Day 🤡
That’s all for today! See you tomorrow with more such AI-filled content.
Real-time AI Updates 🚨
⚡️ Follow me on Twitter @Saboo_Shubham for lightning-fast AI updates and never miss what’s trending!
PS: I curate this AI newsletter every day for FREE, your support is what keeps me going. If you find value in what you read, share it with your friends by clicking the share button below!