
OpenAI Rolls out ChatGPT Voice Mode

PLUS: Meta's AI model detects objects in images and video, OpenAI Voice Mode arrives for ChatGPT Plus users

Today’s top AI Highlights:

  1. Meta’s AI model can segment any object in an image or video

  2. NVIDIA uses Apple Vision Pro to train AI robots

  3. OpenAI starts rolling out Voice Mode to ChatGPT Plus users

  4. Meta’s open-source repo for building Llama 3.1 agentic systems

  5. Canva acquires image-generation AI startup Leonardo AI

& so much more!

Read time: 3 mins

Latest Developments

Meta has released SAM 2, the next generation of its Segment Anything Model, for real-time, promptable object segmentation in both videos and images. Its predecessor SAM focused solely on image segmentation. SAM 2 can segment any object in any video or image, even if it hasn’t seen it before, which opens up a wide range of applications in video editing, mixed reality experiences, and various scientific fields.

Key Highlights:

  1. Memory Mechanism - SAM 2 includes a memory mechanism, allowing the AI model to “remember” past frames and segment objects in motion.

  2. Unified Model - SAM 2 can handle both images and videos, simplifying the development process for applications requiring object segmentation across different media formats.

  3. Improved Accuracy - SAM 2 delivers higher accuracy for both image and video segmentation, outperforming the original SAM and other state-of-the-art models.

  4. Open Source - SAM 2 was trained on SA-V, a dataset of over 51,000 real-world videos. Both the model and the dataset have been open-sourced, and you can try the demo for free. A quick usage sketch follows below.
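If you want to try SAM 2 yourself, here’s a minimal image-segmentation sketch based on the usage shown in the open-source repo. The checkpoint path, config name, image file, and point-prompt coordinates are placeholders, and the predictor follows the original SAM’s point/box prompt interface.

```python
# Minimal SAM 2 image-segmentation sketch (checkpoint/config/image paths are placeholders).
import numpy as np
import torch
from PIL import Image
from sam2.build_sam import build_sam2
from sam2.sam2_image_predictor import SAM2ImagePredictor

checkpoint = "./checkpoints/sam2_hiera_large.pt"  # downloaded model weights
model_cfg = "sam2_hiera_l.yaml"                   # matching model config

predictor = SAM2ImagePredictor(build_sam2(model_cfg, checkpoint))
image = np.array(Image.open("photo.jpg").convert("RGB"))

with torch.inference_mode():
    predictor.set_image(image)
    # Prompt with one foreground point (x, y); label 1 marks it as foreground.
    masks, scores, _ = predictor.predict(
        point_coords=np.array([[500, 375]]),
        point_labels=np.array([1]),
    )

print(masks.shape, scores)  # binary mask(s) for the prompted object plus confidence scores
```

The repo also ships a video predictor that lets you prompt an object on one frame and propagate its mask through the rest of the clip – that’s where the memory mechanism from the highlights above comes in.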

The problem of limited training data isn’t unique to LLMs; it also holds back robot training. NVIDIA is tackling this issue with a stack of AI models. Imagine taking real-world data collected from humans and multiplying it 1000x with AI-generated simulations – that’s precisely what NVIDIA is making possible. By combining cutting-edge hardware like the Apple Vision Pro with sophisticated AI models, this new approach accelerates the learning process for humanoid robots. A conceptual sketch of the data-multiplication loop follows the highlights below.

Key Highlights:

  1. Apple Vision Pro for Robot Control - Apple Vision Pro tracks a human operator’s hand motions and translates them into robot movements in real time, producing high-quality teleoperation data for training.

  2. Generating Thousands of Environments - That collected data is then multiplied by an AI model that generates simulated environments with different layouts, textures, and appearances.

  3. Teaching Robots to Improvise - Another AI model takes a single human demonstration and generates a multitude of new action trajectories from it, teaching the robot to perform the same task in countless different ways.

  4. Expanding with Open Research - NVIDIA is providing robot manufacturers, AI model developers and software makers with these AI models to develop, train and build the next generation of humanoid robotics.
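There’s no single public API for this pipeline, so the sketch below is purely illustrative: every function is a hypothetical stand-in, meant only to show how one teleoperated demonstration gets multiplied into many simulated training episodes.

```python
# Illustrative sketch only: all names are hypothetical stand-ins for the teleoperation
# capture, environment generation, and trajectory-augmentation steps described above.
import random

def collect_human_demo(num_steps=50):
    """Stand-in for one teleoperated demo captured via Vision Pro hand tracking."""
    return [{"step": i, "action": [random.random() for _ in range(7)]} for i in range(num_steps)]

def perturb_trajectory(demo, seed):
    """Stand-in for a model that generates a new action trajectory from one demo."""
    rng = random.Random(seed)
    return [{**step, "action": [a + rng.gauss(0, 0.01) for a in step["action"]]} for step in demo]

def randomize_environment(seed):
    """Stand-in for AI-generated environment variations (layout, texture, appearance)."""
    rng = random.Random(seed)
    return {"layout": rng.randrange(10), "texture": rng.randrange(20), "lighting": rng.random()}

demo = collect_human_demo()
dataset = [
    {"env": randomize_environment(e), "trajectory": perturb_trajectory(demo, 1000 * e + t)}
    for e in range(100)   # 100 simulated environments
    for t in range(10)    # 10 trajectory variations per environment
]
print(len(dataset))  # one human demo -> 1,000 synthetic training episodes
```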

Quick Bites

  1. OpenAI has started rolling out the new Voice Mode to a small group of ChatGPT Plus users, with access expanding to more users on a rolling basis. The video and screen-sharing capabilities of Voice Mode won’t be available for now.

  2. Harvard dropout Avi Schiffmann has released Friend, an AI wearable locket that listens to everything you say, remembers it, and talks to you (via text) the way you want it to, based on its memories of you. Priced at $99, it’s available for pre-order now.

  3. Perplexity has introduced the Perplexity Publishers Program. Advertisers can pay to place sponsored related follow-up questions in Perplexity’s answer engine, and when Perplexity earns revenue from an interaction involving a partner publisher’s content, that publisher also earns a share.

  4. Canva has acquired Leonardo AI to integrate its advanced generative AI tools into Canva’s platform. This acquisition is set to enhance Canva’s Magic Studio by incorporating Leonardo’s technology, which provides users with a high level of control over AI-generated art.

😍 Enjoying it so far? Share it with your friends!

Tools of the Trade

  1. llama-agentic-system: Run Llama 3.1 as a system capable of performing "agentic" tasks like breaking a task down into multi-step reasoning, using tools, and adapting to new tool definitions.

  2. Genie: A conversational AI companion for kids that can create art, play games, write stories, and provide age-appropriate answers. It gives a safe, creative, and supportive environment for children to learn and play.

  3. Mapify: Transform documents, websites, images, infographics and even audio into clear mind maps for better learning and information retention.

  4. Awesome LLM Apps: Build awesome LLM apps using RAG to interact with data sources like GitHub, Gmail, PDFs, and YouTube videos through simple text prompts. These apps let you retrieve information, chat, and extract insights directly from content on these platforms (a minimal RAG sketch follows below).
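To make the RAG pattern concrete, here’s a minimal, hypothetical sketch (not code from the repo): it embeds a few toy documents, retrieves the one closest to a question, and asks a chat model to answer from that context. The model names and example documents are assumptions, and it expects an OpenAI API key in the environment.

```python
# Minimal RAG sketch: embed documents, retrieve the best match, answer from that context.
import numpy as np
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

docs = [
    "The GitHub app lets you chat with issues and pull requests in a repository.",
    "The PDF app answers questions about the contents of an uploaded PDF.",
    "The YouTube app summarizes and answers questions about a video transcript.",
]

def embed(texts):
    resp = client.embeddings.create(model="text-embedding-3-small", input=texts)
    return np.array([d.embedding for d in resp.data])

doc_vecs = embed(docs)
question = "How do I ask questions about a PDF?"
q_vec = embed([question])[0]

# Cosine-similarity retrieval: pick the document most similar to the question.
scores = doc_vecs @ q_vec / (np.linalg.norm(doc_vecs, axis=1) * np.linalg.norm(q_vec))
context = docs[int(np.argmax(scores))]

# Generate an answer grounded in the retrieved context.
reply = client.chat.completions.create(
    model="gpt-4o-mini",
    messages=[{"role": "user",
               "content": f"Answer using only this context:\n{context}\n\nQuestion: {question}"}],
)
print(reply.choices[0].message.content)
```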

Hot Takes

  1. My early Siri AI experience has just underlined the fact that, while there is a lot of practical, useful things that can be done with small models, they really lack the horsepower to do anything super interesting.
    They are a supplement to, not a replacement for, larger models. ~
    Ethan Mollick

  2. The next trillion dollar company is the one that ships a mass market humanoid robot that is sub $30k and can do house work! ~
    Bindu Reddy

Meme of the Day

That’s all for today! See you tomorrow with more such AI-filled content.

Real-time AI Updates 🚨

⚡️ Follow me on Twitter @Saboo_Shubham for lightning-fast AI updates and never miss what’s trending!

PS: We curate this AI newsletter every day for FREE, and your support is what keeps me going. If you find value in what you read, share it with your friends by clicking the share button below!
