Google's Answer to OpenAI's Voice Mode

PLUS: Next-gen AI agents, Developer platform for on-device AI/ML

Today’s top AI Highlights:

  1. Google releases real-time AI voice assistant

  2. MultiOn’s next-gen AI Agent which can plan and self-correct

  3. Pixel’s Add Me feature adds the photographer to a group photo after the shot

  4. OpenAI’s subset of SWE-bench to correctly evaluate AI models’ engineering capabilities

  5. Stanford’s LLM tools to write Wikipedia-style articles

& so much more!

Read time: 3 mins

Latest Developments

Google just released what we’ve been waiting on OpenAI to ship for ages. At the Made by Google 2024 event yesterday, Google introduced Gemini Live, a conversational mode that allows for natural back-and-forth voice interactions. Powered by the Gemini 1.5 Flash model, Gemini Live responds in almost real time, and the experience is very smooth. Plus, you can now choose from 10 distinct voices for Gemini. It is rolling out to Gemini Advanced subscribers on Android.

Key Highlights:

  1. Dynamic conversations - Engage in free-flowing discussions with Gemini, brainstorm ideas, and even interrupt or pause the conversation and return to it later.

  2. Deeper app integrations - Gemini is integrated with more Google apps like Keep, Tasks, and YouTube Music. You can add recipe ingredients to your shopping list or create a playlist based on a simple voice request, all without switching between apps.

  3. Contextual help - Gemini is becoming more integrated with the Android operating system, offering context-aware assistance based on what's on your screen. You can now drag and drop generated images into other apps and utilize features like "Ask about this screen" for immediate support.

  4. Step towards Project Astra - At the last I/O event, Google teased its in-development Project Astra, which can assist in real time on video calls, completely changing the way we’ve been using AI chatbots. Gemini Live is a step towards using and adapting the technologies behind Project Astra.

An AI agent that can learn from its mistakes and plan ahead like a human, that's Agent Q! Developed by researchers at MultiOn and Stanford University, Agent Q is a significant leap in autonomous web navigation. This AI agent uses a unique combination of search, self-critique, and learning from experience to complete complex tasks on websites.

Agent Q explores a website like a human would, trying different actions to see what works best. It then uses AI to critique its own actions and learn from its mistakes. This allows Agent Q to improve its performance over time, even without explicit instructions.

Key Highlights:

  1. Planning ahead - Agent Q uses Monte Carlo Tree Search (MCTS) to explore different paths and anticipate the outcomes of its actions. This allows it to make smarter decisions and avoid dead ends.

  2. Learning from mistakes - Agent Q uses AI feedback to evaluate its own performance and identify areas for improvement. This self-critique mechanism is crucial for learning in dynamic, complex, multi-step tasks.

  3. Improving with experience - Agent Q uses Direct Preference Optimization (DPO) to learn from its past experiences and refine its decision-making process, continuously improving its performance over time.

  4. Practical tests - Agent Q surpasses other AI agents in completing tasks, with a 340% improvement over Llama 3's baseline zero-shot performance. In tests, it exceeded average human performance on a simulated e-commerce website and achieved a high success rate on a real-world booking website.
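
To make the planning and learning steps above a little more concrete, here is a minimal sketch of two textbook building blocks: the standard UCB1 score commonly used to pick which branch MCTS explores next, and the DPO loss for a single preferred/rejected pair. This is not Agent Q's actual code. The function names, parameters, and default constants (c=1.4, beta=0.1) are illustrative assumptions, and the real system may use different variants of both.

```python
import math


def ucb1_score(child_value_sum, child_visits, parent_visits, c=1.4):
    """Standard UCB1 score for choosing a branch during MCTS selection.

    Hypothetical helper for illustration; `c` trades off exploration
    against exploitation and 1.4 is just a common default.
    """
    if child_visits == 0:
        # Unvisited actions get tried first.
        return math.inf
    mean_value = child_value_sum / child_visits
    exploration = c * math.sqrt(math.log(parent_visits) / child_visits)
    return mean_value + exploration


def dpo_loss(policy_logp_chosen, policy_logp_rejected,
             ref_logp_chosen, ref_logp_rejected, beta=0.1):
    """DPO loss for one (chosen, rejected) pair of trajectories.

    Inputs are log-probabilities under the policy being trained and
    under a frozen reference model. `beta` is an assumed default.
    """
    margin = beta * ((policy_logp_chosen - ref_logp_chosen)
                     - (policy_logp_rejected - ref_logp_rejected))
    # -log(sigmoid(margin)), written in a numerically stable form.
    if margin >= 0:
        return math.log1p(math.exp(-margin))
    return -margin + math.log1p(math.exp(margin))


# Usage: pick the most promising action from made-up search statistics.
# Each entry is (action name, summed value, visit count).
children = [("click_search", 3.0, 4), ("open_menu", 1.0, 2), ("go_back", 0.0, 0)]
parent_visits = 6
best = max(children, key=lambda ch: ucb1_score(ch[1], ch[2], parent_visits))
print("next action to explore:", best[0])
```

The loss shrinks as the policy assigns relatively more probability to the preferred trajectory than the reference model does, which is how successful and failed attempts can be turned into a training signal.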

You can apply for the waitlist here.

Quick Bites

  1. At the Made by Google 2024 event yesterday, Google debuted some really cool AI features in its new Pixel phones. The features are powered by Gemini Nano, Google’s multimodal model that is small yet powerful enough to run AI on-device and deliver a great experience.

    • Add Me feature: Never be left out of a group photo again! The Add Me feature in Pixel uses AI to seamlessly insert you into a group picture even if you weren't originally in the shot. Just position yourself using the AR guidance, and Pixel will magically combine the images to create a perfect group photo with you included.

    • Pixel Screenshots: Search your saved screenshots using natural language.

    • Other features include Call Notes to transcribe and summarize your calls, Pixel Studio to generate images with an on-device diffusion model, and new Magic Studio features.

    • The new Pixel Watch introduces Loss of Pulse Detection, a groundbreaking safety feature that detects when someone has lost their pulse and automatically calls emergency services.

    • The new Pixel Buds Pro 2 are the first earbuds with Gemini Live, which allows for hands-free, eyes-free conversations with Gemini.

  2. OpenAI has released SWE-bench Verified, a human-validated subset of SWE-bench, to accurately assess AI models’ software engineering capabilities. The team identified some SWE-bench tasks that were hard or impossible to solve, leading to SWE-bench underestimating AI models’ autonomous software engineering capabilities.

Tools of the Trade

  1. Neuralize: Speeds up on-device AI/ML development by providing a single interface for model optimization, benchmarking, and performance evaluation.

  2. STORM by Stanford: Creates Wikipedia-like articles by first conducting research using the web and generating an outline, then using that outline to write a full article with citations. It’s truly incredible and definitely worth a try.

  3. Charts by Notion: Quickly turn your Notion data into visual charts so you can monitor progress and spot trends without leaving your workspace. The charts are customizable, update automatically, and can be easily shared.

  4. Awesome LLM Apps: Build awesome LLM apps using RAG to interact with data sources like GitHub, Gmail, PDFs, and YouTube videos through simple text prompts. These apps let you retrieve information, chat, and extract insights directly from content on these platforms.

Hot Takes

  1. I look at AGI as a consolation prize that we got instead of being destroyed by the Sweet Meteor of Death. ~ Bojan Tunguz

  2. It's maddening how we're bogged down in this idiotic left vs. right nonsense. Future generations with ASI will mock us for our stupidity. We're like medieval peasants. If we poured our resources into ASI, we'd have utopia in three years. But we waste time on dumb identity battles ~ FLOWERS

That’s all for today! See you tomorrow with more such AI-filled content.

Real-time AI Updates 🚨

⚡️ Follow me on Twitter @Saboo_Shubham for lightning-fast AI updates and never miss what’s trending!

PS: We curate this AI newsletter every day for FREE, your support is what keeps us going. If you find value in what you read, share it with at least one (or 20) of your friends!