• unwind ai
  • Posts
  • Video-to-Text is Here! 🎥📝

Video-to-Text is Here! 🎥📝

PLUS: ChatGPT-Powered Robot Becomes a Tour Guide, Shutterstock Library Meets Generative AI

Today’s top AI Highlights:

  1. Twelve Labs Introduces Video-to-Text Model and APIs

  2. Make Shutterstock’s Vast Library Your Own

  3. CNNs Match Vision Transformers at Scale

  4. ChatGPT to Make Chat (Ro)Bot

  5. Google Maps more like Google Search

& so much more!

Read time: 3 mins

Latest Developments 🌍

Pegasus-1 Transforms Video-Understanding 🎥

Twelve Labs introduces Pegasus-1, a cutting-edge video-language foundation model, alongside a suite of Video-to-Text APIs, capable of video summarizing, custom text generation, and video analysis. With 80B parameters, the model exhibits a significant 61% performance improvement, outperforming existing models.

Key Highlights:

  • Their novel 'Video First' approach focuses on efficient long-form video processing, multimodal understanding, and deep alignment between video-native embeddings and language models, a leap towards comprehensive understanding of video content.

  • Pegasus-1 integrates a video encoder, video-language alignment model, and an LLM decoder, and was trained on a vast 300 million diverse video-text pairs, with a specific emphasis on a high-quality fine-tuning dataset.

  • Contrary to existing solutions that either utilizes speech-to-text conversions or rely solely on visual frame data, Pegasus-1 integrates visual, audio, and speech information to generate more holistic text from videos.

Generative AI Meets Shutterstock’s Library

Shutterstock now lets you harness the power of AI on its extensive library of over 700 million images. Customize and refine their stock images with the new suite of AI-powered tools in unparalleled ease.

Key Highlights:

  • Shutterstock's AI tools offer six capabilities, including the Magic Brush for targeted modifications, Variations for generating alternate options, and the Smart Resize tool for effortless image dimension adjustments.

  • With the Background Remover feature, you can easily replace or eliminate backgrounds, while the Expand Image tool allows for a broader view of scenes behind central images, mimicking the effect of zooming out through a camera lens.

  • Shutterstock's AI Image Generator launched earlier in beta, will soon be updated integrating DALL.E-3, allowing you to create high-quality, ethically-sourced visuals in seconds by simply describing what you want. This content is ready for licensing and indemnifiable for commercial use.

CNNs Match Vision Transformers at Scale 👯‍♂️

Researchers at Google DeepMind did a study compares the performance of Convolutional Neural Networks (ConvNets) and Vision Transformers (ViTs) on large-scale datasets. The study reveals intriguing insights about the competitive capabilities of pre-trained NFNets in the field of computer vision.

Key Highlights:

  • There exists a log-log scaling law between held-out loss and compute budget, indicating that NFNets match the reported performance of Vision Transformers with similar computational resources, particularly on ImageNet.

  • Pre-trained NFNets are found to exhibit competitive performance with Vision Transformers on ImageNet, with the best-performing model achieving an impressive Top-1 accuracy of 90.4%.

  • Despite the differences in architecture, the study underlines the striking similarity in performance between pre-trained NFNets and Vision Transformers, challenging the prevalent notion that ViTs consistently outperform ConvNets.

ChatGPT to Make Chat (Ro)Bot 🤖

Boston Dynamics turned its four-legged robot, Spot, into a chatbot to serve as a talking tour guide in its facility, using ChatGPT along with VQA, speech-to-text API (OpenAI’s Whisper) and text-to-speech API (Eleven Labs) for interactive capabilities. In a video posted by Boston Dynamics, we can see the robot effortlessly engaging with audiences, giving witty and informative responses.

The LLM's ability to craft coherent and contextually relevant dialogues, along with its improvisational skills, enabled Spot to create an immersive and interactive tour experience. Despite occasional challenges such as latency and susceptibility to network disruptions, the overall integration of AI seamlessly amplified Spot's capabilities.

Google Maps more like Search 🛣️

Google Maps is getting more AI-driven enhancements, including immersive navigation experiences, more user-friendly driving directions and more intuitive, closely resembling the functionalities of Google Search, and letting you easily discover new places and experiences.

Key Highlights:

  • You can type specific questions into Maps, much in the way they do with Search, and get a list of results for nearby businesses or locations that match the query based on a real-time analysis of user photos.

  • Immersive View announced earlier, that offers a 3D view of locations along with additional information like local businesses, weather, and traffic is being rolled out. The augmented reality feature, "Lens in Maps," is designed to make searching for nearby businesses and locations more convenient and is expanding to 50 cities.

  • Google is promoting sustainable transportation options by encouraging users to choose transit or cycling. Google Maps will also provide real-time updates on the status and compatibility of electric vehicle (EV) charging stations.

Tools of the Trade ⚒️

  • Personalization by You.com: You.com has introduced Personalization with “Smart learn” that learns about your preferences and habits as interact more with it, providing increasingly better and tailored responses to your queries over time.

  • Fabric AI: All-in-one AI assistant for workspace that helps you organize, search, and extract insights from your files, notes, and cloud data, while also offering smart recommendations

  • Lexis + AI: AI legal assistant for conversational search, drafting, summarization, document analysis, and hallucination-free legal citations.

  • Spoke: AI-powered Priority Inbox for product builders to streamline their workflow by summarizing, prioritizing, and automating follow-ups within one centralized platform.

  • tl;dr: AI-powered Safari web extension to generate concise bullet points from long web articles, blog posts, or news stories, with multilinguistic support.

😍 Enjoying so far, TWEET NOW to share with your friends!

Hot Takes 🔥

  • Google Search won't die. Just like horses didn't disappear once we had cars. Just that the value of the traffic will decline. ~ Aravind Srinivas

  • Giving a talk to 80 high-performing innovation managers & everyone had tried ChatGPT… but only 2 had used GPT-4. I increasingly see this as the barrier to AI adoption. People use Bard or free ChatGPT & come away with a bit of a shrug. They don’t understand what they are missing ~ Ethan Mollick

Meme of the Day 🤡

How to design generative AI experiences to be truly helpful | by Tony Jin | UX Collective

That’s all for today!

See you tomorrow with more such AI-filled content. Don’t forget to subscribe and give your feedback below 👇

Real-time AI Updates 🚨

⚡️ Follow me on Twitter @Saboo_Shubham for lightning-fast AI updates and never miss what’s trending!!

PS: I curate this AI newsletter every day for FREE, your support is what keeps me going. If you find value in what you read, share it with your friends by clicking the share button below!

Reply

or to participate.