Chat with YouTube Videos 🙋‍♂️

PLUS: 8x Faster Llama 2 on CPUs, Long Video Understanding with Google’s LLM

Today’s top AI Highlights:

  1. Bard can Understand “YouTube” Videos

  2. 8x Faster Llama 2 on CPUs

  3. Meta’s Benchmark for General AI Assistants

  4. Google Scales Multimodal Understanding to Long Videos

  5. Grammarly within ChatGPT

& so much more!

Read time: 3 mins

Latest Developments 🌍

Interact with YouTube Videos with Google’s Bard 🏞️

Google's latest update to the Bard AI chatbot improves how it works with YouTube content. Bard's YouTube Extension originally only helped users find specific types of videos, such as comedy clips. With the update, you can ask detailed questions about what happens inside a video and get specific answers.

For example, if you are curious about the ingredients in a recipe video or the locations shown in a travel video, you can ask Bard while you watch and get a specific answer, which makes for a more engaging and informative experience.

Optimizing Llama 2 for CPU Performance🚊

Proprietary models like GPT-3 once dominated the field; we now see a surge of high-quality, open-source LLMs such as Meta’s Llama 2. Neural Magic’s latest research combines sparse fine-tuning with its DeepSparse technology to make Llama 2 significantly more efficient on standard CPU infrastructure, opening new options for enterprise-level AI applications.

Key Highlights:

  1. DeepSparse now supports accelerated inference of sparse-quantized Llama 2 models, offering 6-8x faster inference speeds over the baseline at 60-80% sparsity.

  2. Neural Magic has introduced comprehensive quantization strategies, addressing challenges in quantizing Llama 2's activations and weights. These strategies, available through SparseML and SparseZoo, facilitate the creation of optimized Llama 2 models for improved operational efficiency in enterprise environments.

  3. The Llama 2 model, after being fine-tuned and processed through SparseGPT for pruning and quantization, maintained full accuracy even at 60% sparsity. It also showed significantly improved performance on challenging tasks such as the GSM8k dataset.
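To make "60% sparsity" and "quantization" concrete, here is a minimal sketch in plain Python. It is not Neural Magic's method: SparseGPT and the SparseML recipes are considerably more sophisticated, so simple magnitude pruning and symmetric int8 quantization stand in for them here. The function names and the toy weight values are assumptions made for illustration.

```python
def prune_by_magnitude(weights, sparsity):
    """Zero out the smallest-magnitude fraction of weights (illustrative stand-in for SparseGPT)."""
    n_prune = round(len(weights) * sparsity)
    # Indices ordered by absolute value, smallest first.
    order = sorted(range(len(weights)), key=lambda i: abs(weights[i]))
    pruned = list(weights)
    for i in order[:n_prune]:
        pruned[i] = 0.0
    return pruned


def quantize_int8(weights):
    """Symmetric per-tensor int8 quantization: returns (integer values, scale)."""
    max_abs = max(abs(w) for w in weights) or 1.0  # avoid a zero scale
    scale = max_abs / 127.0
    return [round(w / scale) for w in weights], scale


def sparsity_of(weights):
    """Fraction of weights that are exactly zero."""
    return sum(1 for w in weights if w == 0) / len(weights)


# Toy weight vector (made-up values, not taken from any real model).
weights = [0.9, -0.05, 0.3, 0.01, -0.7, 0.02, 0.15, -0.4, 0.08, -0.2]
pruned = prune_by_magnitude(weights, 0.6)
quantized, scale = quantize_int8(pruned)

print("sparsity:", sparsity_of(pruned))
print("int8 values:", quantized)
```

Zeros survive quantization unchanged, which is what lets a sparsity-aware runtime skip them; the speedups quoted above depend on that runtime, not on the pruning step alone.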

How Intelligent Must an AI Assistant Actually Be? 🤓

Researchers at Meta and Hugging Face have introduced a new benchmark that probes the limits of what AI systems can achieve. Dubbed GAIA (General AI Assistants), it presents a set of challenges designed to test AI systems on tasks that are straightforward for humans but hard for AI. Unlike previous benchmarks, GAIA focuses on real-world applicability and multifaceted problem-solving abilities.

Key Highlights:

  1. GAIA tests AI systems on their ability to handle real-world questions that require fundamental abilities such as reasoning, multi-modality handling, web browsing, and tool-use proficiency. Although the questions are conceptually simple for humans, there is a large performance gap: human respondents achieve 92% success, compared to 15% for GPT-4 equipped with plugins.

  2. The questions in GAIA are created and validated by human annotators to ensure they reflect realistic AI assistant use cases and have a single correct answer. This meticulous process involves checking against web sources, ensuring the questions remain unambiguous and applicable over time.

  3. GAIA not only assesses current AI capabilities but also indicates the direction for future AI systems, highlighting the need for full automation and integration of multiple capabilities. It also recognizes its limitations, such as reliance on web sources and lack of linguistic diversity.
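Because each GAIA question has a single correct answer, a success rate like the ones quoted above can be computed by comparing each predicted answer to its reference. The sketch below assumes a simple normalization (lowercasing and collapsing whitespace); this is an assumption for illustration, not GAIA's official scorer, and the questions and answers are made-up toy data.

```python
def normalize(answer):
    """Lowercase, trim, and collapse whitespace (assumed rules, not GAIA's official scorer)."""
    return " ".join(answer.strip().lower().split())


def success_rate(predictions, references):
    """Fraction of predictions that match their reference after normalization."""
    correct = sum(normalize(p) == normalize(r) for p, r in zip(predictions, references))
    return correct / len(references)


# Toy data for illustration only; not real GAIA questions or model outputs.
references = ["Paris", "42", "blue whale", "1969"]
predictions = ["paris ", "41", "Blue  Whale", "1969"]

print(f"{success_rate(predictions, references):.0%}")  # prints "75%"
```

A single unambiguous answer per question is what makes this kind of automatic scoring possible without a human or model judge.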

Google’s Innovation in Long-form Video Understanding ⏳

Mirroring the human capacity to assimilate varied inputs like audio, video, and text is complex, particularly when these modalities must be harmonized. Audio and video may align well in time, but integrating them with text is more intricate. Video and audio also carry far more data than text and often require significant compression, a challenge that grows with longer video inputs. Google has introduced Mirasol3B, a model designed to balance the dense data of audio and video with the intricate context of text.

Key Highlights:

  1. The model separates multimodal processing into distinct autoregressive models for each modality, effectively managing longer videos (up to 512 frames) and significantly reducing the parameter size to 3 billion, compared to larger predecessors.

  2. Mirasol3B integrates an autoregressive model with a 'Combiner' module for both time-aligned and unaligned modalities, facilitating the processing of extensive video/audio inputs while preserving temporal information and reducing data dimensionality.

  3. Demonstrating superior performance over existing models, Mirasol3B excels in video question answering and audio-video-text benchmarks. It is particularly effective in long video inputs and open-ended text generation tasks, showcasing enhanced capability in multimodal understanding and analysis.
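The idea of compressing a long video while preserving temporal order can be sketched as follows. This is only an illustration under stated assumptions: the chunk size of 64, the 4-dimensional toy features, and mean pooling as a stand-in for the learned 'Combiner' are all hypothetical, and the actual Mirasol3B architecture is not reproduced here.

```python
def split_into_chunks(frames, chunk_size):
    """Partition a frame sequence into consecutive chunks, keeping temporal order."""
    return [frames[i:i + chunk_size] for i in range(0, len(frames), chunk_size)]


def mean_pool(chunk):
    """Average the feature vectors in a chunk into one vector (stand-in for a learned combiner)."""
    dim = len(chunk[0])
    return [sum(vec[d] for vec in chunk) / len(chunk) for d in range(dim)]


# 512 toy "frames", each a 4-dimensional feature vector (made-up values).
frames = [[float(t), 0.0, 1.0, t % 2] for t in range(512)]

chunks = split_into_chunks(frames, chunk_size=64)
compressed = [mean_pool(chunk) for chunk in chunks]

print(len(frames), "frames ->", len(compressed), "compressed vectors")
```

A downstream model then only has to attend over the short sequence of compressed vectors, in order, rather than over every frame.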

Tools of the Trade ⚒️

  • editGPT: Proofread, edit and track changes to your content, emulating a Grammarly-like interface.

  • Lifepaths: An app that takes in your LinkedIn profile and interests to show you a map of possible future life paths.

  • Magnific AI: Advanced image upscaling and enhancement tool that employs deep learning algorithms to significantly improve the quality, resolution, and sharpness of images.

  • Defog.ai: Specializes in deploying fine-tuned LLMs for enterprise analytics, particularly in SQL, Python, and R. It offers AI assistants and agents that are custom-tailored for specific business needs, allowing for efficient data analysis.

Hot Takes 🔥

  1. Definitive proof of AGI - When an AI can get humans to agree and align with each other. We would finally have world peace. ~ Bindu Reddy

  2. AI is a wonderful tool for the betterment of humanity; AGI is a potential successor species. ~ David Sacks

  3. every civilization is “postscarcity” on some things but still feels the pain of want for others sugar and exotic fruits were luxuries in medieval England a house in a major city might in ours timeshares of the godhead at its peak ability in post AGI world ~ roon

That’s all for today!

See you tomorrow with more such AI-filled content. Don’t forget to subscribe and give your feedback below 👇

Real-time AI Updates 🚨

⚡️ Follow me on Twitter @Saboo_Shubham for lightning-fast AI updates and never miss what’s trending!!

PS: I curate this AI newsletter every day for FREE, your support is what keeps me going. If you find value in what you read, share it with your friends by clicking the share button below!
