
Open-Source Competitor of GPT-4 Vision 👁️

PLUS: Meta's AI for Decoding Images from Brain Activity, How Transparent AI Models Are

Today’s top AI Highlights:

  1. Fuyu-8B by Adept: New Open-Source Multimodal AI Model

  2. Meta’s AI System for Real-Time Decoding of Images from Brain Activity

  3. Stanford HAI’s Foundation Model Transparency Index 2023

  4. YouTube’s AI Tool to Sound like Popular Musicians

  5. Fastest Generative AI Text-to-Speech API

& so much more!

Read time: 3 mins

Latest Developments 🌍

Another Open-Source GPT-4 Vision 👁️

Adept has released Fuyu-8B, a smaller version of the multimodal model that powers Adept’s products. Designed from the ground up for digital agents, it can return responses for large images in under 100 milliseconds!

Key Highlights:

  • Fuyu-8B boasts a simplified architecture, a vanilla decoder-only transformer with no specialized image encoder. Its unique design supports arbitrary image resolutions, UI-based QA, and fine-grained localization on screen images.

  • Fuyu-8B demonstrates commendable performance on standard image-understanding benchmarks such as visual QA and natural-image captioning, performing comparably to Qwen-VL and PaLM-E-12B despite having fewer parameters.

  • It comes equipped with an array of capabilities, including understanding of charts, complex diagrams, and documents. The internal models derived from Fuyu-8B offer advanced features like high-resolution OCR and precise localization of text and UI elements (see the quick-start sketch below to try the base model yourself).
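Fuyu-8B is available on the Hugging Face Hub, and recent versions of the transformers library include dedicated classes for it. Below is a minimal quick-start sketch, assuming the adept/fuyu-8b checkpoint, a CUDA GPU with enough memory for an 8B-parameter model, and the example image from the model card:

```python
# Minimal sketch: captioning / visual QA with Fuyu-8B via Hugging Face
# Transformers. Assumes the "adept/fuyu-8b" checkpoint and a CUDA GPU.
import requests
from PIL import Image
from transformers import FuyuForCausalLM, FuyuProcessor

processor = FuyuProcessor.from_pretrained("adept/fuyu-8b")
model = FuyuForCausalLM.from_pretrained("adept/fuyu-8b", device_map="cuda:0")

# Fuyu has no separate image encoder: image patches are linearly projected
# straight into the decoder, so arbitrary resolutions work.
url = "https://huggingface.co/adept/fuyu-8b/resolve/main/bus.png"
image = Image.open(requests.get(url, stream=True).raw)

prompt = "Generate a coco-style caption.\n"
inputs = processor(text=prompt, images=image, return_tensors="pt").to("cuda:0")

output = model.generate(**inputs, max_new_tokens=16)
print(processor.batch_decode(output[:, -16:], skip_special_tokens=True)[0])
```

Swap the prompt for a question about the image (e.g. a UI screenshot) to use it for QA instead of captioning.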

Real-Time Decoding of Images from Brain Activity 🧠

Meta has unveiled an AI system capable of real-time decoding of images from brain activity, utilizing high-resolution magnetoencephalography (MEG), a non-invasive neuroimaging technique in which thousands of brain activity measurements are taken per second.

Key Highlights:

  • The system comprises an image encoder, a brain encoder, and an image decoder, which together reconstruct, at each instant, the images being perceived by the brain.

  • By aligning MEG signals with self-supervised AI architectures like DINOv2, Meta's system reveals a striking similarity between the activations of these artificial networks and the responses of physical neurons in the brain when exposed to the same visual stimuli (a toy sketch of this alignment idea follows the list below).

  • While MEG decoding exhibits some limitations compared to functional Magnetic Resonance Imaging (fMRI), it demonstrates a remarkable 7X improvement in image retrieval and excels at capturing high-level visual features in the brain's responses to images.
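To make the alignment idea concrete, here is a toy PyTorch sketch (not Meta's actual code) of contrastively aligning a simple MEG encoder with embeddings from a frozen image backbone like DINOv2; the BrainEncoder architecture, channel counts, and dimensions are all illustrative assumptions:

```python
# Toy sketch: train a brain encoder to map MEG windows into the embedding
# space of a frozen self-supervised image encoder (e.g. DINOv2), then
# decode by nearest-neighbor retrieval. Shapes are illustrative only.
import torch
import torch.nn as nn
import torch.nn.functional as F

class BrainEncoder(nn.Module):
    """Maps an MEG window (channels x timepoints) to an image-embedding vector."""
    def __init__(self, n_channels=270, dim=768):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv1d(n_channels, 256, kernel_size=5, padding=2),
            nn.GELU(),
            nn.AdaptiveAvgPool1d(1),   # pool over time
            nn.Flatten(),
            nn.Linear(256, dim),
        )

    def forward(self, meg):            # meg: (batch, channels, timepoints)
        return F.normalize(self.net(meg), dim=-1)

def contrastive_loss(brain_emb, image_emb, temperature=0.07):
    """InfoNCE-style loss aligning MEG embeddings with image embeddings."""
    image_emb = F.normalize(image_emb, dim=-1)
    logits = brain_emb @ image_emb.t() / temperature
    targets = torch.arange(len(logits), device=logits.device)
    return F.cross_entropy(logits, targets)

if __name__ == "__main__":
    encoder = BrainEncoder()
    meg = torch.randn(8, 270, 180)     # 8 fake MEG windows
    img = torch.randn(8, 768)          # matching frozen image embeddings
    print(contrastive_loss(encoder(meg), img).item())
    # Decoding = retrieval: pick the candidate image whose frozen embedding
    # is closest to the predicted brain embedding.
```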

Transparency Index Reveals Gaps in AI Disclosure 🔍

Stanford HAI has released the Foundation Model Transparency Index, shedding light on the state of transparency within the AI industry. The index evaluated how forthcoming the creators of the 10 most popular AI models are.

Key Highlights:

  • Among the 10 models assessed, Meta’s Llama 2 secured the highest score, followed by BLOOMZ and GPT-4. The open models clearly outperformed the rest, but even the top-performing model scored only 54 out of 100.

  • Developers lack transparency regarding the resources required to build foundation models, specifically data, labor, and compute. However, they score well on indicators related to user data protection, basic model details, model capabilities, and limitations.

  • They can significantly improve transparency by adopting best practices from their competitors, especially in disclosing the limitations of their models and providing access to usage data.

Create Videos in Your Favorite Singer’s Voice 🎤

YouTube is planning to roll out a new AI tool that will allow creators to make videos using the voices of popular recording artists, but negotiations with major labels are slowing down the project's beta release.

While the label executives are keen on projecting a progressive image, top artists are hesitant to participate, concerned about potential misuse of their voices by unknown creators.

Tools of the Trade ⚒️

  • PlayHT 2.0 Turbo: The fastest generative AI text-to-speech API, generating speech in under 300 milliseconds over the network and under 100 ms for on-premise deployments.

  • AnyClip: A visual intelligence platform that transforms traditional videos into intelligent content using AI, making it searchable, measurable, personalized, merchandised, and interactive for a variety of applications.

  • Impaction.ai: An analytics platform specializing in the in-depth analysis of subjective conversational data generated from AI-native products, along with an AI Copilot for intuitive search and analysis.

  • FlowRL: Real-time UI personalization that customizes the UI for each individual, automatically adapting and learning with each user interaction, leading to improved target metrics.

😍 Enjoying so far, TWEET NOW to share with your friends!

Hot Takes 🔥

  • Reading productivity books is the number one leading cause of procrastination. ~ Santiago

  • Tabular data is not going anywhere. Most industry is using it to get their daily work done, and will continue to do so for as long as it exists. Even AGI will use XGBoost. ~ Bojan Tunguz

  • Software Engineer -> AI Engineer. This is the future more than most realize. ~ Logan

Meme of the Day 🤡

[Meme from r/ProgrammerHumor: reallyScaryShitMan]

That’s all for today!

See you tomorrow with more such AI-filled content. Don’t forget to subscribe and give your feedback below 👇

Real-time AI Updates 🚨

⚡️ Follow me on Twitter @Saboo_Shubham for lightning-fast AI updates and never miss what’s trending!!

PS: I curate this AI newsletter every day for FREE, your support is what keeps me going. If you find value in what you read, share it with your friends by clicking the share button below!
