unwind ai
Mistral AI Releases Opensource Multimodal Model

PLUS: AI Agent for literature review, Adobe's genAI model for video editing

Shubham Saboo & Gargi Gupta
September 12, 2024

Today’s top AI Highlights:

Mistral AI debuts its multimodal AI model Pixtral 12B
AI Agents now conduct scientific literature reviews better than PhDs
Adobe brings generative AI to video editing
LMSYS Chatbot Arena’s new feature shows how LLMs’ writing style changes their rankings
Opensource tool for instant AI voice cloning in 8 languages

& so much more!

Read time: 3 mins

Latest Developments

Mistral’s Multimodal AI competes with Claude and Gemini 1.5

Mistral AI has released Pixtral 12B, its first multimodal model, which can process both text and images. It is built on the company's existing Nemo 12B language model. The 24GB model is freely available for download and fine-tuning under the permissive Apache 2.0 license.

Key Highlights:

Versatile Image Handling - Pixtral 12B natively supports images of arbitrary sizes and resolutions, processing them quickly regardless of dimensions. It excels at handling both small images and complex, high-resolution images, including those requiring OCR.
Long Context Capabilities - The model can process an arbitrary number of images within a single prompt. It also boasts a 128k context window to take in substantial text and visual information.
Strong Performance - Benchmark results show Pixtral 12B outperforms opensource models like LLaVA and Phi-3 vision on key benchmarks (MMMU and MathVista) and on multimodal question answering (ChartQA, DocVQA, VQAv2). It also competes strongly with the closed models Claude 3 Haiku and Gemini 1.5 Flash 8B.
Open Source - The model can be easily downloaded via GitHub and Hugging Face under the Apache 2.0 license. Pixtral 12B will soon be accessible through Mistral’s Le Chat and La Plateforme.
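
A quick back-of-the-envelope check on the 24GB figure: it is roughly what you would expect if about 12 billion parameters are stored as 16-bit values. The sketch below is only an estimate under that assumption. The function name `weights_size_gb` is made up for illustration, and the storage precision of the released checkpoint is not stated in this post.

```python
def weights_size_gb(num_params: float, bytes_per_param: int = 2) -> float:
    """Approximate size of raw model weights in gigabytes (1 GB = 1e9 bytes)."""
    return num_params * bytes_per_param / 1e9


# Assumption: ~12 billion parameters, each stored in 2 bytes (16-bit precision).
size_16bit = weights_size_gb(12e9)

# Hypothetical comparison: the same parameter count at 1 byte per parameter.
size_8bit = weights_size_gb(12e9, bytes_per_param=1)

print(f"16-bit weights: ~{size_16bit:.0f} GB")
print(f"8-bit weights:  ~{size_8bit:.0f} GB")
```

The estimate covers the weights only. Running the model also needs memory for activations and for the context being processed, so real requirements will be higher.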

PhD-Level Biologists Are Outperformed by AI Agent 🔬

Non-profit AI research organization FutureHouse has introduced PaperQA2, an AI agent that conducts entire scientific literature reviews on its own. It is the first AI agent to surpass PhD and postdoc-level biology researchers on multiple literature research tasks, as measured by objective benchmarks and assessments by human experts. The code for PaperQA2 is open-sourced, along with a detailed research paper outlining its development.

Key Highlights:

Superhuman Performance - PaperQA2 exceeds human performance in answering highly specific scientific questions and summarizing relevant literature, providing cited, factually grounded answers more accurate than those provided by experienced biologists.
WikiCrow for Summarization - WikiCrow, an AI agent based on PaperQA2, generates Wikipedia-style articles that are significantly more accurate than human-written Wikipedia articles. FutureHouse is using WikiCrow to write updated summaries for all human genes.
Contradiction Detection - PaperQA2 can systematically identify contradictions within scientific literature and highlight inconsistencies, prompting further investigation.
Open Source - FutureHouse has open-sourced the code for PaperQA2. Use the building blocks to create custom applications, integrate PaperQA2's search, summarization, and contradiction detection capabilities into your own workflows or use it as inspiration for your next AI project.

Quick Bites

Adobe is introducing the Firefly Video Model, which brings generative AI to video editing and will be available in beta later this year. The model offers tools to help editors generate b-roll, fill gaps, and create visual elements with simple text prompts, speeding up the creative process.

Prompt: Drone shot going between the trees of a snowy forest at sunset golden hour. The lighting is cinematic and gorgeous and soft and sun-kissed, with golden backlight and dreamy bokeh and lens flares. The color grade is cinematic and magical.

LMSYS Chatbot Arena now features a “Style Control” option on the leaderboard so you can see how LLM rankings change when writing style is accounted for. Writing style, like response length and formatting, affects how models are ranked by making some answers appear more detailed or polished.

Jina AI has unveiled two small language models specifically trained to generate clean markdown directly from noisy raw HTML. Both models are multilingual and support a context length of up to 256K tokens. Despite their compact size, these models achieve state-of-the-art performance on this HTML2Markdown task.

Tools of the Trade

Fish Speech 1.4: Opensource tool that offers fast text-to-speech, instant voice cloning, and supports eight languages. You can self-host or use their cloud service with simple pricing.
TuneLlama: Fine-tune Llama 3.1 models easily. Just upload your data, choose between the 8B and 70B models, and hit fine-tune. It lets you download the results in QLoRA or GGUF format.
InteractiveDemo.ai: Create interactive screen demos with AI. It extracts keyframes from videos and generates popovers with information. It also includes features like intelligent zoom, click-to-pause interaction, and animated elements for engaging presentations.
Awesome LLM Apps: Build awesome LLM apps using RAG to interact with data sources like GitHub, Gmail, PDFs, and YouTube videos through simple text. These apps will let you retrieve information, engage in chat, and extract insights directly from content on these platforms.

Hot Takes

I can’t wait until 2028 when AI has gone mainstream and the intrigue around the GPT-7 model release has reached such unbearable levels across the country that in addition to Jimmy Apples and strawberry man we also have Johnny Watermelon and Mr. Grape and each has the influence of a minor state ~ James Campbell
If you feel you are 10x better developer since you started using AI, you were really bad to begin with. ~ Santiago

Meme of the Day

Founder Mode is giving your gf a Calendly link
— Jason (@mytechceoo)
5:04 PM • Sep 10, 2024

That’s all for today! See you tomorrow with more such AI-filled content.

Real-time AI Updates 🚨

⚡️ Follow me on Twitter @Saboo_Shubham for lightning-fast AI updates and never miss what’s trending!

PS: We curate this AI newsletter every day for FREE, your support is what keeps us going. If you find value in what you read, share it with at least one (or 20) of your friends!
