xAI Releases Grok-2 in Beta

PLUS: Autonomous AI Scientist, Save 80% on LLM power bills

In partnership with

The fastest way to build AI apps

  • Writer Framework: build Python apps with drag-and-drop UI

  • API and SDKs to integrate into your codebase

  • Intuitive no-code tools for business users

Today’s top AI Highlights:

  1. Sakana AI's 'AI Scientist' automates scientific discovery at $15 a paper

  2. xAI releases Grok-2 and Grok-2 mini in beta

  3. Anthropic releases prompt caching for Claude 3.5 Sonnet and Haiku API

  4. MIT released an AI Risk Repository with 700+ AI risks

  5. Save up to 80% on LLM bills with 5 lines of code changes

& so much more!

Read time: 3 mins

Latest Developments

AI's ultimate promise lies in its ability to accelerate scientific research, and Sakana AI's new system is a step toward it. Sakana AI's 'The AI Scientist' automates the entire scientific research process using LLMs: it generates research ideas, writes code, conducts experiments, analyzes results, and even writes the resulting scientific papers. Developed in collaboration with the University of Oxford and the University of British Columbia, 'The AI Scientist' even includes an AI-powered peer review process for iterative improvement.

Key Highlights:

  1. End-to-End Automation - 'The AI Scientist' handles the complete research lifecycle, from initial idea to finalized paper, streamlining the research process for increased efficiency.

  2. AI-Driven Peer Review - The integrated peer review system evaluates generated papers, providing constructive feedback for iterative improvements and maintaining research quality.

  3. Practical Demos - In its initial demonstration focusing on machine learning, the system has generated novel findings in areas like diffusion models and transformers, all at an estimated cost of $15 per paper.

  4. Open Source - All code and experimental results are available on GitHub for you to examine the system's workings and reproduce results.
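The loop described above — idea, code, experiment, paper, review, revise — can be sketched in a few lines. This is a toy illustration only, not Sakana AI's actual implementation; every function here is a hypothetical stand-in for what would be an LLM call in the real system.

```python
# Hypothetical stubs standing in for LLM calls in a research-automation loop.

def generate_idea() -> str:
    return "Study the effect of noise schedules on diffusion models"

def write_code(idea: str) -> str:
    return f"# experiment code for: {idea}"

def run_experiment(code: str) -> dict:
    return {"metric": 0.82}

def write_paper(idea: str, results: dict, feedback: list) -> str:
    revision = f" (revision {len(feedback)})" if feedback else ""
    return f"Paper on '{idea}', metric={results['metric']}{revision}"

def peer_review(paper: str) -> tuple:
    # Score improves with each revision, standing in for iterative feedback.
    revisions = paper.count("revision")
    return (0.6 + 0.2 * revisions, "tighten the related-work section")

def ai_scientist(max_rounds: int = 3, accept_at: float = 0.75) -> str:
    """Run the end-to-end loop: idea -> code -> experiment -> paper -> review."""
    idea = generate_idea()
    results = run_experiment(write_code(idea))
    feedback = []
    paper = write_paper(idea, results, feedback)
    for _ in range(max_rounds):
        score, comments = peer_review(paper)
        if score >= accept_at:
            break  # reviewer accepts; stop revising
        feedback.append(comments)
        paper = write_paper(idea, results, feedback)
    return paper
```

The interesting design point is the feedback edge: the reviewer's comments feed back into the paper-writing step, which is what the announcement means by "iterative improvement."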

After many ‘coming soon’ teasers, xAI has finally released beta versions of Grok-2 and Grok-2 mini. An earlier version of Grok-2, under the name "sus-column-r", was released into the LMSYS chatbot arena, where it outperformed both Claude and GPT-4 in overall Elo score. The models are available to Premium subscribers on X. You can look forward to accessing them through xAI's enterprise API later this month, featuring a new tech stack for low-latency access and enhanced security measures.

Key Highlights:

  1. Multimodal - Grok-2 handles both text and vision understanding, and integrates real-time information from X. Grok-2 mini is a smaller but capable model that balances speed and answer quality.

  2. Performance - Both Grok-2 and Grok-2 mini perform competitively with frontier models, trailing only slightly behind GPT-4o, Claude 3.5 Sonnet, and Llama 3 405B on general knowledge, math, coding, and science benchmarks.

  3. Image Generation - xAI has collaborated with Black Forest Labs to experiment with their FLUX.1 model and expand Grok's image-generation capabilities on X, producing hyper-realistic images — and the results are mind-blowing!

  4. Availability - Grok-2 and Grok-2 mini are being rolled out on X. They will also be available via API later this month. xAI plans to release a preview of multimodal understanding in Grok on X and API.

Anthropic has released prompt caching in public beta for their Claude 3.5 Sonnet and Claude 3 Haiku models. This exciting new feature lets you store frequently used context, like long instructions or code, between API calls. This means you can give Claude more background and examples, all while drastically cutting API costs and latency, offering significant efficiency boosts.

Key Highlights:

  1. Cost and Latency Reductions - Prompt caching can reduce costs by up to 90% and latency by up to 85% for lengthy prompts, especially beneficial for tasks like multi-turn conversations and large document processing.

  2. Pricing Model - You're charged based on how many tokens you cache and how often you use them. Writing to the cache is slightly more expensive than standard input tokens, but reading from the cache is significantly cheaper.

  3. Future Support - While currently available for Claude 3.5 Sonnet and Claude 3 Haiku, support for Claude 3 Opus is on the horizon.

Quick Bites

  1. MIT has released an AI Risk Repository, a living database of over 700 AI risks, categorized by cause and domain, to help researchers, developers, and policymakers understand and address AI risks. The repository includes detailed classifications, taxonomies, and a regularly updated source of information to aid in research, audits, and policy development.

  2. Apple is developing a new home device with a robotic arm that moves a large screen, positioning it as a smart home command center and videoconferencing tool. The project is part of Apple’s broader push into robotics and is expected to launch by 2026 or 2027, with a target price of around $1,000.

  3. Consensus, an AI-powered search engine making scientific research more accessible, has raised $11.5M in Series A funding led by Union Square Ventures, with key investors including Nat Friedman and Daniel Gross.

  4. S&P Global and Accenture are teaming up to drive generative AI innovation in the financial services industry. They plan to roll out a genAI learning program for S&P Global’s 35,000 employees and develop AI for banks, insurers, and capital markets firms.

Tools of the Trade

  1. Empower Auto Fine-Tuning: Reduces LLM costs by up to 80% with just five lines of code changes. It routes tasks from expensive general-purpose LLMs to optimized SLMs, streamlining performance and cutting expenses.

  2. Taipy: A Python framework for building scalable data and AI web apps, optimized for performance and large datasets. It simplifies development from quick pilots to production-ready solutions.

  3. Village Labs: Automates tedious tasks like Slack trawling, status updates, and report creation, so your team can focus on meaningful work. It integrates with all your tools to provide personalized, actionable insights and updates.

  4. Awesome LLM Apps: Build LLM apps that use RAG to interact with data sources like GitHub, Gmail, PDFs, and YouTube videos through simple text prompts. These apps let you retrieve information, engage in chat, and extract insights directly from content on these platforms.

Hot Takes

  1. What few understand is that the main reason why open-source is lagging behind closed-source in some AI domains (ex: video or LLMs) is because you can't hide with open-source and it forces you to take more ethical decisions that can sometimes be detrimental to performance. It's much more sustainable and healthy though IMO.
    Long-term, open-source = safer, more ethical AI! ~
    Clement Delangue

  2. BTW the only people who haven't visited Stackoverflow in months are the ones who don't write that much code.
    Creators != Programmers ~
    Jaydeep Karale

Meme of the Day

That’s all for today! See you tomorrow with more such AI-filled content.

Real-time AI Updates 🚨

⚡️ Follow me on Twitter @Saboo_Shubham for lightning-fast AI updates and never miss what’s trending!

Unwind AI - Twitter | LinkedIn | Instagram | Facebook

PS: We curate this AI newsletter every day for FREE, your support is what keeps us going. If you find value in what you read, share it with at least one (or 20) of your friends!
