OpenAI's New AI Model that can Think & Reason

PLUS: OpenAI o1 in GitHub Copilot, Google's DataGemma with RAG and RIG

Today’s top AI Highlights:

  1. OpenAI releases its Strawberry models, officially named the o1 model series

  2. Google uses both RAG and RIG to ground LLM responses

  3. Code with OpenAI’s Strawberry models as your Copilot

  4. Turn your documents into an AI-generated audio discussion with Google’s NotebookLM

  5. AI-powered IDE for large-scale refactoring of massive codebases

& so much more!

Read time: 3 mins

Latest Developments

The rumors were true! OpenAI has finally released its highly anticipated Strawberry models, officially named the o1 model series. The two models, o1-preview and o1-mini, are designed to spend more time thinking before they respond. They can reason through complex tasks and solve harder problems in science, coding, and math than previous models.

o1-mini is a cost-efficient reasoning model that nearly matches o1's performance in math and coding while being much faster and cheaper. It is well suited to applications that require reasoning but not broad world knowledge.

Key Highlights:

  1. Reasoning - o1 models use chain-of-thought reasoning, working through problems step by step before answering. This, however, comes at the cost of higher latency (time to first token) and many more output tokens.

  2. Features - The models have a 128K-token context window and an October 2023 knowledge cutoff. For now, o1 models cannot browse the web or process files and images the way GPT-4o can; OpenAI plans to add these capabilities in future updates.

  3. Use Cases - o1 models are well suited to multi-step workflows such as debugging, coding, data analysis, and advanced math problem-solving, which makes them a strong fit for developers handling complex logic and computations.

  4. Performance - o1-mini scores 1650 Elo on Codeforces and 92.4% accuracy on HumanEval, outperforming GPT-4o and Claude 3.5 Sonnet in coding. o1-preview also excels in areas like cybersecurity CTFs and MATH-500.

  5. Availability - ChatGPT Plus and Team users can access o1 models in ChatGPT with weekly rate limits of 30 messages for o1-preview and 50 for o1-mini. Developers who qualify for API usage tier 5 can access the API with a rate limit of 20 RPM.

  6. Pricing - o1-preview costs $15 per million input tokens and $60 per million output tokens, roughly 3x the price of GPT-4o. o1-mini is 80% cheaper than o1-preview.
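To make the pricing concrete, here is a small Python sketch that estimates what a single request costs. It assumes o1-mini's launch prices of $3 per million input tokens and $12 per million output tokens (consistent with the 80% figure above), and that o1's hidden reasoning tokens are billed at the output-token rate. The token counts in the example are made up for illustration.

```python
# Rough cost estimate for one o1 API request. Prices are a launch-time
# snapshot (USD per 1M tokens) and may change; o1-mini's figures are
# assumed from the "80% cheaper" claim.
PRICES_PER_MILLION = {
    # model: (input price, output price)
    "o1-preview": (15.00, 60.00),
    "o1-mini": (3.00, 12.00),
}


def request_cost(model: str, input_tokens: int, visible_output_tokens: int,
                 reasoning_tokens: int) -> float:
    """Return the USD cost of one request.

    Reasoning tokens never appear in the response text, but they are
    billed at the output-token rate, so long "thinking" raises the bill.
    """
    input_price, output_price = PRICES_PER_MILLION[model]
    billed_output = visible_output_tokens + reasoning_tokens
    return (input_tokens * input_price + billed_output * output_price) / 1_000_000


if __name__ == "__main__":
    # Hypothetical request: 2,000 prompt tokens, 500 visible answer tokens,
    # and 4,000 hidden reasoning tokens.
    for model in PRICES_PER_MILLION:
        cost = request_cost(model, 2_000, 500, 4_000)
        print(f"{model}: ${cost:.4f}")
```

With these made-up numbers the request costs $0.30 on o1-preview and $0.06 on o1-mini. Most of the bill comes from the 4,000 reasoning tokens, which is why extended "thinking" adds up quickly.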

Hallucinations are a major roadblock to LLM adoption. While RAG helps ground an AI's responses, its effectiveness hinges on the quality of the retrieval process. Google has released DataGemma, which it describes as the first open models built to tackle this hallucination problem by connecting LLMs to real-world data. The models draw on Google's Data Commons, a vast repository of over 240 billion real-world data points, and use both RAG and RIG to ground LLM responses in verifiable information.

Key Highlights:

  1. Dual Grounding Techniques - DataGemma employs two grounding techniques:

    • Retrieval-augmented generation (RAG) preemptively supplies LLMs with context from Data Commons before response generation.

    • Retrieval-interleaved generation (RIG) has the LLM cross-reference its outputs with Data Commons as it generates them.

    • Why both? RAG offers richer context, but it can modify user prompts and relies heavily on good query formulation. RIG works in all contexts without altering the user's intent, but the LLM doesn't retain the retrieved information for future use.

  2. Simplified Data Integration - DataGemma uses Data Commons' existing natural language interface as its API, so you can integrate real-world data into your apps with natural language queries.

  3. Open Models - DataGemma models are openly available on Hugging Face and Kaggle, along with quickstart notebooks for both the RIG and RAG implementations.
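The difference between the two grounding styles can be sketched in a few lines of Python. This is a toy, not DataGemma's code: the STATS dictionary stands in for Data Commons, and the [DC("...") -> "..."] marker that a RIG model is assumed to emit next to each statistic is a hypothetical format chosen only for illustration.

```python
import re
from typing import List, Optional

# Hypothetical stand-in for Data Commons: natural-language query -> value.
STATS = {
    "population of exampleland": "5,000,000",
    "unemployment rate of exampleland": "4.2%",
}


def retrieve(query: str) -> Optional[str]:
    """Look up a query in the toy store (case-insensitive)."""
    return STATS.get(query.lower().strip())


def rag_prompt(user_question: str, queries: List[str]) -> str:
    """RAG: fetch facts BEFORE generation and prepend them to the prompt.
    Note that this changes the prompt the model actually sees."""
    facts = []
    for q in queries:
        value = retrieve(q)
        if value is not None:
            facts.append(f"- {q}: {value}")
    return "Context:\n" + "\n".join(facts) + f"\n\nQuestion: {user_question}"


# Hypothetical RIG marker: the model writes a query next to its own guess.
MARKER = re.compile(r'\[DC\("([^"]+)"\) -> "([^"]*)"\]')


def rig_ground(model_output: str) -> str:
    """RIG: AFTER generation, swap each guessed value for the retrieved one,
    keeping the model's guess when the store has no answer."""

    def substitute(match: "re.Match[str]") -> str:
        query, guess = match.group(1), match.group(2)
        found = retrieve(query)
        return found if found is not None else guess

    return MARKER.sub(substitute, model_output)


if __name__ == "__main__":
    print(rag_prompt("How many people live in Exampleland?",
                     ["population of Exampleland"]))
    draft = ('Exampleland has about '
             '[DC("population of Exampleland") -> "4,800,000"] people.')
    print(rig_ground(draft))
```

The RAG path rewrites the prompt up front, while the RIG path leaves the user's question untouched and only corrects the numbers the model produced, which mirrors the trade-off described above.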

Quick Bites

GitHub has started testing OpenAI's new o1 models in GitHub Copilot, and initial tests show promising results for optimizing algorithms and fixing performance bugs. According to GitHub, o1-preview outperforms GPT-4o by giving more deliberate, structured responses that help you quickly pinpoint issues and implement solutions.

Google is rolling out Gemini Live, its Gemini voice assistant, to more Android users for free. Keep an eye out for it in the Gemini app.

OpenAI is reportedly in talks to raise $6.5 billion at a $150 billion pre-money valuation. The round would be led by Thrive Capital, and other reported investors include Microsoft, Apple, and Nvidia.

Tools of the Trade

  1. Audio Overview in NotebookLM: Turn your documents into an engaging audio discussion between two AI hosts.

    • Just go to NotebookLM,

    • create a new notebook,

    • upload at least one file, and

    • click the Generate button.

    We used it to generate an audio conversation from OpenAI's o1 technical report.

  2. datavisualization_langgraph: An AI agent that answers questions about a dataset with insightful visualizations. Upload a SQLite database or CSV file and ask questions in natural language; the agent generates a SQL query, executes it on the database, and formats the results into a visual representation.

  3. Codegen: AI-powered IDE that automates large-scale code refactoring and analysis safely and efficiently across massive codebases. The AI assistant helps with code transformations, visualization, and maintaining code quality.

  4. Awesome LLM Apps: Build LLM apps that use RAG to interact with data sources like GitHub, Gmail, PDFs, and YouTube videos through simple text. These apps let you retrieve information, chat, and extract insights directly from content on these platforms.
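The question-to-chart loop described for datavisualization_langgraph can be sketched with Python's built-in sqlite3 module. This does not reproduce the project's actual code: generate_sql is a hypothetical rule-based stand-in for the LLM call, the sales table is invented sample data, and the "visualization" is returned as a plain dictionary rather than rendered.

```python
import sqlite3
from typing import Any, Dict, List, Tuple


def generate_sql(question: str) -> str:
    """Hypothetical placeholder for an LLM that writes SQL from a question."""
    if "per region" in question.lower():
        return ("SELECT region, SUM(amount) AS total FROM sales "
                "GROUP BY region ORDER BY region")
    raise ValueError("This toy generator only understands 'per region' questions")


def run_query(conn: sqlite3.Connection, sql: str) -> Tuple[List[str], List[tuple]]:
    """Execute the SQL and return the column names plus all result rows."""
    cursor = conn.execute(sql)
    columns = [d[0] for d in cursor.description]
    return columns, cursor.fetchall()


def to_chart_spec(columns: List[str], rows: List[tuple]) -> Dict[str, Any]:
    """Format a two-column result as a simple bar-chart specification."""
    return {
        "type": "bar",
        "x_label": columns[0],
        "y_label": columns[1],
        "x": [row[0] for row in rows],
        "y": [row[1] for row in rows],
    }


def answer(conn: sqlite3.Connection, question: str) -> Dict[str, Any]:
    """Question -> SQL -> rows -> chart spec."""
    sql = generate_sql(question)
    columns, rows = run_query(conn, sql)
    return to_chart_spec(columns, rows)


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE sales (region TEXT, amount REAL)")
    conn.executemany(
        "INSERT INTO sales VALUES (?, ?)",
        [("east", 100.0), ("west", 50.0), ("east", 25.0)],
    )
    print(answer(conn, "Total sales per region?"))
```

Only generate_sql needs a language model; executing the query and shaping the rows stay deterministic, which keeps the rest of the pipeline easy to test.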

Hot Takes

  1. Nvidia is starting to lose share to AI chip startups for the first time. You can hear it in the hallways of every AI conference in the past few months. ~
    James Wang

  2. Once again, an AI system is not "thinking", it's "processing", "running predictions",... just like Google or computers do.
    Giving the false impression that technology systems are human is just cheap snake oil and marketing to fool you into thinking it's more clever than it is. ~
    Clement Delangue

That’s all for today! See you tomorrow with more such AI-filled content.

Real-time AI Updates 🚨

⚡️ Follow me on Twitter @Saboo_Shubham for lightning-fast AI updates and never miss what’s trending!

Unwind AI - Twitter | LinkedIn | Instagram | Facebook

PS: We curate this AI newsletter every day for FREE; your support is what keeps us going. If you find value in what you read, share it with at least one (or 20) of your friends!
