
Unified Framework for AI Agents & RAG

PLUS: Ultra-scale Playbook by Hugging Face, Train reasoning model with 5GB RAM

Today’s top AI Highlights:

  1. Open-source agentic framework with standardized tool cards, Planner, and Executor

  2. Workflows, agents, RAG, integrations, and evals - All in one framework

  3. The Ultra-Scale Playbook by Hugging Face

  4. Train your own DeepSeek R1-like reasoning model with just 5GB RAM

  5. Connect code snippets and APIs in a visual drag-and-drop UI

& so much more!

Read time: 3 mins

AI Tutorials

Finding the perfect property involves sifting through countless listings across multiple websites, analyzing location trends, and making informed investment decisions. For developers and real estate professionals, automating this process can save hours of manual work while providing deeper market insights.

In this tutorial, we'll build an AI Real Estate Agent that automates property search and market analysis. It helps users find properties matching their criteria while providing detailed location trends and investment recommendations. This agent streamlines the property search process by combining data from multiple real estate websites and offering intelligent analysis.

Tech Stack:

We share hands-on tutorials like this 2-3 times a week, designed to help you stay ahead in the world of AI. If you're serious about leveling up your AI skills and staying ahead of the curve, subscribe now and be the first to access our latest tutorials.

Don’t forget to share this newsletter on your social channels and tag Unwind AI (X, LinkedIn, Threads, Facebook) to support us!

Latest Developments

OctoTools by Stanford University is a training-free, user-friendly, and easily extensible open-source agentic framework. This new system allows LLMs to seamlessly integrate and utilize external tools to extend their capabilities.

What sets it apart from other frameworks is its unique "tool card" system that wraps external tools behind a standardized interface, a planner module that handles both high-level task decomposition and step-by-step action planning, and an executor module that generates and applies tool-based actions. OctoTools also outperforms AutoGen, GPT-Functions, and LangChain by up to 10.6% when given the same set of tools.

Key Highlights:

  1. Plug-and-Play Tools - OctoTools lets you enhance LLMs with external tools (like search engines, calculators, or custom APIs). The "tool card" system provides a standardized interface, so integrating a new tool is as simple as defining its input/output and usage guidelines.

  2. Separate Planner and Executor - The framework divides reasoning into distinct planning and execution phases. This separation improves transparency, making it easier to debug and control the LLM's behavior, which is critical when building robust applications.

  3. Auto Tool Selection - OctoTools includes an algorithm that automatically determines the optimal subset of tools for a given task. This eliminates the need for manual configuration, leading to better performance and more efficient resource utilization.

  4. Ready-to-Use and Extensible - OctoTools comes with a pre-built set of tools for common tasks (search, image analysis, calculations, etc.), and detailed instructions for running benchmarks and customizing, allowing you to easily integrate the platform into your existing projects with a simple Conda environment setup.
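The tool-card pattern described above (a standard wrapper declaring a tool's name, usage metadata, and callable, consumed by a separate planner and executor) can be sketched in a few lines of Python. This is a conceptual illustration of the pattern, not OctoTools' actual API; the registry, planner heuristic, and tool names are all made up for the example.

```python
from dataclasses import dataclass
from typing import Callable

@dataclass
class ToolCard:
    """Standardized wrapper: name, usage description, and the callable itself."""
    name: str
    description: str
    run: Callable[..., object]

# Two toy tools registered behind the same uniform interface.
registry = {
    "calculator": ToolCard("calculator", "Evaluate arithmetic", lambda expr: eval(expr)),
    "upper": ToolCard("upper", "Uppercase text", lambda s: s.upper()),
}

def planner(task: str) -> list[tuple[str, str]]:
    """Choose tool calls for a task (a real planner would query an LLM)."""
    if any(ch.isdigit() for ch in task):
        return [("calculator", task)]
    return [("upper", task)]

def executor(plan: list[tuple[str, str]]) -> list[object]:
    """Apply each planned action through its card's standardized interface."""
    return [registry[name].run(arg) for name, arg in plan]

print(executor(planner("2+3")))    # [5]
print(executor(planner("hello")))  # ['HELLO']
```

Because every tool sits behind the same `ToolCard` interface, adding a new capability means registering one more card; the planner and executor code never change.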

Mastra is a new TypeScript framework for building AI applications that handles all the complex primitives you need - from workflows and agents to RAG and integrations. It provides a unified interface to work with any LLM provider through the Vercel AI SDK, letting you switch between models by changing a single line of code.

At its core, it provides workflows for chaining LLM operations, agents with memory and tool access, and RAG capabilities for knowledge retrieval. The framework runs locally during development and can be deployed to any serverless cloud. You can use Mastra to create everything from simple chatbots to complex AI systems that combine multiple agents, tools, and data sources.

Key Highlights:

  1. Agents with Memory and Tools - Build agents that can remember past interactions (using recent message history, semantic search, and a unique working memory feature) and execute custom functions (tools). This lets your agents interact with external systems and perform real-world actions.

  2. Deterministic Workflow Engine - Create graph-based workflows for precise control over LLM calls. Define steps, chain them together (with branching and merging), and even pause/resume execution. This is crucial for complex tasks requiring specific sequences of operations.

  3. Built-in RAG Capabilities - Easily implement Retrieval-Augmented Generation. Mastra provides APIs for processing documents, creating embeddings, storing them in vector databases (multiple options supported), and retrieving relevant information to ground LLM responses in your data.

  4. Local Development Environment - Develop and test your agents locally with Mastra's built-in dev environment. Chat with your agents, inspect their state and memory, and iterate quickly without needing to deploy.

  5. Deployment Options - Deploy your agents and workflows within existing React, Next.js, or Node.js applications, or package them as standalone endpoints. Mastra supports serverless deployment to platforms like Vercel, Cloudflare Workers, and Netlify, using Hono.
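The deterministic workflow idea above (named steps chained in a fixed order, each transforming the previous step's output) can be illustrated with a minimal sketch. Note this is a conceptual Python illustration of graph-style step chaining, not Mastra's actual TypeScript API.

```python
from typing import Any, Callable

class Workflow:
    """Minimal deterministic step chain: each step transforms the prior output."""
    def __init__(self) -> None:
        self.steps: list[tuple[str, Callable[[Any], Any]]] = []

    def step(self, name: str, fn: Callable[[Any], Any]) -> "Workflow":
        self.steps.append((name, fn))
        return self  # return self to allow fluent chaining

    def run(self, data: Any) -> Any:
        for name, fn in self.steps:
            data = fn(data)  # a real engine would log/persist per-step state here
        return data

wf = (Workflow()
      .step("clean", str.strip)
      .step("shout", str.upper)
      .step("punctuate", lambda s: s + "!"))

print(wf.run("  hello  "))  # HELLO!
```

Because the step order is explicit and data flows through a single pipeline, runs are reproducible and each intermediate state is a natural checkpoint for the pause/resume behavior described above.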

Quick Bites

Hugging Face has released an extensive open-source guide, "The Ultra-Scale Playbook," detailing how to train massive language models on large GPU clusters. The guide covers critical techniques like data, tensor, pipeline, context, and expert parallelism, as well as ZeRO, with code examples and benchmarks from over 4100 experiments. This is a huge, in-depth contribution for teams looking to master distributed training for cutting-edge LLMs.

Unsloth AI has released a breakthrough implementation of GRPO (Group Relative Policy Optimization) that drastically reduces the VRAM needed for training long-context reasoning models (like DeepSeek R1). Using Unsloth, you can now train your own reasoning model with just 5GB VRAM for Qwen2.5-1.5B with no accuracy loss. Here’s the free GRPO guide for Llama 3.1, Phi 4, and Qwen 2.5.
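The core of GRPO is group-relative advantage estimation: sample several completions per prompt, score each with a reward, and weight updates by how far each reward sits above or below the group mean. Here is a stdlib-only sketch of that advantage step; it is illustrative of the GRPO idea only, not Unsloth's implementation.

```python
import statistics

def group_relative_advantages(rewards: list[float]) -> list[float]:
    """GRPO-style advantages: (reward - group mean) / group std.

    Completions scored above their group's average get a positive advantage
    and are reinforced; below-average ones are penalized. No separate value
    network (critic) is needed, which is one reason memory use stays low.
    """
    mean = statistics.fmean(rewards)
    std = statistics.pstdev(rewards) or 1.0  # guard against zero variance
    return [(r - mean) / std for r in rewards]

# Rewards for 4 sampled completions of the same prompt:
print(group_relative_advantages([1.0, 0.0, 1.0, 0.0]))  # [1.0, -1.0, 1.0, -1.0]
```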

South Korean AI startup DeepAuto.ai has cracked a persistent GPU memory barrier - enabling LLMs to process a huge 3 million tokens on a single 48GB GPU, while running nearly 19 times faster than standard approaches. Their new framework InfiniteHiP achieves this through a novel hierarchical token pruning algorithm and clever memory management, allowing LLMs to handle extremely long contexts without additional training or quality loss.

xAI has made its new model, Grok 3, available for free for a limited time. While usage is capped for non-paying users (limits on prompts, image generation, and analysis), it's a chance to test drive features like DeepSearch and Think mode, powered by xAI's massive cluster of 200,000 NVIDIA GPUs. You can use it on X or download the Grok app.

Meta AI has released MLGym, a new open-source framework and benchmark (MLGym-Bench) for training and evaluating AI research agents. This "Gym environment" includes 13 open-ended research tasks across computer vision, NLP, reinforcement learning, and game theory, testing how well AI research agents can perform real research skills like generating hypotheses, implementing methods, and analyzing results.

Tools of the Trade

  1. OOMOL: Desktop-based AI workflow IDE that lets you visually connect code snippets and API services through drag-and-drop interactions, with built-in support for Python and JavaScript/Node.js running in containerized environments.

  2. llmcat: Command-line tool that automates copying code from files and directories into LLMs, formatting it as Markdown and respecting .gitignore rules. It supports both direct copying and interactive fuzzy search for selecting multiple files.

  3. dstack: Open-source container orchestration platform for AI workloads that simplifies development, training, and deployment across cloud and on-premises environments. It’s a lightweight alternative to Kubernetes and Slurm, with built-in support for various AI accelerators.

  4. Awesome LLM Apps: Build awesome LLM apps with RAG, AI agents, and more to interact with data sources like GitHub, Gmail, PDFs, and YouTube videos, and automate complex work.

Hot Takes

  1. If the hypothesis is that an AGI-level AI can accomplish a meaningful part of economically valuable work on its own without needing human expertise, it seems extremely unlikely that any such system would be released open weights. The benefits of controlling use would be too high. ~
    Ethan Mollick

  2. Building AI agents is 5% AI and 100% software engineering ~
    Sri Laasya Nutheti

That’s all for today! See you tomorrow with more such AI-filled content.

Don’t forget to share this newsletter on your social channels and tag Unwind AI to support us!

Unwind AI - X | LinkedIn | Threads | Facebook

PS: We curate this AI newsletter every day for FREE, your support is what keeps us going. If you find value in what you read, share it with at least one, two (or 20) of your friends 😉 
