Build Your Own SWE Agent
PLUS: LLM Apps with YAML files, Run Llama 3.2 vision locally with Ollama
Today’s top AI Highlights:
New framework to build LLM apps like Docker containers
Build your own coding agents with any agentic framework & LLMs running 100% locally
Build voice apps with Hume’s EVI 2 API integrated with Claude 3.5 and external tools
Ollama’s update lets you run Llama 3.2 vision models 100% locally
Stripped down, stable version of Firecrawl optimized for self-hosting
& so much more!
Read time: 3 mins
AI Tutorials
We’re always looking for ways to automate complex workflows. Building tools that can search, synthesize, and summarize information is a key part of this, especially when dealing with ever-changing data like news.
For this tutorial, we’ll create a multi-agent AI news assistant using OpenAI’s Swarm framework along with Llama 3.2. You’ll be able to run everything locally, using multiple agents to break down the task into manageable, specialized roles—all at no cost.
We will use:
Swarm to manage the interactions between agents,
DuckDuckGo for real-time news search, and
Llama 3.2 for processing and summarizing news.
Each agent will handle a specific part of the workflow, resulting in a modular and flexible app that’s easy to adapt or expand.
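The division of labor above can be sketched in plain Python. Note this is an illustration of the handoff pattern, not Swarm’s actual API; the agent names and the string-returning `run` callables are placeholders for real DuckDuckGo searches and Llama 3.2 calls.

```python
# Minimal sketch of the multi-agent handoff pattern the tutorial builds on.
# Plain Python, not Swarm's real API; the lambdas stand in for tool/LLM calls.
from dataclasses import dataclass
from typing import Callable, Optional

@dataclass
class Agent:
    name: str
    instructions: str
    run: Callable[[str], str]             # stand-in for an LLM or tool call
    next_agent: Optional["Agent"] = None  # hand off to the next specialist

def pipeline(first: Agent, query: str) -> str:
    """Pass the query through the chain of agents, each refining the result."""
    agent, result = first, query
    while agent is not None:
        result = agent.run(result)
        agent = agent.next_agent
    return result

# Placeholder "tools": in the real app these call DuckDuckGo and Llama 3.2.
summarizer = Agent("summarizer", "Summarize the synthesis.",
                   run=lambda text: f"summary({text})")
synthesizer = Agent("synthesizer", "Synthesize raw articles.",
                    run=lambda text: f"synthesis({text})", next_agent=summarizer)
searcher = Agent("searcher", "Search for news.",
                 run=lambda q: f"articles({q})", next_agent=synthesizer)

print(pipeline(searcher, "AI chips"))  # summary(synthesis(articles(AI chips)))
```

Because each agent only knows its own instructions and its successor, you can swap the summarizer or insert a fact-checking agent without touching the rest of the chain.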
We share hands-on tutorials like this 2-3 times a week, designed to help you stay ahead in the world of AI. If you're serious about levelling up your AI skills and staying ahead of the curve, subscribe now and be the first to access our latest tutorials.
Latest Developments
Building and deploying LLM apps just got easier with GenSphere, a new declarative framework that lets you define LLM apps using YAML files. Think of it as Docker, but specifically for LLMs. The framework breaks down any LLM app into graph structures where each node is either a function call, an LLM API call, or another graph itself, giving you precise control over every component.
You can push your app to GenSphere's open platform without registration, making your work publicly accessible with a generated ID. Other developers can then pull these components and integrate them directly into their own workflows.
Key Highlights:
Direct Component Access - Each part of your app - from API calls to function execution - is accessible and modifiable. You can inspect the exact prompt being sent, the tools being used, and the processing logic at each step. This granular control helps when debugging issues.
Practical Modularity - Build complex applications by referencing other YAML files as nodes in your workflow. For example, you can create a base prompt engineering component, push it to the platform, and reuse it across projects.
Framework Integration - Continue using your preferred tools from LangChain and Composio while gaining better visibility into their operation. The framework doesn't force you to abandon your existing code - instead, it helps you organize and control it better.
Structured Data Handling - Using Pydantic models for schema definitions helps catch data structure issues early. When your LLM call is supposed to return a specific data structure, the framework ensures compliance, reducing runtime errors and making your applications more reliable.
Quick Start - Install with `pip install gensphere`, set your OpenAI API key, and start by creating a simple YAML workflow file with a single LLM service node. Here’s a quick tutorial covering everything from YAML syntax and parsing to visualization and nested workflows.
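The structured-data point above can be sketched with plain Pydantic, the mechanism GenSphere leans on for schema enforcement. The `Plan`/`Step` models and the raw JSON are illustrative, not GenSphere’s actual schema format:

```python
# Sketch of schema-enforced LLM output with Pydantic. In GenSphere, a schema
# like this would be referenced from a YAML node; here we validate directly.
from pydantic import BaseModel, ValidationError

class Step(BaseModel):
    tool: str
    argument: str

class Plan(BaseModel):
    goal: str
    steps: list[Step]

# Imagine this JSON came back from an LLM call defined in a YAML node.
raw = '{"goal": "scrape docs", "steps": [{"tool": "fetch", "argument": "https://example.com"}]}'
plan = Plan.model_validate_json(raw)
print(plan.steps[0].tool)  # fetch

# A malformed response fails fast instead of surfacing as a runtime bug later.
try:
    Plan.model_validate_json('{"goal": "scrape docs", "steps": "none"}')
except ValidationError:
    print("schema mismatch caught early")
```

Catching the mismatch at the validation boundary is what turns "the LLM returned something weird" into an explicit, debuggable error.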
Ready to Level up your work with AI?
HubSpot’s free guide to using ChatGPT at work is your new cheat code to go from working hard to hardly working
HubSpot’s guide will teach you:
How to prompt like a pro
How to integrate AI in your personal workflow
100+ useful prompt ideas
All in order to help you unleash the power of AI for a more efficient, impactful professional life.
AI software engineer Devin showed us what AI coding agents can do, but how about building your own custom coding agents with any agentic framework & LLMs of your choice? SWE-Kit is a powerful framework for building software engineering agents locally, with a comprehensive set of tools including code analysis, Git integration, and shell operations.
You can integrate these agents with LangChain, LlamaIndex, CrewAI, or Autogen, maintaining full control over your code and data. The toolkit comes with pre-built templates for PR reviews and codebase Q&A, while allowing you to customize every component for your specific workflows and coding standards.
Key Highlights:
Ready-made Tools - It handles the complexities of file system interaction, code analysis, shell command execution, and even browser automation, so you can focus on your agent's core logic. You get access to functionalities like file manipulation, code indexing, Git interactions, and more.
Framework Freedom - It works out of the box with popular frameworks like LangChain, CrewAI, LlamaIndex, and Autogen with ready-to-use templates. All tools are framework-agnostic and can be combined based on project requirements.
Security-Focused - Run agents in Docker containers, E2B environments, or locally, with all tools working offline. Your code stays within your infrastructure while maintaining deployment flexibility to FlyIO or AWS Lambda when needed.
Production-Ready Components - Built-in RAG for documentation, SQL query execution for database work, vector stores for image handling, and extensive file/shell operations. Each tool is optimized for real coding tasks, not just demos, and can be extended for team-specific needs.
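The "framework-agnostic tools" idea above can be sketched in plain Python. This is an illustration of the pattern, not SWE-Kit’s actual interface; the registry and tool names are hypothetical:

```python
# Rough sketch of framework-agnostic agent tools: each tool is a plain
# callable plus metadata, so an adapter can expose the same registry to
# LangChain, CrewAI, LlamaIndex, or Autogen. Not SWE-Kit's real API.
import subprocess
from dataclasses import dataclass
from typing import Callable, Dict

@dataclass
class Tool:
    name: str
    description: str
    fn: Callable[..., str]

REGISTRY: Dict[str, Tool] = {}

def register(name: str, description: str):
    """Decorator that adds a function to the shared tool registry."""
    def wrap(fn):
        REGISTRY[name] = Tool(name, description, fn)
        return fn
    return wrap

@register("read_file", "Read a file from the workspace")
def read_file(path: str) -> str:
    with open(path) as f:
        return f.read()

@register("shell", "Run a shell command and capture stdout")
def shell(cmd: str) -> str:
    return subprocess.run(cmd, shell=True, capture_output=True, text=True).stdout

# An adapter layer translates REGISTRY entries into whatever tool format a
# given agent framework expects; the tool implementations stay untouched.
print(sorted(REGISTRY))  # ['read_file', 'shell']
```

Keeping the tools ignorant of any particular framework is what makes it cheap to move the same agent between LangChain today and CrewAI tomorrow.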
Quick Bites
Black Forest Labs has released FLUX1.1 [pro] Ultra and Raw Modes, now offering 4x higher image resolutions (up to 4MP) with ultra-fast generation speed of 10 seconds per sample. Ultra mode keeps high prompt adherence even at larger resolutions, while raw mode delivers a natural, candid photography feel – all available through their API at $0.06 per image.
Ultra and Raw Mode
LlamaIndex has released chat-ui, a React component library to build chat interfaces for LLM applications. This library offers pre-built fully-customizable components such as message bubbles and input fields. It also integrates with LLM backends like Vercel AI, and includes features like code/LaTeX styling with highlight.js and KaTeX, and PDF viewer integration.
Hume AI released EVI 2, a foundational voice-to-voice model that merges speech and text processing into a single, powerful system. The model generates speech in an extremely human-like tone with a very low latency of 500ms. Hume AI has also launched an app that combines EVI 2 with other LLMs and tools, like Claude 3.5 models and web search, for faster, more accurate, and up-to-date responses. You can try the app here.
EVI 2 is available via the Hume API, where it listens, analyzes, and responds in real time with emotional intelligence. You can also customize its voice characteristics such as gender, nasality, and pitch.
Pydantic has released beta version 2.10.0b1, which now supports on-the-fly validation of streamed responses from APIs like OpenAI’s, thanks to its new "allow partial" feature. This makes it easier to build applications that need to ensure an LLM’s outputs match expected schemas while streaming.
Ollama 0.4 is here and it now supports Llama 3.2 vision models (both 11B and 90B). Download the update and run `ollama run llama3.2-vision` in your Terminal. To run the larger 90B model, use `ollama run llama3.2-vision:90b`.
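Beyond the CLI, a running Ollama server also exposes a local REST API (default port 11434), so you can send images from Python. A hedged sketch using only the standard library; the prompt, image path, and response handling are illustrative:

```python
# Sketch: querying llama3.2-vision through a locally running Ollama server's
# /api/chat endpoint. Requires `ollama serve` and the model pulled locally.
import base64
import json
import urllib.request

def build_payload(prompt: str, image_b64: str) -> dict:
    """Build a chat request carrying a base64-encoded image."""
    return {
        "model": "llama3.2-vision",
        "messages": [{"role": "user", "content": prompt, "images": [image_b64]}],
        "stream": False,
    }

def describe_image(path: str, prompt: str = "What is in this image?") -> str:
    """Send the image to the local Ollama server and return the model's reply."""
    with open(path, "rb") as f:
        image_b64 = base64.b64encode(f.read()).decode()
    req = urllib.request.Request(
        "http://localhost:11434/api/chat",
        data=json.dumps(build_payload(prompt, image_b64)).encode(),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read())["message"]["content"]
```

With the server running, `describe_image("photo.png")` returns the model’s description as plain text.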
Tools of the Trade
Aide: AI-powered fork of VS Code that integrates tightly with an agentic framework to proactively help you with code fixes and multi-file edits. It offers context-aware code suggestions, intelligent navigation, inline editing, and real-time collaboration with AI. It has all the features found in Cursor/Copilot, with complete data privacy and plug-and-play LLM integration.
Firecrawl Simple: A stripped-down and stable version of Firecrawl optimized for self-hosting and ease of contribution. Billing logic and AI features are completely removed.
AgentServe: A Python framework that wraps AI agents in a REST API, allowing them to be deployed and scaled as web services. It has task queuing options and standardized communication endpoints while supporting multiple agent frameworks like LangChain and LlamaIndex.
Awesome LLM Apps: Build awesome LLM apps using RAG to interact with data sources like GitHub, Gmail, PDFs, and YouTube videos through simple text. These apps will let you retrieve information, engage in chat, and extract insights directly from content on these platforms.
Hot Takes
A friend asks: “What are the highest paying AI engineering jobs next year?”
My response:
> Tier 1: Massive training and inference orchestration
> Tier 2: Production grade memory and multi agent
> Tier 3: Implementing RAG for laggards

People actually think an LLM can generate a trading bot that will make them millions. The number of people who ask ChatLLM to develop one and get annoyed and email us, is pretty astounding 🤯

No wonder so many people are impatient for super intelligence! They believe they can become rich over nite 🤣🤣 ~
Bindu Reddy
That’s all for today! See you tomorrow with more such AI-filled content.
Don’t forget to share this newsletter on your social channels and tag Unwind AI to support us!
PS: We curate this AI newsletter every day for FREE, your support is what keeps us going. If you find value in what you read, share it with at least one, two (or 20) of your friends 😉