
AI Agent Surfs the Web like Humans

PLUS: Qwen model with o1-like reasoning, Live AI software engineering arena

Today’s top AI Highlights:

  1. Build RAG systems that work entirely offline without heavy dependencies

  2. AI agent navigates web UI better and faster than Claude Computer Use

  3. Open-source vision-language model with OpenAI o1-like reasoning

  4. Watch AI software engineering battle live with LMArena’s new RepoChat

  5. Build LLM apps with RAG, AI agents, Code Interpreter, and more using Qwen models (100% free and local)

& so much more!

Read time: 3 mins

AI Tutorials

Working with multiple LLMs simultaneously can be incredibly useful for comparing their strengths, weaknesses, and response styles. Setting up an app that allows direct comparison between top models would be great for understanding LLM behaviors and selecting the right model for specific tasks.

Does it sound complex to build your own chat playground with multiple LLMs? It’s really not. Just 20 lines of Python code and it’s done!

Let’s build this Multi-LLM Chat Playground that lets you interact with three popular models—GPT-4o, Claude 3.5 Sonnet, and Cohere Command R Plus—all within a single app. You can swap these with any other LLMs of your choice too. With a few clicks, you can view responses from each model in a parallel layout for easy comparison.
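As a taste of how little code this takes, here’s a minimal sketch of the comparison loop, assuming each model is wrapped as a plain callable. The stub replies below are illustrative stand-ins, not the tutorial’s actual code; swap them for real SDK calls (openai, anthropic, cohere).

```python
# Minimal sketch of a multi-LLM comparison loop. Each model is wrapped
# as a callable that takes a prompt and returns a reply; the lambdas
# below are stand-ins so the sketch runs without API keys.

def compare_models(prompt: str, models: dict) -> dict:
    """Send one prompt to every model and collect replies side by side."""
    return {name: ask(prompt) for name, ask in models.items()}

# Stand-in callables; replace with real API client calls.
models = {
    "gpt-4o": lambda p: f"[gpt-4o stub] {p}",
    "claude-3.5-sonnet": lambda p: f"[claude stub] {p}",
    "command-r-plus": lambda p: f"[cohere stub] {p}",
}

for name, reply in compare_models("Explain RAG in one line.", models).items():
    print(f"{name}: {reply}")
```

In a real app each value in `models` would be a thin wrapper around one provider’s chat endpoint, so adding a fourth model is just one more dictionary entry.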

We share hands-on tutorials like this 2-3 times a week, designed to help you stay ahead in the world of AI. If you're serious about levelling up your AI skills and staying ahead of the curve, subscribe now and be the first to access our latest tutorials.

Don’t forget to share this newsletter on your social channels and tag Unwind AI (X, LinkedIn, Threads, Facebook) to support us!

Latest Developments

Here’s a new Python toolkit, RAGLite, for building RAG systems that uniquely supports both PostgreSQL and SQLite databases, offering you options for managing your data, regardless of project scale. What sets RAGLite apart is its lightweight architecture, eliminating heavy dependencies like PyTorch or LangChain, resulting in faster performance and reduced complexity. It also seamlessly integrates with various LLM providers through LiteLLM, including local llama-cpp-python models, giving you the freedom to choose the best tools for your needs.

Key Highlights:

  1. Database Integration - Native support for both PostgreSQL and SQLite with built-in vector search capabilities (pgvector, sqlite-vec) - developers can start with SQLite for local development and seamlessly transition to PostgreSQL for production, using the same codebase and API calls

  2. Performance Optimization - Implements semantic chunking through binary integer programming and includes a query adapter that auto-tunes based on usage patterns - this means better search results without manual parameter tuning, reducing the development time needed for fine-tuning retrieval performance

  3. Document Processing Pipeline - Ready-to-use PDF to Markdown conversion pipeline with optional Pandoc integration for additional formats, plus built-in evaluation tools through Ragas integration - developers can skip building custom document processing pipelines and focus on their application logic

  4. Deployment Flexibility - Compatible with local LLMs through llama-cpp-python and cloud providers through LiteLLM, with hardware acceleration support - enables developers to prototype locally with smaller models and scale to production with larger models without code changes, while maintaining optimal performance on their hardware
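To make the “lightweight” claim concrete, here’s a toy version of the pattern: chunks and embeddings stored in plain SQLite and searched with pure-Python cosine similarity, no PyTorch or LangChain. This is not RAGLite’s API (RAGLite uses sqlite-vec/pgvector for real vector search); it only illustrates the dependency-free approach, and the sample chunks and two-dimensional embeddings are made up.

```python
# Toy illustration of dependency-free vector retrieval over SQLite.
# NOT RAGLite's API -- see its docs for the real interface.
import json
import math
import sqlite3

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(x * x for x in b))
    return dot / norm

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE chunks (text TEXT, embedding TEXT)")

def insert_chunk(text, embedding):
    # Embeddings stored as JSON text; a real system would use a vector type.
    con.execute("INSERT INTO chunks VALUES (?, ?)", (text, json.dumps(embedding)))

def search(query_embedding, k=3):
    """Return the k chunks most similar to the query embedding."""
    rows = con.execute("SELECT text, embedding FROM chunks").fetchall()
    scored = [(cosine(query_embedding, json.loads(e)), t) for t, e in rows]
    return [t for _, t in sorted(scored, reverse=True)[:k]]

insert_chunk("RAGLite supports SQLite and PostgreSQL", [0.9, 0.1])
insert_chunk("Runner H automates the web", [0.1, 0.9])
print(search([1.0, 0.0], k=1))
```

Swapping the `sqlite3.connect` line for a PostgreSQL connection is the essence of the dev-to-prod transition RAGLite describes: the retrieval code above it stays the same.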

Paris-based AI startup H has launched Studio, a platform to effortlessly create production-ready and robust automations at scale. Accompanying it is Runner H, their flagship web automation AI agent, which can autonomously surf the web better than Claude Computer Use, interacting with web UIs and completing tasks from simple natural-language commands.

What sets Runner H apart are its specialized in-house models, designed to be both smaller and more cost-effective than generalist models while still delivering superior performance, particularly in UI interaction and localization. The agent outperforms larger models in web automation tasks, scoring 67% on WebVoyager compared to Anthropic Computer Use's 52%.

Key Highlights:

  1. Autonomous Navigation - Runner H independently handles multi-step web processes by understanding context and planning interactions across different pages and UI states. When interfaces change or elements move, Runner H adapts in real time by visually re-identifying targets and adjusting its interaction patterns. H has also shared some really cool demos worth checking out.

  2. Visual Understanding & Interaction - Runner H uses its specialized vision model (H-VLM) to actively scan and interpret web interfaces, identifying clickable elements and form fields without relying on static selectors. The agent processes instructions like "click the submit button" or "fill out the registration form" and translates them into precise mouse and keyboard inputs.

  3. Direct API Access - H provides API access to Runner H, allowing you to integrate it directly into your existing development tools and pipelines. This means you can trigger automations programmatically, embed them within your CI/CD processes, or build apps using Runner H's capabilities.

  4. Automation Design in Studio - The Studio platform offers a visual interface for building, testing, and debugging your automation workflows. You can create complex multi-step processes by combining Runner H commands, review the agent's execution in real-time, and edit or refine actions as needed.

  5. Availability - Runner H and Studio are currently available through a private beta program to gather feedback and refine the platform. You can join the waitlist for access.
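The command-to-action translation in point 2 can be pictured as a dispatch step. The toy sketch below fakes it with string matching over a known element map; Runner H’s real pipeline uses its H-VLM vision model on live screenshots, so every name and coordinate here is hypothetical.

```python
# Toy sketch: translate a natural-language command into a UI action.
# Runner H does this with a vision-language model over live screenshots;
# here we fake it with string matching against a known element map.

def plan_action(command: str, elements: dict):
    """Map a command like 'click the submit button' to (action, x, y)."""
    verb = "type" if command.startswith(("fill", "type")) else "click"
    for name, (x, y) in elements.items():
        if name in command:
            return (verb, x, y)
    return None  # element not found: a real agent would rescan the page

elements = {"submit button": (320, 540), "registration form": (300, 200)}
print(plan_action("click the submit button", elements))
print(plan_action("fill out the registration form", elements))
```

The `return None` branch is where an agent like Runner H differs from brittle selector-based automation: instead of failing, it re-scans the page and re-identifies the target visually.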

Quick Bites

Chinese researchers have open-sourced a vision-language model with OpenAI o1-like reasoning capabilities. Built on Llama 3.2 Vision, LLaVA-o1 tackles complex visual reasoning tasks by breaking down problems into four stages: summarizing, describing the image, reasoning, and concluding. It was trained on a custom 100k dataset and uses a novel stage-level beam search for better inference, surpassing the performance of larger and even closed-source models such as Gemini-1.5-Pro and GPT-4o-mini.
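The stage-level beam search can be sketched as follows: rather than ranking token by token or only ranking complete answers, the model samples several candidates per reasoning stage and keeps the best one before moving on. `generate` and `score` below are hypothetical stand-ins for the model’s sampler and candidate ranker, not LLaVA-o1’s actual code.

```python
# Toy sketch of stage-level beam search: sample several candidates for
# each reasoning STAGE (not each token), keep the best-scoring one, and
# carry it forward as context for the next stage.

STAGES = ["summary", "caption", "reasoning", "conclusion"]

def stage_beam_search(generate, score, n_candidates=4):
    """Build an answer stage by stage, keeping the best candidate per stage."""
    context = []
    for stage in STAGES:
        candidates = [generate(stage, context) for _ in range(n_candidates)]
        best = max(candidates, key=lambda c: score(stage, context, c))
        context.append(best)
    return context

# Demo with a fake sampler that just numbers its outputs and a scorer
# that prefers higher numbers.
counter = iter(range(100))
answer = stage_beam_search(
    generate=lambda stage, ctx: f"{stage}-{next(counter)}",
    score=lambda stage, ctx, c: int(c.split("-")[1]),
    n_candidates=2,
)
print(answer)
```

The key design point is granularity: pruning once per stage is cheaper than token-level beam search but catches bad reasoning paths earlier than ranking only whole answers.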

Alibaba Cloud just dropped a preview of QwQ, a 32B open-source model with excellent reasoning capabilities, particularly in math and coding, beating even OpenAI o1-mini. While it's still in early stages and has limitations like potential language mixing and recursive reasoning loops, it's already showing promising results on benchmarks like GPQA and LiveCodeBench. You can try it out on AnyChat. You can even run it locally with Ollama using ollama run qwq.

LangChain has introduced Promptim, an experimental library for automated prompt optimization. Feed it your initial prompt, a dataset, and some evaluation metrics, and it'll run tests to find you a better performing prompt. Think of it as a shortcut to improved AI system results, saving you time and adding a dose of rigor to your prompt engineering.

LMSYS launched RepoChat Arena, a live AI software engineering battleground where AI models tackle real coding tasks from public GitHub links. You can watch AI models fix bugs, add features, or review PRs side by side, then vote for the best solution. Head to lmarena.ai to see the AI coding battles in action and help rank the top AI software engineer!

Tools of the Trade

  1. Qwen-Agent: Framework for building AI agent apps on the instruction-following, tool-usage, planning, and memory capabilities of Qwen models (version 2.0 and above). You can create custom agents by defining tools and instructions. It supports Alibaba Cloud's DashScope service or self-hosted models.

  2. ComfyUI Desktop: A packaged application for using ComfyUI, bundling necessary dependencies. It supports Windows (NVIDIA GPUs), macOS (Apple silicon), and Linux, automatically handling Python dependencies, updates, and essential configurations.

  3. Keep: Open source AIOps and alert management platform for comprehensive control over alerts, automation, and noise reduction. It consolidates alerts from various sources for enrichment, automation, and streamlined incident response through a centralized dashboard and API-first approach.

  4. Awesome LLM Apps: Build awesome LLM apps with RAG, AI agents, and more to interact with data sources like GitHub, Gmail, PDFs, and YouTube videos, and automate complex work.

Hot Takes

  1. Sonnet 3.6 being trained to ask questions is an underappreciated innovation. To me it seems that most of the LLM failures are not caused by the model being dumb, but by not having enough context ~
    Tom Dörr

  2. The only thing standing between DeepSeek (probably China's best AI training crew on a per capita basis) and matching the frontier labs in the West is access to compute. ~
    Jack Clark

That’s all for today! See you tomorrow with more such AI-filled content.

Don’t forget to share this newsletter on your social channels and tag Unwind AI to support us!

Unwind AI - X | LinkedIn | Threads | Facebook

PS: We curate this AI newsletter every day for FREE, your support is what keeps us going. If you find value in what you read, share it with at least one, two (or 20) of your friends 😉 
