Turn Any LLM into a Computer Use Agent
PLUS: Virtual desktops for AI agents, Perplexity Deep Research agent
Today’s top AI Highlights:
Remote desktop infrastructure to build and scale AI computer use agents
Build Computer Use AI agents with any LLM
Perplexity releases OpenAI-like Deep Research agent
AI search with visual answers and multi-step reasoning
Convert AI workflows and agents developed in Python into web apps
& so much more!
Read time: 3 mins
AI Tutorials
Building powerful AI applications that can reason over documents while maintaining data privacy is a critical need for many organizations. However, most solutions require cloud connectivity and can't operate in air-gapped environments.
In this tutorial, we'll create a powerful reasoning agent that combines local DeepSeek models with RAG capabilities. It supports two modes: a simple local chat mode and an advanced RAG mode powered by DeepSeek R1.
Local Chat Mode - Direct interaction with DeepSeek models running locally, perfect for general queries and conversations (see the sketch after the mode descriptions).
RAG Mode - Enhanced reasoning with document processing, vector search, and optional web search integration for comprehensive information retrieval.
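To make the local chat mode concrete, here is a minimal sketch that assumes the DeepSeek R1 model has already been pulled locally with Ollama and the ollama Python client is installed; the model tag and client library are illustrative assumptions, not necessarily what the full tutorial uses.

```python
# Minimal local chat sketch -- assumes `ollama pull deepseek-r1` has been run
# and the `ollama` Python package is installed (both are assumptions here).
import ollama

history = []

def local_chat(user_message: str) -> str:
    """Send a message to the locally running DeepSeek model and return its reply."""
    history.append({"role": "user", "content": user_message})
    response = ollama.chat(model="deepseek-r1", messages=history)
    reply = response["message"]["content"]
    history.append({"role": "assistant", "content": reply})
    return reply

print(local_chat("Explain retrieval-augmented generation in one paragraph."))
```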
We share hands-on tutorials like this 2-3 times a week, designed to help you stay ahead in the world of AI. If you're serious about leveling up your AI skills and staying ahead of the curve, subscribe now and be the first to access our latest tutorials.
Latest Developments

Scrapybara provides the fundamental infrastructure: remote desktop instances (Ubuntu, Browser-only, Windows) that can be controlled programmatically by AI computer use agents. The platform deploys virtual desktops in under a second and offers extensive tools for browser automation, file operations, and system controls.
You can integrate your AI agents with Scrapybara through Python and TypeScript SDKs, with built-in support for models like Claude Computer Use. The platform also handles session persistence and authenticated browser states, enabling smooth deployments of complex automation workflows.
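As a rough sketch of what that looks like with the Python SDK and the Act SDK (the exact class, tool, and parameter names below are assumptions based on the platform's docs, so verify them against the official reference):

```python
# Rough sketch of driving a Scrapybara remote desktop with a computer use agent.
# Class and method names here (Scrapybara, start_ubuntu, act, the tool wrappers)
# are assumptions based on the docs -- check the official SDK before relying on them.
from scrapybara import Scrapybara
from scrapybara.anthropic import Anthropic
from scrapybara.tools import BashTool, ComputerTool, EditTool

client = Scrapybara(api_key="YOUR_API_KEY")

# Spin up a remote Ubuntu desktop instance
instance = client.start_ubuntu(timeout_hours=1)

try:
    # Hand the instance to a computer use model via the Act SDK
    response = client.act(
        model=Anthropic(),  # swap in another supported LLM without changing the rest
        tools=[ComputerTool(instance), BashTool(instance), EditTool(instance)],
        prompt="Open the browser, search for 'Scrapybara docs', and summarize the page.",
        on_step=lambda step: print(step.text),  # step tracking
    )
    print(response)
finally:
    instance.stop()  # stop (or pause) the instance to avoid idle charges
```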
Key Highlights:
Fast Dev Experience - A single unified interface for computer use agents lets you switch between different LLMs without changing code. Pre-built tools handle common operations like browser control and file management, while the Act SDK streamlines agent development with clear messaging patterns and step tracking.
Browser Automation - Save and reuse authenticated browser states across instances, eliminating repetitive login flows. The platform integrates with Playwright for programmatic control and provides high-level abstractions for common web interactions like clicking, typing, and navigation.
Practical Deployment - Pause and resume instances on demand to optimize resource usage, with automatic timeout controls. The platform handles complex authentication flows and session management, letting you focus on building automation logic rather than infrastructure maintenance.
Tool Integration - Built-in support for multiple instance types (Ubuntu, Windows, Browser-only) with their respective capabilities. Each instance type comes with specific tools optimized for tasks like file operations, system commands, and UI interactions, accessed through a consistent API.
10x Your Outbound With Our AI BDR
Imagine your calendar filling with qualified sales meetings, on autopilot. That's Ava's job. She's an AI BDR who automates your entire outbound demand generation.
Ava operates within the Artisan platform, which consolidates every tool you need for outbound:
300M+ High-Quality B2B Prospects
Automated Lead Enrichment With 10+ Data Sources Included
Full Email Deliverability Management
Personalization Waterfall using LinkedIn, Twitter, Web Scraping & More

Microsoft's OmniParser V2 turns any LLM into a GUI automation tool. The new version runs 60% faster and spots even the smallest UI elements with much higher accuracy than before. The parser ‘tokenizes’ UI screenshots from raw pixels into structured elements that LLMs can interpret, enabling retrieval-based next-action prediction over the set of parsed interactable elements.
The tool ships with OmniTool, a ready-to-use Windows environment in Docker where you can test different agent setups using GPT-4o, o1 and o3 models, DeepSeek, Qwen, or Claude 3.5 Sonnet.
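To illustrate the idea rather than the actual API (every function and class below is a hypothetical stand-in, not real OmniParser/OmniTool code), the parse-then-predict loop looks roughly like this:

```python
# Conceptual sketch of the parse -> choose-action loop OmniParser enables.
# All names here are hypothetical stand-ins, not the real OmniParser/OmniTool API.
from dataclasses import dataclass

@dataclass
class UIElement:
    element_id: int
    description: str                     # e.g. "blue 'Sign in' button"
    bbox: tuple[int, int, int, int]      # pixel coordinates (x1, y1, x2, y2)
    interactable: bool

def parse_screenshot(screenshot: bytes) -> list[UIElement]:
    """Stand-in for the parser: turn raw pixels into structured elements."""
    return [
        UIElement(0, "search box", (100, 40, 500, 70), True),
        UIElement(1, "'Sign in' button", (520, 40, 600, 70), True),
    ]

def llm_choose_action(task: str, elements: list[UIElement]) -> dict:
    """Stand-in for any LLM doing retrieval-based next-action prediction."""
    target = next(e for e in elements if "search" in e.description)
    return {"action": "click", "element_id": target.element_id}

def agent_step(screenshot: bytes, task: str) -> dict:
    elements = parse_screenshot(screenshot)
    interactable = [e for e in elements if e.interactable]
    return llm_choose_action(task, interactable)

print(agent_step(b"", "Search for OmniParser V2"))
```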
Key Highlights:
Better Screen Understanding - The model scores 39.6 on the ScreenSpot Pro benchmark, up from GPT-4V's original 0.8. Your agents can now find and click tiny UI elements like icons, buttons, and menu items that were hard to detect before.
Quick Setup for Testing - OmniTool provides a containerized Windows 11 environment that uses 50% less disk space than typical VMs. You can start testing your AI agents with just a few setup commands, and the documentation includes clear instructions for running components across CPU and GPU machines.
Modular Architecture - OmniTool has three main parts: Omniparserserver handles screen parsing, Omnibox runs the Windows VM, and a Gradio interface lets you watch your agents work. Run the heavy parsing on GPU servers while keeping other parts on local CPUs.
Flexible Model Support - Works out of the box with OpenAI o models, GPT-4o, DeepSeek R1, Qwen 2.5VL, and Claude's Computer Use. Switch between models easily or add new ones as they come out.
Quick Bites
Perplexity has released Deep Research, a research AI agent that generates in-depth research reports on any topic. Equipped with search and coding capabilities, Perplexity’s Deep Research iteratively searches, reads documents, and reasons about what to do next, refining its research plan as it learns more about the subject areas.
Optimized for speed and performance, Perplexity’s research agent achieves a 21.1% accuracy score on Humanity’s Last Exam, outperforming Gemini Thinking, o3-mini, o1, DeepSeek-R1, and more. Available to everyone for free — up to 5 queries per day for non-subscribers and 500 queries per day for Pro users.
AI search engine Phind has released the 2.0 version that moves beyond traditional text responses to include images, diagrams, interactive widgets, cards, and other rich visual outputs within the answer itself. The team completely rebuilt the system with a custom-trained model that can autonomously do multiple rounds of additional web searches mid-answer when it realizes that it needs more information. It can also verify calculations by executing code in a Jupyter notebook within the answer.
Codeium has rolled out Wave 3 for its Windsurf AI IDE to enhance the developer experience. The update introduces Model Context Protocol (MCP) support, alongside productivity features like tab-to-jump suggestions, drag-and-drop images, and autonomous terminal command execution.
MCP integration allows Cascade (Windsurf's AI engine) to access various data sources through standardized servers (see the sketch after this list)
New "Turbo Mode" enables Cascade to autonomously execute terminal commands without requiring approval
Tab-to-jump feature enhances the editor's predictive capabilities, going beyond traditional code completion
Added support for new models including DeepSeek-v3, DeepSeek-R1, o3-mini, and Gemini 2.0 Flash.
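For context, an MCP server is a small program that exposes tools and data over a standard protocol; here's a minimal sketch using the official MCP Python SDK's FastMCP helper (the "notes" tool is invented for illustration, and an MCP client like Windsurf would connect to it through its MCP configuration):

```python
# Minimal MCP server sketch using the official MCP Python SDK (`mcp` package).
# The notes "data source" and get_note tool are made up for illustration.
from mcp.server.fastmcp import FastMCP

mcp = FastMCP("notes-server")

# Tiny in-memory data source the AI client can reach through the protocol
NOTES = {"standup": "Ship the Wave 3 review by Friday."}

@mcp.tool()
def get_note(name: str) -> str:
    """Return the note stored under the given name."""
    return NOTES.get(name, "No note found.")

if __name__ == "__main__":
    # Serves over stdio so an MCP client (such as an IDE assistant) can connect
    mcp.run()
```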
Galileo.ai has launched an Agent Leaderboard on Hugging Face that evaluates how different LLMs handle tool-based interactions across various dimensions. The evaluation spans everything from basic API calls to complex multi-tool scenarios. Among the 17 evaluated models, Gemini-2.0-flash leads the pack, followed by GPT-4o, with open-source models like mistral-small-2501 and Qwen-72b showing impressive capabilities in the mid-tier range. The leaderboard will be updated monthly to keep pace with new model releases.
Tools of the Trade
Morph: Full-stack Python framework for building and sharing AI and data applications. You can quickly convert data analysis scripts and custom AI workflows developed in Python into web applications, and deploy and share the apps you create with a single command.
Xyne: Open-source, self-hosted search & answer engine that connects to your various workplace applications, indexes their data, and builds a knowledge graph. Receive AI-powered, context-aware answers sourced from across your organization's information landscape. Plugs into any LLM of your choice.
NodeScript: Browser-based, visual programming platform that lets you build and deploy serverless applications and automations using interconnected nodes. It enables API integration, data processing, scheduled tasks, and workflow creation, all visually, with the ability to instantly publish workflows as callable endpoints.
mcp.run: A platform for hosting and managing Model Context Protocol (MCP) servlets, which are WebAssembly-powered plugins that facilitate secure and standardized communication between AI models/agents and various data sources/tools.
Awesome LLM Apps: Build awesome LLM apps with RAG, AI agents, and more to interact with data sources like GitHub, Gmail, PDFs, and YouTube videos, and automate complex work.

Hot Takes
I’m still amazed how Microsoft fucked up what was a clear first mover advantage in AI by simply making terrible product decisions.
A failure of product, marketing, positioning… everything. Organisational disaster. ~
Wasteland Capital

AI agents like deep research continue to remind me -- for good -- how broken our research tools are.
And this is not just happening for research tools, the same is happening for search, code, work, and creative tools. AI agents will eliminate all the unnecessary. ~
elvis
That’s all for today! See you tomorrow with more such AI-filled content.
Don’t forget to share this newsletter on your social channels and tag Unwind AI to support us!
PS: We curate this AI newsletter every day for FREE, your support is what keeps us going. If you find value in what you read, share it with at least one, two (or 20) of your friends 😉