Open-Source LLM for Computer Use Web Agents
PLUS: Gemini Code Assist with 90x rate limits, Deep Research in ChatGPT Plus
Today’s top AI Highlights:
Build real-time audio and video AI apps in pure Python
Google’s free AI Code Assistant in your IDE with generous rate limits
OpenAI Deep Research now available to ChatGPT Plus users
Open-source 3B model for building Computer Use AI agents
Custom DevOps AI agents to manage your infrastructure
& so much more!
Read time: 3 mins
AI Tutorials
Finding the perfect property involves sifting through countless listings across multiple websites, analyzing location trends, and making informed investment decisions. For developers and real estate professionals, automating this process can save hours of manual work while providing deeper market insights.
In this tutorial, we'll build an AI Real Estate Agent that automates property search and market analysis. It helps users find properties matching their criteria while providing detailed location trends and investment recommendations. This agent streamlines the property search process by combining data from multiple real estate websites and offering intelligent analysis.
Tech Stack:
Firecrawl's Extract Endpoint to collect structured data from websites
Agno (formerly Phidata) for building the AI agent
OpenAI GPT-4o as the LLM
Streamlit for a clean, interactive web interface (a minimal sketch wiring these pieces together follows below)
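Here is a minimal sketch of how these pieces might fit together. The listings URL, schema fields, and helper names are illustrative, and Firecrawl's and Agno's exact parameter names can differ between library versions, so treat this as a starting point rather than the finished tutorial code.

```python
import os

import streamlit as st
from agno.agent import Agent
from agno.models.openai import OpenAIChat
from firecrawl import FirecrawlApp
from pydantic import BaseModel

# Illustrative schema for the listings we want Firecrawl to extract.
class Listing(BaseModel):
    address: str
    price: str
    bedrooms: int
    url: str

def fetch_listings(city: str) -> list[dict]:
    """Pull structured listings via Firecrawl's Extract endpoint (URL and params are illustrative)."""
    app = FirecrawlApp(api_key=os.environ["FIRECRAWL_API_KEY"])
    result = app.extract(
        [f"https://www.example-realty.com/{city}/*"],  # hypothetical listings site
        {
            "prompt": f"Extract property listings in {city} with address, price, bedrooms, and URL.",
            "schema": {"type": "array", "items": Listing.model_json_schema()},
        },
    )
    return result.get("data", [])

# Agno agent that turns raw listing data into trends and recommendations.
analyst = Agent(
    model=OpenAIChat(id="gpt-4o"),
    instructions=[
        "You are a real-estate analyst.",
        "Given raw listing data, summarize location trends and flag the best-value properties.",
    ],
    markdown=True,
)

st.title("AI Real Estate Agent")
city = st.text_input("City", "austin")
if st.button("Search"):
    listings = fetch_listings(city)
    analysis = analyst.run(f"Analyze these listings and recommend three:\n{listings}")
    st.markdown(analysis.content)
```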
We share hands-on tutorials like this 2-3 times a week, designed to help you stay ahead in the world of AI. If you're serious about leveling up your AI skills and staying ahead of the curve, subscribe now and be the first to access our latest tutorials.
Latest Developments

There’s a surge in real-time speech models, from OpenAI’s and Google’s Realtime/Live APIs to open-source projects like Kyutai's Moshi. But building real-time AI applications is still a significant challenge: engineers may not have experience with WebRTC and related technologies, and even sophisticated code assistants like Cursor struggle to generate code for audio/video streaming applications.
Hugging Face has released FastRTC, a real-time communication library for Python. This library is designed to make it super easy to build real-time audio and video AI applications entirely in Python, by handling the underlying complexities of streaming protocols. It lets you focus on your application logic rather than wrestling with WebRTC implementation details.
Key Highlights:
Built-in Voice Detection and Turn Taking - FastRTC handles voice detection, pauses, and conversation flow automatically. You can concentrate on your core application logic while the library manages the timing and interaction patterns. There are built-in utilities for text-to-speech and speech-to-text.
Instant Testing Environment - The library comes with a built-in Gradio UI that launches with just one line of code, allowing for quick testing and iteration without needing to build a separate front end during development (see the sketch after these highlights).
Free Phone Number Integration - Using the fastphone() method, you can get a temporary phone number for people to call and interact with your application. This extends your app's reach beyond web interfaces with minimal setup and just a Hugging Face token.
Deployment Options - FastRTC works seamlessly with FastAPI, making it easy to mount your stream on existing applications or create custom endpoints. You can extend functionality with your own routes or serve custom frontends while maintaining real-time capabilities.
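To make that concrete, here is a minimal sketch based on the library's echo pattern: the handler simply streams received audio back, ReplyOnPause supplies voice-activity detection and turn taking, and one extra line launches the Gradio testing UI. Swap the handler body for your own speech-to-text, LLM, and text-to-speech logic; exact signatures may shift between releases, so check the FastRTC docs.

```python
import numpy as np
from fastrtc import ReplyOnPause, Stream

def echo(audio: tuple[int, np.ndarray]):
    # Receives (sample_rate, samples) and streams the same audio back.
    # Replace this with your STT -> LLM -> TTS pipeline in a real app.
    yield audio

# ReplyOnPause handles voice detection, pauses, and turn taking for you.
stream = Stream(ReplyOnPause(echo), modality="audio", mode="send-receive")

# One line gets you a Gradio testing UI. Alternatively, stream.fastphone() requests
# a temporary phone number, and stream.mount(app) attaches the stream to a FastAPI app.
stream.ui.launch()
```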
The #1 AI Meeting Assistant
Typing manual meeting notes drains your energy. Let AI handle the tedious work so you can focus on the important stuff.
Fellow is the AI meeting assistant that:
✔️ Auto-joins your Zoom, Google Meet, and Teams calls to take notes for you.
✔️ Tracks action items and decisions so nothing falls through the cracks.
✔️ Answers questions about meetings and searches through your transcripts, just like ChatGPT.
Try Fellow today and get unlimited AI meeting notes for 30 days.

Google just dropped a free version of Gemini Code Assist, its AI-powered coding companion to build, deploy, and operate applications throughout the software development lifecycle. This isn't just another limited-access freebie; it offers a surprisingly high quota of 180,000 code completions per month.
The tool is powered by Gemini 2.0, which has been fine-tuned specifically for coding tasks by analyzing real-world use cases. It integrates directly into popular IDEs like VS Code and JetBrains. It supports all programming languages, and can understand your code with a large 128,000 token input context.
Key Highlights:
Capabilities - Gemini Code Assist in your IDE provides code completions as you write, generates full functions or code blocks from comments, creates unit tests, and helps with debugging, understanding, and documenting your code. Its responses are contextualized and include source citations indicating which documentation and code samples were used to generate them.
IDE Integration & Generous Limits - Gemini Code Assist works directly in Visual Studio Code and JetBrains IDEs with a daily allowance of 6,000 code-related requests and 240 chat requests. It offers the same code completion, generation, and chat capabilities previously only available to business users.
Large Context Window - The tool supports up to 128,000 input tokens in chat, allowing you to work with large files and give Gemini a broader understanding of your codebase. This helps it generate more relevant, contextual code suggestions tailored to your specific project.
GitHub Code Review Integration - The new Gemini Code Assist for GitHub provides AI-powered code reviews for both public and private repositories. It automatically detects stylistic issues and bugs, suggests code changes and fixes, and can be customized to follow team-specific coding conventions.
Get Started Quickly - Getting started requires only a personal Gmail account with no credit card needed. Developers can install the extension in Visual Studio Code, GitHub, or JetBrains IDEs and immediately begin using it for projects ranging from building interactive visualizations to testing new application ideas.
Quick Bites
DeepSeek has released DeepEP, an open-source library that optimizes communication for Mixture-of-Experts (MoE) models across GPU clusters. It provides high-throughput and low-latency all-to-all GPU kernels, also known as MoE dispatch and combine kernels. The library also supports low-precision operations, including FP8.
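For readers new to the terminology, the toy snippet below illustrates what "dispatch" and "combine" mean inside an MoE layer. It is plain single-GPU PyTorch, not DeepEP's API; DeepEP's contribution is performing these two steps as fast all-to-all communication when the experts live on different GPUs.

```python
import torch

# Conceptual illustration only (not DeepEP's API): dispatch and combine in an MoE layer.
num_tokens, hidden, num_experts = 8, 4, 2
x = torch.randn(num_tokens, hidden)                        # token activations
expert_ids = torch.randint(0, num_experts, (num_tokens,))  # router's top-1 expert per token
expert_weights = [torch.randn(hidden, hidden) for _ in range(num_experts)]

# Dispatch: route each token's activation to its assigned expert.
dispatched = [x[expert_ids == e] for e in range(num_experts)]

# Each expert processes only the tokens routed to it.
outputs = [dispatched[e] @ expert_weights[e] for e in range(num_experts)]

# Combine: scatter the expert outputs back into the original token order.
y = torch.empty_like(x)
for e in range(num_experts):
    y[expert_ids == e] = outputs[e]
```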
Serverless cloud platform Koyeb has launched support for Tenstorrent's RISC-V-based AI accelerators, offering developers access to hardware alternatives to Nvidia GPUs. The platform provides two instance types: TT-N300S with 466 FP8 TFLOPS and TT-LoudBox with up to 1,864 FP8 TFLOPS, both featuring Tenstorrent's open-source TT-NN neural network library and TT-Metalium programming model for custom AI workloads. It's all part of the broader effort to create alternatives in the AI accelerator space, currently dominated by Nvidia.
Perplexity’s Deep Research is now available through their Sonar API, enabling developers to build apps with custom research agents and workflows. It produces comprehensive research reports by performing dozens of searches and analyzing hundreds of sources.
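As a rough sketch of what a call might look like: Perplexity's API follows the OpenAI chat-completions format at api.perplexity.ai, and their docs list a sonar-deep-research model for Deep Research. Verify the exact model name and any extra parameters against the current Sonar documentation before relying on this.

```python
import os
import requests

# Assumes the OpenAI-compatible endpoint and the "sonar-deep-research" model id
# from Perplexity's docs; confirm both against the current Sonar documentation.
response = requests.post(
    "https://api.perplexity.ai/chat/completions",
    headers={"Authorization": f"Bearer {os.environ['PERPLEXITY_API_KEY']}"},
    json={
        "model": "sonar-deep-research",
        "messages": [
            {"role": "user",
             "content": "Write a research report on the state of open-source MoE training frameworks."},
        ],
    },
    timeout=600,  # Deep Research runs many searches, so responses can take several minutes.
)
report = response.json()["choices"][0]["message"]["content"]
print(report)
```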
Allen AI has released olmOCR, an open-source tool that extracts clean plain text from PDFs with exceptional efficiency, processing content at over 3000 tokens/second on your own GPU. It can handle complex layouts including columns, tables, and even handwritten text. It significantly outperforms competing tools in human evaluations while costing approximately 1/32 the price of using GPT-4o APIs—about $190 per million pages.
Convergence AI has released proxy-lite-3b, the first small open-weights model for building AI web agents that can use your computer and navigate UIs, outperforming all other open-source alternatives. It is a lightweight version derived from their more capable web-browsing AI agent Proxy, one of the very few Computer Use AI agents available for free that can autonomously handle online tasks. The model is available on Hugging Face, with code on GitHub.
OpenAI has rolled out Deep Research to all ChatGPT Plus, Team, Edu, and Enterprise users, with 10 deep research queries per month.
Further improvements have been made to Deep Research — Embedded images with citations in the output, and better at understanding and referencing uploaded files.
A version of Advanced Voice powered by GPT-4o mini has been rolled out to all ChatGPT free users. The natural conversation pace and tone are similar to the GPT-4o version while being more cost-effective to serve.
ChatGPT Plus users now have access to video and screen sharing with the Advanced Voice Mode. Also, their rate limits have been increased by 5x. Video and screen sharing rate limits for Pro users have also been increased.
Tools of the Trade
OllyChat: An AI-first DevOps platform that creates custom AI agents to diagnose incidents, automate remediation, and learn from infrastructure data, all within existing workflows like Slack and Teams.
GenAIScript: JavaScript toolbox by Microsoft that turns prompting into coding, enabling you to programmatically assemble and orchestrate prompts for LLMs. Instead of just sending a single text prompt, you can write scripts that fetch data, call tools, define output schemas, and even run other LLMs.
rtrvr.ai: AI web agent (as Chrome extension) to automate web tasks, extract data, and conduct research across multiple tabs—including login-protected sites and local files—all through natural language commands at $0.002 per page interaction, running 4× faster than OpenAI's Operator.
Awesome LLM Apps: Build awesome LLM apps with RAG, AI agents, and more to interact with data sources like GitHub, Gmail, PDFs, and YouTube videos, and automate complex work.

Hot Takes
OpenAI is moving over to selling agents not models. Some thoughts.
1) You will no longer be able to build your own system because OpenAI is already packaging for you
2) You will be buying a level of intelligence - however this is quantified - rather than API calls against a particular model
3) It will be interesting to see how pricing works for these agents - is it intelligence used or intelligence requested
4) Orion was real - interesting - it sounds like it is the GPT-4o replacement - that it is the last non-reasoning model seems like a strange call out
5) It always seemed likely that Orion was real because former OpenAI employees tended to assume that the Claude Opus-3.5 story was real
6) Orion was probably the basis for the o-series models; when they said the "o" stood for openai, I bet it actually stood for "Orion"
7) GPT-5 isn't a model anymore - it's just a collection of different models with a router and maybe even things like RAG - it's an intelligence level
8) This will make safety related to malicious users easier because the boundaries of the system vis-a-vis the public are smaller
9) It will also probably make it harder for competitors to copy OpenAI's work, because you won't know which system the improvement came from (better reasoning model, better base model, better RAG, better tools, etc...)
10) Not convinced the free / plus / pro products are sufficiently distinguished in this model. How much more intelligence do I get? How can I tell that I got more intelligence, etc...?
11) Sama is just not a product person - I wish he didn't try to LARP as one. I get the CEO as public communicator but he just isn't good as a product CEO.
12) I wonder how much of this is related to OpenAI's internal politics being very fragmented / chaotic - wouldn't they want an official blogpost etc... not good for CPO morale.
~ FleetingBits
That’s all for today! See you tomorrow with more such AI-filled content.
Don’t forget to share this newsletter on your social channels and tag Unwind AI to support us!
PS: We curate this AI newsletter every day for FREE, your support is what keeps us going. If you find value in what you read, share it with at least one, two (or 20) of your friends 😉