- unwind ai
- Posts
- OpenAI's Computer-Using Agent
OpenAI's Computer-Using Agent
PLUS: Agentic retreival engine for RAG, Perplexity multi-app use agent
Today’s top AI Highlights:
OpenAI releases an AI agent Operator that uses its own browser
Agentic retrieval engine for RAG with complex queries
Anthropic brings source tracking to Claude with the new Citations API
Perplexity releases AI agent to take multi-app actions on your behalf
O3-mini coming soon for ChatGPT free users
& so much more!
Read time: 3 mins
AI Tutorials
Sales teams spend countless hours manually searching for and qualifying potential leads. This repetitive task not only consumes time but also results in inconsistent lead quality. Let’s automate this process to help sales teams focus on what matters most - building relationships and closing deals.
In this tutorial, we'll build an AI Lead Generation Agent that automatically discovers and qualifies potential leads from Quora. Using Firecrawl for intelligent web scraping, Phidata for agent orchestration, and Composio for Google Sheets integration, you'll create a system that can continuously generate and organize qualified leads with minimal human intervention.
Our lead generation agent will help sales teams identify potential customers who are actively discussing or seeking solutions in their target market.
We share hands-on tutorials like this 2-3 times a week, designed to help you stay ahead in the world of AI. If you're serious about leveling up your AI skills and staying ahead of the curve, subscribe now and be the first to access our latest tutorials.
Latest Developments
OpenAI has launched Operator, an AI agent that can autonomously browse the web to complete tasks through its own browser window. The agent can handle many routine browser-based tasks like filling forms, ordering groceries, and interacting with websites by typing, clicking, and scrolling – just like a human would.
Operator is powered by OpenAI’s new Computer-Using Agent (CUA) model that combines GPT-4o's vision capabilities with reinforcement learning. Where most open-source computer use agents require API access to work with the websites, Operator can naturally understand and interact with interfaces without requiring API integrations. It is currently available to Pro users in the US and will soon be available to Plus, Team, and Enterprise users, and via API.
Key Highlights:
Direct Browser Control - Operator can interact with websites naturally through screenshots and mouse/keyboard actions, handling tasks like form filling and navigation. It asks for user confirmation before significant actions and hands over control for sensitive operations like payments or logins.
Visual Action Tracking - Every action Operator takes is documented with a screenshot, which you can view directly in the chat interface. This visual trail lets you monitor each step the agent takes, from clicking buttons to filling in information.
Task Management & Customization - You can run multiple tasks simultaneously in different conversations, save frequently used prompts for quick access, and set custom instructions either globally or for specific websites. This makes it particularly useful for repetitive tasks like weekly grocery orders or travel bookings.
Safety & Privacy Controls - Built-in safeguards include a takeover mode where Operator would ask you to take over when inputting sensitive information, a watch mode for high-stakes sites, and automated threat detection for malicious websites. When in takeover mode, Operator does not collect or screenshot information entered by the user.
Developer Access Coming Soon - OpenAI plans to release the Computer-Using Agent (CUA) model through their API so you can build your own browser-based AI agents.
Most Q&A bots or AI agents depend on retrieval systems to provide relevant context from knowledge bases, but traditional semantic and hybrid search methods frequently lead to inaccurate responses and LLM hallucinations. ZeroEntropy addresses this by adding intelligence to retrieval.
They employ search AI agents that actively understand query context and user intent. It then dynamically chooses the best retrieval strategies, deciding where to search, when to dig deeper, and how to refine results—just like a skilled human would. The system adapts and improves with usage, learning from interactions to deliver increasingly precise results.
Key Highlights:
Advanced Query Capabilities - ZeroEntropy goes beyond basic keyword and semantic search. It's designed to handle complex queries like those with negative constraints (things that aren't included), multi-step logic (using the result of one search to inform another), or fuzzy filtering (searching with ranges and thresholds).
Agentic Retrieval - The system doesn't perform a simple search; it acts like an intelligent agent that adapts to the context of the query and selects the most effective retrieval methods. Your AI applications get progressively better as the system dynamically adjusts to your users' needs and the questions they ask.
Granular Control & Filtering - The ZeroEntropy API provides a highly configurable platform. You have control over your search index through collections, you can query down to the document, page, or even specific snippet level. Also, its metadata filtering is exceptionally capable, letting you filter by a variety of document attributes with standard operators and boolean logic on list of strings.
Developer Friendly - ZeroEntropy offers SDKs in both Python and TypeScript/JavaScript, along with detailed API documentation. You can choose between cloud API access or on-premises deployment via Docker.
Quick Bites
Anthropic has launched Citations feature in its API that enables Claude to automatically reference specific passages from source documents in its responses. Available now for Claude 3.5 Sonnet and Haiku, you can simply enable Citations with a parameter flag on your documents, allowing Claude to ground its answers with precise character-level, page-level, or block-level citations – particularly useful for RAG implementations, document Q&A, and complex summarization tasks.
Perplexity has launched Perplexity Assistant, an AI agent that can call other apps, combining web search capabilities with multi-app actions, to complete basic daily tasks like restaurant bookings, ride-hailing, and event reminders. The assistant, now available on Android, is multimodal — it can answer questions about what’s around you or on your screen. The agent also maintains context from one action to another for an uninterrupted flow of work.
OpenAI is soon rolling out access to o3-mini in ChatGPT. Sam Altman has announced that ChatGPT free users will soon get (rate limited) access to the model, while Plus users will get to use a lot of it.
Tools of the Trade
Ragpi: Open-source AI assistant that builds knowledge bases from docs, GitHub Issues, and READMEs. It uses Redis Stack as a vector DB and RAG to answer technical questions through an API. Works with OpenAI or Ollama.
Vly.ai: Automates full-stack web development using AI agents and expert oversight. It allows you to build custom applications visually, make changes with point-and-click functionality — not a broad back-and-forth chat, but a very simple “click and describe” interface while the AI agents do the heavylifting.
Deepfake Audio Detector: Opensource command-line tool that uses deep learning to detect if an audio clip of someone speaking is a deepfake. Supports Flac and m4a files only. Claims 94% accuracy.
Awesome LLM Apps: Build awesome LLM apps with RAG, AI agents, and more to interact with data sources like GitHub, Gmail, PDFs, and YouTube videos, and automate complex work.
Hot Takes
DeepSeek is objectively quite good, but I wonder how much of it (and also Claude's) advantage comes from its "personality" - we intuitively judge whether an AI is pleasant to interact with, which biases our willingness to use it.
The generic-ness of Gemini & ChatGPT hurts them. ~
Ethan Mollicklots of ai people seem to think the most important thing is to get rich before the singularity happens. this is like a monkey trying to hoard bananas before another monkey invents self-replicating nanoswarms. no one wants your money in the nanoswarm future. it's just paper. ~
David Holz
That’s all for today! See you tomorrow with more such AI-filled content.
Don’t forget to share this newsletter on your social channels and tag Unwind AI to support us!
PS: We curate this AI newsletter every day for FREE, your support is what keeps us going. If you find value in what you read, share it with at least one, two (or 20) of your friends 😉
Reply