
Google's Multi-Agent AI Co-scientist

PLUS: Open-source temporal knowledge graphs for AI agents, uncensored DeepSeek R1

Today’s top AI Highlights:

  1. Build multimodal language agents for fast prototyping and production

  2. Create and query Knowledge Graphs that evolve over time

  3. Google’s AI co-scientist agentic system powered by Gemini 2.0

  4. OpenAI o1 and Claude 3.5 Sonnet cannot solve most freelance SWE tasks

  5. AI coding agent that can self-extend by creating custom tools on-the-fly

& so much more!

Read time: 3 mins

AI Tutorials

Turning natural language descriptions into working PyGame visualizations typically requires significant coding expertise and time. Let's build a system that automates the entire process using AI.

In this tutorial, we'll create an AI multi-agent PyGame generator that converts text descriptions into fully functional visualizations. It coordinates four specialized AI agents working together to bring your PyGame ideas to life.

Each agent has a specific role:

1. Navigator Agent - Handles browser navigation to Trinket.io 
2. Coder Agent - Manages code input and editing
3. Executor Agent - Runs the code in the browser
4. Viewer Agent - Monitors visualization output
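The four-agent pipeline above can be sketched as a simple sequential handoff. This is an illustrative skeleton only: the agent classes, the `Task` object, and the `run` interface are hypothetical stand-ins, not the tutorial's actual code, and the browser-automation calls are stubbed out as comments.

```python
from dataclasses import dataclass


@dataclass
class Task:
    description: str   # natural-language PyGame request
    code: str = ""     # generated PyGame code
    output: str = ""   # observed visualization state


class NavigatorAgent:
    def run(self, task: Task) -> Task:
        # would drive the browser to Trinket.io here
        return task


class CoderAgent:
    def run(self, task: Task) -> Task:
        # would call an LLM to generate PyGame code here
        task.code = f"# PyGame code for: {task.description}"
        return task


class ExecutorAgent:
    def run(self, task: Task) -> Task:
        # would execute the code in the browser editor here
        task.output = "running"
        return task


class ViewerAgent:
    def run(self, task: Task) -> Task:
        # would inspect the rendered visualization here
        task.output = "rendered"
        return task


def generate_visualization(description: str) -> Task:
    """Pass the task through each specialized agent in order."""
    task = Task(description)
    for agent in (NavigatorAgent(), CoderAgent(), ExecutorAgent(), ViewerAgent()):
        task = agent.run(task)
    return task
```

Each agent receives the shared task state, does its one job, and hands the result to the next agent, which is the core coordination pattern regardless of which agent framework you use.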

We share hands-on tutorials like this 2-3 times a week, designed to help you stay ahead in the world of AI. If you're serious about leveling up your AI skills, subscribe now and be the first to access our latest tutorials.

Don’t forget to share this newsletter on your social channels and tag Unwind AI (X, LinkedIn, Threads, Facebook) to support us!

Latest Developments

The team behind Zep, the AI agent memory platform, has open-sourced Graphiti, the engine powering its core knowledge graph capabilities. Graphiti lets you build knowledge graphs that actually change over time, capturing evolving relationships and historical context.

Graphiti can handle both unstructured text and structured JSON data while maintaining historical context. You can build assistants and agents that learn from interactions by fusing personal knowledge with data from business systems. Using a combination of semantic search, full-text search, and graph algorithms, Graphiti helps create applications that require long-term recall and state-based reasoning.

Key Highlights:

  1. Temporal Data Handling - Each data ingestion is treated as an "episode" with its own timestamp, making it easy to track when and how information changes. The parallel processing capabilities help handle large datasets efficiently.

  2. Flexible Data Input - Process unstructured text, chat messages, or JSON through a unified API. Graphiti automatically extracts entities and relationships while preserving temporal context across all input types.

  3. Custom Schema Design - Define domain-specific entity types using Pydantic models. Add new attributes to existing types without breaking changes, letting your knowledge graph evolve naturally with your application.

  4. Hybrid Search System - Combine semantic similarity, full-text search (BM25), and graph-based ranking in a single query. Pre-built search configurations help you get started while maintaining flexibility for customization.

  5. Entity-Centric Querying - Use node distance reranking to prioritize results based on their proximity to specific entities, making context-aware searches more accurate and relevant.

  6. Quick Integration - Get started with just Neo4j and LLM provider credentials. The async Python API supports both individual and bulk operations, with clear documentation and examples for common use cases.
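The "episode with a timestamp" idea at the heart of Graphiti can be illustrated in plain Python. This is a conceptual sketch of temporal fact tracking, not Graphiti's actual API: each ingestion closes out any fact it supersedes, so you can ask what was true at any point in time.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Optional


@dataclass
class Fact:
    subject: str
    predicate: str
    obj: str
    valid_from: datetime
    valid_to: Optional[datetime] = None  # None means still current


class TemporalGraph:
    def __init__(self) -> None:
        self.facts: list[Fact] = []

    def add_episode(self, triples, at: datetime) -> None:
        """Ingest (subject, predicate, object) triples as one timestamped episode."""
        for s, p, o in triples:
            # A new value for the same subject/predicate invalidates the old one,
            # but the old fact is kept with its validity interval, not deleted.
            for f in self.facts:
                if f.subject == s and f.predicate == p and f.valid_to is None:
                    f.valid_to = at
            self.facts.append(Fact(s, p, o, valid_from=at))

    def query(self, subject: str, predicate: str, as_of: datetime):
        """Return the object that was true for (subject, predicate) at `as_of`."""
        for f in self.facts:
            if (f.subject == subject and f.predicate == predicate
                    and f.valid_from <= as_of
                    and (f.valid_to is None or as_of < f.valid_to)):
                return f.obj
        return None
```

Because superseded facts keep their validity windows, the graph supports point-in-time queries ("where did Alice work in June 2023?") alongside current-state queries, which is what distinguishes a temporal knowledge graph from a plain one.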

OmAgent is an open-source Python library that lets you build multimodal language agents without the complexity of other frameworks. The library takes care of complex engineering like worker orchestration and task queues behind the scenes while exposing a simple interface for defining agents.

You can create reusable agent components and combine them into more complex agents. OmAgent provides native support for vision-language models, video processing, and mobile device connections, making it easy to build agents that can reason across text, images, video and audio.

Key Highlights:

  1. Flexible Workflow Management - Build agents using graph-based workflow orchestration with automatic worker management and task queueing. The framework handles the complex orchestration so you can focus on defining agent behavior.

  2. Advanced Memory System - Multiple memory types for organizing both short-term conversational context and long-term knowledge, enabling agents to maintain context and learn from interactions.

  3. Modular Component System - Create reusable agent components and combine them into more complex agents. Includes pre-built algorithms like ReAct and Chain-of-Thought that go beyond basic LLM inference.

  4. Multimodal Support - Native integration with vision-language models, real-time APIs, computer vision models and mobile connections. Prototype locally using Ollama or LocalAI for LLM deployment.

  5. Production-Ready Features - Built-in support for workflow monitoring, error handling, rate limiting, and scalable deployments. Deploy locally during development and then seamlessly move to containers or serverless functions.
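The graph-based orchestration that OmAgent automates can be sketched minimally in plain Python. The `Workflow` class, worker names, and traversal below are illustrative assumptions, not OmAgent's API: workers are registered as nodes, edges declare ordering, and a queue drives execution.

```python
from collections import deque


class Workflow:
    """Toy graph-based workflow: named workers connected by directed edges."""

    def __init__(self) -> None:
        self.workers = {}  # name -> callable taking and returning shared state
        self.edges = {}    # name -> list of downstream worker names

    def add_worker(self, name, fn, after=()):
        self.workers[name] = fn
        self.edges.setdefault(name, [])
        for parent in after:
            self.edges[parent].append(name)

    def run(self, start, state):
        # A breadth-first traversal plays the role of the task queue that a
        # real framework would manage with distributed workers.
        queue = deque([start])
        while queue:
            name = queue.popleft()
            state = self.workers[name](state)
            queue.extend(self.edges[name])
        return state
```

A usage sketch: chain a perception step, a reasoning step, and an action step, then run the graph from its entry node.

```python
wf = Workflow()
wf.add_worker("perceive", lambda s: s + ["saw image"])
wf.add_worker("reason", lambda s: s + ["thought"], after=["perceive"])
wf.add_worker("act", lambda s: s + ["answered"], after=["reason"])
trace = wf.run("perceive", [])
```

Defining agents as small composable workers like this is what lets a framework swap in real concerns (retries, rate limiting, distributed queues) without changing the agent logic.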

Quick Bites

Google has unveiled AI co-scientist, a multi-agent system powered by the Gemini 2.0 model, to act as a virtual scientific collaborator. The system aims to accelerate scientific discoveries by generating and validating novel research hypotheses. It has already demonstrated promising results in real-world laboratory experiments, including identifying new drug candidates for leukemia treatment and explaining mechanisms of antimicrobial resistance. Google is opening access to research organizations through a Trusted Tester Program.

LangChain has released LangMem SDK, a library for AI agents to learn and improve through long-term memory. The SDK provides tooling to extract information from conversations, optimize agent behavior through prompt updates, and maintain long-term memory about behaviors, facts, and events. You can use its API with any storage system and within any Agent framework, and it integrates natively with LangGraph's long-term memory layer.

OpenAI has released SWE-Lancer, a new benchmark for evaluating AI coding capabilities, consisting of 1,400+ real-world freelance SWE tasks from Upwork, collectively valued at $1 million. The benchmark spans the full engineering stack from UI/UX to systems design, featuring tasks ranging from $50 bug fixes to $32,000 feature implementations, with an average task completion time of 21 days by human freelancers. Initial evaluations show that even frontier AI models like o1 and Claude 3.5 Sonnet struggle to solve the majority of these tasks. A portion of the SWE-Lancer dataset has been open-sourced.

Perplexity has open-sourced R1 1776, a post-trained version of the DeepSeek R1 model that removes Chinese Communist Party-influenced censorship, while preserving its state-of-the-art mathematical and reasoning capabilities. The team used a carefully curated dataset to retrain the model, successfully enabling it to handle previously censored topics like Taiwan independence without compromising its performance on technical benchmarks. The model weights are available on HuggingFace, and you can also access it through Perplexity's Sonar API.

London-based startup Convergence has launched Proxy, a web-browsing AI agent that can autonomously handle online tasks, automate repetitive actions, and learn from previous interactions - matching or exceeding the capabilities of OpenAI's Operator while offering global availability at a significantly lower price point of $20/month (compared to Operator's $200/month). Another interesting feature of Proxy is that you can save a browser workflow and run it at a set time and frequency. It’s completely free to try.

Tools of the Trade

  1. Cline: Open-source AI coding agent in VS Code, powered by Claude 3.5 Sonnet, capable of autonomous code creation, editing, terminal command execution, and browser interaction. Using new Model Context Protocol integration, Cline can dynamically extend its capabilities by creating and installing custom tools to securely connect to various data sources and services without requiring separate integrations for each.

  2. Lucidic: Provides visual debugging and analytics tools for AI agents, allowing you to inspect decision trees, replay agent actions, and simulate performance at scale. Instead of traditional log-based debugging, it offers interactive visualization of agent workflows and reasoning patterns.

  3. MGX by MetaGPT: A multi-agent AI development platform that simulates a complete software development team with 5 AI agents (team leader, product manager, architect, engineer, and data analyst) to help users create and deploy web applications, games, and other software projects.

  4. Awesome LLM Apps: Build awesome LLM apps with RAG, AI agents, and more to interact with data sources like GitHub, Gmail, PDFs, and YouTube videos, and automate complex work.

Hot Takes

  1. trying GPT-4.5 has been much more of a "feel the AGI" moment among high-taste testers than i expected! ~
    Sam Altman

  2. AI critic talking points have gone from "LLMs hallucinate and can't be trusted at all" to "okay, there's not as many hallucinations but if you ask it a really hard question it will hallucinate still" to "hm there's not really bad hallucinations anymore but the answer isn't frontier academic paper/expert research blog quality" in < ~1 year
    Always important to remember it's currently the worst it'll ever be. ~
    Alex Albert

That’s all for today! See you tomorrow with more such AI-filled content.

Don’t forget to share this newsletter on your social channels and tag Unwind AI to support us!

Unwind AI - X | LinkedIn | Threads | Facebook

PS: We curate this AI newsletter every day for FREE, your support is what keeps us going. If you find value in what you read, share it with at least one, two (or 20) of your friends 😉 
