Build AI Agent Workflows in Python
PLUS: OpenAI releases Sora, Smallest vision-language model running locally
Today’s top AI Highlights:
Build AI Agent workflows with Pythonic control and validation
OpenAI releases Sora beyond text-to-video with a complete creative toolkit
Run PydanticAI agents locally with just a few lines of code
The smallest vision-language model that requires just 479 MiB to download
An attempt to build Cursor's @codebase feature - RAG on codebases
& so much more!
Read time: 3 mins
AI Tutorials
You might know about agencies that help build software products - with CEOs making strategic decisions, CTOs architecting solutions, developers writing code, and product managers coordinating everything. But can you imagine an agency fully run by AI agents that collaborate to analyze, plan and guide software projects, all working together seamlessly like a real team?
In this tutorial, we'll build exactly that - a multi-agent AI Services Agency where 5 specialized AI agents work together to provide comprehensive project analysis and planning:
CEO Agent: Strategic leader and final decision maker
CTO Agent: Technical architecture and feasibility expert
Product Manager Agent: Product strategy specialist
Developer Agent: Technical implementation expert
Client Success Agent: Marketing strategy leader
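The collaboration pattern above can be sketched in plain Python. This is illustrative only - the role names come from the list above, but the function names and the stubbed LLM call are hypothetical, not the tutorial's actual code:

```python
# Plain-Python outline of the five-agent agency: specialists report
# independently, then the CEO decides with all reports in context.
AGENCY_ROLES = {
    "CEO": "Strategic leader and final decision maker",
    "CTO": "Technical architecture and feasibility expert",
    "Product Manager": "Product strategy specialist",
    "Developer": "Technical implementation expert",
    "Client Success": "Marketing strategy leader",
}

def ask_agent(role: str, instructions: str, project_brief: str) -> str:
    """Stub standing in for a real LLM call with role-specific instructions."""
    return f"[{role}] analysis of: {project_brief}"

def run_agency(project_brief: str) -> dict[str, str]:
    """Collect each specialist's take, then let the CEO see all of them."""
    reports = {
        role: ask_agent(role, instructions, project_brief)
        for role, instructions in AGENCY_ROLES.items()
        if role != "CEO"
    }
    # The CEO makes the final call with every specialist report in context.
    ceo_brief = project_brief + " | reports: " + "; ".join(reports.values())
    reports["CEO"] = ask_agent("CEO", AGENCY_ROLES["CEO"], ceo_brief)
    return reports
```

Swapping `ask_agent` for a real model call, with each role's instructions in the system prompt, turns this skeleton into the collaborative flow the tutorial builds.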
We share hands-on tutorials like this 2-3 times a week, designed to help you stay ahead in the world of AI. If you're serious about leveling up your AI skills and staying ahead of the curve, subscribe now and be the first to access our latest tutorials.
Latest Developments
Remember all the buzz, speculation, and eventual frustration around OpenAI's Sora announcement in February? OpenAI has finally released it. And for everyone who thought Sora was out of the competition: it isn't just another player in the text-to-video space. The feature set they've launched shows they were working on something different from what Runway, Luma Labs, or Kling AI are doing.
Now available as a standalone product at sora.com, the new Sora Turbo version significantly improves on the preview version with faster speed and enhanced capabilities. It can generate videos from both text and images, and comes packed with features like Storyboard for frame-by-frame direction, video editing tools, and style presets. You can generate videos in different resolutions and aspect ratios.
Key Highlights:
Creative Controls - Choose between horizontal, square, or vertical formats, set your resolution (480p-1080p), and pick your duration (5-20 seconds). Generate up to 4 variations per prompt to explore different interpretations.
Video Editing Suite - Need to modify your generations? Use Remix to change elements while keeping the scene's feel, Re-cut to work with specific frames, Loop to create seamless repeating sequences, or Blend to combine two videos into one coherent scene. Each tool comes with adjustable settings for fine-tuning your results.
Storyboard Tool - This is probably their star feature! The Storyboard interface lets you map out scenes on a timeline, describing exactly what happens at each moment. You can set specific timings for actions, start from images with auto-generated continuation captions, and let Sora smoothly connect your sequences.
Community Features - Need inspiration? Browse the feed to see what others are creating. Each video shows exactly how it was made - the prompts used, techniques applied, and methods followed. It's a great way to adapt successful approaches for your own projects.
Subscription Access - ChatGPT Plus subscribers get:
- 50 video generations per month
- videos up to 5 seconds long
- 480p resolution (or fewer generations at 720p)
A Pro subscription gets you:
- 500 priority generations, plus unlimited generations in the slower queue
- up to 1080p resolution
- videos up to 20 seconds long
- downloads without watermarks
Free ChatGPT users can sadly only explore the feed!
ControlFlow is a new Python framework for building agentic AI workflows in a structured way. It puts you in control of how AI agents work in your applications. The framework introduces a task-centric approach:
- you define clear observable tasks,
- assign specialized AI agents to handle each task, and
- combine everything into flows to orchestrate more complex behaviors.
By breaking down AI workflows into observable tasks, ControlFlow helps you build applications that strike the right balance between AI autonomy and developer control.
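As a rough stdlib sketch of that task-centric idea - not ControlFlow's actual API; the `Task`, `Agent`, and `run_flow` names here are illustrative:

```python
from dataclasses import dataclass
from typing import Any, Callable

@dataclass
class Agent:
    name: str
    instructions: str

@dataclass
class Task:
    objective: str
    agent: Agent
    result_type: type = str   # the output format you demand
    result: Any = None

def run_flow(tasks: list[Task], runner: Callable[[Task, dict], Any]) -> dict:
    """Run tasks in order, sharing earlier results as context."""
    context: dict[str, Any] = {}
    for task in tasks:
        raw = runner(task, context)          # an LLM call in practice
        if not isinstance(raw, task.result_type):
            raise TypeError(f"{task.objective!r} returned "
                            f"{type(raw).__name__}, expected "
                            f"{task.result_type.__name__}")
        task.result = raw
        context[task.objective] = raw        # downstream tasks can see it
    return context
```

The key moves - declared result types checked at the boundary, and a shared context that flows between tasks - are what give you observability and control over otherwise free-running agents.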
Key Highlights:
Task Management System - Define clear objectives and get structured outputs from your AI agents. Tasks can return everything from simple strings to complex Pydantic models, with built-in validation to ensure outputs match your requirements. You decide exactly what format you need, and ControlFlow makes sure you get it.
Agent Specialization & Control - Set up agents with different LLM models, tools, and specialized instructions based on your needs. Agents can work together on tasks while sharing context through flows. Switch between GPT-4, Claude, or local LLMs without rewriting your code.
Memory & Context Handling - Give your agents persistent memory using vector databases like ChromaDB. They can store and retrieve information across conversations, maintain context, and build knowledge bases that last. Organize memories by user, agent, or project to keep everything tidy.
Production-Ready Features - Get everything you need to run in production: rate limiting, error handling, streaming responses, and detailed logging. Run operations asynchronously, execute agents concurrently, and keep tight control over your LLM usage with built-in tools and strategies.
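The validation idea from the first highlight can be sketched with the stdlib. ControlFlow does this with Pydantic models; this hand-rolled version only checks field names and types, purely as an illustration:

```python
import json
from dataclasses import dataclass, fields

@dataclass
class ProjectPlan:          # the structure we demand from the agent
    title: str
    weeks: int
    risks: list

def parse_structured_output(raw: str, schema: type) -> object:
    """Validate an agent's raw JSON reply against a dataclass schema."""
    data = json.loads(raw)
    expected = {f.name for f in fields(schema)}
    if set(data) != expected:
        raise ValueError(f"fields {set(data)} != {expected}")
    obj = schema(**data)
    for f in fields(schema):
        # f.type is the annotated class, so isinstance works directly
        if not isinstance(getattr(obj, f.name), f.type):
            raise TypeError(f"{f.name} is not {f.type.__name__}")
    return obj
```

A reply that drifts from the declared shape fails loudly at the boundary instead of corrupting downstream tasks - the same guarantee ControlFlow's typed task results give you.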
Quick Bites
xAI has released their own image generation model Aurora which can generate images from text and images with incredible photorealism. It will be available with Grok-2 to all X users within a week. This also means that xAI has ended its partnership with Black Forest Labs for FLUX.
Ollama’s new version 0.5 now supports structured outputs, making it possible to constrain a model’s output to a specific format defined by a JSON schema. The Ollama Python and JavaScript libraries have been updated to support structured outputs.
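The request shape looks like this - a JSON schema goes in the `format` field of a chat request. The payload builder below is stdlib-only; the `post_chat` helper is an untested sketch since it needs a running Ollama server:

```python
import json
from urllib.request import Request, urlopen

# Hand-written JSON schema constraining the model's reply (with
# Pydantic you would use Model.model_json_schema() instead).
COUNTRY_SCHEMA = {
    "type": "object",
    "properties": {
        "name": {"type": "string"},
        "capital": {"type": "string"},
    },
    "required": ["name", "capital"],
}

def build_chat_request(model: str, prompt: str, schema: dict) -> dict:
    """Payload for POST /api/chat with a structured-output constraint."""
    return {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
        "format": schema,   # new in 0.5: a full JSON schema, not just "json"
        "stream": False,
    }

def post_chat(payload: dict) -> dict:
    """Untested: requires a local Ollama server on the default port."""
    req = Request("http://localhost:11434/api/chat",
                  data=json.dumps(payload).encode(),
                  headers={"Content-Type": "application/json"})
    return json.loads(urlopen(req).read())
```

With a server running, `post_chat(build_chat_request("llama3.2", "Tell me about France.", COUNTRY_SCHEMA))` returns a message whose content parses as JSON matching the schema.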
PydanticAI now runs smoothly with local models via Ollama. The integration requires minimal setup - just point an AsyncOpenAI client to your local Ollama instance and you're ready to build structured data extractors and agents that run entirely on your machine.
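A minimal sketch of that setup. Only the URL helper is exercised here; the agent construction is untested, and the `openai_client` keyword reflects pydantic-ai's early API, which may differ in later releases:

```python
def ollama_base_url(host: str = "localhost", port: int = 11434) -> str:
    """Ollama exposes an OpenAI-compatible API under /v1."""
    return f"http://{host}:{port}/v1"

def build_local_agent():
    """Untested sketch: needs `pip install pydantic-ai openai` and a
    model pulled locally with `ollama pull llama3.2`."""
    from openai import AsyncOpenAI
    from pydantic_ai import Agent
    from pydantic_ai.models.openai import OpenAIModel

    # Point an AsyncOpenAI client at the local Ollama instance; the
    # api_key is a dummy value since Ollama does not check it.
    client = AsyncOpenAI(base_url=ollama_base_url(), api_key="ollama")
    model = OpenAIModel("llama3.2", openai_client=client)
    return Agent(model, result_type=str)

# agent = build_local_agent()
# print(agent.run_sync("Summarize PydanticAI in one sentence.").data)
```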
Here’s Moondream 0.5B, the smallest vision-language model, which runs smoothly on edge and mobile devices, requiring just 479 MiB to download and a mere 996 MiB of memory to run when quantized to 8-bit. The model is fully open-source - int8 and int4 weights for Moondream 0.5B are available, as well as fast CPU inference support in the Python client library. 16-bit weights and distillation support are coming soon!
Tools of the Trade
RedSage: A lightweight, terminal-based coding assistant that connects to LLM APIs (e.g., Claude, OpenAI) to provide real-time pair programming capabilities. It comes with real-time file watching, Git integration, and simple YAML-based setup.
Co.dev: No-code platform to create full-stack apps using natural language prompts. It generates code that you fully own, utilizing a modern tech stack with built-in features like authentication and database integration.
CodeQA: Explore codebases using natural language, similar to Cursor's @codebase feature. It answers queries with relevant code snippets, file names, and references. Powered by LanceDB, GPT-4o, and Answerdotai's colbert-small-v1 reranker. Supports Python, Rust, JavaScript, and Java.
Awesome LLM Apps: Build awesome LLM apps with RAG, AI agents, and more to interact with data sources like GitHub, Gmail, PDFs, and YouTube videos, and automate complex work.
Hot Takes
In 5 years, almost all code will be LLM generated. When this happens, a solid understanding of type systems, concurrency, and programming paradigms will be extremely useful. The people studying PLT now will 1000x outship the people learning Go or Java because it’s “employable” ~ wordgrammer
An insane amount of work on scaffolding, RAG, agents, in-context learning, semantic gradients, CoT monitoring, etc. will go out the window once you get recurrent state-space models ~ James Campbell
That’s all for today! See you tomorrow with more such AI-filled content.
Don’t forget to share this newsletter on your social channels and tag Unwind AI to support us!
PS: We curate this AI newsletter every day for FREE, your support is what keeps us going. If you find value in what you read, share it with at least one, two (or 20) of your friends 😉