Text-to-Video is Here (So is AGI? 😱)

OpenAI is on a launch spree: GPT-4, followed by ChatGPT plugins and a code interpreter!

March 25, 2023

Hey there 👋

It’s time for another edition of Unwind AI, and we’re here to unwind the latest developments and trends in the fascinating world of AI. The AI landscape is changing at lightning pace! Just as you wrap your head around one development, more news and advancements hit you left, right and center.

That’s precisely why we’re here to keep you up to date with everything hot and trending in AI. Last week OpenAI amazed everyone with the launch of GPT-4, but this week has been no less eventful! From Runway launching its text-to-video model, to Opera, Adobe and Canva adding generative AI features to their tools, to ChatGPT plugins, a lot has happened this week that we can’t wait to share with you!

So without further ado, let’s dive in!

But before we get started, we want to thank Metaview, our sponsor for this week.

Metaview is an AI-powered tool that automates interview note-taking for recruiters, allowing them to focus on high-quality interactions with candidates during interviews.

This issue covers:

Latest Developments 🌍
News from the Industry 🧑‍🏫
Tools of the Trade ⚒️
Hot Takes 🔥
AI Meme of the Week 🤡

Latest Developments 🌍

Our Pick 👌

Gen-2 by Runway: A multi-modal AI system that can generate new videos using text, images, or video clips. It can synthesize videos in any style using text prompts, transfer image styles to videos, and customize the model for higher fidelity results.

LERF (Language Embedded Radiance Fields): A technique that optimizes a 3D language field for real-time language-based queries without region proposals or fine-tuning.
Stable Diffusion Reimagine: A new Clipdrop tool that generates multiple variations of a single image without limits, with no need for complex prompts.
Unified Multi-Modal Latent Diffusion: Generates high-quality images with complex semantics by taking joint text and image inputs and using a unified multi-modal latent space and a novel sampling technique.
Alibaba’s Text-to-Video Generation Diffusion Model: A 1.7-billion-parameter text-to-video model that creates videos from English text descriptions using a multi-stage diffusion process.
Google’s Vid2Seq: A large-scale pre-trained visual language model that can describe multi-event videos using time tokens to jointly localize and describe events in a single sequence.
DS-Fusion: Generates artistic typography by stylizing letter fonts to convey the meaning of an input word while maintaining readability.
Learning Real-world Conversation from Large-Scale Web Videos: CHAMPAGNE, a generative model of conversations that learns from a large-scale dataset of 18M video-based dialogues, accounting for visual contexts.
LOCATE (Localize and Transfer Object Parts): Identifies matching object parts across images and transfers knowledge of object usage, using a weakly supervised framework.
ElasticViT by Microsoft: A high-performance, low-latency neural network architecture for mobile devices that uses a two-stage approach to train and optimize ViT.
Breaking Common Sense - WHOOPS!: A new benchmark for visual commonsense that challenges AI models to recognize and interpret purposefully commonsense-defying images, including a difficult explanation generation task.
Edit-A-Video from Seoul National University: An object-aware video editing framework that uses a pretrained text-to-image model and a single text-video pair.
GPTs are GPTs (general-purpose technologies) by OpenAI: GPT models have the potential to impact at least 10% of the work of up to 80% of the US workforce, with higher-income jobs facing greater exposure.
Textured 3D Meshes from 2D Text-to-Image Models: Text2Room creates textured 3D room-scale meshes from text using 2D text-to-image models, depth estimation, and a tailored viewpoint selection strategy.
Prompting ChatGPT for Multimodal Reasoning and Action: MM-ReAct integrates ChatGPT with a pool of vision experts to achieve advanced visual intelligence through a textual prompt design.
FateZero by Tencent AI Lab: A method for zero-shot text-based video editing on real-world videos without per-prompt training.
Instruct-NeRF2NeRF by UC Berkeley: A method for editing NeRF 3D scenes using text instructions and an image-conditioned diffusion model.

News from the Industry 🧑‍🏫

Our Pick 👌

Opera adds Generative AI features: Opera Browser and Opera GX now have AI-powered tools, including AI Prompts and sidebar access to ChatGPT and ChatSonic, which can summarize, explore and offer assistance with a wide range of queries and issues.

GPT-4 is now available in preview on Azure OpenAI Service, empowering businesses to leverage OpenAI's most advanced model yet.
OpenAI is adding plugin support to ChatGPT, rolling it out gradually to a small number of developers and ChatGPT Plus subscribers for testing and opening up a wide range of potential real-world applications.
Microsoft has launched Loop, an app for managing tasks and projects in virtual teams that syncs across Microsoft 365 apps and services.
GitHub has launched Copilot X, an AI assistant for developers, integrating chat and voice interfaces, pull request support, and AI-generated answers to documentation questions, powered by OpenAI's GPT-4 model.
Google Bard’s waitlist is now open for sign-ups!
Adobe has launched Adobe Firefly, a generative AI tool for creators focused on image and text effect generation.
Canva has launched a host of AI-powered features in its Visual Suite to help users create designs faster, smarter and more creatively.
Mozilla is investing $30 million in Mozilla.ai, a startup that aims to build a trustworthy and independent open-source AI ecosystem.
Baidu's ChatGPT-like AI bot, Ernie Bot, failed to impress investors at its unveiling in Beijing, with the company's shares falling by as much as 10%.
Apple is reportedly testing "Siri natural language generation" with plans to roll it out to all of its operating systems, making a significant shift from Siri's current template database-based approach and improving its functionality.
Unlearn.AI has raised $15 million in funding and added OpenAI CTO Mira Murati to its board as it expands its machine learning platform for digital twin patient profiles in clinical trials.
Runway partners with AWS to scale AI research and accelerate training and deployment of new models, including the recently released Gen-2.
The Allen Institute for AI has created Objaverse, a vast database of over 800,000 3D models of everyday objects to improve the training simulators used for AI models.

Tools of the Trade ⚒️

Our Pick 👌

Whimsical: A visual collaboration platform to create and combine flowcharts, wireframes, mind maps, and docs for faster and more efficient teamwork, with templates and integrations to boost productivity.

Whimsical's AI for Mind Maps generates fresh ideas with a single click.

Qlip AI: Generates AI-powered video highlights to grow social media presence by repurposing long-form videos into snackable clips.
Tugan AI: Generates promotional and educational emails in seconds from a given topic or URL, to save time and increase conversions for businesses.
Botsheets: Collects actionable data from customer conversations and writes it to a Google Sheet, streamlining lead generation and identifying trends for business growth.
Build AI: Lets businesses create customized AI-powered web apps quickly without technical skills, with natural language customization and easy integration into websites.
ChatBot Kit: Create your own AI chatbot with features like chat history, custom dataset, GPT support, integrations with Slack, Discord, and Twitter, and more.
Oxolo: AI video creation platform that summarizes product listings into video scripts and creates videos in minutes to help e-commerce businesses.
Wolverine: Fixes crashed Python scripts using GPT-4 to repeatedly rerun and edit them until they're error-free.
Superhuman: An email client offering a fast, intelligent, and visually beautiful email experience with features such as AI triage, read statuses, follow-up reminders, scheduled messages, etc.
Leonardo.AI: A creative studio to create stunning game assets, rapidly ideate, and craft worlds in minutes, with the option to use existing models or train your own AI models.
Wanda: Reduces documentation time by 90%, allows for easy documentation building through AI, automates repetitive tasks, and provides technical support.
Speechify: A text-to-speech tool that allows users to listen to content 2-3x faster than reading, retain more information, and sync across platforms.

Hot Takes 🔥

Starting to have a feeling that *none* of our SciFi imaginings have adequately prepared us for what’s coming.
— Bojan Tunguz (@tunguz)
6:14 PM • Mar 23, 2023

Unpopular opinion: I don’t care that openai is for-profit now.
— Ben Tossell (@bentossell)
12:43 PM • Mar 25, 2023

NLP grad students, don’t despair. ChatGPT is not the end of NLP research, it’s the beginning.
— Pedro Domingos (@pmddomingos)
4:22 PM • Mar 25, 2023

AI Meme of the Week 🤡

That’s all for this week!

We’ll see you next Saturday with more. Don’t forget to subscribe and share your feedback below.

BONUS 🎉

Share this newsletter with three friends for a chance to win my book, GPT-3: The Ultimate Guide to Building NLP Products with OpenAI API. Winners will be selected on a monthly basis.

🎁 Every paid subscriber will also receive FREE learning resources on trending topics like Python, Data Science, Machine Learning, and NLP!
