unwind ai

Text-to-Video Now Available to All 🎥

PLUS: Shiny new Vision Pro Headsets from Apple, Stability AI still in the GenAI game, Adobe Firefly for enterprises, and more!

June 10, 2023

Welcome back 👋

This week is historic for more than one reason! If you think I am just hyping it up, let me tell you about two new developments that could change the way we see the world.

I always wanted to create Hollywood-style movies and videos from my absurdly wild imagination, but the lack of resources held me back. This week, everything changes! Gen-2 is an AI tool that turns your laptop into a magic wand, letting you transform your wildest imaginings into stunning visual masterpieces!
If that wasn’t enough already, Apple unveiled the Vision Pro, an AR/VR headset that can bring virtual things to life. It transports you into a virtual world where you can watch digital content as if it were happening right in front of you, blurring the line between reality and fantasy!

So brace yourselves as we dive headfirst into everything exciting that went down.

This issue covers:

Latest Developments 🌍
News from the Industry 🧑‍🏫
Tools of the Trade ⚒️
AI Meme of the Week 🤡

Latest Developments 🌍

Our Pick 👌

Neuralangelo: A high-fidelity neural surface reconstruction method that efficiently recovers detailed 3D surfaces from multi-view images.

Video-LLaMA: A multi-modal framework that enhances language models with the ability to understand both visual and auditory content in videos.
VideoComposer: A novel video synthesis system that allows users to compose videos with control over motion, spatial conditions, and temporal conditions.
PolyVoice: A language model framework for high-quality speech-to-speech translation with voice and style preservation.
Orca: A 13-billion-parameter model that learns from GPT-4's complex explanation traces and step-by-step thought processes to imitate the reasoning process of larger models.
LLaMA-Adapter V2: A parameter-efficient visual instruction model that builds upon the original LLaMA-Adapter, improving multi-modal reasoning with fewer parameters.
SAM3D: 3D object detection that adapts the zero-shot ability of the Segment Anything Model.
InternLM: A 104B-parameter multilingual language model that excels in knowledge understanding, comprehension, maths, coding, and understanding of Chinese language and culture.
InstructZero: Optimizes soft prompts for black-box language models, outperforming state-of-the-art methods without backpropagation.
Recognize Anything Model (RAM): Image tagging model, leverages large-scale image-text pairs for training, surpassing CLIP, BLIP, and other approaches.
Empowering LLMs as Responsible Task Automators: Framework that enables collaboration between LLMs and humans for task automation, covering feasibility prediction, completeness verification, and security enhancement.
Evaluating Language Models for Mathematics through Interactions: CheckMate evaluates language models for mathematics, uncovering divergences in model capabilities and providing recommendations for improvement.
Increasing Diversity While Maintaining Accuracy: Human-AI partnerships enhance diversity and accuracy in text data generation by LLMs, using techniques like logit suppression and temperature sampling.
Transformer-based Vulnerability Detection in Code at EditTime: Deep learning and LLMs are used to detect and reduce software vulnerabilities in code while it is being edited.
Diffusion Self-Guidance for Controllable Image Generation: Leveraging internal representations of diffusion models for manipulation of properties like object shape, location, and appearance, without additional models or training.
VisualGPTScore: Addresses the lack of compositional understanding in vision-language models, achieves high performance on image-text retrieval benchmarks.
Natural Language Commanding via Program Synthesis: Semantic Interpreter enables natural language commanding in Microsoft Office by translating user commands into concise program instructions.
PromptBench: Benchmark that evaluates robustness of LLMs to adversarial prompts across various tasks, shows LLMs’ vulnerability and provides recommendations.
Vocabulary-free Image Classification: CaSED assigns class labels to images without prior knowledge of class names by leveraging a Vision-Language Model.
SnapFusion: Text-to-image generation on mobile devices within 2 seconds by introducing efficient network architecture and improving step distillation.
An Empirical Study on Challenging Math Problem Solving with GPT-4: Explores how GPT-4 can be used to solve challenging math problems, comparing several prompting approaches and proposing MathChat, a conversational problem-solving framework.
Transformers Operating Directly On File Bytes: Achieve high accuracy on various file types like images and audio, and demonstrate privacy-preserving inference capabilities.
ARTIC3D: Reconstructs robust 3D shapes of objects from noisy web images, using a skeleton-based surface representation and 2D diffusion priors.
Sorting Algorithm by Google DeepMind’s AI: AlphaDev discovered sorting algorithms that DeepMind reports are up to 70% faster than existing library implementations on short sequences.

News from the Industry 🧑‍🏫

Our Pick 👌

Without tooting the AI horn, Apple announced several AI-powered features across its products at WWDC 2023. Here are our favorites:

Persona in Apple Vision Pro creates a lifelike digital representation of the user's face for interactive experiences during video calls and conferences.
Improved Autocorrect, now based on a transformer language model that enhances word prediction. No more "ducking" now!
Live Voicemail, which displays a real-time text transcription of a voicemail as the caller speaks.
FaceTime Presenter allows users to present an app or their computer screen to others in a FaceTime call, while also displaying a live view of their own face or head and shoulders in front of it.
Personalized Volume for AirPods that understands and adjusts to environmental conditions and listening preferences of the user over time.

Google is expanding Bard’s logic and reasoning abilities through a new technique called implicit code execution, enabling it to handle mathematical tasks, coding questions, and string manipulation more accurately.
ChatGPT app for iPad receives major update, introducing native iPad support, integration with Siri and Shortcuts, split screen, and drag-and-drop functionality.
Adobe has released Firefly for enterprises, allowing businesses to use their own branded assets to custom-train Adobe's generative AI model for content creation.
Chinese tech giant Baidu is launching a 1 billion yuan ($145 million) AI venture fund to support generative AI startups, following in the footsteps of OpenAI.
Falcon 40B AI model, ranked #1 globally on Hugging Face’s leaderboard for LLMs, is now royalty-free for commercial and research purposes.
OpenAI is launching a $1M Cybersecurity Grant Program to advance AI-powered cybersecurity capabilities for defenders.
MIT neuroscientists have developed a model that predicts human emotions including joy, regret, embarrassment, outperforming previous models.
The story of a rogue U.S. Air Force drone trying to kill its operator went viral, but it seems to be a misinterpretation of a "thought experiment" about the potential dangers of AI.
India's Minus Zero unveils zPod, billed as the country's first fully autonomous vehicle, which relies on high-resolution cameras and is designed to scale up to Level 5 autonomy.
A recent Microsoft report states that 83% of Indians are open to delegating work to AI, despite 74% expressing worries about job replacement by AI.
Capgemini and Google Cloud partner to create a Generative AI Center of Excellence, enabling the development of industry-specific use cases and accelerating client value.
Google Cloud and Salesforce have partnered to empower businesses with enhanced AI capabilities through data integration and custom ML models.
UK will host the first global summit on AI safety, gathering key countries, tech companies and researchers to address the risks and establish safety measures.
EU has urged Google and Facebook to label AI-generated content to combat disinformation, while Twitter faces potential sanctions for non-compliance with new digital content laws.
Chinese President Xi Jinping calls for greater state control of AI to counter national security threats and strengthen oversight of data security.
Mozilla Ventures has invested in Fiddler, which helps enterprises build trust into AI with monitoring, explainability, analytics, fairness, and safety.
Instagram is reportedly testing an AI chatbot feature with 30 personalities, allowing users to ask questions and choose their preferred AI persona.
Microsoft is allowing US government agencies to access OpenAI's GPT-4 through its Azure Government cloud computing service.
Magic's LTM-1, an LLM with a 5,000,000-token context window, is now in closed alpha. Join the waitlist!
Glean has introduced Glean Chat, an enterprise-grade chat assistant powered by generative AI, providing personalized insights from company knowledge sources.

Tools of the Trade ⚒️

Our Pick 👌

Uncrop by Stability AI: Utilizes outpainting to alter the aspect ratio of images by creating an expanded background, free to use!

Twinning: Allows influencers to create AI clones of themselves, engage with fans in one-on-one chats, and earn revenue from fan interactions.
3DFY: Text-to-3D generation tool that provides scalable and high-quality 3D models, eliminating manual creation, photogrammetry, and 3D scanning.
MindGenie: Improves productivity by offering smart scheduling, time tracking, and AI-driven task optimization for startups and individuals.
TimeComplexity: Analyzes code's runtime complexity, provides results in Big O notation, offers explanations, and warns about potential inaccuracies.
Paperclips Copilot: Chrome extension that converts highlighted text into organized and synchronized flashcards, enhancing online study sessions.
Zigpoll: Survey and feedback platform for businesses to collect valuable insights, offering a no-code setup, multiple question formats and dashboard for analysis.
Snipd: Get personalized notes for podcasts by tapping your headphones, eliminating manual note-taking, and receive takeaways via email.
Label Studio: Open-source platform for data labeling, capable of handling various data types and tasks across multiple domains.
Jetpack AI Assistant: Effortlessly create and customize high-quality content, and translate it into numerous languages, directly in WordPress.

AI Meme of the Week 🤡

That’s all for this week!

See you next Saturday with more. Don’t forget to subscribe and share your feedback below.

BONUS 🎉

Share this newsletter with three friends for a chance to win my book, GPT-3: The Ultimate Guide to build NLP Products with OpenAI API. Winners are selected monthly.
