Text-to-Video Now Available to All 🎥

PLUS: Shiny new Vision Pro Headsets from Apple, Stability AI still in the GenAI game, Adobe Firefly for enterprises, and more!

Welcome back 👋

This week is historic for multiple reasons! If you think I am just hyping it up, let me tell you about two new developments that could change the way we see the world.

  1. I always wanted to create Hollywood-style movies and videos from my absurdly wild imagination, but the lack of resources held me back. This week, everything changes! Gen-2 is an AI tool that turns your laptop into a magic wand, letting you transform your wildest imaginings into stunning visual masterpieces!

  2. If that wasn’t enough already, Apple unveiled the Vision Pro headset, powered by AR/VR technology that can bring virtual things to life. The headset transports you into a virtual world where you can watch digital content as if it's happening right in front of you, blurring the line between reality and fantasy!

So brace yourselves as we dive headfirst into everything exciting that went down.

This issue covers:

  • Latest Developments 🌍

  • News from the Industry 🧑‍🏫

  • Tools of the Trade ⚒️

  • AI Meme of the Week 🤡

Latest Developments 🌍

Our Pick 👌

Neuralangelo: A high-fidelity neural surface reconstruction method that efficiently recovers detailed 3D surfaces from multi-view images.

  • Video-LLaMA: A multi-modal framework that enhances language models with the ability to understand both visual and auditory content in videos.

  • VideoComposer: A novel video synthesis system that allows users to compose videos with control over motion, spatial conditions, and temporal conditions.

  • PolyVoice: A language model framework for high-quality speech-to-speech translation with voice and style preservation.

  • Orca: A 13-billion-parameter model that learns from GPT-4's complex explanation traces and step-by-step thought processes to imitate the reasoning process of larger LLMs.

  • LLaMA-Adapter V2: A parameter-efficient visual instruction model that builds upon the original LLaMA-Adapter, improving multi-modal reasoning with fewer parameters.

  • SAM3D: 3D object detection that adapts the zero-shot ability of the Segment Anything Model.

  • InternLM: A 104B-parameter multilingual language model that excels at knowledge understanding, comprehension, maths, coding, and understanding of Chinese language and culture.

  • InstructZero: Optimizes soft prompts for black-box language models, outperforming state-of-the-art methods without backpropagation.

  • Recognize Anything Model (RAM): Image tagging model, leverages large-scale image-text pairs for training, surpassing CLIP, BLIP, and other approaches.

  • Empowering LLMs as Responsible Task Automators: Framework to enable collaboration between LLMs and humans for task automation like feasibility prediction, completeness verification, and security enhancement.

  • Evaluating Language Models for Mathematics through Interactions: CheckMate evaluates language models for mathematics, uncovering divergences in model capabilities and providing recommendations for improvement.

  • Increasing Diversity While Maintaining Accuracy: Human-AI partnerships enhance diversity and accuracy in text data generation by LLMs, using techniques like logit suppression and temperature sampling.

  • Transformer-based Vulnerability Detection in Code at EditTime: Deep learning and LLMs are utilized to detect and improve software vulnerabilities in code during the editing process.

  • Diffusion Self-Guidance for Controllable Image Generation: Leveraging internal representations of diffusion models for manipulation of properties like object shape, location, and appearance, without additional models or training.

  • VisualGPTScore: Addresses the lack of compositional understanding in vision-language models, achieves high performance on image-text retrieval benchmarks.

  • Natural Language Commanding via Program Synthesis: Semantic Interpreter enables natural language commanding in Microsoft Office by translating user commands into concise program instructions.

  • PromptBench: Benchmark that evaluates robustness of LLMs to adversarial prompts across various tasks, shows LLMs’ vulnerability and provides recommendations.

  • Vocabulary-free Image Classification: CaSED assigns class labels to images without prior knowledge of class names by leveraging a Vision-Language Model.

  • SnapFusion: Text-to-image generation on mobile devices within 2 seconds by introducing efficient network architecture and improving step distillation.

  • An Empirical Study on Challenging Math Problem Solving with GPT-4: Examines how well GPT-4 handles harder, more complex math problems and which prompting setups help it solve them.

  • Transformers Operating Directly On File Bytes: Achieve high accuracy on various file types like images and audio, and demonstrate privacy-preserving inference capabilities.

  • ARTIC3D: Reconstructs robust 3D shapes of objects from noisy web images, using a skeleton-based surface representation and 2D diffusion priors.

  • Sorting Algorithm by Google DeepMind’s AI: AlphaDev discovered sorting routines that are up to 70% faster than existing library implementations for short sequences, and about 1.7% faster for sequences of more than 250,000 elements.
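
For the curious, the two techniques named in the diversity item above, logit suppression and temperature sampling, can be sketched in a few lines. This is a minimal illustration and not the paper's implementation: the function name `sample_token`, the fixed `penalty` value, and the toy logits are all assumptions made for the example.

```python
import math
import random


def sample_token(logits, temperature=1.0, suppressed=(), penalty=5.0, rng=random):
    """Pick one token index from raw scores (logits).

    Hypothetical helper for illustration only.
    - suppressed: indices of tokens we want to see less often
    - penalty: how much to subtract from a suppressed token's logit (assumed value)
    - temperature: >1 flattens the distribution, <1 sharpens it
    """
    if temperature <= 0:
        raise ValueError("temperature must be positive")

    # Logit suppression: push down the scores of over-used tokens.
    adjusted = [
        score - penalty if i in suppressed else score
        for i, score in enumerate(logits)
    ]

    # Temperature scaling: divide every logit by the temperature.
    scaled = [score / temperature for score in adjusted]

    # Numerically stable softmax to turn scores into probabilities.
    top = max(scaled)
    exps = [math.exp(score - top) for score in scaled]
    total = sum(exps)
    probs = [e / total for e in exps]

    # Draw one index according to those probabilities.
    index = rng.choices(range(len(probs)), weights=probs, k=1)[0]
    return index, probs


if __name__ == "__main__":
    toy_logits = [2.0, 1.0, 0.0]  # made-up scores for three tokens
    for t in (0.5, 1.0, 2.0):
        _, probs = sample_token(toy_logits, temperature=t)
        print(t, [round(p, 3) for p in probs])
```

Raising the temperature spreads probability across more tokens (more diverse text), while suppression nudges the model away from tokens it has already leaned on.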

News from the Industry 🧑‍🏫

Our Pick 👌

Without tooting its own AI horn, Apple rolled out multiple AI features across its products at WWDC 2023. Here are our favorites:

  1. Persona in Apple Vision Pro creates a lifelike digital representation of the user's face for interactive experiences during video calls and conferences.

  2. Improved Autocorrect, powered by a transformer language model that enhances word prediction. No more "ducking" now!

  3. Live Voicemail, which displays a real-time text transcription of a voicemail as the caller speaks.

  4. FaceTime Presenter allows users to present an app or their computer screen to others in a FaceTime call, while also displaying a live view of their own face or head and shoulders in front of it.

  5. Personalized Volume for AirPods that understands and adjusts to environmental conditions and listening preferences of the user over time.

Tools of the Trade ⚒️

Our Pick 👌

Uncrop by Stability AI: Utilizes outpainting to alter the aspect ratio of images by creating an expanded background, free to use!

  • Twinning: Allows influencers to create AI clones of themselves, engage with fans in one-on-one chats, and earn revenue from fan interactions.

  • 3DFY: Text-to-3D generation tool that provides scalable and high-quality 3D models, eliminating manual creation, photogrammetry, and 3D scanning.

  • MindGenie: Improves productivity by offering smart scheduling, time tracking, and AI-driven task optimization for startups and individuals.

  • TimeComplexity: Analyzes code's runtime complexity, provides results in Big O notation, offers explanations, warns about potential inaccuracies.

  • Paperclips Copilot: Chrome extension that converts highlighted text into organized and synchronized flashcards, enhancing online study sessions.

  • Zigpoll: Survey and feedback platform for businesses to collect valuable insights, offering a no-code setup, multiple question formats and dashboard for analysis.

  • Snipd: Get personalized notes for podcasts by tapping your headphones, eliminating manual note-taking, and receive takeaways via email.

  • Label Studio: Open-source platform for data labeling, capable of handling various data types and tasks across multiple domains.

  • Jetpack AI Assistant: Effortlessly create and customize high-quality content, and translate it into numerous languages, directly in WordPress.

AI Meme of the Week 🤡

That’s all for this week!

See you next Saturday with more such content. Don’t forget to subscribe and share your feedback below.

BONUS 🎉

Share this newsletter with three friends for a chance to win my book GPT-3: The Ultimate Guide to build NLP Products with OpenAI API. Winners will be selected on a monthly basis.