Last Year in AI - A Yearly Unwind
2023: The Year Everyone Became a Writer, Poet, Artist, and 10x More Productive!
2023 was an absolute blast in the world of AI! It's the year where artificial intelligence really got down to business, becoming a part of our everyday life in cool, practical ways. Imagine having a chat with a computer that can crack jokes like your best friend, or getting art lessons from a digital Picasso - that's the kind of stuff 2023 made possible.
AI this year was all about making life easier and more fun. It helped doctors spot health issues faster than ever, turned classrooms into super-personalized learning spaces, and let anyone become a poet or an artist with just a few clicks. It wasn't just about smarter machines; it was about AI being our sidekick in creativity and problem-solving.
Think of it like having a super-smart buddy who's always there to help you out. 2023's AI wave was a game-changer, making tech feel more human and turning sci-fi dreams into everyday things. It’s been a wild ride watching how AI is reshaping our world, and we're just getting started!
As we dive into the specifics, it's fascinating to see how these advancements have manifested in various aspects of our lives. The magic of AI in 2023 wasn't just about the big, flashy innovations; it was about bringing a touch of extraordinary to the ordinary.
This wasn't just about technology getting an upgrade; it was about people finding new ways to unleash their creativity, boost their productivity, and connect with each other like never before. So, let's look at 5 key themes that made 2023 a year to remember:
1. The Rise of AI-Powered Creativity: With advanced AI tools at their fingertips, people from all walks of life discovered their inner writers and poets. These tools, like ChatGPT, broke down the barriers of traditional learning, allowing anyone to express themselves eloquently and artistically.
2. Art for All: The democratization of art was a highlight. AI-driven platforms enabled users to create stunning visuals, irrespective of their prior artistic skills. This led to a surge in digital art, bringing diverse and previously unheard voices to the forefront of the art world.
3. Productivity Redefined: 2023 also saw a significant leap in productivity. AI assistants, smarter task management tools, and enhanced virtual collaboration platforms meant tasks that once took hours were completed in minutes. This efficiency boom reshaped workplaces, fostering a culture of work-life balance.
4. Education Revolutionized: The landscape of education transformed, making learning more accessible and personalized. AI tutors provided customized learning experiences, igniting a passion for knowledge in students and lifelong learners alike.
5. The Power of Community: This year also reinforced the power of community. Online platforms became melting pots of ideas and collaboration, leading to unprecedented levels of innovation and creative exchange.
AI Developments Timeline (JAN - DEC)
January 2023
Anthropic Releases 1st Gen LLM: Claude
Anthropic releases a new AI chatbot named Claude, similar to ChatGPT. Based on the research paper on Constitutional AI, Claude differs from ChatGPT in its approach to reinforcement learning: it uses a model-generated approach for the initial ranking of fine-tuned outputs instead of relying on human feedback. Claude was a serious competitor to ChatGPT, with improvements in some areas, such as its ability to write coherently about itself and its limitations, and a more naturalistic writing style.
Microsoft + OpenAI = Azure OpenAI Service ☁️
Microsoft announced the general availability of Azure OpenAI Service, a tool that allows businesses to access the most advanced AI models, including GPT-3.5, Codex, and DALL·E 2. The service lets you create cutting-edge applications, improve user experiences and streamline internal efficiencies. It also includes ChatGPT, which runs inference on Azure AI infrastructure.
VALL-E: Language Model-Based Text-to-Speech
Microsoft releases VALL-E, a neural codec language model for text-to-speech synthesis (TTS). It is trained using 60K hours of English speech and can generate high-quality personalized speech by using just a short recording of a person's voice as a reference. It can also keep the emotions and the environment of the reference recording in the generated speech.
February 2023
Google Releases Bard
Google releases Bard, a conversational AI platform powered by LaMDA (Language Model for Dialogue Applications), to provide fresh, high-quality responses by drawing information from the web. The hype was real then, but it soon died down due to Bard's high hallucination rate.
ChatGPT Touched 100M, and counting…
OpenAI’s popular chatbot ChatGPT reached 100 million users just two months after launching, an unprecedented growth rate for a consumer app, according to UBS analysts. The app had 590 million visits in January from 100 million unique visitors. By comparison, TikTok took 9 months and Instagram 2 years to reach the same milestone.
Meta Releases LLaMA 💬
After Google and Microsoft, Meta joined the LLM race with its new release LLaMA, a collection of foundation language models ranging from 7B to 65B parameters. LLaMA-13B outperforms GPT-3 (175B) on most benchmarks, and LLaMA-65B is competitive with the best models, Chinchilla-70B and PaLM-540B.
No Lights, No Cameras, All Action
Runway launched Gen-1, its first generative AI system for turning existing videos into new, compelling pieces of footage using images or text prompts. It is based on a structure and content-guided video diffusion model trained on images and videos and offers fine-grained control over output characteristics. The platform offered several modes for stylization, storyboarding, masking, and customization.
March 2023
OpenAI Releases GPT-4
OpenAI launched GPT-4, its next-gen large multimodal model that accepts image and text inputs and emits text outputs, outperforming other LLMs and SOTA models in various language and reasoning tasks, and demonstrating human-level performance on various professional and academic benchmarks. We believe it has been the most important release of the year.
Adobe Releases Firefly
Adobe launched Firefly, a generative AI tool that allows users to generate images, add or remove objects, and transform text. There are different modules available in Firefly, including Generative Fill, Generative Recolor and 3D to Image. Firefly is integrated into Adobe apps like Photoshop, Illustrator, and Adobe Express, offering new features powered by generative AI.
Runway quickly went from Gen-1 to Gen-2
Runway launched Gen-2, a multi-modal AI system that can generate new videos using text, images, or video clips. It can synthesize videos in any style using text prompts, transfer image styles to videos, and customize the model for higher fidelity results.
April 2023
Elon Musk Launches xAI
Elon Musk launched xAI, an AI research company positioned as a stiff rival to AI leaders like OpenAI, Anthropic and Microsoft. He also rebranded Twitter as X later in the year. Whether that actually made Twitter a more liberal and intuitive platform is still up for debate!
Amazon’s Step in the LLM World
Amazon launched Amazon Bedrock, which enables the building of generative AI-powered apps via third-party models from AI21 Labs, Anthropic and Stability AI, as well as AWS's own Titan foundation models (FMs). Amazon also released CodeWhisperer, an AI coding companion for faster and more secure application development, free for individual developers.
Meta’s AI to “Cut Out” Any Object
Meta AI releases the Segment Anything Model (SAM), an AI model that can segment any object in an image with just one click, using input prompts, trained on over 11 million images with over 1 billion masks.
May 2023
Stable Diffusion XL
Stability AI releases Stable Diffusion XL (SDXL), the latest image generation model with next-level photorealism capabilities, enhanced image composition, and face generation. SDXL also includes image-to-image prompting, inpainting, and outpainting (constructing a seamless extension of an existing image).
Google Releases PaLM 2
Google's I/O Event showcased a deluge of AI innovations. Highlights include PaLM 2 enhancing over 25 Google products with multilingual and coding capabilities, Bard with its global reach and diverse functionalities, and the Search Generative Experience offering enriched query responses.
Inflection Releases its Empathetic AI Chatbot Pi
Inflection AI introduced its first Personal AI, named Pi, offering a unique blend of friendly conversation, advice, and information in a natural, engaging manner. Pi stands out by focusing on personal interaction, serving as a coach, confidante, and creative partner, distinct from other AIs that emphasize productivity and information retrieval.
June 2023
Microsoft Unveils Orca
Orca is a 13-billion-parameter model that learns to imitate the reasoning process of larger models. It trains on rich signals from GPT-4, including explanation traces, step-by-step thought processes, and other complex instructions, guided by teacher assistance from ChatGPT.
Video-LLaMA
Video-LLaMA is a multi-modal framework that empowers Large Language Models (LLMs) with the capability of understanding both visual and auditory content in the video. Video-LLaMA bootstraps cross-modal training from the frozen pre-trained visual and audio encoders and the frozen LLMs.
Tracking Everything Everywhere All at Once
Google Research, in collaboration with UC Berkeley, unveiled OmniMotion, a novel approach to motion tracking in video sequences. Breaking away from the constraints of traditional optical flow or particle video tracking methods, OmniMotion enables dense, long-range motion estimation, effectively tracking every pixel's journey through the entirety of a video.
July 2023
Humane Releases AI Pin
Humane, an AI startup founded by ex-Apple designers, revealed the "Ai Pin". It is an intelligent wearable device that attaches to clothing and has a built-in projector, and it could replace smartphones in the future!
Meta AI Releases Open-Source Llama 2
Llama 2 is a series of large language models (LLMs) offered in three sizes: 7B, 13B, and 70B parameters. It's trained on 2 trillion tokens, 40% more than its predecessor, Llama 1, and includes over 1 million human annotations. With a 4k context length, Llama 2 surpasses other open-source models in benchmarks across various tasks, including coding and reasoning.
ChatGPT gets Code Interpreter
OpenAI has rolled out Code Interpreter to all ChatGPT Plus users. It can run code, analyze data, create charts, edit files, perform math, and so much more!
August 2023
Meta AI Releases Code Llama
Meta AI releases Code Llama, a sophisticated open-source large language model designed specifically for coding tasks. Built on the robust Llama 2 architecture, it comes in three variations to serve different coding needs, including a Python-specialized model.
Clone any Voice using AI in < 3 secs
VALL-E X is an amazing multilingual text-to-speech (TTS) model proposed by Microsoft. Microsoft initially published only a research paper, without releasing any code or pre-trained models. But now we have an open-source implementation of VALL-E X that you can use to clone any voice on your computer.
Google’s Next-Gen Coding Platform
Google launched Project IDX, a browser-based coding environment in the cloud, boasting next-gen AI features like code translation and auto-completion. This isn't just about fast coding, but smart coding. While rivals like GitHub's Copilot have ventured into AI-augmented coding, Google's full-stack approach is unique.
September 2023
Microsoft's AI Smart Backpack
Microsoft has filed a patent for an AI-powered smart backpack that's more than just a storage space. Loaded with AI capabilities, the backpack is designed to see, hear, and interact with the environment around you, taking digital assistance to a whole new level.
Autonomously Driving into the Future
Wayve has introduced LINGO-1, an open-loop driving commentator that harnesses natural language in autonomous driving to enhance the interpretability and training of driving models. The team is taking inspiration from LLMs and combining language with vision and action data to create vision-language-action models (VLAMs).
3D Full-Head Synthesis in 360°
Meet PanoHead, the game-changer in the world of 3D human head creation. Unlike other 3D models, PanoHead delivers stunning, consistent views from all angles, not just front-facing ones. The secret sauce? A two-stage image alignment and a clever neural volume representation, trained on random real-world images.
Open-Source Falcon 180B
TII releases the open-source LLM Falcon 180B, a scaled-up version of the Falcon 40B model, incorporating innovations like multiquery attention to enhance scalability. It outperforms Llama 2 70B and OpenAI's GPT-3.5 on MMLU and is on par with Google's PaLM 2-Large on various evaluation benchmarks.
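To make the multiquery attention idea concrete, here is a minimal pure-Python sketch. It illustrates the general technique only and is not Falcon's actual implementation: the function names and toy shapes are assumptions, and masking and learned projections are left out. The point it shows is that every query head attends over one shared set of keys and values, so the key/value cache does not grow with the number of heads.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def dot(a, b):
    """Dot product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def multi_query_attention(queries, keys, values):
    """Toy multi-query attention (illustrative, no masking or projections).

    queries: [num_heads][seq_len][d]  one set of queries per head
    keys:    [seq_len][d]             a single key set shared by all heads
    values:  [seq_len][d_v]           a single value set shared by all heads
    Returns: [num_heads][seq_len][d_v]
    """
    scale = 1.0 / math.sqrt(len(keys[0]))
    d_v = len(values[0])
    outputs = []
    for head_queries in queries:          # each head has its own queries...
        head_out = []
        for q in head_queries:
            # ...but scores are always computed against the shared keys
            weights = softmax([dot(q, k) * scale for k in keys])
            head_out.append(
                [sum(w * v[j] for w, v in zip(weights, values)) for j in range(d_v)]
            )
        outputs.append(head_out)
    return outputs


# Two query heads, two positions, one shared key/value set.
queries = [[[1.0, 0.0], [0.0, 1.0]], [[0.5, 0.5], [1.0, 1.0]]]
keys = [[0.0, 0.0], [0.0, 0.0]]
values = [[1.0, 0.0], [0.0, 1.0]]
result = multi_query_attention(queries, keys, values)
```

In standard multi-head attention each head would carry its own keys and values; sharing them is what reduces memory use during generation.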
October 2023
DALL·E 3 Now in ChatGPT
DALL·E 3 is now accessible to ChatGPT Plus and Enterprise users. OpenAI has developed a safety mitigation stack for DALL·E 3, ensuring it is ready for a wider release. They have further integrated a multi-tiered safety system to prevent the generation of potentially harmful or inappropriate content.
Pegasus-1 Transforms Video-Understanding
Twelve Labs introduces Pegasus-1, a cutting-edge video-language foundation model, alongside a suite of Video-to-Text APIs, capable of video summarizing, custom text generation, and video analysis. With 80B parameters, the model exhibits a significant 61% performance improvement, outperforming existing models.
ChatGPT to Make Chat (Ro)Bot
Boston Dynamics turned its four-legged robot, Spot, into a chatbot to serve as a talking tour guide in its facility, using ChatGPT along with VQA, speech-to-text API (OpenAI’s Whisper) and text-to-speech API (Eleven Labs) for interactive capabilities. In a video posted by Boston Dynamics, we can see the robot effortlessly engaging with audiences, giving witty and informative responses.
November 2023
xAI’s Grok enters the LLM game
Introducing Grok by xAI, the new contender in the arena of language models, bringing a blend of humor, knowledge, and a rebellious streak. Developed in a record time of four months, Grok-1 not only matches wits with established models like GPT-3.5 but is also hot on the heels of GPT-4.
Open-Source LLMs with 200k Context
01.AI, an emerging player in the field of AI, open-sourced its Yi series models, including Yi-6B and Yi-34B. These bilingual (English/Chinese) models were developed from scratch and boast a huge context window of 200k tokens.
OpenAI Releases GPTs and Assistants API
GPTs: OpenAI released GPTs, a simple way to create a tailored version of ChatGPT for your specific tasks without writing any code.
Assistants API: Assistants API enables the creation of agent-like applications with advanced features including a Code Interpreter for executing Python and data visualization, Retrieval for integrating external knowledge sources, and Function Calling for incorporating user-defined functions directly into assistant responses.
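Function Calling can be pictured as a small dispatch loop: the model emits a function name plus JSON arguments, and the application runs the matching function and hands the result back. The sketch below illustrates that pattern only. It does not call the real Assistants API, and `get_weather`, `TOOLS`, `dispatch`, and the JSON shape are hypothetical names chosen for this example.

```python
import json


def get_weather(city: str) -> str:
    """Hypothetical user-defined function the model is allowed to call."""
    return f"Sunny in {city}"


# Registry mapping function names to Python callables (illustrative).
TOOLS = {"get_weather": get_weather}


def dispatch(tool_call_json: str) -> str:
    """Run the function named in a model-style tool call and return JSON.

    Assumes the call looks like {"name": ..., "arguments": {...}};
    this shape is an assumption, not the exact API payload.
    """
    call = json.loads(tool_call_json)
    func = TOOLS.get(call["name"])
    if func is None:
        # Unknown names are reported back instead of raising.
        return json.dumps({"error": f"unknown function {call['name']}"})
    result = func(**call["arguments"])
    return json.dumps({"result": result})


reply = dispatch('{"name": "get_weather", "arguments": {"city": "Paris"}}')
```

In a real application, the returned JSON would be sent back to the assistant so it can fold the result into its response.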
Anthropic Unveils Claude 2.1
As OpenAI grappled with its challenges, Anthropic stepped up with the release of Claude 2.1, now available via API. This update brings substantial enhancements to enterprise applications, featuring a groundbreaking 200K token context window and a significant reduction in the generation of false information.
December 2023
Google Unveils Gemini: Multimodal AI Set to Rival GPT-4
Google has finally released Gemini, a multimodal AI model, meaning it can understand, operate across, and combine different types of information, including text, images, audio, video, and code. This AI model is not just an answer to OpenAI's GPT-4 but a leap forward in multimodal AI technology.
Midjourney's Creative Leap with v6
Midjourney has announced the alpha release of Midjourney v6, currently available on Discord. The v6 model improves significantly in how it follows prompts, even with longer inputs. This means you can expect a more precise translation of your textual descriptions into images. The model's understanding has been refined, resulting in more coherent and knowledge-informed outputs.
Apple's Solution for Running Larger LLMs
LLMs have immense computational and memory requirements, posing challenges for devices with limited DRAM capacity. Researchers at Apple have proposed a technique that addresses this challenge in deploying advanced AI models, especially on resource-constrained devices. The technique allows models up to 2x the size of the device's DRAM capacity to be run, widening the accessibility of sophisticated AI technologies.
Google's Multimodal Video Gen AI
Producing coherent and dynamic large motions in video generation has been a longstanding challenge. Google has released VideoPoet, which applies LLMs to an array of video generation tasks. This innovative model is capable of text-to-video, image-to-video, and video-to-audio conversions, along with advanced techniques like video stylization, inpainting, and outpainting.