- unwind ai
- Posts
- Gemini Pro Now Available as API 🖥️
Gemini Pro Now Available as API 🖥️
PLUS: Stability AI's 3D Generation Model, Fully Transparent Open-Source LLMs, Claude for Google Sheets
Today’s top AI Highlights:
Stability AI Releases 3D Object Generation Model
Open-source LLMs with Trule Open and Transparent Details
Google Releases Gemini Pro API, Imagen 2 in Vertex AI, and MedLM
OpenAI Partners with Another Publication House
Claude for Google Sheets
& so much more!
Read time: 3 mins
Latest Developments 🌍
Quality 3D Object Generation from Single Images 🔄
Stability AI has released Stable Zero123 transforming 3D understanding of an object’s appearance from various angles. It significantly enhances the quality and efficiency of creating novel 3D views from single images. Stable Zero123 outperforms its predecessors, Zero1-to-3 and Zero123-XL, in generating detailed 3D views of objects.
Key Highlights:
The leap in quality is attributed to an improved training dataset sourced from Objaverse, which focuses on high-quality 3D objects. Additionally, the model was introduced with an estimated camera angle, allowing the model to make more informed predictions, further enhancing the realism of the generated 3D objects.
For creating a single view of an object, it uses the same amount of VRAM as Stable Diffusion 1.5, making it efficient for simpler tasks. However, for generating full 3D objects, which involves creating multiple views and understanding the object in 3 dimensions, the model is more resource-intensive (24GB VRAM recommended).
Stable Zero123 is now integrated into threestudio. For those looking to transform text into 3D models, the process begins with generating a single image using a model like SDXL, and then Stable Zero123 takes over to craft the final 3D object.
Becoming Truly Open-Source 📖
LLM 360, a collaboration between Petuum, MBZUAI and Cerebras, is an initiative for fully open-sourcing LLMs. While there has been a surge in the number of open-source models being released like Mistral, Llama 2, and more, but are they truly transparent? LLM360 advocates for all training code and data, model checkpoints, and intermediate results to be made available.
To start with, the team has released two 7B parameter LLMs pre-trained from scratch, AMBER and CRYSTALCODER.
Key Highlights:
AMBER, an English language LLM, was pre-trained on a dataset comprising RefinedWeb, StarCoder, and RedPajama-v1, totaling 1.26 trillion tokens. CRYSTALCODER, aimed at English and code processing, was trained on a mix of SlimPajama and StarCoder data, with around 1.382 trillion tokens.
Both models employ an architecture similar to LLaMA 7B. AMBER features 6.7 billion parameters, a hidden size of 4096, and 32 attention heads. CRYSTALCODER includes modifications like maximal update parameterization and specific design choices for processing both language and code.
AMBER and CRYSTALCODER were released with extensive sets of model checkpoints (360 for AMBER). While AMBER outperforms Open Llama, MPT, and Falcon models and closely follows Llama 2, CRYSTALCODER also performs excellently in both language and coding tasks.
Start Building with Gemini Pro Now 👩💻
In a series of announcements, Google has released the Gemini API on Google’s AI Studio and Vertex AI. Google Cloud’s image-generation capabilities have been supercharged with the Imagen-2, Google’s most advanced text-to-image technology, now generally available on Vertex AI. Google has further released MedLM — a family of specialist foundation models fine-tuned for healthcare.
Key Highlights:
Gemini Pro API has been released for developer and enterprise use on Vertex AI and AI Studio. Standing out for its impressive 32k context window and support for 38 languages, it features robust capabilities like function calling, embeddings, semantic retrieval, custom knowledge grounding, and chat functionality. Currently free within certain limits, Gemini Pro also offers a dedicated vision multimodal endpoint for text and imagery inputs.
Imagen 2 is now generally available for Vertex AI customers. It excels in generating high-quality, photorealistic images from text prompts and supports text rendering in multiple languages, making it suitable for accurate text overlays in images. Additional features include logo generation, visual question answering, and caption generation from images.
MedLM, a family of foundation models specifically fine-tuned for healthcare applications, is based on the Med-PaLM 2 model and comes in two versions: a larger model for complex tasks and a medium model for scalability. Its applications range from improving ambient medical documentation in hospitals to enhancing the speed and quality of pre-clinical drug research and development.
Quick Updates from OpenAI 🤌
OpenAI has partnered with Axel Springer, a leading publishing house with POLITICO, BUSINESS INSIDER, and European properties BILD and WELT under its umbrella, to integrate ChatGPT with their recent and authoritative content on a wide variety of topics. ChatGPT users around the world will receive summaries of their selected global news content. OpenAI will further be able to train their LLMs from the publication’s content. This is OpenAI’s second such arrangement after Associate Press.
OpenAI's Startup Fund Team has launched Converge 2, a six-week program that offers tech talks, mentorship, and a $1 million investment from the OpenAI Startup Fund, focusing on builders using AI to innovate across various domains. It is open to founders of all backgrounds. Register open till January 26, 2024.
Tools of the Trade ⚒️
Claude for Sheets: Anthropic’s Claude can now be used in Google Sheets. Offers two functions: =CLAUDE() for general use and =CLAUDEFREE() for direct API calls. You can use it for text rewriting, translation, information extraction, and advanced Q&A, with features like in-sheet caching and API call control.
Music.ai: A comprehensive collection of state-of-the-art music APIs and AI audio solutions. It offers a range of modules for audio development, including stem separation, transcription, mixing, mastering, effects, and more. It provides tools for precise audio manipulation, AI-driven music creation, and analysis.
Coffee by Coframe: Build and iterate on your UI 10x faster with AI - right from your IDE. Coffee supports standard UI components and works with React codebases. This tool enables both the creation of new components and the editing of existing ones, facilitating the generation of clean, maintainable code.
Taskade: A productivity app that is not just a project management tool; it's an AI-powered workspace that integrates to-do lists, notes, mind maps, and more. It allows the creation and management of projects, tasks, boards, and calendars with AI and team chat support, and so much more.
😍 Enjoying so far, TWEET NOW to share with your friends!
Hot Takes 🔥
Strange to see VCs still funding LLMs when they should be funding mass-market robotics companies ~ Bindu Reddy
I asked a google engineer what it would take to build GPT-4 today. I will never forget his answer: "We can’t, we don’t know how to do it." ~ anton
Meme of the Day 🤡
That’s all for today!
See you tomorrow with more such AI-filled content. Don’t forget to subscribe and give your feedback below 👇
Real-time AI Updates 🚨
⚡️ Follow me on Twitter @Saboo_Shubham for lightning-fast AI updates and never miss what’s trending!!
PS: I curate this AI newsletter every day for FREE, your support is what keeps me going. If you find value in what you read, share it with your friends by clicking the share button below!
Reply