AI Chatbot that maps Human Emotions 💐

PLUS: OpenAI releases next-gen Voice Engine, OpenAI & Microsoft to build $100B supercomputer

Today’s top AI Highlights:

  1. Empathetic AI chatbot that changes its tone and style based on human emotions

  2. Microsoft and OpenAI join hands to build $100B supercomputer

  3. Google introduces ObjectDrop for realistic photo-editing with AI

  4. A voice-driven AI text editor that writes what you mean

& so much more!

Read time: 3 mins

Latest Developments 🌍

SPEAK Your Heart with EVI Empathetic Chatbot 🎙️

LLMs with emotional intelligence are not a new concept. Inflection AI attempted something similar with its empathetic chatbot, Pi. But does text-based chatting hold up against the experience of speaking to a human, especially when you want to share a range of emotions?

Hume AI has introduced EVI, a new conversational AI designed with emotional intelligence that you can speak to just like you would to a human confidant. It goes beyond words to understand emotions. Our voice carries a wealth of emotional information through nuances in tone, rhythm, and timbre, and EVI can recognize these subtleties in real time. It speaks, pauses, stutters, and laughs just like a human!

Key Highlights:

  1. Human-Like Interaction: EVI responds with human-like tones, mirroring your emotional states. This is based on advanced detection of voice modulations, enabling it to adapt its responses in real-time.

  2. Advanced Conversational Features: It can detect end-of-turn to understand the conversational flow, avoiding awkward pauses or overlaps. It can also pause when interrupted and then pick up where it left off.

  3. Continuous Learning for Satisfaction: It can learn from users’ reactions over time. It self-improves, optimizing its responses to increase user satisfaction.

  4. Intelligence: It is powered by an empathetic LLM.

  5. Real-Time Interpretation: EVI can interpret and express a wide range of human emotions, from amusement to distress, by recognizing over 25 patterns of speech prosody and 28 kinds of vocal expressions.

  6. EVI’s API: A single API will include fast and reliable transcription, text-to-speech services, emotional expression measurement across hundreds of dimensions, and the ability to hook into any LLM.
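
Hume hasn’t published the full API spec in this announcement, so here’s only a rough sketch of what streaming a voice clip to an empathic chat API could look like in Python. The endpoint URL, message types, and field names are illustrative assumptions, not Hume’s actual interface.

```python
# Illustrative sketch only: the endpoint, message schema, and field names below
# are assumptions for demonstration, not Hume's published EVI API.
import asyncio
import base64
import json

import websockets  # pip install websockets


async def chat_with_evi(audio_path: str) -> None:
    # Hypothetical WebSocket endpoint for an empathic voice interface.
    url = "wss://api.example-evi.ai/v0/chat?api_key=YOUR_API_KEY"

    async with websockets.connect(url) as ws:
        # Send a short audio clip as a base64-encoded message.
        with open(audio_path, "rb") as f:
            await ws.send(json.dumps({
                "type": "audio_input",
                "data": base64.b64encode(f.read()).decode(),
            }))

        # Read back prosody/emotion scores and the synthesized spoken reply.
        async for raw in ws:
            msg = json.loads(raw)
            if msg.get("type") == "emotion_scores":
                print("Detected emotions:", msg["scores"])  # e.g. {"amusement": 0.61, ...}
            elif msg.get("type") == "assistant_audio":
                with open("reply.wav", "wb") as out:
                    out.write(base64.b64decode(msg["data"]))
                break


if __name__ == "__main__":
    asyncio.run(chat_with_evi("hello.wav"))
```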

OpenAI Teases Again with a New Voice Cloning Model

OpenAI has been releasing demos of its text-to-video model Sora, and we’re eagerly waiting for it to become publicly available. But before the anticipation could settle, OpenAI has announced another new model that, once again, isn’t available for use. The company has developed Voice Engine, an AI model for voice cloning that uses a 15-second audio sample and text input to clone a voice almost perfectly.

Key Highlights:

  1. Development and Testing: Voice Engine was first developed in late 2022. It is being tested with “trusted partners” for applications like reading assistance for non-readers and children, content translation, and improving essential service delivery in remote settings.

  2. Training and Data Use: The model is trained on a mix of licensed and publicly available data, with details on the training data closely guarded given the ramifications of copyright issues.

  3. Editing: Voice Engine currently doesn’t allow editing the generated output. There are no options for adjusting the tone, pitch, or cadence of the voice.

  4. Pricing: Voice Engine will cost $15 per 1 million characters. That is quite cheap compared to the current best in the industry, ElevenLabs, which charges $11 for 100,000 characters per month but also provides editing features. (Source)
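
To put that pricing gap in perspective, here’s the quick per-character math using the figures quoted above (the ElevenLabs number assumes you use the full monthly quota):

```python
# Rough per-character cost comparison based on the quoted figures.
voice_engine_per_char = 15 / 1_000_000   # $15 per 1M characters
eleven_labs_per_char = 11 / 100_000      # $11 per 100k characters (monthly plan)

print(f"Voice Engine: ${voice_engine_per_char:.6f}/char")  # $0.000015/char
print(f"ElevenLabs:   ${eleven_labs_per_char:.6f}/char")   # $0.000110/char
print(f"~{eleven_labs_per_char / voice_engine_per_char:.1f}x cheaper per character")
```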

Below is an example of a translation from the HeyGen platform, which uses OpenAI’s Voice Engine model.

Reference Audio:

Generated Audio in German:

Microsoft 🤝 OpenAI for a $100 Billion Supercomputer

Microsoft and OpenAI are joining forces on a very ambitious project: building a $100 billion supercomputer called Stargate. Slated for launch by 2028, this project is part of a broader five-phase plan to erect a series of supercomputer installations over the next six years, to push the boundaries of what can be done with AI. This supercomputer will probably be 100x more expensive than the largest data centers currently in operation.


The US-based Stargate supercomputer will utilize millions of specialized server chips and could demand several gigawatts of power, possibly harnessing alternative energy sources such as nuclear power. Given the scale of the project and the shortage of specialized chips, a market where NVIDIA is the dominant player, OpenAI is also reportedly planning to build its own chips.
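
Neither company has shared power numbers, but a quick back-of-envelope estimate shows why “several gigawatts” is plausible. The chip count, per-chip draw, and overhead below are assumptions for illustration, not reported figures:

```python
# Back-of-envelope power estimate; every number here is an assumption.
num_accelerators = 2_000_000   # "millions of specialized server chips"
watts_per_chip = 1_000         # ~1 kW per accelerator incl. board overhead
pue = 1.3                      # typical data-center power usage effectiveness

total_gw = num_accelerators * watts_per_chip * pue / 1e9
print(f"~{total_gw:.1f} GW")   # ~2.6 GW, i.e. "several gigawatts"
```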

The chip dilemma is just one of several details that still need to be ironed out for Stargate. The project faces other logistical hurdles and technical challenges related to maximizing GPU efficiency and cooling.

Will OpenAI and Microsoft be able to beat NVIDIA in the AI chip war? Let us know your thoughts in the comments! 👇

A Simple Dataset Shift for Realistic Photo Editing 🪄

Diffusion models are the most widely used models for image generation and editing. But they struggle to respect basic physical effects such as occlusions, shadows, and reflections. To make editing more realistic and practical, Google has introduced ObjectDrop, which involves creating a “counterfactual” dataset. This dataset, capturing scenes before and after object removal, allows for fine-tuning a diffusion model for precise object removal and insertion with realism in edited images.

Key Highlights:

  1. Object Removal: The model eliminates objects and their effects from images. Despite being trained on a relatively small captured counterfactual dataset, the model generalizes well to diverse scenarios.

  2. Inserting an Object: By training first on a large synthetic dataset created with the object removal model, and then on a high-quality dataset, the object insertion model can accurately model how an object affects its environment.

  3. Moving the Object: The model can also seamlessly move objects within an image. This involves removing them from their original position and re-inserting them elsewhere, resulting in realistic transformations.
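
ObjectDrop itself isn’t publicly available, but the counterfactual-pairs idea is easy to picture: photograph a scene with an object, remove the object, photograph it again, and fine-tune a diffusion model on those pairs. Here’s a minimal sketch of how such data might be organized; the file layout and conditioning scheme are assumptions, not details from Google’s paper.

```python
# Sketch of a "counterfactual" paired dataset: the same scene with and without
# an object, plus a mask of the object. Layout and fields are assumptions.
from pathlib import Path

from PIL import Image
from torch.utils.data import Dataset
from torchvision import transforms


class CounterfactualPairs(Dataset):
    """(image with object + object mask) -> same scene with the object removed."""

    def __init__(self, root: str, size: int = 512):
        self.with_obj = sorted(Path(root, "with_object").glob("*.jpg"))
        self.without_obj = sorted(Path(root, "without_object").glob("*.jpg"))
        self.masks = sorted(Path(root, "masks").glob("*.png"))
        self.tf = transforms.Compose(
            [transforms.Resize((size, size)), transforms.ToTensor()]
        )

    def __len__(self) -> int:
        return len(self.with_obj)

    def __getitem__(self, i: int):
        cond = self.tf(Image.open(self.with_obj[i]).convert("RGB"))
        mask = self.tf(Image.open(self.masks[i]).convert("L"))
        target = self.tf(Image.open(self.without_obj[i]).convert("RGB"))
        # A diffusion model is then fine-tuned to reconstruct `target` given
        # `cond` and `mask`, so it learns to erase the object *and* its shadows
        # and reflections rather than just inpainting the masked pixels.
        return {"condition": cond, "mask": mask, "target": target}
```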

😍 Enjoying so far, share with your friends!

Tools of the Trade ⚒️

  1. Glida AI: An AI-powered widget to enhance your customers’ interaction with your website. This widget looks like a human and sounds very realistic. It takes just a few clicks to create, can speak in multiple languages, and can handle custom customer journeys.

  2. Jan: Transform your computer into an AI machine by running AI models directly on your device or connecting to remote APIs. It is an open-source project and prioritizes local-first operation for privacy (see the sketch after this list).

  3. OpenDevin: An open-source project aiming to replicate Devin, an autonomous AI software engineer capable of executing complex engineering tasks and collaborating actively with users on software development projects.

  4. Aqua: A voice-native text editor that lets you talk instead of type, designed to make writing and editing by voice smooth and smart. It uses real-time audio analysis combined with an LLM to better understand user intent and facilitate writing and formatting without the need for typing.
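
As mentioned above, Jan’s local-first setup means your apps can talk to a model running on your own machine through an OpenAI-compatible endpoint. A minimal sketch, assuming the default local port and a model you’ve already downloaded in the app (both are assumptions here, so check your Jan server settings):

```python
# Minimal sketch of calling a locally running model through an OpenAI-compatible
# endpoint such as the one Jan can expose. Port, path, and model name are
# assumptions; check the local API server settings in the app.
import requests

resp = requests.post(
    "http://localhost:1337/v1/chat/completions",
    json={
        "model": "mistral-ins-7b-q4",  # whichever model you downloaded locally
        "messages": [
            {"role": "user", "content": "Summarize today's AI news in one line."}
        ],
    },
    timeout=60,
)
print(resp.json()["choices"][0]["message"]["content"])
```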

Hot Takes 🔥

  1. OpenAI and MSFT want to build Stargate - a $100B GPU super cluster! Great! It’s time for Google to announce their $500B super cluster and Amazon to double down as well and start talking about their $300B cluster! They need to keep up with the Joneses ~Bindu Reddy

  2. In NYC you’re not rich enough. In LA you’re not hot enough. In SF you’re not autistic enough. ~Gabe

Meme of the Day 🤡

Baboozled

That’s all for today! See you tomorrow with more such AI-filled content.

Real-time AI Updates 🚨

⚡️ Follow me on Twitter @Saboo_Shubham for lightning-fast AI updates and never miss what’s trending!

PS: I curate this AI newsletter every day for FREE, your support is what keeps me going. If you find value in what you read, share it with your friends by clicking the share button below!
