unwind ai
Chinese LLM beats GPT-4-Turbo

PLUS: xAI to raise $6B, OpenAI Sora & GPT-4 competitor from China

April 29, 2024

Today’s top AI Highlights:

Chinese AI companies release competitors to OpenAI Sora & GPT-4 Turbo
Benchmark AI models with live data from Chatbot Arena
xAI is raising $6 billion at a staggering $18 billion valuation
Canadian startup Extropic AI aims to build the ultimate AI chip by harnessing the physics of the world
Create videos with digital avatars that speak and express like humans

& so much more!

Read time: 3 mins

Latest Developments 🌍

China Challenges the Strongest in the Global AI Market 💪

China has been advancing its AI technology, fiercely competing with the best in the Valley. On one hand, SenseTime has released RiRiXin SenseNova 5.0, an LLM with a hybrid architecture that uses cloud computing for powerful processing and edge computing for quick responses. It beats the latest version of GPT-4 Turbo across all benchmarks with a significant 10% overall margin.

On the other hand, Shengshu Technology has introduced Vidu, a new and powerful text-to-video AI model competing with OpenAI’s Sora. The model can create high-definition videos from simple text prompts. While it may not quite match the level of detail seen in Sora, Vidu still boasts impressive capabilities in generating complex scenes.

SenseNova 5.0:

SenseNova 5.0 is trained on over 10T tokens of Chinese and English, including a large amount of synthetic data, and supports context windows of up to 200K tokens.
The model beats GPT-4 Turbo and Llama 3 70B Instruct by a huge margin across all benchmarks including MMLU, knowledge, language understanding and writing, coding, and math.
In terms of multi-modal capabilities, SenseNova 5.0 ranks first in MMBench, the authoritative comprehensive benchmark test for multi-modal large models, surpassing all models including Qwen by Alibaba, GPT-4V, and Gemini Pro.


Vidu:

Duration: Vidu can generate 16-second video clips in stunning 1080p resolution. This is shorter than Sora’s capability of producing 60-second videos.
Realistic and Imaginative: Vidu can create videos with complex scenes that adhere to real-world physics, including details like lighting, shadows, and detailed facial expressions. It can also create surreal and non-existent content with depth and complexity. However, Sora currently holds the edge in overall visual realism and detail.
Multi-Angle: The model utilizes a unique architecture that allows it to simulate the real world using multiple camera angles, giving the videos a dynamic and professional feel.

LLM Chatbot Benchmarks with Live Data 📈

New AI models are released almost daily now, each claiming state-of-the-art performance on the strength of its benchmark scores. But when put to use, the results often do not live up to expectations. This has sparked conversations about whether benchmark test sets have leaked and models are overfitting to them, showing impressive performance only on paper.

Establishing a reliable LLM benchmark that stays current and in line with human preferences is a challenge, and static benchmarks like MMLU are insufficient for evaluating these models. To address this, LMSys has introduced a new benchmark called Arena-Hard, which uses live data from the Chatbot Arena platform and a unique evaluation pipeline.

Key Highlights:

Dynamic Benchmarking: Unlike traditional benchmarks that become outdated, Arena-Hard utilizes live data from Chatbot Arena so that the benchmark continuously reflects current user interactions and preferences.
Cost-effective and efficient: At just $25 per model evaluation, it offers a more affordable evaluation method than existing benchmarks, using efficient data collection and processing techniques.
Separability and Agreement: Arena-Hard can differentiate between models based on their performance, with a separability score of 87.4%. This means it effectively identifies and ranks models from best to worst, aligning closely with human preferences at an 89.1% agreement rate.
Rigorous Evaluation: The benchmark uses advanced metrics like the Brier Score to assess the accuracy and confidence of model performances.
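
To make two of these metrics concrete, here is a minimal Python sketch. It assumes the textbook definition of the Brier Score (mean squared error between predicted probabilities and 0/1 outcomes) and treats separability as the share of model pairs whose confidence intervals do not overlap. The function names, the forecasts, and the interval values are all hypothetical illustrations, not Arena-Hard's actual pipeline or data.

```python
from itertools import combinations


def brier_score(forecasts, outcomes):
    """Mean squared error between predicted probabilities and 0/1 outcomes.

    Lower is better: 0.0 means every forecast was both confident and correct.
    """
    if not forecasts or len(forecasts) != len(outcomes):
        raise ValueError("forecasts and outcomes must be non-empty and equal length")
    return sum((p - o) ** 2 for p, o in zip(forecasts, outcomes)) / len(forecasts)


def separability(intervals):
    """Fraction of model pairs whose confidence intervals do not overlap.

    `intervals` maps a model name to a (low, high) confidence interval
    on its benchmark score.
    """
    pairs = list(combinations(intervals.values(), 2))
    if not pairs:
        raise ValueError("need at least two models")
    separated = sum(
        1
        for (low_a, high_a), (low_b, high_b) in pairs
        if high_a < low_b or high_b < low_a
    )
    return separated / len(pairs)


# Hypothetical forecasts: probability that model A beats model B in each matchup,
# paired with what human voters actually preferred (1 = A won, 0 = B won).
forecasts = [0.9, 0.6, 0.2]
outcomes = [1, 1, 0]
print("Brier score:", round(brier_score(forecasts, outcomes), 4))

# Hypothetical confidence intervals on benchmark scores for three models.
intervals = {
    "model_a": (70.0, 75.0),
    "model_b": (60.0, 65.0),
    "model_c": (63.0, 72.0),
}
print("Separability:", round(separability(intervals), 4))
```

In this toy example only model_a and model_b have non-overlapping intervals, so one pair out of three is separable. A benchmark with tighter intervals would separate more pairs.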

xAI Is Close to Securing a Hefty $6 Billion from Investors 🤑

Elon Musk’s new AI venture, xAI, is close to securing a massive $6 billion in funding at a pre-money valuation of $18 billion, meaning the company is valued at that figure before the new money even hits the account. Only ten months old and already stirring up the market, xAI is drawing heavyweight investors like Sequoia Capital and Future Ventures into its orbit.
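
A valuation quoted before the new money arrives is what investors call a pre-money valuation. The quick arithmetic below shows what that would imply, assuming the full $6 billion is new primary capital and ignoring any other dilution. Neither assumption is confirmed by the reporting, and the function names are just illustrative.

```python
def post_money_valuation(pre_money, new_investment):
    """Post-money valuation is the pre-money valuation plus the new capital."""
    return pre_money + new_investment


def new_investor_stake(pre_money, new_investment):
    """Share of the company held by the new investors after the round."""
    return new_investment / post_money_valuation(pre_money, new_investment)


pre_money = 18e9  # $18 billion, as reported
raised = 6e9      # $6 billion, as reported

post_money = post_money_valuation(pre_money, raised)
stake = new_investor_stake(pre_money, raised)

print(f"Post-money valuation: ${post_money / 1e9:.0f}B")  # prints "Post-money valuation: $24B"
print(f"New investors' stake: {stake:.0%}")               # prints "New investors' stake: 25%"
```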

The excitement is palpable: xAI initially planned to raise $3 billion, but the target was doubled to $6 billion as investors clamored to get into the deal.

Musk’s plan for xAI is ambitious: to integrate vast amounts of data from his array of businesses, including Tesla, SpaceX, and Neuralink, to connect the digital and physical worlds and make AI more capable. Premium subscribers on X have already begun using xAI’s chatbot, Grok, which would funnel more data into these companies.

Future of Generative AI Compute Might be Thermodynamic

The current computing landscape is struggling to keep up with the growing demands of AI development, particularly in terms of energy efficiency and speed. Extropic, a Canadian company, has stepped in with its innovative approach: thermodynamic computing. This new paradigm harnesses the physics of electron behavior at low temperatures to perform computations, which could be a faster and more energy-efficient alternative to traditional digital computing. Extropic is currently developing and testing its first thermodynamic devices, which will then be scaled to full stack hardware and software optimized for AI applications.

Key Highlights:

Distinct from Quantum and Digital: Thermodynamic computing represents a new category of computing, different from both quantum computing and the binary-based digital computing used in today’s computers.
Harnessing Electron Dynamics: This approach leverages the “jiggly” and noisy behavior of electrons at low temperatures to perform computations - a unique way to represent and process information.
Probabilistic Computing Power: Instead of relying on definitive “on” or “off” states like digital computers, thermodynamic computing utilizes a range of states to represent probabilities and uncertainties, aligning well with the nature of many real-world problems.
Energy Efficiency: By working at low temperatures and utilizing the inherent noise of electrons, thermodynamic computing can significantly reduce the energy consumption associated with AI processing.
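
To get a feel for what computing with probabilities rather than fixed “on” or “off” states can look like, here is a toy software simulation of a single noisy bit. It is only an illustration of the general idea of probabilistic bits: the sigmoid bias model and the function names are assumptions made for this sketch, and it says nothing about how Extropic's hardware actually works.

```python
import math
import random


def noisy_bit(bias, rng):
    """Sample one bit that is 1 with probability sigmoid(bias), else 0.

    A large positive bias makes the bit mostly 1, a large negative bias
    makes it mostly 0, and a bias of 0 makes it a fair coin flip.
    """
    p_one = 1.0 / (1.0 + math.exp(-bias))
    return 1 if rng.random() < p_one else 0


def estimate_probability(bias, samples, seed=0):
    """Estimate P(bit = 1) by averaging many noisy samples."""
    rng = random.Random(seed)
    return sum(noisy_bit(bias, rng) for _ in range(samples)) / samples


# The answer is read out as a frequency over many samples, not a single fixed state.
for bias in (-2.0, 0.0, 2.0):
    print(bias, estimate_probability(bias, samples=10_000))
```

Here a pseudo-random number generator stands in for the noise. The pitch of thermodynamic hardware is that physical noise would play that role directly.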

😍 Enjoying so far, share it with your friends!

Tools of the Trade ⚒️

Synthesia’s Expressive Avatars: AI-powered digital personas that can realistically deliver your script with proper expressions and sync. Previously, AI avatars could only deliver scripts with robotic voices and limited expressions, suitable for short content. With this, you can create natural and engaging videos with avatars that understand the context of your script, adjust their tone and body language accordingly, and even express emotions like a real actor would.

ChatLabs by Writingmate: Access and experiment with over 20 AI models in one place, including Gemini, GPT-4, Claude, Mistral and Llama models. You can leverage these models to perform various tasks such as writing different kinds of creative content, summarizing text, analyzing data, and answering your questions in an informative way.

Openlayer: Streamlines evaluation and testing of AI models through a straightforward workflow. It seamlessly integrates with your GitHub repository, triggering automatic tests with each commit to maintain consistent performance evaluation. It offers over 100 tests, accommodating various use cases and programming languages, and allows customizing the testing workflow.

Supertab: Monetize your AI applications more effectively. It lets you implement various pricing models beyond traditional subscriptions or advertising, such as charging based on query complexity, selling prompt packages, or offering timed access to your product.

Hot Takes 🔥

OK, now that I’m out, I can finally say this publicly: LOL, no, sorry, you are not catching up to NVIDIA any time this decade. ~Bojan Tunguz
As long as AI systems are trained to reproduce human-generated data (e.g. text) and have no search/planning/reasoning capability, performance will saturate below or around human level. ~Yann LeCun
The quality of the ChatGPT app is a direct indicator of how underwhelming GPT-5 will actually be. Ignore all the hype; look for tells that can't be gamed. ~Carlos E. Perez

Meme of the Day 🤡

I just checked into a hotel and, wow, they now provide guests with complimentary GPUs for LLM fine-tuning while you sleep!

That’s all for today! See you tomorrow with more such AI-filled content.

Real-time AI Updates 🚨

⚡️ Follow me on Twitter @Saboo_Shubham for lightning-fast AI updates and never miss what’s trending!

PS: I curate this AI newsletter every day for FREE, your support is what keeps me going. If you find value in what you read, share it with your friends by clicking the share button below!
