unwind ai
Scaled-Down Llama 2 for Edge AI Applications

PLUS: AI vs Humans in Creativity, GPT-4V as a Web Agent

January 08, 2024

Today’s top AI Highlights:

LiteLlama: Reduced-Scale Llama 2
Can AI Be as Creative as Humans?
GPT-4V(ision) is a Generalist Web Agent, if Grounded
AI Tools for Role-playing Autonomous AI Agents, Productivity, Converting Web Content to Markdown, and Installing and Running AI Models in One Click

& so much more!

Read time: 3 mins

Latest Developments 🌍

Scaling Down Language Models to Scale Up ☄️

LiteLlama-460M-1T is a scaled-down, open-source reproduction of Meta’s Llama 2. It reflects the AI research community's growing interest in small yet high-performing models, especially for edge devices such as smartphones and IoT systems, where computational resources are limited.

Key Highlights:

LiteLlama-460M-1T has 460 million parameters and was trained on 1 trillion tokens. Although it is far smaller than mainstream large language models, it still performs well for its size.
The model was trained on the RedPajama dataset and uses GPT2Tokenizer for tokenization. This combination lets LiteLlama-460M-1T process and generate text effectively, making it suitable for a variety of text generation tasks.
On MMLU, LiteLlama-460M-1T has shown results comparable or even superior to those of other models of similar size. Its efficiency and small footprint make it a potentially good fit for environments with limited memory and compute. There's also curiosity in the AI community about whether it can run on systems with only 4GB of memory.
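The 4GB question can be roughed out with simple arithmetic. The sketch below only counts the memory needed to hold the weights, using the 460 million parameter figure above and the standard byte sizes of common numeric formats. It ignores activations, the KV cache, and runtime overhead, so treat it as a lower bound rather than a measurement. The function names are illustrative, not part of any library.

```python
# Rough weight-memory estimate for a 460M-parameter model.
# Assumption: only the weights are counted; activations, KV cache and
# framework overhead are ignored, so real usage will be higher.

PARAMS = 460_000_000  # parameter count stated for LiteLlama-460M-1T

# Bytes per parameter for common numeric formats (standard sizes).
BYTES_PER_PARAM = {"float32": 4, "float16": 2, "int8": 1}


def weight_memory_gb(params: int, bytes_per_param: int) -> float:
    """Return the weight memory in gigabytes (1 GB = 10**9 bytes)."""
    return params * bytes_per_param / 1e9


def fits_in(budget_gb: float, params: int = PARAMS) -> dict:
    """Map each format to True if the weights alone fit in the budget."""
    return {
        name: weight_memory_gb(params, size) <= budget_gb
        for name, size in BYTES_PER_PARAM.items()
    }


if __name__ == "__main__":
    for name, size in BYTES_PER_PARAM.items():
        print(f"{name}: {weight_memory_gb(PARAMS, size):.2f} GB")
    print(fits_in(4.0))
```

By this estimate the weights take roughly 1.84 GB in float32 and 0.92 GB in float16, so the weights alone leave headroom under 4GB. Whether a full inference stack fits still depends on the runtime and context length.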

How AI Compares with Human Imagination 🌌

Researchers from the National University of Singapore, Stanford, Google DeepMind, and Microsoft have proposed a novel approach to assessing creativity in artificial intelligence, challenging the long-standing belief that creativity is an exclusively human trait. The study introduces two concepts, Relative Creativity and Statistical Creativity, that give a new lens through which AI’s creative potential can be quantitatively measured and compared against human creativity. The approach draws inspiration from the Turing Test's methodology and marks a significant shift in how we understand and evaluate AI's capabilities in creative domains.

Key Highlights:

Relative Creativity evaluates AI's creative output by comparing it with that of a hypothetical yet realistic human creator. This approach determines an AI model's creativity based on whether its works are indistinguishable from those of the human creator. The study revealed that AI, when evaluated through this lens, can produce works that are challenging to distinguish from human-created content, indicating that AI models can reach a level of creativity comparable to humans.
Statistical Creativity quantifies AI's creative abilities by comparing its creations with those of actual human creators. It employs a distribution distance metric to evaluate whether an AI model can emulate the creative abilities of specific human groups, such as children or PhD researchers. In some cases, AI's creative outputs rivaled or even surpassed those of human groups, particularly in tasks where creativity was defined by specific, measurable criteria.
A significant practical outcome of the research is the Statistical Creative Loss function, which guides the training of AI models to enhance their creative capabilities. The application of this function in training sessions resulted in AI models that more effectively mimicked human-like creativity, as evidenced by the improved alignment of AI-generated outputs with the chosen human creative benchmarks.
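To make the "distribution distance" idea concrete, here is a minimal sketch. It is not the paper's metric or loss, which this summary does not spell out. As a stand-in it uses Jensen-Shannon divergence between two discrete distributions. The style labels and sample data are made up purely for illustration.

```python
import math
from collections import Counter


def histogram(samples, bins):
    """Turn a list of category labels into a probability vector over bins."""
    counts = Counter(samples)
    total = len(samples)
    return [counts.get(b, 0) / total for b in bins]


def kl_divergence(p, q):
    """KL divergence in bits; terms with p_i == 0 contribute nothing."""
    return sum(pi * math.log2(pi / qi) for pi, qi in zip(p, q) if pi > 0)


def js_divergence(p, q):
    """Jensen-Shannon divergence in bits: 0 for identical, 1 for disjoint."""
    m = [(pi + qi) / 2 for pi, qi in zip(p, q)]
    return 0.5 * kl_divergence(p, m) + 0.5 * kl_divergence(q, m)


if __name__ == "__main__":
    # Hypothetical style labels assigned to works by a human group and by a model.
    bins = ["abstract", "narrative", "humorous"]
    human = ["abstract", "narrative", "narrative", "humorous"]
    model = ["abstract", "abstract", "narrative", "humorous"]
    score = js_divergence(histogram(human, bins), histogram(model, bins))
    print(f"JS divergence: {score:.3f}")  # lower means closer to the human group
```

A lower score means the model's outputs are distributed more like the chosen human group's. A training loss in the spirit of the Statistical Creative Loss would push this kind of distance down, though the paper's actual formulation may differ.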

Multimodal Model Uses Web Like a Pro Human 😎

Large multimodal models like GPT-4V(ision) and Gemini are pushing AI capabilities beyond conventional tasks such as image captioning and visual question answering. GPT-4V in particular has shown notable promise as a web agent: it can follow natural language instructions to complete diverse tasks on any given website. Building on this, researchers have introduced SEEACT, a web agent that leverages GPT-4V for visual understanding and web interaction. GPT-4V shows great potential for web agents - it can successfully complete 50% of the tasks on live websites when its textual plans are manually grounded into actions!

Key Highlights:

GPT-4V(ision) reaches a 50% success rate on live websites. This is notably higher than text-only LLMs like GPT-4 and other specialized models such as FLAN-T5 and BLIP-2.
While GPT-4V demonstrates advanced capabilities, grounding (translating the model's textual plans into concrete actions on websites) remains a significant challenge. The most effective strategy developed combines HTML text with visual elements, but it still shows a considerable performance gap compared to ideal (oracle) grounding, highlighting a crucial area for future research and development.
The work also highlights the complex nature of web content, which often contains thousands of elements with intricate relationships. SEEACT's design focuses on integrated visual understanding and interaction with web content, demonstrating how large multimodal models (LMMs) like GPT-4V can be used as generalist web agents.
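The grounding step can be pictured with a toy example. The sketch below is not SEEACT's implementation: in SEEACT the model itself is involved in choosing the target element. This is a naive text-overlap heuristic that maps a textual plan to one of several candidate HTML elements. The Element class, the ground function, and the sample page are all hypothetical.

```python
import re
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Element:
    """A simplified candidate element extracted from a page's HTML."""
    tag: str
    text: str


def tokens(s: str) -> set:
    """Lowercase alphanumeric word tokens of a string."""
    return set(re.findall(r"[a-z0-9]+", s.lower()))


def ground(plan: str, candidates: List[Element]) -> Optional[Element]:
    """Pick the candidate whose text shares the most tokens with the plan.

    Returns None when no candidate shares any token, mirroring the case
    where a textual plan cannot be tied to a concrete element.
    """
    plan_tokens = tokens(plan)
    best, best_score = None, 0
    for el in candidates:
        score = len(plan_tokens & tokens(el.text))
        if score > best_score:
            best, best_score = el, score
    return best


if __name__ == "__main__":
    page = [
        Element("a", "Sign in"),
        Element("button", "Search flights"),
        Element("input", "Departure city"),
    ]
    target = ground("Click the 'Search flights' button", page)
    print(target)
```

Even this toy version hints at why grounding is hard. Real pages hold thousands of elements, many with identical or missing text, so matching on text alone quickly breaks down.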

Tools of the Trade ⚒️

CrewAI: A framework designed for orchestrating role-playing, autonomous AI agents to work collaboratively on complex tasks. It enables AI agents to assume roles, share goals, and operate in a cohesive unit, enhancing the efficiency of multi-agent interactions.
Brill AI: An AI-powered productivity platform designed to streamline workflows and amplify productivity for individuals and teams by automating tasks, providing smart task management, and integrating personal and work tools. It targets knowledge workers and aims to maximize efficiency and work-life balance.
Pinokio: A browser-based platform that allows you to easily install and control various applications with one-click scripts. These scripts include tools for image and video editing, Stable Diffusion UIs, and voice cloning, facilitating user-friendly access to complex software functions.
Clipper: A command-line tool to streamline the extraction of web content and its conversion to Markdown, making it highly useful for building markdown datasets for LLMs or integrating into RAG pipelines.

Hot Takes 🔥

AI is not exploited enough at product level. WhatsApp, the most used IM app in the world: where are summarization of chats? Local audio transcription? Urgent chat detection? Automatic translation? And the list could continue. ~ antirez
Most of my dreams are generated and rendered on GPU these days. ~ Bojan Tunguz

That’s all for today!

See you tomorrow with more such AI-filled content. Don’t forget to subscribe and give your feedback below 👇

Real-time AI Updates 🚨

⚡️ Follow me on Twitter @Saboo_Shubham for lightning-fast AI updates and never miss what’s trending!!

PS: I curate this AI newsletter every day for FREE, your support is what keeps me going. If you find value in what you read, share it with your friends by clicking the share button below!
