#070 Special Feature - Now Apple is Leading the AI Revolution with 3D Avatars and Language Models.
Fresh & Hot curated AI happenings in one snack. Never miss a byte 🍔
This snack byte will take approx 3 minutes to consume.
AI BYTE # 📢: Now Apple is Leading the AI Revolution with 3D Avatars and Language Models
⭐ One of the most impressive achievements that Apple has recently announced is a new technique for generating animated 3D avatars from short monocular videos (i.e. videos taken from a single camera).
In simple terms, think of it as 3D avatars from video.
The technique, called HUGS (Human Gaussian Splats), automatically learns to separate the human from the background scene in a video and builds a realistic, animatable 3D model of the person.
This model can capture details like clothing and hair, and can be reposed and viewed from different angles.
The new technology lets people place a digital character, or “avatar,” into a new scene using just one video of the person and one of the place. Rendering runs at 60 frames per second, so the result looks smooth and realistic.
HUGS is up to 100 times faster than previous methods, and can produce photorealistic results after only 30 minutes of training on a typical gaming GPU.
The researchers demonstrate the potential applications of HUGS for virtual try-on, telepresence, and synthetic media.
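To give a flavor of the core idea, here is a minimal, hypothetical Python sketch. It is not Apple’s implementation and every name in it is invented; it only illustrates the concept of keeping the person and the scene as two separate sets of 3D Gaussians that can be re-posed and recombined before rendering.

```python
# Conceptual sketch only: HUGS learns real Gaussian parameters from video;
# here we just show why separating human and scene Gaussians is useful.
from dataclasses import dataclass
import numpy as np

@dataclass
class GaussianSet:
    centers: np.ndarray   # (N, 3) Gaussian positions
    colors:  np.ndarray   # (N, 3) RGB per Gaussian
    scales:  np.ndarray   # (N, 3) per-axis extents

def pose_human(human: GaussianSet, transform: np.ndarray) -> GaussianSet:
    """Re-pose the human Gaussians with a rigid transform (a stand-in for the
    skeleton-driven deformation a real system would learn)."""
    posed = human.centers @ transform[:3, :3].T + transform[:3, 3]
    return GaussianSet(posed, human.colors, human.scales)

def compose(human: GaussianSet, scene: GaussianSet) -> GaussianSet:
    """Drop the animatable human into any captured scene by concatenation;
    a splatting renderer would then draw the combined set to the screen."""
    return GaussianSet(
        np.vstack([human.centers, scene.centers]),
        np.vstack([human.colors, scene.colors]),
        np.vstack([human.scales, scene.scales]),
    )

# Usage: move the avatar half a metre forward and composite it into a new scene.
human = GaussianSet(np.random.rand(100, 3), np.random.rand(100, 3), np.full((100, 3), 0.01))
scene = GaussianSet(np.random.rand(500, 3), np.random.rand(500, 3), np.full((500, 3), 0.05))
step = np.eye(4); step[2, 3] = 0.5
frame = compose(pose_human(human, step), scene)
print(frame.centers.shape)   # (600, 3): one combined set, ready to render
```

Because the human and the scene are never baked together, the same avatar can be dropped into any captured background, which is what makes use cases like virtual try-on and telepresence plausible.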
Language Models on Devices
Another breakthrough that Apple has achieved is a new method for deploying large language models (LLMs) on devices with limited memory, such as the iPhone and iPad.
LLMs such as GPT-4, which contains hundreds of billions of parameters, are powerful AI systems that can understand and generate natural language. Running models of that scale on consumer devices is challenging, however, because they require far more memory and computation than those devices typically have.
The proposed system minimizes data transfer from flash storage into scarce DRAM during inference.
Their method starts from an inference cost model aligned with how flash memory behaves, which points to two optimizations: transfer less data from flash, and read it in larger, more contiguous chunks.
The system uses two main techniques to get there: “windowing”, which keeps the parameters of recently activated neurons in memory and reuses them across nearby tokens, and “row-column bundling”, which stores related rows and columns together so each flash read fetches a larger contiguous block.
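To make the two ideas concrete, here is a rough, hypothetical Python sketch. It is not Apple’s code: the data layout, sizes, and names are invented, and a real system would read from actual flash storage rather than an in-memory dictionary.

```python
# Toy illustration of "windowing" and "row-column bundling" for a sparse FFN.
import numpy as np

D_MODEL, D_FF, WINDOW = 8, 32, 4     # toy sizes; real models are vastly larger

# Simulated flash storage: one bundled record per FFN neuron, so a single read
# fetches both its up-projection row and its down-projection column.
flash = {i: {"up_row": np.random.randn(D_MODEL),
             "down_col": np.random.randn(D_MODEL)} for i in range(D_FF)}

dram_cache = {}        # neuron id -> bundled weights currently resident in DRAM
recent_active = []     # sliding window of active-neuron sets, newest last

def load_neurons(active_ids):
    """Windowing: fetch only the neurons not already cached, and evict those
    that have not been active within the last WINDOW steps."""
    for nid in active_ids:
        if nid not in dram_cache:
            dram_cache[nid] = flash[nid]      # one contiguous bundled read
    recent_active.append(set(active_ids))
    if len(recent_active) > WINDOW:
        recent_active.pop(0)
    keep = set().union(*recent_active)
    for nid in list(dram_cache):
        if nid not in keep:
            del dram_cache[nid]

def sparse_ffn(x, active_ids):
    """Feed-forward pass that touches only the predicted-active neurons."""
    load_neurons(active_ids)
    out = np.zeros(D_MODEL)
    for nid in active_ids:
        w = dram_cache[nid]
        out += max(float(x @ w["up_row"]), 0.0) * w["down_col"]  # ReLU sparsity
    return out

# Usage: consecutive tokens activate overlapping neuron sets, so most lookups
# hit DRAM and only the difference is read from flash.
x = np.random.randn(D_MODEL)
sparse_ffn(x, [1, 2, 3])
sparse_ffn(x, [2, 3, 4])   # neurons 2 and 3 are already cached; only 4 is read
```

The intuition is that consecutive tokens tend to activate many of the same neurons, so most lookups hit the DRAM cache, and each miss is a single larger read instead of many small ones.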
On an Apple M1 Max CPU, these methods improve inference latency by 4-5x compared to naive loading. On a GPU, the speedup reaches 20-25x.
This breakthrough is particularly crucial for deploying advanced LLMs in resource-limited environments, thereby expanding their applicability and accessibility.
These optimizations could soon allow complex AI assistants and chatbots to run smoothly on iPhones, iPads, and other mobile devices.
The Future of AI is Apple
As Apple potentially integrates these innovations into its product lineup, it’s clear that the company is not just enhancing its devices but also anticipating the future needs of AI-infused services.
By allowing more complex AI models to run on devices with limited memory, Apple is potentially setting the stage for a new class of applications and services that leverage the power of LLMs in a way that was previously unfeasible.
Furthermore, by publishing this research, Apple is contributing to the broader AI community, which could stimulate further advances in the field. It’s a move that reflects Apple’s confidence in its position as a tech leader and its commitment to pushing the boundaries of what’s possible.
If applied judiciously, Apple’s latest innovations could take AI to the next level (literally in your hands).
Photorealistic digital avatars and powerful AI assistants on portable devices once seemed far off — but thanks to Apple’s scientists, the future is rapidly becoming reality.