#105 Self-Improving Language Models: A New Frontier for AI
Fresh & Hot curated AI happenings in one snack. Never miss a byte 🍔
This snack byte will take approximately 3 minutes to consume.
AI BYTE #1 📢: Self-Improving Language Models: A New Frontier for AI
⭐ Language models are powerful tools that can perform a variety of natural language processing tasks, such as answering questions, generating text, and reasoning.
However, most language models require a lot of human supervision and labeled data to fine-tune their performance on specific domains or tasks.
What if we could train language models to self-improve without any external feedback, just like humans do?
In a research paper, Huang et al. (2022) proposed a novel method that enables Large Language Models (LLMs) to self-improve using only unlabeled datasets. Their method combines two ideas: Chain-of-Thought (CoT) prompting, which guides the language model to generate intermediate reasoning steps for a given question, and self-consistency, which selects the most reliable answer by sampling multiple reasoning paths and taking a majority vote.
By fine-tuning the language model on its own generated CoT reasoning paths, the authors showed that the model can improve its general reasoning ability and achieve state-of-the-art-level performance on several commonsense reasoning and natural language inference benchmarks, without any ground-truth labels.
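To make the self-consistency idea concrete, here is a minimal Python sketch. The `generate` callable is a hypothetical wrapper around an LLM that samples one reasoning path and final answer per call (with temperature > 0, so paths differ between calls); it is not part of the paper's code, and the "Let's think step by step" prompt is just a common zero-shot CoT trigger used for illustration.

```python
from collections import Counter

def self_consistency(generate, question, num_paths=8):
    """Pick the most consistent answer across sampled CoT reasoning paths.

    `generate` is a hypothetical callable wrapping an LLM: given a
    CoT-style prompt, it returns one (reasoning_path, final_answer)
    pair, sampled stochastically so repeated calls yield different paths.
    """
    answers = []
    for _ in range(num_paths):
        _, answer = generate(f"Q: {question}\nA: Let's think step by step.")
        answers.append(answer)
    # Majority vote: the answer supported by the most reasoning paths wins.
    most_common_answer, _ = Counter(answers).most_common(1)[0]
    return most_common_answer
```

The intuition behind the design is that an answer reached by many independent reasoning paths is more likely to be correct than one reached by a single path.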
The authors used a pre-trained LLM with 540 billion parameters, called PaLM-540B, as their base model. They applied their method to six datasets covering three types of reasoning tasks:
Arithmetic Reasoning,
Commonsense Reasoning, and
Natural Language Inference.
For each dataset, they used a few human-written CoT examples as few-shot prompts to generate multiple CoT reasoning paths and answers for each question in the training set.
They then kept only the reasoning paths that led to the most consistent answer, as determined by majority voting, and discarded the rest. They also augmented the retained reasoning paths with different formats of prompts and answers, such as standard prompting, CoT prompting, and zero-shot prompting.
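Putting the generation, voting, and filtering steps together, the self-training data construction might look like the following sketch. `generate` is the same hypothetical sampler as above, and the prompt/answer templates are illustrative stand-ins for the paper's exact formats.

```python
from collections import Counter

def build_self_training_set(generate, questions, num_paths=32):
    """Turn unlabeled questions into (prompt, target) pairs for fine-tuning.

    No ground-truth labels are used: the majority-voted answer across
    sampled reasoning paths acts as a pseudo-label for each question.
    """
    examples = []
    for q in questions:
        # Sample multiple (reasoning_path, final_answer) pairs per question.
        paths = [generate(f"Q: {q}\nA: Let's think step by step.")
                 for _ in range(num_paths)]
        majority = Counter(ans for _, ans in paths).most_common(1)[0][0]
        # Keep only the reasoning paths that support the majority answer.
        for reasoning, ans in paths:
            if ans == majority:
                # Mixed training formats: a CoT-style target that keeps the
                # reasoning, and a standard-style target with the answer only.
                examples.append((f"Q: {q}\nA:",
                                 f"{reasoning} The answer is {ans}."))
                examples.append((f"Q: {q}\nA:", f"The answer is {ans}."))
    return examples
```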
They fine-tuned the language model on this self-generated reasoning-answer data and called the resulting model Language Model Self-Improved (LMSI).
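The fine-tuning step itself is ordinary supervised training on the pseudo-labeled pairs. Since the PaLM checkpoints are not publicly available, the sketch below uses the Hugging Face Transformers API with a placeholder model name; `examples` is assumed to come from the data-construction sketch above.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

MODEL_NAME = "your-causal-lm"  # placeholder: PaLM itself is not released
tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME)
model = AutoModelForCausalLM.from_pretrained(MODEL_NAME)
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-5)

model.train()
for prompt, target in examples:
    # Standard causal-LM objective on the concatenated prompt + target.
    batch = tokenizer(prompt + " " + target, return_tensors="pt")
    loss = model(**batch, labels=batch["input_ids"]).loss
    loss.backward()
    optimizer.step()
    optimizer.zero_grad()
```

A production run would batch and pad the examples, and typically mask the prompt tokens out of the loss, but the objective is the same: maximize the likelihood of the model's own majority-consistent reasoning and answers.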
The authors evaluated their method on both in-domain and out-of-domain tasks. They compared the performance of LMSI with the pre-trained PaLM-540B model, using three different prompting methods: standard prompting, CoT prompting, and self-consistency.
They found that LMSI outperforms PaLM-540B on all six in-domain datasets, with significant improvements ranging from 1.1% to 7.7%.
They also found that LMSI achieves new state-of-the-art results on commonsense reasoning and natural language inference, surpassing previous methods that use supervised data or diverse prompts.
The authors also conducted ablation studies and explored additional approaches for self-improvement. They showed that training with CoT formats is crucial for the success of their method, and that using self-generated questions or prompts can further reduce human effort.
They also showed that the knowledge learned by LMSI can be distilled into smaller models, such as PaLM-8B and PaLM-62B, and that these distilled models can outperform larger models that have not been self-improved.
The paper demonstrates that LLMs can self-improve using their own reasoning abilities, without relying on human supervision or external feedback.
This is a very promising direction for the autonomous evolution of LLMs, and it could lead to more versatile and robust AI systems.