Open Source AI - Is it Really Open?

Fresh & Hot curated AI happenings in one snack. Never miss a byte 🍔

Nov 18, 2024


This snack byte will take approx 4 minutes to consume.

The software industry has two faces. On one side are the dazzling applications and services that generate billions for the big players most people recognize: Apple, Microsoft, Google.

On the other side lies the backbone of the digital universe: open-source software, a realm where developers release the raw code behind products for free, enabling anyone to tweak, modify, and even profit from it.

This open-source movement, though rooted in the hippie-enthused idealism of the 1980s, is anything but quaint; it powers Google’s Android, Apple’s iOS, and most modern web browsers, and has become a billion-dollar ecosystem in itself.

Meta and Llama: Big Tech Goes “Open Source” (Sort of)

Meta’s decision to open-source Llama, its large language model (LLM), is a landmark move. The goal?

Tap into the open-source tradition and, perhaps, recruit the global hobbyists and small firms eager to innovate. If you’ve got the model’s “weights” (the billions of learned numerical parameters that encode what Llama knows), you can experiment with it and adapt it. There is a catch: under Llama’s license, any company whose products have more than 700 million monthly active users must request a separate license from Meta. The cap might sound arbitrary, but it is Meta’s way of keeping its largest rivals from building on Llama for free.

Meta’s push also highlights the competitive stakes. While open-source pioneers believe in “total transparency,” Meta has chosen to release the model’s weights but not the training data or the full training code behind them. Why? Training an AI isn’t like building an app, where you can copy the lines, add some features, and call it your own. Each training run involves randomness, so even with the exact same data and code, each Llama 3 replica would turn out slightly different.
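For the curious, the point about training runs never turning out quite the same can be seen in miniature. The sketch below is a toy, not anything resembling Llama's actual training: it fits a two-parameter line with plain stochastic gradient descent, and the function name `train`, the learning rate, and the dataset are all made up for illustration. The data and code are identical on every run; only the random seed changes, and it controls the starting weights and the order in which examples are seen.

```python
import random


def train(seed, epochs=20, lr=0.05):
    """Fit y = w*x + b with plain SGD (hypothetical toy example).

    `seed` controls the random starting weights and the data order.
    """
    rng = random.Random(seed)
    # Identical training data on every run: ten points on the line y = 2x + 1.
    data = [(i / 10, 2 * (i / 10) + 1) for i in range(10)]
    # Random starting weights, as in real neural network training.
    w, b = rng.uniform(-1, 1), rng.uniform(-1, 1)
    for _ in range(epochs):
        rng.shuffle(data)  # examples arrive in a random order each epoch
        for x, y in data:
            err = (w * x + b) - y  # prediction error on this example
            w -= lr * err * x      # gradient step for the slope
            b -= lr * err          # gradient step for the intercept
    return w, b


if __name__ == "__main__":
    print("seed 0:", train(seed=0))
    print("seed 1:", train(seed=1))
    print("seed 0 again:", train(seed=0))
```

With the seed fixed, this toy reproduces itself exactly. Change the seed and the final weights drift apart, even though nothing else differs. At the scale of a real LLM, with thousands of GPUs and billions of parameters, even pinning the randomness down is hard, which is part of why sharing weights is not the same as sharing a recipe.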

Why Did Meta Open-Source Llama?

Crowdsourcing Innovation: By allowing developers access, Meta hopes to crowdsource ideas and improvements, something the company simply couldn’t do in-house without hiring thousands more engineers.
Strategic Positioning Against Closed-Source Rivals: Companies like OpenAI and Google keep their models tightly guarded. But Meta, by offering open-source access, hopes to establish itself as a friendlier alternative, gathering support from independent developers and smaller firms.
Reputation & Influence: Open-source projects often enjoy a strong community-driven reputation, and with the current scrutiny on big tech, this move gives Meta a more collaborative, benevolent image in AI development.

Meta’s Challenges: Competing with Closed-Source Giants

Meta’s approach of offering a partially open-source model is bold, but it comes with challenges, especially given the high-stakes competition with companies that refuse to open-source. OpenAI, for instance, guards its models tightly, which helps maintain control over how the models are used, tested, and improved. Closed-source giants enjoy the benefits of direct monetization, as they can sell access to powerful models while keeping AI advancements (and profits) in-house.

While Meta’s open-source move garners goodwill, it remains to be seen whether this can translate into the competitive edge needed to keep up with OpenAI and Anthropic, which invest billions in proprietary development. The catch? Open-source models generally lag a few steps behind closed-source titans because of the vast resources the latter devote to constant improvement.

Several companies have followed the open-source path, though with varying levels of commitment:

Mistral: The French AI startup has released several LLMs with open weights. Some carry the permissive Apache 2.0 license, while others, like Meta’s, come with restrictions on commercial use.
Hugging Face: Known for its open-source AI repository, Hugging Face enables collaboration and offers tools for model training, hosting, and scaling.
Alibaba: The Chinese tech giant released models with usage constraints, positioning itself as a regional open-source leader.
Stability AI: Primarily known for its generative art models, Stability AI open-sources models to make it easier for companies to develop custom applications without the need for costly proprietary tools.

Why Not Fully Open Source?

The open-source concept itself is contested in the AI space. The traditional software definition of “open source” demands transparency and freedom to use, modify, and distribute the software without restrictions. But the AI landscape is different:

Astronomical Costs: Building an AI model from scratch with the capabilities of Llama or GPT-4 demands enormous resources—financially and computationally. Open-sourcing would mean letting competitors benefit from years of work without the associated costs.
Ethical and Safety Concerns: Full transparency risks making powerful AI accessible for malicious purposes. With unrestricted access, there’s a potential for harmful applications, from spreading misinformation to creating sophisticated cyber threats.
Profitability: OpenAI, Google, and others profit by selling access to closed APIs, which lets customers integrate powerful AI while the providers keep their intellectual property in-house. Meta’s challenge is to convince users that “open enough” still adds value.

Open Source vs. Closed Source: Who Comes Out on Top?

In the quest for dominance, open-source models face a delicate balancing act. Closed-source models allow companies to maintain tighter quality control, better security, and more consistent monetization pathways.

For open-source advocates, however, the long-term benefits lie in collaborative improvements, lower costs for developers, and quicker democratization of AI technology.

While Meta’s decision to open-source Llama reflects an ambitious approach to drive the industry toward openness, the future remains uncertain.

As more tech giants reconsider what open source can mean in the AI world, regulators may soon step in to define standards. One thing’s clear: AI’s future is being built on the tension between collaboration and control, openness and secrecy.

AI Snack Bytes
