Responsible Use: How Meta Responds to One of the Central Fears of Open Source AI

Blog Post
[Image: two developers collaborating on AI code in a bright office, one pointing at a line of code on a large monitor. Generated by OpenAI's DALL-E 3, 2023.]
Nov. 6, 2023

Last Monday, the White House released an Executive Order that instructs the government to play a larger role in advancing “responsible global technical standards for AI development.” The very next day, Mozilla posted an open letter signed by dozens of technologists, researchers, and others in the artificial intelligence (AI) field. This “Joint Statement on AI Safety and Openness” argues that in order “to mitigate current and future harms from AI systems, we need to embrace openness, transparency, and broad access.” This letter is the latest in an ongoing debate about open versus closed architectures for AI models, a topic increasingly central to discussions about securing AI, as well as those about regulating it.

Open source AI models might pose different risks than their closed-platform counterparts, but those risks need to be weighed against the many potential benefits of using open models. So much of the innovation we will see in AI is a function of the insight and creativity that comes from mixing open source code with use cases its original developers did not imagine. And while open-sourced models are available for bad actors to abuse, popular models can build on one of open source’s greatest strengths: the possibility that they can be vetted and improved by a broader range of security experts and developers—or, as the open letter puts it, “the idea that tight and proprietary control of foundational AI models is the only path to protecting us from society-scale harm is naive at best, dangerous at worst.” As legislative bodies around the world—including in the United States—consider various approaches to AI governance, we need to make sure we’re being thoughtful in how we address concerns without limiting the potential of open source AI models.

To explore what it means to open source an AI model, and how these models might be used responsibly, we’ll dig into some of the technical aspects of Meta’s openly available large language model (LLM), known as LLaMA.

Meta’s Open Sourcing of LLaMA

An LLM is part of a subset of AI known as generative AI. Broadly speaking, generative AI refers to technologies that are trained on very large amounts of data and that use that background information to generate new, similar content. Generative AI can be used to make images, music, and—as in the case of ChatGPT and many other LLMs—text. In this case, open sourcing an LLM means making the language model itself available for download, along with software that developers can use to work with the model.

Building this sort of large general use model is expensive. Meta lists this among its reasons for releasing LLaMA, remarking that the “compute costs of pretraining LLMs remains prohibitively expensive for small organizations.” Meta also recognizes that the costs are not only monetary, and that more people training LLMs “would increase the carbon footprint of the sector.” And they have a point: According to Meta’s estimates, training version 2 of LLaMA used about as much electricity as it would take to power 70 homes for a year.

When LLaMA 2 was released earlier this year, Meta published an accompanying Responsible Use Guide. The guide offers developers using LLaMA 2 for their LLM-powered project “common approaches to building responsibly.” Reading the guide, one notices two things. First, Meta is placing a huge amount of trust in downstream developers to behave responsibly. And second, even developers with the best intentions may struggle to act responsibly. The Responsible Use Guide warns developers that they are making “decisions that shape the objectives and functionality” of their LLM project in ways that “can introduce potential risks.” It instructs them to take care to “examine each layer of a product” so that they can determine where those risks might arise.

Overall, while the Responsible Use Guide offers a good discussion of development considerations, it is also somewhat frustratingly unspecific. It could benefit from discussing hypothetical use cases as a way to explore how a team could approach these issues. Even though there is a deep technical discussion in the white paper, the Responsible Use Guide could be made more technical—or at least do more to point toward the technical documentation that might be relevant to responsible use.

Still, Meta’s guide implicitly demonstrates that it takes a great deal of forethought, planning, and testing to responsibly use AI. Even with all that effort, things might still go wrong. This is likely why the guide spends so much time on the need to develop mechanisms for users to report problems (e.g., a button to push when the AI generates troubling content), and to set up teams that react to those reports when they come in.
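To make that concrete, here is a minimal sketch of what such a reporting mechanism might look like behind a chat interface's "report this response" button. The field names, the JSONL log file, and the overall shape are illustrative assumptions, not something the Responsible Use Guide prescribes.

```python
# Minimal sketch of a user-report hook, assuming a chat UI with a "report" button.
# The fields and storage format are illustrative, not part of any LLaMA tooling.
import json
import time
from dataclasses import dataclass, asdict

@dataclass
class ContentReport:
    conversation_id: str   # which chat session the report refers to
    flagged_output: str    # the model response the user objected to
    reason: str            # free-text or category chosen by the user
    timestamp: float

def record_report(report: ContentReport, path: str = "reports.jsonl") -> None:
    """Append the report to a log that a review team can triage later."""
    with open(path, "a", encoding="utf-8") as f:
        f.write(json.dumps(asdict(report)) + "\n")

# Example: called when the user presses the "report this response" button.
record_report(ContentReport(
    conversation_id="abc123",
    flagged_output="<model response the user flagged>",
    reason="generated troubling content",
    timestamp=time.time(),
))
```

The storage details will vary by project; the part that matters is that reports land somewhere a team is actually set up to review.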

To better appreciate the complexity that goes into the decision making, it helps to understand some of LLaMA’s layered structure, how some of its layers interact, and how each layer can be fine-tuned in ways that may have effects on other layers. LLaMA’s base layer is its foundation model, a huge pretrained and pre-tuned, (mostly) English-language model. The model is remarkably large, with two trillion tokens (AI jargon for pieces of information about text) used in its training. Meta distributes three sizes of the model, each with a different parameter count (7B, 13B, 70B). In AI neural networks, parameters are the numbers that define the relationships between the data in the training set—for instance, weighting the strength of connection between tokens. A higher parameter count means the model can capture more connections between its tokens. Pretraining, as the guide puts it, is the process where “a model builds its understanding of the statistical patterns across the sample of human language contained in its training data.” The foundation model is what powers the AI’s ability to understand prompts and generate human-sounding content.
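As a point of reference, here is a minimal sketch of loading that foundation model and asking it to continue a prompt. It assumes the Hugging Face transformers library and the gated "meta-llama/Llama-2-7b-hf" checkpoint (downloading it requires accepting Meta's license terms); neither is mentioned in Meta's guide, and other toolchains work too.

```python
# Sketch: load the pretrained LLaMA 2 foundation model and generate a continuation.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "meta-llama/Llama-2-7b-hf"  # 7B-parameter base model; 13B and 70B also exist

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id)

# The base model is a plain next-token predictor: given a prompt, it continues the text.
prompt = "Open source AI models are"
inputs = tokenizer(prompt, return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=40)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```

Note that the foundation model simply continues text; the chat-style behavior people associate with LLMs comes from the further tuning discussed below.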

LLaMA allows developers to further pretrain their LLaMA instance with domain-specific information. For example, if a team of developers is building an AI to help prospective students compare information about different colleges, they might need to add a bunch of data about colleges. The foundation model is used to understand the questions and craft answers, but the answers are further shaped by all the extra data LLaMA is trained on. For this example, it’s college data, but it could be information about anything from music discographies to catalogs of auto parts.
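A rough sketch of what that further training might look like, using the Hugging Face datasets and transformers libraries: the college corpus file, the hyperparameters, and the output directory are all hypothetical, and in practice a team would likely use a parameter-efficient method such as LoRA rather than fully fine-tuning a 7B-parameter model.

```python
# Sketch: continued training of a LLaMA 2 instance on a hypothetical domain corpus.
from datasets import load_dataset
from transformers import (AutoModelForCausalLM, AutoTokenizer,
                          DataCollatorForLanguageModeling, Trainer, TrainingArguments)

model_id = "meta-llama/Llama-2-7b-hf"
tokenizer = AutoTokenizer.from_pretrained(model_id)
tokenizer.pad_token = tokenizer.eos_token  # LLaMA has no pad token by default
model = AutoModelForCausalLM.from_pretrained(model_id)

# Hypothetical corpus: one document per line about colleges, tuition, programs, etc.
dataset = load_dataset("text", data_files={"train": "college_corpus.txt"})

def tokenize(batch):
    return tokenizer(batch["text"], truncation=True, max_length=512)

tokenized = dataset.map(tokenize, batched=True, remove_columns=["text"])

trainer = Trainer(
    model=model,
    args=TrainingArguments(output_dir="llama2-college",
                           num_train_epochs=1,
                           per_device_train_batch_size=1),
    train_dataset=tokenized["train"],
    data_collator=DataCollatorForLanguageModeling(tokenizer, mlm=False),  # causal LM objective
)
trainer.train()
```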

After further training, developers can also write rules that prohibit certain types of questions from being asked and prevent the AI from giving certain kinds of answers. This can be done either via simple rules (e.g., instructing the model that a list of words is offensive), or by training the AI on other texts. To be sure, there are other places where developers will make tuning decisions, but the interaction of these three layers illustrates the complexity of fine-tuning AI.
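The "simple rules" approach can be as plain as a keyword filter wrapped around the model at both the prompt and response stages. The blocklist, refusal message, and function names below are placeholders; a production system would pair rules like these with trained safety classifiers.

```python
# Sketch: a rule layer applied before and after the underlying model is called.
BLOCKED_TERMS = {"example_offensive_term_1", "example_offensive_term_2"}  # placeholder list
REFUSAL = "Sorry, I can't help with that request."

def violates_policy(text: str) -> bool:
    lowered = text.lower()
    return any(term in lowered for term in BLOCKED_TERMS)

def guarded_generate(prompt: str, generate_fn) -> str:
    """Apply the rule layer on the way in and on the way out."""
    if violates_policy(prompt):          # input-side rule: reject the question
        return REFUSAL
    response = generate_fn(prompt)       # call into the fine-tuned LLaMA instance
    if violates_policy(response):        # output-side rule: suppress the answer
        return REFUSAL
    return response
```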

Returning to the earlier example of a team building an AI that compares college data for prospective students, let’s say this team has a deep commitment to making sure their chat bot cannot be coaxed into providing answers that describe or recommend violent behavior. The team could attempt to remove all violent language from the foundation model (because it’s open source, they can attempt to do that). However, if the foundation model doesn’t have any violent language, how will the AI process a later instruction to filter out violent questions and answers? In other words, if you over-tune the base model, there’s not enough contextual information for the AI to understand what violence is. Furthermore, filtering at those later stages is deeply context specific. For example, developers of a chatbot designed to help with HR functions would presumably exclude descriptions of body parts that would be completely appropriate for an AI chatbot designed to answer medical questions.

Improvements to the Guide

With its Responsible Use Guide, Meta is relying on development teams to not only envision the positive ways their AI system can be used, but to understand how it might be abused, attempt to abuse it themselves in the testing process, and mitigate that abuse. It is a lot to ask, but it is not too much. There are many examples of security flaws rooted in a lack of care or attention paid to security and abuse in the development process. The guide shows that Meta is thinking deeply about how LLaMA can be abused—but it should do more than ask developers to be attentive and thoughtful in the face of complexity; it should provide more specific guidance for developers as they set out to build with AI.

Creating an ecosystem of developers building AI for their niche use cases is not only one of the potential benefits of open source AI—it also points to an important element of responsible use. Meta’s guide makes only a brief, one-paragraph mention of defining use cases, but a well-understood use case may be one of the best tools available for developers interested in preventing the abuse of their AI projects. One of the risks of big LLMs is that, in trying to do (or know) too much, they become easier to trick or more likely to spew fabricated nonsense. By clearly scoping what the AI is doing, developers can shrink the amount of downstream misuse they have to consider, including privacy risks that could result from overbroad data collection. Returning to the college data example, if the AI is trained just on that data and is only supposed to answer questions about higher education, any prompt asking about anything else can be rejected with a “that’s not a college data question” response. There’s still plenty of room for mischief within prompts about higher ed, but the scope of the problem likely doesn’t include thinking through scenarios like the AI producing chlorine gas recipes. Thinking of all the things that could go wrong when using a model as powerful as LLaMA is largely impossible, but tightly defining a use case for the AI can meaningfully help to limit downstream harms.
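One lightweight way to enforce that kind of scoping is through the system prompt. The sketch below uses the prompt template the LLaMA 2 chat variants were trained with; the system prompt text and the refusal line are illustrative, and a real deployment would back them up with the filtering layers described earlier.

```python
# Sketch: scoping a LLaMA 2 chat deployment to a single use case via the system prompt.
SYSTEM_PROMPT = (
    "You are an assistant that only answers questions about colleges and universities, "
    "using the provided college dataset. If a question is not about higher education, "
    "reply exactly: \"That's not a college data question.\""
)

def build_chat_prompt(user_question: str) -> str:
    # [INST] / <<SYS>> markers follow the Llama 2 chat prompt format.
    return f"<s>[INST] <<SYS>>\n{SYSTEM_PROMPT}\n<</SYS>>\n\n{user_question} [/INST]"

print(build_chat_prompt("How do in-state tuition costs compare across public universities?"))
```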

Beyond LLaMA, limiting the abuse of tools built on open source AI models will look different than doing so on the large closed models like ChatGPT—but different need not mean worse. Open source models give developers options that are not possible on models trying to do everything for everyone. And it is in those places—AI doing specific things for some people—where we might see some of AI’s most profound positive impacts.
