In the AI Age, Data Literacy Should Be a Human Right

Blog Post
April 4, 2023

Unsettling conversations with artificial intelligence chatbots, such as Microsoft Bing’s Sydney, have triggered fresh anxieties about big tech’s “move fast and break things” strategy. The transformative impact of AI on the future of search technologies and other everyday applications is raising questions not only about the blurring of lines between humans and machines, but also about what to do when the next big thing comes.

The large language models that power Sydney, ChatGPT, and Google’s Bard have been hyped as opening new frontiers, but the “I” part of their AI is trained on astronomical amounts of data siphoned from various facets of real human experience and from users in the digital domain. As these technologies are brought to bear, we must grapple with the costs and benefits of the datafication of the human experience. As an added wrinkle, we are often doing so with a limited understanding of how these technologies work or what they do to produce the wild – and sometimes wacky – outputs we’ve seen to date.

Tech companies are leveraging massive collections of user data to churn out AI-fueled products, effectively unleashing them on the public, only to realize their potential harms later. For example, after New York Times tech columnist Kevin Roose wrote about his creepy encounter with the AI-powered Bing search engine, which runs on GPT-4, Microsoft briefly lobotomized its sci-fi villain chatbot, Sydney. But Microsoft is again experimenting with how users engage with Sydney, trying to strike a balance between startling, aimless chats and utility.

These recent developments emphasize the need for literacy as we embark upon the AI revolution. In my forthcoming book, We, the Data, I propose that one way to help users is to declare a human right to data literacy. Data literacy can be defined as “the ability to read, work with, analyze, and argue with data.” It is a collection of skills that helps people navigate the digital age. We know literacy helps spread knowledge. Literacy is typically associated with the written word, and perhaps numeracy, but it encompasses more than these skills: it denotes competence and the capacity to function in society. Without the ability to read, write, or perform basic math, we lack the skills needed to navigate everyday decisions. In today’s world, where digital technologies are pervasive, data literacy is as essential as traditional literacy.

Data literacy is not data science; we do not all need to become experts in managing and manipulating data. However, we do need to understand how data are created, and the implications of the datafication of our lives. We need to learn that there are no “raw” data, and how we all “cook” information to make data. But simply expecting citizens to shoulder responsibility for data literacy, or requiring states to provide basic training, is not enough. The recent furor over large language models highlights the urgent need for social literacy among companies developing AI and other data-intensive technologies.
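To make the “cooking” metaphor concrete, here is a minimal, purely illustrative sketch in Python (every field name and category below is hypothetical, not drawn from any real system) showing how the same human moment can be “cooked” into two different records depending on what its recorders choose to count:

    # The same human moment, "cooked" two different ways.
    # Every choice below (what to record, how to label it, what to
    # discard) shapes the "data" that downstream systems will treat
    # as ground truth.

    event = {
        "user": "u123",
        "behavior": "paused video, rewatched twice, closed app",
        "timestamp": "2023-04-04T09:15:00Z",
    }

    # Recipe 1: an engagement-focused schema counts rewatching as interest.
    engagement_record = {
        "user_id": event["user"],
        "engaged": True,
        "watch_events": 3,
    }

    # Recipe 2: a wellbeing-focused schema reads the same behavior as a
    # possible sign of confusion or frustration.
    wellbeing_record = {
        "user_id": event["user"],
        "possible_confusion": True,
        "session_abruptly_ended": True,
    }

    # Neither record is "raw": each encodes its designers' assumptions.
    print(engagement_record)
    print(wellbeing_record)

Neither record is more “raw” than the other; each encodes choices about what matters, which is precisely the kind of judgment data literacy trains us to notice.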

Tech companies have repeatedly demonstrated a lack of social literacy, evidenced by algorithms that harm teenage girls and image search results that have associated Michelle Obama with apes. These missteps reveal a concerning lack of awareness of, and consideration for, the social and political consequences of data-intensive technologies among those responsible for their development and use. The absence of systematic regulation requiring technologists to be trained to build human-centered technologies is alarming, yet there is no shortage of companies claiming that their AI and other products are exactly that. We also know that “soft” employees in tech companies, workers trained in the humanities and social sciences, are often both sidelined and expected to fix the shortcomings of their technical counterparts. The high-profile 2020 firing of prominent AI ethics researcher Timnit Gebru from Google raised broader questions about how well companies tolerate criticism of their leading products.

To ensure that technology is beneficial rather than harmful to society, technology companies must become more socially literate and allow for diverse perspectives in product development. One place to start is with a general code of ethics for AI creators. Hundreds of AI-related codes already exist, developed by various stakeholders and concerned parties, but such voluntary codes are inadequate on their own.

Digital platforms have become tools for disruption, but the companies behind them lack the capacity and understanding to evaluate their wider social and political effects. Concepts like “ethics,” “fairness,” “privacy,” and “human rights” are more than buzzwords or boxes to tick. They are rooted in rich histories that reflect social understandings and disagreements within human communities, spanning multiple research fields. Social science and humanities departments hold deep wells of knowledge (and debate) on these ideas.

In the face of new challenges like AI and datafication, which disrupt our communities and existing understandings, it’s not enough to throw up one’s hands and claim ignorance. Becoming familiar with the depth of non-technical research requires training, yet only a few computer science programs have begun integrating ethics into their curricula. This is not for lack of expertise in tech ethics or of course offerings on the ramifications of technology. And ethics is just the beginning if we are to pursue a broader effort to create more responsible and socially aware technology.

Privately owned digital platforms have immense power in shaping our communication and knowledge. Meta’s platforms (Facebook, Instagram, Messenger, WhatsApp) have nearly 3.75 billion users worldwide. Google processes 5.9 million searches per minute. In democratic systems, citizens must understand one another and recognize their interdependence in order to interact effectively. This requires not only accessible information, but also the creation of new norms around acceptable communication and information verification. Data literacy is an essential component of a 21st-century democracy and is crucial to preserving an open society.

On the flip side, companies should be not only sensitive to, but accountable for, the social, political, and cultural ramifications of their products, beyond just their technical or economic performance. For too long, certain types of skills and knowledge have been valued over others in developing life-changing technologies. With billions of people engaging with these technologies, demanding that companies be socially literate is not unreasonable. In fact, it’s increasingly unfathomable that they are not. There is no shortage of research on the consequences of datafication, and it’s time to train future technologists to account for these findings from the social sciences and humanities in the development of data-intensive technologies.

Literacy is a basic cornerstone of functioning in human society. As we navigate our digital realities, it’s crucial that we prioritize data and social literacy, especially if we hope to leverage emerging technologies like AI for the betterment of humanity.

Wendy H. Wong is Principal’s Research Chair and Professor of Political Science, University of British Columbia, Okanagan. Her research focuses on the governance of emerging technologies. Her book We, the Data: Human Rights in the Digital Age will be published by MIT Press in October.