AI and Copyright Law: The New York Times’ Generative AI Lawsuit Is Just the Beginning

The rise of generative artificial intelligence has sparked a heated debate about AI and copyright law. The New York Times’ lawsuit against OpenAI and Microsoft is not the first of its kind. But it is a pivotal one that could shape the future of AI development and intellectual property rights.

The New York Times, a media powerhouse, employs nearly 6,000 people and represents 3.5% of the U.S. newspaper industry. Its sheer size and influence mean its decision to sue these AI giants serves as a statement about the value of content and the rights of its creators. This case goes beyond a routine copyright infringement dispute: it tests whether AI tools may lawfully generate content built on the work of others.

At the heart of this legal battle are the methods AI developers use to train their large language models. Generative AI tools, such as OpenAI’s ChatGPT and Google’s Bard, are trained on massive datasets containing copyrighted material. The lawsuit raises critical questions about whether the use of copyrighted content to train AI models is fair use or violates the rights of the content creators. Fair compensation for creators is no longer a hypothetical question, but an urgent one.

The outcome of this litigation is likely to influence how AI developers approach model training, potentially leading to changes in how data is sourced and used. This shift could have far-reaching consequences for the AI ecosystem, affecting everyone from developers to end users.

But there is a larger issue that is not getting as much attention as it deserves: the potential for these models to misrepresent original sources and spread misinformation. 

The issues at stake strike at the heart of information authenticity, especially as AI-generated content becomes seamlessly integrated into our daily routines. AI’s learning process is prone to inaccuracies, often resulting in content that misrepresents its original context. Such content will become ubiquitous, appearing in search engine results, voiced by AI assistants like Alexa and Siri, and embedded in our electric vehicles and smart appliances. 

Unlike misinformation spread on social media platforms, where falsehoods circulate in public view, interactions with generative AI systems are one-on-one, making it a daunting task to monitor and address falsehoods in real time.

Unchecked misinformation adds to the woes of our fragile information ecosystem. This double-edged sword cuts deep for content creators, who face not only the injustice of inadequate compensation, but also the threat of being misrepresented, with little recourse to challenge the AI developers responsible.

The need for new legislation specifically addressing the use of copyrighted material in AI training is clear, as is the need for mechanisms, such as “red teaming,” to ensure that generative AI systems accurately represent and source the material they use.

Public datasets from organizations like Common Crawl are essential for AI development. However, if every AI model relies on these public datasets, they will all start to behave similarly. To stand out, AI developers must continually train their models on unique, proprietary datasets. Ironically, this also means that AI developers benefit from protecting the rights of those who produce original content, ensuring a continuous supply of fresh, diverse data.

The legal battle pitting the New York Times against OpenAI and Microsoft is a defining moment in the history of AI and the media industry. It challenges us to rethink our societal principles and the path we wish to chart as AI continues to advance.

Newspapers are essential to our democracy and civil society. To paraphrase a founding father, if it were up to me to decide whether we should have generative AI without newspapers or newspapers without generative AI, I would not hesitate for a moment to prefer the latter.

The future of AI will be shaped by the choices we make today. We must strike a balance between encouraging innovation and fairly honoring and compensating the creative minds that advance humanity. The resolution of this litigation will reveal our shared ideals and aspirations for the future we hope to create, as well as the ethos of our society.

Tinglong Dai is the Bernard T. Ferrari Professor of Business at the Carey Business School of the Johns Hopkins University. He serves on the core faculty of the Hopkins Business of Health Initiative and the executive committee of the Institute for Data-Intensive Engineering and Science.