Key Takeaways

News organizations and other creators are blocking access to the diverse, high-quality content that feeds GenAI models. Here’s a way for the two sides to work together:
  • GenAI companies need enormous quantities of training data, but content owners argue that collecting it without paying compensation is not fair use.
  • Closing the content vaults threatens to hamper the quality of GenAI output, placing investments in GenAI at risk and potentially stifling the development of new GenAI solutions.
  • What’s needed is a platform that sparks cooperation. For rights owners, it would ensure fair compensation. For GenAI companies, access to content. Blockchain is the ideal backbone for such a system.

Generative AI creates content, but it’s also creating friction that, unchecked, may keep the technology from realizing its potential. This conflict—GenAI companies versus content owners—needs to become a collaboration. And soon.

The problem: GenAI is a fast learner, but IP tracking and compensation aren’t its strong suits. Algorithms uncover patterns and relationships, which are essential for generating new content, by training on huge volumes of unstructured data, such as news articles, images, even programming code. These data sets are collected on such a wide scale that details like IP rights and payment often get lost in the shuffle. That’s a concern for content creators. But it should also be a concern for anyone counting on the promise of GenAI.

In the absence of tracking that shows when and how models use protected content, and without fair compensation for that use, rights holders are placing their data under guard. Already, many media companies are taking technical steps—and stepping into courtrooms—to prevent artificial intelligence companies from training on their content. This threatens to hamper the quality of GenAI output, which would put investments in GenAI at risk and stifle the development of new solutions.

The answer is a distribution channel that recognizes the value that content creators bring to GenAI, sparking cooperation instead of collision. This platform would provide three key functions: visibility into the source and use of training data, a licensing mechanism, and a payment solution for fair and fast compensation.

Such a platform has so far been elusive, but blockchain offers a good way forward. Although it’s often discounted because of its perceived complexity, blockchain fosters transparency and trust between unfamiliar parties. And it can handle payments down to the microtransaction level. Blockchain can be the backbone for a platform—and an ecosystem—where all sides benefit, to the benefit of us all.

Fair Use Versus Fair Pay

Generative AI is a game changer—literally. Already, organizations are using it to write code faster, generate talking points for sales calls, even create concept art for computer games. The use cases keep growing and with them the market for everything GenAI: servers and storage, infrastructure as a service for training models, digital ads driven by the technology, specialized GenAI assistant software, and so on. According to Bloomberg Intelligence, GenAI is poised to be a $1.3 trillion business by 2032.

The immense value of the market is prompting creators to question the fairness of how GenAI uses their content. It’s not only that models may be training, free of charge, on a creator’s IP but also that the models may generate new content that infringes upon or competes with their own works. The counterargument—given voice in OpenAI’s response to a copyright infringement lawsuit filed by The New York Times—is that training GenAI models on publicly available internet materials is fair use. While the courts work that out (with some 25 lawsuits pending, according to the Copyright Alliance), rights holders are taking a more technical and more immediate approach: blocking the crawlers that forage the Web for training data.

Research by the Reuters Institute for the Study of Journalism found that by the end of 2023, 48% of the most widely used news websites were blocking OpenAI’s crawlers and 24% were blocking Google’s AI crawler. Those figures are almost certainly trending upward. In August 2024, Wired reported that a Who’s Who of media companies, including Facebook, Instagram, The New York Times, Financial Times, and Wired’s own parent, Condé Nast, were excluding their data from Apple’s AI training.
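Blocking of this kind is typically declared in a site's robots.txt file. As a minimal sketch, the Python snippet below checks which crawler tokens a given robots.txt turns away, using the standard library's `urllib.robotparser`. The robots.txt content, the `news.example.com` URL, the `ExampleSearchBot` agent, and the `blocked_agents` helper are all hypothetical and chosen for illustration; `GPTBot` and `Google-Extended` are the publicly documented tokens for OpenAI's crawler and Google's AI training control.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt for a news site that turns away AI-training
# crawlers while leaving ordinary crawling open.
ROBOTS_TXT = """\
User-agent: GPTBot
Disallow: /

User-agent: Google-Extended
Disallow: /

User-agent: *
Allow: /
"""


def blocked_agents(robots_txt: str, agents: list[str], url: str) -> list[str]:
    """Return the user agents that robots_txt forbids from fetching url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return [agent for agent in agents if not parser.can_fetch(agent, url)]


if __name__ == "__main__":
    agents = ["GPTBot", "Google-Extended", "ExampleSearchBot"]
    print(blocked_agents(ROBOTS_TXT, agents, "https://news.example.com/story"))
```

Note that robots.txt is a request rather than an enforcement mechanism: it only works when crawlers choose to honor it, which is part of why the dispute has also moved into the courts.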

The off-limits signs take high-quality content off the table but also mean a shortage of diverse data. And that’s a problem, even as the overall volume of information continues to grow exponentially. Training on 200 pictures of a cat won’t help GenAI produce a picture of a dog. Models need both quality and diversity in their data sets. As content owners take more data out of circulation, we may see a downward spiral in the quality of GenAI output.

Another concern: as creators increasingly use GenAI in their own work (to augment and accelerate their process), lower-quality output impacts their end products. Ultimately, it’s not just the models that are starved of novel, compelling works. It’s all of us.

Finally, there’s concern about what this all means for GenAI innovation. First movers have already collected training data, and as the spigots close and fast followers find it harder to access sufficient content, the first movers’ advantage will only increase. If new entrants can’t gain footholds, promising solutions may never make it to market.

It Takes an Ecosystem

The key to moving ahead—and ensuring that diverse, high-quality content remains available—is recognizing that at its core, this is really an ecosystem problem, with interdependencies between the participants. Thriving ecosystems, whether in nature or business, are all about balance. Participants are incentivized toward collaboration rather than competition. For the GenAI ecosystem to thrive, there needs to be a system for content rights and usage where GenAI companies and content creators alike benefit: where working together is a better option than working at odds.

An ecosystem for GenAI would need to offer at least the following functionality:

  • Transparency. Ensure visibility into where GenAI models source their training data and how they use that data. This enables rights owners to track the use of their content. It also helps GenAI developers confirm the quality of their training data, since it creates certainty about the source.
  • Licensing. Define clear and fair usage rights for the content that models can use for training. And create an automated mechanism for establishing and revoking licensing agreements.
  • Compensation. Implement a robust system for micropayments according to licensing terms.
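To make the three functions above concrete, here is a minimal in-memory sketch in Python. It is not a design for the platform itself: the `License` and `LicenseRegistry` names, the per-use pricing, and the unit (millionths of a dollar) are assumptions made for illustration. It shows a license being established and revoked, uses being logged only while a license is active, and the amount owed being derived from that log.

```python
from dataclasses import dataclass, field


@dataclass
class License:
    content_id: str
    licensee: str
    price_per_use: int  # hypothetical unit: millionths of a dollar per recorded use
    active: bool = True


@dataclass
class LicenseRegistry:
    """In-memory stand-in for the platform's licensing and usage records."""
    licenses: dict = field(default_factory=dict)   # (content_id, licensee) -> License
    usage_log: list = field(default_factory=list)  # one (content_id, licensee) per use

    def grant(self, content_id: str, licensee: str, price_per_use: int) -> None:
        """Licensing: establish (or replace) an agreement."""
        self.licenses[(content_id, licensee)] = License(content_id, licensee, price_per_use)

    def revoke(self, content_id: str, licensee: str) -> None:
        """Licensing: withdraw an agreement; past uses stay in the log."""
        license_ = self.licenses.get((content_id, licensee))
        if license_ is not None:
            license_.active = False

    def record_use(self, content_id: str, licensee: str) -> bool:
        """Transparency: log a use only if an active license covers it."""
        license_ = self.licenses.get((content_id, licensee))
        if license_ is None or not license_.active:
            return False
        self.usage_log.append((content_id, licensee))
        return True

    def amount_owed(self, licensee: str) -> int:
        """Compensation: total owed by one licensee for all logged uses."""
        return sum(
            self.licenses[key].price_per_use
            for key in self.usage_log
            if key[1] == licensee
        )


if __name__ == "__main__":
    registry = LicenseRegistry()
    registry.grant("article-1", "model-co", price_per_use=500)
    registry.record_use("article-1", "model-co")
    registry.revoke("article-1", "model-co")
    print(registry.record_use("article-1", "model-co"))  # refused after revocation
    print(registry.amount_owed("model-co"))
```

Amounts are kept as integers so that very small payments add up exactly, a choice that matters once payments shrink to the micropayment level.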

Currently, no solution tackles all these elements. Legislative approaches, for instance, have focused on transparency.

The European Union’s Artificial Intelligence Act, which came into initial effect in August 2024 (with its provisions to be rolled out over a three-year period), requires the disclosure of copyrighted works that GenAI models use in their training phase. It also calls for a newly created AI Office to develop a “simple and effective” template for summarizing this information. Given the vast amount of training data involved, full transparency promises to be a challenge. And even in the best-case scenario, rights owners would know only that models are training on their content, not how those models then use the data to create new content. Nor does the act provide a mechanism for licensing and compensation.

In the US, legislation is fragmented, with most laws governing AI enacted at the state level and typically focused on consumer protection. At the federal level, the proposed Generative AI Copyright Disclosure Act does have an IP focus but, like the EU legislation, stops at transparency. The bill would require the disclosure of copyrighted works in training data sets, enabling rights owners to seek compensation, but it doesn’t address what fair compensation looks like or provide a path for obtaining it.

Traditional licensing mechanisms, meanwhile, are poorly suited for a GenAI world. When a model trains on content—learning patterns and relationships—that’s just the start of the story. For a given query, the model bases its output on specific pieces of training data. Say, for the sake of simplicity, the model trains on ten pieces of content. In responding to a query and creating a new piece of content, the model may rely on four of those ten pieces. Complicating matters further, each of those four pieces may contribute to a different degree, requiring payment proportional to that use. Conventional licensing agreements, memorialized in written legal documents, aren’t designed to track this intricate web of distribution.
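The ten-piece example can be worked through in a few lines of Python. The sketch below assumes that some attribution method has already assigned each contributing piece an integer weight; how those weights would be computed is outside this article, and the `split_payment` function, the weights, and the fee are hypothetical. It splits a fee in proportion to the weights and hands any leftover units to the largest remainders so that the shares always sum to the fee.

```python
def split_payment(total: int, weights: dict[str, int]) -> dict[str, int]:
    """Split an integer fee among contributors in proportion to their weights."""
    weight_sum = sum(weights.values())
    if weight_sum <= 0:
        raise ValueError("at least one positive weight is required")

    # Floor each proportional share first.
    shares = {name: total * w // weight_sum for name, w in weights.items()}

    # Rounding down can leave a few units unassigned; give them to the
    # contributors with the largest fractional remainders.
    leftover = total - sum(shares.values())
    by_remainder = sorted(
        weights, key=lambda name: (total * weights[name]) % weight_sum, reverse=True
    )
    for name in by_remainder[:leftover]:
        shares[name] += 1
    return shares


if __name__ == "__main__":
    # Four of the ten training pieces contributed to this output, to different degrees.
    contributions = {"piece-2": 40, "piece-5": 30, "piece-7": 20, "piece-9": 10}
    print(split_payment(1000, contributions))
```

With a fee of 1,000 units and weights of 40, 30, 20, and 10, the four pieces receive 400, 300, 200, and 100 units, and the six pieces that did not contribute receive nothing.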

Similarly, centralized content platforms are a less-than-ideal solution. Granted, many of these repositories—already a prime source of training data given their large collections of images, text, or video—facilitate rights management and compensation for creators. But centralized platforms also present drawbacks. The house “cut” can be high, reducing creator income. Reliance on an intermediary can lead to delays in payment. And that intermediary, which runs the show, often has outsize control over content distribution and pricing, with creators having little insight into how earnings are calculated and distributed.

Blockchain Unshackles Content

If the old ways won’t cut it, a novel approach becomes essential. What’s also clear is that technology is key. New technical standards and platforms are already tackling some of the challenges that widespread access to—and distribution of—digital media content present. For example, to help publishers and consumers verify the authenticity of online content and prevent the spread of misinformation, the Coalition for Content Provenance and Authenticity (C2PA) has developed an open standard for certifying the origin and history of any piece of content. Provenance data, such as who created the work and how it may have been edited subsequently, travels with the content as it flows across the internet.

Closer to home—for those on both sides of GenAI’s copyright clash—is an effort to use technology to tackle the attribution challenge. A startup called ProRata.ai is developing a GenAI platform that would track the specific content it uses to answer a prompt and share revenue with content owners accordingly.

Platforms like ProRata spotlight the need for—and viability of—a symbiotic approach: one where content owners and GenAI companies both benefit. Taking this idea further, leveling the playing field between all GenAI companies and all content owners will require a technological solution that fosters ecosystem growth.

Blockchain is a prime candidate. A distributed database that records information and transactions but prohibits alterations to existing records, blockchain ensures trust and fairness. Organizations use it to track and trade product carbon footprint certificates, facilitate financial transactions (down to the micropayment level), and record ownership of digital and real-world assets, among other applications.

These capabilities mean blockchain can track the origin and use of training data, provide an efficient means for licensing, and support microtransactions on a macro-level scale (crucial because each piece of content a GenAI model generates can be based on thousands of training inputs, resulting in a massive volume of very low payments). In short, it hits the trifecta of transparency, licensing, and compensation.
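The property that matters here, records that cannot be quietly altered, comes from chaining each entry to the hash of the one before it. The Python sketch below illustrates only that idea, in a single process. It is not a distributed blockchain (there is no network or consensus), and the `Ledger` class and the record fields are hypothetical.

```python
import hashlib
import json


def _digest(record: dict, prev_hash: str) -> str:
    """Hash a record together with the hash of the block before it."""
    payload = json.dumps({"record": record, "prev": prev_hash}, sort_keys=True)
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


class Ledger:
    """Append-only, hash-chained log of content-usage records."""

    GENESIS = "0" * 64  # placeholder "previous hash" for the first block

    def __init__(self) -> None:
        self.blocks: list[dict] = []

    def append(self, record: dict) -> str:
        prev = self.blocks[-1]["hash"] if self.blocks else self.GENESIS
        block = {"record": record, "prev": prev, "hash": _digest(record, prev)}
        self.blocks.append(block)
        return block["hash"]

    def verify(self) -> bool:
        """Recompute every hash; any edit to an earlier record breaks the chain."""
        prev = self.GENESIS
        for block in self.blocks:
            if block["prev"] != prev or block["hash"] != _digest(block["record"], prev):
                return False
            prev = block["hash"]
        return True


if __name__ == "__main__":
    ledger = Ledger()
    ledger.append({"content_id": "article-1", "used_by": "model-co", "fee": 400})
    ledger.append({"content_id": "photo-7", "used_by": "model-co", "fee": 100})
    print(ledger.verify())                 # chain is intact
    ledger.blocks[0]["record"]["fee"] = 1  # tamper with an earlier record
    print(ledger.verify())                 # tampering is detected
```

In a real deployment the chain would be replicated across many parties, so no single participant could rewrite history and recompute the hashes unnoticed.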

A counterpoint—and it’s a reasonable one—is the computational cost of such a solution. Granted, current GenAI solutions are not optimized for keeping track of what training data models use when generating new content, and we estimate that the process could raise computational resource requirements by up to 30%. But optimized GenAI solutions should be able to track training data usage much more efficiently and require only 1% to 5% more computational resources. (See “Sparking the Discussion.”)

Sparking the Discussion

This article describes a model for creating a sustainable, level playing field for GenAI companies and content creators. But another goal—even, perhaps, a more important one—is to spark discussion around this critical topic. We see blockchain as a good way forward: the backbone for an ecosystem that recognizes the value content creators bring to GenAI but also recognizes the value GenAI companies unleash when they can access diverse, high-quality data. Not everyone agrees. And that’s okay. By working together, we ultimately work better.

Some counterpoints we heard while developing this article:

Why does it need to be an ecosystem? Does that really offer value for incumbent model developers? Wouldn’t they be more likely to enter into exclusive agreements to gain a data advantage?

Aren’t we overcomplicating the compensation mechanism? Why not just say “I’ll pay you $100 to use your work, regardless of usage at inference”? More certainty, lower variance.

If we did have an “at inference” or “pay per use” system, would we really be able to lower the computational costs enough to justify variable over fixed payments?

Our view is that current efforts tend to favor GenAI companies. When ecosystems are well executed, they can put pressure on incumbents while incentivizing content creators. And although our model is complex, none of the ideas already on the table feel like the perfect fit. With the right focus and effort, the computational costs can come down over time.

Seeking out—and hearing—all views is the key. To quote some well-known content creators: we can work it out.

It’s also true that blockchain is not a plug-and-play technology. The platform’s initial orchestrators (the content providers and GenAI companies leading the charge) will need to partner with a tech venture builder. (Note: As the platform grows, initial orchestrators are likely to pass the baton to a governing body that will maintain the platform and reflect the common interests of all parties.) But already we are seeing momentum in leveraging blockchain in GenAI-related solutions. Ocean Protocol and Vana are two examples of decentralized data marketplaces where owners control who can access their data and are compensated for that access. Story Protocol is another intriguing effort. Its goal: to enable creators to prove that they are IP owners of a piece of content and prevent theft of protected material by storing their content on the platform. While these platforms don’t track the use of training data, they demonstrate the power of blockchain to fuel solutions that benefit content owner and user alike.

But technology alone is not enough. Successfully building a blockchain-based IP platform—and a thriving ecosystem—means aligning participants, ensuring governance, and driving scale in the right way. Orchestrators will take point here: setting the rules, coordinating efforts, and bearing significant upfront costs and risks. The best candidates for the role will possess sufficient resources, be invested in the ecosystem’s success, and be able to manage relationships with other ecosystem members.

To make it all work, orchestrators should take five critical steps:

  • Ensure that essential partners join. To persuade creators and GenAI companies to participate, get the word out on the value proposition. For creators, that’s transparency, IP protection, and compensation. For GenAI companies, it’s access to data. Without these core contributors, the platform won’t thrive.
  • Establish the right governance model. Balance openness (to attract a wide variety of participants) with control mechanisms that ensure data security and fair IP usage. Governance must protect creators’ rights while providing transparency in how content is used, licensed, and monetized on the platform.
  • Focus on scale before scope. Start by solving a specific problem—like tracking content provenance and licensing for creators—and build scale before expanding to other services. These could include more advanced features such as custom data curation and selection, whereby IP owners can specify which data should be included and which should be excluded, giving them more control over what content is available for training and who can use it. Early success will drive credibility and attract more users.
  • Solve the chicken-or-egg problem. To build critical mass, prioritize the side of the platform that needs the most immediate traction. For example, offering incentives to creators—so they onboard their content—can help attract GenAI companies looking for high-quality data.
  • Create three flywheels. Implement three reinforcing mechanisms: data, growth, and cost. As more creators and GenAI companies join, network effects will increase, creating richer data sets that enhance the value proposition. In the process, the platform’s scalability will mean lower transaction costs—sparking still more participation. (See the exhibit.)

Cultivating a Creator Economy

GenAI companies thrive when content creators thrive. The current situation has proved to be unsustainable, and GenAI companies appear to recognize this. They’re trying various strategies to source high-quality data, comply with new regulations, and prevent more lawsuits. What they might be missing out on is the opportunity to foster a creator economy. Rewarding high-quality content will lead to the creation of more high-quality works: the fuel GenAI companies need to continue generating new content at scale.

Ironically, AI itself can rate submitted content on uniqueness and attractiveness and offer compensation accordingly, encouraging content creators to produce new and exciting works that meet GenAI needs.

But who “pays the bill”?

There are multiple paths toward fair compensation for content creation. GenAI companies can pay the creators whose works contributed to newly generated content while keeping their flat-fee business models, much as music-streaming services do today. Alternatives are also possible. For example, generated content could remain free for individual use but require a license fee for commercial applications. This would allow GenAI companies to share the added revenue stream with creators and secure access to increasing amounts of unique and differentiating content, which will help them take GenAI, and their business, further.

Recognizing the value that creators bring to GenAI, and ensuring fair compensation for using their works, is in the interest of all players in the GenAI space. GenAI companies can meet their growing demand for new training data, while content owners can tap a significant opportunity to benefit from their IP. Working together, instead of at odds, both groups can sustain growth and prosperity. By creating a blockchain-based platform for tracking and monetizing content, an ecosystem can flourish—and GenAI’s impact can keep growing.

The authors thank Marco Badur, Cathy Hackl, Daniel Sack, and Stefan Wang for their contributions to this article.
