How generative AI’s uncomfortable relationship with copyright law will determine the future of the industry

News Room

Eva Toorenent has had her artwork stolen before, but the use of her art to train an AI model felt like a “new kind of violating.”

The artist and illustrator, who’s worked as a freelancer since 2019, discovered last year that another artist had used her work to enable the text-to-image program Midjourney to produce art in her style, some of which was sold to an art gallery, as Insider previously reported.

With the advent of generative AI, Toorenent’s story has become an increasingly common one as artists, developers, and writers struggle to protect their work. Many creators are turning to the courts to help them.

Rights holders argue that AI using their work without a license should be considered "unauthorized derivative work" — an infringement of copyright law. Meanwhile, AI startups insist that their models comply with fair-use doctrine, which grants them some leeway to use others' works.

This month, Universal Music Group sued the AI startup Anthropic for distributing copyrighted lyrics. In January, artists claimed that both Midjourney and Stability AI, the startup behind the image generator Stable Diffusion, scraped their work without their consent. Meanwhile, Getty Images is battling Stability AI in the courtroom over the use of its library in training Stable Diffusion.

The results of these legal conflicts will likely have enormous knock-on effects for generative-AI startups, which have been among the few bright spots in what’s been a dismal year for tech as venture-capital funding continues to plummet from its 2021 high. Generative-AI startups have pulled in $18 billion from investors so far this year, according to data from Dealroom.

An article in the Harvard Business Review said that if courts rule in favor of artists, startups would likely have to pay "substantial infringement penalties." In the long term, investors say, a data market is likely to evolve.

The data-scraping free-for-all will come to an end

AI models are trained on data scraped from the web, and that’s raised questions about whether original data sources should be credited or used in the first place without consent.

Simon Menashy, a partner at MMC Ventures, calls the current model a “Wild West with few licenses and little regulation.”

The world was not prepared for ChatGPT when it was released in 2022, and few systems or processes have been put in place for the fair and ethical exchange of data, he said.

“We’re going to see the shutters coming down” on data scraping, Menashy said. He believes that future regulations may explicitly forbid AI data scraping.

Ekaterina Almasque, a partner at the VC fund OpenOcean, said the worst-case scenario would be that no rulings emerge from the ongoing legal battles and things would continue as they are.

She noted that most of the AI models come from very large companies. “The same way they don’t pay tax in many places, they wouldn’t pay for using such a valuable resource as data,” she said.

Almasque hopes that court cases will kick-start a functioning data market, where data is bought, sold, and licensed in a fair and equitable way.

In its case against Stability AI, Getty claims that the AI startup’s Stable Diffusion program has “copied 12 million images to train its AI model without permission.”

A win for Stability would set a “dangerous precedent” by sending the message that “everything on the internet is up for grabs to train large language models,” Sunny Dhillon, a managing partner at Kyber Knight Capital, said.

Getty, which recently announced a partnership with Nvidia to build out its own generative-AI tool for photo generation, has argued that “the explicit consent of rights holders is required to use their data to train learning models.”

“Generative-AI tools and services should be transparent as to the data that is used for training and the outputs of these models,” a representative for Getty said.

A specialized future for AI

Startups that build more specialized models with licensed data could be primed to excel if restrictions are placed on data scraping. Companies like the Sequoia-backed Harvey and the Andreessen Horowitz portfolio startup Hippocratic AI have built models to serve the legal and healthcare industries, respectively.

MMC Ventures’ Menashy said that a distinction between AI companies that license their data and the rest of the pool will emerge.

“That’s interesting to startups — there’s an opportunity for them to have a differentiated product,” he said. “They can train models on data that’s not universally available to customers, and tell them it’s licensed and compliant.”

Two investors suggested that in a regulated market, a bevy of data sources will emerge from non-AI companies that have collected data for their own operations but opt to license it to firms building vertical specialties.

Climate Aligned is one startup that’s using public data to build a specialist generative-AI tool. The company, which recently raised $1.8 million in early-stage funding, uses AI to showcase the environmental, social, and governance, or ESG, credentials of financial products and issuers. 

“We use public disclosure, which is available on the internet — it’s companies’ websites, their annual reports, and things like that,” Aleksi Tukiainen, Climate Aligned’s cofounder and CEO, said.

“Then when we’re providing information through our platform, we’re pointing to the source documentation. We’re not training models from random data that has been scoured from somewhere or making it up on the go.”

AI regulation could differ across continents

“Europe likes to regulate things first and have lots of rules. The US is often the opposite, as it only regulates when there’s a large size or issue,” Menashy said.

AI regulation could follow suit, said Andre Retterath, a partner at Earlybird Venture Capital.

“Europe moved ahead with GDPR and the US followed with CCPA, two independent regulatory frameworks that vary in details but still pursue the same overarching goals. I expect something similar for gen AI,” he said.

Menashy said the industry is waiting for its “Taylor Swift moment.”

The megastar famously reclaimed control of her music catalog despite not owning the original master recordings. Because Swift wrote the songs on her first six albums, she retained songwriting rights, which allowed her to rerecord those albums without violating copyright law and put her back in control of her music.

“Who’s going to be the Taylor Swift of generative AI?” Menashy asked.
