OpenAI and FT Licensing Deal: Old News, New Value

Anthea Stratigos
Outsell, Inc.
Published May 10, 2024

Some of the pieces we write have industry-wide implications, and this latest analysis of the FT’s licensing deal with OpenAI is one of them. For that reason, I’m sharing it today with the broader industry. Our clients saw it first and, as a benefit of membership, can inquire confidentially about their unique circumstances and receive tailored advice about what it means for them.

With that said, our team makes a strong case here for distinguishing between news and other types of content when licensing to big tech. As you consider your own licensing strategies, be sure to give this a read, and when you need help, give us a call.

What to Know and Why It Matters

The Financial Times has entered a licensing agreement with OpenAI, marking a new strategy to expose its journalistic content to AI language models. Under the agreement, ChatGPT will provide users with summaries and quotes from FT journalism, along with links to the articles, in response to prompts. The collaboration also permits OpenAI to train its models on the FT’s archives, enhancing the models’ overall knowledge and capabilities.

In addition to undisclosed payments, the deal gives the FT broader reach, potentially engaging over 100 million ChatGPT users, while ensuring a level of attribution and remuneration for the use of the FT’s content.

The deal is one of a string of agreements in which major news and current affairs publishers have licensed content to AI firms. Notably, Axel Springer, the parent company of prominent outlets like Politico and Business Insider, entered into a multiyear licensing agreement with OpenAI in December 2023. The Associated Press, the French newspaper Le Monde, and the Spanish newspaper El País have also signed content licensing deals with OpenAI.

The demand from tech companies to do these deals is currently driven by three forces: 1) increasing model knowledge, 2) increasing trust in model outputs, and 3) avoiding penalties for using training data without appropriate rights. Concern about the third arises primarily from the EU Artificial Intelligence Act, which gained approval from the European Parliament in March 2024, and is expected to come into force over a 24-month period. The act will require “General-purpose AI” providers to supply lists of the content used to train and validate them. This transparency puts pressure on AI model developers to ensure they have the rights to their training content ahead of the act coming into force.

Demand for publishers’ archives is therefore higher than ever. However, as Outsell covered in its best practices for licensing IP into AI use cases, such deals are not without their pitfalls. Allowing tech companies to train their models on archives can eliminate the need for users to access the original source content, inserting the AI model as an intermediary between the publisher and its audience.

In light of this intermediation risk, it is telling that most of the announced deals have been for fast-moving news and entertainment content. Very few have been announced in other, more long-lived content segments. It’s not for want of trying: AI companies have been looking to license reference content, including books, from education, technical, scientific, legal, and other segments, but publishers have been less forthcoming.

Confidential Outsell discussions with a broad range of publishers suggest that their differing views on AI licensing mainly depend on how much they value their archives — and whether sharing their content more widely poses a competitive risk.

In fast-moving content such as news, the value of archive content is much lower than that of up-to-date, recent content. The competitive threat from making news archives available to models is therefore low, making licensing a compelling way to monetize these underutilized assets.

At the other end of the spectrum is long-lived or evergreen content that retains its value over time — such as educational materials, academic research, technical documents, legal references, and other perpetual resources. The implications here are different: a model trained on last year’s textbooks can be a substitute for this year’s textbooks, whereas a model trained on last year’s news is not a substitute for tomorrow’s headlines.

Training deals are therefore best suited to content that rapidly loses value, like news and current events, which makes this deal a strong play for the FT. These deals are most effective when they incorporate 1) fixed terms for training licenses, 2) subscription models for accessing new content, 3) attribution requirements through retrieval-augmented generation (RAG) at inference, 4) limits on what content is made available to users, and 5) links back to original articles.

Conversely, long-lived content calls for a different approach. Licensing this type of content to AI models can create substitutes for the original content: users querying a model are less likely to click through, even with attribution. The emerging practice is therefore to avoid training licenses for long-lived content altogether, unless content is undifferentiated or specific deal criteria can be met.

Outsell has been working extensively with information providers across various sectors to develop such AI licensing strategies. Get in touch if we can help.

Winners and Losers

Both OpenAI and the FT stand to benefit significantly from this agreement. OpenAI gains access to the FT’s high-value content, enhancing the ability of its chatbot and API to provide accurate and credible responses to user queries that demand the latest information. For its part, the FT secures prime positioning with a broad audience, especially on the assumption that OpenAI will not be entering into such arrangements with all of the FT’s competitors. This positions the FT advantageously against potential displacement by rising AI technologies.

The publishing industry, particularly those providers that lack the strategy or capability to license their content to AI companies, stands to lose as AI becomes a more important way to discover and consume content — much as the web browser did before. Content providers may see their influence wane and their direct connections with audiences diminish — especially if more content finds its way into the public models’ pre-training corpora. LLM providers will likely not be able to enter into agreements with all publishers, potentially marginalizing smaller publishers who could lose out on revenues.

What’s Next

In addition to training models on authoritative, high-quality content, model developers are poised to build their own web search indexes. This combination of capabilities will allow AI assistants to deliver more up-to-date and trustworthy insights to users through a blend of enhanced model knowledge and trusted retrieval augmentation. It will also put them into direct competition with the major search engines — continuing to change how information is discovered and consumed, posing new challenges for traditional publishing models.

The licensing deals also set a precedent for compensating publishers, who now need to weigh the benefits of fast and easy licensing revenues against the risks of diluting their brand or losing direct engagement with their audience. As the landscape evolves, the industry must grapple with the challenge of integrating AI into its business models without undermining the value of its core product: original, high-quality content.

The future of licensing content to AI models will also be significantly influenced by the outcome of the New York Times lawsuit against OpenAI and its major investor, Microsoft. This legal action, which centers on the alleged unauthorized use of NYT content to train large language models, contrasts sharply with the FT’s approach of entering into a licensing agreement with OpenAI. The resolution of the NYT case may set a legal precedent that will determine the boundaries of how AI companies can use journalistic content, potentially reshaping licensing strategies and collaboration models across the entire sector.

Anthea Stratigos is a Silicon Valley CEO, wife, mother, public speaker, and writer, among many other passions and pursuits. She is Co-founder & CEO of Outsell.