
A ChatGPT prompt is shown on a device near a public school in Brooklyn, New York, on Jan. 5. Peter Morgan/The Associated Press

In late September, a writer friend texted me a screenshot showing my name in a search box. Two of my novels, The Day the Falls Stood Still and The Painted Girls, were listed just beneath.

The Atlantic had run a piece by Alex Reisner about a collection of pirated books being used to train AI systems, including those created by Meta, EleutherAI and Bloomberg. Reisner, who is a programmer as well, had also built a tool allowing anyone to check whether a particular author’s books were in the data set. My friend had typed in my name.

The data set, according to Reisner, consists of roughly 191,000 books – everything from Margaret Atwood to self-published erotic fiction. I was aghast. Without my permission, or even my knowledge, and without a dime in compensation, AI giants had decided it was okay to use my novels to train their systems to generate text that appears to have been written by someone like me. This, when my novels are protected by copyright and when I’d taken years to research, imagine and write each book.

While several AI companies have come clean and acknowledged the use of the data set, OpenAI has not. Rather, the company refuses to disclose training information for its popular ChatGPT.

But if no offence was committed, why the secrecy? I was irked. If my novels had been used, it seemed likely I would find evidence in ChatGPT’s responses to certain prompts. I opened ChatGPT and began to type.

Me: Provide a summary of Cathy Marie Buchanan’s The Painted Girls.

ChatGPT: The Painted Girls by Cathy Marie Buchanan is a historical novel set in 19th-century Paris, France. The story revolves around three sisters – Antoinette, Marie, and Charlotte Van Goethem – struggling to make a living in the harsh conditions of the Belle Époque. The sisters’ dire circumstances, Marie’s position as a ballet dancer with the Paris Opera, artist Edgar Degas as a character, the real-life inspiration for the novel …

The answer was not entirely accurate. Antoinette works as a stage extra and later as a prostitute, not in Degas’s studio, as ChatGPT had claimed. Also, the bulk of the response could have been derived from publicly available reviews or summaries.

I needed to come up with a prompt that would lead ChatGPT to respond with information available only in the text of The Painted Girls. No review, blurb or author website would disclose the final pages of a book, so I reasoned that if ChatGPT could describe them, that would indicate the novel had been ingested in its entirety.


Cathy Marie Buchanan’s The Painted Girls. Handout

Me: Provide a summary of the last chapter of Cathy Marie Buchanan’s The Painted Girls.

ChatGPT: I’m sorry, but I can’t provide a detailed summary of the last chapter of Cathy Marie Buchanan’s The Painted Girls as my training data doesn’t include specific chapter-by-chapter details for copyrighted texts.

That sounded like language introduced by a company accused of copyright infringement, but maybe I could outsmart a machine. In the final chapter of The Painted Girls, Marie is married to Alphonse, a mild spoiler unlikely to have been disclosed anywhere.

Me: Does Marie marry Alphonse?

ChatGPT: Yes, in The Painted Girls by Cathy Marie Buchanan, Marie does marry Alphonse. … Despite the hardships she faces, her relationship with Alphonse evolves, and they end up getting married, providing a sense of resolution in Marie’s personal life towards the conclusion of the story.

Sure, the “yes” was accurate, but hadn’t the odds of answering correctly been 50 per cent?

I continued prompting. ChatGPT continued responding. The most telling response came when I asked for a summary of my third novel, Daughter of Black Lake, which, according to Reisner’s tool, was not among the books used to train the AI systems.

ChatGPT: As of my last knowledge update in January, 2022, Cathy Marie Buchanan’s novel Daughter of Black Lake had not been released. Therefore, I don’t have specific information about its content or plot.

The novel was in fact released in 2020, and reviews and summaries are widely available. Might ChatGPT’s inability to respond point to how central the data set of ingested books is to what it knows?

Me: Is it possible some books used to train ChatGPT were under copyright?

ChatGPT: I don’t have access to my training data, but I was trained on a mixture of licensed data, data created by human trainers, and publicly available data. … I am not able to access books, articles or any other content that is copyrighted unless it is publicly available.

Me: Is it possible some of the “publicly available data” used to train ChatGPT was pirated?

ChatGPT: OpenAI, the organization behind Assistant, has not publicly disclosed the specifics of the individual data sets used, including whether pirated or infringing materials were used.

Authors are taking the news that our words were stolen seriously. A group of notable writers filed a complaint against Meta. Then the Authors Guild, along with a second group of writers including fiction heavyweights George R.R. Martin, Jonathan Franzen and Jodi Picoult, filed a class-action lawsuit against OpenAI and its partner Microsoft. In the complaint, the authors accuse the tech companies of copyright infringement. They assert that without having ingested their works, ChatGPT would be vastly different.

They also say that the AI system threatens their ability to make a living, pointing out that ChatGPT is already being used to generate books that mimic authors’ works, including novels on Amazon falsely bearing the names of established writers.

Interestingly, the Authors Guild group asserts that “until very recently, ChatGPT could be prompted to return quotations of text from copyrighted books. … Now, however, ChatGPT generally responds to such prompts with the statement, ‘I can’t provide verbatim excerpts from copyrighted texts.’”

The group’s assessment is the same as mine: OpenAI has seen the writing on the wall and has, for now, programmed in some restraint.

Cathy Marie Buchanan is The New York Times bestselling author of The Day the Falls Stood Still, The Painted Girls and Daughter of Black Lake.
