Date: 2023-08-08

Numerous writers recently discovered that their books were uploaded and scanned into a large database without their consent. Shaxpir, a cloud word processor, created a project called Prosecraft that compiled over 27,000 books. These books were then compared, ranked, and analyzed based on the “vividness” of their language. Several authors, including Maureen Johnson and Celeste Ng, raised their voices against Prosecraft for using their books to train a model without permission. Surprisingly, even books published less than a month ago had already been uploaded.

In response to the backlash, Benji Smith, the creator of Prosecraft, took down the website, which had been active since 2017. Smith clarified that his tool was not a generative AI, but authors expressed concerns about its potential to become one. Smith had collected a dataset of a quarter billion words from published books by crawling the internet.

Prosecraft presented two paragraphs from each book, one categorized as the “most passive” and the other as the “most vivid.” The books were then ranked based on the vividness, length, or passivity of their language. Some authors found this approach frustrating, arguing that literary style should not be judged as though it were corporate writing.

In a blog post, Smith explained that he believed he was operating within the bounds of fair use by only publishing summary statistics and small snippets of text. However, some authors pointed out that the excerpts on Prosecraft contained significant spoilers, adding to their frustration.

Despite Smith’s apologies, authors remain troubled by the growing number of AI tools that use their work without their consent. With the rise of generative AI and self-publishing technology, artists and writers feel caught in a never-ending battle. As soon as they opt out of one database, their work ends up being used to train another AI model.

The proliferation of AI tools has also led to scams, such as the flood of low-quality, AI-generated travel guides and children’s books on platforms like Amazon. Impersonation is another issue: authors like Jane Friedman have found fake books, seemingly written by AI, being sold under their names. Because she holds no trademark on her name, Friedman had difficulty getting these books removed from sale.

While some writers don’t believe AI will ruin literature, they worry that publishers might replace marketing and publicity teams with AI-generated content. The situation overall leaves a sense of unease in the writing community.