Date: 2023-08-08

Numerous authors recently discovered that their books had been scanned and uploaded into a large dataset without their permission. The project, called Prosecraft, was developed by cloud word processor Shaxpir and analyzed over 27,000 books based on the “vividness” of their language. Many authors, including Maureen Johnson and Celeste Ng, objected that Prosecraft had used their books without consent, including recently published titles.

In response to significant online backlash, the creator of Prosecraft, Benji Smith, took down the website, which had been operational since 2017. Although Prosecraft was not a generative AI tool, concerns were raised about its potential to become one due to its vast dataset sourced from published books on the internet.

Smith’s method involved presenting two paragraphs from a book, one deemed the “most passive” and the other the “most vivid.” The books were then ranked based on criteria such as vividness, length, and passivity. However, this approach drew criticism from authors, who argued that style and creative expression are unique to each writer and should not be limited by rigid standards.

Smith defended his project, stating that he only published summary statistics and small text snippets, believing he was within the boundaries of the Fair Use doctrine. However, some authors discovered that the excerpts of their books on Prosecraft contained significant spoilers, furthering their concerns.

Artists and writers have expressed frustration with the increasing number of AI tools that exploit their work. They have found that even when they opt out of one database, their content is still used to train other AI models. This proliferation of AI tools, coupled with the rise of self-publishing, has created a breeding ground for scams, such as the flooding of Amazon with low-quality AI-generated travel guides and children’s books.

Authors are particularly concerned about the potential for unintentional plagiarism, as AI models like ChatGPT are trained on a wide range of internet content. Some authors, like Jane Friedman, have even found counterfeit books under their names on platforms like Amazon, which they believe are AI-generated. Even after successfully getting these books removed from platforms like Goodreads, authors struggle to have them pulled from sale without a trademark for their name.

While authors acknowledge that AI won’t ruin literature, they worry that publishers might be persuaded otherwise, leading to the replacement of marketing and publicity teams with AI-generated promotional content. This situation remains disheartening for authors, as they strive to protect their creative output from unauthorized use.