The New York Times and Daily News have sued OpenAI and Microsoft, alleging that ChatGPT was trained on their copyrighted content. However, OpenAI engineers accidentally erased potential evidence that the publishers’ lawyers had gathered while searching the company’s training data.
The publishers had spent over 150 hours searching OpenAI’s training data on virtual machines provided for that purpose, but on November 14 the data on one of those machines was deleted. Although some of the data was recovered, it is unusable for legal proceedings, forcing the publishers to restart their efforts.
The incident highlights ongoing concerns about the transparency of AI training data and further complicates the legal battle.