Copyright and AI-Generated Content: Establishing Scope Requires More Than Registration
Posted in AI
U.S. copyright law protects human-authored expression, not works generated purely by generative AI. When a human author uses generative AI tools to create their work, the scope of copyright protection extends to the human-authored aspects of the work, not the AI-generated material within that work. The ability to separate out AI-generated content from the human content poses challenges at both the registration stage and the enforcement stage, with a set of related but distinct issues.
Applying for Copyright Registration
When registering a copyright, “the applicant has a duty to disclose the inclusion of AI-generated content in [the] work” and “provide a brief explanation of the human author’s contributions to the work.” 88 FR 16190. Completing the application consistent with this duty is relatively straightforward:
For example, the Author Created field can claim “selection, coordination, and arrangement of text created by the author with AI-generated text,” and the Material Excluded field can disclaim “AI-generated text.” That’s it.
Right now, the application does not formally require a detailed explanation of how generative AI was used or a specific identification of the AI-generated elements of the work. This is consistent with the Copyright Office’s practices for registering computer code, a type of work that is (almost) always created by combining preexisting and new content. It can be enough to identify “computer program” as the new matter and “computer program” as the disclaimed matter. The application does not need to describe or identify in greater detail, for example, any open-source software components, previous versions of the registered work or newly created files.
Establishing Scope of Copyright
The Copyright Office will rarely seek additional information about the differences between the disclaimed and claimed matter for registration. These questions normally arise, if at all, when copyright is enforced, which means that for many copyrighted works, this ends up being a nonissue. But when copyright is enforced in litigation, the inability of the copyright owner to identify the protectable elements of the registered work is fatal. See, e.g., SAS Institute v. World Programming Ltd., 64 F.4th 1319 (Fed. Cir. 2023). This evidentiary challenge will be central to any case concerning the scope of copyright protection for works containing AI-generated content. Zarya of the Dawn is a harbinger of such challenges for generative AI users.
While much of the discussion of the Copyright Office’s decision to reissue a “more limited registration” of Zarya of the Dawn focuses on the conclusion that unaltered generative AI outputs are not protectable, we can learn a lot from the Copyright Office’s analysis of whether Kris Kashtanova’s editing of certain images is “sufficiently creative to be entitled to copyright.” Accused infringers would be smart to take a similar approach when challenging the scope of copyright protection for works having AI-generated content. Likewise, authors should consider how they will establish the scope of their copyright in response to such a challenge.
How does an author identify the human and AI contributions to their work? A vague description of how AI was used isn’t going to be enough in an adversarial setting. Consider Kashtanova’s discussion of one of the edited images. When Kashtanova explained they used Photoshop to “show aging of the face, smoothing of gradients, and modifications of line and shapes,” the Copyright Office found that explanation insufficient to “determine what expression in the image was contributed through [Kashtanova’s] use of Photoshop as opposed to generated by Midjourney.” With a better explanation, or records showing how Kashtanova altered the AI-generated output, Kashtanova may have been able to establish some copyright protection in the image.
In an ideal situation, the copyright holder would have contemporaneous records that distinguish the AI and human contributions. If Kashtanova had provided the pre-Photoshop version of the image, one could discern Kashtanova’s contribution to the final image. However, unless the technology automatically identifies the AI and human contributions, creating such records requires a high level of planning and foresight.
The ability to parse out human expression from AI expression will also be a challenge in the context of AI-generated computer code. Here, the human and AI contributions most likely occur simultaneously, not in two separate stages as they did when Kashtanova used Photoshop to edit Midjourney outputs. It’s an imperfect analogy, but imagine the generative-AI tool as a second developer collaborating alongside the human. If each “author” drafts certain lines of code or revises the architecture of a particular file, how do we record who made which contributions? For code-generating tools that are integrated into the development environment, the AI tool itself might offer some type of automatic tracking, which simplifies the process. However, if developers are incorporating AI-generated code from external tools into their codebase, the developers themselves may need some method to document AI contributions (e.g., using naming conventions or comments in the code).
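To make the comment-convention idea concrete, here is a minimal sketch of what such documentation might look like in practice. The marker format, the tool name (“ExampleCodeGen”), and the helper function are all illustrative assumptions, not an established standard or any particular vendor’s feature; the point is only that paired markers make AI-generated regions mechanically identifiable later.

```python
# Hypothetical convention: wrap AI-generated regions in paired comment
# markers so provenance can be audited after the fact. The marker text
# and tool name below are illustrative assumptions.

AI_BEGIN = "# AI-GENERATED: begin"
AI_END = "# AI-GENERATED: end"

def classify_lines(source: str) -> dict:
    """Tally lines attributed to AI vs. human authorship based on markers."""
    counts = {"ai": 0, "human": 0}
    in_ai_block = False
    for line in source.splitlines():
        stripped = line.strip()
        if stripped.startswith(AI_BEGIN):
            in_ai_block = True
            continue  # marker lines themselves are not counted
        if stripped.startswith(AI_END):
            in_ai_block = False
            continue
        counts["ai" if in_ai_block else "human"] += 1
    return counts

sample = """\
def normalize(values):
    # AI-GENERATED: begin (tool: ExampleCodeGen, 2024-01-15)
    total = sum(values)
    return [v / total for v in values]
    # AI-GENERATED: end
def load_values(path):
    with open(path) as f:
        return [float(line) for line in f]
"""

print(classify_lines(sample))  # {'ai': 2, 'human': 4}
```

A record like this is crude, but even a crude contemporaneous record addresses the evidentiary gap the Copyright Office identified in Zarya of the Dawn: it lets the author later point to specific human-authored lines rather than offering a vague after-the-fact description.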
Assessing Relevance of Scope
How much will the ability to separate the human expression from the AI-generated work matter when it comes time to assert your copyright? That depends on how generative AI was used to create the work and how the work was infringed.
When the infringing work is an exact copy of the protected work, copyright holders just need to establish that the work itself includes enough human authorship to have thin protection. If the infringing work instead is (a) only a portion of the protected work, (b) substantially similar to the protected work, or (c) a derivative of the protected work, then copyright holders will need to show human authorship in the particular portions at issue.
For Kashtanova to assert copyright against unauthorized copies of their entire graphic novel, whether a single image by itself is unprotectable AI-generated content will not change the infringement analysis. That is, Kashtanova has copyright protection for the complete work because they selected, coordinated and arranged the images and wrote the text that together created the graphic novel. However, if Kashtanova were to assert copyright against the use of a single image, then the specifics of how Kashtanova edited that image would be a central question.
Detailed records are particularly important when the human authorship and the AI “authorship” are similar in nature. If Kashtanova were to assert copyright against an infringer’s use of the text of Zarya of the Dawn, they would not need detailed records to demonstrate the overlap between the alleged infringement and the human expression, because they did not use generative AI to generate the text at all.
However, consider an instance where an alleged infringer copied only a portion of a computer program. The author may recall that generative AI was used to create some of the code, while the rest was human authored. The central question, then, is whether the copied portion included only AI-generated code—in which case there is no copyright infringement—or whether it was human authored, which may be sufficient to sustain the copyright infringement claim.
Computer programs, like other literary works, suffer from the added complication of nonliteral copying. If, instead of verbatim copying of a portion or the entirety of the protected work, the alleged infringer copied the work’s nonliteral elements (like its sequence, structure and organization), infringement analysis usually requires application of the abstraction-filtration-comparison test articulated in Computer Associates v. Altai, 982 F.2d 693 (2d Cir. 1992). Part of the filtration step requires identifying elements of the protected work that are not authored by the copyright holder—third-party components (like open-source software) and elements taken from the public domain (like those that are “commonplace in the computer software industry”). Now, the filtration will have to account for AI-generated portions of the computer code.
It’s impossible to predict which portions of a protected work infringers are most likely to copy, and records distinguishing AI-generated portions from human-authored portions are only helpful if those records relate to the alleged copying. To be prepared to assert copyright against incomplete or substantially similar copying, the author should maintain detailed records about the role AI played in the creation of the work. However, if the author is primarily concerned with unauthorized copying of the entire work or if AI played a more limited role (e.g., only image generation in a multimedia work), such onerous recordkeeping may be overkill.
Typically, no single person (or department) has all the information to figure out how granularly to identify AI-generated content within a given work in order to preserve valuable copyright protection. So implementing processes and procedures for tracking AI-generated content may involve input from the authors themselves (who can speak to how they are using AI), engineers (who can advise on what tools can be used to track such use), product or marketing teams (who can speak to which portions of the work are most valuable), and lawyers (who can—hopefully—explain how it all fits together).