Abstract
In Wikimedia's 2030 Strategy plan, they state unequivocally that "Developing and harnessing technology in socially equitable and constructive ways—and preventing unintended negative consequences—requires thoughtful leadership and technical vigilance."
And so we have. This proposal is the work of over a dozen staff members from various teams, developed in conjunction with legal and industry experts and with feedback from the Creative Commons community.
Over the past year, interest in AI-generated art has come to the forefront, driven by highly accessible tools that allow AI content to be generated on the fly. Examining AI art, particularly as it has rapidly gained popularity, raises a large number of considerations, ranging from legal to ethical, with varying levels of subjectivity. To this end, several key points about this policy have risen to the top:
- We have no control over third-party image sites or their policies, so we have to make a policy that takes this into consideration, considers bad actors and unsuspecting collaborators, and protects the community for the long haul.
- We need to be sure that we adhere to moral and ethical considerations that match the values of our community, respect artists, and encourage creativity.
- We need to set a good example for ethical considerations in AI art, not just for our community but for everyone involved in this rapidly-changing environment.
- We must adopt a policy that mitigates the impact of unethical behaviors related to AI art generation as much as we can to protect our users and community.
Definitions
Terms used in AI generation
- Generative Models: A generative model is a type of machine learning algorithm that can generate new data samples that are similar to the training data. Generative models can be used to generate new images, text, or other types of data. One example of a popular generative model is the Generative Adversarial Network (GAN).
- Generative adversarial network (GAN): A GAN is a type of generative model that consists of two neural networks: a generator and a discriminator. The generator produces new samples, while the discriminator tries to distinguish between the generated samples and real samples. The generator and discriminator are trained simultaneously, with the goal of producing generated samples that are indistinguishable from real samples.
- Diffusion model: Diffusion models are trained to remove successive applications of Gaussian noise from training images, and can be thought of as a sequence of de-noising autoencoders. In short: the model is trained to turn noise into an image by "clearing up" the image in probabilistic ways. Diffusion models are a separate family from GANs, although some (such as Stable Diffusion) use an adversarial loss when training their image autoencoder component.
- Training Data: Training data is the set of examples used to train a machine learning model. For example, a dataset of images that is used to train a GAN would be considered training data.
- Latent Space: Latent space refers to the internal representation of data within a generative model. In the case of GANs, the generator takes a vector from the latent space as input and uses it to produce a new sample. It is important to note that no "imagery" is stored within these models. The model is trained to "de-noise" in a specific way, using a probabilistic algorithm.
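To make the "noise" side of the diffusion definition above more concrete, here is a minimal sketch of the forward (noising) process only, which is what a diffusion model is trained to undo. It assumes the common formulation in which each step scales the current values and mixes in fresh Gaussian noise; the toy four-value "image", the noise schedule, and the function names are all illustrative assumptions, not part of any specific tool.

```python
import math
import random


def add_noise(pixels, beta, rng):
    """Apply one forward-diffusion step: shrink the signal, mix in Gaussian noise."""
    keep = math.sqrt(1.0 - beta)  # share of the current signal that survives
    mix = math.sqrt(beta)         # share of fresh noise that is mixed in
    return [keep * p + mix * rng.gauss(0.0, 1.0) for p in pixels]


def forward_process(pixels, betas, seed=0):
    """Return the original values followed by each progressively noisier version."""
    rng = random.Random(seed)
    steps = [list(pixels)]
    for beta in betas:
        steps.append(add_noise(steps[-1], beta, rng))
    return steps


# A toy 4-"pixel" image and a short, made-up noise schedule.
image = [0.0, 0.5, 1.0, -1.0]
schedule = [0.1, 0.2, 0.4]
trajectory = forward_process(image, schedule)
```

Training teaches the model to predict and remove the noise added at each step. Generation then starts from pure noise and applies the learned de-noising repeatedly, which is why no source images need to be stored in the model itself.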
Legal and Licensing
As Licensing has previously noted in Licensing discussions about AI art, the US Copyright Office has rejected the copyrightability of AI-generated art. This should make such art compatible with Creative Commons Share-Alike 3.0.[1] It has been noted that in many jurisdictions, visual styles and compositions are not subject to copyright, but intellectual/personality rights may still be enforced.[2]
General legal considerations re: the Site License
The key thing to consider with AI-generated art is whether or not it qualifies as a derivative work. Under copyright law, a derivative work is a work that is based on one or more preexisting works and that recasts, transforms, or adapts those preexisting works. A work that is created using a pre-existing work as a starting point, such as an image created by an AI model that is trained on a dataset of images, could be considered a derivative work.
If AI-generated imagery is used as a standalone work, and not to create any other work, it will be considered an original work, as it is not based on any other existing work and is created by an AI model. Similarly, if AI-generated imagery is used as part of a larger work, it's important to make sure that any images used as input to the AI model are properly licensed for that purpose. In general, in order to comply with the terms of a Creative Commons Share-Alike 3.0 license and avoid any potential copyright issues, it would be best to only use AI-generated art that is either in the public domain or licensed under a compatible Creative Commons license.
It's also important to consider that AI-generated images can be mistaken for real images and cause confusion, especially if they depict real-world people, places or events. It's therefore important to have some kind of disclaimer to let people know that the images they see are AI-generated.
AI Art generated via Stable Diffusion
Stable Diffusion is a deep learning text-to-image model and is the most popular model used in text-to-image generation software. The Stable Diffusion model is open source,[3] and images generated with the model are explicitly released to the person generating the image.[4] Stable Diffusion is trained on LAION-5B, which is licensed under a Creative Commons license.[5]
Other Creative Commons projects
Other major Creative Commons projects, which are often used as a baseline for our licensing policies, have been active in the ongoing discussion of AI content and ethics. Recently they discussed some of their perspectives here. In addition, the Wikimedia Foundation produced a whitepaper on the subject, discussing the ethical implications of AI tools.[6]
History of AI content on the Wiki
Prior to 2022, the subject of AI art came up several times in the Licensing team. In general, the nebulous nature of AI art led to an all-out ban of such content which persisted for a long time.
However, during that time, members of the team continued to research and remain aware of developments in the industry, particularly as they related to Creative Commons implementations. Overall, AI art was becoming readily adopted across multiple projects. It became clear that the Wiki needed to adopt some kind of policy related to AI content, as the lines between what is and is not acceptable became increasingly unclear under current policies.
In 2022, a member of Licensing staff attended the Game Developers Conference in San Francisco, CA; in part, this was to handle several licensing issues. At that time, the staff member met with experts in multiple industries related to AI design and art, conferred with legal professionals, and more. A major goal of these conversations was to establish a baseline of considerations related to AI art, in the event that it was ever to be implemented. Several key points were adopted as a neutral baseline for even the possibility of AI art usage:
- The training model must be compatible with the Site License
- The output must be explicitly released under a compatible License
- The inputs used to generate the image must be explicit and recorded publicly (a footprint)
- The model must not use copyrighted imagery in its prompts
- The image must be generated using a GAN that does not store actual images for reference: the image must be generated from the model plus user inputs, and the model must not include any image raster data in its latent space.[7]
Several solutions were discussed, including DALL-E and others; at the time, Midjourney AI was the only system that met all of these key points. An internal proposal was drafted and put up for consideration. The timing of this "test run" was met with some controversy over a miscommunication related to a user and AI content, but the test moved forward. Articles that used AI-generated imagery were required to disclose this and were kept on record, while the Licensing Team aimed to keep a strong thread of communication with the community.
At this same time, AI systems like Midjourney exploded in popularity, and with this attention came greater consideration of ethical matters. Some systems similar to Midjourney were shown to be abusable, creating imagery that, while legal, stirred controversy over artists' intellectual property rights.[8]
With increasing scrutiny of the policy, the Licensing Team adopted a plan of action to address some lingering policy holes and to create better definitions for what exactly was being talked about, because it was increasingly clear that not everyone had the same definition of AI art.
Community Feedback
Over the last year, and especially in the last several months, multiple groups inside and outside the SCP community were engaged on matters relating to AI. Several discussions occurred in both Staff Chats and community chats, engaging with artists and other members of the community on several important considerations. These were good, open conversations that generated positive and helpful discussion and greatly informed our understanding of what the SCP community as a whole, and communities outside it, care about.
The most significant and oft-repeated considerations generally took the following form:
- Art which is meant to imitate another living artist has no place on the wiki, and is an unethical application of AI tools.
- Our wiki, being mostly a text-based medium of creativity, should disallow using text generated by AI in anything more than a passing manner.
- Art generated using AI must never be misrepresented, or claimed to be the sole work of the person generating the art.
- The data used to train AIs must be ethically-sourced and copyrighted images should not be used as inputs in the generation of any AI art.
Obstacles in an outright ban
It is sometimes proposed that AI imagery, being in such a legally nebulous area, has no place on the Wiki, and an over-arching ban of AI-generated and AI-assisted imagery has been proposed before. The situation is unfortunately more complicated than that, and while a ban was considered, it ultimately had to be rejected for several important reasons:
AI-assisted versus AI-generated
AI-assisted imagery refers to images that have been created or manipulated using artificial intelligence algorithms, but with some level of human input or oversight. This could include using AI to automatically colorize black and white photos, or to generate suggestions for changes to a photograph that a human photographer can then choose to accept or reject.
On the other hand, AI-generated imagery refers to images that have been created entirely by AI, with no human input or oversight. This can include images created by generative models, such as generative adversarial networks (GANs), or images created by other types of AI algorithms, such as evolutionary algorithms. AI-generated images are usually generated through a neural network structure and technique called "Deep Learning" which is a subset of machine learning.
That said, the line between these two concepts is becoming increasingly unclear. Programs like Photoshop and many other image editors utilize both Deep Learning and assistance algorithms with varying levels of human input and oversight.
It's therefore important to strike a balance between protecting the rights of authors and copyright holders and taking advantage of the creative potential of AI-generated art. It's also a good idea to provide a clear policy regarding the use of AI-generated art and imagery, to enforce it consistently, and to provide an easy way for users to report any potential violations.
Lack of disclosures from third parties
While we can enact policy and procedure on our site, we have no control over other sites and their policies, including sites which may already be on our whitelist. If a third-party site includes AI-generated or AI-assisted art, they may not adopt a policy which discloses this. Similarly, an uploader may fail to disclose these things as well. In these cases, the image is compatible with our license but the image would violate a policy. This kind of consideration could put a well-intentioned user into a bind, or cause other unforeseen issues down the road.
Concerns about undisclosed AI art
One concern with banning AI-generated art completely is that it may add additional difficulty in enforcement. Users may still upload it to the wiki without identifying it as such. Without a way for users to self-identify AI-generated images, it can be harder to later identify and remove them if they are in violation of the site license.
Other Considerations
The laws and regulations surrounding AI-generated art are still evolving, and it's difficult to predict how they may change in the future. As a result, it's important to be vigilant and stay informed about any updates or changes in laws and regulations, and to consult with legal experts if necessary, in order to ensure compliance with any applicable laws and regulations. It is therefore necessary that any policy enacted require tagging and record-keeping, even if a policy banning AI-generated art is what is adopted.
Proposal
This is a multi-part proposal. Individual parts of this proposal will be voted on individually.
Part I: Defining and handling AI content in general
For clarity: This first part is independent of any other part's approval or rejection. In other words, if AI-generated art is disallowed altogether, then sections of this part relating to it simply won't have any effect.
Definitions
AI-assisted imagery refers to images that have been created or manipulated using artificial intelligence algorithms, but with some level of human input or oversight. This could include using AI to automatically colorize black and white photos, or to generate suggestions for changes to a photograph that a human photographer can then choose to accept or reject. (As an example, this might include images using filters and cleanup tools in programs like Adobe Photoshop.)
AI-generated imagery refers to images that have been created entirely by AI, with no human input or oversight. This can include images created by generative models, such as generative adversarial networks (GANs), or images created by other types of AI algorithms, such as evolutionary algorithms.
Policy
- AI-assisted imagery does not require a disclosure, though one may be added if the uploader wishes to.
- AI-assisted imagery which is used in such a way that a significant part of the content is created by the assistance tool *may* on a case-by-case basis be determined to be AI-Generated and therefore fall into that class of imagery.
- Any articles including AI-generated imagery must be tagged with "_ai".
- Any articles including AI-generated imagery must include a special template or modification to the License Box to specifically disclose it as such.
- Any AI-generated imagery must be created via approved (whitelisted) tools.
- In the event that an image that has been sourced normally (for example, via a whitelisted public domain web site) is later discovered to be AI-generated, then it must be treated as an AI-generated image; images may be removed at staff discretion.
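As an illustration of how the tagging, disclosure, and whitelist rules above could be checked together, here is a minimal sketch. The `Article` and `Image` structures, the `policy_problems` helper, and the whitelist entry are hypothetical names invented for this example; only the "_ai" tag comes from the policy text, and nothing here reflects actual wiki tooling.

```python
from dataclasses import dataclass, field

AI_TAG = "_ai"  # the tag named in the policy above

# Hypothetical whitelist: the proposal does not list approved tools here.
APPROVED_TOOLS = {"example-generator"}


@dataclass
class Image:
    name: str
    ai_generated: bool = False
    tool: str = ""  # generator used, if any


@dataclass
class Article:
    tags: set = field(default_factory=set)
    has_ai_disclosure: bool = False  # disclosure present in the License Box
    images: list = field(default_factory=list)


def policy_problems(article, approved=APPROVED_TOOLS):
    """List the Part I rules an article with AI-generated images fails to meet."""
    ai_images = [img for img in article.images if img.ai_generated]
    if not ai_images:
        return []  # the rules only apply when AI-generated imagery is present
    problems = []
    if AI_TAG not in article.tags:
        problems.append('missing "_ai" tag')
    if not article.has_ai_disclosure:
        problems.append("missing AI disclosure in license box")
    for img in ai_images:
        if img.tool not in approved:
            problems.append(f"{img.name}: tool not on whitelist")
    return problems
```

An image later discovered to be AI-generated would simply have `ai_generated` set to true, after which the same checks apply. Whether an AI-assisted image counts as AI-generated remains a case-by-case staff decision that a check like this cannot make.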
AI Art Subteam
The Licensing Team must run a subteam focused on handling AI art, supporting ethical use of tools, making determinations, and removing imagery which violates policies on AI art.
This team should work with MAST, Tech, and even Disciplinary to follow site policy on AI-generated art.
Validity of downvotes
It is valid to downvote a work for including AI content.
Due Diligence
Users should make reasonable effort to determine if images they have sourced for usage are AI-generated.
Users should understand that AI art of any kind is subject to removal (as any image on the site may be).
AI-generated art should not be included in user art galleries.
Users may not falsely claim to have made art that is AI-generated.
Part II: Allowing AI-generated text
Articles primarily composed of AI-generated text are not allowed. AI-generated text may be used in a supplementary fashion as long as the significant majority of the content is written by a human. The use of tools which may use AI-adjacent scripting in their operation (such as Grammarly) to assist with grammar, critique, translation and idea generation is also allowed.
Staff may have to determine on a case-by-case basis whether a specific article should be considered AI-generated. Some existing works which have used AI-generated text may be allowed as well, at staff discretion.
Part III: Allowing AI-generated art
Disclosure and tag requirement
All AI-generated art included in an article must be disclosed in the license box. The Licensing Team will create an appropriate template to use for this, which indicates:
- Which image generator generated the image
- Which keywords were used in image generation
- A link to the record of the generation instance.
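The three disclosure fields listed above could be captured in a small record like the following sketch. The class name, field names, output wording, and example values are assumptions made for illustration; the actual template would be defined by the Licensing Team.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AIDisclosure:
    generator: str    # which image generator produced the image
    keywords: tuple   # keywords used in image generation
    record_url: str   # link to the record of the generation instance

    def is_complete(self):
        """True only when all three required fields are filled in."""
        return bool(self.generator.strip() and self.keywords and self.record_url.strip())

    def render(self):
        """Produce plain-text lines suitable for pasting into a license box."""
        return (
            f"Generator: {self.generator}\n"
            f"Keywords: {', '.join(self.keywords)}\n"
            f"Generation record: {self.record_url}"
        )


# Example values are placeholders, not a real generator or record.
disclosure = AIDisclosure(
    generator="example-generator",
    keywords=("abandoned hallway", "fog"),
    record_url="https://example.org/records/123",
)
print(disclosure.render())
```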
Restrictions on generators
In order to be added to the generator whitelist defined in Part I, an AI generator:
- must use a model which is compatible with our license.
- must explicitly confer IP rights to the person generating the image allowing it to be sublicensed under CC BY-SA 3.0.
- must allow individual instances of image generation to have a unique record (sometimes called a "footprint" or "breadcrumb") which confirms the key words used and related metadata.
- must utilize either a generative adversarial network (GAN), a human-trained diffusion model, or a combination of both systems.
Individual "versions" of these generators may need to be reviewed separately, if applicable.
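One way to picture a whitelist review is as a checklist with one entry per criterion above, filled in per generator version. The sketch below assumes reviewers record each criterion as a simple yes/no; the `GeneratorReview` class, its field names, and the architecture labels are hypothetical.

```python
from dataclasses import dataclass

# Hypothetical labels for the architectures the criteria allow.
ALLOWED_ARCHITECTURES = {"gan", "diffusion", "gan+diffusion"}


@dataclass
class GeneratorReview:
    name: str
    version: str                     # versions may need separate review
    model_license_compatible: bool   # model is compatible with the site license
    confers_ip_rights: bool          # output can be sublicensed under CC BY-SA 3.0
    has_generation_record: bool      # each generation leaves a "footprint"
    architecture: str

    def failed_criteria(self):
        """Return a description of each whitelist criterion this version fails."""
        failures = []
        if not self.model_license_compatible:
            failures.append("model is not compatible with the site license")
        if not self.confers_ip_rights:
            failures.append("does not confer IP rights to the user")
        if not self.has_generation_record:
            failures.append("no unique record per generation")
        if self.architecture not in ALLOWED_ARCHITECTURES:
            failures.append("architecture is not a GAN, diffusion model, or both")
        return failures

    def eligible(self):
        return not self.failed_criteria()
```

Keeping the version on the record reflects the note above that a later release of an already approved generator may need its own review.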
Restrictions on content generated
The resulting image:
- must not include copyrighted characters, locations, logos or designs
- must not be generated using words which are known to bias generation toward living artists (e.g., "in the style of"), or in a way which could reasonably imitate another living artist's work.
- must not be used for the creation of content related to SCP-173.
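Because the generation keywords are recorded, the wording restriction above lends itself to a simple first-pass screen. The sketch below assumes a short phrase list; apart from "in the style of", which the policy names, the phrases are invented examples. A screen like this cannot detect copyrighted characters or imitation achieved by other means, so it would only flag prompts for human review.

```python
# Only the first phrase comes from the policy text; the rest are hypothetical.
STYLE_PHRASES = ("in the style of", "in the manner of", "drawn by")


def flagged_phrases(prompt, phrases=STYLE_PHRASES):
    """Return the style-biasing phrases found in a generation prompt."""
    # Lowercase and collapse whitespace so spacing and case do not hide a match.
    normalized = " ".join(prompt.lower().split())
    return [phrase for phrase in phrases if phrase in normalized]


print(flagged_phrases("A foggy hallway, In The  Style of Jane Doe"))
print(flagged_phrases("a foggy hallway, dim lighting"))
```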
Use of AI art in derivative works
AI-generated content may be used in a derivative fashion as long as it complies with the above requirements, where the details are known. Example: using an AI-generated carpet design in a larger image of a hotel room. The (AI) image being derived from must be disclosed.
If only used for inspiration, disclosure is encouraged but not required.
This discussion will expire in one week.