Attribution in the GenAI era: I used AI to make it
WHITEPAPER
I used AI to make it
Navigating copyright infringement and attribution in the GenAI era
Generative AI (GenAI) has forever changed how humans deploy (and interact with) AI systems. Its capabilities to generate human-like content, interpret user prompts, and respond creatively have unlocked an exciting new landscape of possibilities. From assisting in content creation to simulating human-like conversations, its vast potential has brought about remarkable improvements in productivity and efficiency.
However, the rapid rise of GenAI has surfaced ethical concerns about attribution and copyright. As AI-generated content becomes more prevalent, distinguishing original creations from AI-assisted creations becomes challenging, prompting questions about authorship, intellectual property, and equitable use. Addressing these issues is crucial for content creators' rights, transparency, and accountability in AI-generated content. Businesses must navigate these challenges with integrity and discernment in pursuing ethical innovation.
Understanding attribution and copyright
In the context of GenAI, attribution refers to acknowledging and honoring the creators and contributors behind the generated content. Copyright, on the other hand, grants legal protection to original works, protecting the rights and privileges of content creators. Both attribution and copyright are vital to creating an environment that respects intellectual property (IP), encourages creativity, and ensures fair recognition.
Why humans are slow to attribute work to GenAI
The distinction between human-generated and AI-generated content is progressively fading. Creators worldwide face the dilemma of determining when to embrace GenAI and when to exercise caution, as content can be categorized as real or fake, authentic or manufactured, credible or plagiarized, and human or non-human. But before we dive into the legal aspects of the topic, it’s essential to understand why and how attribution is becoming a problem in the first place.
Behavioral science offers fascinating insights into why people may increasingly rely on Gen AI tools for content creation without proper credit or attribution. Numerous factors, such as cognitive biases and social influence, underlie this inclination, shaping human behavior and decision-making.
© 2023 Fractal Analytics Inc. All rights reserved
Four key behavioral factors can influence why there is a lack of attribution for GenAI use:
Convenience bias
This bias may drive people to use Gen AI tools as they can significantly simplify tasks that would otherwise require considerable time and cognitive effort. In the race to be more productive or creative, individuals may overlook or undervalue the role of original creators, focusing instead on the immediate utility the AI offers.
Social influence
The "bandwagon effect," for instance, could lead people to adopt Gen AI tools because others in their social or professional circles are doing so. This social proof often acts as a validation of the tool's usefulness. Still, it can also make individuals less critical of the ethical and legal considerations involved, such as copyright infringement or attribution.
The Dunning–Kruger effect
This effect posits that people with limited knowledge overestimate their grasp of complex subjects. In the Gen AI domain, users may mistakenly believe they fully understand the intricate legal implications of using AI-generated content. Unfortunately, this overconfidence can lead to neglect or inadvertent disregard for critical issues such as attribution and copyright, resulting in undesirable consequences.
Not-invented-here syndrome
This is where individuals and organizations often undervalue innovations and creations not directly linked to them. It can manifest as a reluctance to appropriately credit original data sources when utilizing Gen AI tools, stemming from a cognitive dissonance regarding external contributions to what they perceive as "their" project.
Interestingly, studies across groups of students, entrepreneurs, leaders, and employees demonstrate that the higher the person's creativity, the higher the chance of them engaging in unethical practices. Given our behavioral tendencies, the future of attribution and copyright in GenAI should include parameters and guidelines on best practices of giving due credit.
Unfortunately, this is not as straightforward as we think. Under traditional copyright laws, the creator of a work is typically considered the copyright owner. But there are many issues around GenAI attribution, and none of these challenges can be clarified in a black-and-white manner.
Take, for example, the images below. Crafted by complex algorithms that use extensive training data sets, these images raise legitimate questions about eligibility for copyright protection. The consensus leans toward requiring substantial human involvement for a work to qualify for copyright. However, the debate is far from settled. Some argue that the AI should be considered the creator, thereby granting copyright ownership to the AI owner. Others contend that the human programmer who developed the AI model should be recognized as the creator and copyright owner. Still others suggest that the humans who contributed to the training data preparation should be acknowledged as the creators. This murky landscape underscores the need for well-defined guidelines to help people navigate the complex interplay between Gen AI and copyright laws.
Source: Princess. Painting generated by Dall-e.
Source: Footballers. Painting generated by Dall-e.
Whom will you credit for these images?
The question of ownership
To further complicate the matter, laws and regulations protecting IP and governing copyright infringement can be influenced by many additional factors, including country-specific law. Applying copyright laws to Gen AI poses unique challenges, as different countries have different approaches to the copyright protection of AI-generated works, and others have no law regarding GenAI works at all.
In the United States, copyright laws do not extend protection to works created solely by a machine. However, copyright protection may be granted if substantial human involvement can be demonstrated in the creation process.
Similarly, Japanese legislation does not expressly cover concerns related to AI-generated creations. However, the Japanese Copyright Act does encompass safeguards for derivative creations, which involve works founded upon or modified from pre-existing works. It remains plausible that a creation produced by AI might be classified as a derivative work, consequently qualifying for copyright protection.
Finally, within the European Union, the advent of Gen AI has unsettled the lawmaking trajectory for the proposed AI Act, prompting a re-evaluation of how responsibilities are allocated to AI system providers and users. While the AI Act doesn't solely pertain to copyright legislation, EU legislators are currently contemplating the imposition of a mandate on providers of Gen AI systems. This mandate would require providers to publish a summary of their use of training data that is subject to copyright protection.
To complicate the matter even further, not all AI-generated content falls under the purview of copyright law. For example, brief AI-generated phrases, produced regularly, typically lack the elements necessary for copyright protection. As such, they are generally exempt from the constraints of open-source licenses.
Sorting Human from Machine Content
GenAI attribution tools can differentiate between AI-generated and human-generated content, which can be extremely important in creating clarity within areas such as intellectual property. But at the heart of GenAI attribution is the idea of crediting the creators and owners of the data used to train the models and to shape the content that GenAI tools output. As previously mentioned, the implications of ownership and copyright infringement within GenAI are legally not black and white. However, regulation will soon be implemented, giving attribution models more direction in adhering to requirements.
Current models within AI detection
Text: OpenAI classifier; RoBERTa large OpenAI detector; Giant language model test room
Image: Vision transformer model; GAN detector; Stable attribution (rendered out of use by legal issues)
Video: Microsoft video authenticator
One example model uses a classifier to give the text a label, “real” or “fake”, and an associated percentage score. An output is displayed below.
Fractal GenAI Text Detector
This tool uses the RoBERTa model to detect text written by generative AI. It is best at detecting text generated by GPT-2.
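To make the "label plus percentage score" output concrete, here is a minimal sketch of how a two-class detector's raw scores can be turned into a label and a confidence percentage using a softmax. It is an illustration under stated assumptions, not the implementation of the tool described above: the function name `label_with_score`, the label order, and the example scores are all hypothetical.

```python
import math

def label_with_score(logits, labels=("fake", "real")):
    """Turn raw two-class scores into a (label, percentage) pair.

    The label order is an assumption for this sketch; a real detector
    defines its own mapping from output index to label.
    """
    # Softmax, subtracting the maximum first for numerical stability.
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    probs = [e / total for e in exps]

    # Report the most probable label and its probability as a percentage.
    best = max(range(len(probs)), key=probs.__getitem__)
    return labels[best], round(100 * probs[best], 1)

# Hypothetical raw scores for one passage; not output from any real model.
label, score = label_with_score([2.0, 0.0])
print(f"{label}: {score}%")
```

A score like this expresses the model's confidence in its label, so it is best read as a signal for human review rather than as proof of authorship.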
Additional Tools
Human evaluation
The most proficient method for identifying attribution and copyright concerns within GenAI's output remains human assessment. A human reviewer can thoroughly examine the content and pinpoint any resemblances to copyrighted materials.
Machine learning approach
Leveraging machine learning, we can formulate models designed to identify instances of copyright violation. By training these models on a data set comprising established copyrighted materials, they can subsequently be employed to scan novel works for potential infringement.
Statistical examination
Statistical analysis serves as an effective means to unveil text patterns indicative of copyright infringement. This methodology detects works likely to be derivative of others, even when no exact matches are present.
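As one illustration of statistical examination, the sketch below compares two texts by the overlap of their word n-grams (Jaccard similarity), which can flag close resemblance even when the texts are not identical. This is a swapped-in, simplified technique chosen for illustration, not a method named in this paper; the function names and sample sentences are hypothetical, and a high score indicates similarity only, not a legal finding of infringement.

```python
def shingles(text, n=3):
    """Return the set of word n-grams (shingles) in text, lowercased."""
    words = text.lower().split()
    return {tuple(words[i:i + n]) for i in range(len(words) - n + 1)}

def jaccard_similarity(a, b, n=3):
    """Fraction of word n-grams shared by two texts, from 0.0 to 1.0."""
    sa, sb = shingles(a, n), shingles(b, n)
    if not sa or not sb:
        # Texts shorter than n words have no shingles to compare.
        return 0.0
    return len(sa & sb) / len(sa | sb)

# Hypothetical sample texts for illustration only.
reference = "the quick brown fox jumps over the lazy dog"
candidate = "the quick brown fox leaps over the lazy dog"
unrelated = "copyright law protects original works of authorship"

print(jaccard_similarity(reference, candidate))
print(jaccard_similarity(reference, unrelated))
```

In practice, a reviewer would pick a similarity threshold suited to the material and pass anything above it to human evaluation.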
The other side of copyright infringement
IP ownership, copyright, and attribution are not just matters of determining who owns the work generated by AI; they also involve the complex legal implications of the data that goes into training these models. As companies progressively incorporate Gen AI tools such as GitHub Copilot and ChatGPT into their workflows, the question of copyright within training data gains paramount importance.
Copyright law forbids the unauthorized replication of copyrighted content and the development of derivative works without proper authorization. When Gen AI models are trained on copyrighted data, enterprises must ask a crucial question: Is the output generated by the AI to be considered a derivative work? If so, it becomes essential for companies to diligently adhere to all licensing obligations tied to the underlying data or code.
This challenge also extends to the very architecture of machine learning models. If models trained on copyrighted data could be considered derivative works, it may necessitate an open-source approach, introducing additional complexity to organizations' model handling and deployment strategies.
In the age of AI-powered content creation, enterprises must prioritize adherence to copyright laws, intellectual property rights, and regulatory compliance. Weighing these considerations is necessary to avert potential legal repercussions and damage to a company's reputation. Recent high-profile cases underscore the urgency of addressing copyright concerns in training data.
Case 1
Comedian Sarah Silverman and authors Christopher Golden and Richard Kadrey filed copyright infringement lawsuits against OpenAI and Meta Platforms. They allege that the companies used their copyrighted books without permission to train AI models such as ChatGPT. This case underscores the significance of upholding copyright regulations and securing appropriate permissions when incorporating content into GenAI applications.
Case 2
Microsoft, GitHub, and OpenAI are currently embroiled in a class action lawsuit alleging that they have violated copyright law by permitting the use of Copilot. This Gen AI tool purportedly replicates copyrighted content without proper authorization. This lawsuit highlights the ongoing legal challenges surrounding the use of Gen AI and the need for clarity in copyright law.
Conclusion
In the age of Gen AI, the intricate dynamics of copyright infringement and attribution come into sharper focus, presenting formidable challenges to the established legal and ethical frameworks that underpin the business landscape. As AI technologies seamlessly merge with content creation processes, it becomes paramount for enterprises and industry professionals to navigate this multifaceted terrain adeptly. This journey entails confronting issues such as the rightful ownership of derivative works and the ethical use of copyrighted training data. As the momentum behind GenAI adoption surges, the need for meticulously crafted guidelines and educational initiatives tailored to the unique demands of businesses becomes increasingly critical. Neglecting to undertake this crucial endeavor exposes your enterprise to legal vulnerabilities and erodes the ethical bedrock upon which the edifice of creativity and innovation firmly rests.
Authors
Supriya Panigrahi Consultant, Strategic Center
Swasti Acharya Design consultant, Strategic Center
Sray Agarwal Principal Consultant, Strategic Center
About Fractal
Fractal is one of the most prominent providers of Artificial Intelligence to Fortune 500® companies. Fractal's vision is to power every human decision in the enterprise, and bring AI, engineering, and design to help the world's most admired companies. Fractal's businesses include Crux Intelligence (AI driven business intelligence), Eugenie.ai (AI for sustainability), Asper.ai (AI for revenue growth management) and Senseforth.ai (conversational AI for sales and customer service). Fractal incubated Qure.ai, a leading player in healthcare AI for detecting Tuberculosis and Lung cancer. Fractal currently has 4000+ employees across 16 global locations, including the United States, UK, Ukraine, India, Singapore, and Australia. Fractal has been recognized as 'Great Workplace' and 'India's Best Workplaces for Women' in the top 100 (large) category by The Great Place to Work® Institute; featured as a leader in Customer Analytics Service Providers Wave™ 2021, Computer Vision Consultancies Wave™ 2020 & Specialized Insights Service Providers Wave™ 2020 by Forrester Research Inc., a leader in Analytics & AI Services Specialists Peak Matrix 2022 by Everest Group and recognized as an 'Honorable Vendor' in 2022 Magic Quadrant™ for data & analytics by Gartner Inc.
For more information, visit fractal.ai
Corporate Headquarters Suite 76J, One World Trade Center, New York, NY 10007