Generative AI Risk Mitigation

By Gideon T. Rasmussen, CISSP, CRISC, CISA, CISM, CIPP
April 3, 2023

Here are my recommendations for leveraging the benefits of chatbots while mitigating the risks. This article includes five risk scenarios and seven tips to reduce the risk of using chatbots in a work environment.

Types of Generative AI

Generative AI refers to a category of AI algorithms that generate new outputs based on the data they have been trained on. This category includes, but is not limited to:

• AI Chatbots: AI Chatbots are designed to provide information and answer questions through a conversational interface. Chatbots generate written content and software code. Examples of AI Chatbots include ChatGPT, Bing Chat and Google Bard.

• AI Art Generators: AI Art Generators create works of art based on input from a user. They can create paintings, drawings and pieces of music. Examples of AI Art Generators include DALL-E, Simplified and Starryai.

Risks

AI Hallucinations: Generative AI relies on data from the Internet, which can include untruths as well as explicit and implicit bias. Training data is collected at a point in time, so Chatbot output may be out of date. Data quality issues in the training data can cause a Chatbot to output convincing language that is wrong or false. That tendency to confidently produce inaccurate responses is referred to as a “hallucination”, which can include irrelevant, nonsensical or factually incorrect answers.

Intellectual Property Violations: Generative AI leverages information that may be protected by copyrights, patents, trademarks and similar legal protections, so its output can infringe those protections. Generative AI is trained on a large amount of text-based data and images, typically scraped from the Internet. Other sources such as scientific research, books or social media posts may also be used. Colleges use technology to detect plagiarism, and a similar approach can be used to detect potential intellectual property infringement.
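Commercial plagiarism checkers use proprietary, more sophisticated methods. As a minimal sketch of the underlying idea, one common technique breaks texts into overlapping word sequences (“shingles”) and measures how many they share. The function names, the five-word shingle size and the 0.3 threshold below are illustrative assumptions, not calibrated values, and a high score marks text worth reviewing, not a legal finding of infringement.

```python
import re


def shingles(text: str, k: int = 5) -> set:
    """Return the set of k-word shingles (overlapping word sequences) in text."""
    words = re.findall(r"[a-z0-9']+", text.lower())
    if len(words) < k:
        # Short texts become a single shingle so they can still be compared.
        return {tuple(words)} if words else set()
    return {tuple(words[i:i + k]) for i in range(len(words) - k + 1)}


def jaccard_similarity(a: str, b: str, k: int = 5) -> float:
    """Shared shingles divided by total distinct shingles (0.0 to 1.0)."""
    sa, sb = shingles(a, k), shingles(b, k)
    if not sa or not sb:
        return 0.0
    return len(sa & sb) / len(sa | sb)


def flag_possible_copying(candidate: str, protected_works: list, threshold: float = 0.3) -> list:
    """Return (index, score) for each protected work whose similarity meets the threshold."""
    flagged = []
    for index, work in enumerate(protected_works):
        score = jaccard_similarity(candidate, work)
        if score >= threshold:
            flagged.append((index, round(score, 2)))
    return flagged
```

Comparing shingles rather than single words means that ordinary shared vocabulary scores low, while copied passages score high.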

Data Leakage: When a person types a query into a Chatbot, that information is visible to the service provider and may be shared with third parties. The service provider’s terms of use and privacy policy provide additional details. Chatbot queries stored online may be exposed through a data breach or accidental disclosure on the Internet.

Software Integration: Access to Generative AI services can be added to existing software through a plugin, web browser extension or an Application Programming Interface (API). Data sent through these integrations is visible to the service provider, so they may leak sensitive information. Refer to the Data Leakage section above.

Downloading Fake AI Software: Threat actors are capitalizing on the popularity of Generative AI to distribute malware. Advertising campaigns promote websites where the “latest version” of ChatGPT software can supposedly be downloaded for free. This is just one example; others are likely to follow.

Guidance

1. Do not trust AI output to be accurate.
• Be skeptical as you use Chatbots. They are not sentient or conscious. Remember AI can “hallucinate” and produce output that is false or misleading.
• Only use a Chatbot for a subject or domain in which you have some familiarity or expertise.
• Critically review all Chatbot output for quality and accuracy. This includes text-based output, software code, calculations and formulas.
• AI-generated content should receive a second-level quality assurance review before being used for any purpose that may have business impact.
• Do not rely solely on Chatbot content when making decisions. Always validate Generative AI content against at least one reputable source of information.
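For software code, calculations and formulas, one practical form of validation is to test AI output against known answers taken from an independent source. In this minimal sketch, `is_leap_year` stands in for hypothetical Chatbot-suggested code and `validate` is an illustrative helper; the reference cases come from the Gregorian calendar rules, not from the Chatbot.

```python
# Hypothetical function as it might be suggested by a Chatbot.
def is_leap_year(year: int) -> bool:
    return year % 4 == 0 and (year % 100 != 0 or year % 400 == 0)


# Reference cases from an independent, reputable source (Gregorian calendar rules).
KNOWN_CASES = {1900: False, 2000: True, 2023: False, 2024: True, 2100: False}


def validate(func, cases: dict) -> list:
    """Return (input, expected, actual) for every reference case the function gets wrong."""
    failures = []
    for arg, expected in cases.items():
        actual = func(arg)
        if actual != expected:
            failures.append((arg, expected, actual))
    return failures


if __name__ == "__main__":
    problems = validate(is_leap_year, KNOWN_CASES)
    print("All reference cases passed" if not problems else f"Failures: {problems}")
```

A plausible-looking but naive version that only checks `year % 4 == 0` would fail the 1900 and 2100 cases, which is the kind of confident but incorrect output described under AI Hallucinations.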

2. Corroborate research or analysis conducted by Generative AI.
• Legitimate use cases for Generative AI include fraud detection, risk assessment and investment research.
• Critically evaluate the trustworthiness of AI analysis against at least one source of corroborating evidence.

3. Do not enter sensitive data into Chatbot queries.
• This restriction includes but is not limited to Personally Identifiable Information (PII), Protected Health Information (PHI), payment information, intellectual property and strategic planning information.
• Do not submit queries that would cause issues if they were made public.
• Be mindful that information may be inferred through aggregation across multiple queries using the same login.
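Automated screening can catch some sensitive data that is pasted into a query by accident, but it supplements rather than replaces the rule above. The sketch below assumes queries pass through a script before submission; the patterns are illustrative, cover only a few U.S.-style formats, and will miss names, addresses, health details and strategic information.

```python
import re

# Illustrative patterns for a few common U.S.-style formats. They will miss many
# kinds of sensitive data (names, addresses, health details, strategy documents).
PATTERNS = {
    "EMAIL": re.compile(r"\b[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}\b"),
    "SSN": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
    # 13 to 16 digits, optionally separated by single spaces or hyphens.
    "CARD": re.compile(r"\b\d(?:[ -]?\d){12,15}\b"),
}


def redact(query: str) -> tuple:
    """Replace pattern matches with placeholders; return (redacted_text, labels_found)."""
    found = []
    for label, pattern in PATTERNS.items():
        query, count = pattern.subn(f"[{label} REDACTED]", query)
        if count:
            found.append(label)
    return query, found
```

A non-empty list of labels signals that the query should be reviewed before it is sent.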

4. Do not use AI Art Generators for Internet or customer-facing communications.
• Generative AI art may infringe upon legal protections such as copyrights, which could result in financial penalties and reputational damage.

5. Do not attempt to download or install AI software.
• Remember that Generative AI typically operates from an Internet-based website rather than as stand-alone software that is downloaded and installed.
• Be mindful that software downloaded from the Internet may contain malicious code. Only download software from reputable sources.
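When software must be downloaded, some vendors publish a SHA-256 checksum that can be compared with the downloaded file to confirm it has not been altered. This minimal sketch assumes such a checksum is available from the vendor’s official website; a checksum copied from the same lookalike site that served a malicious download offers no protection.

```python
import hashlib


def sha256_of_file(path: str, chunk_size: int = 65536) -> str:
    """Compute a file's SHA-256 digest, reading in chunks so large installers fit in memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_published_checksum(path: str, published_hex: str) -> bool:
    """True if the file's digest equals the checksum published on the vendor's official site."""
    return sha256_of_file(path) == published_hex.strip().lower()
```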

6. Understand the risks of integrating Generative AI into existing software.
• It may be tempting to "add AI" to existing software through a plugin, web browser extension or an Application Programming Interface (API).
• Carefully consider how sensitive information may be exposed. Refer to the Data Leakage section above.
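One way to contain this risk is to route every request to an external AI service through a single internal gateway that can refuse restricted content and keep an audit log. This is a minimal sketch, not a vendor SDK: the real service call is assumed to be passed in as a function, and `make_gateway`, `BLOCKED_TERMS` and the example terms are hypothetical.

```python
import logging
import re
from typing import Callable, List

logger = logging.getLogger("ai_gateway")

# Hypothetical examples of terms that should never leave the organization.
BLOCKED_TERMS = ["Project Falcon", "hr.internal.example.com"]


def make_gateway(send: Callable[[str], str], blocked_terms: List[str]) -> Callable[[str], str]:
    """Wrap the function that calls the external AI service with a content check and logging."""
    pattern = (
        re.compile("|".join(re.escape(t) for t in blocked_terms), re.IGNORECASE)
        if blocked_terms
        else None
    )

    def guarded_send(prompt: str) -> str:
        match = pattern.search(prompt) if pattern else None
        if match:
            # Refuse before any data is transmitted; log the reason for review.
            logger.warning("Blocked outbound prompt containing %r", match.group(0))
            raise PermissionError("Prompt contains restricted content and was not sent.")
        logger.info("Forwarding prompt of %d characters", len(prompt))
        return send(prompt)

    return guarded_send
```

Centralizing requests in one place makes it easier to apply controls and review usage consistently across plugins, extensions and API integrations.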

7. When in doubt, contact your IT department for guidance and support.
• Generative AI is an emerging technology that is rapidly evolving.
• It is good practice to reach out to your IT team with questions or concerns pertaining to the use of Generative AI technology.

Definitions

Application Programming Interface (API): An API is code that enables two software programs to communicate. An API defines how a developer should request services from an operating system (OS) or other application and expose data within different contexts and across multiple channels.

Artificial General Intelligence (AGI): The essence of intelligence is the principle of adapting to the environment while working with insufficient knowledge and resources. Accordingly, an intelligent system should rely on finite processing capacity, work in real time, be open to unexpected tasks, and learn from experience. This working definition interprets “intelligence” as a form of “relative rationality”.

Generative Artificial Intelligence (AI): Generative AI can produce various types of content including text, software code, imagery, audio and synthetic data. Guidance and directives are necessary to leverage its benefits and to protect against risks.

Generative Pre-Trained Transformer (GPT): GPT is a neural network machine learning model trained on large amounts of data from the Internet to generate text. GPT models are a type of large language model (LLM) designed to simulate human communication.

Personally Identifiable Information (PII): Information that can be used to distinguish or trace an individual’s identity—such as name, social security number, biometric data records—either alone or when combined with other personal or identifying information that is linked or linkable to a specific individual (e.g., date and place of birth, mother’s maiden name, etc.).

Protected Health Information (PHI): Individually identifiable health information, held or maintained by a covered entity or its business associates acting for the covered entity, that is transmitted or maintained in any form or medium (including the individually identifiable health information of non-U.S. citizens). This includes identifiable demographic and other information relating to the past, present, or future physical or mental health or condition of an individual, or the provision or payment of health care to an individual that is created or received by a health care provider, health plan, employer, or health care clearinghouse. Genetic information is considered to be health information.

Synthetic Data: Synthetic data is information that's artificially manufactured rather than generated by real-world events. Synthetic data is created algorithmically, and it is used as a stand-in for test datasets of production or operational data, to validate mathematical models and, increasingly, to train machine learning models.
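Synthetic data can stand in for real records when testing a Chatbot or an AI integration, so no actual PII or PHI is exposed. A minimal sketch, assuming a simple customer record; the field names, value ranges and name list are hypothetical and do not reproduce the statistical properties of any real dataset.

```python
import random
from dataclasses import dataclass
from typing import List


@dataclass
class CustomerRecord:
    customer_id: str
    first_name: str
    account_balance: float


# Hypothetical values; real synthetic data generators model the statistical
# properties of production data, which this sketch does not attempt.
FIRST_NAMES = ["Alex", "Jordan", "Taylor", "Morgan", "Casey"]


def synthetic_customers(n: int, seed: int = 42) -> List[CustomerRecord]:
    """Generate n fake customer records; the fixed seed makes output reproducible."""
    rng = random.Random(seed)
    return [
        CustomerRecord(
            customer_id=f"TEST-{i:05d}",
            first_name=rng.choice(FIRST_NAMES),
            account_balance=round(rng.uniform(0, 10_000), 2),
        )
        for i in range(n)
    ]
```

The fixed seed lets a test be rerun with identical data, and the "TEST-" prefix keeps synthetic records easy to distinguish from production identifiers.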

References

What is Generative AI? Everything You Need to Know - TechTarget
https://www.techtarget.com/searchenterpriseai/definition/generative-AI

What is Generative AI? An AI Explains - World Economic Forum
https://www.weforum.org/agenda/2023/02/generative-ai-explain-algorithms-work

Potential Bias in AI Consumer Decision Tools Eyed by FTC, CFPB - Bloomberg Law
https://news.bloomberglaw.com/tech-and-telecom-law/potential-bias-in-ai-consumer-decision-tools-eyed-by-ftc-cfpb

OpenAI Rolls out ChatGPT Plugins, Granting Iffy Language Model Access to Your Apps - The Register
https://www.theregister.com/2023/03/26/openai_chatgpt_plugins

ChatGPT and Large Language Models: What's the Risk? - UK National Cyber Security Centre
https://www.ncsc.gov.uk/blog-post/chatgpt-and-large-language-models-whats-the-risk

Legal Doomsday for Generative AI ChatGPT if Caught Plagiarizing or Infringing, Warns AI Ethics and AI Law - Forbes
https://www.forbes.com/sites/lanceeliot/2023/02/26/legal-doomsday-for-generative-ai-chatgpt-if-caught-plagiarizing-or-infringing-warns-ai-ethics-and-ai-law/?sh=423b11f9122b

Stable Diffusion Copyright Lawsuits Could be a Legal Earthquake for AI - Ars Technica
https://arstechnica.com/tech-policy/2023/04/stable-diffusion-copyright-lawsuits-could-be-a-legal-earthquake-for-ai

What Makes A.I. Chatbots Go Wrong? - The New York Times
https://www.nytimes.com/2023/03/29/technology/ai-chatbots-hallucinations.html

What is GPT? - KDnuggets
https://www.kdnuggets.com/2023/03/gpt4-everything-need-know.html

ChatGPT Tied to Samsung’s Alleged Data Leak - Cybernews
https://cybernews.com/news/chatgpt-samsung-data-leak

ChatGPT is Being Used to Lure Victims into Downloading Malware - TechRadar
https://www.techradar.com/news/chatgpt-is-being-used-to-lure-victims-into-downloading-malware

The Next Wave of Disruption in Financial Services: Generative AI and Large Language Models - SEI
https://www.seic.com/pulse-future/next-wave-disruption-financial-services-generative-ai-and-large-language-models

Introducing BloombergGPT, Bloomberg’s 50-Billion Parameter Large Language Model, Purpose-Built from Scratch for Finance - Bloomberg
https://www.bloomberg.com/company/press/bloomberggpt-50-billion-parameter-llm-tuned-finance

Corroborating Information from Multiple Sources by Minji Wu - Rutgers University
https://rucore.libraries.rutgers.edu/rutgers-lib/51498/PDF/1/play

What is an API? | Definition from TechTarget
https://www.techtarget.com/searchapparchitecture/definition/application-program-interface-API

A Burp Suite Extension to Add OpenAI to Burp to Help You with Your Bug Bounty Recon! - GitHub
https://github.com/hisxo/ReconAIzer

Journal of Artificial General Intelligence Special Issue “On Defining Artificial Intelligence” - Artificial General Intelligence Society (AGIS)
https://alumni.media.mit.edu/~kris/ftp/JAGI%20Special%20Issue%20On%20Defining%20Artificial%20Intelligence%202020.pdf

Personally Identifiable Information - NIST Glossary of Terms
https://csrc.nist.gov/glossary/term/personally_identifiable_information

What Health Information Is Protected by the Privacy Rule? - U.S. Department of Health & Human Services
https://privacyruleandresearch.nih.gov/pr_07.asp

What is Synthetic Data? | Definition from TechTarget
https://www.techtarget.com/searchcio/definition/synthetic-data