Let Me Be Clear: It’s Never OK to Enter Customer Data Into ChatGPT
Since when has it been “OK”, let alone “innovative”, to hand over emails from your European customers, full of personal data, to an LLM (Large Language Model) server in the US?! Our news feeds might be full of sketchy “AI experts” extolling its virtues, but that doesn’t mean we should take their word for it. I, for one, certainly wouldn’t.
Fortunately, we now have the first customer projects in which a “private” LLM is in use and in which it is ensured that the data is protected.
Below are some of the risks companies face when using LLM tools, as presented and discussed in several expert panels at the Gartner D&A Summit in London this year:
Inaccurate Responses, if Not Plain Made Up!
Simply put, the most widespread issue is probably that LLM tools often generate false information, with the precise nature of that inaccuracy varying from answer to answer. In some cases, these tools provide part-truths as a response. Google’s AI chatbot Bard, for example, recently made headlines for falsely claiming that the James Webb Space Telescope captured the first images of exoplanets.
Plus, LLMs are only trained using data up to a specific date (ChatGPT relies on data up to 2021), limiting the relevance of these tools at the time of writing. While future LLMs will surely be trained using more recent data, they still won’t be without informational limitations.
As well as inaccuracies, LLM tools are also prone to “hallucinations”, including inventing false answers and non-existent legal or scientific citations. At the root of this problem is first and foremost the model’s predictive technique coupled with its inability to actually “understand” the content it produces. That said, updated LLMs do seem to be rapidly gaining in accuracy. GPT-4, for example, is reported to have a 40% higher probability of providing correct answers compared to the previous version.
To mitigate the risk of both inaccuracies and false information, legal departments should adopt policies requiring employees to verify the accuracy, reasonableness and actual usefulness of any results generated by LLM tools before accepting them as true. Where companies do permit the use of LLM tools, employees should treat AI responses as first drafts only, and a rigorous review process should be in place to reduce the risk of inaccuracies creeping into the company’s internal and external communications.
Bias in Decision-Making and Outcomes
LLM tools may also provide biased responses. As such, where companies allow their use, they must have policies or controls in place to detect biased responses and address these in keeping with company policies and relevant legal requirements. Google, for its part, uses an open-source anti-bias tool that performs counterfactual analysis to check whether a machine learning algorithm meets a range of mathematical definitions of fairness.
OpenAI has also acknowledged issues with bias: In some cases, ChatGPT rejects outputs it shouldn’t, whereas in other cases it fails to reject when it should. Despite OpenAI’s best efforts to minimize bias and discrimination in ChatGPT, there are already plenty of instances of these occurring, a trend that seems likely to persist despite active ongoing attempts by OpenAI and others to reduce these risks.
From a corporate perspective, it’s worth bearing in mind that as the complete elimination of bias in AI-generated results is near impossible, legal and compliance managers should work with subject experts to ensure employees are at least aware and mindful of this issue.
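The counterfactual approach mentioned above can be sketched in a few lines: hold every input constant, flip only the protected attribute, and measure how much the model's output changes. The scoring function below is a hypothetical stand-in, not any real fairness tool; in practice you would call your deployed model at that point.

```python
# Minimal sketch of a counterfactual fairness check (assumption: the model
# can be called as a plain function). `score_applicant` is a hypothetical
# toy model used purely for illustration.

def score_applicant(applicant: dict) -> float:
    # Toy scoring model; a real check would call the deployed model here.
    # This toy model deliberately ignores the protected attribute.
    return 0.5 * applicant["income"] / 100_000 + 0.5 * applicant["years_employed"] / 40

def counterfactual_gap(applicant: dict, protected_key: str, alt_value) -> float:
    """Return how much the score changes when only the protected attribute flips."""
    counterfactual = dict(applicant, **{protected_key: alt_value})
    return abs(score_applicant(applicant) - score_applicant(counterfactual))

applicant = {"income": 60_000, "years_employed": 8, "gender": "female"}
gap = counterfactual_gap(applicant, "gender", "male")
print(f"Counterfactual score gap: {gap:.4f}")
```

A non-zero gap means the protected attribute alone moved the score, which is exactly the kind of signal a compliance team would want surfaced and investigated.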
Data Protection and Confidentiality
Legal and compliance managers need to remember that any information entered into an LLM tool, at least the public version, can then be used to train the tool further. That means any sensitive, proprietary or confidential information used in prompts may later be used in responses for users outside the organization. In a stark example, ChatGPT recently disclosed a journalist’s phone number in response to a user’s question about whether the tool could be used with Signal, a messaging app.
Beyond the potential disclosure of prompts in future responses, LLM companies like OpenAI may in certain circumstances also disclose personal user information to unspecified third parties without prior notice.
At the very least, legal and compliance managers need to consider the following to address data protection and confidentiality risks for the company:
- Establishing a compliance framework for the use of LLM tools within the company: Amazon, for example, has already warned employees against entering confidential information into ChatGPT prompts.
- Implementing clear policies to prevent employees from asking LLM tools any questions that could reveal sensitive organizational or personal information. For example, policies should explicitly instruct employees never to enter company content (emails, reports, chat logs or customer data) or personally identifiable information (such as customer/employee identification or credit card numbers) into LLM prompts.
- Ensuring compliance with requirements by applying privacy-by-design principles.
- Updating incident response policies to cover data leaks involving confidential information. These provisions should require that any output generated by LLM tools is reviewed by a human before being forwarded, to guarantee no compromising data is shared in the content.
- Providing guidance to staff on when LLM output is likely to include compromising data.
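Policies like these can be backed up technically with a simple pre-submission check that flags likely sensitive content before a prompt leaves the company. The sketch below is illustrative only: the two patterns are examples, not an exhaustive PII detector, and a production filter would cover far more cases.

```python
import re

# Illustrative pre-submission check for LLM prompts: flags a few common
# PII patterns (email addresses, credit-card-like number runs).
# The pattern list is a minimal sketch, not an exhaustive PII detector.
PII_PATTERNS = {
    "email address": re.compile(r"[\w.+-]+@[\w-]+\.[\w.]+"),
    "card-like number": re.compile(r"\b(?:\d[ -]?){13,16}\b"),
}

def flag_pii(prompt: str) -> list[str]:
    """Return the names of all PII patterns found in the prompt."""
    return [name for name, pattern in PII_PATTERNS.items() if pattern.search(prompt)]

print(flag_pii("Summarise: contact jane.doe@example.com, card 4111 1111 1111 1111"))
print(flag_pii("Summarise the attached style guide"))
```

A check like this cannot replace employee training, but it turns a written policy into a guardrail that fires at the moment of risk rather than in a post-incident review.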
Intellectual Property and Copyright Risks
LLM tools are trained using vast amounts of online data, some of which may contain copyrighted material, meaning any output could infringe copyright or intellectual property regulations. In fact, relevant issues are already being argued in court in the US. And it’s simply not possible to mitigate this risk with improved transparency, as tools like ChatGPT don’t currently provide sources or explain how they generated the results. What’s interesting is that OpenAI claims that users own the results generated by ChatGPT and therefore bear any associated liability. This makes it vital for legal and compliance managers to require users to review their results carefully before reuse to make absolutely sure they are not infringing copyright or intellectual property rights.
Consumer Protection Risks
Companies that fail to disclose their LLM use (e.g. a customer support chatbot) to consumers risk losing their trust – not to mention being charged with unfair practices under various laws! California’s chatbot law, for example, stipulates that companies must clearly disclose that a consumer is communicating with a bot during certain customer interactions.
Furthermore, the U.S. Federal Trade Commission emphasizes that the use of AI tools should be “transparent, explainable, fair, and empirically sound, while fostering accountability.” As such, legal and compliance managers have a duty to make sure their company’s use of LLM tools complies with all relevant laws and regulations and that customers are appropriately informed. Examples of disclosures could include: “The following content was created wholly by a system that uses AI based on specific requests to the AI system,” or “I created the following content using a system that uses AI to support my work.”
But managers don’t just need to monitor the use of LLM tools within their own company, they also need to do so for any third parties they work with. Any data the company sends to third parties could potentially be used by third parties in their LLM tool use. Let’s say, for example, the company uses a third-party chatbot provider that relies on LLM tools to provide customer service. This could put sensitive customer data at risk as the customer data could be used without the company’s knowledge to train the LLM tool, or in the event of a data breach involving such tools.
Another significant third-party risk is environmental: generative AI tools consume vast amounts of computing power, and therefore energy, which often goes undocumented.
Yes, case law on LLM tools is slowly growing, but we have yet to see much precedent. For the time being, if you’re using an LLM platform, you need to be prepared to assume liability for the output.
This uncertainty means legal departments will have to exercise a good measure of caution when incorporating LLM platforms to avoid any potential liability issues for the company.
As things stand, companies currently have the following options, for example:
- Disclosing any publicly available use cases involving LLM-generated content or tools.
- Reviewing and updating any contracts with third parties to include clauses clarifying that the organization is not liable for any output generated with LLM tools.
- Developing training for any staff within the organization who interact with or develop LLM tools to empower responsible use.
For many companies right now, it’s a struggle to keep up with emerging regulations. Proposed regulations like the EU’s AI Act, Canada’s Artificial Intelligence and Data Act, and China’s Generative AI Regulation all represent a complex web of potential legal factors.
But as these regulations take shape, it is becoming clear that they share common principles. From a legal perspective, it’s well worth taking a close look at these principles and acting on them so that they lay the groundwork for your organization’s use of LLM tools. Examples include:
- Accountability and data protection – Companies are responsible for ensuring compliance with any relevant AI and data protection regulations.
- Human supervision – Individuals must oversee the use of AI.
- Risk management – It is vital companies assess and mitigate risks associated with the use of AI.
- Transparency – Companies must inform anyone who interacts with AI tools that they are doing so.
Ever since OpenAI launched ChatGPT, the LLM technology it relies on has generated its fair share of buzz. People are rightly fascinated by the potential of this seemingly intelligent conversation platform. But while LLM tools like ChatGPT give the appearance of being able to perform complex tasks, we have to remember they don’t actually “think” or “understand”. They simply predict the next most likely word(s) in a sequence, making the output probabilistic, not deterministic. So let’s not forget just how vulnerable the output generated by LLMs is to the risks listed above.
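That word-by-word prediction can be illustrated with a toy sketch. The context and probability table below are invented for illustration; a real LLM derives its distribution over the next token from billions of learned parameters, but the sampling step works on the same principle.

```python
import random

# Toy illustration of next-token prediction: given a context, the "model"
# assigns probabilities to candidate next words and samples one.
# The probability table is invented for illustration only.
NEXT_WORD_PROBS = {
    ("the", "james", "webb"): {"space": 0.85, "telescope": 0.10, "images": 0.05},
}

def predict_next(context: tuple[str, ...], rng: random.Random) -> str:
    """Sample the next word from the toy probability distribution."""
    probs = NEXT_WORD_PROBS[context]
    words, weights = zip(*probs.items())
    return rng.choices(words, weights=weights, k=1)[0]

rng = random.Random(0)
samples = [predict_next(("the", "james", "webb"), rng) for _ in range(1000)]
# Over many samples the frequency approaches 0.85, but any single
# continuation is a random draw, never a verified fact.
print(samples.count("space") / 1000)
```

The point of the sketch: even the most plausible continuation is only the most probable one, which is exactly why the output is vulnerable to the inaccuracies and hallucinations discussed above.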
And in light of the significant risk, it’s vital that companies take a close look at the legal basis for their innovative new IT projects!
Born in 1972, Michael is a computer scientist, husband and father of two. He has been passionate about the data revolution for many years. At Parsionate, his focus is on external relations with customers and partners, sales and marketing.