An oft-asked question: “Is the data I put into a Large Language Model (LLM) safe?” Or, alternatively, “If I put commercial or confidential information into an LLM, what is the chance it will be leaked?” There is great motivation for putting information into an LLM: LLMs are calculators for text, and they are intensely useful for summarizing, extracting data, rewriting, highlighting changes, and drafting responses. This is a worry for CIOs and IT departments everywhere, because, in the absence of guidelines and training, people are putting information into LLMs anyway, simply because they are so useful.

The thing is, whenever information is stored, there is a chance that it will be leaked. Emails can be forwarded; computers or SharePoint sites can be hacked. The likelihood depends on the security of the system. Large, established companies such as Microsoft, Google and Apple tend to be more secure, because they have more resources, and more to lose if they fail. Information given to an LLM is similar, in that it is held on a computer, and people try to keep it safe.

One difference with LLMs is motivation. The information you give to an LLM would be useful for training it. Just as your Google queries help Google become a better search engine, and your Facebook posts help ad targeting, your LLM queries could help improve the LLM, and early LLM services did use submitted information to train their models. This has mostly changed: almost all of the leading providers now allow users to prevent their data from being used for training.

Here are the details for each of the major models, and how to disable training, where possible:

  • ChatGPT Personal or Free – Training enabled by default; can be disabled. For details, see policy.

  • ChatGPT Teams or Enterprise – Training disabled by default. For details, see policy.

  • Gemini – Training enabled by default; can be disabled (but you lose access to chat history). For details, see policy.

  • Claude.ai (Anthropic) – Training disabled by default (unless feedback is submitted). For details, see policy.

  • DeepSeek – Training enabled; cannot be disabled.

  • Grok.ai (X/Twitter) – Training enabled by default; can be disabled via the app.

  • Microsoft Copilot – Training disabled. For details, see policy.

While LLMs offer significant utility for tasks like summarizing, extracting and rewriting text, concerns about data privacy and security remain valid. The risk of information leakage exists wherever data is stored, and LLMs are no exception. However, leading providers increasingly allow users to control whether their data is used for training, and enterprise offerings often disable training by default. Organizations and individuals should remain vigilant, review privacy policies, and put guidelines in place before using LLMs for sensitive or confidential information.