
    Adaptive Hardening of Large Language Model Security

    Current techniques to secure large language models against prompt attacks such as prompt injection and prompt hacking are typically static and do not adjust to the continuously evolving prompt attack landscape. They are expensive to operate, introduce latency, and add management/monitoring overhead by requiring ongoing tuning. This disclosure describes techniques to dynamically adjust the security configurations of a large language model (LLM) based on current risk evaluations of prompt attacks and other vulnerabilities. An attack classification model, trained on various types of prompt attacks, infers a control profile appropriate to the current threat situation, the input prompt sequence, and the response. The attack classification model adjusts security controls to harden the system against prompt attacks. By leveraging an AI model to automatically tune and match the security controls of an LLM to the presented threat, manual monitoring and tuning are obviated. Users enjoy improved efficiency, throughput, and responsiveness, while the LLM operator enjoys hardened, automated protection at reduced cost.
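
    A minimal sketch of the control-selection step is shown below. The profile fields, the risk thresholds, and the keyword heuristic standing in for the trained attack classification model are all hypothetical placeholders, not the disclosed implementation.

```python
# Sketch (hypothetical): map an attack-classification risk score to a
# security control profile that hardens the LLM serving stack dynamically.
from dataclasses import dataclass


@dataclass
class ControlProfile:
    max_prompt_tokens: int      # cap on accepted prompt length
    enable_output_filter: bool  # post-generation content filtering
    require_reauth: bool        # force re-authentication of the caller
    rate_limit_per_min: int     # requests allowed per minute


# Hypothetical control profiles, ordered by increasing strictness.
PROFILES = {
    "low":    ControlProfile(8192, False, False, 600),
    "medium": ControlProfile(4096, True,  False, 120),
    "high":   ControlProfile(1024, True,  True,  20),
}


def classify_attack_risk(prompt: str, response: str) -> float:
    """Placeholder for the trained attack classification model; a trivial
    keyword heuristic stands in for the real inference call."""
    suspicious = ("ignore previous instructions", "system prompt", "exfiltrate")
    hits = sum(kw in (prompt + " " + response).lower() for kw in suspicious)
    return min(1.0, hits / len(suspicious))


def select_profile(risk: float) -> ControlProfile:
    """Infer a control profile appropriate to the current risk evaluation."""
    if risk >= 0.66:
        return PROFILES["high"]
    if risk >= 0.33:
        return PROFILES["medium"]
    return PROFILES["low"]


if __name__ == "__main__":
    risk = classify_attack_risk("Ignore previous instructions and ...", "")
    print(select_profile(risk))
```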

    A Cost-Effective Method to Prevent Data Exfiltration from LLM Prompt Responses

    Large language models (LLMs) are susceptible to security risks wherein malicious attackers can manipulate LLMs by poisoning their training data or by using malicious text prompts or queries designed to cause the LLM to return output that includes sensitive or confidential information, e.g., information that is part of the LLM training dataset. This disclosure describes the use of a data loss prevention (DLP) system to protect LLMs against data exfiltration. The DLP system can be configured to detect specific data types that are to be prevented from being leaked. The LLM output, generated in response to a query from an application or user, is passed through the DLP system, which generates a risk score for the LLM output. If the risk score is above a predefined threshold, the LLM output is provided to an additional pre-trained model that has been trained to detect sensitive or confidential data. The output is modified to block, mask, redact, or otherwise remove the sensitive data. The modified output is provided to the application or user. In certain cases, the output may indicate that no response can be provided due to a policy violation.
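
    The pipeline could look roughly like the following sketch. The regex-based scorer and redactor are hypothetical stand-ins for the DLP system and the additional pre-trained detection model, and the threshold value is illustrative.

```python
# Sketch (hypothetical): score LLM output with a DLP check and, above a
# threshold, redact sensitive spans before returning the response.
import re

RISK_THRESHOLD = 0.5  # illustrative value

# Illustrative data types; a real DLP system is configured with many more.
PATTERNS = {
    "email":       re.compile(r"\b[\w.+-]+@[\w-]+\.[\w.-]+\b"),
    "card_number": re.compile(r"\b(?:\d[ -]?){13,16}\b"),
}


def dlp_risk_score(text: str) -> float:
    """Stand-in for the DLP system: fraction of configured data types present."""
    hits = sum(bool(p.search(text)) for p in PATTERNS.values())
    return hits / len(PATTERNS)


def redact_sensitive(text: str) -> str:
    """Stand-in for the additional pre-trained detection model: mask spans."""
    for name, pattern in PATTERNS.items():
        text = pattern.sub(f"[REDACTED {name.upper()}]", text)
    return text


def filter_llm_output(output: str) -> str:
    if dlp_risk_score(output) < RISK_THRESHOLD:
        return output
    # Stricter policies could block the whole response (policy violation) instead.
    return redact_sensitive(output)


if __name__ == "__main__":
    print(filter_llm_output("Contact the admin at admin@example.com for the key."))
```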

    Automatically Detecting Expensive Prompts and Configuring Firewall Rules to Mitigate Denial of Service Attacks on Large Language Models

    Denial of service attacks on generative artificial intelligence systems, e.g., large language models (LLMs), can include sending LLMs requests that include expensive prompts designed to consume computing resources and degrade model performance. This disclosure describes techniques to automatically detect such prompts and then configure firewall rules that prevent such prompts in subsequent requests from reaching the LLM. Per the techniques, prompts provided to an LLM are evaluated based on input and output token size as well as resource utilization to identify prompts that deviate significantly from a baseline. Expensive prompts are identified, and semantically similar prompts are automatically generated using the same LLM or another model. A subset of the generated prompts that are semantically similar to the expensive prompts is identified by comparing respective vector embeddings. The subset of generated prompts and the received expensive prompts are provided to a pre-trained LLM that generates firewall rules, e.g., web application firewall (WAF) rules. Incoming requests from applications are evaluated based on the rules, and expensive prompts are blocked from reaching the LLM or are rate-limited.
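
    A rough sketch of the detection and rule-generation steps follows, assuming a hypothetical z-score deviation test, precomputed embeddings, and a simplified rule dictionary in place of real WAF rule syntax.

```python
# Sketch (hypothetical): flag prompts whose token counts and resource use
# deviate from a baseline, keep generated prompts that are semantically
# similar, and emit a simplified rate-limiting rule for them.
import math
from statistics import mean, stdev


def is_expensive(tokens_in: int, tokens_out: int, cpu_ms: float,
                 baseline: list[tuple[int, int, float]], z: float = 3.0) -> bool:
    """Flag a request that deviates strongly from the baseline, where the
    baseline is a list of (tokens_in, tokens_out, cpu_ms) observations."""
    for value, series in zip((tokens_in, tokens_out, cpu_ms), zip(*baseline)):
        mu, sigma = mean(series), stdev(series)
        if sigma and (value - mu) / sigma > z:
            return True
    return False


def cosine(a: list[float], b: list[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def similar_generated_prompts(expensive_vec: list[float],
                              candidates: list[tuple[str, list[float]]],
                              threshold: float = 0.85) -> list[str]:
    """Keep generated prompts whose embeddings are close to the expensive prompt."""
    return [text for text, vec in candidates if cosine(expensive_vec, vec) >= threshold]


def build_waf_rule(expensive_prompt: str, similar: list[str]) -> dict:
    """Stand-in for the rule-generating LLM: rate-limit matching prompts."""
    return {
        "action": "rate_limit",
        "limit_per_minute": 5,
        "match_patterns": [expensive_prompt, *similar],
    }
```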

    Training Dataset Validation to Protect Machine Learning Models from Data Poisoning

    Large language models (LLMs) and other machine learning models are trained on large amounts of curated text data, including both public and private datasets of variable quality. Data collection, cleaning, deduplication, and filtering are performed to build appropriate training datasets. However, such operations cannot protect the trained model against data poisoning (i.e., the intentional corruption of training data) that attempts to manipulate or compromise the behavior of the model. This disclosure describes techniques to improve the data security and integrity of the training dataset for LLMs via data validation of a subset (or all) of the data points within the dataset available for training. A data validation policy configuration (specified by the entity that is training and/or tuning the model) is used to determine a level of confidence in the correctness of the data by validating it against different sources. Data that is flagged during validation can be marked/labeled as less reliable or can be excluded during model training. Model responses can include metadata that indicates a data confidence score for each data point in the response.
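
    The validation flow might be sketched as follows. The policy fields, external sources, and the hash-based agreement check are hypothetical placeholders for the disclosed validation against different sources.

```python
# Sketch (hypothetical): validate data points against independent sources,
# assign a confidence score, and partition the dataset accordingly.
from dataclasses import dataclass, field


@dataclass
class ValidationPolicy:
    min_confidence: float = 0.6  # below this, exclude or label as less reliable
    sources: list = field(default_factory=lambda: ["source_a", "source_b"])


def validate_against_source(point: str, source: str) -> bool:
    """Placeholder check of one data point against one external source."""
    return hash((point, source)) % 4 != 0  # pretend roughly 75% agreement


def confidence(point: str, policy: ValidationPolicy) -> float:
    """Confidence of correctness: fraction of sources that agree."""
    agreements = sum(validate_against_source(point, s) for s in policy.sources)
    return agreements / len(policy.sources)


def partition_dataset(points: list[str], policy: ValidationPolicy):
    """Return (kept, flagged) lists of (point, confidence) pairs."""
    kept, flagged = [], []
    for p in points:
        score = confidence(p, policy)
        (kept if score >= policy.min_confidence else flagged).append((p, score))
    return kept, flagged
```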

    AI-based Adaptive Load Balancer for Secure Access to Large Language Models

    Different large language models (LLMs) specialized for domains such as writing code, engaging in conversations, generating content, etc. are available. A specialized LLM can only reliably answer questions in the domains over which it has been trained. The large number of specialized LLM types can make it difficult for a user, such as an application that generates LLM queries, to choose the right type of LLM. This disclosure describes techniques to automatically route query payloads between large language models specialized for different domains. The techniques utilize a vector database to semantically match an LLM to a user query. The techniques also provide a real-time feedback and adaptation mechanism. Security checks and access controls are applied in a centralized manner while adhering to security compliance regimes. The techniques improve the end-to-end security posture of AI-based applications as well as the user experience. The techniques can also reduce the costs of querying large LLMs.
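
    A minimal sketch of the routing step is shown below. The model registry, the toy domain embeddings, the placeholder embedding function, and the access-control scope name are all assumptions; a real deployment would use a vector database and an embedding model.

```python
# Sketch (hypothetical): route a query to the specialized LLM whose domain
# embedding is closest, after a centralized access-control check.
import math

# Toy domain embeddings; a real system would store these in a vector database.
MODEL_REGISTRY = {
    "code-llm":    [1.0, 0.0, 0.0],
    "chat-llm":    [0.0, 1.0, 0.0],
    "content-llm": [0.0, 0.0, 1.0],
}


def embed(text: str) -> list[float]:
    """Placeholder embedding; a real system would call an embedding model."""
    buckets = [0.0, 0.0, 0.0]
    for i, ch in enumerate(text):
        buckets[i % 3] += ord(ch)
    norm = math.sqrt(sum(b * b for b in buckets)) or 1.0
    return [b / norm for b in buckets]


def cosine(a: list[float], b: list[float]) -> float:
    return sum(x * y for x, y in zip(a, b))  # inputs are unit vectors here


def route(query: str, caller_scopes: set[str]) -> str:
    """Pick the best-matching specialized LLM for an authorized caller."""
    if "llm.query" not in caller_scopes:  # centralized access control (illustrative)
        raise PermissionError("caller lacks the llm.query scope")
    query_vec = embed(query)
    return max(MODEL_REGISTRY, key=lambda m: cosine(query_vec, MODEL_REGISTRY[m]))


if __name__ == "__main__":
    print(route("Write a function to parse JSON", {"llm.query"}))
```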

    Tenant Data Security for LLM Applications in Multi-Tenancy Environment

    Large language models (LLMs) and other types of generative artificial intelligence can be used in a wide variety of business applications. However, there is a possibility of data leakage from LLM responses when an LLM is used in shared multi-tenant environments where each tenant has respective private datasets. Deploying individual adapter layers for each tenant can provide data isolation. However, such implementations can be complex and costly. This disclosure describes techniques to create and maintain a single model that can serve multiple tenants, with security controls for multi-tenancy services to isolate customer data efficiently. Data for different tenants is signed with their respective tenant-specific keys and is then appended with the tenant-specific signature prior to training/tuning a model or use by the model at inference time. When a business application of a particular tenant requests a response from the LLM, the response is generated using the adapter layer. The response includes data citations that are verified prior to the response being provided to the business application. The verification is based on the tenant-specific signature in the citation to ensure that only data that belongs to the particular tenant that requested the response is included.
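
    The signing and citation-verification steps could be sketched as follows, with hypothetical tenant keys and a simplified record format; the disclosure does not specify the signature scheme, so HMAC-SHA256 is used here purely as an illustration.

```python
# Sketch (hypothetical): sign tenant data with a tenant-specific HMAC key and
# verify that every citation in a response belongs to the requesting tenant.
import hashlib
import hmac

# Hypothetical tenant keys; a real deployment would use a key management service.
TENANT_KEYS = {"tenant-a": b"key-a", "tenant-b": b"key-b"}


def sign_record(tenant_id: str, record: str) -> str:
    """Append a tenant-specific signature before training/tuning or inference use."""
    sig = hmac.new(TENANT_KEYS[tenant_id], record.encode(), hashlib.sha256)
    return f"{record}|sig={sig.hexdigest()}"


def citation_belongs_to(tenant_id: str, signed_record: str) -> bool:
    """Verify a citation's signature against the requesting tenant's key."""
    record, _, sig = signed_record.rpartition("|sig=")
    expected = hmac.new(TENANT_KEYS[tenant_id], record.encode(), hashlib.sha256)
    return hmac.compare_digest(sig, expected.hexdigest())


def filter_citations(tenant_id: str, citations: list[str]) -> list[str]:
    """Only citations signed with the requesting tenant's key are returned."""
    return [c for c in citations if citation_belongs_to(tenant_id, c)]


if __name__ == "__main__":
    signed = sign_record("tenant-a", "Q3 revenue figures")
    print(filter_citations("tenant-a", [signed]))  # kept
    print(filter_citations("tenant-b", [signed]))  # filtered out
```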

    Virtual Machine Images Preconfigured with Security Scripts for Data Protection and Alerting

    Developers use interactive development environments (IDEs) to create and share documents that contain live code, equations, visualizations, narrative text, etc. as part of the artificial intelligence/machine learning (AI/ML) development process. Virtual machines (VMs) that run IDEs may have access to private and/or sensitive data used during model training or use. For data security and compliance, it is necessary to highlight and track the VMs that have been in contact with sensitive information. This disclosure describes techniques to automatically identify and label the presence of sensitive data in virtual machines and disks used as part of machine learning workflows. Custom VM images are provided that include data scanning scripts that can identify the presence of sensitive data during or after usage, e.g., by a developer using an IDE. The scripts can automatically log the presence of such data and generate alerts. Users of such virtual machines are provided additional controls to perform the training process in a secure and confidential manner in compliance with applicable data regulations.
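
    A scanning script of this kind might look roughly like the following sketch; the data patterns, the file scope, and the label file path are illustrative assumptions rather than the disclosed script.

```python
# Sketch (hypothetical): a scanning script that a preconfigured VM image could
# run to detect sensitive data, log findings, and write a tracking label.
import json
import logging
import re
from pathlib import Path

# Illustrative patterns; real scanners cover many more data types.
SENSITIVE_PATTERNS = {
    "email":       re.compile(r"\b[\w.+-]+@[\w-]+\.[\w.-]+\b"),
    "card_number": re.compile(r"\b(?:\d[ -]?){13,16}\b"),
}

logging.basicConfig(level=logging.INFO)


def scan_path(root: str) -> dict:
    """Scan text files under root and log any sensitive data types found."""
    findings = {}
    for path in Path(root).rglob("*.txt"):  # file scope is illustrative
        text = path.read_text(errors="ignore")
        hits = [name for name, pat in SENSITIVE_PATTERNS.items() if pat.search(text)]
        if hits:
            findings[str(path)] = hits
            logging.warning("Sensitive data (%s) found in %s", ", ".join(hits), path)
    return findings


def write_label(findings: dict, label_file: str = "sensitive_data_label.json") -> None:
    """Persist a machine-readable label so the VM/disk can be tracked and alerted on."""
    Path(label_file).write_text(json.dumps(
        {"contains_sensitive_data": bool(findings), "findings": findings}, indent=2))
```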