15 research outputs found

    Privatheit und Datenschutz in der intelligenten Überwachung: Ein datenschutzgewährendes System, entworfen nach dem "Privacy by Design" Prinzip

    Surveillance systems have evolved into intelligent installations in recent years. They generate a large amount of sensitive information. This work elaborates the legal data protection foundations for such systems. It presents the technical design of a surveillance system built according to Privacy by Design that intrudes less on the privacy of the data subjects than conventional systems while still offering the technical advantages of intelligent processing.

    A Cost-Effective Method to Prevent Data Exfiltration from LLM Prompt Responses

    Large language models (LLMs) are susceptible to security risks: malicious attackers can manipulate LLMs by poisoning their training data or by crafting text prompts or queries designed to cause the LLM to return output that includes sensitive or confidential information, e.g., information that is part of the LLM training dataset. This disclosure describes the use of a data loss prevention (DLP) system to protect LLMs against data exfiltration. The DLP system can be configured to detect specific data types that are to be prevented from being leaked. The LLM output, generated in response to a query from an application or user, is passed through the DLP system, which generates a risk score for the output. If the risk score is above a predefined threshold, the LLM output is provided to an additional pre-trained model that has been trained to detect sensitive or confidential data. The output is modified to block, mask, redact, or otherwise remove the sensitive data, and the modified output is provided to the application or user. In certain cases, the output may indicate that no response can be provided due to a policy violation.
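
    As a rough illustration of the flow described above, the following Python sketch passes LLM output through a toy DLP scorer and, when a risk threshold is exceeded, through a redaction step. The patterns, the 0.7 threshold, and the helper names (dlp_risk_score, redact_sensitive, filter_llm_output) are illustrative assumptions, not part of the disclosure.

        # Minimal sketch of the described DLP flow for LLM output filtering.
        # The risk threshold, the DLP scorer, and the redaction model are
        # hypothetical stand-ins for whatever components a deployment uses.
        import re

        RISK_THRESHOLD = 0.7
        SENSITIVE_PATTERNS = [
            r"\b\d{3}-\d{2}-\d{4}\b",       # SSN-like
            r"\b\d{16}\b",                  # card-number-like
            r"[\w.+-]+@[\w-]+\.[\w.]+",     # email-like
        ]

        def dlp_risk_score(text: str) -> float:
            """Toy DLP check: score rises with the number of matched sensitive patterns."""
            hits = sum(len(re.findall(p, text)) for p in SENSITIVE_PATTERNS)
            return min(1.0, 0.4 * hits)

        def redact_sensitive(text: str) -> str:
            """Stand-in for the pre-trained detection model: mask matched spans."""
            for p in SENSITIVE_PATTERNS:
                text = re.sub(p, "[REDACTED]", text)
            return text

        def filter_llm_output(llm_output: str) -> str:
            score = dlp_risk_score(llm_output)
            if score <= RISK_THRESHOLD:
                return llm_output            # low risk: pass through unchanged
            redacted = redact_sensitive(llm_output)
            if redacted == llm_output:       # nothing could be safely masked
                return "No response can be provided due to a policy violation."
            return redacted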

    AI-based Adaptive Load Balancer for Secure Access to Large Language Models

    Different large language models (LLMs) are available that are specialized for domains such as writing code, engaging in conversations, generating content, etc. A specialized LLM can only reliably answer questions in domains over which it has been trained. The large number of specialized LLM types can make it difficult for a user, such as an application that generates LLM queries, to choose the right LLM. This disclosure describes techniques to automatically route query payloads between large language models specialized for different domains. The techniques utilize a vector database to semantically match an LLM to a user query and provide a real-time feedback and adaptation mechanism. Security checks and access controls are applied in a centralized manner while adhering to security compliance regimes. The techniques improve the end-to-end security posture of AI-based applications and the user experience, and can also reduce the costs of querying large LLMs.
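
    A minimal sketch of the semantic-routing idea is shown below. It substitutes a hashed bag-of-words vector and in-memory model profiles for the learned embeddings and vector database a real deployment would use; the model names and the embed/route helpers are hypothetical.

        # Minimal sketch of semantic routing between domain-specialized LLMs.
        # A toy hashed bag-of-words vector and cosine similarity stand in for
        # learned embeddings and a vector database.
        import hashlib
        import math

        DIM = 64

        def embed(text: str) -> list[float]:
            vec = [0.0] * DIM
            for token in text.lower().split():
                idx = int(hashlib.md5(token.encode()).hexdigest(), 16) % DIM
                vec[idx] += 1.0
            norm = math.sqrt(sum(v * v for v in vec)) or 1.0
            return [v / norm for v in vec]

        def cosine(a: list[float], b: list[float]) -> float:
            return sum(x * y for x, y in zip(a, b))

        # Registry of specialized models, each described by representative text.
        MODEL_PROFILES = {
            "code-llm": embed("write debug refactor python java function code bug"),
            "chat-llm": embed("conversation reply friendly chat small talk assistant"),
            "content-llm": embed("blog article marketing copy headline content draft"),
        }

        def route(query: str) -> str:
            """Return the name of the model whose profile best matches the query."""
            q = embed(query)
            return max(MODEL_PROFILES, key=lambda name: cosine(q, MODEL_PROFILES[name]))

        print(route("please refactor this python function"))  # likely routes to code-llm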

    Automatically Detecting Expensive Prompts and Configuring Firewall Rules to Mitigate Denial of Service Attacks on Large Language Models

    Denial of service attacks on generative artificial intelligence systems, e.g., large language models (LLMs), can include sending LLMs requests that include expensive prompts designed to consume computing resources and degrade model performance. This disclosure describes techniques to automatically detect such prompts and then configure firewall rules that prevent such prompts in subsequent requests from reaching the LLM. Per the techniques, prompts provided to an LLM are matched against input and output token size as well as resource utilization to identify prompts that deviate significantly from a baseline. Expensive prompts are identified, and semantically similar prompts are automatically generated using the same LLM or another model. A subset of the generated prompts that are semantically similar to the expensive prompts is identified by comparing respective vector embeddings. The subset of prompts and the received expensive prompts are provided to a pre-trained LLM that generates firewall rules, e.g., web application firewall (WAF) rules. Incoming requests from applications are evaluated based on the rules, and expensive prompts are blocked from reaching the LLM or are rate-limited.
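
    The baseline-deviation step could look roughly like the following sketch. The combined cost metric, its weights, and the z-score threshold are illustrative assumptions; generating semantically similar prompts and deriving WAF rules are not shown.

        # Minimal sketch of baseline-deviation detection for expensive prompts.
        # Token counts and resource usage per request are assumed to be logged
        # by the serving layer; the z-score threshold is an illustrative choice.
        import statistics

        class ExpensivePromptDetector:
            def __init__(self, z_threshold: float = 3.0):
                self.z_threshold = z_threshold
                self.history: list[float] = []   # per-request cost samples

            def cost(self, in_tokens: int, out_tokens: int, gpu_seconds: float) -> float:
                # Simple combined cost metric; weights are illustrative.
                return in_tokens + 2.0 * out_tokens + 100.0 * gpu_seconds

            def is_expensive(self, in_tokens: int, out_tokens: int, gpu_seconds: float) -> bool:
                c = self.cost(in_tokens, out_tokens, gpu_seconds)
                flagged = False
                if len(self.history) >= 30:      # need a baseline first
                    mean = statistics.mean(self.history)
                    stdev = statistics.pstdev(self.history) or 1.0
                    flagged = (c - mean) / stdev > self.z_threshold
                self.history.append(c)
                return flagged   # flagged prompts would feed the rule-generation step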

    Training Dataset Validation to Protect Machine Learning Models from Data Poisoning

    Large language models (LLMs) and other machine learning models are trained on large amounts of curated text data, including both public and private datasets of variable quality. Data collection, cleaning, deduplication, and filtering are performed to build appropriate training datasets. However, such operations cannot protect the trained model against data poisoning (i.e., the intentional corruption of training data) that attempts to manipulate or compromise the behavior of the model. This disclosure describes techniques to improve data security and integrity of the training dataset for LLMs via data validation of a subset (or all) of the data points within the dataset available for training. A data validation policy configuration (specified by the entity that is training and/or tuning the model) is used to determine a level of confidence of correctness of the data by validating it against different sources. Data that is flagged during validation can be marked/labeled as less reliable or can be excluded during model training. Model responses can include metadata that indicates a data confidence score for each data point in the response.
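
    A minimal sketch of the policy-driven validation follows, assuming a policy is expressed as a set of per-source validators plus a confidence threshold; both are hypothetical simplifications of the policy configuration described above.

        # Minimal sketch of policy-driven validation of training data points.
        # The validation sources and the confidence threshold are placeholders
        # for whatever a model owner specifies in the validation policy.
        from dataclasses import dataclass
        from typing import Callable

        @dataclass
        class ValidationPolicy:
            sources: list[Callable[[str], bool]]   # each source votes on a data point
            min_confidence: float = 0.6            # below this, exclude from training

        def confidence(point: str, policy: ValidationPolicy) -> float:
            votes = [source(point) for source in policy.sources]
            return sum(votes) / len(votes) if votes else 0.0

        def filter_dataset(points: list[str], policy: ValidationPolicy):
            kept, excluded = [], []
            for p in points:
                score = confidence(p, policy)
                (kept if score >= policy.min_confidence else excluded).append((p, score))
            # Kept points carry their confidence score as metadata, which can
            # later be surfaced alongside model responses.
            return kept, excluded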

    Tenant Data Security for LLM Applications in Multi-Tenancy Environment

    Large language models (LLMs) and other types of generative artificial intelligence can be used in a wide variety of business applications. However, there is a possibility of data leakage from LLM responses when an LLM is used in shared multi-tenant environments where each tenant has respective private datasets. Deploying individual adapter layers for each tenant can provide data isolation. However, such implementations can be complex and costly. This disclosure describes techniques to create and maintain a single model that can serve multiple tenants, with security controls for multi-tenancy services to isolate customer data efficiently. Data for different tenants is signed with their respective tenant-specific keys, and the tenant-specific signature is appended to the data prior to training/tuning a model or use by the model at inference time. When a business application of a particular tenant requests a response from the LLM, the response is generated using the adapter layer. The response includes data citations that are verified prior to the response being provided to the business application. The verification is based on the tenant-specific signature in the citation to ensure that only data that belongs to the particular tenant that requested the response is included.
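
    The signing and citation-verification steps might be sketched as follows, with HMAC-SHA256 standing in for the tenant-specific signature scheme and hard-coded keys standing in for real key management; the adapter layer and the citation format are simplified assumptions.

        # Minimal sketch of tenant-specific signing and citation verification.
        import hmac
        import hashlib

        TENANT_KEYS = {"tenant-a": b"key-a", "tenant-b": b"key-b"}   # illustrative keys

        def sign(tenant_id: str, data: str) -> str:
            return hmac.new(TENANT_KEYS[tenant_id], data.encode(), hashlib.sha256).hexdigest()

        def tag_for_training(tenant_id: str, data: str) -> dict:
            """Append the tenant-specific signature before training/tuning or inference."""
            return {"tenant": tenant_id, "data": data, "sig": sign(tenant_id, data)}

        def verify_citation(requesting_tenant: str, citation: dict) -> bool:
            """A citation is only valid if it was signed with the requester's key."""
            expected = sign(requesting_tenant, citation["data"])
            return hmac.compare_digest(expected, citation["sig"])

        def filter_response(requesting_tenant: str, citations: list[dict]) -> list[dict]:
            # Drop any cited data that does not belong to the requesting tenant.
            return [c for c in citations if verify_citation(requesting_tenant, c)]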

    Virtual Machine Images Preconfigured with Security Scripts for Data Protection and Alerting

    Developers use interactive development environments (IDEs) to create and share documents that contain live code, equations, visualizations, narrative text, etc. as part of the artificial intelligence/machine learning (AI/ML) development process. Virtual machines (VMs) that run IDEs may have access to private and/or sensitive data used during model training or use. For data security and compliance, it is necessary to highlight and track the VMs that have been in contact with sensitive information. This disclosure describes techniques to automatically identify and label the presence of sensitive data in virtual machines and disks used as part of machine learning workflows. Custom VM images are provided that include data scanning scripts that can identify the presence of sensitive data during or after usage, e.g., by a developer using an IDE. The scripts can automatically log the presence of data and generate alerts. Users of such virtual machines are provided additional controls to perform the training process in a secure and confidential manner in compliance with applicable data regulations.
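
    A scanning script baked into such a VM image could, in rough outline, look like the sketch below. The scan root, the regular expressions, the log location, and the labeling hook are all illustrative placeholders.

        # Minimal sketch of a sensitive-data scanning script for a custom VM image.
        import json
        import re
        import pathlib
        import datetime

        SENSITIVE_PATTERNS = {
            "ssn": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
            "email": re.compile(r"[\w.+-]+@[\w-]+\.[\w.]+"),
        }
        SCAN_ROOT = pathlib.Path("/home")                           # assumed data location
        LOG_FILE = pathlib.Path("/var/log/sensitive_data_scan.jsonl")  # assumed log path

        def scan_and_label() -> bool:
            """Scan files, log findings, and return True if the VM should be labeled."""
            findings = []
            for path in SCAN_ROOT.rglob("*"):
                if not path.is_file():
                    continue
                try:
                    text = path.read_text(errors="ignore")
                except OSError:
                    continue
                for name, pattern in SENSITIVE_PATTERNS.items():
                    if pattern.search(text):
                        findings.append({"file": str(path), "type": name,
                                         "time": datetime.datetime.utcnow().isoformat()})
            if findings:
                with LOG_FILE.open("a") as f:
                    for record in findings:
                        f.write(json.dumps(record) + "\n")
                # A real image would also call the cloud provider's API here to
                # attach a "contains-sensitive-data" label to the VM and its disks
                # and to raise an alert.
            return bool(findings)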

    Privacy-aware access control for video data in intelligent surveillance systems

    No full text
    Surveillance systems have become powerful. Objects can be identified, and intelligent surveillance services can generate events when a specific situation occurs. Such surveillance services can be organized in a Service Oriented Architecture (SOA) to fulfill surveillance tasks for specific purposes. The services therefore process information at a high level, e.g., just the position of an object. Video data is still required to visualize a situation to an operator and as evidence in court. Processing of person-related and sensitive information threatens privacy. To protect users and to comply with legal requirements, it must be ensured that sensitive information can only be processed for a defined purpose by specific users or services. This work proposes an architecture for access control that enforces the separation of data between different surveillance tasks. Access controls are enforced at different levels: for the users starting the tasks, for the services within the tasks that process data stored in a central store or computed by other services, and for sensor-related services that extract information from the raw data and provide it.
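
    One way to picture the task-scoped enforcement is the following toy sketch, in which every access is checked against the surveillance task (purpose) the data was collected for; the task structure, user names, and data classes are invented for illustration.

        # Minimal sketch of task-scoped access control for surveillance data.
        from dataclasses import dataclass, field

        @dataclass
        class SurveillanceTask:
            task_id: str
            purpose: str
            allowed_users: set = field(default_factory=set)
            allowed_data: set = field(default_factory=set)   # e.g. {"position", "video"}

        def may_access(task: SurveillanceTask, user: str, data_class: str) -> bool:
            """Permit access only within the task: right user and right data class."""
            return user in task.allowed_users and data_class in task.allowed_data

        perimeter = SurveillanceTask("t1", "perimeter protection",
                                     allowed_users={"operator-1"},
                                     allowed_data={"position"})

        print(may_access(perimeter, "operator-1", "position"))   # True
        print(may_access(perimeter, "operator-1", "video"))      # False: not needed for the task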

    Access controls for privacy protection in pervasive environments

    No full text
    Pervasive Environments (PE) collect and process a massive amount of person-related and sensitive information. Data collected by a single sensor is in most cases not adequate to provide premium services; the gathered information must rather be combined to offer real benefits. The fused data must be secured by access controls to ensure the privacy of the users and, with it, their trust in PE. This work proposes an Object-oriented World Model (OOWM) as a central information source that is filled with information collected from intelligent sensors and can be accessed and manipulated by smart application devices. It is shown how privacy can be enforced in such a centralized component. Privacy requirements must be specified and enforced. In particular, conflicts between different requirements, e.g., user- and operator-specific policies, are an open issue. Existing approaches for the specification and enforcement of access controls are discussed. An XACML-based approach for privacy in PE is shown, and an algorithm for combining privacy policies is presented.
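
    As a small illustration of resolving conflicting policies, the sketch below applies a deny-overrides rule, one of the standard XACML combining algorithms; the decision values and the example conflict are illustrative, not the specific algorithm of this work.

        # Minimal sketch of combining user- and operator-specific privacy policy
        # decisions in the style of XACML combining algorithms (deny-overrides).
        PERMIT, DENY, NOT_APPLICABLE = "Permit", "Deny", "NotApplicable"

        def deny_overrides(decisions: list[str]) -> str:
            """Any Deny wins; otherwise a single Permit suffices; else NotApplicable."""
            if DENY in decisions:
                return DENY
            if PERMIT in decisions:
                return PERMIT
            return NOT_APPLICABLE

        # Example: the user's policy forbids sharing location, the operator's permits it.
        user_policy_decision = DENY
        operator_policy_decision = PERMIT
        print(deny_overrides([user_policy_decision, operator_policy_decision]))   # Deny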