Automated Big Text Security Classification
In recent years, traditional cybersecurity safeguards have proven ineffective
against insider threats. Famous cases of sensitive information leaks caused by
insiders, including the WikiLeaks release of diplomatic cables and the Edward
Snowden incident, have greatly harmed the U.S. government's relationship with
other governments and with its own citizens. Data Leak Prevention (DLP) is a
solution for detecting and preventing information leaks from within an
organization's network. However, state-of-the-art DLP detection models are only
able to detect very limited types of sensitive information, and research in the
field has been hindered due to the lack of available sensitive texts. Many
researchers have focused on document-based detection with artificially labeled
"confidential documents" for which security labels are assigned to the entire
document, when in reality only a portion of the document is sensitive. This
type of whole-document based security labeling increases the chances of
preventing authorized users from accessing non-sensitive information within
sensitive documents. In this paper, we introduce Automated Classification
Enabled by Security Similarity (ACESS), a new detection model that addresses
the complexity of big text security classification and detection.
To analyze the ACESS system, we constructed a novel dataset, containing
formerly classified paragraphs from diplomatic cables made public by the
WikiLeaks organization. To our knowledge this paper is the first to analyze a
dataset that contains actual formerly sensitive information annotated at
paragraph granularity.
Comment: Pre-print of Best Paper Award manuscript, IEEE Intelligence and Security Informatics (ISI) 2016.
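The paragraph-granularity labeling the abstract contrasts with whole-document labeling can be illustrated with a minimal sketch. This is a hypothetical keyword rule in pure Python, not the ACESS model or its dataset; the terms and paragraphs are invented:

```python
# Hypothetical illustration of paragraph-granularity labeling (not the
# ACESS model): each paragraph gets its own sensitivity decision, so
# non-sensitive portions of a sensitive document stay accessible.
SENSITIVE_TERMS = {"informant", "source identity", "classified", "do not disclose"}

def label_paragraphs(document):
    """Return (paragraph, is_sensitive) pairs, one per paragraph."""
    results = []
    for para in document.split("\n\n"):
        text = para.lower()
        sensitive = any(term in text for term in SENSITIVE_TERMS)
        results.append((para, sensitive))
    return results

doc = (
    "Delegates agreed to resume trade talks next quarter.\n\n"
    "Do not disclose the informant's name to local staff."
)
labels = label_paragraphs(doc)
# Only the second paragraph is flagged; the first remains accessible.
```

The point of the design is the unit of decision: labeling per paragraph rather than per document avoids blocking authorized users from the non-sensitive portions of a partly sensitive document.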
Towards Safer Operations: An Expert-involved Dataset of High-Pressure Gas Incidents for Preventing Future Failures
This paper introduces a new IncidentAI dataset for safety prevention.
Different from prior corpora that usually contain a single task, our dataset
comprises three tasks: named entity recognition, cause-effect extraction, and
information retrieval. The dataset is annotated by domain experts who have at
least six years of practical experience as high-pressure gas conservation
managers. We validate the contribution of the dataset in the scenario of safety
prevention. Preliminary results on the three tasks show that NLP techniques are
beneficial for analyzing incident reports to prevent future failures. The
dataset facilitates future research in NLP and incident management communities.
The access to the dataset is also provided (the IncidentAI dataset is available
at: https://github.com/Cinnamon/incident-ai-dataset).
Comment: Accepted by EMNLP 2023 (Industry Track).
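A record combining the three annotation layers described above might look as follows. The field names, label set, and example sentence are illustrative assumptions, not the IncidentAI dataset's actual schema:

```python
# Hypothetical record combining the three annotation layers; field names
# and labels are illustrative, not the IncidentAI dataset's real schema.
report = {
    "text": "The relief valve failed due to corrosion, releasing gas.",
    "entities": [                          # named entity recognition layer
        {"span": (4, 16), "label": "EQUIPMENT"},     # "relief valve"
        {"span": (31, 40), "label": "CAUSE_AGENT"},  # "corrosion"
    ],
    "cause_effect": [                      # cause-effect extraction layer
        {"cause": "corrosion", "effect": "relief valve failed"},
    ],
    "retrieval_query": "valve failure caused by corrosion",  # IR layer
}

# Sanity check: each entity span must index a non-empty slice of the text.
for ent in report["entities"]:
    s, e = ent["span"]
    assert report["text"][s:e], "empty entity span"
```

Keeping the three layers on one record is a common choice for multi-task incident corpora, since cause-effect pairs and retrieval queries typically reference the same entity mentions.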
Unintended Memorization and Timing Attacks in Named Entity Recognition Models
Named entity recognition (NER) models are widely used for identifying named
entities (e.g., individuals, locations, and other information) in text
documents. Machine learning based NER models are increasingly being applied in
privacy-sensitive applications that need automatic and scalable identification
of sensitive information to redact text for data sharing. In this paper, we
study the setting when NER models are available as a black-box service for
identifying sensitive information in user documents and show that these models
are vulnerable to membership inference on their training datasets. With updated
pre-trained NER models from spaCy, we demonstrate two distinct membership
attacks on these models. Our first attack capitalizes on unintended
memorization in the NER's underlying neural network, a phenomenon NNs are known
to be vulnerable to. Our second attack leverages a timing side-channel to
target NER models that maintain vocabularies constructed from the training
data. We show that words from the training dataset take different functional
paths than previously unseen words, with measurable differences in execution
time. Revealing the membership status of training samples has clear privacy
implications: in text redaction, for example, sensitive words or phrases that
should be found and removed are at risk of being detected in the training dataset. Our
experimental evaluation includes the redaction of both password and health
data, presenting both security risks and privacy/regulatory issues. This is
exacerbated by results that show memorization with only a single phrase. We
achieved 70% AUC in our first attack on a text redaction use-case. We also show
overwhelming success in the timing attack with 99.23% AUC. Finally we discuss
potential mitigation approaches to realize the safe use of NER models in light
of the privacy and security implications of membership inference attacks.Comment: This is the full version of the paper with the same title accepted
for publication in the Proceedings of the 23rd Privacy Enhancing Technologies
Symposium, PETS 202
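The timing side-channel the abstract describes can be sketched as follows: a lookup that takes a fast path for vocabulary (training) words and a slower fallback for unseen words leaks membership through execution time. This is an illustrative toy, not spaCy's actual implementation; the vocabulary and probe words are invented:

```python
# Toy illustration of the timing side-channel: a vocabulary lookup with a
# fast path for "training" words and a slower fallback for unseen words
# leaks membership via execution time. Not spaCy's implementation.
import time

vocab = {f"word{i}": i for i in range(100_000)}  # stands in for a training vocabulary

def lookup(token):
    if token in vocab:                  # fast path: word seen during training
        return vocab[token]
    # slower fallback path: normalize and hash an out-of-vocabulary token
    return hash(token.lower().strip()) % len(vocab)

def timed(token, n=10_000):
    """Average wall-clock time of n lookups of the same token."""
    start = time.perf_counter()
    for _ in range(n):
        lookup(token)
    return (time.perf_counter() - start) / n

t_seen = timed("word42")       # in-vocabulary probe
t_unseen = timed("zzz_novel")  # out-of-vocabulary probe
# An attacker compares t_seen and t_unseen to infer training membership.
```

In practice an attacker would repeat the measurement many times and apply a statistical test, since single-query timing is noisy; the abstract's 99.23% AUC suggests the real-world gap between the two paths is large.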
Data Leaks Detection Mechanism for Small Businesses
The protection of sensitive customer information is a vital responsibility for companies of all sizes. In modern times, there is a significant need not only to protect the data being shared but also to learn where it leaks and under what circumstances it is compromised. After locating where data is being lost, the person responsible for the breach must be identified. Understanding who is responsible for leaking data is essential to protecting a company from significant financial damage caused by data leakage during normal business operations. This study examines how small firms can be assisted in protecting the sensitive information they own. The study's objective is to determine how company sites react to attacks that are damaging to their operations, so that appropriate action may be taken.
Privacy Leakage in Mobile Computing: Tools, Methods, and Characteristics
The number of smartphones, tablets, sensors, and connected wearable devices
is rapidly increasing. Today, in many parts of the globe, the penetration of
mobile computers has overtaken that of traditional personal computers.
This trend and the always-on nature of these devices have resulted in
increasing concerns over the intrusive nature of these devices and the privacy
risks that they impose on users or those associated with them. In this paper,
we survey the current state of the art on mobile computing research, focusing
on privacy risks and data leakage effects. We then discuss a number of
methods, recommendations, and ongoing research aimed at limiting the privacy
leakage and associated risks of mobile computing.