AI-enabled Automation for Completeness Checking of Privacy Policies
Technological advances in information sharing have raised concerns about data
protection. Privacy policies contain privacy-related requirements about how the
personal data of individuals will be handled by an organization or a software
system (e.g., a web service or an app). In Europe, privacy policies are subject
to compliance with the General Data Protection Regulation (GDPR). A
prerequisite for GDPR compliance checking is to verify whether the content of a
privacy policy is complete according to the provisions of GDPR. Incomplete
privacy policies may result in large fines for the violating organization, as
well as incomplete privacy-related software specifications. Manual completeness
checking is both time-consuming and error-prone. In this paper, we propose
AI-based automation for the completeness checking of privacy policies. Through
systematic qualitative methods, we first build two artifacts to characterize
the privacy-related provisions of GDPR, namely a conceptual model and a set of
completeness criteria. Then, we develop an automated solution on top of these
artifacts by leveraging a combination of natural language processing and
supervised machine learning. Specifically, we identify the GDPR-relevant
information content in privacy policies and subsequently check it against the
completeness criteria. To evaluate our approach, we collected 234 real privacy
policies from the fund industry. Over a set of 48 unseen privacy policies, our
approach correctly detected 300 out of a total of 334 violations of the
completeness criteria, while producing 23 false positives. The approach thus has a
precision of 92.9% and recall of 89.8%. Compared to a baseline that applies
keyword search only, our approach results in an improvement of 24.5% in
precision and 38% in recall.
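The reported figures can be verified directly from the stated counts (300 correct detections, 23 false positives, 334 actual violations). A minimal sketch, assuming the standard definitions of precision and recall:

```python
# Verify the reported evaluation figures from the stated counts:
# 300 true positives, 23 false positives, 334 total violations.
def precision_recall(tp, fp, total_actual):
    """Return (precision, recall) as percentages."""
    precision = tp / (tp + fp) * 100
    recall = tp / total_actual * 100
    return precision, recall

p, r = precision_recall(tp=300, fp=23, total_actual=334)
print(f"precision = {p:.1f}%, recall = {r:.1f}%")
# prints "precision = 92.9%, recall = 89.8%"
```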
A Taxonomy for Mining and Classifying Privacy Requirements in Issue Reports
Digital and physical footprints are a trail of user activities collected over
the use of software applications and systems. As software becomes ubiquitous,
protecting user privacy has become challenging. With increasing user privacy
awareness and the advent of privacy regulations and policies, there is an
emerging need to implement software systems that better protect the processing
of personal data. However, existing privacy regulations and policies provide
only high-level principles, which makes it difficult for software engineers to
design and implement privacy-aware systems. In this paper, we develop a
taxonomy that provides a comprehensive set of privacy requirements based on two
well-established and widely-adopted privacy regulations and frameworks, the
General Data Protection Regulation (GDPR) and the ISO/IEC 29100. These
requirements are refined to a level that is implementable and easy for
software engineers to understand, thus supporting them in attending to existing
regulations and standards. We have also performed a study on how two large
open-source software projects (Google Chrome and Moodle) address the privacy
requirements in our taxonomy through mining their issue reports. The paper
discusses how the collected issues were classified, and presents the findings
and insights generated from our study.
Comment: Submitted to IEEE Transactions on Software Engineering on 23 December
202
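The issue-mining step described above can be illustrated with a simple keyword-based triage sketch. The category names and keywords below are illustrative assumptions for demonstration only, not the paper's actual taxonomy or classification method:

```python
# Illustrative sketch of matching issue-report text against privacy-requirement
# categories. TAXONOMY below is an invented example, not the paper's taxonomy.
TAXONOMY = {
    "user consent": ["consent", "opt-in", "opt-out"],
    "data deletion": ["delete", "erasure", "retention"],
    "transparency": ["notice", "disclose", "privacy policy"],
}

def classify_issue(text):
    """Return taxonomy categories whose keywords appear in the issue text."""
    text = text.lower()
    return [cat for cat, kws in TAXONOMY.items()
            if any(kw in text for kw in kws)]

issue = "Users should be able to delete their history and opt-out of tracking."
print(classify_issue(issue))  # prints "['user consent', 'data deletion']"
```

A real study would use manual or supervised classification rather than keyword matching; this sketch only conveys the shape of mapping issues onto taxonomy categories.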
Improving Requirements Completeness: Automated Assistance through Large Language Models
Natural language (NL) is arguably the most prevalent medium for expressing
systems and software requirements. Detecting incompleteness in NL requirements
is a major challenge. One approach to identify incompleteness is to compare
requirements with external sources. Given the rise of large language models
(LLMs), an interesting question arises: Are LLMs useful external sources of
knowledge for detecting potential incompleteness in NL requirements? This
article explores this question by utilizing BERT. Specifically, we employ
BERT's masked language model (MLM) to generate contextualized predictions for
filling masked slots in requirements. To simulate incompleteness, we withhold
content from the requirements and assess BERT's ability to predict terminology
that is present in the withheld content but absent in the disclosed content.
BERT can produce multiple predictions per mask. Our first contribution is
determining the optimal number of predictions per mask, striking a balance
between effectively identifying omissions in requirements and mitigating noise
present in the predictions. Our second contribution involves designing a
machine learning-based filter to post-process BERT's predictions and further
reduce noise. We conduct an empirical evaluation using 40 requirements
specifications from the PURE dataset. Our findings indicate that: (1) BERT's
predictions effectively highlight terminology that is missing from
requirements, (2) BERT outperforms simpler baselines in identifying relevant
yet missing terminology, and (3) our filter significantly reduces noise in the
predictions, enhancing BERT's effectiveness as a tool for completeness checking
of requirements.
Comment: Submitted to Requirements Engineering Journal (REJ) - REFSQ'23
Special Issue. arXiv admin note: substantial text overlap with
arXiv:2302.0479
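The evaluation idea in this abstract (withhold content, mask a slot, and check whether top-k predictions recover terminology from the withheld text) can be sketched as follows. `predict_mask` is a hypothetical stand-in for BERT's masked language model, stubbed here with fixed candidates so the scoring logic is runnable without a model:

```python
# Sketch of the incompleteness-simulation idea: mask a slot in the disclosed
# requirement text and count predictions that appear in the withheld content
# but not in the disclosed content. The predictor is a stub, not real BERT.
def predict_mask(masked_text, k):
    """Stand-in for an MLM fill-mask call: return k candidate fillers."""
    return ["encrypted", "logged", "validated", "cached", "compressed"][:k]

def missing_terms_found(disclosed, withheld, k):
    """Predictions counting as hits: in the withheld text, not in the disclosed text."""
    preds = predict_mask(disclosed + " [MASK].", k)
    withheld_terms = set(withheld.lower().split())
    disclosed_terms = set(disclosed.lower().split())
    return [p for p in preds if p in withheld_terms and p not in disclosed_terms]

hits = missing_terms_found(
    disclosed="All user credentials shall be",
    withheld="stored in encrypted form and never logged in plain text",
    k=3,
)
print(hits)  # prints "['encrypted', 'logged']"
```

Choosing k trades off coverage of genuine omissions against noise, which is exactly the balance the article's first contribution investigates.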