4,061 research outputs found

Automatic Detection of Vague Words and Sentences in Privacy Policies

Author: Logan Lebanoff
Fei Liu
Publication date: 01/01/2018

Website privacy policies represent the single most important source of information for users to gauge how their personal data are collected, used and shared by companies. However, privacy policies are often vague and people struggle to understand the content. Their opaqueness poses a significant challenge to both users and policy regulators. In this paper, we seek to identify vague content in privacy policies. We construct the first corpus of human-annotated vague words and sentences and present empirical studies on automatic vagueness detection. In particular, we investigate context-aware and context-agnostic models for predicting vague words, and explore auxiliary-classifier generative adversarial networks for characterizing sentence vagueness. Our experimental results demonstrate the effectiveness of the proposed approaches. Finally, we provide suggestions for resolving vagueness and improving the usability of privacy policies.

Comment: 10 pages

arXiv.org e-Print Archive

Crossref

Measuring vagueness and subjectivity in texts: from symbolic to neural VAGO

Author: Ghislain Atemezing
Vincent Claveau
Benjamin Icard
Paul Égré
Publication date: 23/10/2023

We present a hybrid approach to the automated measurement of vagueness and subjectivity in texts. We first introduce the expert system VAGO, illustrate it on a small benchmark of fact vs. opinion sentences, and then test it on the larger French press corpus FreSaDa to confirm the higher prevalence of subjective markers in satirical vs. regular texts. We then build a neural clone of VAGO, based on a BERT-like architecture, trained on the symbolic VAGO scores obtained on FreSaDa. Using explainability tools (LIME), we show the value of this neural version for enriching the lexicons of the symbolic version and for producing versions in other languages.

Comment: Paper to appear in the Proceedings of the 2023 IEEE International Conference on Web Intelligence and Intelligent Agent Technology (WI-IAT)

arXiv.org e-Print Archive

GDPR Privacy Policies in CLAUDETTE: Challenges of Omission, Context and Multilingualism

Author: Francesca Lagioia
Giovanni Sartor
Giuseppe Contissa
Hans-Wolfgang Micklitz
Kasper Drazewski
Marco Lippi
Paolo Torroni
Przemysław Pałka
Ruta Liepina
Publication venue: Aachen
Publication date: 01/01/2019

The latest developments in natural language processing and machine learning have created new opportunities in legal text analysis. In particular, we look at the texts of online privacy policies after the implementation of the European General Data Protection Regulation (GDPR). We analyse 32 privacy policies to design a methodology for automated detection and assessment of compliance of these documents. Preliminary results confirm the pressing issues with current privacy policies and the benefit of this approach in empowering consumers to make more informed decisions. However, we also encountered several serious issues in the process. This paper introduces the challenges through concrete examples of context dependence, omission of information, and multilingualism.

Archivio istituzionale della ricerca - Alma Mater Studiorum Università di Bologna

Using machine learning for automated detection of ambiguity in building requirements

Author: Ling Ma
Zijing Zhang
Publication venue: European Council on Computing in Construction
Publication date: 01/07/2023

The rule interpretation step of the compliance checking process is yet to be fully automated, which hinders the automation of compliance checking as a whole. Whilst existing research has developed numerous methods for automated interpretation of building requirements, none can identify ambiguous requirements. As part of interpreting ambiguous clauses automatically, this research proposes a supervised machine learning method to detect ambiguity automatically, where the best-performing model achieved recall, precision and accuracy scores of 99.0%, 71.1%, and 78.2%, respectively. This research contributes to the body of knowledge by developing a method for automated detection of ambiguity in building requirements to support automated compliance checking.

UCL Discovery