    Automatic Detection of Vague Words and Sentences in Privacy Policies

    Website privacy policies represent the single most important source of information for users to gauge how their personal data are collected, used and shared by companies. However, privacy policies are often vague and people struggle to understand the content. Their opaqueness poses a significant challenge to both users and policy regulators. In this paper, we seek to identify vague content in privacy policies. We construct the first corpus of human-annotated vague words and sentences and present empirical studies on automatic vagueness detection. In particular, we investigate context-aware and context-agnostic models for predicting vague words, and explore auxiliary-classifier generative adversarial networks for characterizing sentence vagueness. Our experimental results demonstrate the effectiveness of the proposed approaches. Finally, we provide suggestions for resolving vagueness and improving the usability of privacy policies. Comment: 10 pages

    Measuring vagueness and subjectivity in texts: from symbolic to neural VAGO

    We present a hybrid approach to the automated measurement of vagueness and subjectivity in texts. We first introduce the expert system VAGO, illustrate it on a small benchmark of fact vs. opinion sentences, and then test it on the larger French press corpus FreSaDa to confirm the higher prevalence of subjective markers in satirical vs. regular texts. We then build a neural clone of VAGO, based on a BERT-like architecture, trained on the symbolic VAGO scores obtained on FreSaDa. Using explainability tools (LIME), we show the value of this neural version for enriching the lexicons of the symbolic version and for producing versions in other languages. Comment: Paper to appear in the Proceedings of the 2023 IEEE International Conference on Web Intelligence and Intelligent Agent Technology (WI-IAT)

    GDPR Privacy Policies in CLAUDETTE: Challenges of Omission, Context and Multilingualism

    The latest developments in natural language processing and machine learning have created new opportunities in legal text analysis. In particular, we look at the texts of online privacy policies after the implementation of the European General Data Protection Regulation (GDPR). We analyse 32 privacy policies to design a methodology for automated detection and assessment of compliance of these documents. Preliminary results confirm the pressing issues with current privacy policies and the benefit of this approach in empowering consumers to make more informed decisions. However, we also encountered several serious issues in the process. This paper introduces the challenges through concrete examples of context dependence, omission of information, and multilingualism.

    Using machine learning for automated detection of ambiguity in building requirements

    The rule interpretation step is yet to be fully automated in the compliance checking process, which hinders the automation of compliance checking as a whole. Whilst existing research has developed numerous methods for automated interpretation of building requirements, none can identify ambiguous requirements. As part of interpreting ambiguous clauses automatically, this research proposes a supervised machine learning method to detect ambiguity automatically, where the best-performing model achieved recall, precision and accuracy scores of 99.0%, 71.1%, and 78.2%, respectively. This research contributes to the body of knowledge by developing a method for automated detection of ambiguity in building requirements to support automated compliance checking.