14 research outputs found

    Automatic Detection of Vague Words and Sentences in Privacy Policies

    Website privacy policies represent the single most important source of information for users to gauge how their personal data are collected, used, and shared by companies. However, privacy policies are often vague, and people struggle to understand their content. This opaqueness poses a significant challenge to both users and policy regulators. In this paper, we seek to identify vague content in privacy policies. We construct the first corpus of human-annotated vague words and sentences and present empirical studies on automatic vagueness detection. In particular, we investigate context-aware and context-agnostic models for predicting vague words, and explore auxiliary-classifier generative adversarial networks for characterizing sentence vagueness. Our experimental results demonstrate the effectiveness of the proposed approaches. Finally, we provide suggestions for resolving vagueness and improving the usability of privacy policies. (Comment: 10 pages)
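    The word-level task this abstract describes can be illustrated, very loosely, with a lexicon-based baseline. This is not the paper's model: the lexicon, scoring rule, and example sentence below are invented for demonstration, and the paper's context-aware models would replace the simple dictionary lookup.

    ```python
    # Lexicon-based sketch of context-agnostic vague-word detection.
    # The lexicon and example sentence are invented for illustration;
    # they are not the paper's annotated corpus or trained models.
    VAGUE_LEXICON = {"may", "might", "generally", "periodically",
                     "reasonable", "certain", "necessary", "appropriate"}

    def vague_words(sentence):
        """Return the tokens of a sentence that appear in the vague lexicon."""
        tokens = [w.strip(".,;:").lower() for w in sentence.split()]
        return [w for w in tokens if w in VAGUE_LEXICON]

    def sentence_vagueness(sentence):
        """Score a sentence as the fraction of its tokens that are vague."""
        tokens = sentence.split()
        return len(vague_words(sentence)) / max(len(tokens), 1)

    policy = "We may share certain data with partners when reasonably necessary."
    print(vague_words(policy))                   # ['may', 'certain', 'necessary']
    print(round(sentence_vagueness(policy), 2))  # 0.3
    ```

    A context-aware model would additionally condition on the surrounding words, since terms like "certain" are vague in some contexts ("certain data") but not others ("a certain date").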

    ELICA: An Automated Tool for Dynamic Extraction of Requirements Relevant Information

    Requirements elicitation requires extensive knowledge and a deep understanding of the problem domain in which the final system will be situated. However, in many software development projects, analysts are required to elicit requirements from an unfamiliar domain, which often causes communication barriers between analysts and stakeholders. In this paper, we propose a requirements ELICitation Aid tool (ELICA) to help analysts better understand the target application domain through dynamic extraction and labeling of requirements-relevant knowledge. To extract the relevant terms, we leverage the flexibility and power of Weighted Finite State Transducers (WFSTs) in dynamic modeling of natural language processing tasks. In addition to the information conveyed through text, ELICA captures and processes non-linguistic information about the intentions of speakers, such as their confidence level, analytical tone, and emotions. The extracted information is made available to the analysts as a set of labeled snippets with highlighted relevant terms, which can also be exported as an artifact of the Requirements Engineering (RE) process. The application and usefulness of ELICA are demonstrated through a case study, which shows how pre-existing relevant information about the application domain and the information captured during an elicitation meeting, such as the conversation and stakeholders' intentions, can be captured and used to support analysts in achieving their tasks. (Comment: 2018 IEEE 26th International Requirements Engineering Conference Workshop)
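    ELICA's WFST-based extraction is beyond a short snippet, but the underlying idea of scanning a token stream for multi-word domain terms can be hinted at with a toy finite-state matcher. The term list and transcript below are invented; this is a crude stand-in for illustration, not the tool's actual transducer machinery.

    ```python
    # Toy multi-word term matcher, standing in for WFST-based extraction.
    # DOMAIN_TERMS and the transcript snippet are invented examples.
    DOMAIN_TERMS = [("access", "control"), ("audit", "log"), ("user", "role")]

    def extract_terms(tokens):
        """Scan a token stream left to right and emit matched domain terms."""
        found, i = [], 0
        while i < len(tokens):
            for term in DOMAIN_TERMS:
                n = len(term)
                if tuple(t.lower() for t in tokens[i:i + n]) == term:
                    found.append(" ".join(term))
                    i += n  # consume the matched term
                    break
            else:
                i += 1  # no term starts here; advance one token
        return found

    snippet = "The system keeps an audit log for every user role change".split()
    print(extract_terms(snippet))  # ['audit log', 'user role']
    ```

    A real WFST would additionally carry weights on transitions, allowing ambiguous or partial matches to be ranked rather than accepted or rejected outright.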

    Do Mobile App Providers Try Enough to Protect Users’ Privacy? – a Content Analysis of Mobile App Privacy Policies

    Privacy policies are widely used to draw a clear picture of the risks to users' personal information in different contexts such as mobile apps. Nonetheless, many believe privacy policies are ineffective tools for notifying users of possible risks to information privacy, simply because most users have a very low tendency to read and comprehend them. Owing to the intimacy of mobile apps, much of the personal information disclosed to them is at risk. Especially when mobile app users share sensitive personal information with apps, the chance of privacy violation and the consequent risks are higher. It is not only important to understand how mobile developers practically implement a contract to protect users' privacy based on users' preferences, but also crucial to examine the role of information sensitivity in developers' emphasis on different aspects of privacy. This research focuses on two aspects of the circumstances users face when privacy policies are presented: the effort users must make to read and understand privacy policies, in terms of readability and length of statements, and developers' emphasis on aspects of information privacy with respect to information sensitivity. To gauge the ease of reading privacy policy statements, readability and length are calculated. Through the lens of the framing concept of prospect theory, this study investigates the effect of information sensitivity level on developers' emphasis on privacy dimensions. Three mobile app categories dealing with different levels of sensitive data are examined: health, navigation, and game apps. To differentiate the emphasis on privacy dimensions when information sensitivity differs, a text mining method is developed in R to analyze the weights of four key privacy dimensions (collection, secondary use, improper access, and error). We downloaded 90 unique mobile app privacy policies.
Readability calculations reveal that users need a minimum of 12 years of education to easily understand privacy policies. The average length of privacy policies is at least 1,900 words, which hinders a thorough reading. ANOVA results show a significant difference in the secondary use of information in privacy policies of apps dealing with more sensitive data. In addition, the findings demonstrate that collection is more emphasized in health than in game app privacy policies, but do not find any significant difference in the improper-access dimension. This study makes two key contributions. First, by building upon the framing concept of prospect theory, this research provides an effective framework for understanding the organizational perspective on privacy concerns. Second, the results demonstrate that information sensitivity level is important for measuring privacy concerns.
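    Grade-level figures like the one reported above are typically produced by a readability formula such as Flesch-Kincaid. A minimal sketch, assuming a crude vowel-group syllable counter in place of the pronunciation dictionaries that production readability tools use:

    ```python
    import re

    def count_syllables(word):
        """Rough heuristic: count vowel groups (real tools use a dictionary)."""
        return max(len(re.findall(r"[aeiouy]+", word.lower())), 1)

    def fk_grade(text):
        """Flesch-Kincaid grade: 0.39*(W/S) + 11.8*(syllables/W) - 15.59."""
        sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
        words = re.findall(r"[A-Za-z']+", text)
        syllables = sum(count_syllables(w) for w in words)
        return (0.39 * len(words) / len(sentences)
                + 11.8 * syllables / len(words) - 15.59)

    simple = "The cat sat. The dog ran."
    print(round(fk_grade(simple), 2))  # -2.62 (trivial text can score below 0)
    ```

    A policy scoring around 12 on this scale corresponds to roughly 12 years of schooling, which is how the figure in the abstract is usually interpreted.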

    Assessing Completeness of Solvency and Financial Condition Reports through the use of Machine Learning and Text Classification

    Text mining is a method for extracting useful information from unstructured data through the identification and exploration of large amounts of text. It is a valuable support tool for organisations: it enables a greater understanding and identification of relevant business insights from text, and, critically, it identifies connections between pieces of information within texts that would otherwise go unnoticed. Its application is prevalent in areas such as marketing and political science; however, until recently it has been largely overlooked within economics. Central banks are beginning to investigate the benefits of machine learning, sentiment analysis, and natural language processing in light of the large amount of unstructured data available to them. This includes news articles, financial contracts, social media, supervisory and market intelligence, and regulatory reports. In this research paper, a dataset consisting of regulatory-required Solvency and Financial Condition Reports (SFCRs) is analysed to determine whether machine learning and text classification can assist in assessing the completeness of SFCRs. Completeness is determined by whether or not the document adheres to nine European guidelines. Natural language processing and supervised machine learning techniques are implemented to classify pages of a report as belonging to one of the guidelines.
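    The page-classification step can be illustrated with a bag-of-words baseline. The guideline texts and sample page below are invented, and the vocabulary-overlap classifier is a simple stand-in for the paper's supervised pipeline:

    ```python
    # Nearest-guideline classification by vocabulary overlap.
    # GUIDELINES holds invented keyword summaries, not the actual
    # EIOPA guideline texts; a real system would train on labeled pages.
    from collections import Counter

    GUIDELINES = {
        "G1_business": "business performance underwriting premiums",
        "G2_governance": "governance board remuneration fit proper",
        "G3_risk": "risk profile underwriting market credit exposure",
    }

    def tokenize(text):
        return text.lower().split()

    def classify_page(page_text):
        """Assign a page to the guideline sharing the most vocabulary with it."""
        page = Counter(tokenize(page_text))
        def overlap(guide_text):
            return sum(page[w] for w in set(tokenize(guide_text)))
        return max(GUIDELINES, key=lambda g: overlap(GUIDELINES[g]))

    page = "The board reviewed remuneration policy and fit and proper criteria"
    print(classify_page(page))  # G2_governance
    ```

    Replacing the raw counts with TF-IDF weights and the `max` with a trained classifier (e.g. logistic regression) gives the supervised setup the abstract describes.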

    Automatic definition of engineer archetypes: A text mining approach

    With the rapid and continuous advancements in technology, as well as the constantly evolving competences required in the field of engineering, there is a critical need for the harmonization and unification of engineering professional figures, or archetypes. The current limitations in timely defining and updating engineers' archetypes are attributed to the absence of a structured and automated approach for processing educational and occupational data sources that evolve over time. This study aims to enhance the definition of professional figures in engineering by automating archetype definitions through text mining and adopting a more objective and structured methodology based on topic modeling. This will expand the use of archetypes as a common language, bridging the gap between educational and occupational frameworks by providing a unified and up-to-date engineering professional figure tailored to a specific period, specialization type, and level. We validate the automatically defined industrial engineer archetype against our previously manually defined profile.
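    As a loose illustration of deriving an archetype from occupational text, the sketch below reduces the task to extracting the most frequent non-stopword terms from a set of invented postings; the paper's topic-modeling pipeline is considerably richer than this frequency count:

    ```python
    # Frequency-based stand-in for topic modeling: the top-k recurring
    # terms across postings approximate an archetype's core competences.
    # The stopword list and postings are invented for demonstration.
    from collections import Counter

    STOPWORDS = {"and", "of", "with", "in", "for", "the", "a"}

    def archetype_terms(documents, k=3):
        """Return the k most frequent non-stopword terms across documents."""
        counts = Counter(
            word
            for doc in documents
            for word in doc.lower().split()
            if word not in STOPWORDS
        )
        return [term for term, _ in counts.most_common(k)]

    postings = [
        "industrial engineer with lean manufacturing and logistics skills",
        "lean production engineer for logistics optimization",
        "manufacturing engineer with lean six sigma background",
    ]
    print(archetype_terms(postings))
    ```

    A topic model such as LDA would instead infer latent themes (e.g. a "lean production" topic) rather than individual high-frequency words, which is what makes the archetypes robust across differently worded postings.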

    Customer Rating Reactions Can Be Predicted Purely Using App Features

    In this paper we provide empirical evidence that the rating an app attracts can be accurately predicted from the features it offers. Our results, based on an analysis of 11,537 apps from the Samsung Android and BlackBerry World app stores, indicate that the rating of 89% of these apps can be predicted with 100% accuracy. Our prediction model is built using feature and rating information from the existing apps offered in the app store, and it yields highly accurate rating predictions using only a few (11-12) existing apps for case-based prediction. These findings may have important implications for requirements engineering in app stores: they indicate that app developers may be able to obtain (very accurate) assessments of the customer reaction to their proposed feature sets (requirements), thereby providing new opportunities to support the requirements elicitation process for app developers.
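    Case-based prediction from feature sets can be sketched as a nearest-neighbour average over feature-set similarity. The store sample, Jaccard similarity measure, and choice of k below are invented for illustration and are not the paper's exact model:

    ```python
    # k-nearest-neighbour rating prediction over app feature sets.
    # The mini app store and k=2 are invented; the paper reports using
    # 11-12 neighbouring apps on real store data.
    def jaccard(a, b):
        """Feature-set similarity: |intersection| / |union|."""
        return len(a & b) / len(a | b) if a | b else 0.0

    def predict_rating(features, known_apps, k=2):
        """Average the ratings of the k apps most similar in feature set."""
        ranked = sorted(known_apps,
                        key=lambda app: jaccard(features, app[0]),
                        reverse=True)
        top = ranked[:k]
        return sum(rating for _, rating in top) / len(top)

    store = [
        ({"chat", "photos", "offline"}, 4.5),
        ({"chat", "ads"}, 3.0),
        ({"photos", "offline", "sync"}, 4.0),
    ]
    print(predict_rating({"chat", "photos"}, store))  # 3.75
    ```

    The appeal of this setup for requirements elicitation is that a developer can score a *proposed* feature set before building anything, using only published store data.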

    An ebd-enabled design knowledge acquisition framework

    Having enough knowledge and keeping it up to date enables designers to execute design assignments effectively and gives them a competitive advantage in the design profession. Knowledge elicitation or acquisition is a crucial component of system design, particularly for tasks requiring transdisciplinary or multidisciplinary cooperation. In system design, extracting domain-specific information is exceedingly difficult for designers. This thesis presents three works that attempt to bridge the gap between designers and domain expertise. First, a systematic literature review on data-driven demand elicitation is conducted using the Environment-based Design (EBD) approach. This review addresses two research objectives: (i) to investigate the present state of computer-aided requirement knowledge elicitation in the domains of engineering; and (ii) to integrate the EBD methodology into the conventional literature review framework by providing a well-structured research question generation methodology. The second work describes a data-driven interview transcript analysis strategy that employs EBD environment analysis, unsupervised machine learning, and a range of natural language processing (NLP) approaches to assist designers and qualitative researchers in extracting needs when domain expertise is lacking; it also proposes a transfer-learning-based qualitative text analysis framework that aids researchers in extracting valuable knowledge from interview data for healthcare promotion decision-making. The third work is an EBD-enabled design lexical knowledge acquisition framework that automatically constructs a semantic network, RomNet, from an extensive collection of abstracts from engineering publications. Applying RomNet can improve design information retrieval quality and communication between the parties involved in a design project.
To conclude, this thesis integrates artificial intelligence techniques, such as natural language processing (NLP) methods, machine learning techniques, and rule-based systems, to build a knowledge acquisition framework that supports manual, semi-automatic, and automatic extraction of design knowledge from different types of textual data sources.
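    How a lexical network like RomNet might be assembled can be hinted at with a toy co-occurrence builder. The linking rule (terms co-occurring in the same abstract), the vocabulary, and the two-abstract corpus below are invented; the actual construction of RomNet is far more sophisticated:

    ```python
    # Toy semantic-network construction: link vocabulary terms that
    # co-occur within the same abstract, weighting edges by frequency.
    # The vocabulary and mini-corpus are invented for demonstration.
    from collections import defaultdict
    from itertools import combinations

    def build_network(abstracts, vocab):
        """Return weighted edges between terms co-occurring in an abstract."""
        edges = defaultdict(int)
        for text in abstracts:
            present = sorted(t for t in vocab if t in text.lower())
            for a, b in combinations(present, 2):
                edges[(a, b)] += 1
        return dict(edges)

    abstracts = [
        "We design a gearbox to transmit torque to the drive shaft.",
        "The gearbox housing reduces noise and transmits torque.",
    ]
    vocab = {"gearbox", "torque", "shaft", "noise"}
    net = build_network(abstracts, vocab)
    print(net[("gearbox", "torque")])  # 2: the pair co-occurs in both abstracts
    ```

    Edge weights like these are what let a design-retrieval system rank "torque" as a closer neighbour of "gearbox" than, say, "noise".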

    Regulatory Compliance-oriented Impediments and Associated Effort Estimation Metrics in Requirements Engineering for Contractual Systems Engineering Projects

    Large-scale contractual systems engineering projects often need to comply with a myriad of government regulations and standards as part of contractual fulfillment. A key activity in the requirements engineering (RE) process for such a project is to elicit appropriate requirements from the regulations and standards that apply to the target system. However, there are impediments to achieving compliance due to factors such as the voluminous contract and its high-level specifications, the large number of regulatory documents, and the multiple domains of the system. Little empirical research has been conducted on developing a shared understanding of the compliance-oriented complexities involved in such projects, or on identifying and developing RE support (such as processes, tools, metrics, and methods) to improve overall performance in compliance projects. Through three studies on an industrial RE project, we investigated a number of compliance-related issues in RE, leading to the following novel results: (i) a meta-model that captures artefact types and their compliance-oriented inter-relationships in RE for contractual systems engineering projects; (ii) discovery of key impediments to requirements compliance due to (a) contractual complexities (e.g., regulatory requirements specified non-contiguously with non-regulatory requirements in the contract at a ratio of 1:19), (b) complexities in regulatory documents (e.g., over 300 regulatory documents being relevant to the subject system), and (c) a large and complex system (e.g., 40% of the contractual regulatory requirements being cross-cutting); (iii) a method for deriving base metrics for estimating the effort needed for compliance work during RE, with a demonstration of how a set of derived metrics can be used to create an effort estimation model for such work; and (iv) a framework for structuring diverse regulatory documents and requirements for global product development.
These results lay a foundation for RE research on compliance issues, with anticipated impact on real-world projects and on RE research.
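    The effort-estimation idea in (iii) can be illustrated with a hypothetical linear model over base metrics. The metric names, coefficients, and requirement count below are invented; only the 300-document and 40% cross-cutting figures come from the abstract:

    ```python
    # Hypothetical linear effort model over compliance base metrics.
    # Coefficients (hours_per_req, hours_per_doc, crosscut_penalty) and
    # n_reg_reqs=1000 are invented; they are NOT the paper's model.
    def estimate_effort(n_reg_reqs, n_reg_docs, crosscut_ratio,
                        hours_per_req=0.5, hours_per_doc=2.0,
                        crosscut_penalty=1.5):
        """Estimate compliance-work hours from base counts.

        Cross-cutting requirements touch multiple system domains, so a
        penalty term scales the per-requirement cost by their share.
        """
        req_hours = n_reg_reqs * hours_per_req * (1 + crosscut_penalty * crosscut_ratio)
        doc_hours = n_reg_docs * hours_per_doc
        return req_hours + doc_hours

    # 300 regulatory documents and 40% cross-cutting come from the abstract;
    # the 1000-requirement count is assumed for the example.
    print(estimate_effort(n_reg_reqs=1000, n_reg_docs=300, crosscut_ratio=0.4))  # 1400.0
    ```

    The value of deriving such base metrics early is that the coefficients can be calibrated per project, turning raw contract statistics into a defensible effort estimate.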