    Hybrid Refining Approach of PrOnto Ontology

    This paper presents a refinement of the PrOnto ontology using a validation test based on legal experts’ annotation of privacy policies combined with an Open Knowledge Extraction (OKE) algorithm. To ensure robust results while preserving an interdisciplinary approach, legal and technical knowledge were integrated as follows. The set of privacy policies was first analysed by the legal experts to discover legal concepts and map the text onto PrOnto. The mapping was then provided to computer scientists, who performed the OKE analysis. The results were validated by the legal experts, who provided feedback and refinements (i.e. new classes and modules) of the ontology according to the MeLOn methodology. Three iterations were performed on a set of (development) policies, followed by a final test on a new set of privacy policies. Using the refined version of PrOnto, enriched with SKOS-XL lexicon terms and definitions, 75.43% of concepts were detected in the policy texts, and accuracy on the test set increased by roughly 33%.
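The lexicon-driven detection step described above can be sketched as simple term matching: each ontology class carries lexicon terms, and a class counts as detected when one of its terms occurs in the policy text. This is a minimal illustration only; the class names and terms below are invented, not the actual PrOnto/SKOS-XL lexicon.

```python
# Toy lexicon: ontology class -> lexical terms (illustrative, not PrOnto's).
LEXICON = {
    "DataController": ["data controller", "controller"],
    "Consent": ["consent", "agree to"],
    "Erasure": ["erasure", "right to be forgotten"],
    "Processor": ["data processor"],
}

def detect_concepts(policy_text: str) -> set[str]:
    """Return the ontology classes whose lexicon terms occur in the text."""
    text = policy_text.lower()
    return {cls for cls, terms in LEXICON.items()
            if any(term in text for term in terms)}

def detection_rate(policy_text: str) -> float:
    """Percentage of lexicon classes detected in a policy."""
    return 100 * len(detect_concepts(policy_text)) / len(LEXICON)

policy = ("You may withdraw consent at any time; the data controller "
          "will then honour your right to be forgotten.")
print(sorted(detect_concepts(policy)))
```

A real pipeline would match against SKOS-XL labels attached to ontology classes and feed misses back to the legal experts, but the matching core is the same shape.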

    Building a data processing activities catalog: representing heterogeneous compliance-related information for GDPR using DCAT-AP and DPV

    This paper describes a new semantic metadata-based approach to describing and integrating the diverse data processing activity descriptions gathered from heterogeneous organisational sources such as departments, divisions, and external processors. This information must be collated to assess and document GDPR legal compliance, for example when creating a Register of Processing Activities (ROPA). Most GDPR knowledge graph research to date has focused on developing detailed compliance graphs. However, many organisations already have diverse data collection tools for documenting data processing activities, and this heterogeneity is likely to grow in the future. We provide a new approach that extends the well-known DCAT-AP standard with the Data Privacy Vocabulary (DPV) to express the concepts necessary to complete a ROPA. This approach enables data catalog implementations to merge and federate the metadata for a ROPA without requiring full alignment or the merging of all the underlying data sources. To show the approach's feasibility, we demonstrate a deployment use case and develop a prototype system based on diverse data processing records and a standard set of SPARQL queries for a Data Protection Officer preparing a ROPA to monitor compliance. The catalog's key benefit is that it is a lightweight, metadata-level integration point with a low cost of compliance information integration, capable of representing processing activities from heterogeneous sources.
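The metadata-level integration idea can be sketched in a few lines: each source exports lightweight activity records, the catalog merges them without touching the underlying systems, and a DPO projection pulls out the ROPA fields. The field names below are plain-Python stand-ins for DCAT-AP/DPV properties, not the actual vocabulary terms.

```python
# Records as exported by two hypothetical organisational sources.
hr_records = [
    {"activity": "payroll", "purpose": "EmployeeManagement",
     "legal_basis": "Contract", "source": "HR"},
]
marketing_records = [
    {"activity": "newsletter", "purpose": "Marketing",
     "legal_basis": "Consent", "source": "Marketing"},
]

def build_catalog(*sources):
    """Federate records from heterogeneous sources without altering them."""
    return [rec for source in sources for rec in source]

def ropa_rows(catalog):
    """Project the fields a DPO needs for a Register of Processing Activities."""
    return [(r["activity"], r["purpose"], r["legal_basis"]) for r in catalog]

catalog = build_catalog(hr_records, marketing_records)
for row in ropa_rows(catalog):
    print(row)
```

In the paper's setting the federation happens over RDF metadata and the projection is a SPARQL query, but the design point is the same: integrate at the metadata level, not by merging source databases.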

    Creating a vocabulary for data privacy: the first-year report of the Data Privacy Vocabularies and Controls Community Group (DPVCG)

    Managing privacy and understanding the handling of personal data have become matters of fundamental rights, at least within the European Union, with the General Data Protection Regulation (GDPR) being enforced since 25 May 2018. This has led to tools and services that promise compliance with the GDPR in terms of consent management and keeping track of the personal data being processed. The information recorded within such tools, as well as the information required for compliance itself, needs to be interoperable to provide sufficient transparency about its usage. Interoperability is also necessary for addressing the right to data portability under the GDPR and for creating user-configurable and manageable privacy policies. We argue that such interoperability can be enabled through agreement over vocabularies using linked data principles. The W3C Data Privacy Vocabularies and Controls Community Group (DPVCG) was set up to jointly develop such vocabularies towards interoperability in the context of data privacy. This paper presents the resulting Data Privacy Vocabulary (DPV), along with a discussion of its potential uses, and an invitation for feedback and participation.
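The kind of interoperable record DPV enables can be sketched as a structured "personal data handling" description built from its core concept axes. This is a hedged illustration: the keys mimic DPV concept names in comments, but the values are plain strings rather than the real IRIs.

```python
from dataclasses import dataclass, asdict

@dataclass
class PersonalDataHandling:
    purpose: str        # cf. dpv:Purpose
    processing: str     # cf. dpv:Processing
    personal_data: str  # cf. dpv:PersonalDataCategory
    legal_basis: str    # cf. dpv:LegalBasis
    recipient: str      # cf. dpv:Recipient

handling = PersonalDataHandling(
    purpose="ServiceProvision",
    processing="Store",
    personal_data="EmailAddress",
    legal_basis="Consent",
    recipient="DataController",
)
# Serialising to a plain dict is the toy analogue of exporting the record
# in an agreed vocabulary so that other tools can consume it.
print(asdict(handling))
```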

    Data protection regulation ontology for compliance

    The GDPR is the current data protection regulation in Europe. A significant market demand has been created since the GDPR came into force, largely because its reach extends beyond European borders whenever the data being processed belong to European citizens. The number of companies that require some type of regulation or standard compliance is ever-increasing, and the need for cyber security and privacy specialists has never been greater. Moreover, the GDPR has inspired a series of similar regulations all over the world. This further increases the market demand and makes the work of companies operating internationally more complicated and difficult to scale. The purpose of this thesis is to help consultancy companies automate their work by using semantic structures known as ontologies, allowing them to increase productivity and reduce costs. Ontologies can store data and their semantics (meaning) in a machine-readable format. In this thesis, an ontology has been designed to help consultants generate the checklists (or runbooks) they are required to deliver to their clients. The ontology is designed to handle concepts such as security measures, company information, company architecture, data sensitivity, privacy mechanisms, the distinction between technical and organisational measures, and even conditionality. The ontology was evaluated using a litmus test composed of competency questions, which were collected based on the use cases of the ontology. These questions were then translated into SPARQL queries and run against a test ontology. The ontology passed the litmus test, so it can be concluded that the implemented functionality matches the proposed design.
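The competency-question evaluation pattern can be sketched over a toy triple store: a question like "which measures protect sensitive data?" becomes a query whose answer is checked against the expected one. The triples and question below are invented for illustration; the thesis ran real SPARQL against the actual ontology.

```python
# Toy triple store: (subject, predicate, object), invented for illustration.
triples = {
    ("Encryption", "type", "TechnicalMeasure"),
    ("AccessPolicy", "type", "OrganisationalMeasure"),
    ("Encryption", "protects", "SensitiveData"),
    ("AccessPolicy", "protects", "SensitiveData"),
}

def ask(predicate, obj):
    """Answer 'which subjects have <predicate> <object>?' over the store."""
    return {s for (s, p, o) in triples if p == predicate and o == obj}

# Competency question: "Which measures protect sensitive data?"
answer = ask("protects", "SensitiveData")
# Refinement: which of those are technical (as opposed to organisational)?
technical = {s for s in answer if (s, "type", "TechnicalMeasure") in triples}
print(sorted(answer), sorted(technical))
```

An ontology "passes" when every competency question's query returns the answer the use case expects, which is exactly the litmus-test structure the abstract describes.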

    “Who Should I Trust with My Data?” Ethical and Legal Challenges for Innovation in New Decentralized Data Management Technologies

    News about personal data breaches and abusive data practices, such as the Cambridge Analytica scandal, has called into question the trustworthiness of certain actors in the control of personal data. Innovations in the field of personal information management systems to address this issue have regained traction in recent years, coinciding with the emergence of new decentralized technologies. However, only ethically and legally responsible developments will avoid the mistakes of the past. This contribution explores how current data management schemes are insufficient to adequately safeguard data subjects; in particular, it focuses on making data flows transparent in order to provide an adequate level of accountability. To showcase this, and with the goal of enhancing transparency to foster trust, the paper investigates solutions for standardizing machine-readable policies that express personal data processing activities, and their application to decentralized personal data stores as an example of ethically, legally, and technically responsible innovation in this field.

    DPCat: Specification for an interoperable and machine-readable data processing catalogue based on GDPR

    The GDPR requires Data Controllers and Data Protection Officers (DPOs) to maintain a Register of Processing Activities (ROPA) as part of overseeing the organisation’s compliance processes. The ROPA must include information from heterogeneous sources such as (internal) departments with varying IT systems and (external) data processors. Current practices use spreadsheets or proprietary systems that lack machine-readability and interoperability, presenting barriers to automation. We propose the Data Processing Catalogue (DPCat) for the representation, collection, and transfer of ROPA information as catalogues in a machine-readable and interoperable manner. DPCat is based on the Data Catalog Vocabulary (DCAT), its extension DCAT Application Profile for data portals in Europe (DCAT-AP), and the Data Privacy Vocabulary (DPV). It represents a comprehensive semantic model developed from the GDPR’s articles and from an analysis of 17 ROPA templates from EU Data Protection Authorities (DPAs). To demonstrate the practicality and feasibility of DPCat, we express the European Data Protection Supervisor’s (EDPS) ROPA documents using DPCat, verify them with SHACL to ensure the correctness of information based on legal and contextual requirements, and produce reports and ROPA documents based on DPA templates using SPARQL. DPCat supports a data governance process for data processing compliance that harmonises inputs from heterogeneous sources to produce dynamic documentation, accommodating differences in regulatory approaches across DPAs and easing investigative burdens toward efficient enforcement.
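The SHACL verification step amounts to checking each catalog entry against required-field constraints before reports are generated. The sketch below is a plain-Python stand-in for that idea, assuming an invented required-field list rather than the actual DPCat shapes.

```python
# Fields a ROPA entry must carry before it can be reported (illustrative).
REQUIRED = ("purpose", "legal_basis", "retention")

def validate(entry: dict) -> list[str]:
    """Return the names of required fields missing from a catalog entry,
    in the order REQUIRED lists them; an empty list means the entry conforms."""
    return [f for f in REQUIRED if not entry.get(f)]

good = {"purpose": "Payroll", "legal_basis": "Contract", "retention": "P5Y"}
bad = {"purpose": "Analytics"}

print(validate(good))  # conforms: nothing missing
print(validate(bad))   # violations to report back to the source department
```

Real SHACL shapes additionally constrain value types and cardinalities and produce a standard validation report graph, but the conformance check has this same gatekeeping role in the pipeline.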

    PRIVAFRAME: A Frame-Based Knowledge Graph for Sensitive Personal Data

    The pervasiveness of dialogue systems and virtual conversation applications raises an important theme: the potential for sharing sensitive information, and the consequent need for protection. To guarantee the subject’s right to privacy and avoid the leakage of private content, sensitive information must be treated, and any treatment first requires identifying sensitive text, with appropriate techniques to do so automatically. The Sensitive Information Detection (SID) task has been explored in the literature in different domains and languages, but there is no common benchmark. Current approaches are mostly based on artificial neural networks (ANNs), including transformer models. Our research focuses on identifying categories of personal data in informal English sentences by adopting a new logical-symbolic approach, and eventually hybridising it with ANN models. We present PRIVAFRAME, a frame-based knowledge graph built for the personal data categories defined in the Data Privacy Vocabulary (DPV). The knowledge graph is designed through the logical composition of already existing frames and has been evaluated as background knowledge for a SID system against a labelled sensitive-information dataset. The accuracy of PRIVAFRAME reached 78%; by comparison, a transformer-based model achieved 12% lower performance on the same dataset. The top-down logical-symbolic frame-based model allows a granular analysis and does not require a training dataset. These advantages lead us to use it as a layer in a hybrid model, where the logical SID is combined with an ANN-based SID tested in a previous study by the authors.
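The frame-based idea can be sketched minimally: each frame groups lexical units that evoke a personal-data category, and detection is symbolic matching with no training data. The frames and lexical units below are invented examples, not the actual PRIVAFRAME graph.

```python
# Toy frames: personal-data category -> lexical units that evoke it
# (illustrative only; PRIVAFRAME composes existing frames over DPV categories).
FRAMES = {
    "Health": {"diagnosed", "illness", "medication"},
    "Location": {"live in", "moved to", "address"},
    "Finance": {"salary", "bank account", "debt"},
}

def detect(sentence: str) -> set[str]:
    """Return the personal-data categories whose lexical units match the text."""
    text = sentence.lower()
    return {frame for frame, units in FRAMES.items()
            if any(unit in text for unit in units)}

print(detect("I was diagnosed last year, just after I moved to Berlin."))
```

Because the match is symbolic, every detection can be traced back to the frame and lexical unit that fired, which is the granularity (and the no-training-data property) the abstract highlights.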