
    Privacy-Preserving Ontology Publishing for EL Instance Stores: Extended Version

    We take a first step towards adapting an existing approach for privacy-preserving publishing of linked data to Description Logic (DL) ontologies. We consider the case where both the knowledge about individuals and the privacy policies are expressed using concepts of the DL EL, which corresponds to the setting where the ontology is an EL instance store. We introduce the notions of compliance of a concept with a policy and of safety of a concept for a policy, and show how optimal compliant (safe) generalizations of a given EL concept can be computed. In addition, we investigate the complexity of the optimality problem.
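    The notion of compliance mentioned in this abstract can be illustrated with a minimal sketch. It rests on assumptions that the abstract itself does not state: that a policy is a finite set of EL concepts, that a concept is compliant with a policy when it is not subsumed by any policy concept, and that no TBox is present, so subsumption can be checked by structural recursion. The names `Concept`, `subsumed_by`, and `is_compliant`, as well as the patient example, are hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Concept:
    """An EL concept: a conjunction of concept names and existential restrictions."""
    names: frozenset = frozenset()
    # Each restriction is a (role, filler) pair, read as "exists role.filler".
    restrictions: frozenset = frozenset()


def subsumed_by(c: Concept, d: Concept) -> bool:
    """Check whether c is subsumed by d, assuming there is no TBox."""
    # Every concept name required by d must occur at the top level of c.
    if not d.names <= c.names:
        return False
    # Every restriction of d must be matched by a more specific restriction of c.
    return all(
        any(r == s and subsumed_by(cf, df) for (r, cf) in c.restrictions)
        for (s, df) in d.restrictions
    )


def is_compliant(c: Concept, policy) -> bool:
    """Assumed definition: c is compliant if no policy concept subsumes it."""
    return not any(subsumed_by(c, d) for d in policy)


# Hypothetical example data.
oncology = Concept(names=frozenset({"Oncology"}))
# Policy concept: Patient and seenBy some (worksIn some Oncology).
secret = Concept(
    names=frozenset({"Patient"}),
    restrictions=frozenset(
        {("seenBy", Concept(restrictions=frozenset({("worksIn", oncology)})))}
    ),
)
# Published concept: Patient and seenBy some (Doctor and worksIn some Oncology).
doctor_in_oncology = Concept(
    names=frozenset({"Doctor"}),
    restrictions=frozenset({("worksIn", oncology)}),
)
original = Concept(
    names=frozenset({"Patient"}),
    restrictions=frozenset({("seenBy", doctor_in_oncology)}),
)
# A generalization that drops the link to the oncology department.
generalized = Concept(
    names=frozenset({"Patient"}),
    restrictions=frozenset({("seenBy", Concept(names=frozenset({"Doctor"})))}),
)

print(is_compliant(original, {secret}))
print(is_compliant(generalized, {secret}))
```

    Under these assumptions the original concept entails the policy concept and is therefore not compliant, while the generalization is. The sketch only checks compliance; it does not compute optimal generalizations or address safety.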

    Reasoning in Description Logic Ontologies for Privacy Management

    The growing number of ontologies that are integrated and distributed across application systems means that users may access these ontologies with different privileges and purposes. In this situation, protecting confidential information from unauthorized disclosure becomes a critical requirement. For instance, in the clinical sciences, unauthorized disclosure of medical information threatens not only the system but also, most importantly, the patient data. Motivated by this situation, this thesis first investigates a privacy problem, called the identity problem, which asks whether the identity of (anonymous) objects stored in Description Logic ontologies can be revealed. We then consider this problem in the context of role-based access control to ontologies and extend it to the problem of asking whether the identity belongs to a set of known individuals of cardinality smaller than a given number k. If some confidential information about persons, such as their identity, their relationships, or their other properties, can be deduced from an ontology, which implies that some privacy policy is not fulfilled, then one needs to repair this ontology such that the modified ontology complies with the policies and preserves as much information from the original ontology as possible. The repair mechanism we provide is called gentle repair and is performed via axiom weakening instead of axiom deletion, which was commonly used in classical approaches to ontology repair. However, policy compliance alone is not enough if a possible attacker can obtain relevant information from other sources that, together with the modified ontology, still violates the privacy policies. The safety property is proposed to alleviate this issue, and we investigate it in the context of privacy-preserving ontology publishing. Inference procedures to solve these privacy problems, together with investigations of the complexity of the procedures and the worst-case complexity of the problems, are the main contributions of this thesis.

    Contents:
    1. Introduction: 1.1 Description Logics; 1.2 Detecting Privacy Breaches in Information System; 1.3 Repairing Information Systems; 1.4 Privacy-Preserving Data Publishing; 1.5 Outline and Contribution of the Thesis
    2. Preliminaries: 2.1 Description Logic ALC; 2.1.1 Reasoning in ALC Ontologies; 2.1.2 Relationship with First-Order Logic; 2.1.3 Fragments of ALC; 2.2 Description Logic EL; 2.3 The Complexity of Reasoning Problems in DLs
    3. The Identity Problem and Its Variants in Description Logic Ontologies: 3.1 The Identity Problem; 3.1.1 Description Logics with Equality Power; 3.1.2 The Complexity of the Identity Problem; 3.2 The View-Based Identity Problem; 3.3 The k-Hiding Problem; 3.3.1 Upper Bounds; 3.3.2 Lower Bound
    4. Repairing Description Logic Ontologies: 4.1 Repairing Ontologies; 4.2 Gentle Repairs; 4.3 Weakening Relations; 4.4 Weakening Relations for EL Axioms; 4.4.1 Generalizing the Right-Hand Sides of GCIs; 4.4.2 Syntactic Generalizations; 4.5 Weakening Relations for ALC Axioms; 4.5.1 Generalizations and Specializations in ALC w.r.t. Role Depth; 4.5.2 Syntactical Generalizations and Specializations in ALC
    5. Privacy-Preserving Ontology Publishing for EL Instance Stores: 5.1 Formalizing Sensitive Information in EL Instance Stores; 5.2 Computing Optimal Compliant Generalizations; 5.3 Computing Optimal Safe^{\exists} Generalizations; 5.4 Deciding Optimality^{\exists} in EL Instance Stores; 5.5 Characterizing Safety^{\forall}; 5.6 Optimal P-safe^{\forall} Generalizations; 5.7 Characterizing Safety^{\forall\exists} and Optimality^{\forall\exists}
    6. Privacy-Preserving Ontology Publishing for EL ABoxes: 6.1 Logical Entailments in EL ABoxes with Anonymous Individuals; 6.2 Anonymizing EL ABoxes; 6.3 Formalizing Sensitive Information in EL ABoxes; 6.4 Compliance and Safety for EL ABoxes; 6.5 Optimal Anonymizers
    7. Conclusion: 7.1 Main Results; 7.2 Future Work
    Bibliography

    Computing Compliant Anonymisations of Quantified ABoxes w.r.t. EL Policies

    We adapt existing approaches for privacy-preserving publishing of linked data to a setting where the data are given as Description Logic (DL) ABoxes with possibly anonymised (formally: existentially quantified) individuals and the privacy policies are expressed using sets of concepts of the DL EL. We provide a characterization of compliance of such ABoxes w.r.t. EL policies, and show how optimal compliant anonymisations of non-compliant ABoxes can be computed. This work extends previous work on privacy-preserving ontology publishing, in which a very restricted form of ABoxes, called instance stores, had been considered, but restricts the attention to compliance. The approach developed here can easily be adapted to the problem of computing optimal repairs of quantified ABoxes.

    Privacy-Preserving Ontology Publishing: The Case of Quantified ABoxes w.r.t. a Static Cycle-Restricted EL TBox: Extended Version

    We review our recent work on how to compute optimal repairs, optimal compliant anonymizations, and optimal safe anonymizations of ABoxes containing possibly anonymized individuals. The results can be used both to remove erroneous consequences from a knowledge base and to hide secret information before publication of the knowledge base, while keeping as much as possible of the original information. Updated on August 27, 2021. This is an extended version of an article accepted at DL 2021.

    Enriching information extraction pipelines in clinical decision support systems

    Official Doctoral Programme in Information and Communication Technologies. 5032V01. [Abstract] Multicentre health studies are important to increase the impact of medical research findings due to the number of subjects that they are able to engage. To simplify the execution of these studies, the data-sharing process should be effortless, for instance, through the use of interoperable databases. However, achieving this interoperability is still an ongoing research topic, mainly due to data governance and privacy issues. In the first stage of this work, we propose several methodologies to optimise the harmonisation pipelines of health databases. This work focused on harmonising heterogeneous data sources into a standard data schema, namely the OMOP CDM, which has been developed and promoted by the OHDSI community. We validated our proposal using data sets of Alzheimer’s disease patients from distinct institutions. In the following stage, aiming to enrich the information stored in OMOP CDM databases, we investigated solutions to extract clinical concepts from unstructured narratives, using information retrieval and natural language processing techniques. The validation was performed using datasets provided in scientific challenges, namely the National NLP Clinical Challenges (n2c2). In the final stage, we aimed to simplify the protocol execution of multicentre studies by proposing novel solutions for profiling, publishing and facilitating the discovery of databases. Some of the developed solutions are currently being used in three European projects aiming to create federated networks of health databases across Europe.

    Semantics-based Privacy by Design for Internet of Things Applications

    As Internet of Things (IoT) technologies become more widespread in everyday life, privacy issues are becoming more prominent. The aim of this research is to develop a personal assistant that can answer software engineers' questions about Privacy by Design (PbD) practices during the design phase of IoT system development. Semantic web technologies are used to model the knowledge underlying PbD measurements, their intersections with privacy patterns, IoT system requirements and the privacy patterns that should be applied across IoT systems. This is achieved through the development of the PARROT ontology, developed through a set of representative IoT use cases relevant for software developers. This was supported by gathering Competency Questions (CQs) through a series of workshops, resulting in 81 curated CQs. These CQs were then recorded as SPARQL queries, and the developed ontology was evaluated using the Common Pitfalls model with the help of the Protégé HermiT Reasoner and the Ontology Pitfall Scanner (OOPS!), as well as evaluation by external experts. The ontology was assessed within a user study that identified that the PARROT ontology can answer up to 58% of privacy-related questions from software engineers.

    Ontology-based Access Control in Open Scenarios: Applications to Social Networks and the Cloud

    Thanks to the advent of the Internet, it is now possible to easily share vast amounts of electronic information and computer resources (which include hardware, computer services, etc.) in open distributed environments. These environments serve as a common platform for heterogeneous users (e.g., companies, individuals, etc.) by hosting customized user applications and systems, providing ubiquitous access to the shared resources and requiring less administrative effort; as a result, they enable users and companies to increase their productivity. Unfortunately, sharing resources in open environments has significantly increased the privacy threats to the users. Indeed, shared electronic data may be exploited by third parties, such as Data Brokers, which may aggregate, infer and redistribute (sensitive) personal features, thus potentially impairing the privacy of the individuals. One way to mitigate this problem consists in controlling the access of users to the potentially sensitive resources. Specifically, access control management regulates the access to the shared resources according to the credentials of the users, the type of resource and the privacy preferences of the resource/data owners. The efficient management of access control is crucial in large and dynamic environments such as the ones described above. Moreover, in order to propose a feasible and scalable solution, we need to get rid of the manual management of rules/constraints (on which most available solutions rely), which constitutes a serious burden for the users and the administrators. Finally, access control management should be intuitive for the end users, who usually lack technical expertise and may find access control mechanisms difficult to understand and rigid to apply due to their complex configuration settings.

    Big Data and the Internet of Things

    Advances in sensing and computing capabilities are making it possible to embed increasing computing power in small devices. This has enabled the sensing devices not just to passively capture data at very high resolution but also to take sophisticated actions in response. Combined with advances in communication, this is resulting in an ecosystem of highly interconnected devices referred to as the Internet of Things (IoT). In conjunction, advances in machine learning have allowed models to be built on these ever-increasing amounts of data. Consequently, devices all the way from heavy assets such as aircraft engines to wearables such as health monitors can now not only generate massive amounts of data but also draw on aggregate analytics to "improve" their performance over time. Big data analytics has been identified as a key enabler for the IoT. In this chapter, we discuss various avenues of the IoT where big data analytics either is already making a significant impact or is on the cusp of doing so. We also discuss social implications and areas of concern. Comment: 33 pages. Draft of upcoming book chapter in Japkowicz and Stefanowski (eds.) Big Data Analysis: New algorithms for a new society, Springer Series on Studies in Big Data, to appear.

    Viewpoints on emergent semantics

    Authors include: Philippe Cudré-Mauroux and Karl Aberer (editors), Alia I. Abdelmoty, Tiziana Catarci, Ernesto Damiani, Arantxa Illaramendi, Robert Meersman, Erich J. Neuhold, Christine Parent, Kai-Uwe Sattler, Monica Scannapieco, Stefano Spaccapietra, Peter Spyns, and Guy De Tré. We introduce a novel view on how to deal with the problems of semantic interoperability in distributed systems. This view is based on the concept of emergent semantics, which sees both the representation of semantics and the discovery of the proper interpretation of symbols as the result of a self-organizing process performed by distributed agents exchanging symbols and having utilities dependent on the proper interpretation of the symbols. This is a complex systems perspective on the problem of dealing with semantics. We highlight some of the distinctive features of our vision and point out preliminary examples of its application.