
    Towards Enabling Schema Reuse with Privacy Constraints

    As the number of schema repositories grows rapidly and several web-based platforms now support publishing schemas, schema reuse is becoming a new trend. Schema reuse is a methodology that allows users to create new schemas by copying and adapting existing ones, reducing not only the effort of designing new schemas but also the heterogeneity between them. One of the biggest barriers to schema reuse is privacy concerns, which discourage participants from contributing their schemas. To address this problem, we develop a framework that enables privacy-preserving schema reuse. The framework lets users define their own protection policies in the form of privacy constraints. Instead of exposing original schemas, it returns an anonymized schema with maximal utility while satisfying these privacy constraints. To validate our approach, we empirically evaluate the efficiency of different heuristics, the correctness of the proposed utility function, the computation time, and the trade-off between utility and privacy.
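To make the idea concrete, here is a minimal sketch of the setting the abstract describes, under assumed simplifications: a schema is a set of attribute names, a privacy constraint is a forbidden attribute combination, anonymization greedily suppresses attributes, and utility is simply the number of attributes kept. The names and heuristic are illustrative, not the paper's actual algorithm.

```python
# Hypothetical sketch: schemas as attribute sets, privacy constraints as
# forbidden attribute combinations, anonymization by greedy suppression.

def anonymize(schema, constraints):
    """Suppress attributes until no constraint (a forbidden combination)
    is fully contained in the schema, keeping as many attributes as possible."""
    result = set(schema)
    for forbidden in constraints:
        if forbidden <= result:
            # Drop the attribute from the violated constraint that appears in
            # the most constraints -- a simple utility-preserving heuristic.
            # (sorted() makes the tie-break deterministic.)
            victim = max(sorted(forbidden),
                         key=lambda a: sum(a in c for c in constraints))
            result.discard(victim)
    return result

schema = {"name", "ssn", "zip", "birthdate", "diagnosis"}
constraints = [{"ssn"}, {"zip", "birthdate"}]  # combinations to hide
print(sorted(anonymize(schema, constraints)))  # → ['diagnosis', 'name', 'zip']
```

The paper's utility function and heuristics are certainly more refined; the point is only the shape of the problem: maximize what survives while every constraint is broken.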

    Web Data Integration for Non-Expert Users

    Today, there is an abundance of structured data available on the web in the form of RDF graphs and relational (i.e., tabular) data. This data comes from heterogeneous sources, and realizing its full value requires integrating these sources so that they can be queried together. Due to the scale and heterogeneity of the data sources on the web, integrating them is typically an automatic process. However, automatic data integration approaches are not completely accurate, since they infer semantics from syntax in data sources with a high degree of heterogeneity. Therefore, these automatic approaches can be considered a first step for quickly obtaining reasonable-quality data integration output that can be used to issue queries over the data sources. A second step is refining this output over time while it is being used. Interacting with the data sources through the output of the data integration system and refining this output requires expertise in data management, which limits the scope of this activity to power users and consequently limits the usability of data integration systems. This thesis focuses on helping non-expert users access heterogeneous data sources through data integration systems, without requiring the users to have prior knowledge of the queried data sources or exposing them to the details of the data integration output. In addition, the users can provide feedback on the answers to their queries, which can then be used to refine and improve the quality of the data integration output. The thesis studies both RDF and relational data. For RDF data, the thesis focuses on helping non-expert users query heterogeneous RDF data sources, and on utilizing their feedback over query answers to improve the quality of the interlinking between these data sources. For relational data, the thesis focuses on improving the quality of the mediated schema for a set of relational data sources, and of the semantic mappings between these sources, based on user feedback over query answers.
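The feedback loop described above can be sketched in miniature. This is not the thesis's actual refinement algorithm; it is an illustrative assumption that candidate schema mappings carry confidence scores that are nudged up or down by user feedback on the query answers each mapping produced. All names are hypothetical.

```python
# Illustrative sketch (not the thesis's method): refine mapping confidences
# from user feedback on query answers.

def refine(mappings, feedback, rate=0.2):
    """mappings: {mapping_id: confidence in [0, 1]};
    feedback: list of (mapping_id, is_correct) pairs from users.
    Each feedback item moves the confidence toward 1.0 or 0.0."""
    for mapping_id, is_correct in feedback:
        old = mappings[mapping_id]
        target = 1.0 if is_correct else 0.0
        mappings[mapping_id] = old + rate * (target - old)
    return mappings

# Two competing mappings for a source column "addr"; users confirm one
# and reject the other through answers to their queries.
mappings = {"addr->address": 0.5, "addr->ip_address": 0.5}
feedback = [("addr->address", True), ("addr->ip_address", False),
            ("addr->address", True)]
print(refine(mappings, feedback))
```

Over time the confirmed mapping dominates, which is the essence of refining integration output "while it is being used" without asking the user to understand mappings at all.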

    A framework for information integration using ontological foundations

    With the increasing amount of data, the ability to integrate information has always been a competitive advantage in information management. Semantic heterogeneity reconciliation is an important challenge in many information interoperability applications, such as data exchange and data integration. In spite of a large amount of research in this area, the lack of theoretical foundations behind semantic heterogeneity reconciliation techniques has resulted in many ad-hoc approaches. In this thesis, I address this issue by providing ontological foundations for semantic heterogeneity reconciliation in information integration. In particular, I investigate fundamental semantic relations between properties from an ontological point of view and show how one of the basic and natural relations between properties – inferring implicit properties from existing properties – can be used to enhance information integration. These ontological foundations are exploited in four aspects of information integration. First, I propose novel algorithms for semantic enrichment of schema mappings. Second, using correspondences between similar properties at different levels of abstraction, I propose a configurable data integration system in which query rewriting techniques allow a tradeoff between accuracy and completeness in query answering. Third, to preserve semantics in data exchange, I propose an entity-preserving data exchange approach that reflects source entities in the target independently of the classification of entities. Finally, to improve the efficiency of the data exchange approach proposed in this thesis, I propose an extension of the column-store model called the sliced column store. Working prototypes of the techniques proposed in this thesis have been implemented to show their feasibility. Experiments performed on various datasets show that the techniques proposed in this thesis outperform many existing techniques in terms of their ability to handle semantic heterogeneity and the performance of information exchange.

    Relaxed Functional Dependencies - A Survey of Approaches

    Recently, there has been renewed interest in functional dependencies due to the possibility of employing them in several advanced database operations, such as data cleaning, query relaxation, and record matching. In particular, the constraints defined for canonical functional dependencies have been relaxed to capture inconsistencies in real data, patterns of semantically related data, or semantic relationships in complex data types. In this paper, we survey 35 such functional dependencies, providing classification criteria, motivating examples, and a systematic analysis of them.
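One common relaxation of a canonical FD is the approximate functional dependency, which tolerates a bounded fraction of violating tuples. The sketch below is illustrative of that general idea, not of any specific dependency from the survey: it measures the confidence of X → Y as the fraction of tuples that survive the best possible repair (keeping, per X-value, the most frequent Y-value).

```python
# Illustrative sketch: an approximate FD X -> Y holds with confidence c if
# at most a (1 - c) fraction of tuples must be removed for the canonical
# FD to hold exactly.

from collections import Counter, defaultdict

def afd_confidence(rows, lhs, rhs):
    """rows: list of dicts; lhs, rhs: attribute names.
    Confidence = (tuples kept under the best repair) / (all tuples)."""
    groups = defaultdict(Counter)
    for row in rows:
        groups[row[lhs]][row[rhs]] += 1
    # For each lhs value, the best repair keeps the most frequent rhs value.
    kept = sum(counter.most_common(1)[0][1] for counter in groups.values())
    return kept / len(rows)

rows = [
    {"zip": "10001", "city": "New York"},
    {"zip": "10001", "city": "New York"},
    {"zip": "10001", "city": "NYC"},       # inconsistent spelling
    {"zip": "60601", "city": "Chicago"},
]
print(afd_confidence(rows, "zip", "city"))  # → 0.75
```

A canonical FD would reject zip → city outright because of the "NYC" spelling; the relaxed form quantifies how close the data comes to satisfying it, which is exactly what makes such dependencies useful for data cleaning and record matching.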

    Extensible metadata management framework for personal data lake

    Common Internet users today are inundated with a deluge of diverse data being generated and siloed in a variety of digital services, applications, and a growing body of personal computing devices as we enter the era of the Internet of Things. Alongside potential privacy compromises, users are facing increasing difficulties in managing their data and are losing control over it. There appears to be a de facto agreement in business and scientific fields that there is critical new value and interesting insight that users can attain by analysing their own data, if only it can be freed from its silos and combined with other data in meaningful ways. This thesis takes the point of view that users should have an easy-to-use, modern personal data management solution that enables them to centralise and efficiently manage their data by themselves, under their full control, in their best interests, and with minimal time and effort. In that direction, we describe the basic architecture of a management solution designed on solid theoretical foundations and state-of-the-art big data technologies. This solution, called Personal Data Lake (PDL), collects the data of a user from a plurality of heterogeneous personal data sources and stores it in a highly scalable, schema-less storage repository. To simplify the user experience of PDL, we propose a novel extensible metadata management framework (MMF) that: (i) annotates heterogeneous data with rich lineage and semantic metadata, (ii) exploits the garnered metadata for automating data management workflows in PDL – with extensive focus on data integration, and (iii) facilitates the use and reuse of the stored data for various purposes by querying it on the metadata level, either directly by the user or through third-party personal analytics services. We first show how the proposed MMF is positioned in the PDL architecture, and then describe its principal components.
Specifically, we introduce a simple yet effective lineage manager for tracking the provenance of personal data in PDL. We then introduce an ontology-based data integration component called SemLinker, which comprises two new algorithms: the first generates graph-based representations to express the native schemas of (semi-)structured personal data, and the second metamodels the extracted representations to a common extensible ontology. SemLinker outputs are utilised by MMF to generate user-tailored unified views that are optimised for querying heterogeneous personal data through low-level SPARQL or high-level SQL-like queries. Next, we introduce an unsupervised automatic keyphrase extraction algorithm called SemCluster that specialises in extracting thematically important keyphrases from unstructured data and associating each keyphrase with ontological information drawn from an extensible WordNet-based ontology. SemCluster outputs serve as semantic metadata and are utilised by MMF to annotate unstructured contents in PDL, thus enabling various management functionalities such as relationship discovery and semantic search. Finally, we describe how MMF can be utilised to perform holistic integration of personal data and to jointly query it in native representations.
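The lineage manager mentioned above can be pictured with a minimal sketch. This is a hypothetical reconstruction under assumed simplifications, not the PDL implementation: each ingested item is annotated with its source and the operation that produced it, and derived items point back at their parents so any unified view can be traced to raw personal data.

```python
# Hypothetical sketch of a minimal lineage manager: every item carries
# source, operation, and parent links, so provenance can be walked back
# from derived views to raw personal data.

from dataclasses import dataclass, field
import time

@dataclass
class LineageRecord:
    item_id: str
    source: str                      # e.g. "fitbit", "gmail" (illustrative)
    operation: str                   # e.g. "ingest", "SemLinker:map"
    parents: list = field(default_factory=list)
    timestamp: float = field(default_factory=time.time)

class LineageManager:
    def __init__(self):
        self.records = {}

    def track(self, item_id, source, operation, parents=()):
        self.records[item_id] = LineageRecord(item_id, source, operation,
                                              list(parents))

    def provenance(self, item_id):
        """Walk parent links back to the raw sources of an item."""
        record = self.records[item_id]
        if not record.parents:
            return [record.source]
        sources = []
        for parent in record.parents:
            sources.extend(self.provenance(parent))
        return sources

lm = LineageManager()
lm.track("raw1", "fitbit", "ingest")
lm.track("raw2", "gmail", "ingest")
lm.track("view1", "pdl", "SemLinker:map", parents=["raw1", "raw2"])
print(lm.provenance("view1"))  # → ['fitbit', 'gmail']
```

Even in this toy form, the value of rich lineage metadata is visible: a user (or a third-party analytics service) querying a unified view can always answer "where did this come from?", which underpins both trust and the management workflows the MMF automates.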

    Actor & Avatar: A Scientific and Artistic Catalog

    What kind of relationship do we have with artificial beings (avatars, puppets, robots, etc.)? What does it mean to mirror ourselves in them, to perform them, or to play trial identity games with them? Actor & Avatar addresses these questions from artistic and scholarly angles. Contributions on the making of "technical others" and philosophical reflections on artificial alterity are flanked by neuroscientific studies on different ways of perceiving living persons and artificial counterparts. The contributors have achieved a successful artistic-scientific collaboration with extensive visual material.