
    A Way to Automatically Enrich Biomedical Ontologies

    Biomedical ontologies play an important role in information extraction in the biomedical domain. We present a four-step workflow for automatically updating biomedical ontologies, and we detail two contributions, concerning concept extraction and the semantic linkage of extracted terminology.
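    A minimal sketch of such a workflow is shown below. The abstract names only concept extraction and semantic linkage, so every function here is a hypothetical placeholder rather than the authors' method; a real system would use biomedical term extraction and ontology alignment techniques.

```python
# Hypothetical four-step enrichment pipeline (illustrative placeholders only).

def common_prefix_len(a, b):
    """Length of the shared prefix of two strings (crude similarity proxy)."""
    n = 0
    while n < min(len(a), len(b)) and a[n] == b[n]:
        n += 1
    return n

def extract_concepts(corpus):
    """Step 1: term extraction -- stand-in for real biomedical NER."""
    return {term for doc in corpus for term in doc.lower().split() if len(term) > 8}

def link_semantically(candidates, ontology):
    """Step 3: attach each candidate to its closest existing concept."""
    return {c: max(ontology, key=lambda o: common_prefix_len(c, o)) for c in candidates}

ontology = {"cardiomyopathy", "neuropathy"}
corpus = ["Patients with cardiomyopathies were screened",
          "peripheral neuropathies often persist"]

candidates = extract_concepts(corpus) - ontology      # steps 1-2: extract, drop known terms
placements = link_semantically(candidates, ontology)  # step 3: semantic linkage
ontology |= set(placements)                           # step 4: enrich the ontology
print(placements)
```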

    Indexing query graphs to speedup graph query processing

    Subgraph/supergraph queries, although central to graph analytics, are costly because they entail the NP-complete problem of subgraph isomorphism. We present a fresh solution, the novel principle of which is to acquire and utilize knowledge from the results of previously executed queries. Our approach, iGQ, comprises two component subindexes that identify whether a new query is a subgraph/supergraph of previously executed queries, and it stores the related key information. iGQ comes with novel query processing and index space management algorithms, including graph replacement policies. The end result is a system that significantly reduces the number of required subgraph isomorphism tests and speeds up query processing. iGQ can be incorporated into any sub/supergraph query processing method to improve its performance; in fact, it is the only contribution that can significantly speed up both subgraph and supergraph query processing. We establish the principles of iGQ and formally prove its correctness. We have implemented iGQ and incorporated it within three popular, recent, state-of-the-art index-based graph query processing solutions. We evaluated its performance using real-world and synthetic graph datasets with different characteristics, and a number of query workloads, showcasing its benefits.
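    The principle behind iGQ can be illustrated with the small cache sketch below. This is not the authors' implementation: `IGQCache` and its helpers are hypothetical names, and the isomorphism test is delegated to networkx's `GraphMatcher`, which checks induced subgraph isomorphism.

```python
import networkx as nx
from networkx.algorithms import isomorphism

def is_subgraph(small, big):
    """True if `small` is (induced) subgraph-isomorphic to `big` -- the NP-complete test."""
    return isomorphism.GraphMatcher(big, small).subgraph_is_isomorphic()

class IGQCache:
    """Caches answers of executed queries and uses sub/supergraph relationships
    between a new query and cached queries to avoid isomorphism tests."""

    def __init__(self, dataset):
        self.dataset = dataset  # the collection of data graphs being queried
        self.history = []       # (past query, its answer set of dataset indices)

    def answer(self, q):
        candidates = set(range(len(self.dataset)))
        admitted = set()
        for p, ans_p in self.history:
            if is_subgraph(p, q):  # p inside q  =>  answer(q) is a subset of answer(p)
                candidates &= ans_p
            if is_subgraph(q, p):  # q inside p  =>  all of answer(p) also answers q
                admitted |= ans_p
        # Only surviving, not-yet-admitted candidates need the expensive test.
        result = admitted | {i for i in candidates - admitted
                             if is_subgraph(q, self.dataset[i])}
        self.history.append((q, result))
        return result

cache = IGQCache([nx.path_graph(4), nx.complete_graph(3)])
print(cache.answer(nx.path_graph(2)))  # {0, 1}: both graphs contain an edge
print(cache.answer(nx.path_graph(3)))  # {0}: pruning reuses the first answer
```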

    GROM: a general rewriter of semantic mappings

    We present GROM, a tool conceived to handle high-level schema mappings between semantic descriptions of a source and a target database. GROM rewrites mappings between the virtual, view-based semantic schemas in terms of mappings between the two physical databases, and then executes them. The system teaches two main lessons. First, designing mappings among higher-level descriptions is often simpler than working with the original schemas. Second, as soon as the view-definition language becomes more expressive, for example by allowing negation, the mapping problem becomes extremely challenging from a technical viewpoint, so one needs to find a proper trade-off between expressiveness and scalability.
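    The first lesson can be made concrete with the toy view-unfolding sketch below. The rule representation and all names are hypothetical illustrations, not GROM's actual API; real mappings would be source-to-target dependencies over full query languages.

```python
# Toy mapping rewriting by view unfolding (hypothetical representation).

# View definitions: semantic-level atom -> physical atoms that define it.
source_views = {"Person(x, y)": ["Employee(x, y, salary)"]}
target_views = {"Client(x, y)": ["Customer(x, y)"]}

def unfold(atoms, views):
    """Replace every view atom with the physical atoms defining it."""
    out = []
    for atom in atoms:
        out.extend(views.get(atom, [atom]))
    return out

# High-level mapping between the semantic schemas: Person(x, y) -> Client(x, y)
body, head = ["Person(x, y)"], ["Client(x, y)"]

# Rewritten mapping between the two physical databases.
print(unfold(body, source_views), "->", unfold(head, target_views))
# ['Employee(x, y, salary)'] -> ['Customer(x, y)']
```

    Once views may use negation, plain substitution like this no longer suffices, which is exactly the expressiveness/scalability trade-off the abstract highlights.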

    Data, Responsibly: Fairness, Neutrality and Transparency in Data Analysis

    Big data technology holds incredible promise for improving people's lives, accelerating scientific discovery and innovation, and bringing about positive societal change. Yet, if not used responsibly, this technology can propel economic inequality, destabilize global markets and affirm systemic bias. While the potential benefits of big data are well accepted, the importance of using these techniques in a fair and transparent manner is rarely considered. The primary goal of this tutorial is to draw the attention of the data management community to the important emerging subject of responsible data management and analysis. We offer our perspective on the issue, give an overview of existing technical work, primarily from the data mining and algorithms communities, and motivate future research directions.

    Adaptive Merging on Phase Change Memory

    Indexing is a well-known database technique used to facilitate data access and speed up query processing. Nevertheless, the construction and modification of indexes are very expensive. In traditional approaches, all records in the database table are covered equally by the index. This is inefficient, since some records may be queried very often and others never. To avoid this problem, adaptive merging has been introduced: the index is created adaptively and incrementally as a side-product of query processing, so the database table is indexed partially, depending on the query workload. This paper addresses the problem of adaptive merging for phase change memory (PCM), whose most important characteristics are limited write endurance and high write latency. As a consequence, adaptive merging must be redesigned from scratch. We solve this problem in two steps. First, we apply several PCM optimization techniques to the traditional adaptive merging approach and show that the resulting method (eAM) outperforms the traditional approach by 60%. Second, we introduce a framework for adaptive merging (PAM) and a new PCM-optimized index, which further improve system performance by 20% for databases where search queries interleave with data modifications.
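    A minimal sketch of plain adaptive merging appears below; the class and method names are hypothetical, and the PCM-specific write optimizations of eAM and PAM are deliberately not modeled.

```python
import bisect

class AdaptiveMergeIndex:
    """Toy adaptive merging: the index is built incrementally as a
    side-product of range queries."""

    def __init__(self, records):
        self.unsorted = list(records)  # initial, completely unindexed table
        self.runs = []                 # sorted runs produced by earlier queries
        self.final = []                # merged, fully indexed keys

    def range_query(self, lo, hi):
        # First touch: sort the raw data into an initial run (one-time cost).
        if self.unsorted:
            self.runs.append(sorted(self.unsorted))
            self.unsorted = []
        # Merge only the queried key range out of each run, so frequently
        # queried ("hot") ranges migrate into the final index while cold
        # data stays behind in the runs.
        for run in self.runs:
            i, j = bisect.bisect_left(run, lo), bisect.bisect_right(run, hi)
            self.final.extend(run[i:j])
            del run[i:j]
        self.runs = [r for r in self.runs if r]
        self.final.sort()
        i, j = bisect.bisect_left(self.final, lo), bisect.bisect_right(self.final, hi)
        return self.final[i:j]

idx = AdaptiveMergeIndex([9, 2, 7, 4, 1, 8])
print(idx.range_query(2, 5))  # [2, 4]; these keys are now fully indexed
```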

    Extending database accelerators for data transformations and predictive analytics

    The IBM DB2 Analytics Accelerator (IDAA) integrates the strong OLTP capabilities of DB2 for z/OS with very fast processing of OLAP workloads using Netezza technology. The accelerator is attached to DB2 as an analytical processing resource, completely transparent to user applications. However, all data modifications must be carried out by DB2 and are replicated to the accelerator internally. This behavior is not optimized for ELT processing, predictive analytics, or data mining workloads, where multi-staged data transformations are involved. We present our work on extending IDAA with accelerator-only tables, which enable direct data transformations without any intervention by DB2. Further, we present a framework for executing arbitrary in-database analytics operations on the accelerator while ensuring data governance aspects such as privilege management on DB2, and allowing data from any other source to be ingested directly into the accelerator to enrich analytics, e.g., with social media data. The evolutionary framework design maintains compatibility with existing infrastructure and applications, a must-have for the majority of customers, while allowing complex analytics beyond read-only reporting.
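    The benefit of accelerator-only tables for multi-staged transformations can be sketched conceptually as below. The `Engine` objects and their methods are hypothetical stand-ins, not the IDAA interface; the point is only that intermediate ELT stages stop round-tripping through DB2.

```python
# Conceptual contrast (hypothetical objects, not the IDAA API).

class Engine:
    def __init__(self, name):
        self.name, self.tables = name, {}

    def write(self, table, rows):
        print(f"{self.name}: write {table}")
        self.tables[table] = rows
        return rows

db2, accel = Engine("DB2"), Engine("IDAA")

def transform(rows):
    """Stand-in for one ELT transformation stage."""
    return [r * 2 for r in rows]

raw = [1, 2, 3]

# Classic path: every stage is written to DB2, then replicated internally.
s1 = db2.write("stage1", transform(raw))
accel.write("stage1 (replica)", s1)

# Accelerator-only table: the intermediate stage exists only on the
# accelerator, so DB2 performs no writes and no replication is needed.
s1_aot = accel.write("stage1 (AOT)", transform(raw))
```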