228 research outputs found

    Federated Query Processing over Heterogeneous Data Sources in a Semantic Data Lake

    Data provides the basis for emerging scientific and interdisciplinary data-centric applications with the potential of improving the quality of life for citizens. Big Data plays an important role in promoting both manufacturing and scientific development through industrial digitization and emerging interdisciplinary research. Open data initiatives have encouraged the publication of Big Data by exploiting the decentralized nature of the Web, allowing for the availability of heterogeneous data generated and maintained by autonomous data providers. Consequently, the growing volume of data consumed by different applications raises the need for effective data integration approaches able to process a large volume of data that is represented in different formats, schemas, and models, and that may also include sensitive data, e.g., financial transactions, medical procedures, or personal data. Data Lakes are composed of heterogeneous data sources kept in their original format, which reduces the overhead of materialized data integration. Query processing over Data Lakes requires a semantic description of the data collected from heterogeneous data sources. A Data Lake with such semantic annotations is referred to as a Semantic Data Lake. Transforming Big Data into actionable knowledge demands novel and scalable techniques that enable not only Big Data ingestion and curation into the Semantic Data Lake, but also efficient large-scale semantic data integration, exploration, and discovery. Federated query processing techniques utilize source descriptions to find relevant data sources and efficient execution plans that minimize the total execution time and maximize the completeness of answers. Existing federated query processing engines employ a coarse-grained description model where the semantics encoded in data sources are ignored. 
Such descriptions may lead to the erroneous selection of data sources for a query and unnecessary retrieval of data, thus affecting the performance of the query processing engine. In this thesis, we address the problem of federated query processing against heterogeneous data sources in a Semantic Data Lake. First, we tackle the challenge of knowledge representation and propose a novel source description model, RDF Molecule Templates, that describes the knowledge available in a Semantic Data Lake. RDF Molecule Templates (RDF-MTs) describe data sources in terms of an abstract description of entities belonging to the same semantic concept. Then, we propose a technique for data source selection and query decomposition, the MULDER approach, and query planning and optimization techniques, Ontario, that exploit the characteristics of heterogeneous data sources described using RDF-MTs and provide uniform access to heterogeneous data sources. We then address the challenge of enforcing privacy and access control requirements imposed by data providers. We introduce a privacy-aware federated query technique, BOUNCER, able to enforce privacy and access control regulations during query processing over data sources in a Semantic Data Lake. In particular, BOUNCER exploits RDF-MT-based source descriptions in order to express privacy and access control policies and to enforce them automatically during source selection, query decomposition, and planning. Furthermore, BOUNCER implements query decomposition and optimization techniques able to identify query plans over data sources that not only contain the relevant entities to answer a query, but also are regulated by policies that allow for accessing these relevant entities. Finally, we tackle the problem of interest-based update propagation and co-evolution of data sources. 
We present a novel approach for interest-based RDF update propagation that consistently maintains a full or partial replication of large datasets and deals with co-evolution.

    SPARQL Query Result Explanation for Linked Data

    In this paper, we present an approach to explain SPARQL query results for Linked Data using why-provenance. We present a non-annotation-based algorithm to generate why-provenance and show its feasibility for Linked Data. We present an explanation-aware federated query processor prototype and show the presentation of our explanations. We present a user study to evaluate the impacts of our explanations. Our study shows that our query result explanations are helpful for end users to understand the result derivations and make trust judgments on the results.

    Defining the concept of ‘tick repellency’ in veterinary medicine

    Although widely used, the term repellency needs to be employed with care when applied to ticks and other periodic or permanent ectoparasites. Repellency has classically been used to describe the effects of a substance that causes a flying arthropod to make oriented movements away from its source. However, for crawling arthropods such as ticks, the term commonly subsumes a range of effects that include arthropod irritation and consequent avoiding or leaving the host, failing to attach, to bite, or to feed. The objective of the present article is to highlight the need for clarity, to propose consensus descriptions and methods for the evaluation of various effects on ticks caused by chemical substances

    Atmospheric and oceanic conditions associated with early and late onset for Eastern Africa short rains

    Timing of the rainy season is essential for a number of climate-sensitive sectors over Eastern Africa. This is particularly true for the agricultural sector, where most activities depend on both the spatial and temporal distribution of rainfall throughout the season. Using a combination of observational and reanalysis datasets, the present study investigates the atmospheric and oceanic conditions associated with early and late onset of the Eastern Africa short rains season (October–December). Our results indicate enhanced rainfall in October and November during years with early onset and a rainfall deficit in the same months during years with late onset. Early onset years are found to be associated with warmer sea surface temperatures (SSTs) in the western Indian Ocean, and an enhanced moisture flux and anomalous low-level flow into Eastern Africa from as early as the first dekad of September. The late onset years are characterized by cooler SSTs in the western Indian Ocean, anomalous westerly moisture flux and zonal flow limiting moisture supply to the region. The variability in onset date is separated into interannual and decadal components, and the links with SSTs and low-level circulation over the Indian Ocean basin are examined separately for both timescales. Significant correlations are found between the interannual variability of the onset and the Indian Ocean dipole mode index. On decadal timescales the onset is shown to be partly driven by the variability of the SSTs over the Indian Ocean. Understanding the influence of these potentially predictable SST and moisture patterns on onset variability has huge potential to improve forecasts of the East African short rains. Improved prediction of the variability of the rainy season onset has huge implications for improving key strategic decisions and preparedness action in many sectors, including agriculture.

    Genomic Characterization of Cholangiocarcinoma in Primary Sclerosing Cholangitis Reveals Therapeutic Opportunities

    Background and Aims: Lifetime risk of biliary tract cancer (BTC) in primary sclerosing cholangitis (PSC) may exceed 20%, and BTC is currently the leading cause of death in patients with PSC. To open new avenues for management, we aimed to delineate clinically relevant genomic and pathological features of a large panel of PSC-associated BTC (PSC-BTC). Approach and Results: We analyzed formalin-fixed, paraffin-embedded tumor tissue from 186 patients with PSC-BTC from 11 centers in eight countries with all anatomical locations included. We performed tumor DNA sequencing at 42 clinically relevant genetic loci to detect mutations, translocations, and copy number variations, along with histomorphological and immunohistochemical characterization. Regardless of the anatomical localization, PSC-BTC exhibited a uniform molecular and histological characteristic similar to extrahepatic cholangiocarcinoma. We detected a high frequency of genomic alterations typical of extrahepatic cholangiocarcinoma, such as TP53 (35.5%), KRAS (28.0%), CDKN2A (14.5%), and SMAD4 (11.3%), as well as potentially druggable mutations (e.g., HER2/ERBB2). We found a high frequency of nontypical/nonductal histomorphological subtypes (55.2%) and of the usually rare BTC precursor lesion, intraductal papillary neoplasia (18.3%). Conclusions: Genomic alterations in PSC-BTC include a significant number of putative actionable therapeutic targets. Notably, PSC-BTC shows a distinct extrahepatic morpho-molecular phenotype, independent of the anatomical location of the tumor. These findings advance our understanding of PSC-associated cholangiocarcinogenesis and provide strong incentives for clinical trials to test genome-based personalized treatment strategies in PSC-BTC.