160 research outputs found

    From Data Fusion to Knowledge Fusion

    The task of data fusion is to identify the true values of data items (e.g., the true date of birth for Tom Cruise) among multiple observed values drawn from different sources (e.g., Web sites) of varying (and unknown) reliability. A recent survey [LDL+12] has provided a detailed comparison of various fusion methods on Deep Web data. In this paper, we study the applicability and limitations of different fusion techniques on a more challenging problem: knowledge fusion. Knowledge fusion identifies true subject-predicate-object triples extracted by multiple information extractors from multiple information sources. These extractors perform the tasks of entity linkage and schema alignment, thus introducing an additional source of noise that is quite different from that traditionally considered in the data fusion literature, which only focuses on factual errors in the original sources. We adapt state-of-the-art data fusion techniques and apply them to a knowledge base with 1.6B unique knowledge triples extracted by 12 extractors from over 1B Web pages, which is three orders of magnitude larger than the data sets used in previous data fusion papers. We show great promise of the data fusion approaches in solving the knowledge fusion problem, and suggest interesting research directions through a detailed error analysis of the methods. Comment: VLDB'201
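The data fusion task described in this abstract can be illustrated with a minimal sketch. It assumes a simple accuracy-weighted voting scheme, which is only one generic family of fusion methods and is not claimed to be the technique the paper adapts. The function name `fuse`, the initial accuracy of 0.8, and the toy claims are all hypothetical.

```python
from collections import defaultdict


def fuse(claims, rounds=5, initial_accuracy=0.8):
    """Accuracy-weighted voting over (source, item, value) claims.

    Illustrative sketch only: alternates between picking the value with the
    highest summed source accuracy and re-estimating each source's accuracy
    as the share of its claims that match the current picks.
    """
    accuracy = {source: initial_accuracy for source, _, _ in claims}
    truth = {}
    for _ in range(rounds):
        # Step 1: each source votes for its value with a weight equal to its accuracy.
        votes = defaultdict(lambda: defaultdict(float))
        for source, item, value in claims:
            votes[item][value] += accuracy[source]
        truth = {item: max(vals, key=vals.get) for item, vals in votes.items()}

        # Step 2: re-estimate accuracy from agreement with the current picks.
        hits, total = defaultdict(int), defaultdict(int)
        for source, item, value in claims:
            total[source] += 1
            hits[source] += int(truth[item] == value)
        accuracy = {source: hits[source] / total[source] for source in total}
    return truth, accuracy


# Hypothetical toy input: source C disagrees with A and B on two of three items.
claims = [
    ("A", "i1", "x"), ("B", "i1", "x"), ("C", "i1", "y"),
    ("A", "i2", "p"), ("B", "i2", "p"), ("C", "i2", "q"),
    ("A", "i3", "k"), ("B", "i3", "k"), ("C", "i3", "k"),
]
truth, accuracy = fuse(claims)
print(truth)
print(accuracy)
```

In the knowledge fusion setting of the abstract, a "source" in such a scheme would have to account for both the Web page and the extractor, since extraction adds its own noise; this sketch does not model that distinction.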

    Uncertainty-sensitive reasoning for inferring sameAs facts in linked data

    Discovering whether or not two URIs described in Linked Data -- in the same or different RDF datasets -- refer to the same real-world entity is crucial for building applications that exploit the cross-referencing of open data. A major challenge in data interlinking is to design tools that effectively deal with incomplete and noisy data, and exploit uncertain knowledge. In this paper, we model data interlinking as a reasoning problem with uncertainty. We introduce a probabilistic framework for modelling and reasoning over uncertain RDF facts and rules that is based on the semantics of probabilistic Datalog. We have designed an algorithm, ProbFR, based on this framework. Experiments on real-world datasets have shown the usefulness and effectiveness of our approach for data linkage and disambiguation.

    Regeneration of Scots pine (Pinus sylvestris L.) under drought

    Drought-induced tree mortality is a phenomenon affecting many forest ecosystems and is predicted to increase under ongoing climate change. Forest stability partly depends on regeneration: the process of renewing mature forest with subsequent generations. As seedlings are more susceptible to drought effects than mature trees, mortality of the seedling bank can represent a major bottleneck controlling forest structure and species composition. Scots Pine (Pinus sylvestris L.) is the most widely distributed of the Pinus species, covering a broad latitudinal gradient of ecological conditions. The thesis aims to deepen understanding of drought-induced mortality, while analysing intra-specific variation in the phenotypic and metabolic profile of Pinus sylvestris seedlings subjected to drought stress. I also consider the relevance of the results to the broader conceptual framework of drought-induced mortality. The experiments utilise seeds from different populations of origin (provenances) across the north-south axis of the European range of Pinus sylvestris, in order to determine the extent of regeneration capacity in this species under drought. Seeds were collected from different populations (provenances) that, along with other climatic and edaphic differences, span a gradient of water availability: from wet (Scotland) to intermediate (Austria, Poland) to dry (Spain). In Chapter 2, the effects of osmotic stress on the initial seedling establishment stage were studied by comparing phenotypic responses across provenances. Seedling germination, early growth, osmotic stress tolerance and survival were investigated using a polyethylene glycol irrigation treatment as a proxy for rapid and severe drought. Treatment, provenance and interaction effects were found for rate of germination, final proportion of seeds germinated, seedling size, and superoxide dismutase activity (an antioxidant enzyme). Root investment was affected by both provenance and time to germination. 
Although there was no significant effect of provenance on survival, a trend towards increased probability of survival under osmotic stress was indicated for the southernmost (driest) as compared with the northernmost (wettest) provenance. Chapter 3 investigates the responses of older seedlings (at 10 months) to a drying down of soil moisture for 40 days. Morphological and physiological data were collected to assess intra-specific and intra-population variation in the seedling stress response under drought. A metabolomics analysis using ultra-performance liquid chromatography followed by mass spectrometry (UPLC/MS) was carried out to investigate whether metabolic markers could be identified that are suggestive of heightened oxidative stress and whether populations in different climatic and edaphic environments show variation in metabolic activity under drought. Preliminary results suggest large intra-population variability yet clear differentiation in metabolic responses to drought over the time course of the experiment. Univariate and multivariate analyses indicated that among the most significant increases in response to drought were those of metabolites with osmoprotective and antioxidant capabilities, including the free amino acid proline and a quercetin derivative (a flavonoid). Interestingly, provenances, either under experimental drought or not, did not show significantly different metabolite profiles, even though provenance and its interaction with drought treatment did significantly affect seedling biomass and photochemical efficiency. In Chapter 4 the effects of provenance, maternal parentage and seed weight on germination rate, final germination percentage, as well as seedling drought responses in biomass allocation and the expression of selected antioxidant genes were analysed. Seed weights were measured individually and seed weight was found to have a strong positive effect on germination rate, seedling dry weights, and number of needles. 
Expression of two antioxidant enzymes increased under drought. Seed weight was strongly determined by provenance and maternal parentage as well as their interaction. However, root-to-shoot biomass allocation depended on provenance and maternal effects that were not mediated by seed weight effects. Principal component analysis indicated that the Spanish provenances could be characterised by a higher root-to-shoot ratio and stem weight. Specific leaf area was also found to be lowest for the Spanish provenances.

    Semiring-based provenance for graph databases (Provenance à base de semi-anneaux pour les bases de données graphes)

    The growing amount of data collected by sensors or generated by human interaction has led to an increasing use of graph databases, an efficient model for representing intricate data. Techniques to keep track of the history of computations applied to the data inside classical relational database systems are also topical because of their application to enforcing data protection regulations (e.g., GDPR). Our research work combines the two by considering a semiring-based provenance model for navigational queries over graph databases. We first present a comprehensive survey of semiring theory and its applications in different fields of computer science, geared towards their relevance for our context. From the richness of the literature, we notably obtain a lower bound for the complexity of the full provenance computation in our setting. In a second part, we focus on the model itself by introducing a toolkit of provenance-aware algorithms, each targeting specific properties of the semiring in use. We notably introduce a new method based on lattice theory permitting an efficient provenance computation for complex graph queries. We propose an open-source implementation of the above-mentioned algorithms, and we conduct an experimental study over large real transportation networks, demonstrating the efficiency of our approach in practical scenarios. We finally consider how this framework is positioned compared to other provenance models such as the semiring-based Datalog provenance model. We make explicit how the methods we applied for graph databases can be extended to Datalog queries, and we show how they can be seen as an extension of the semi-naïve evaluation strategy. To leverage this fact, we extend the capabilities of Soufflé, a state-of-the-art Datalog solver, to design an efficient provenance-aware Datalog evaluator. Experimental results based on our open-source implementation indicate that this approach stays competitive with dedicated graph solutions, despite being more general. Finally, we discuss some research ideas for improving the model, and state open questions raised by our work.
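The idea of semiring-based provenance for path queries mentioned in this abstract can be sketched as follows. This is a naive round-by-round evaluation over a small acyclic graph, offered as an assumption-laden illustration and not as any of the thesis's algorithms. The names `Semiring`, `path_provenance`, and `annotate`, as well as the toy edges, are hypothetical.

```python
from dataclasses import dataclass
from typing import Any, Callable


@dataclass(frozen=True)
class Semiring:
    zero: Any                         # neutral element of plus ("no path")
    one: Any                          # neutral element of times ("empty path")
    plus: Callable[[Any, Any], Any]   # combines alternative paths
    times: Callable[[Any, Any], Any]  # combines consecutive edges


BOOLEAN = Semiring(False, True, lambda a, b: a or b, lambda a, b: a and b)
TROPICAL = Semiring(float("inf"), 0.0, min, lambda a, b: a + b)
COUNTING = Semiring(0, 1, lambda a, b: a + b, lambda a, b: a * b)


def path_provenance(edges, source, sr, annotate):
    """Provenance of all paths from `source` to every node of an ACYCLIC graph.

    edges: list of (u, v, label); annotate maps a label to a semiring value.
    In an acyclic graph every path has at most n - 1 edges, so summing the
    provenance of walks of length 0..n-1 covers all paths exactly once.
    """
    nodes = {source} | {u for u, _, _ in edges} | {v for _, v, _ in edges}
    walks = {n: sr.zero for n in nodes}  # walks with exactly k edges
    walks[source] = sr.one
    total = dict(walks)
    for _ in range(len(nodes) - 1):
        nxt = {n: sr.zero for n in nodes}
        for u, v, label in edges:
            nxt[v] = sr.plus(nxt[v], sr.times(walks[u], annotate(label)))
        walks = nxt
        for n in nodes:
            total[n] = sr.plus(total[n], walks[n])
    return total


# Hypothetical toy graph: two routes from "a" to "c".
edges = [("a", "b", 2), ("b", "c", 3), ("a", "c", 7)]
shortest = path_provenance(edges, "a", TROPICAL, float)
n_paths = path_provenance(edges, "a", COUNTING, lambda _: 1)
reachable = path_provenance(edges, "a", BOOLEAN, lambda _: True)
print(shortest["c"], n_paths["c"], reachable["c"])
```

The same traversal answers three different questions (shortest distance, number of paths, reachability) simply by swapping the semiring, which is the property that makes the semiring view attractive. Graphs with cycles need more care, since semirings such as the counting one do not converge under naive iteration.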

    Advances in Methane Production from Coal, Shale and Other Tight Rocks

    This collection reports on the state of the art in fundamental discipline application in hydrocarbon production and associated challenges in geoengineering activities. Zheng et al. (2022) report an NMR-based method for multiphase methane characterization in coals. Wang et al. (2022) studied the genesis of bedding fractures in Ordovician to Silurian marine shale in the Sichuan Basin. Kang et al. (2022) proposed research focusing on the prediction of shale gas production from horizontal wells. Liang et al. (2022) studied the pore structure of marine shale using an adsorption method in terms of molecular interaction. Zhang et al. (2022) focus on the coal measures sandstones in the Xishanyao Formation, southern Junggar Basin, and fully reveal the sandstone diagenetic characteristics. Yao et al. (2022) report the source-to-sink system in the Ledong submarine channel and the Dongfang submarine fan in the Yinggehai Basin, South China Sea. There are four papers focusing on the technologies associated with hydrocarbon production. Wang et al. (2022) reported the analysis of pre-stack inversion in a carbonate karst reservoir. Chen et al. (2022) conducted an inversion study on the parameters of cascade coexisting gas-bearing reservoirs in coal measures in Huainan. To ensure the safety of CCS, Zhang et al. (2022) report their analysis of available conditions for InSAR surface deformation monitoring. Additionally, to ensure production safety in coal mines, Zhang et al. (2022) report the properties and application of gel materials for coal gangue control.

    Global forest management certification: future development potential

    REDD options as a risk management instrument under policy uncertainty and market volatility
