
160 research outputs found

From Data Fusion to Knowledge Fusion

Author: Dong Xin Luna
Gabrilovich Evgeniy
Heitz Geremy
Horn Wilko
Murphy Kevin
Sun Shaohua
Zhang Wei
Publication date: 01/01/2014

The task of data fusion is to identify the true values of data items (e.g., the true date of birth for Tom Cruise) among multiple observed values drawn from different sources (e.g., Web sites) of varying (and unknown) reliability. A recent survey [LDL+12] has provided a detailed comparison of various fusion methods on Deep Web data. In this paper, we study the applicability and limitations of different fusion techniques on a more challenging problem: knowledge fusion. Knowledge fusion identifies true subject-predicate-object triples extracted by multiple information extractors from multiple information sources. These extractors perform the tasks of entity linkage and schema alignment, thus introducing an additional source of noise that is quite different from that traditionally considered in the data fusion literature, which only focuses on factual errors in the original sources. We adapt state-of-the-art data fusion techniques and apply them to a knowledge base with 1.6B unique knowledge triples extracted by 12 extractors from over 1B Web pages, which is three orders of magnitude larger than the data sets used in previous data fusion papers. We show that data fusion approaches hold great promise for solving the knowledge fusion problem, and we suggest interesting research directions through a detailed error analysis of the methods. Comment: VLDB'201

arXiv.org e-Print Archive

CiteSeerX

Uncertainty-sensitive reasoning for inferring sameAs facts in linked data

Author: Al-Bakri Mustafa
Atencia Manuel
David Jérôme
Lalande Steffen
Rousset Marie-Christine
Publication venue: 'IOS Press'
Publication date: 29/08/2016

Discovering whether or not two URIs described in Linked Data, in the same or different RDF datasets, refer to the same real-world entity is crucial for building applications that exploit the cross-referencing of open data. A major challenge in data interlinking is to design tools that effectively deal with incomplete and noisy data, and exploit uncertain knowledge. In this paper, we model data interlinking as a reasoning problem with uncertainty. We introduce a probabilistic framework for modelling and reasoning over uncertain RDF facts and rules that is based on the semantics of probabilistic Datalog. We have designed an algorithm, ProbFR, based on this framework. Experiments on real-world datasets have shown the usefulness and effectiveness of our approach for data linkage and disambiguation.

Hal - Université Grenoble Alpes

INRIA a CCSD electronic archive server

Regeneration of Scots pine (Pinus sylvestris L.) under drought

Author: MacAllister Sarah Louise
Publication venue: The University of Edinburgh
Publication date: 28/11/2016

Drought-induced tree mortality is a phenomenon affecting many forest ecosystems and is predicted to increase under ongoing climate change. Forest stability partly depends on regeneration: the process of renewing mature forest with subsequent generations. As seedlings are more susceptible to drought effects than mature trees, mortality of the seedling bank can represent a major bottleneck controlling forest structure and species composition. Scots pine (Pinus sylvestris L.) is the most widely distributed of the Pinus species, covering a broad latitudinal gradient of ecological conditions. The thesis aims to deepen understanding of drought-induced mortality, while analysing intra-specific variation in the phenotypic and metabolic profile of Pinus sylvestris seedlings subjected to drought stress. I also consider the relevance of the results to the broader conceptual framework of drought-induced mortality. The experiments utilise seeds from different populations of origin (provenances) across the north-south axis of the European range of Pinus sylvestris, in order to determine the extent of regeneration capacity in this species under drought. Seeds were collected from different populations (provenances) that, along with other climatic and edaphic differences, span a gradient of water availability: from wet (Scotland) to intermediate (Austria, Poland) to dry (Spain). In Chapter 2, the effects of osmotic stress on the initial seedling establishment stage were studied by comparing phenotypic responses across provenances. Seedling germination, early growth, osmotic stress tolerance and survival were investigated using a polyethylene glycol irrigation treatment as a proxy for rapid and severe drought. Treatment, provenance and interaction effects were found for rate of germination, final proportion of seeds germinated, seedling size, and superoxide dismutase activity (an antioxidant enzyme). Root investment was affected by both provenance and time to germination. Although there was no significant effect of provenance on survival, a trend towards increased probability of survival under osmotic stress was indicated for the southernmost (driest) as compared with the northernmost (wettest) provenance. Chapter 3 investigates the responses of older seedlings (at 10 months) to a drying down of soil moisture for 40 days. Morphological and physiological data were collected to assess intra-specific and intra-population variation in the seedling stress response under drought. A metabolomics analysis using ultra-performance liquid chromatography followed by mass spectrometry (UPLC/MS) was carried out to investigate whether metabolic markers could be identified that are suggestive of heightened oxidative stress and whether populations in different climatic and edaphic environments show variation in metabolic activity under drought. Preliminary results suggest large intra-population variability yet clear differentiation in metabolic responses to drought over the time course of the experiment. Univariate and multivariate analyses indicated that among the most significant increases in response to drought were those involved in osmoprotective and antioxidant capabilities, including the free amino acid proline and a quercetin derivative (a flavonoid). Interestingly, provenances, either under experimental drought or not, did not show significantly different metabolite profiles, even though provenance and its interaction with drought treatment did significantly affect seedling biomass and photochemical efficiency. In Chapter 4 the effects of provenance, maternal parentage and seed weight on germination rate, final germination percentage, as well as seedling drought responses in biomass allocation and the expression of selected antioxidant genes were analysed. Seed weights were measured individually and seed weight was found to have a strong positive effect on germination rate, seedling dry weights, and number of needles. Expression of two antioxidant enzymes increased under drought. Seed weight was strongly determined by provenance and maternal parentage as well as their interaction. However, root-to-shoot biomass allocation depended on provenance and maternal effects that were not mediated by seed weight effects. Principal component analysis indicated that the Spanish provenances could be characterised by a higher root-to-shoot ratio and stem weight. Specific leaf area was also found to be lowest for the Spanish provenances.

Edinburgh Research Archive

Provenance à base de semi-anneaux pour les bases de données graphes (Semiring-based provenance for graph databases)

Author: Ramusat Yann
Publication venue: HAL CCSD
Publication date: 28/04/2022

The growing amount of data collected by sensors or generated by human interaction has led to an increasing use of graph databases, an efficient model for representing intricate data. Techniques to keep track of the history of computations applied to the data inside classical relational database systems are also topical because of their application to enforcing data protection regulations (e.g., GDPR). Our research work combines the two by considering a semiring-based provenance model for navigational queries over graph databases. We first present a comprehensive survey of semiring theory and its applications in different fields of computer science, geared towards their relevance for our context. From the richness of the literature, we notably obtain a lower bound for the complexity of the full provenance computation in our setting. In a second part, we focus on the model itself by introducing a toolkit of provenance-aware algorithms, each targeting specific properties of the semiring in use. We notably introduce a new method based on lattice theory permitting an efficient provenance computation for complex graph queries. We propose an open-source implementation of the above-mentioned algorithms, and we conduct an experimental study over large real transportation networks, demonstrating the practical efficiency of our approach. We finally consider how this framework is positioned compared to other provenance models such as the semiring-based Datalog provenance model. We make explicit how the methods we applied for graph databases can be extended to Datalog queries, and we show how they can be seen as an extension of the semi-naïve evaluation strategy. To leverage this fact, we extend the capabilities of Soufflé, a state-of-the-art Datalog solver, to design an efficient provenance-aware Datalog evaluator. Experimental results based on our open-source implementation show that this approach stays competitive with dedicated graph solutions, despite being more general. Finally, we discuss some research ideas for improving the model, and state open questions raised by our work.

INRIA a CCSD electronic archive server

‘Carbon debt’ – lost in the forest?

Author: Bentsen Niclas Scott
Felby Claus
Graudal Lars
Madsen Palle
Publication venue: 'Commonwealth Forestry Association'
Publication date: 01/01/2014

Copenhagen University Research Information System


Extracting and Representing Entities, Types, and Relations

Author: Verga Patrick
Publication venue: ScholarWorks@UMass Amherst
Publication date: 30/10/2019

Making complex decisions in areas like science, government policy, finance, and clinical treatments requires integrating and reasoning over disparate data sources. While some decisions can be made from a single source of information, others require considering multiple pieces of evidence and how they relate to one another. Knowledge graphs (KGs) provide a natural approach for addressing this type of problem: they can serve as long-term stores of abstracted knowledge organized around concepts and their relationships, and can be populated from heterogeneous sources including databases and text. KGs can facilitate higher level reasoning, influence the interpretation of new data, and serve as a scaffolding for knowledge that enhances the acquisition of new information. A symbolic graph over a fixed, human-defined schema encoding facts about entities and their relations is the predominant method for representing knowledge, but this approach is brittle, lacks specificity, and is inevitably highly incomplete. At the other extreme, recent work on purely text-based knowledge models lacks the abstractions necessary for complex reasoning. In this thesis I will present work incorporating neural models, rich structured ontologies, and unstructured raw text for representing knowledge. I will first discuss my work enhancing universal schema, a method for learning a latent schema over both existing structured resources and unstructured free text, embedding them jointly within a shared semantic space. Next, I inject additional hierarchical structure into the embedding space of concepts, resulting in more efficient statistical sharing among related concepts and improved accuracy in both fine-grained entity typing and linking. I then present initial work representing knowledge in context, including a single model for extracting all entities and long-range relations simultaneously over full paragraphs while jointly linking these entities to a KG. I will conclude by discussing possible future directions for representing knowledge in context.

ScholarWorks@UMass Amherst

Advances in Methane Production from Coal, Shale and Other Tight Rocks

Publication venue: 'MDPI AG'
Publication date: 05/01/2023

This collection reports on the state of the art in the application of fundamental disciplines to hydrocarbon production and on associated challenges in geoengineering activities. Zheng et al. (2022) report an NMR-based method for multiphase methane characterization in coals. Wang et al. (2022) studied the genesis of bedding fractures in Ordovician to Silurian marine shale in the Sichuan Basin. Kang et al. (2022) proposed research focusing on the prediction of shale gas production from horizontal wells. Liang et al. (2022) studied the pore structure of marine shale using an adsorption method, in terms of molecular interaction. Zhang et al. (2022) focus on the coal-measure sandstones of the Xishanyao Formation, southern Junggar Basin, and reveal their diagenetic characteristics. Yao et al. (2022) report the source-to-sink system in the Ledong submarine channel and the Dongfang submarine fan in the Yinggehai Basin, South China Sea. Four papers focus on the technologies associated with hydrocarbon production. Wang et al. (2022) reported the analysis of pre-stack inversion in a carbonate karst reservoir. Chen et al. (2022) conducted an inversion study on the parameters of cascade coexisting gas-bearing reservoirs in coal measures in Huainan. To ensure the safety of CCS, Zhang et al. (2022) report their analysis of available conditions for InSAR surface deformation monitoring. Additionally, to ensure production safety in coal mines, Zhang et al. (2022) report the properties and application of gel materials for coal gangue control.

Directory of Open Access Books (DOAB)

Forest fires and adaptation options in Europe

Author: Camia A.
Dosio A.
Durrant T.
Khabarov N.
Krasovskii A.
Migliavacca M.
Obersteiner M.
San-Miguel-Ayanz J.
Swart R.
Publication venue: Proceedings of the XXIV IUFRO World Congress, 5-11 October 2014, Salt Lake City
Publication date: 01/10/2014

Springer - Publisher Connector

International Institute for Applied Systems Analysis (IIASA)

Global forest management certification: future development potential

Author: Aoki K.
Fuss S.
Kraxner F.
Lunnan A.
Shchepashchenko D.
Shvidenko A.
Publication venue: Proceedings of the XXIV IUFRO World Congress, 5-11 October 2014, Salt Lake City
Publication date: 01/10/2014

International Institute for Applied Systems Analysis (IIASA)

REDD options as a risk management instrument under policy uncertainty and market volatility

Author: Fuss S.
Khabarov N.
Obersteiner M.
Szolgayova J.
Publication venue: Proceedings of the XXIV IUFRO World Congress, 5-11 October 2014, Salt Lake City
Publication date: 01/10/2014

International Institute for Applied Systems Analysis (IIASA)