From Data Fusion to Knowledge Fusion
The task of {\em data fusion} is to identify the true values of data items
(e.g., the true date of birth for {\em Tom Cruise}) among multiple observed
values drawn from different sources (e.g., Web sites) of varying (and unknown)
reliability. A recent survey\cite{LDL+12} has provided a detailed comparison of
various fusion methods on Deep Web data. In this paper, we study the
applicability and limitations of different fusion techniques on a more
challenging problem: {\em knowledge fusion}. Knowledge fusion identifies true
subject-predicate-object triples extracted by multiple information extractors
from multiple information sources. These extractors perform the tasks of entity
linkage and schema alignment, thus introducing an additional source of noise
that is quite different from that traditionally considered in the data fusion
literature, which only focuses on factual errors in the original sources. We
adapt state-of-the-art data fusion techniques and apply them to a knowledge
base with 1.6B unique knowledge triples extracted by 12 extractors from over 1B
Web pages, which is three orders of magnitude larger than the data sets used in
previous data fusion papers. We show that data fusion approaches hold great
promise for solving the knowledge fusion problem, and suggest interesting
research directions through a detailed error analysis of the methods. Comment: VLDB'201
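As a rough illustration of the data fusion task described in this abstract, the sketch below picks, for each data item, the value with the highest accuracy-weighted vote. This is plain weighted voting, not one of the methods evaluated in the paper; the source names, the accuracy weights, and the conflicting value are all hypothetical.

```python
from collections import defaultdict


def fuse(claims, accuracy, default_accuracy=0.5):
    """Pick one value per data item by accuracy-weighted voting.

    claims: iterable of (source, item, value) tuples.
    accuracy: dict mapping source -> assumed reliability weight in (0, 1).
    Sources missing from `accuracy` get `default_accuracy` (an assumption).
    """
    scores = defaultdict(lambda: defaultdict(float))
    for source, item, value in claims:
        # Each source adds its weight to the value it claims.
        scores[item][value] += accuracy.get(source, default_accuracy)
    # The fused value is the one with the largest total weight.
    return {item: max(votes, key=votes.get) for item, votes in scores.items()}


# Hypothetical observations: three sites disagree on one data item.
claims = [
    ("site_a", ("Tom Cruise", "date_of_birth"), "1962-07-03"),
    ("site_b", ("Tom Cruise", "date_of_birth"), "1962-07-03"),
    ("site_c", ("Tom Cruise", "date_of_birth"), "1963-07-03"),
]
# Hypothetical reliabilities; in practice these are unknown and must be estimated.
accuracy = {"site_a": 0.9, "site_b": 0.6, "site_c": 0.7}

fused = fuse(claims, accuracy)
print(fused)
```

In knowledge fusion the same vote would also have to account for errors introduced by the extractors, which this sketch ignores.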
Uncertainty-sensitive reasoning for inferring sameAs facts in linked data
Discovering whether or not two URIs described in Linked Data -- in the same or different RDF datasets -- refer to the same real-world entity is crucial for building applications that exploit the cross-referencing of open data. A major challenge in data interlinking is to design tools that effectively deal with incomplete and noisy data, and exploit uncertain knowledge. In this paper, we model data interlinking as a reasoning problem with uncertainty. We introduce a probabilistic framework for modelling and reasoning over uncertain RDF facts and rules that is based on the semantics of probabilistic Datalog. We have designed an algorithm, ProbFR, based on this framework. Experiments on real-world datasets have shown the usefulness and effectiveness of our approach for data linkage and disambiguation.
Regeneration of Scots pine (Pinus sylvestris L.) under drought
Drought-induced tree mortality is a phenomenon affecting many forest ecosystems and
is predicted to increase under ongoing climate change. Forest stability partly depends
on regeneration: the process of renewing mature forest with subsequent generations.
As seedlings are more susceptible to drought effects than mature trees, mortality of the
seedling bank can represent a major bottleneck controlling forest structure and species
composition. Scots pine (Pinus sylvestris L.) is the most widely distributed of the
Pinus species, covering a broad latitudinal gradient of ecological conditions. The thesis
aims to deepen understanding of drought-induced mortality, while analysing intra-specific
variation in the phenotypic and metabolic profile of Pinus sylvestris seedlings
subjected to drought stress. I also consider the relevance of the results to the broader
conceptual framework of drought-induced mortality. The experiments utilise seeds
from different populations of origin (provenances) across the north-south axis of the
European range of Pinus sylvestris, in order to determine the extent of regeneration
capacity in this species under drought. The provenances, along with other
climatic and edaphic differences, span a gradient of water availability:
from wet (Scotland) to intermediate (Austria, Poland) to dry
(Spain).
In Chapter 2, the effects of osmotic stress on the initial seedling establishment stage
were studied by comparing phenotypic responses across provenances. Seedling
germination, early growth, osmotic stress tolerance and survival were investigated
using a polyethylene glycol irrigation treatment as a proxy for rapid and severe
drought. Treatment, provenance and interaction effects were found for rate of
germination, final proportion of seeds germinated, seedling size, and superoxide
dismutase activity (an antioxidant enzyme). Root investment was affected by both
provenance and time to germination. Although there was no significant effect of
provenance on survival, a trend towards increased probability of survival under
osmotic stress was indicated for the southernmost (driest) as compared with the
northernmost (wettest) provenance.
Chapter 3 investigates the responses of older seedlings (at 10 months) to a drying down
of soil moisture for 40 days. Morphological and physiological data were collected to
assess intra-specific and intra-population variation in the seedling stress response
under drought. A metabolomics analysis using ultra-performance liquid
chromatography followed by mass spectrometry (UPLC/MS) was carried out to
investigate whether metabolic markers could be identified that are suggestive of
heightened oxidative stress and whether populations in different climatic and edaphic
environments show variation in metabolic activity under drought. Preliminary results
suggest large intra-population variability yet clear differentiation in metabolic
responses to drought over the time course of the experiment. Univariate and
multivariate analyses indicated that the metabolites showing the most significant
increases in response to drought were those involved in osmoprotection and
antioxidant activity, including the free amino acid proline and a quercetin
derivative (a flavonoid).
Interestingly, provenances, either under experimental drought or not, did not show
significantly different metabolite profiles, even though provenance and its interaction
with drought treatment did significantly affect seedling biomass and photochemical
efficiency.
In Chapter 4 the effects of provenance, maternal parentage and seed weight on
germination rate, final germination percentage, as well as seedling drought responses
in biomass allocation and the expression of selected antioxidant genes were analysed.
Seed weights were measured individually and seed weight was found to have a strong
positive effect on germination rate, seedling dry weights, and number of needles.
Expression of two antioxidant enzymes increased under drought. Seed weight was
strongly determined by provenance and maternal parentage as well as their interaction.
However, root to shoot biomass allocation depended on provenance and maternal
effects that were not mediated by seed weight effects. Principal component analysis
indicated that the Spanish provenances could be characterised by a higher root to shoot
ratio and stem weight. Specific leaf area was also found to be lowest for the Spanish
provenances.
Semiring-based provenance for graph databases
The growing amount of data collected by sensors or generated by human interaction has led to an increasing use of graph databases, an efficient model for representing intricate data. Techniques to keep track of the history of computations applied to the data inside classical relational database systems are also topical because of their use in enforcing data protection regulations (e.g., GDPR). Our research work combines the two by considering a semiring-based provenance model for navigational queries over graph databases. We first present a comprehensive survey of semiring theory and its applications in different fields of computer science, geared towards their relevance for our context. From the richness of the literature, we notably obtain a lower bound for the complexity of the full provenance computation in our setting. In a second part, we focus on the model itself by introducing a toolkit of provenance-aware algorithms, each targeting specific properties of the semiring in use. We notably introduce a new method based on lattice theory permitting an efficient provenance computation for complex graph queries. We propose an open-source implementation of the above-mentioned algorithms, and we conduct an experimental study over large real transportation networks, demonstrating the efficiency of our approach in practical scenarios. We finally consider how this framework is positioned compared to other provenance models such as the semiring-based Datalog provenance model. We make explicit how the methods we applied for graph databases can be extended to Datalog queries, and we show how they can be seen as an extension of the semi-naïve evaluation strategy. To leverage this fact, we extend the capabilities of Soufflé, a state-of-the-art Datalog solver, to design an efficient provenance-aware Datalog evaluator.
Experimental results based on our open-source implementation indicate that this approach stays competitive with dedicated graph solutions, despite being more general. Finally, we discuss some research ideas for improving the model and state open questions raised by our work.
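To make the idea of semiring-based provenance for path queries more concrete, here is a minimal sketch that annotates every node reachable from a source with a semiring value, using a naive fixpoint iteration. It is not the thesis's toolkit, its lattice-based method, or a semi-naïve evaluator; the `Semiring` class, the `path_provenance` function, and the example graph are hypothetical names and data chosen for illustration. The sketch assumes an idempotent, absorptive semiring (for the tropical semiring, non-negative edge weights) so that the iteration reaches a fixpoint.

```python
from dataclasses import dataclass
from typing import Any, Callable


@dataclass(frozen=True)
class Semiring:
    zero: Any                          # neutral element of plus ("no path")
    one: Any                           # neutral element of times ("empty path")
    plus: Callable[[Any, Any], Any]    # combines alternative paths
    times: Callable[[Any, Any], Any]   # combines consecutive edges


# Tropical semiring: provenance of a node is the length of a shortest path.
TROPICAL = Semiring(float("inf"), 0.0, min, lambda a, b: a + b)
# Boolean semiring: provenance of a node is plain reachability.
BOOLEAN = Semiring(False, True, lambda a, b: a or b, lambda a, b: a and b)


def path_provenance(edges, source, sr):
    """Annotate each node with the semiring sum over all paths from `source`.

    edges: list of (u, v, annotation) triples with annotations in `sr`.
    Naive fixpoint: every round re-applies every edge until nothing changes.
    """
    nodes = {source} | {u for u, _, _ in edges} | {v for _, v, _ in edges}
    prov = {n: sr.zero for n in nodes}
    prov[source] = sr.one
    for _ in range(len(nodes)):        # bounded number of rounds
        changed = False
        for u, v, w in edges:
            candidate = sr.plus(prov[v], sr.times(prov[u], w))
            if candidate != prov[v]:
                prov[v] = candidate
                changed = True
        if not changed:
            break
    return prov


# Hypothetical weighted graph; node "d" cannot be reached from "a".
edges = [("a", "b", 1.0), ("b", "c", 2.0), ("a", "c", 5.0), ("d", "a", 1.0)]
dist = path_provenance(edges, "a", TROPICAL)
reach = path_provenance([(u, v, True) for u, v, _ in edges], "a", BOOLEAN)
print(dist, reach)
```

Swapping the semiring changes what the same traversal computes, which is the property that provenance-aware algorithms of this kind rely on.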
Extracting and Representing Entities, Types, and Relations
Making complex decisions in areas like science, government policy, finance, and clinical treatments requires integrating and reasoning over disparate data sources. While some decisions can be made from a single source of information, others require considering multiple pieces of evidence and how they relate to one another. Knowledge graphs (KGs) provide a natural approach for addressing this type of problem: they can serve as long-term stores of abstracted knowledge organized around concepts and their relationships, and can be populated from heterogeneous sources including databases and text. KGs can facilitate higher level reasoning, influence the interpretation of new data, and serve as a scaffolding for knowledge that enhances the acquisition of new information. A symbolic graph over a fixed, human-defined schema encoding facts about entities and their relations is the predominant method for representing knowledge, but this approach is brittle, lacks specificity, and is inevitably highly incomplete. On the other extreme, recent work on purely text-based knowledge models lacks the abstractions necessary for complex reasoning.
In this thesis I will present work incorporating neural models, rich structured ontologies, and unstructured raw text for representing knowledge. I will first discuss my work enhancing universal schema, a method for learning a latent schema over both existing structured resources and unstructured free text, embedding them jointly within a shared semantic space. Next, I inject additional hierarchical structure into the embedding space of concepts, resulting in more efficient statistical sharing among related concepts and improved accuracy in both fine-grained entity typing and linking. I then present initial work representing knowledge in context, including a single model for extracting all entities and long-range relations simultaneously over full paragraphs while jointly linking these entities to a KG. I will conclude by discussing possible future directions for representing knowledge in context.
Advances in Methane Production from Coal, Shale and Other Tight Rocks
This collection reports on the state of the art in applying fundamental disciplines to hydrocarbon production and on the associated challenges in geoengineering activities. Zheng et al. (2022) report an NMR-based method for multiphase methane characterization in coals. Wang et al. (2022) studied the genesis of bedding fractures in Ordovician to Silurian marine shale in the Sichuan Basin. Kang et al. (2022) proposed research focusing on the prediction of shale gas production from horizontal wells. Liang et al. (2022) studied the pore structure of marine shale by an adsorption method in terms of molecular interaction. Zhang et al. (2022) focus on the coal measures sandstones in the Xishanyao Formation, southern Junggar Basin, and describe their diagenetic characteristics. Yao et al. (2022) report the source-to-sink system in the Ledong submarine channel and the Dongfang submarine fan in the Yinggehai Basin, South China Sea. There are four papers focusing on the technologies associated with hydrocarbon production. Wang et al. (2022) reported the analysis of pre-stack inversion in a carbonate karst reservoir. Chen et al. (2022) conducted an inversion study on the parameters of cascade coexisting gas-bearing reservoirs in coal measures in Huainan. To ensure the safety of CCS, Zhang et al. (2022) report their analysis of available conditions for InSAR surface deformation monitoring. Additionally, to ensure production safety in coal mines, Zhang et al. (2022) report the properties and application of gel materials for coal gangue control.