
    Entity-centric knowledge discovery for idiosyncratic domains

    Technical and scientific knowledge is produced at an ever-accelerating pace, making it increasingly difficult to organize or process it automatically, e.g., when searching for relevant prior work. Today, knowledge is produced in both unstructured (plain text) and structured (metadata or linked data) forms. However, unstructured content is still the dominant form used to represent scientific knowledge. To facilitate the extraction and discovery of relevant content, new automated and scalable methods for processing, structuring and organizing scientific knowledge are needed. In this context, a number of applications are emerging, ranging from Named Entity Recognition (NER) and Entity Linking tools for scientific papers to dedicated platforms that leverage information extraction techniques to organize scientific knowledge. In this thesis, we tackle the tasks of Entity Recognition, Disambiguation and Linking in idiosyncratic domains, with an emphasis on scientific literature. We also study the related task of co-reference resolution, with a specific focus on named entities. We start by exploring Named Entity Recognition, a task that aims to identify the boundaries of named entities in text. We propose a new method to generate candidate named entities based on n-gram collocation statistics and design several entity recognition features to further classify them. In addition, we show how external knowledge bases (either domain-specific, like DBLP, or generic, like DBpedia) can be leveraged to improve the effectiveness of NER in idiosyncratic domains. Subsequently, we move to Entity Disambiguation, which is typically performed after entity recognition in order to link an entity to a knowledge base. We propose novel semi-supervised methods for word disambiguation that leverage the structure of a community-based ontology of scientific concepts.
Our approach exploits the graph structure that connects different terms and their definitions to automatically identify the sense originally intended by the authors of a scientific publication. We then turn to co-reference resolution, a task that aims to identify entities that appear in various forms throughout a text. We propose an approach that types entities by leveraging an inverted index built on top of a knowledge base and then re-assigns entities based on the semantic relatedness of the introduced types. Finally, we describe an application whose goal is to help researchers discover and manage scientific publications. In that context, we focus on the problem of selecting relevant tags to organize collections of research papers. We experimentally demonstrate that using a community-authored ontology, together with information about the position of concepts in the documents, significantly increases the precision of tag selection over standard methods.
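    The candidate-generation step mentioned in this abstract relies on n-gram collocation statistics. As a rough illustration only, the sketch scores bigrams with pointwise mutual information (PMI), one common collocation measure. The choice of PMI, the restriction to bigrams, the function name `bigram_pmi_candidates` and the thresholds `min_count` and `min_pmi` are our assumptions, not details taken from the thesis.

```python
import math
from collections import Counter


def bigram_pmi_candidates(tokens, min_count=2, min_pmi=1.0):
    """Return candidate bigrams whose PMI is at least min_pmi, highest first.

    Hypothetical sketch: PMI is one common collocation statistic; the thesis
    does not state which statistic, n-gram length or thresholds it uses.
    """
    unigrams = Counter(tokens)
    bigrams = Counter(zip(tokens, tokens[1:]))
    n_uni = len(tokens)
    n_bi = max(len(tokens) - 1, 1)
    candidates = {}
    for (w1, w2), count in bigrams.items():
        if count < min_count:
            continue  # too rare for a reliable probability estimate
        p_xy = count / n_bi
        p_x = unigrams[w1] / n_uni
        p_y = unigrams[w2] / n_uni
        pmi = math.log2(p_xy / (p_x * p_y))
        if pmi >= min_pmi:
            candidates[(w1, w2)] = pmi
    # Sort by decreasing PMI so the strongest collocations come first.
    return dict(sorted(candidates.items(), key=lambda kv: -kv[1]))
```

    On a toy sentence such as "we use support vector machines ; support vector machines outperform the baseline on the data", only ("support", "vector") and ("vector", "machines") pass the filters, so they would be handed to the feature-based classifier as candidate entity fragments.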

    Understanding past population dynamics: Bayesian coalescent-based modeling with covariates

    Effective population size characterizes the genetic variability in a population and is a parameter of paramount importance in population genetics. Kingman's coalescent process enables inference of past population dynamics directly from molecular sequence data, and researchers have developed a number of flexible coalescent-based models for Bayesian nonparametric estimation of the effective population size as a function of time. A major goal of demographic reconstruction is understanding the association between the effective population size and potential explanatory factors. Building upon Bayesian nonparametric coalescent-based approaches, we introduce a flexible framework that incorporates time-varying covariates through Gaussian Markov random fields. To approximate the posterior distribution, we adapt efficient Markov chain Monte Carlo algorithms designed for highly structured Gaussian models. Incorporating covariates into the demographic inference framework enables the modeling of associations between the effective population size and covariates while accounting for uncertainty in population histories. Furthermore, it can lead to more precise estimates of population dynamics. We apply our model to four examples. We reconstruct the demographic history of raccoon rabies in North America and find a significant association with the spatiotemporal spread of the outbreak. Next, we examine the effective population size trajectory of the DENV-4 virus in Puerto Rico along with viral isolate count data and find similar cyclic patterns. We compare the population history of the HIV-1 CRF02_AG clade in Cameroon with HIV incidence and prevalence data and find that the effective population size more closely reflects the incidence rate. Finally, we explore the hypothesis that the population dynamics of musk ox during the Late Quaternary period were related to climate change.
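    To make the covariate idea more concrete, here is a minimal sketch of the kind of prior such a framework might place on the log effective population size. Purely for illustration, it assumes that on a time grid the log effective population size equals a linear covariate effect `beta * covariate[k]` plus a first-order random-walk (RW1) Gaussian Markov random field with precision `tau`. The function name `rw1_log_prior` and this exact parameterization are hypothetical; the published model may differ.

```python
import math


def rw1_log_prior(gamma, covariate, beta, tau):
    """Unnormalized log density of an RW1 GMRF prior on log effective
    population size after removing a linear covariate effect.

    Hypothetical simplification: gamma[k] = beta * covariate[k] + f[k], where f
    is a first-order random walk with precision tau. Terms that do not depend
    on tau are dropped (the RW1 prior is improper, so only relative densities
    matter).
    """
    if len(gamma) != len(covariate):
        raise ValueError("gamma and covariate must have the same length")
    if tau <= 0:
        raise ValueError("tau must be positive")
    # Residual trajectory f after subtracting the covariate effect.
    resid = [g - beta * z for g, z in zip(gamma, covariate)]
    # Sum of squared first differences penalizes abrupt changes in f.
    sq = sum((b - a) ** 2 for a, b in zip(resid, resid[1:]))
    k = len(resid)
    return 0.5 * (k - 1) * math.log(tau) - 0.5 * tau * sq
```

    In an MCMC sampler, a term like this would be combined with the coalescent likelihood and with priors on `beta` and `tau`. Under this simplified model, a posterior for `beta` concentrated away from zero would signal an association between the covariate and the effective population size.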

    Landscape attributes governing local transmission of an endemic zoonosis: rabies virus in domestic dogs

    Landscape heterogeneity plays an important role in disease spread and persistence, but quantifying landscape influences and their scale dependence is challenging. Studies have focused on how environmental features or global transport networks influence pathogen invasion and spread, but their influence on the local transmission dynamics that underpin the persistence of endemic diseases remains unexplored. Bayesian phylogeographic frameworks that incorporate spatial heterogeneities are promising tools for analysing linked epidemiological, environmental and genetic data. Here, we extend these methodological approaches to decipher the relative contribution and scale-dependent effects of landscape influences on the transmission of endemic rabies virus in Serengeti district, Tanzania (area ~4,900 km²). Using detailed epidemiological data and 152 complete viral genomes collected between 2004 and 2013, we show that the localized presence of dogs, but not their density, is the most important determinant of diffusion, implying that culling will be ineffective for rabies control. Rivers acted as barriers to viral spread and roads as facilitators, and vaccination impeded diffusion despite variable annual coverage. Notably, landscape effects were scale-dependent: rivers were barriers and roads facilitators at larger scales, whereas the distribution of dogs was important for rabies dispersal across multiple scales. This nuanced understanding of the spatial processes that underpin rabies transmission can be exploited for targeted control at the scale where it will have the greatest impact. Moreover, this research demonstrates how current phylogeographic frameworks can be adapted to improve our understanding of endemic disease dynamics at different spatial scales.

    Calibration of quasi-isotropic parallel kinematic machines: Orthoglide

    The paper proposes a novel approach for calibrating the geometric model of quasi-isotropic parallel kinematic mechanisms of the Orthoglide family. It is based on observing the parallelism of the manipulator legs during motions between specific test postures, and it employs a low-cost measuring system composed of standard comparator indicators attached to universal magnetic stands. The indicators are used sequentially to measure the deviation of the relevant leg location while the manipulator moves the TCP (tool center point) along the Cartesian axes. Using the measured differences, the developed algorithm estimates the joint offsets and the leg lengths, which are treated as the most essential parameters. Experimental results confirm the validity of the proposed calibration technique.
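    The abstract does not give the identification equations. A common way to estimate a small set of geometric parameters from measured deviations is linearized least squares, sketched here under the assumption that each measured deviation depends linearly on the unknown parameter errors (for example, joint offsets and leg-length errors) through a sensitivity (Jacobian) matrix. The matrix, the function name `solve_least_squares` and the normal-equation solver are illustrative assumptions, not the paper's algorithm.

```python
def solve_least_squares(jacobian, deviations):
    """Estimate parameter corrections dp minimizing ||J dp - d||^2 by solving
    the normal equations (J^T J) dp = J^T d.

    Hypothetical sketch: assumes a linearized model in which each measured leg
    deviation is a linear combination of the unknown parameter errors.
    """
    m, n = len(jacobian), len(jacobian[0])
    # Build the augmented normal-equation system [J^T J | J^T d].
    a = [
        [sum(jacobian[r][i] * jacobian[r][j] for r in range(m)) for j in range(n)]
        + [sum(jacobian[r][i] * deviations[r] for r in range(m))]
        for i in range(n)
    ]
    # Gaussian elimination with partial pivoting.
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(a[r][col]))
        if abs(a[pivot][col]) < 1e-12:
            raise ValueError("parameters are not identifiable from these measurements")
        a[col], a[pivot] = a[pivot], a[col]
        for r in range(col + 1, n):
            factor = a[r][col] / a[col][col]
            for c in range(col, n + 1):
                a[r][c] -= factor * a[col][c]
    # Back substitution on the upper-triangular system.
    dp = [0.0] * n
    for i in reversed(range(n)):
        s = sum(a[i][j] * dp[j] for j in range(i + 1, n))
        dp[i] = (a[i][n] - s) / a[i][i]
    return dp
```

    A singular normal matrix means the chosen test postures do not excite every parameter, which is why the function raises an error instead of returning an estimate.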

    Agathe Euzen, Catherine Jeandel and Rémy Mosseri (eds.), 2015, L'eau à découvert, Paris, CNRS Éditions, 368 pages.

    In his contribution to L'eau à découvert ("How can the general public be informed about, and made aware of, water-related issues?"), hydrologist Bernard Chocat points out that "[…] nearly 50% of French people believe that the water supplied through the drinking-water network is produced by recycling wastewater". This alone could justify the value of a synthesis that rigorously addresses the different facets of "water" as an object of study, all with a concern for access..

    Learning to Extract Protein-Protein Interactions using Distant Supervision

    Most relation extraction methods, especially in the biological domain, rely on machine learning to classify a co-occurring pair of entities in a sentence as related or not. Such an approach requires a training corpus, which involves expert annotation and is tedious, time-consuming and expensive. We overcome this problem by using existing knowledge in structured databases to automatically generate a training corpus for protein-protein interactions. We perform an extensive evaluation of different instance selection strategies to maximize robustness on this presumably noisy resource. Strategies that consistently improve performance include a majority-voting ensemble of classifiers trained on subsets of the training corpus and the use of knowledge bases of proven non-interactions. Our best-configured model, built without manually annotated data, achieves very competitive results on several publicly available benchmark corpora.
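    The labeling idea and the ensemble strategy named in this abstract can be illustrated with a short sketch. It assumes that knowledge bases are available as sets of unordered protein-ID pairs (known interactions and proven non-interactions) and that each ensemble member outputs a 0/1 prediction. The function names, the `unknown_as_negative` switch and the tie-breaking rule are hypothetical choices, not the authors' implementation.

```python
from collections import Counter


def distant_labels(pairs, interactions, non_interactions, unknown_as_negative=False):
    """Label co-occurring protein pairs using knowledge bases (hypothetical sketch).

    A pair found in `interactions` is positive (1), a pair found in
    `non_interactions` is negative (0), and any other pair is either treated
    as negative or left unlabeled (None), depending on `unknown_as_negative`.
    """
    labels = []
    for a, b in pairs:
        key = frozenset((a, b))  # interactions are treated as undirected
        if key in interactions:
            labels.append(1)
        elif key in non_interactions:
            labels.append(0)
        else:
            labels.append(0 if unknown_as_negative else None)
    return labels


def majority_vote(predictions):
    """Combine 0/1 predictions from several classifiers; ties go to 0."""
    counts = Counter(predictions)
    return 1 if counts[1] > counts[0] else 0
```

    In a full pipeline, each classifier in the ensemble would be trained on a different subset of the distantly labeled instances, and `majority_vote` would combine their outputs for each candidate pair.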