626,998 research outputs found

    Entity Identification Problem in Big and Open Data

    Big and Open Data offer businesses great opportunities to enhance their competitive advantage if utilized properly. However, over the past few years of research on Big and Open Data processing, we have encountered a major challenge in entity identification reconciliation when trying to establish accurate relationships between entities from different data sources. In this paper, we present our innovative Intelligent Reconciliation Platform and Virtual Graphs solution, which addresses this issue. With this solution, we are able to efficiently extract Big and Open Data from heterogeneous sources and integrate them into a common analysable format. Further enhanced with the Virtual Graphs technology, entity identification reconciliation is processed dynamically to produce more accurate results at system runtime. Moreover, we believe that our technology can be applied to a wide diversity of entity identification problems in several domains, e.g., e-Health, cultural heritage, and company identities in the financial world.
    Ministerio de Ciencia e Innovación TIN2013-46928-C3-3-

    Comparing Rule-based, Feature-based and Deep Neural Methods for De-identification of Dutch Medical Records

    Unstructured information in electronic health records provides an invaluable resource for medical research. To protect the confidentiality of patients and to conform to privacy regulations, de-identification methods automatically remove personally identifying information from these medical records. However, due to the unavailability of labeled data, most existing research is constrained to English medical text, and little is known about the generalizability of de-identification methods across languages and domains. In this study, we construct a varied dataset consisting of the medical records of 1260 patients by sampling data from 9 institutes and three domains of Dutch healthcare. We test the generalizability of three de-identification methods across languages and domains. Our experiments show that an existing rule-based method specifically developed for the Dutch language fails to generalize to this new data. Furthermore, a state-of-the-art neural architecture performs strongly across languages and domains, even with limited training data. Compared to feature-based and rule-based methods, the neural method requires significantly less configuration effort and domain knowledge. We make all code and pre-trained de-identification models available to the research community, allowing practitioners to apply them to their datasets and enabling future benchmarks.
    Comment: Proceedings of the 1st ACM WSDM Health Search and Data Mining Workshop (HSDM2020), 202

    Protein Functional Families to characterise drug-target interactions.

    The quest for “magic bullets” has been the driving force in drug discovery during the last two decades. However, the increasing rate of drug failure over this period has occurred concurrently with the assumption that a drug is a selective ligand for a single target. It now seems likely that polypharmacology is the rule rather than the exception [1]. Our previous research shows that protein domains are a good proxy for drug targets, and that drug polypharmacology emerges as a consequence of the multi-domain composition of proteins [2]. In this study, we investigate further the idea that the domain is the druggable entity within a protein target. We have identified a specific class of domains (CATH Functional Families) as the best currently available for identifying drug-target interactions. We show how this opens a new direction in target identification with potential application in drug repurposing.
    1. Hopkins AL (2008) Network pharmacology: the next paradigm in drug discovery. Nat Chem Biol 4: 682
    2. Moya-García AA & Ranea JAG (2013) Insights into polypharmacology from drug-domain associations. Bioinformatics 29: 1934–1937
    Universidad de Málaga. Campus de Excelencia Internacional Andalucía Tech. Universidad de Granad

    A geometric inverse problem for the Boussinesq system

    In this work we present some results for the inverse problem of identifying a single rigid body immersed in a fluid governed by the stationary Boussinesq equations. First, we establish a uniqueness result. Then, we show how the observation depends on perturbations of the rigid body and deduce some consequences. Finally, we present a new method for the partial identification of the body, assuming that it can be deformed only through fields that are, in some sense, finite dimensional. In the proofs, we use various techniques related to Carleman estimates, differentiation with respect to domains, data assimilation and controllability of PDEs.
    Dirección General de Enseñanza Superior; Comisión Nacional de Investigación Científica y Tecnológica (Chile); Fondo Nacional de Desarrollo Científico y Tecnológico (Chile

    De novo variants disturbing the transactivation capacity of POU3F3 cause a characteristic neurodevelopmental disorder

    POU3F3, also referred to as Brain-1, is a well-known transcription factor involved in the development of the central nervous system, but it has not previously been associated with a neurodevelopmental disorder. Here, we report the identification of 19 individuals with heterozygous POU3F3 disruptions, most of which are de novo variants. All individuals had developmental delays and/or intellectual disability and impairments in speech and language skills. Thirteen individuals had characteristic low-set, prominent, and/or cupped ears. Brain abnormalities were observed in seven of eleven MRI reports. POU3F3 is an intronless gene, insensitive to nonsense-mediated decay, and 13 individuals carried protein-truncating variants. All truncating variants that we tested in cellular models led to aberrant subcellular localization of the encoded protein. Luciferase assays demonstrated negative effects of these alleles on transcriptional activation of a reporter with a FOXP2-derived binding motif. In addition to the loss-of-function variants, five individuals had missense variants that clustered at specific positions within the functional domains, and one small in-frame deletion was identified. Two missense variants showed reduced transactivation capacity in our assays, whereas one variant displayed gain-of-function effects, suggesting a distinct pathophysiological mechanism. In bioluminescence resonance energy transfer (BRET) interaction assays, all the truncated POU3F3 versions that we tested had significantly impaired dimerization capacities, whereas all missense variants showed unaffected dimerization with wild-type POU3F3. Taken together, our identification and functional cell-based analyses of pathogenic variants in POU3F3, coupled with a clinical characterization, implicate disruptions of this gene in a characteristic neurodevelopmental disorder

    Uniqueness and partial identification in a geometric inverse problem for the Boussinesq system

    We analyze the inverse problem of the identification of a rigid body immersed in a fluid governed by the stationary Boussinesq system. First, we establish a uniqueness result. Then, we present a new method for the partial identification of the body. The proofs use local Carleman estimates, differentiation with respect to domains, data assimilation techniques and controllability results for PDEs.
    Dirección General de Enseñanza Superior; Fondo Nacional de Desarrollo Científico y Tecnológico (Comisión Nacional de Investigación Científica y Tecnológica) (Chile

    A machine learning based framework to identify and classify long terminal repeat retrotransposons

    Transposable elements (TEs) are repetitive nucleotide sequences that make up a large portion of eukaryotic genomes. They can move and duplicate within a genome, increasing genome size and contributing to genetic diversity within and across species. Accurate identification and classification of the TEs present in a genome is an important step towards understanding their effects on genes and their role in genome evolution. We introduce TE-LEARNER, a framework based on machine learning that automatically identifies TEs in a given genome and assigns a classification to them. We present an implementation of our framework for LTR retrotransposons, a particular type of TE characterized by long terminal repeats (LTRs) at their boundaries. We evaluate the predictive performance of our framework on the well-annotated genomes of Drosophila melanogaster and Arabidopsis thaliana, and we compare our results for three LTR retrotransposon superfamilies with the results of three widely used methods for TE identification or classification: REPEATMASKER, CENSOR and LTRDIGEST. In contrast to these methods, TE-LEARNER is the first to incorporate machine learning techniques, outperforming them in terms of predictive performance while learning models and making predictions efficiently. Moreover, we show that our method was able to identify TEs that none of the above methods could find, and we investigated TE-LEARNER's predictions that did not correspond to an official annotation. It turns out that many of these predictions are in fact strongly homologous to a known TE.