Entity Identification Problem in Big and Open Data
Big and Open Data offer businesses great opportunities to strengthen their competitive advantage if
used properly. However, over the past few years of research on Big and Open Data processing, we have
encountered a major challenge in entity identification reconciliation when trying to establish accurate
relationships between entities from different data sources. In this paper, we present our innovative Intelligent
Reconciliation Platform and Virtual Graphs solution, which addresses this issue. With this solution, we are able
to efficiently extract Big and Open Data from heterogeneous sources and integrate them into a common
analysable format. Enhanced with the Virtual Graphs technology, entity identification reconciliation
is processed dynamically to produce more accurate results at system runtime. Moreover, we believe that our
technology can be applied to a wide range of entity identification problems in several domains, e.g.,
e-Health, cultural heritage, and company identities in the financial world.
Ministerio de Ciencia e Innovación TIN2013-46928-C3-3-
Comparing Rule-based, Feature-based and Deep Neural Methods for De-identification of Dutch Medical Records
Unstructured information in electronic health records provides an invaluable
resource for medical research. To protect the confidentiality of patients and
to conform to privacy regulations, de-identification methods automatically
remove personally identifying information from these medical records. However,
due to the unavailability of labeled data, most existing research is
constrained to English medical text and little is known about the
generalizability of de-identification methods across languages and domains. In
this study, we construct a varied dataset consisting of the medical records of
1260 patients by sampling data from 9 institutes and three domains of Dutch
healthcare. We test the generalizability of three de-identification methods
across languages and domains. Our experiments show that an existing rule-based
method specifically developed for the Dutch language fails to generalize to
this new data. Furthermore, a state-of-the-art neural architecture performs
strongly across languages and domains, even with limited training data.
Compared to feature-based and rule-based methods, the neural method requires
significantly less configuration effort and domain knowledge. We make all code
and pre-trained de-identification models available to the research community,
allowing practitioners to apply them to their datasets and enabling future
benchmarks.
Comment: Proceedings of the 1st ACM WSDM Health Search and Data Mining
Workshop (HSDM2020), 202
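To make the idea of a rule-based de-identifier concrete, here is a minimal Python sketch. It is not the rule-based method evaluated in the paper: the rule set, the tag names, and the function `deidentify` are hypothetical and chosen only for illustration, and the patterns cover just a few simple surface forms.

```python
import re

# Hypothetical rule set (an assumption, not the paper's rules):
# each rule pairs a placeholder tag with a regular expression.
RULES = [
    ("EMAIL", re.compile(r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+")),
    ("DATE", re.compile(r"\b\d{1,2}-\d{1,2}-\d{4}\b")),   # e.g. 03-11-2019
    ("PHONE", re.compile(r"\b0\d{9}\b")),                 # 10-digit number starting with 0
]


def deidentify(text: str) -> str:
    """Replace every match of every rule with its placeholder tag."""
    for tag, pattern in RULES:
        text = pattern.sub(f"<{tag}>", text)
    return text


if __name__ == "__main__":
    note = "Seen on 03-11-2019, call 0612345678 or mail jan@example.nl."
    print(deidentify(note))
```

Hand-written rules like these are easy to inspect but only catch the formats they were written for, which is one plausible reason a rule-based system can fail to generalize to records from new institutes or domains.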
Protein Functional Families to characterise drug-target interactions.
The quest for “magic bullets” has been the driving force in drug discovery during the last two decades. However, the increasing rate of drug failure over this period has occurred concurrently with the assumption that a drug is a selective ligand for a single target. It now seems likely that polypharmacology is the rule rather than the exception [1].
Our previous research shows that protein domains are a good proxy for drug targets, and that drug polypharmacology emerges as a consequence of the multi-domain composition of proteins [2]. In this study, we investigate further the idea that the domain is the druggable entity within a protein target. We have identified a specific class of domains (CATH Functional Families) as the best currently available for identifying drug-target interactions. We show how this opens a new direction in target identification with potential application in drug repurposing.
1. Hopkins AL (2008) Network pharmacology: the next paradigm in drug discovery. Nat Chem Biol 4: 682
2. Moya-García AA & Ranea JAG (2013) Insights into polypharmacology from drug-domain associations. Bioinformatics 29: 1934–1937
Universidad de Málaga. Campus de Excelencia Internacional Andalucía Tech. Universidad de Granad
A geometric inverse problem for the Boussinesq system
In this work we present some results for the inverse problem of identifying a single rigid body immersed in a fluid governed by the stationary Boussinesq equations. First, we establish a uniqueness result. Then, we show how the observation depends on perturbations of the rigid body and deduce some consequences. Finally, we present a new method for the partial identification of the body, assuming that it can be deformed only through fields that, in some sense, are finite dimensional. In the proofs, we use various techniques related to Carleman estimates, differentiation with respect to domains, data assimilation and controllability of PDEs.
Dirección General de Enseñanza Superior; Comisión Nacional de Investigación Científica y Tecnológica (Chile); Fondo Nacional de Desarrollo Científico y Tecnológico (Chile)
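For orientation, the stationary Boussinesq system mentioned in the abstract is commonly written in the form below. The notation is an assumption and is not taken from the paper: $\Omega$ is the region containing the fluid, $D$ the immersed rigid body, $u$ the velocity, $p$ the pressure, $\theta$ the temperature, $\nu$ and $\kappa$ the viscosity and heat diffusion coefficients, and $g$ the gravity vector. Boundary conditions on $\partial\Omega$ and $\partial D$, and the precise form of the observation, are omitted because the abstract does not state them.

```latex
\begin{aligned}
-\nu \Delta u + (u \cdot \nabla) u + \nabla p &= \theta\, g && \text{in } \Omega \setminus \overline{D}, \\
\nabla \cdot u &= 0 && \text{in } \Omega \setminus \overline{D}, \\
-\kappa \Delta \theta + u \cdot \nabla \theta &= 0 && \text{in } \Omega \setminus \overline{D}.
\end{aligned}
```

In this setting the geometric inverse problem is to recover the unknown set $D$ from measurements of the fluid taken away from the body.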
De novo variants disturbing the transactivation capacity of POU3F3 cause a characteristic neurodevelopmental disorder
POU3F3, also referred to as Brain-1, is a well-known transcription factor involved in the development of the central nervous system, but it has not previously been associated with a neurodevelopmental disorder. Here, we report the identification of 19 individuals with heterozygous POU3F3 disruptions, most of which are de novo variants. All individuals had developmental delays and/or intellectual disability and impairments in speech and language skills. Thirteen individuals had characteristic low-set, prominent, and/or cupped ears. Brain abnormalities were observed in seven of eleven MRI reports. POU3F3 is an intronless gene, insensitive to nonsense-mediated decay, and 13 individuals carried protein-truncating variants. All truncating variants that we tested in cellular models led to aberrant subcellular localization of the encoded protein. Luciferase assays demonstrated negative effects of these alleles on transcriptional activation of a reporter with a FOXP2-derived binding motif. In addition to the loss-of-function variants, five individuals had missense variants that clustered at specific positions within the functional domains, and one small in-frame deletion was identified. Two missense variants showed reduced transactivation capacity in our assays, whereas one variant displayed gain-of-function effects, suggesting a distinct pathophysiological mechanism. In bioluminescence resonance energy transfer (BRET) interaction assays, all the truncated POU3F3 versions that we tested had significantly impaired dimerization capacities, whereas all missense variants showed unaffected dimerization with wild-type POU3F3. Taken together, our identification and functional cell-based analyses of pathogenic variants in POU3F3, coupled with a clinical characterization, implicate disruptions of this gene in a characteristic neurodevelopmental disorder.
Uniqueness and partial identification in a geometric inverse problem for the Boussinesq system
We analyze the inverse problem of the identification of a rigid body immersed in a fluid governed by the stationary
Boussinesq system. First, we establish a uniqueness result. Then, we present a new method for the partial
identification of the body. The proofs use local Carleman estimates, differentiation with respect to domains, data
assimilation techniques and controllability results for PDEs.
Dirección General de Enseñanza Superior; Fondo Nacional de Desarrollo Científico y Tecnológico (Comisión Nacional de Investigación Científica y Tecnológica) (Chile)
A machine learning based framework to identify and classify long terminal repeat retrotransposons
Transposable elements (TEs) are repetitive nucleotide sequences that make up a large portion of eukaryotic genomes. They can move and duplicate within a genome, increasing genome size and contributing to genetic diversity within and across species. Accurate identification and classification of the TEs present in a genome is an important step towards understanding their effects on genes and their role in genome evolution. We introduce TE-LEARNER, a framework based on machine learning that automatically identifies TEs in a given genome and assigns a classification to them. We present an implementation of our framework for LTR retrotransposons, a particular type of TE characterized by long terminal repeats (LTRs) at their boundaries. We evaluate the predictive performance of our framework on the well-annotated genomes of Drosophila melanogaster and Arabidopsis thaliana, and we compare our results for three LTR retrotransposon superfamilies with the results of three widely used methods for TE identification or classification: REPEATMASKER, CENSOR and LTRDIGEST. In contrast to these methods, TE-LEARNER is the first to incorporate machine learning techniques, and it outperforms them in terms of predictive performance while learning models and making predictions efficiently. Moreover, we show that our method was able to identify TEs that none of the above methods could find, and we investigated TE-LEARNER's predictions that did not correspond to an official annotation. It turns out that many of these predictions are in fact strongly homologous to a known TE.
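The defining feature mentioned above, long terminal repeats at both boundaries of an element, can be illustrated with a small Python sketch. This is not TE-LEARNER and not how the compared tools work: the function name `longest_terminal_repeat` and its `min_len` parameter are hypothetical, and the check requires the two ends to match exactly, whereas real LTR pairs usually differ by mutations.

```python
def longest_terminal_repeat(element: str, min_len: int = 4) -> int:
    """Return the length of the longest exact repeat shared by both ends.

    The prefix and suffix may not overlap, so at most half of the element
    is considered. Returns 0 if no repeat of at least `min_len` exists.
    """
    max_len = len(element) // 2
    # Try the longest candidate first so the first hit is the maximum.
    for length in range(max_len, min_len - 1, -1):
        if element[:length] == element[-length:]:
            return length
    return 0


if __name__ == "__main__":
    # Toy sequence: 'TGTAGC' at both ends, with an unrelated interior.
    print(longest_terminal_repeat("TGTAGCAAAAAATGTAGC"))
```

A learned model such as the one described in the abstract would use far richer features than this exact-match test, which is shown only to make the notion of a terminal repeat concrete.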