
    Information retrieval and text mining technologies for chemistry

    Efficient access to chemical information contained in scientific literature, patents, technical reports, or the web is a pressing need shared by researchers and patent attorneys across chemical disciplines. In most cases, retrieval of important chemical information starts with finding documents relevant to a particular chemical compound or family. Targeted retrieval of chemical documents is closely connected to the automatic recognition of chemical entities in text, which commonly involves extracting the complete list of chemicals mentioned in a document, together with any associated information. In this Review, we provide a comprehensive and in-depth description of the fundamental concepts, technical implementations, and current technologies for meeting these information demands. A strong focus is placed on community challenges addressing system performance, in particular the CHEMDNER and CHEMDNER patents tasks of BioCreative IV and V, respectively. Given the growing interest in automatically annotated chemical knowledge bases that integrate chemical information and biological data, we also present cheminformatics approaches for mapping extracted chemical names to chemical structures and annotating them, together with text mining applications that link chemistry with biological information. Finally, we highlight future trends and current challenges as a proposed roadmap for research in this emerging field.
    A.V. and M.K. acknowledge funding from the European Community’s Horizon 2020 Program (project reference: 654021 - OpenMinted). M.K. additionally acknowledges the Encomienda MINETAD-CNIO as part of the Plan for the Advancement of Language Technology. O.R. and J.O. thank the Foundation for Applied Medical Research (FIMA), University of Navarra (Pamplona, Spain). This work was partially funded by the Consellería de Cultura, Educación e Ordenación Universitaria (Xunta de Galicia) and FEDER (European Union), and by the Portuguese Foundation for Science and Technology (FCT) under the strategic funding of the UID/BIO/04469/2013 unit and COMPETE 2020 (POCI-01-0145-FEDER-006684). We thank Iñigo García-Yoldi for useful feedback and discussions during the preparation of the manuscript.
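Dictionary lookup is one simple way to extract the list of chemical names mentioned in a document. The Python sketch below is a minimal illustration only, assuming a small hypothetical lexicon (`CHEMICAL_LEXICON`), a hypothetical helper (`extract_chemicals`), and plain-ASCII text; it is not the method of any system covered by the Review, and many practical recognizers instead rely on machine-learned sequence taggers, often combined with curated dictionaries.

```python
# Hypothetical toy lexicon for illustration only; real systems use far
# larger curated resources and/or machine-learned taggers.
CHEMICAL_LEXICON = {"aspirin", "acetylsalicylic acid", "ethanol", "sodium chloride"}


def extract_chemicals(text, lexicon=CHEMICAL_LEXICON):
    """Return (start, end, mention) tuples for lexicon terms found in text.

    Matching is case-insensitive, respects word boundaries, and prefers the
    longest term at each position. Assumes text.lower() keeps the string
    length unchanged (true for plain ASCII).
    """
    lowered = text.lower()
    # Try longer terms first so "acetylsalicylic acid" wins over shorter terms.
    terms = sorted((t.lower() for t in lexicon), key=len, reverse=True)
    mentions = []
    pos = 0
    while pos < len(lowered):
        for term in terms:
            if not lowered.startswith(term, pos):
                continue
            end = pos + len(term)
            starts_word = pos == 0 or not lowered[pos - 1].isalnum()
            ends_word = end == len(lowered) or not lowered[end].isalnum()
            if starts_word and ends_word:
                mentions.append((pos, end, text[pos:end]))
                pos = end  # skip past the match so mentions never overlap
                break
        else:
            pos += 1  # no term matches here; advance one character
    return mentions


print(extract_chemicals("Aspirin (acetylsalicylic acid) is not ethanol."))
```

The word-boundary checks keep "ethanol" from matching inside "methanol" or "ethanolamine"; handling such variants properly is exactly what the more sophisticated recognizers discussed in the Review aim to do.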

    The Beni-Ilmane (Algeria) seismic sequence of May 2010: Seismic sources and stress tensor calculations

    A moderate earthquake with a moment magnitude of Mw 5.5 struck the Sub-Bibanique region of eastern Algeria on 14 May 2010, killing three people, injuring hundreds of others, and causing moderate damage in the epicentral area, mainly in the villages of Beni-Ilmane and Samma. The focal mechanism of the first shock, obtained by near-field waveform modelling, shows left-lateral strike-slip faulting on the first nodal plane, oriented N345°, and right-lateral strike-slip faulting on the second nodal plane, oriented N254°. A second earthquake, with a moment magnitude of Mw 5.1, struck the region on 16 May 2010 and was located 9 km SW of the first. Its focal mechanism, also obtained by waveform modelling, shows reverse faulting with NE–SW-oriented nodal planes (N25° and N250°). A third earthquake, with a moment magnitude of Mw 5.2, struck the region on 23 May 2010 and was located 7 km S of the first shock. Its focal mechanism shows a left-lateral strike-slip plane oriented N12° and a right-lateral strike-slip plane oriented N257°. Field investigations combined with geological and seismotectonic analyses indicate that the three shocks were generated by activity on three distinct faults; the second and third shocks occurred on faults oriented WSW–ENE and NNE–SSW, respectively. The regional stress tensor gives a nearly horizontal maximum compressive stress (σ1) oriented N340°, with a stress shape factor indicating either a compressional or a strike-slip regime.

    Task-Oriented Complex Ontology Alignment: Two Alignment Evaluation Sets

    Simple ontology alignments, which have been studied extensively, link one entity of a source ontology to one entity of a target ontology. However, such alignments lack expressiveness, a limitation that complex alignments can overcome. Although several complex matching approaches have emerged in the literature, there is a lack of complex reference alignments against which these approaches can be systematically evaluated. This paper proposes two sets of complex alignments between 10 pairs of ontologies from the well-known OAEI conference simple alignment dataset. We describe the methodology used to create the alignment sets, which takes into account two intended uses of the alignments: ontology merging and query rewriting. The ontology merging alignment set contains 313 correspondences, and the query rewriting set contains 431. We report an evaluation of state-of-the-art complex matchers on the proposed alignment sets.
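To make the simple/complex distinction concrete, the following Python sketch models a correspondence whose two sides may be either named entities or constructed class expressions. The expression types (`Entity`, `SomeValuesFrom`, `Intersection`), the `is_simple` helper, and the example IRIs are hypothetical illustrations chosen here; they are not taken from the paper, from the OAEI conference ontologies, or from any standard alignment format.

```python
from dataclasses import dataclass
from typing import Tuple, Union


@dataclass(frozen=True)
class Entity:
    """A named class or property of an ontology, identified by an IRI."""
    iri: str


@dataclass(frozen=True)
class SomeValuesFrom:
    """Class constructor: things related via `prop` to some instance of `filler`."""
    prop: Entity
    filler: Entity


@dataclass(frozen=True)
class Intersection:
    """Class constructor: the conjunction of several class expressions."""
    operands: Tuple["Expr", ...]


Expr = Union[Entity, SomeValuesFrom, Intersection]


@dataclass(frozen=True)
class Correspondence:
    """A correspondence between a source expression and a target expression."""
    source: Expr
    target: Expr
    relation: str = "equivalence"

    def is_simple(self) -> bool:
        # Simple: one named entity on each side. Anything else is complex.
        return isinstance(self.source, Entity) and isinstance(self.target, Entity)


# Hypothetical IRIs for illustration, not from the OAEI dataset.
simple = Correspondence(Entity("src#Paper"), Entity("tgt#Paper"))
complex_ = Correspondence(
    Entity("src#AcceptedPaper"),
    Intersection((
        Entity("tgt#Paper"),
        SomeValuesFrom(Entity("tgt#hasDecision"), Entity("tgt#Acceptance")),
    )),
)
print(simple.is_simple(), complex_.is_simple())
```

Here the source class "accepted paper" has no single named counterpart in the target ontology, so the correspondence can only be expressed by constructing a target expression ("a paper that has some acceptance decision"), which a one-entity-to-one-entity alignment cannot represent.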

    Leveraging Food and Drug Administration Adverse Event Reports for the Automated Monitoring of Electronic Health Records in a Pediatric Hospital

    The objective of this study was to determine whether the Food and Drug Administration’s Adverse Event Reporting System (FAERS) data set could serve as the basis of automated electronic health record (EHR) monitoring for the adverse drug reaction (ADR) subset of adverse drug events. We retrospectively collected EHR entries for 71,909 pediatric inpatient visits at Cincinnati Children’s Hospital Medical Center. Natural language processing (NLP) techniques were used to identify positive diseases/disorders and signs/symptoms (DDSSs) in the patients’ clinical narratives. We downloaded all FAERS reports submitted by medical providers and extracted the reported drug-DDSS pairs. For each patient, we aligned the drug-DDSS pairs extracted from their clinical notes with the corresponding drug-DDSS pairs from the FAERS data set to identify Drug-Reaction Pair Sentences (DRPSs). The DRPSs were then processed with NLP techniques to identify those related to ADRs. We used clinician-annotated, real-world EHR data as the reference standard to evaluate the proposed algorithm. In this evaluation, the algorithm achieved promising performance and showed great potential for accurately identifying ADRs in pediatric patients.
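The pair-alignment step can be pictured as a set-membership test. The Python sketch below assumes that an upstream NLP step has already produced normalized drug and DDSS mentions for each sentence, and that a DRPS is a sentence in which a drug and a DDSS co-occur as a FAERS-reported pair. The `Sentence` type, the `find_drps` function, the toy data, and the sentence-level co-occurrence criterion are hypothetical simplifications, not the study's implementation.

```python
from typing import FrozenSet, List, NamedTuple, Set, Tuple


class Sentence(NamedTuple):
    """A clinical-note sentence with mentions found by an upstream NLP step (assumed)."""
    text: str
    drugs: FrozenSet[str]  # normalized drug names (assumed input)
    ddss: FrozenSet[str]   # normalized positive DDSS mentions (assumed input)


def find_drps(
    sentences: List[Sentence], faers_pairs: Set[Tuple[str, str]]
) -> List[Tuple[str, str, str]]:
    """Return (sentence, drug, ddss) for every drug-DDSS pair that co-occurs
    in a sentence and also appears among the FAERS-reported pairs."""
    hits = []
    for sent in sentences:
        # Sorting makes the output order deterministic.
        for drug in sorted(sent.drugs):
            for cond in sorted(sent.ddss):
                if (drug, cond) in faers_pairs:
                    hits.append((sent.text, drug, cond))
    return hits


# Toy data for illustration only (not from FAERS or the study's EHR data).
faers = {("amoxicillin", "rash")}
notes = [
    Sentence("Rash developed after amoxicillin.",
             frozenset({"amoxicillin"}), frozenset({"rash"})),
    Sentence("Acetaminophen given for fever.",
             frozenset({"acetaminophen"}), frozenset({"fever"})),
]
print(find_drps(notes, faers))
```

Sentences matched this way are only candidates: in the study, the DRPSs were further processed with NLP techniques to identify the ADR-related ones.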