1,760 research outputs found

Information retrieval and text mining technologies for chemistry

Author: Martin Krallinger
Obdulia Rabal
Anália Lourenço
Julen Oyarzabal
Alfonso Valencia
Publication venue: 'American Chemical Society (ACS)'
Publication date: 01/01/2017

Efficient access to chemical information contained in scientific literature, patents, technical reports, or the web is a pressing need shared by researchers and patent attorneys from different chemical disciplines. Retrieval of important chemical information in most cases starts with finding relevant documents for a particular chemical compound or family. Targeted retrieval of chemical documents is closely connected to the automatic recognition of chemical entities in the text, which commonly involves the extraction of the entire list of chemicals mentioned in a document, including any associated information. In this Review, we provide a comprehensive and in-depth description of fundamental concepts, technical implementations, and current technologies for meeting these information demands. A strong focus is placed on community challenges addressing systems performance, more particularly the CHEMDNER and CHEMDNER patents tasks of BioCreative IV and V, respectively. Considering the growing interest in the construction of automatically annotated chemical knowledge bases that integrate chemical information and biological data, cheminformatics approaches for mapping the extracted chemical names into chemical structures and their subsequent annotation, together with text mining applications for linking chemistry with biological information, are also presented. Finally, future trends and current challenges are highlighted as a roadmap proposal for research in this emerging field.

A.V. and M.K. acknowledge funding from the European Community’s Horizon 2020 Program (project reference: 654021 - OpenMinted). M.K. additionally acknowledges the Encomienda MINETAD-CNIO as part of the Plan for the Advancement of Language Technology. O.R. and J.O. thank the Foundation for Applied Medical Research (FIMA), University of Navarra (Pamplona, Spain). This work was partially funded by Consellería de Cultura, Educación e Ordenación Universitaria (Xunta de Galicia), and FEDER (European Union), and the Portuguese Foundation for Science and Technology (FCT) under the scope of the strategic funding of UID/BIO/04469/2013 unit and COMPETE 2020 (POCI-01-0145-FEDER-006684). We thank Iñigo García-Yoldi for useful feedback and discussions during the preparation of the manuscript.

Universidade do Minho: RepositoriUM

Crossref

LAReferencia - Red Federada de Repositorios Institucionales de Publicaciones Científicas Latinoamericanas

UPCommons. Portal del coneixement obert de la UPC

Computer analysis of chemical reaction information for storage and retrieval.

Author: Willett Peter
Publication venue: 'University of Sheffield Conference Proceedings'
Publication date: 01/01/1978

White Rose E-theses Online

Information Extraction from Text for Improving Research on Small Molecules and Histone Modifications

Author: Klein Corinna
Publication venue: Universitäts- und Landesbibliothek Bonn
Publication date

The growing number of publications, in particular in the life sciences, requires efficient methods for automated information extraction and semantic information retrieval. Recognizing and identifying the information-carrying units in text (concept denominations and named entities) that are relevant to a given domain is a fundamental step. This thesis focuses on the recognition of chemical entities and of a new biological named entity type, histone modifications, both of which are important in drug discovery. Because the emergence of new research fields and the discovery and generation of novel entities go along with the coinage of new terms, continually adapting named entity recognition approaches to new domains is an important step for information extraction. Two methodologies were investigated for this purpose: the state-of-the-art machine learning method Conditional Random Fields (CRF) and an approximate string search method based on dictionaries. Recognition methods that rely on dictionaries depend strongly on the availability and quality of entity terminology collections. In the case of chemical entities, the terminology is distributed over more than 7 publicly available data sources. Merging the entries and their accompanying terminology from selected resources enables the generation of a new dictionary of chemical named entities. Combined with automatic processing of the terminology (dictionary curation), the recognition performance reached an F1 measure of 0.54, an improvement of 29 % over the raw dictionary. The highest recall, 0.79, was achieved for the class of TRIVIAL names. The recognition and identification of chemical named entities is a prerequisite for extracting related, pharmacologically relevant information from the literature. Therefore, lexico-syntactic patterns were defined that support the automated extraction of hypernymic phrases comprising pharmacological function terminology related to chemical compounds. It was shown that 29-50 % of the automatically extracted terms can be proposed for novel functional annotation of chemical entities provided by the reference database DrugBank. Furthermore, they are a basis for building concept hierarchies and ontologies or for extending existing ones. Subsequently, the pharmacological function and biological activity concepts obtained from text were included in a novel descriptor for chemical compounds. Its successful application to the prediction of the pharmacological function of molecules and to the extension of chemical classification schemes, such as the Anatomical Therapeutic Chemical (ATC) classification, is demonstrated. In contrast to chemical entities, no comprehensive terminology resource has been available for histone modifications. Thus, histone modification concept terminology was first recognized in text via CRFs, with an F1 measure of 0.86. Subsequently, linguistic variants of the extracted histone modification terms were mapped to standard representations, which were organized into a newly assembled histone modification hierarchy. The mapping was accomplished by a newly developed term mapping approach described in the thesis. The combination of term recognition and term variant resolution yields a new procedure for assembling novel terminology collections. It supports the generation of a term list that is applicable in dictionary-based methods. For the recognition of histone modifications in text, it could be shown that the dictionary-based named entity recognition method is superior to the machine learning approach used. In conclusion, the present thesis provides techniques that enable enhanced utilization of textual data and thereby support research in epigenomics and drug discovery.
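
As a rough illustration of the dictionary-based approximate string search and the F1 measure mentioned in the abstract above, the following minimal Python sketch may help. It is not the thesis's implementation: the function names (`levenshtein`, `find_mentions`, `f1_score`), the token-level matching, the edit-distance threshold, and the two-term dictionary are all assumptions chosen for the example.

```python
def levenshtein(a: str, b: str) -> int:
    """Edit distance with unit cost for insertion, deletion, and substitution."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def find_mentions(tokens, dictionary, max_dist=1):
    """Return (index, token, term) for each token within max_dist edits of a term.

    Hypothetical helper: real systems match multi-word names and use
    indexed search rather than this brute-force scan.
    """
    hits = []
    for idx, tok in enumerate(tokens):
        low = tok.lower()
        for term in dictionary:
            if levenshtein(low, term) <= max_dist:
                hits.append((idx, tok, term))
                break  # keep the first matching term only
    return hits


def f1_score(tp: int, fp: int, fn: int) -> float:
    """Harmonic mean of precision and recall from raw counts."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


if __name__ == "__main__":
    terms = ["aspirin", "caffeine"]  # toy dictionary, assumed for illustration
    text = ["Aspirin", "and", "caffiene", "were", "tested"]
    print(find_mentions(text, terms, max_dist=2))
    print(f1_score(tp=1, fp=1, fn=1))
```

Raising `max_dist` trades precision for recall, which is one reason dictionary curation matters for this family of methods.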

bonndoc – Der Publikationsserver der Universität Bonn

Cheminformatics for genome-scale metabolic reconstructions

Author: May John W.
Publication venue: University of Cambridge
Publication date: 06/01/2015

Genome-scale metabolic reconstructions are an important resource in the study of metabolism. They provide both a system-level and a component-level view of the biochemical transformations of metabolites. As more reconstructions have been created, it remains a challenge to integrate and reason about their contents. This thesis focuses on the development of computational methods to allow on-demand comparison and alignment of metabolic reconstructions. A novel method is introduced that utilises chemical structure representations to identify equivalent metabolites between reconstructions. Using a graph theoretic representation allows the identification of, and reasoning about, metabolites that have a non-exact match. A key advantage is that the method uses the contents of reconstructions directly and does not rely on the creation or use of a common reference. To annotate reconstructions with chemical structure representations, an interactive desktop application is introduced. The application assists in the creation and curation of metabolic information using manual, semi-automated, and automated methods. Chemical structure representations can be retrieved, drawn, or generated to allow precise metabolite annotation. In processing chemical information, efficient and optimised algorithms are required. Several areas are addressed and implementations have been contributed to the Chemistry Development Kit. Rings are a fundamental property of chemical structures; therefore, multiple ring definitions and fast algorithms are explored. Conversion and standardisation between structure representations present a challenge. Efficient algorithms to determine aromaticity, assign a Kekulé form, and generate tautomers are detailed. Many enzymes are selective and specific to stereochemistry. Methods for the identification, depiction, comparison, and description of stereochemistry are described.

The project was funded by Unilever, the Biotechnology and Biological Sciences Research Council [BB/I532153/1], and the European Molecular Biology Laboratory.
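
To make the graph-theoretic view of rings in the abstract above concrete, here is a small Python sketch that counts independent rings as the cyclomatic number of a molecular graph (bonds minus atoms plus connected components). This is a textbook quantity and only one of several ring definitions; it is not taken from the thesis or from the Chemistry Development Kit API, and the function name `ring_count` and the atom-index encoding are assumptions for the example.

```python
def ring_count(num_atoms: int, bonds: list[tuple[int, int]]) -> int:
    """Cyclomatic number of a molecular graph: bonds - atoms + components.

    Atoms are indexed 0..num_atoms-1 and each bond is an (i, j) pair.
    Assumes each bond is listed once (bond order is ignored).
    """
    parent = list(range(num_atoms))

    def find(x: int) -> int:
        # Union-find root lookup with path halving.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    components = num_atoms
    for a, b in bonds:
        ra, rb = find(a), find(b)
        if ra != rb:
            parent[ra] = rb
            components -= 1
    return len(bonds) - num_atoms + components


if __name__ == "__main__":
    # Heavy-atom skeletons only; hydrogens omitted.
    benzene = [(i, (i + 1) % 6) for i in range(6)]
    naphthalene = [(i, (i + 1) % 10) for i in range(10)] + [(4, 9)]
    print(ring_count(6, benzene))
    print(ring_count(10, naphthalene))
```

The cyclomatic number gives the size of a minimal cycle basis but does not say which rings are chemically relevant, which is why several ring definitions are usually compared.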

Apollo (Cambridge)

Similarity Methods in Chemoinformatics

Author: Willett Peter
Publication venue: 'Wiley'
Publication date: 01/01/2009

CiteSeerX

Crossref

White Rose Research Online

Kinetic model construction using chemoinformatics

Author: Vandewiele Nick
Publication venue: Ghent University. Faculty of Engineering and Architecture
Publication date: 01/01/2014

Kinetic models of chemical processes not only provide an alternative to costly experiments; they also have the potential to accelerate the pace of innovation in developing new chemical processes or in improving existing ones. Kinetic models are most powerful when they reflect the underlying chemistry by incorporating elementary pathways between individual molecules. The downside of this high level of detail is that the complexity and size of the models also steadily increase, such that the models eventually become too difficult to be manually constructed. Instead, computers are programmed to automate the construction of these models, and make use of graph theory to translate chemical entities such as molecules and reactions into computer-understandable representations. This work studies the use of automated methods to construct kinetic models. More particularly, the need to account for the three-dimensional arrangement of atoms in molecules and reactions of kinetic models is investigated and illustrated by two case studies. First of all, the thermal rearrangement of two monoterpenoids, cis- and trans-2-pinanol, is studied. A kinetic model that accounts for the differences in reactivity and selectivity of both pinanol diastereomers is proposed. Secondly, a kinetic model for the pyrolysis of the fuel “JP-10” is constructed and highlights the use of state-of-the-art techniques for the automated estimation of thermochemistry of polycyclic molecules. A new code is developed for the automated construction of kinetic models and takes advantage of the advances made in the field of chemo-informatics to tackle fundamental issues of previous approaches. Novel algorithms are developed for three important aspects of automated construction of kinetic models: the estimation of symmetry of molecules and reactions, the incorporation of stereochemistry in kinetic models, and the estimation of thermochemical and kinetic data using scalable structure-property methods. 
Finally, the application of the code is illustrated by the automated construction of a kinetic model for alkylsulfide pyrolysis.

Ghent University Academic Bibliography

A treatment of stereochemistry in computer aided organic synthesis

Author: Cook Anthony Peter Fendick
Publication venue: University of Leeds
Publication date: 01/01/2015

This thesis describes the author’s contributions to a new stereochemical processing module constructed for the ARChem retrosynthesis program. The purpose of the module is to add the ability to perform enantioselective and diastereoselective retrosynthetic disconnections and generate appropriate precursor molecules. The module uses evidence-based rules generated from a large database of literature reactions. Chapter 1 provides an introduction and critical review of the published body of work for computer aided synthesis design. The role of computer perception of key structural features (rings, functional groups etc.) and the construction and use of reaction transforms for generating precursors is discussed. Emphasis is also given to the application of strategies in retrosynthetic analysis. The availability of large reaction databases has enabled a new generation of retrosynthesis design programs to be developed that use automatically generated transforms assembled from published reactions. A brief description of the transform generation method employed by ARChem is given. Chapter 2 describes the algorithms devised by the author for handling the computer recognition and representation of the stereochemical features found in molecule and reaction scheme diagrams. The approach is generalised and uses flexible recognition patterns to transform information found in chemical diagrams into concise stereo descriptors for computer processing. An algorithm for efficiently comparing and classifying pairs of stereo descriptors is described. This algorithm is central for solving the stereochemical constraints in a variety of substructure matching problems addressed in chapter 3. The concise representation of reactions and transform rules as hyperstructure graphs is described. Chapter 3 is concerned with the efficient and reliable detection of stereochemical symmetry in molecules, reactions, and rules. A novel symmetry perception algorithm, based on a constraint satisfaction problem (CSP) solver, is described. The use of a CSP solver to implement an isomorph-free matching algorithm for stereochemical substructure matching is detailed. The prime function of this algorithm is to seek out unique retron locations in target molecules and then to generate precursor molecules without duplications due to symmetry. Novel algorithms for classifying asymmetric, pseudo-asymmetric and symmetric stereocentres; meso, centro, and C2 symmetric molecules; and the stereotopicity of trigonal (sp2) centres are described. Chapter 4 introduces and formalises the annotated structural language used to create both retrosynthetic rules and the patterns used for functional group recognition. A novel functional group recognition package is described along with its use to detect important electronic features such as electron-withdrawing or donating groups and leaving groups. The functional groups and electronic features are used as constraints in retron rules to improve transform relevance. Chapter 5 details the approach taken to design detailed stereoselective and substrate controlled transforms from organised hierarchies of rules. The rules employ a rich set of constraint annotations that concisely describe the keying retrons. The application of the transforms for collating evidence-based scoring parameters from published reaction examples is described. A survey of available reaction databases is given and techniques for mining stereoselective reactions are demonstrated. A data mining tool was developed for finding the most reputable stereoselective reaction types for coding as transforms. For various reasons it was not possible during the research period to fully integrate this work with the ARChem program. Instead, Chapter 6 introduces a novel one-step retrosynthesis module to test the developed transforms. The retrosynthesis algorithms use the organisation of the transform rule hierarchy to efficiently locate the best retron matches using all applicable stereoselective transforms. This module was tested using a small set of selected target molecules and the generated routes were ranked using a series of measured parameters including: stereocentre clearance and bond cleavage; example reputation; estimated stereoselectivity with reliability; and evidence of tolerated functional groups. In addition, a method for detecting regioselectivity issues is presented. This work presents a number of algorithms using common set and graph theory operations and notations. Appendix A lists the set theory symbols and meanings. Appendix B summarises and defines the common graph theory terminology used throughout this thesis.

White Rose E-theses Online

Chemical Information Bulletin

Author: American Chemical Society. Division of Chemical Information.
Vogel Teri M.
Publication venue: American Chemical Society. Division of Chemical Information.
Publication date: 01/11/2020

Periodic supplement for "the regular journals of the American Chemical Society," containing annotated bibliographies of chemical documentation literature as well as information about meetings, conferences, awards, scholarships, and other news from the American Chemical Society (ACS) Division of Chemical Literature

UNT Digital Library