14 research outputs found

Information retrieval and text mining technologies for chemistry

Author: Abacha A. B.
Alberts D.
Alfonso Valencia
American Chemical Society
Anália Lourenço
Aphinyanaphongs Y.
Appelt D. E.
Aramaki E.
Aronson A. R.
Asahara M.
Babych B.
Baeza-Yates R.
Bambenek J.
Barnard J. M.
Bast H.
Batista-Navarro R.
Batista-Navarro R. T.
Bian J.
Bies A.
Bikel D. M.
Blaschke C.
Brecher J. S.
Brill E.
Bunescu R.
Bunescu R. C.
Califf M. E.
Carpenter B.
Caruana R.
Chee B. W.
Chhieng D.
Chinchor N.
Chiticariu L.
Chowdhury M. F. M.
Ciravegna F.
Cleverdon C. W.
Coden A.
Cohen R.
Collier N.
Corbett P.
Cover T. M.
Craven M.
Cummings M. D.
Currano J. N.
Cutting D. R.
Davis C. H.
Dieb T. M.
Dogan R. I.
Downs G. M.
Dunikowski L. G.
Embarek M.
Eom J.-H.
Faber J.
Fall C. J.
Fattore M.
Fennell R. W.
Freund Y.
Fujiyoshi A.
Fukuda K.
Gale W. A.
Garcelon N.
Garnier J.-P.
Garten Y.
Ginn R.
Giuliano C.
Gold S.
Grefenstette G.
Grishman R.
Gurulingappa H.
Gusfield D.
He Y.
Hearst M. A.
Hersh W.
Hirschman L.
Hobbs J. R.
Hodge G. M.
Holzinger A.
Hsueh P.-Y.
Huber T.
Iyer S. V
Jackson P.
Joachims T.
Johnson D.
Jonnalagadda S.
Julen Oyarzabal
Jurafsky D.
Kaewphan S.
Karkaletsis V.
Katragadda S.
Kazama J.
Kazawa H.
Kelly L.
Kenny P. W.
Kim J.-D.
Kim Y.
Kleene S. C.
Kolárik C.
Kongburan W.
Kornai A.
Kraaij W.
Krallinger M.
Kremer G.
Kreuzthaler M.
Kucera H.
Lai H.
Lawson A. J.
Leaman R.
Lee C.-H.
Levenshtein V. I.
Levin M. A.
Li J.
Li N.
Li Y.
Liu X.
Locke W. N.
Lovins J. B.
Lowe D. M.
Lupu M.
Mackenzie C. E.
Manning C. D.
Mansouri A.
Martin E.
Martin Krallinger
Mattmann C.
Maynard D.
McCallum A.
McEwen L.
McKnight L.
McNaught A.
Meystre S. M.
Michalski S. R.
Michie D.
Mihalcea R.
Mitton R.
Miwa M.
Mollá D.
Murray-Rust P.
Müller B.
Nebel A.
Nikfarjam A.
Névéol A.
Obdulia Rabal
Pang B.
Panico R.
Perez-Iratxeta C.
Ponomareva N.
Ratinov L.
Ratnaparkhi A.
Read J.
Rebholz-Schuhmann D.
Reeker L. H.
Rocchio J. J.
Rohbeck H.-G.
Rosario B.
Roth D. L.
Rupp C. J.
Sagae K.
Salim N.
Salton G.
Sanchez-Cisneros D.
Saracevic T.
Sasaki Y.
Schapire R. E.
Schenck R.
Schenck R. J.
Schlaf A.
Schuemie M. J.
Segura Bedmar I.
Segura-Bedmar I.
Sekine S.
Sequeira E.
Settles B.
Sewell W.
Shen D.
Shidha M. V
Singhal A.
Smith E. G.
Stamatatos E.
Sutton C.
Sætre R.
Taylor K. T.
Tharatipyakul A.
Tomanek K.
Tsuruoka Y.
Täger W.
Urbain J.
van Rijsbergen C. J.
Vapnik V. N.
Vasserman A.
Visweswaran S.
Voorhees E. M.
Wang W.
Wang Y.
Wei C.-H.
Wermter J.
Wilbur W. J.
Willett P.
Williams A. J.
Witten I. H.
Workman M. L.
Wrublewski D. T.
Xu R.
Xue N.
Yan S.
Yang C.
Yang C. C.
Yang Y.
Zass E.
Zipf G. K.
Zitnik S.
Publication venue: American Chemical Society (ACS)
Publication date: 01/01/2017

Efficient access to chemical information contained in scientific literature, patents, technical reports, or the web is a pressing need shared by researchers and patent attorneys from different chemical disciplines. Retrieval of important chemical information in most cases starts with finding relevant documents for a particular chemical compound or family. Targeted retrieval of chemical documents is closely connected to the automatic recognition of chemical entities in the text, which commonly involves the extraction of the entire list of chemicals mentioned in a document, including any associated information. In this Review, we provide a comprehensive and in-depth description of fundamental concepts, technical implementations, and current technologies for meeting these information demands. A strong focus is placed on community challenges addressing systems performance, more particularly the CHEMDNER and CHEMDNER patents tasks of BioCreative IV and V, respectively. Considering the growing interest in the construction of automatically annotated chemical knowledge bases that integrate chemical information and biological data, cheminformatics approaches for mapping the extracted chemical names into chemical structures and their subsequent annotation, together with text mining applications for linking chemistry with biological information, are also presented. Finally, future trends and current challenges are highlighted as a roadmap proposal for research in this emerging field.
A.V. and M.K. acknowledge funding from the European Community’s Horizon 2020 Program (project reference: 654021 - OpenMinted). M.K. additionally acknowledges the Encomienda MINETAD-CNIO as part of the Plan for the Advancement of Language Technology. O.R. and J.O. thank the Foundation for Applied Medical Research (FIMA), University of Navarra (Pamplona, Spain). This work was partially funded by Consellería de Cultura, Educación e Ordenación Universitaria (Xunta de Galicia), and FEDER (European Union), and the Portuguese Foundation for Science and Technology (FCT) under the scope of the strategic funding of UID/BIO/04469/2013 unit and COMPETE 2020 (POCI-01-0145-FEDER-006684). We thank Iñigo García-Yoldi for useful feedback and discussions during the preparation of the manuscript.
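As a rough illustration of what "recognition of chemical entities in the text" means in the abstract above, the following Python sketch matches a small lexicon of chemical names against a sentence. It is a toy dictionary lookup written for this listing, not the method of the Review or of any CHEMDNER system; the function name `find_chemical_mentions` and the example lexicon are hypothetical.

```python
import re


def find_chemical_mentions(text, lexicon):
    """Return (start, end, mention) tuples for lexicon names found in text.

    Toy dictionary matcher (an assumption for illustration only):
    case-insensitive, whole-word matches, longest names tried first.
    """
    if not lexicon:
        return []
    # Try longer names first so multi-word names win over shorter ones.
    names = sorted(lexicon, key=len, reverse=True)
    alternatives = "|".join(re.escape(name) for name in names)
    # (?<!\w) and (?!\w) keep matches from starting or ending inside a word.
    pattern = re.compile(r"(?<!\w)(?:" + alternatives + r")(?!\w)", re.IGNORECASE)
    return [(m.start(), m.end(), m.group(0)) for m in pattern.finditer(text)]
```

Calling `find_chemical_mentions("Aspirin (acetylsalicylic acid) inhibits COX-1.", {"aspirin", "acetylsalicylic acid"})` would return the two mentions with their character offsets, which is the kind of span annotation that entity-recognition benchmarks score.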

Universidade do Minho: RepositoriUM

Crossref

LAReferencia - Red Federada de Repositorios Institucionales de Publicaciones Científicas Latinoamericanas

UPCommons. Portal del coneixement obert de la UPC

Chemical Information Bulletin

Author: American Chemical Society. Division of Chemical Information.
Vogel Teri M.
Publication venue: American Chemical Society. Division of Chemical Information.
Publication date: 01/11/2020

Periodic supplement for "the regular journals of the American Chemical Society," containing annotated bibliographies of chemical documentation literature as well as information about meetings, conferences, awards, scholarships, and other news from the American Chemical Society (ACS) Division of Chemical Literature

UNT Digital Library

Integrative Systems Approaches Towards Brain Pharmacology and Polypharmacology

Author: Shahid Mohammad
Publication venue: Universitäts- und Landesbibliothek Bonn

Polypharmacology is considered the future of drug discovery and is emerging as its next paradigm. Traditional drug design is primarily based on a “one target-one drug” paradigm. In polypharmacology, drug molecules always interact with multiple targets, which poses new challenges in designing and developing new and effective drugs that are less toxic by eliminating unexpected drug-target interactions. Although still in its infancy, the use of polypharmacology ideas already appears to have a remarkable impact on modern drug development. This thesis is a detailed study of various systems-level pharmacology approaches to understanding polypharmacology in complex brain and neurodegenerative disorders. The research focuses on the design and construction of a dedicated knowledge base for human brain pharmacology. This knowledge base, referred to as the Human Brain Pharmacome (HBP), is a unique and comprehensive resource that aggregates data and knowledge on current drug treatments available for major brain and neurodegenerative disorders. The HBP knowledge base provides data in a single place for building models and supporting hypotheses. The HBP also incorporates new data obtained from similarity computations over drug and protein structures, which were analyzed from various perspectives, including network pharmacology and the application of in silico computational methods for the discovery of novel multi-target drug candidates. Computational tools and machine learning models were developed to characterize protein targets by their polypharmacological profiles and to distinguish indication-specific or target-specific drugs from other drugs. Systems pharmacology approaches to drug property prediction provided a highly enriched compound library that was virtually screened against an array of protein targets derived from network pharmacology, using combined docking and molecular dynamics simulation workflows. The approaches developed in this work resulted in the identification of novel multi-target drug candidates that are backed by existing experimental knowledge, and propose the repositioning of existing drugs that are undergoing further experimental validation.

bonndoc – Der Publikationsserver der Universität Bonn

From Knowledgebases to Toxicity Prediction and Promiscuity Assessment

Author: Siramshetty Vishal Babu
Publication date: 01/01/2019

Polypharmacology marked a paradigm shift in drug discovery from the traditional ‘one drug, one target’ approach to a multi-target perspective, indicating that highly effective drugs favorably modulate multiple biological targets. This ability of drugs to show activity towards many targets is referred to as promiscuity, an essential phenomenon that may as well lead to undesired side-effects. While activity at therapeutic targets provides the desired biological response, toxicity often results from non-specific modulation of off-targets. Safety, efficacy and pharmacokinetics have been the primary concerns behind the failure of a majority of candidate drugs. Computer-based (in silico) models that can predict the pharmacological and toxicological profiles complement the ongoing efforts to lower the high attrition rates. High-confidence bioactivity data is a prerequisite for the development of robust in silico models. Additionally, data quality has been a key concern when integrating data from publicly accessible bioactivity databases. A majority of the bioactivity data originates from high-throughput screening campaigns and medicinal chemistry literature. However, large numbers of screening hits are considered false positives for a number of reasons. In stark contrast, many compounds do not demonstrate biological activity despite being tested in hundreds of assays. This thesis work employs cheminformatics approaches to contribute to the aforementioned diverse, yet highly related, aspects that are crucial in rationalizing and expediting drug discovery. Knowledgebase resources of approved and withdrawn drugs were established and enriched with information integrated from multiple databases. These resources are not only useful in small molecule discovery and optimization, but also in the elucidation of mechanisms of action and off-target effects. In silico models were developed to predict the effects of small molecules on nuclear receptor and stress response pathways and on the potassium channel encoded by the human Ether-à-go-go-Related Gene (hERG). Chemical similarity and machine-learning based methods were evaluated while highlighting the challenges involved in the development of robust models using public domain bioactivity data. Furthermore, the true promiscuity of the potentially frequent hitter compounds was identified and their mechanisms of action were explored at the molecular level by investigating target-ligand complexes. Finally, the chemical and biological spaces of the extensively tested, yet inactive, compounds were investigated to reconfirm their potential to be promising candidates.
[Translated from the German abstract] Polypharmacology describes a paradigm shift from "one drug, one target" to "one drug, many targets" and at the same time shows that highly effective drugs can develop their full effect only by interacting with multiple targets. The biological activity of a drug is directly associated with its side effects, which can be explained by its interaction with therapeutic targets or off-targets (promiscuity). An imbalance in these interactions often results in insufficient efficacy, toxicity, or unfavorable pharmacokinetics, which accounts for the failure of many drug candidates in preclinical and clinical development. Early prediction of the pharmacological and toxicological profile from the chemical structure using computer-based (in silico) models can help improve the drug development process. Reliable bioactivity data are a prerequisite for successful prediction. However, data quality is often a central problem in data integration. The cause is the use of different bioassays and readouts, whose data are obtained largely from primary and confirmatory bioassays. While a large share of the hits from primary assays are classified as false positives, some compounds show no biological activity even though they have been tested extensively in both assay types ("extensively assayed compounds"). In this work, various cheminformatics methods were developed and applied to address these problems, to point out possible solutions, and ultimately to accelerate drug research. To this end, non-redundant, manually validated knowledge bases of approved and withdrawn drugs were created and enriched with additional information to advance the discovery and optimization of small organic molecules. A key element here is the elucidation of their mechanisms of action and off-target interactions. For the further characterization of side effects, the focus was placed on nuclear receptors, pathways involving stress receptors, and the hERG channel, which were modeled in silico. These models were built using an integrative approach combining state-of-the-art algorithms such as similarity comparisons and machine learning. To ensure high predictive quality, explicit attention was paid to data quality and chemical diversity when evaluating the datasets. The in silico models were further extended by examining substructure filters more closely in order to distinguish genuine mechanisms of action from nonspecific binding behavior (false-positive compounds). Finally, the chemical and biological space of extensively tested yet inactive small organic molecules ("extensively assayed compounds") was examined and compared with currently approved drugs to confirm their potential as promising candidates.
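The abstract above mentions chemical-similarity methods without naming a measure. One widely used choice is the Tanimoto (Jaccard) coefficient over fingerprint bits; the sketch below assumes that measure and represents each fingerprint as a Python set of "on" bit indices. The function names and the convention of returning 0.0 for two empty fingerprints are assumptions made for illustration, not details taken from the thesis.

```python
def tanimoto(fp_a, fp_b):
    """Tanimoto coefficient between two fingerprints given as sets of on-bits."""
    union = len(fp_a | fp_b)
    if union == 0:
        # Convention chosen here (an assumption): two empty fingerprints score 0.0.
        return 0.0
    return len(fp_a & fp_b) / union


def rank_by_similarity(query_fp, library):
    """Sort (name, fingerprint) pairs by decreasing similarity to the query."""
    scored = [(name, tanimoto(query_fp, fp)) for name, fp in library]
    return sorted(scored, key=lambda item: item[1], reverse=True)
```

For example, `tanimoto({1, 2, 3}, {2, 3, 4})` shares two bits out of four distinct bits, giving 0.5. In a real workflow the fingerprints would come from a cheminformatics toolkit rather than hand-written sets.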

Institutional Repository of the Freie Universität Berlin

Unified processing framework of high-dimensional and overly imbalanced chemical datasets for virtual screening.

Author: Rafati-Afshar Amir Ali

Virtual screening in drug discovery involves processing large datasets containing unknown molecules in order to find the ones that are likely to have the desired effects on a biological target, typically a protein receptor or an enzyme. Molecules are thereby classified as active or non-active in relation to the target. Misclassification of molecules in cases such as drug discovery and medical diagnosis is costly, both in time and finances. In the process of discovering a drug, it is mainly the inactive molecules classified as active towards the biological target, i.e. false positives, that cause delays in progress and high late-stage attrition. However, despite the pool of techniques available, selecting a suitable approach for each situation is still a major challenge. This PhD thesis is designed to develop a pioneering framework which enables the analysis of the virtual screening of chemical compound datasets in a wide range of settings in a unified fashion. The proposed method provides a better understanding of the dynamics of innovatively combining data processing and classification methods in order to screen massive, potentially high-dimensional and overly imbalanced datasets more efficiently.
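To make the cost of false positives on an imbalanced screening set concrete, here is a minimal Python sketch that derives accuracy, precision, recall, and false positive rate from true and predicted activity labels. It is an illustration written for this listing, not the framework proposed in the thesis; the names `ScreeningMetrics` and `screening_metrics` are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class ScreeningMetrics:
    accuracy: float
    precision: float
    recall: float
    false_positive_rate: float


def screening_metrics(actual, predicted):
    """Compute basic metrics from parallel lists of booleans (True = active)."""
    pairs = list(zip(actual, predicted))
    tp = sum(1 for a, p in pairs if a and p)          # actives found
    fp = sum(1 for a, p in pairs if not a and p)      # inactives flagged as active
    fn = sum(1 for a, p in pairs if a and not p)      # actives missed
    tn = sum(1 for a, p in pairs if not a and not p)  # inactives correctly rejected

    def ratio(num, den):
        # Return 0.0 instead of raising when a denominator is empty.
        return num / den if den else 0.0

    return ScreeningMetrics(
        accuracy=ratio(tp + tn, len(pairs)),
        precision=ratio(tp, tp + fp),
        recall=ratio(tp, tp + fn),
        false_positive_rate=ratio(fp, fp + tn),
    )
```

With 10 actives among 1,000 molecules, a classifier that recovers 8 actives but also flags 40 inactives reaches 95.8% accuracy while only 8 of its 48 predicted actives are real, which is why accuracy alone says little on such datasets.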

Bournemouth University Research Online

Geometric Learning for Quantum-Informed, Machine Learning and Analysis of Electrostatic Preorganization

Author: Vargas Santiago
Publication venue: eScholarship, University of California
Publication date: 01/01/2024

This thesis is organized in a slightly unconventional fashion: algorithms lead and applications fill out the content. I think this emphasizes my interests during graduate school - I built algorithms and tools to address issues that were otherwise inaccessible to different areas of computational chemistry (including applied machine learning) and enzymology. Two sets of scientific thrusts underscore the bulk of my work: algorithms to analyze dynamic, heterogeneous fields in the context of enzymology, and flexible machine learning algorithms, including those that leverage quantum descriptors, for rigorous molecular and reaction-level properties. Each section will include grounding on applications and broader impacts for the reader as well. Now we pivot to discussing the main thrusts and outlining each chapter briefly.
General ML and Quantum Theory of Atoms-in-Molecules (QTAIM): QTAIM serves as a mathematical decomposition algorithm for electronic basins within a molecule. The algorithm takes in molecular densities, as computed (typically) by density functional theory (DFT), and uses the flux of density to partition the scalar field into 3-dimensional atomic basins of density [14, 16]. These objects are known as atomic basins and represent the quantum atom within a molecule. By constructing these structures, we compute a rich set of mathematical descriptors that map to many features including energies, bonding, and electron delocalization. These features have been correlated, in the past, to activation energies, reactivity, and overall system energies, but these uses largely relied on human intervention and small datasets [44, 62, 65, 111, 142, 287]. By developing software centered around high-throughput QTAIM calculations and machine learning, I was able to bring these descriptors to larger datasets and a wide host of applications. In Chapter 2, I discuss an algorithm I implemented to predict Diels-Alder reaction barriers from QTAIM signatures alone. In this study, we showed that QTAIM features can be used to surmise reaction barriers, while also using machine learning techniques to understand which signatures were most informative to our models. Here QTAIM electrostatic potentials and delocalization indices alone were able to yield great performance on withheld datasets. In addition, we demonstrated that QTAIM features can allow a machine learning model to generalize, to an extent, to much larger Diels-Alder reactions. This chapter was adapted from the following: Machine Learning to Predict Diels–Alder Reaction Barriers from the Reactant State Electron Density. S. Vargas*, M. Hannefarth, Z. Liu, A.N. Alexandrova. Journal of Chemical Theory and Computation 2021 17 (10), 6203-6213. 10.1021/acs.jctc.1c00623. In Chapter 3, I discuss a package developed to perform high-throughput QTAIM calculations on datasets of molecules and reactions. This package is currently adapted to work with open-source packages such as ORCA and Multiwfn. These programs, respectively, compute DFT densities at a user-specified level of theory and subsequently compute QTAIM descriptors. The package is built with high-performance computing (HPC) in mind, as it can operate on a single dataset with an arbitrary number of concurrent jobs. Here I also used the package to compute QTAIM values for a diverse set of important and difficult datasets and developed graph neural networks to predict molecular and reaction properties leveraging QTAIM as inputs. This chapter was adapted from the following: High-throughput quantum theory of atoms in molecules (QTAIM) for geometric deep learning of molecular and reaction properties. Santiago Vargas, Winston Gee, and Anastassia N. Alexandrova. Digital Discovery 2024 3, 987-998.
Advancing Analysis of Electric Fields in Proteins: The later chapters follow our work in developing algorithms to ingest, interpret, and predict on electric fields in protein active sites. This work builds on the notion of electrostatic preorganization, a theory that posits that protein scaffolds arrange to electrostatically catalyse chemical reactions, thereby destabilizing reactants while suppressing transition state energies [299, 301]. Chapter 4 depicts exhaustive efforts to apply heterogeneous electric field analysis to understanding directed evolution in the context of a protoglobin directed evolution (DE) trajectory. Previous DE efforts optimized protoglobin to efficiently catalyze carbene transfer reactions. We show that traditional explanations for increased catalytic activity across the DE lineage, substrate access and binding, cannot account for the dramatic improvements in protein activity. By tracking the 3-D electric field and using clustering algorithms, we pinpoint representative structures for QM/MM calculations and show that changes in the electric field, along DE, improve carbene transfer reactivity. These findings highlight the role that electrostatic organization, notably its dynamic effect, has in determining protein function and point to its future importance in designing proteins for relevant chemical processes. This chapter is adapted from Directed Evolution of Protoglobin Optimizes the Enzyme Electric Field. Shobhit S. Chaturvedi, Santiago Vargas, Pujan Ajmera, and Anastassia N. Alexandrova. Journal of the American Chemical Society 2024 146 (24), 16670-16680 DOI: 10.1021/jacs.4c03914. In Chapter 5, I introduce a machine learning framework designed to predict enzyme functionality directly from the heterogeneous electric fields applied to protein active sites. We apply this method to a dataset of Heme-Iron Oxidoreductases. Previous studies, which focused on simple point electric fields along the Fe-O bond, are insufficient for reasonable accuracy. On the other hand, our 3-D, heterogeneous model can accurately predict protein activity without relying on additional protein-specific information. In addition, feature selection elucidates which electric field components most inform our models and thus highlights components important to reactivity and selectivity. Finally, we apply the previously mentioned electric field clustering algorithms and QM/MM calculations to reveal how dynamic complexities in protein structures can complicate predictions, which provides a path forward for improved models in this space. This chapter is adapted from Machine-learning prediction of protein function from the portrait of its intramolecular electric field. S. Vargas*, S. Chaturvedi, A.N. Alexandrova. (Accepted, Journal of the American Chemical Society).

eScholarship - University of California

Integrative omics approaches for new target identification and therapeutics development

Author: Kanapeckaitė Austė
Publication date: 31/03/2022

The growing research and commercial pressures for novel therapeutics development accentuate why better strategies are needed for drug discovery. The costly nature of developing a pharmaceutical compound as well as the shrinking pool of ‘easy’ targets are some of the key reasons why there is a research paradigm shift towards integrative and systems biology driven approaches. Moreover, multifactorial aspects of many diseases require more innovative clinical strategies rather than just focusing on a single target. Cardiovascular diseases as well as associated immune components exemplify this complexity well. This thesis aimed to introduce a gradual and highly integrative analytical framework by incorporating a full range of studies from disease target selection to high-throughput virtual screening so that a cost-effective and efficient stratification of targets and associated compounds could be achieved. Heart failure served as a case study for complex diseases where the first in-depth omics study on cardiomyopathies helped to elucidate new therapeutic avenues. This research tied in with the development of a novel scoring function and an integrated machine learning approach for multiple therapeutic target classification and exploration. Finally, all pieces of the introduced research were used to create a highly integrative in silico screening workflow. Some of the key results included the first reported molecular dynamics analyses for a complex immunotherapeutic target, c-Rel, as well as 15 new therapeutic compounds that could potentially modulate this transcription factor subunit. Thus, this dissertation provided several important improvements for target identification, validation, and drug discovery that could significantly advance current development strategies and accelerate new therapeutics production.

Central Archive at the University of Reading