    Information retrieval and text mining technologies for chemistry

    Efficient access to chemical information contained in scientific literature, patents, technical reports, or the web is a pressing need shared by researchers and patent attorneys from different chemical disciplines. Retrieval of important chemical information in most cases starts with finding relevant documents for a particular chemical compound or family. Targeted retrieval of chemical documents is closely connected to the automatic recognition of chemical entities in the text, which commonly involves extracting the entire list of chemicals mentioned in a document, including any associated information. In this Review, we provide a comprehensive and in-depth description of fundamental concepts, technical implementations, and current technologies for meeting these information demands. A strong focus is placed on community challenges addressing system performance, more particularly the CHEMDNER and CHEMDNER patents tasks of BioCreative IV and V, respectively. Considering the growing interest in the construction of automatically annotated chemical knowledge bases that integrate chemical information and biological data, cheminformatics approaches for mapping the extracted chemical names into chemical structures and their subsequent annotation, together with text mining applications for linking chemistry with biological information, are also presented. Finally, future trends and current challenges are highlighted as a roadmap proposal for research in this emerging field.
A.V. and M.K. acknowledge funding from the European Community's Horizon 2020 Program (project reference: 654021 - OpenMinted). M.K. additionally acknowledges the Encomienda MINETAD-CNIO as part of the Plan for the Advancement of Language Technology. O.R. and J.O. thank the Foundation for Applied Medical Research (FIMA), University of Navarra (Pamplona, Spain). This work was partially funded by Consellería de Cultura, Educación e Ordenación Universitaria (Xunta de Galicia), and FEDER (European Union), and the Portuguese Foundation for Science and Technology (FCT) under the scope of the strategic funding of UID/BIO/04469/2013 unit and COMPETE 2020 (POCI-01-0145-FEDER-006684). We thank Iñigo García-Yoldi for useful feedback and discussions during the preparation of the manuscript.
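Chemical entity recognition can be illustrated in its simplest, dictionary-based form: scan the text and tag the longest lexicon entry that matches at each position. The sketch below is a toy under stated assumptions rather than a method taken from the Review; the function `tag_chemicals` and the three-entry lexicon are hypothetical, and practical systems typically add chemistry-aware tokenization and machine-learned taggers.

```python
import re


def tag_chemicals(text, lexicon):
    """Return (start, end, mention) spans for the longest lexicon matches.

    lexicon: set of lowercase names, possibly multi-word (e.g. "acetic acid").
    Tokenization is naive (hypothetical): split on whitespace and trim
    trailing punctuation, keeping internal commas such as in "1,2-...".
    """
    tokens = []
    for m in re.finditer(r"\S+", text):
        word = m.group().rstrip(".,;:")
        if word:
            tokens.append((m.start(), m.start() + len(word), word.lower()))

    max_len = max((len(name.split()) for name in lexicon), default=1)
    spans, i = [], 0
    while i < len(tokens):
        # Try the longest candidate phrase first so "acetic acid" beats "acid".
        for n in range(min(max_len, len(tokens) - i), 0, -1):
            candidate = " ".join(tok[2] for tok in tokens[i:i + n])
            if candidate in lexicon:
                start, end = tokens[i][0], tokens[i + n - 1][1]
                spans.append((start, end, text[start:end]))
                i += n
                break
        else:
            i += 1  # no entry starts at this token
    return spans


lexicon = {"acetic acid", "ethanol", "acid"}  # hypothetical lexicon
print(tag_chemicals("Ethanol was oxidized to acetic acid.", lexicon))
```

Preferring the longest match keeps "acetic acid" from being reported as the shorter entry "acid".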

    Sección bibliográfica (Bibliographic section)

    Unweaving complex reactivity: graph-based tools to handle chemical reaction networks

    The molecular-level insights gathered through "in silico" studies have become an essential asset for the elucidation and understanding of complex reaction mechanisms. Indeed, the applicability of computational chemistry has widened considerably owing to the vast increase in computational power over recent decades. Not only have the accuracy of the applied methods and the size of the target systems increased, but so has the level of detail attained in mechanistic descriptions. However, deeper descriptions of chemical systems, most often obtained with automation techniques that make it easy to explore larger parts of the chemical space, come at the cost of greater complexity, rendering the results much harder to interpret. Throughout this Thesis, we have proposed, developed, and tested a collection of tools for processing such complex chemical reaction networks (CRNs), in order to provide new insights into reactive and catalytic processes. All of these tools use graphs to model the target CRNs, so that the methods of graph theory (e.g., path searches and isomorphisms) can be applied in a chemical context.
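To make the graph view concrete, the sketch below stores a toy reaction network as an undirected graph whose nodes are species and whose edges carry the Gibbs energy of the connecting transition state, then runs a Dijkstra-style search for the route whose highest transition state is lowest (a minimax, or bottleneck, path). The network, its energies, and the function name `lowest_barrier_path` are hypothetical, and this criterion is only one illustrative choice; it is not taken from the tools described here.

```python
import heapq


def lowest_barrier_path(edges, source, target):
    """Find the route whose highest transition state is as low as possible.

    edges: iterable of (species_a, species_b, ts_energy) tuples, treated as
    reversible steps; ts_energy is the transition-state Gibbs energy.
    Returns (highest TS energy along the route, list of species) or None.
    """
    graph = {}
    for a, b, ts_energy in edges:
        graph.setdefault(a, []).append((b, ts_energy))
        graph.setdefault(b, []).append((a, ts_energy))

    # best[node] = lowest "peak" energy found so far for reaching node.
    best = {source: float("-inf")}
    heap = [(float("-inf"), source, [source])]
    while heap:
        peak, node, path = heapq.heappop(heap)
        if node == target:
            return peak, path
        if peak > best.get(node, float("inf")):
            continue  # stale heap entry
        for neighbor, ts_energy in graph.get(node, []):
            # The peak of a route is the maximum TS energy along it.
            new_peak = max(peak, ts_energy)
            if new_peak < best.get(neighbor, float("inf")):
                best[neighbor] = new_peak
                heapq.heappush(heap, (new_peak, neighbor, path + [neighbor]))
    return None


edges = [("R", "INT1", 60.0), ("INT1", "P", 90.0),
         ("R", "INT2", 75.0), ("INT2", "P", 70.0)]
print(lowest_barrier_path(edges, "R", "P"))  # (75.0, ['R', 'INT2', 'P'])
```

Because the peak energy can only stay the same or grow along a route, the usual Dijkstra argument carries over with `max` in place of addition.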
The tools discussed include amk-tools, a framework for the interactive visualization of automatically discovered reaction networks; gTOFfee, for applying the energy span model to compute the turnover frequency of computationally characterized catalytic cycles; and OntoRXN, an ontology for describing CRNs semantically, integrating network topology and calculation information in a single, highly structured entity organized according to Semantic Data principles.
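The energy span model (in Kozuch and Shaik's formulation) approximates the turnover frequency of a catalytic cycle as TOF ≈ (k_B T/h) exp(−δE/RT), where the energy span δE is T_TDTS − I_TDI when the TOF-determining transition state comes after the TOF-determining intermediate, and T_TDTS − I_TDI + ΔG_r when it comes before. The sketch below evaluates this for a single linear cycle by taking the largest span over all transition state/intermediate pairs. The profile format, the energies, and the function names are assumptions for illustration; gTOFfee itself targets more complex, graph-based cycles.

```python
import math

K_B = 1.380649e-23   # Boltzmann constant, J/K
H = 6.62607015e-34   # Planck constant, J s
R = 8.314462618e-3   # gas constant, kJ/(mol K)


def energy_span(profile, delta_g_r):
    """Return (delta_E, TDTS label, TDI label) for one catalytic cycle.

    profile: states in cycle order as (label, kind, G), kind "I" for an
    intermediate or "TS" for a transition state, G in kJ/mol.
    delta_g_r: Gibbs reaction energy of one turnover (negative if exergonic).
    """
    best = None
    for ts_pos, (ts_label, ts_kind, ts_g) in enumerate(profile):
        if ts_kind != "TS":
            continue
        for int_pos, (int_label, int_kind, int_g) in enumerate(profile):
            if int_kind != "I":
                continue
            span = ts_g - int_g
            if ts_pos < int_pos:
                # The TS comes before the intermediate, so from that
                # intermediate it is reached in the next turnover,
                # shifted by the reaction energy.
                span += delta_g_r
            if best is None or span > best[0]:
                best = (span, ts_label, int_label)
    return best


def turnover_frequency(delta_e, temperature=298.15):
    """Energy span approximation: TOF = (k_B T / h) exp(-delta_E / RT), in 1/s."""
    return K_B * temperature / H * math.exp(-delta_e / (R * temperature))


# Hypothetical free-energy profile (kJ/mol) for one cycle.
profile = [("I1", "I", 0.0), ("TS1", "TS", 70.0),
           ("I2", "I", -30.0), ("TS2", "TS", 50.0)]
span, tdts, tdi = energy_span(profile, delta_g_r=-40.0)
print(span, tdts, tdi)               # 80.0 TS2 I2
print(turnover_frequency(span))      # about 0.06 (per second)
```

Here the highest transition state (TS1) does not control the rate: the span from I2 to TS2 is larger, which is the kind of insight the model is designed to expose.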

    Development of Improved Torsional Potentials in Classical Force Field Models of Poly (Lactic Acid)

    In this work, existing force field descriptions of poly(lactic acid), or PLA, were improved by modifying the torsional potential energy terms to model the bond rotational behavior of PLA more accurately. Extensive calculations were carried out using density functional theory (DFT) for small PLA molecules in vacuo, and also using DFT with a continuum model to approximate the electronic structure of PLA in its condensed phase. From these results, improved force field parameters were developed using a combination of the OPLS and CHARMM force fields. The new force field, PLAFF2, updates the PLAFF model previously developed in David Bruce's group and yields more realistic conformational distributions in simulations of bulk amorphous PLA. PLAFF2 is shown to retain the accuracy of the original PLAFF in simulating the crystalline α polymorph of PLA. Because PLAFF2 outperforms every other publicly available force field for PLA, we recommend its use in future modeling studies of the material, whether in its crystalline or amorphous form.
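A common way to turn a DFT torsion scan into force-field parameters is a linear least-squares fit of an OPLS-style Fourier series, V(φ) = ½V1(1 + cos φ) + ½V2(1 − cos 2φ) + ½V3(1 + cos 3φ) + ½V4(1 − cos 4φ), plus a constant offset because scan energies are relative. The abstract does not spell out the PLAFF2 fitting protocol, so the procedure below and the function name `fit_opls_torsion` are assumptions. In practice the fit target is usually the DFT energy minus the non-torsional force-field energy at each constrained angle; the synthetic data here only check that the fit recovers known coefficients.

```python
import math


def opls_basis(phi):
    """OPLS Fourier terms at dihedral phi (radians), plus a constant column."""
    return [
        0.5 * (1 + math.cos(phi)),
        0.5 * (1 - math.cos(2 * phi)),
        0.5 * (1 + math.cos(3 * phi)),
        0.5 * (1 - math.cos(4 * phi)),
        1.0,
    ]


def solve(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def fit_opls_torsion(angles, energies):
    """Least-squares [V1, V2, V3, V4, offset] via the normal equations."""
    rows = [opls_basis(phi) for phi in angles]
    k = len(rows[0])
    ata = [[sum(row[i] * row[j] for row in rows) for j in range(k)] for i in range(k)]
    atb = [sum(row[i] * e for row, e in zip(rows, energies)) for i in range(k)]
    return solve(ata, atb)


true_params = [2.0, -1.0, 0.5, 0.1, 3.0]   # hypothetical V1..V4 and offset
angles = [math.radians(d) for d in range(0, 360, 10)]
energies = [sum(c * p for c, p in zip(opls_basis(phi), true_params)) for phi in angles]
# Should recover true_params up to floating-point rounding.
print([round(v, 6) for v in fit_opls_torsion(angles, energies)])
```

Because V(φ) is linear in V1 to V4, no iterative optimizer is needed; CHARMM-style terms K(1 + cos(nφ − δ)) with fixed phases δ could be fitted the same way.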

    Towards Efficient Novel Materials Discovery

    The discovery of novel materials with specific functional properties is one of the highest goals in materials science. Screening the structural and chemical space for potential new material candidates is often facilitated by high-throughput methods. Fast yet precise computations are a main tool for such screenings and often start with a geometry relaxation to find the nearest low-energy configuration relative to the input structure. In Part I of this work, a new constrained geometry relaxation is presented that maintains the perfect symmetry of a crystal, saves time and resources, and enables relaxations of metastable phases and of systems with local symmetries or distortions. Apart from improving such computations for a quicker screening of the materials space, better use of existing data is another pillar that can accelerate novel materials discovery. While many different databases exist that make computational results accessible, their usability depends largely on how the data are presented. Here we investigate how semantic technologies and graph representations can improve data annotation. A number of ontologies and knowledge graphs are developed that enable the semantic representation of crystal structures, materials properties, and experimental results in the field of heterogeneous catalysis. We discuss how the approach of separating ontologies and knowledge graphs breaks down when new knowledge is created using artificial intelligence, and propose an intermediate information layer as a solution. The underlying ontologies can provide background knowledge for possible autonomous intelligent agents in the future.
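One standard way to keep a relaxation on the symmetric manifold is to average the forces over the crystal's symmetry operations before every step, so each update respects the symmetry. The abstract does not describe the new method at this level of detail, so the sketch below is a generic illustration, not the constrained relaxation of Part I. By assumption, each operation is given as a Cartesian rotation matrix plus an atom permutation, and the two-atom, inversion-symmetric example is hypothetical.

```python
def symmetrize_forces(forces, operations):
    """Average Cartesian forces over a symmetry group.

    forces: list of [fx, fy, fz] per atom.
    operations: list of (rotation, perm); rotation is a 3x3 nested list and
    perm[a] is the index of the atom that atom a is mapped onto. Forces
    transform with the rotational part only; translations do not affect them.
    """
    n = len(forces)
    sym = [[0.0, 0.0, 0.0] for _ in range(n)]
    for rotation, perm in operations:
        for a, f in enumerate(forces):
            rotated = [sum(rotation[i][j] * f[j] for j in range(3)) for i in range(3)]
            target = sym[perm[a]]
            for i in range(3):
                target[i] += rotated[i] / len(operations)
    return sym


def relax_step(positions, forces, operations, step=0.1):
    """One steepest-descent step along the symmetrized forces."""
    sym = symmetrize_forces(forces, operations)
    return [[x + step * fx for x, fx in zip(pos, f)] for pos, f in zip(positions, sym)]


IDENTITY = ([[1, 0, 0], [0, 1, 0], [0, 0, 1]], [0, 1])
INVERSION = ([[-1, 0, 0], [0, -1, 0], [0, 0, -1]], [1, 0])

positions = [[1.0, 0.0, 0.0], [-1.0, 0.0, 0.0]]      # inversion-related pair
raw_forces = [[1.0, 0.0, 0.0], [0.2, 0.0, 0.0]]      # slightly asymmetric, e.g. numerical noise
print(relax_step(positions, raw_forces, [IDENTITY, INVERSION]))
```

After symmetrization the two forces are equal and opposite, so the pair stays inversion-symmetric after the step; the same averaging applies unchanged to larger space groups.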
We conclude that there is still a long way to go before materials science data can be made understandable to machines, and that the usefulness of semantic technologies in materials science is currently very limited.

    Representing chemical structures using OWL and description graphs

    Objects can be said to be structured when their representation also contains their parts. While OWL in general can describe structured objects, description graphs are a recent, decidable extension to OWL which support the description of classes of structured objects whose parts are related in complex ways. Classes of chemical entities such as molecules, ions, and groups (parts of molecules) are often characterised by the way in which the constituent atoms of their instances are connected via chemical bonds. In chemoinformatics tools and applications, this internal structure is represented using chemical graphs. Here we present a chemical knowledge base built on the standard chemical graph model using description graphs, OWL, and rules. Our ontology includes chemical classes, groups, and molecules, together with their structures encoded as description graphs. We show how role-safe rules can be used to determine parthood between groups and molecules based on their graph structures, and to determine basic chemical properties. Finally, we investigate the scalability of the technology by developing an automatic utility that converts standard chemical graphs into description graphs, and by converting a large number of diverse graphs obtained from a publicly available chemical database.
Computer Science (School of Computing), M.Sc. (Computer Science)
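The parthood idea can be illustrated outside OWL: a group is part of a molecule when the group's chemical graph can be embedded in the molecule's graph with matching elements and bond orders. The backtracking sketch below is a plain subgraph-matching stand-in written for illustration, not the role-safe rules or description-graph reasoning used in this work; the hydrogen-suppressed graphs for ethanol, acetic acid, and the carboxyl group are hypothetical inputs.

```python
def find_embedding(group, molecule):
    """Return a dict mapping group atoms to molecule atoms, or None.

    Each graph is (elements, bonds): elements is a list of symbols and
    bonds is a list of (i, j, order) tuples over those indices.
    """
    g_elems, g_bonds = group
    m_elems, m_bonds = molecule
    g_order = {frozenset((i, j)): o for i, j, o in g_bonds}
    m_order = {frozenset((i, j)): o for i, j, o in m_bonds}
    mapping = {}

    def compatible(g_atom, m_atom):
        if m_atom in mapping.values() or g_elems[g_atom] != m_elems[m_atom]:
            return False
        # Every bond from g_atom to an already-mapped group atom must exist
        # in the molecule with the same bond order.
        for prev, image in mapping.items():
            order = g_order.get(frozenset((prev, g_atom)))
            if order is not None and m_order.get(frozenset((image, m_atom))) != order:
                return False
        return True

    def extend(g_atom):
        if g_atom == len(g_elems):
            return True
        for m_atom in range(len(m_elems)):
            if compatible(g_atom, m_atom):
                mapping[g_atom] = m_atom
                if extend(g_atom + 1):
                    return True
                del mapping[g_atom]  # backtrack
        return False

    return dict(mapping) if extend(0) else None


ethanol = (["C", "C", "O"], [(0, 1, 1), (1, 2, 1)])
acetic_acid = (["C", "C", "O", "O"], [(0, 1, 1), (1, 2, 2), (1, 3, 1)])
carboxyl = (["C", "O", "O"], [(0, 1, 2), (0, 2, 1)])
print(find_embedding(carboxyl, acetic_acid))  # {0: 1, 1: 2, 2: 3}
print(find_embedding(carboxyl, ethanol))      # None
```

With hydrogens suppressed, this pattern would also match an ester, so a fuller model would include hydrogens or charges in the graphs.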

    Compact numeric alkane codes derived from IUPAC nomenclature

    Improving the performance of gas sensor systems with advanced data evaluation, operation, and calibration methods

    To facilitate the widespread use of gas sensors, some challenges must still be overcome. Many of these relate to the reliable quantification of ultra-low concentrations of specific compounds in a background of other gases. This thesis focuses on three important links in the measurement chain: sensor material and operating modes, evaluation of the resulting data, and test gas generation for efficient sensor calibration. New operating modes and materials for gas-sensitive field-effect transistors have been investigated. Tungsten trioxide as gate oxide can improve the selectivity to hazardous volatile organic compounds such as naphthalene, even in a strong and variable ethanol background. The influence of gate bias and ultraviolet light on the transport of oxygen anions on the sensor surface has been studied and used to improve the classification and quantification of different gases. DAV3E, an internationally recognized MATLAB-based toolbox for the evaluation of cyclic sensor data, has been developed and published as open source. It provides a user-friendly graphical interface and specially tailored algorithms from multivariate statistics. The laboratory tests conducted during this project have been extended with an interlaboratory study and a field test, both yielding valuable insights for future, more complex sensor calibration. A novel, efficient calibration approach has been proposed and evaluated with ten different gas sensor systems.
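Cyclic sensor operation yields one signal trace per cycle, and multivariate methods usually work on a handful of features extracted from each trace rather than on the raw samples. The sketch below shows one simple, generic choice, the mean and least-squares slope of equal-length segments of each cycle. It is an assumption for illustration and does not reproduce DAV3E (a MATLAB toolbox) or its API; the function names and the segment count are hypothetical.

```python
def segment_features(cycle, n_segments):
    """Mean and least-squares slope (per sample) of equal-length segments of one cycle.

    Trailing samples that do not fill a whole segment are ignored.
    """
    size = len(cycle) // n_segments
    if size < 2:
        raise ValueError("each segment needs at least two samples")
    features = []
    for s in range(n_segments):
        seg = cycle[s * size:(s + 1) * size]
        x_mean = (size - 1) / 2
        y_mean = sum(seg) / size
        sxx = sum((x - x_mean) ** 2 for x in range(size))
        sxy = sum((x - x_mean) * (y - y_mean) for x, y in enumerate(seg))
        features.extend([y_mean, sxy / sxx])
    return features


def feature_matrix(cycles, n_segments=4):
    """One row of features per sensor cycle, ready for PCA, LDA, or regression."""
    return [segment_features(cycle, n_segments) for cycle in cycles]


cycle = [0, 1, 2, 3, 10, 10, 10, 10]   # hypothetical two-phase cycle
print(segment_features(cycle, 2))       # [1.5, 1.0, 10.0, 0.0]
```

Segment means capture signal level and slopes capture transient behavior; which segments carry information for a given gas is exactly what the downstream multivariate analysis has to find out.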

    Physical Adsorption of Linear Hydrocarbon Quadrupoles on Graphite and MgO (100): Effects of the Compatibility of Surface and Molecular Symmetries

    The process of physical adsorption plays a practical role in fields ranging from catalysis to lubrication and even optoelectronics. Furthermore, it provides a means of probing our fundamental understanding of intermolecular forces and of how symmetries can shape the behavior of a system. Linear quadrupoles preferentially adopt square-T configurations when confined in two dimensions. This would lead the system to adopt four-fold symmetry in the molecular lattice. Two archetypal surfaces often studied in physisorption research are MgO (100), which has a four-fold symmetry of alternating charges, and the basal plane of graphite, whose non-polar, weakly corrugated surface has six-fold symmetry. These differing surface symmetries provide two test cases for comparison. In the case of MgO (100), the molecule-molecule and molecule-surface interactions are synergistic, both driving the film towards the same symmetry, whereas for graphite the six-fold surface symmetry is incompatible with the preferred four-fold interaction symmetry of the molecules. This creates the opportunity for structurally frustrated systems to arise. Acetylene and allene are both simple, linear, rigid hydrocarbons with large quadrupole moments of similar strength. The most distinct differences between these two molecules are their size and their axial rotational symmetry. These molecules, just like the surfaces, therefore provide two simple but contrasting symmetry effects. The simple point group of the truly linear acetylene molecule allows it to lie completely flat against a surface. In allene, the 90-degree dihedral angle between the hydrogen pairs at opposite ends of the molecule prevents it from lying perfectly flat against the surface, creating another opportunity for broken symmetry in the molecule-surface interactions, this time in the vertical direction rather than in the two-dimensional adsorption plane.
This investigation aims to study the behavioral properties of acetylene and allene films adsorbed on both graphite and MgO through thermodynamic, structural, and phase behavior analyses. To this end, a combination of volumetric adsorption isotherms, elastic neutron diffraction, and computational modeling has been employed.
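The preference of linear quadrupoles for T-shaped pairs can be checked with the standard point-quadrupole interaction energy for two linear molecules with quadrupole moment Q at separation r (Gaussian units): E = (3Q²/4r⁵)[1 − 5c₁² − 5c₂² − 15c₁²c₂² + 2(4c₁c₂ − s₁s₂cos φ₁₂)²], where cᵢ and sᵢ are the cosine and sine of the angle between molecule i's axis and the intermolecular vector. When both molecules lie in one plane, s₁s₂cos φ₁₂ reduces to sin α₁ sin α₂ for in-plane angles αᵢ measured from the intermolecular vector. The sketch scans these angles on a 1° grid; it ignores dispersion, repulsion, molecular size, and the surface, so it illustrates the symmetry argument only and is not one of the computational models used in the study.

```python
import math


def pair_energy(alpha1, alpha2, q=1.0, r=1.0):
    """Point-quadrupole energy (Gaussian units) of two coplanar linear molecules.

    alpha1, alpha2: in-plane angles (radians) between each molecular axis
    and the intermolecular vector.
    """
    c1, s1 = math.cos(alpha1), math.sin(alpha1)
    c2, s2 = math.cos(alpha2), math.sin(alpha2)
    bracket = (1 - 5 * c1**2 - 5 * c2**2 - 15 * c1**2 * c2**2
               + 2 * (4 * c1 * c2 - s1 * s2) ** 2)
    return 3 * q**2 / (4 * r**5) * bracket


def scan(step_deg=1):
    """Lowest-energy (energy, alpha1_deg, alpha2_deg) on the grid.

    Angles 0-179 suffice: a linear quadrupole is unchanged by a 180° flip.
    """
    return min((pair_energy(math.radians(a1), math.radians(a2)), a1, a2)
               for a1 in range(0, 180, step_deg)
               for a2 in range(0, 180, step_deg))


energy, a1, a2 = scan()
print(round(energy, 6), a1, a2)   # -3.0 0 90 (a T-shaped pair)
print(pair_energy(math.radians(90), math.radians(90)))   # side-by-side: positive, repulsive
```

The minimum, −3Q²/r⁵, occurs when one molecule points along the intermolecular vector and the other is perpendicular to it; repeating this T motif in a plane gives the four-fold, square-T arrangement described above.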