120 research outputs found

    Chemical information matters: an e-Research perspective on information and data sharing in the chemical sciences

    No full text
    Recently, a number of organisations have called for open access to scientific information, and especially to the data obtained from publicly funded research; the Royal Society report and the European Commission press release are particularly notable examples. It has long been accepted that building research on the foundations laid by other scientists is both effective and efficient. Regrettably, some disciplines, chemistry among them, have been slow to recognise the value of sharing and have thus been reluctant to curate their data and information in preparation for exchanging it. The very significant increases in both the volume and the complexity of the datasets produced have encouraged the expansion of e-Research and stimulated the development of methodologies for managing, organising, and analysing "big data". We review the evolution of cheminformatics, the amalgam of chemistry, computer science, and information technology, and assess the wider e-Science and e-Research perspective. Chemical information does matter, as do matters of communicating data and collaborating with data. For chemistry, unique identifiers, structure representations, and property descriptors are essential to the activities of sharing and exchange. Open science entails the sharing of more than mere facts: for example, the publication of negative outcomes can facilitate a better understanding of which synthetic routes to choose, an aspiration of the Dial-a-Molecule Grand Challenge. The proponents of open notebook science go even further and exchange their thoughts and plans. We consider the concepts of preservation, curation, provenance, discovery, and access in the context of the research lifecycle, and then focus on the role of metadata, particularly the ontologies on which the emerging chemical Semantic Web will depend. Among our conclusions, we present our choice of the "grand challenges" for the preservation and sharing of chemical information.

    High-Performance Modelling and Simulation for Big Data Applications

    Get PDF
    This open access book was prepared as the Final Publication of the COST Action IC1406 “High-Performance Modelling and Simulation for Big Data Applications (cHiPSet)” project. Long considered important pillars of the scientific method, Modelling and Simulation have evolved from traditional discrete numerical methods to complex data-intensive continuous analytical optimisations. Resolution, scale, and accuracy have become essential to predicting and analysing natural and complex systems in science and engineering. As their level of abstraction rises to afford a better discernment of the domain at hand, their representation becomes increasingly demanding of computational and data resources. High Performance Computing, for its part, typically entails the effective use of parallel and distributed processing units coupled with efficient storage, communication, and visualisation systems to underpin complex data-intensive applications in distinct scientific and technical domains. A seamless interaction of High Performance Computing with Modelling and Simulation is therefore arguably required in order to store, compute, analyse, and visualise large data sets in science and engineering. Funded by the European Commission, cHiPSet has provided a dynamic trans-European forum for its members and distinguished guests to openly discuss novel perspectives and topics of interest for these two communities. This cHiPSet compendium presents a set of selected case studies related to healthcare, biological data, computational advertising, multimedia, finance, bioinformatics, and telecommunications.

    EU US Roadmap Nanoinformatics 2030

    Get PDF
    The Nanoinformatics Roadmap 2030 is a compilation of state-of-the-art commentaries from multiple interconnecting scientific fields, combined with issues involving nanomaterial (NM) risk assessment and governance. In bringing these issues together into a coherent set of milestones, the authors address three recognised challenges facing nanoinformatics: (1) limited data sets; (2) limited data access; and (3) regulatory requirements for validating and accepting computational models. It is also recognised that data generation will progress unevenly and in an unstructured way if not captured within a nanoinformatics framework based on harmonised, interconnected databases and standards. The coordination efforts implicit in such a framework ensure early use of the data for regulatory purposes, e.g., for the read-across method of filling data gaps.
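The read-across idea mentioned above can be illustrated with a minimal sketch: a missing endpoint value for a target nanomaterial is estimated from its most similar, already-measured analogues. This is a hypothetical k-nearest-neighbour illustration, assuming simple numeric descriptors; the descriptor names, data, and averaging scheme are not the roadmap's prescribed method.

```python
import math

def read_across(target: dict, analogues: list, endpoint: str, k: int = 2) -> float:
    """Estimate a missing endpoint for `target` as the mean endpoint value of
    its k nearest analogues, using Euclidean distance over shared descriptors.
    Descriptor keys and the choice of k are illustrative assumptions."""
    def distance(a: dict, b: dict) -> float:
        keys = [d for d in a if d in b and d != endpoint]
        return math.sqrt(sum((a[d] - b[d]) ** 2 for d in keys))
    ranked = sorted(analogues, key=lambda a: distance(target, a))
    return sum(a[endpoint] for a in ranked[:k]) / k

# Toy example: estimate toxicity for an unmeasured material from two
# close analogues (size/charge values are fabricated for illustration).
target = {"size": 10.0, "charge": -1.0}
analogues = [
    {"size": 10.0, "charge": -1.0, "toxicity": 2.0},
    {"size": 11.0, "charge": -1.0, "toxicity": 4.0},
    {"size": 50.0, "charge": 5.0, "toxicity": 9.0},
]
estimate = read_across(target, analogues, "toxicity", k=2)
```

In a regulatory setting the similarity justification would of course rest on harmonised, curated descriptors rather than an ad hoc distance.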

    Structural and semantic similarity metrics for chemical compound classification

    Get PDF
    Master's thesis, Biochemistry, Universidade de Lisboa, Faculdade de Ciências, 2010. Over the last few decades, the amount of data produced and made available in chemistry has grown greatly, especially since the introduction of automated methods of analysis. This growth creates an ever greater need for automatic computational systems capable of storing, studying, and interpreting these data efficiently. One of the most important tasks in cheminformatics is indeed the use of laboratory data in systems for comparing and classifying chemical compounds. The most effective current methods rest on the premise that the function of a chemical compound is intimately related to its structure. Although this premise is generally correct, as current methods attest, they can fail, especially when similar molecules perform different functions (as with the L- and D-amino acids) or when disparate molecules perform a similar biological function (as with countless examples of inhibitors). The work presented here addresses this problem with a hybrid metric that integrates at its core not only structural but also semantic information; that is, the system can exploit information about the meaning of molecules in a biochemical context. To this end, I used the ChEBI ontology as the source of semantic information and created a tool named Chym (Chemical Hybrid Metric) that handles binary classification problems for chemical compounds.
Briefly, to decide whether a chemical compound has a given property, for example whether it crosses the blood-brain barrier, the system assigns the compound an activity coefficient calculated from the compounds known to have that property; by comparison with a cutoff value, Chym classifies the compound under study as having the property or not. The tool resulting from this thesis is explored and validated here. The work presents substantial evidence supporting the effectiveness of Chym, which outperformed all the models with which it was compared. In particular, for three selected problems, Chym classified compounds correctly 90.9%, 87.7%, and 84.2% of the time; in that order, these values refer to classifying compounds as permeable to the blood-brain barrier, as substrates of P-glycoprotein, and as ligands of an estrogen receptor. For comparison, these three problems had previously been solved with accuracies of 81.5%, 80.6%, and 82.8% respectively. Other results show that the tool is appropriate even when the problem at hand is not well represented in the ChEBI ontology. This confirms the thesis hypothesis: integrating semantic information into systems for comparing and classifying chemical compounds improves, sometimes substantially, the fidelity of the method.
The thesis thus succeeded on two fronts: it validated the hypothesis, and it produced a chemical compound classification tool that can be used in future, broader projects, such as the study of the evolution of metabolic pathways, drug discovery, or preliminary toxicity analysis.
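The hybrid-metric idea can be sketched in a few lines. This is an illustrative reconstruction, not the thesis's actual implementation: the bit-set fingerprints, the ontology-ancestor overlap as a semantic proxy, the equal weighting, and the 0.5 cutoff are all assumptions made for the example.

```python
def tanimoto(fp_a: set, fp_b: set) -> float:
    """Structural similarity: Tanimoto coefficient over fingerprint bit sets."""
    if not fp_a and not fp_b:
        return 0.0
    return len(fp_a & fp_b) / len(fp_a | fp_b)

def semantic_similarity(anc_a: set, anc_b: set) -> float:
    """Semantic similarity: Jaccard overlap of ontology ancestor sets
    (a crude stand-in for similarity over an ontology such as ChEBI)."""
    if not anc_a and not anc_b:
        return 0.0
    return len(anc_a & anc_b) / len(anc_a | anc_b)

def hybrid_score(fp_a, anc_a, fp_b, anc_b, alpha: float = 0.5) -> float:
    """Blend structural and semantic similarity; alpha is an assumed weight."""
    return alpha * tanimoto(fp_a, fp_b) + (1 - alpha) * semantic_similarity(anc_a, anc_b)

def classify(query, positives, cutoff: float = 0.5) -> bool:
    """Assign an activity coefficient as the best hybrid score of the query
    against compounds known to have the property, then threshold it."""
    fp_q, anc_q = query
    coeff = max(hybrid_score(fp_q, anc_q, fp_p, anc_p) for fp_p, anc_p in positives)
    return coeff >= cutoff
```

A compound identical in fingerprint and annotation to a known positive scores 1.0 and is classified as active; one sharing neither bits nor ancestors scores 0.0 and is rejected.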

    Information retrieval and text mining technologies for chemistry

    Get PDF
    Efficient access to chemical information contained in scientific literature, patents, technical reports, or the web is a pressing need shared by researchers and patent attorneys from different chemical disciplines. Retrieval of important chemical information in most cases starts with finding relevant documents for a particular chemical compound or family. Targeted retrieval of chemical documents is closely connected to the automatic recognition of chemical entities in the text, which commonly involves the extraction of the entire list of chemicals mentioned in a document, including any associated information. In this Review, we provide a comprehensive and in-depth description of fundamental concepts, technical implementations, and current technologies for meeting these information demands. A strong focus is placed on community challenges addressing systems performance, more particularly the CHEMDNER and CHEMDNER patents tasks of BioCreative IV and V, respectively. Considering the growing interest in the construction of automatically annotated chemical knowledge bases that integrate chemical information and biological data, cheminformatics approaches for mapping the extracted chemical names into chemical structures and their subsequent annotation, together with text mining applications for linking chemistry with biological information, are also presented. Finally, future trends and current challenges are highlighted as a roadmap proposal for research in this emerging field.
A.V. and M.K. acknowledge funding from the European Community’s Horizon 2020 Program (project reference: 654021 - OpenMinted). M.K. additionally acknowledges the Encomienda MINETAD-CNIO as part of the Plan for the Advancement of Language Technology. O.R. and J.O. thank the Foundation for Applied Medical Research (FIMA), University of Navarra (Pamplona, Spain). This work was partially funded by the Consellería de Cultura, Educación e Ordenación Universitaria (Xunta de Galicia), FEDER (European Union), and the Portuguese Foundation for Science and Technology (FCT) under the scope of the strategic funding of the UID/BIO/04469/2013 unit and COMPETE 2020 (POCI-01-0145-FEDER-006684). We thank Iñigo García-Yoldi for useful feedback and discussions during the preparation of the manuscript.
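The recognition step the review describes, finding chemical entity mentions in text, can be illustrated with a minimal dictionary-based tagger. This is only a toy baseline under assumed inputs (the lexicon below is fabricated); the CHEMDNER systems discussed in the review use far richer statistical and machine-learned models.

```python
import re

# Toy lexicon of chemical names; a real system would draw on large curated
# resources and handle systematic names, formulas, and abbreviations.
LEXICON = {"aspirin", "acetylsalicylic acid", "caffeine", "ibuprofen"}

def find_chemicals(text: str):
    """Return (start, end, mention) spans for lexicon entries found in `text`.
    Longer names are matched first so multi-word entries win over substrings."""
    hits = []
    for name in sorted(LEXICON, key=len, reverse=True):
        for m in re.finditer(r"\b" + re.escape(name) + r"\b", text, re.IGNORECASE):
            # Skip spans overlapping an already-accepted (longer) match.
            if not any(s <= m.start() < e or s < m.end() <= e for s, e, _ in hits):
                hits.append((m.start(), m.end(), m.group(0)))
    return sorted(hits)
```

The extracted mentions would then feed the downstream steps the review covers: normalisation to chemical structures and linking to biological annotations.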

    Development Of Database And Computational Methods For Disease Detection And Drug Discovery

    Get PDF
    Ph.D. thesis (Doctor of Philosophy)

    Integrative Systems Approaches Towards Brain Pharmacology and Polypharmacology

    Get PDF
    Polypharmacology is emerging as the next paradigm of drug discovery. Traditional drug design is based primarily on a “one target, one drug” paradigm, whereas in polypharmacology drug molecules interact with multiple targets; this poses new challenges for developing and designing new, effective drugs that are less toxic, by eliminating unexpected drug-target interactions. Although still in its infancy, the use of polypharmacology ideas already appears to have a remarkable impact on modern drug development. The current thesis is a detailed study of systems-level pharmacology approaches to understanding polypharmacology in complex brain and neurodegenerative disorders. The research work focuses on the design and construction of a dedicated knowledge base for human brain pharmacology. This knowledge base, referred to as the Human Brain Pharmacome (HBP), is a unique and comprehensive resource that aggregates data and knowledge around the drug treatments currently available for major brain and neurodegenerative disorders. The HBP knowledge base provides data in a single place for building models and supporting hypotheses. The HBP also incorporates new data obtained from similarity computations over drug and protein structures, which were analysed from various aspects, including network pharmacology and the application of in-silico computational methods for the discovery of novel multi-target drug candidates. Computational tools and machine learning models were developed to characterise protein targets by their polypharmacological profiles and to distinguish indication-specific or target-specific drugs from other drugs.
Systems pharmacology approaches to drug property prediction provided a highly enriched compound library that was virtually screened against an array of protein targets derived through network pharmacology, using combined docking and molecular dynamics simulation workflows. The approaches developed in this work led to the identification of novel multi-target drug candidates that are backed by existing experimental knowledge, and they propose the repositioning of existing drugs, which are undergoing further experimental validation.