246 research outputs found

    Internet based molecular collaborative and publishing tools

    No full text
    The scientific electronic publishing model has hitherto been an Internet based delivery of electronic articles that are essentially replicas of their paper counterparts. They contain little in the way of added semantics that may better expose the science, assist the peer review process and facilitate follow on collaborations, even though the enabling technologies have been around for some time and are mature. This thesis will examine the evolution of chemical electronic publishing over the past 15 years. It will illustrate, which the help of two frameworks, how publishers should be exploiting technologies to improve the semantics of chemical journal articles, namely their value added features and relationships with other chemical resources on the Web. The first framework is an early exemplar of structured and scalable electronic publishing where a Web content management system and a molecular database are integrated. It employs a test bed of articles from several RSC journals and supporting molecular coordinate and connectivity information. The value of converting 3D molecular expressions in chemical file formats, such as the MOL file, into more generic 3D graphics formats, such as Web3D, is assessed. This exemplar highlights the use of metadata management for bidirectional hyperlink maintenance in electronic publishing. The second framework repurposes this metadata management concept into a Semantic Web application called SemanticEye. SemanticEye demonstrates how relationships between chemical electronic articles and other chemical resources are established. It adapts the successful semantic model used for digital music metadata management by popular applications such as iTunes. Globally unique identifiers enable relationships to be established between articles and other resources on the Web and SemanticEye implements two: the Document Object Identifier (DOI) for articles and the IUPAC International Chemical Identifier (InChI) for molecules. SemanticEye’s potential as a framework for seeding collaborations between researchers, who have hitherto never met, is explored using FOAF, the friend-of-a-friend Semantic Web standard for social networks

    A metadata-driven approach to data repository design

    Get PDF
    The design and use of a metadata-driven data repository for research data management is described. Metadata is collected automatically during the submission process whenever possible and is registered with DataCite in accordance with their current metadata schema, in exchange for a persistent digital object identifier. Two examples of data preview are illustrated, including the demonstration of a method for integration with commercial software that confers rich domain-specific data analytics without introducing customisation into the repository itself

    Chemical information matters: an e-Research perspective on information and data sharing in the chemical sciences

    No full text
    Recently, a number of organisations have called for open access to scientific information and especially to the data obtained from publicly funded research, among which the Royal Society report and the European Commission press release are particularly notable. It has long been accepted that building research on the foundations laid by other scientists is both effective and efficient. Regrettably, some disciplines, chemistry being one, have been slow to recognise the value of sharing and have thus been reluctant to curate their data and information in preparation for exchanging it. The very significant increases in both the volume and the complexity of the datasets produced has encouraged the expansion of e-Research, and stimulated the development of methodologies for managing, organising, and analysing "big data". We review the evolution of cheminformatics, the amalgam of chemistry, computer science, and information technology, and assess the wider e-Science and e-Research perspective. Chemical information does matter, as do matters of communicating data and collaborating with data. For chemistry, unique identifiers, structure representations, and property descriptors are essential to the activities of sharing and exchange. Open science entails the sharing of more than mere facts: for example, the publication of negative outcomes can facilitate better understanding of which synthetic routes to choose, an aspiration of the Dial-a-Molecule Grand Challenge. The protagonists of open notebook science go even further and exchange their thoughts and plans. We consider the concepts of preservation, curation, provenance, discovery, and access in the context of the research lifecycle, and then focus on the role of metadata, particularly the ontologies on which the emerging chemical Semantic Web will depend. Among our conclusions, we present our choice of the "grand challenges" for the preservation and sharing of chemical information

    Design and implementation of a platform for predicting pharmacological properties of molecules

    Get PDF
    Tese de mestrado, Bioinformática e Biologia Computacional, Universidade de Lisboa, Faculdade de Ciências, 2019O processo de descoberta e desenvolvimento de novos medicamentos prolonga-se por vários anos e implica o gasto de imensos recursos monetários. Como tal, vários métodos in silico são aplicados com o intuito de dimiuir os custos e tornar o processo mais eficiente. Estes métodos incluem triagem virtual, um processo pelo qual vastas coleções de compostos são examinadas para encontrar potencial terapêutico. QSAR (Quantitative Structure Activity Relationship) é uma das tecnologias utilizada em triagem virtual e em optimização de potencial farmacológico, em que a informação estrutural de ligandos conhecidos do alvo terapêutico é utilizada para prever a actividade biológica de um novo composto para com o alvo. Vários investigadores desenvolvem modelos de aprendizagem automática de QSAR para múltiplos alvos terapêuticos. Mas o seu uso está dependente do acesso aos mesmos e da facilidade em ter os modelos funcionais, o que pode ser complexo quando existem várias dependências ou quando o ambiente de desenvolvimento difere bastante do ambiente em que é usado. A aplicação ao qual este documento se refere foi desenvolvida para lidar com esta questão. Esta é uma plataforma centralizada onde investigadores podem aceder a vários modelos de QSAR, podendo testar os seus datasets para uma multitude de alvos terapêuticos. A aplicação permite usar identificadores moleculares como SMILES e InChI, e gere a sua integração em descritores moleculares para usar como input nos modelos. A plataforma pode ser acedida através de uma aplicação web com interface gráfica desenvolvida com o pacote Shiny para R e directamente através de uma REST API desenvolvida com o pacote flask-restful para Python. Toda a aplicação está modularizada através de teconologia de “contentores”, especificamente o Docker. O objectivo desta plataforma é divulgar o acesso aos modelos criados pela comunidade, condensando-os num só local e removendo a necessidade do utilizador de instalar ou parametrizar qualquer tipo de software. Fomentando assim o desenvolvimento de conhecimento e facilitando o processo de investigação.The drug discovery and design process is expensive, time-consuming and resource-intensive. Various in silico methods are used to make the process more efficient and productive. Methods such as Virtual Screening often take advantage of QSAR machine learning models to more easily pinpoint the most promising drug candidates, from large pools of compounds. QSAR, which means Quantitative Structure Activity Relationship, is a ligand-based method where structural information of known ligands of a specific target is used to predict the biological activity of another molecule against that target. They are also used to improve upon an existing molecule’s pharmacologic potential by elucidating the structural composition with desirable properties. Several researchers create and develop QSAR machine learning models for a variety of different therapeutic targets. However, their use is limited by lack of access to said models. Beyond access, there are often difficulties in using published software given the need to manage dependencies and replicating the development environment. To address this issue, the application documented here was designed and developed. In this centralized platform, researchers can access several QSAR machine learning models and test their own datasets for interaction with various therapeutic targets. The platform allows the use of widespread molecule identifiers as input, such as SMILES and InChI, handling the necessary integration into the appropriate molecular descriptors to be used in the model. The platform can be accessed through a Web Application with a full graphical user interface developed with the R package Shiny and through a REST API developed with the Flask Restful package for Python. The complete application is packaged up in container technology, specifically Docker. The main goal of this platform is to grant widespread access to the QSAR models developed by the scientific community, by concentrating them in a single location and removing the user’s need to install or set up software unfamiliar to them. This intends to incite knowledge creation and facilitate the research process

    Characterisation of data resources for in silico modelling: benchmark datasets for ADME properties.

    Get PDF
    Introduction: The cost of in vivo and in vitro screening of ADME properties of compounds has motivated efforts to develop a range of in silico models. At the heart of the development of any computational model are the data; high quality data are essential for developing robust and accurate models. The characteristics of a dataset, such as its availability, size, format and type of chemical identifiers used, influence the modelability of the data. Areas covered: This review explores the usefulness of publicly available ADME datasets for researchers to use in the development of predictive models. More than 140 ADME datasets were collated from publicly available resources and the modelability of 31selected datasets were assessed using specific criteria derived in this study. Expert opinion: Publicly available datasets differ significantly in information content and presentation. From a modelling perspective, datasets should be of adequate size, available in a user-friendly format with all chemical structures associated with one or more chemical identifiers suitable for automated processing (e.g. CAS number, SMILES string or InChIKey). Recommendations for assessing dataset suitability for modelling and publishing data in an appropriate format are discussed
    • …
    corecore