89 research outputs found

    Self-organizing ontology of biochemically relevant small molecules

    Background: The advent of high-throughput experimentation in biochemistry has led to the generation of vast amounts of chemical data, necessitating the development of novel analysis, characterization, and cataloguing techniques and tools. Recently, a movement to publicly release such data has advanced biochemical structure-activity relationship research, while providing new challenges, the biggest being the curation, annotation, and classification of this information to facilitate useful biochemical pattern analysis. Unfortunately, the human resources currently employed by the organizations supporting these efforts (e.g. ChEBI) are expanding linearly, while new useful scientific information is being released in a seemingly exponential fashion. Compounding this, existing chemical classification and annotation systems are not amenable to automated classification, formal and transparent chemical class definition axiomatization, facile class redefinition, or novel class integration, thus further limiting chemical ontology growth by necessitating human involvement in curation. Clearly, there is a need for the automation of this process, especially for novel chemical entities of biological interest. Results: To address this, we present a formal framework based on Semantic Web technologies for the automatic design of a chemical ontology which can be used for automated classification of novel entities. We demonstrate the automatic self-assembly of a structure-based chemical ontology based on 60 MeSH and 40 ChEBI chemical classes. This ontology is then used to classify 200 compounds with an accuracy of 92.7%. We extend these structure-based classes with molecular feature information and demonstrate the utility of our framework for classification of functionally relevant chemicals. Finally, we discuss an iterative approach that we envision for future biochemical ontology development. Conclusions: We conclude that the proposed methodology can ease the burden of chemical data annotators and dramatically increase their productivity. We anticipate that the use of formal logic in our proposed framework will make chemical classification criteria more transparent to humans and machines alike and will thus facilitate predictive and integrative bioactivity model development.

    Comparative evaluation of atom mapping algorithms for balanced metabolic reactions: application to Recon 3D

    The mechanism of each chemical reaction in a metabolic network can be represented as a set of atom mappings, each of which relates an atom in a substrate metabolite to an atom of the same element in a product metabolite. Genome-scale metabolic network reconstructions typically represent biochemistry at the level of reaction stoichiometry. However, a more detailed representation at the underlying level of atom mappings opens the possibility for a broader range of biological, biomedical and biotechnological applications than with stoichiometry alone. Complete manual acquisition of atom mapping data for a genome-scale metabolic network is a laborious process. However, many algorithms exist to predict atom mappings. How do their predictions compare to each other and to manually curated atom mappings? For more than four thousand metabolic reactions in the latest human metabolic reconstruction, Recon 3D, we compared the atom mappings predicted by six atom mapping algorithms. We also compared these predictions to those obtained by manual curation of atom mappings for over five hundred reactions distributed among all top level Enzyme Commission number classes. Five of the evaluated algorithms had similarly high prediction accuracy of over 91% when compared to manually curated atom mapped reactions. On average, the accuracy of the prediction was highest for reactions catalysed by oxidoreductases and lowest for reactions catalysed by ligases. In addition to prediction accuracy, the algorithms were evaluated on their accessibility, their advanced features, such as the ability to identify equivalent atoms, and their ability to map hydrogen atoms. In addition to prediction accuracy, we found that software accessibility and advanced features were fundamental to the selection of an atom mapping algorithm in practice

    The Gene Ontology Handbook

    bioinformatics; biotechnolog

    Computational Approaches to Drug Profiling and Drug-Protein Interactions

    Despite substantial increases in R&D spending within the pharmaceutical industry, de novo drug design has become a time-consuming endeavour. High attrition rates led to a long period of stagnation in drug approvals. Due to the extreme costs associated with introducing a drug to the market, locating and understanding the reasons for clinical failure is key to future productivity. As part of this PhD, three main contributions were made in this respect. First, the web platform LigNFam enables users to interactively explore similarity relationships between ‘drug-like’ molecules and the proteins they bind. Secondly, two deep-learning-based binding site comparison tools were developed, competing with the state of the art on benchmark datasets. The models can predict off-target interactions and potential candidates for target-based drug repurposing. Finally, the open-source ScaffoldGraph software was presented for the analysis of hierarchical scaffold relationships and has already been used in multiple projects, including integration into a virtual screening pipeline to increase the tractability of ultra-large screening experiments. Together, and with existing tools, the contributions made will aid in the understanding of drug-protein relationships, particularly in the fields of off-target prediction and drug repurposing, helping to design better drugs faster

    Information Extraction from Text for Improving Research on Small Molecules and Histone Modifications

    The cumulative number of publications, in particular in the life sciences, requires efficient methods for the automated extraction of information and semantic information retrieval. The recognition and identification of information-carrying units in text (concept denominations and named entities) relevant to a certain domain is a fundamental step. The focus of this thesis lies on the recognition of chemical entities and the new biological named entity type histone modifications, which are both important in the field of drug discovery. As the emergence of new research fields as well as the discovery and generation of novel entities goes along with the coinage of new terms, the perpetual adaptation of the respective named entity recognition approaches to new domains is an important step for information extraction. Two methodologies have been investigated in this regard: the state-of-the-art machine learning method, Conditional Random Fields (CRF), and an approximate string search method based on dictionaries. Recognition methods that rely on dictionaries are strongly dependent on the availability of entity terminology collections as well as on their quality. In the case of chemical entities, the terminology is distributed over more than 7 publicly available data sources. Joining the entries and accompanying terminology from selected resources enables the generation of a new dictionary comprising chemical named entities. Combined with the automatic processing of the respective terminology (the dictionary curation), the recognition performance reached an F1 measure of 0.54. That is an improvement of 29 % in comparison to the raw dictionary. The highest recall was achieved for the class of TRIVIAL-names with 0.79. The recognition and identification of chemical named entities provides a prerequisite for the extraction of related pharmacologically relevant information from literature data.
Therefore, lexico-syntactic patterns were defined that support the automated extraction of hypernymic phrases comprising pharmacological function terminology related to chemical compounds. It was shown that 29-50 % of the automatically extracted terms can be proposed for novel functional annotation of chemical entities provided by the reference database DrugBank. Furthermore, they are a basis for building up concept hierarchies and ontologies or for extending existing ones. Subsequently, the pharmacological function and biological activity concepts obtained from text were incorporated into a novel descriptor for chemical compounds. Its successful application for the prediction of pharmacological function of molecules and the extension of chemical classification schemes, such as the Anatomical Therapeutic Chemical (ATC) classification, is demonstrated. In contrast to chemical entities, no comprehensive terminology resource has been available for histone modifications. Thus, histone modification concept terminology was primarily recognized in text via CRFs with an F1 measure of 0.86. Subsequently, linguistic variants of extracted histone modification terms were mapped to standard representations that were organized into a newly assembled histone modification hierarchy. The mapping was accomplished by a newly developed term mapping approach described in the thesis. The combination of term recognition and term variant resolution forms a new procedure for the assembly of novel terminology collections. It supports the generation of a term list that is applicable in dictionary-based methods. For the recognition of histone modifications in text, it could be shown that the named entity recognition method based on dictionaries is superior to the machine learning approach used. In conclusion, the present thesis provides techniques which enable an enhanced utilization of textual data, thereby supporting research in epigenomics and drug discovery

    BioSilicoSystems - A Multipronged Approach Towards Analysis and Representation of Biological Data (PhD Thesis)

    The rising field of integrative bioinformatics provides the vital methods to integrate, manage and analyze diverse data, allowing new and deeper insights and a clear understanding of intricate biological systems. The difficulty is not only to facilitate the study of heterogeneous data within the biological context but also, more fundamentally, how to represent the available knowledge and make it accessible. Moreover, adding valuable information and functions that persuade the user to discover the interesting relations hidden within the data is, in itself, a great challenge. Also, the cumulative information can provide greater biological insight than is possible with individual information sources. Furthermore, the rapidly growing number of databases and data types poses the challenge of integrating heterogeneous data types, especially in biology. This rapid increase in the volume and number of data resources drives the need for polymorphic views of the same data, which often overlap across multiple resources. In this thesis a multi-pronged approach is proposed that deals with various methods for the analysis and representation of the diverse biological data which are present in different data sources. This is an effort to explain and emphasize different concepts which are developed for the analysis of molecular data and also to explain its biological significance. The hypotheses proposed are in context with various other results and findings published in the past. The approach demonstrated also explains different ways to integrate the molecular data from various sources along with the need for a comprehensive understanding and clear projection of the concept or the algorithm and its results, but with simple means and methods.
The multifarious approach proposed in this work comprises different tools or methods spanning significant areas of bioinformatics research such as data integration, data visualization, biological network construction / reconstruction and alignment of biological pathways. Each tool deals with a unique approach to utilize the molecular data for different areas of biological research and is built based on the kernel of the thesis. Furthermore, these methods are combined with graphical representations that make things simple and comprehensible and help the reader understand the underlying biological complexity with ease. Moreover, the human eye is accustomed to, and more comfortable with, visual representations of facts

    Numerical and Experimental Investigations on Corrosion and Self-Protection Processes in Reinforced Concrete

    The chloride-induced corrosion of steel in concrete is one of the biggest durability issues affecting structures worldwide. Concrete structures that are installed in marine environments and those exposed frequently to de-icing salts in the winter season, such as bridges and parking structures, are particularly susceptible to corrosion-induced damage. In the worst cases, the structure is unable to fulfil its entire service life and needs extensive repairs or is decommissioned quite early. Such situations can have a strong impact on society, which is dependent on infrastructure for mobility and transportation of essential materials. Moreover, the economic losses are predicted to run into the billions in the future and can impact the global economy. In an attempt to increase the service life of concrete structures with respect to chloride durability, Layered Double Hydroxides (LDH) are introduced as a chloride-ion-entrapping additive in concrete. LDH encapsulates chloride ions from the environment, which can extend the service life of concrete structures. It can also be tailored to deliver corrosion-inhibiting ions which can mitigate the chloride-induced damage in concrete. A new concrete mix with LDH was developed in this work for building long-lasting infrastructure exposed to chloride ingress in submerged marine zones. Predictive modelling approaches are used to study the corrosion processes and chloride durability of concrete. A multi-ion transport model is used to predict the efficiency of LDH in concrete concerning chloride ingress. Computational results are presented which compare chloride ingress in concrete with and without LDH. The formation factor has been used in this study to determine the microstructure-related properties of concrete with and without LDH. Additionally, experimental investigations are presented which report on the stability and chloride binding capacity of LDH in synthetic alkaline solutions, concrete pore solutions, mortars and also in concrete.
The compatibility of LDH with cement is also presented. The work highlights that LDH is able to improve the chloride durability of concrete. Furthermore, in-situ investigations are carried out to understand the stability of LDH inside concrete