
    Review of QSAR Models and Software Tools for predicting Biokinetic Properties

    In the assessment of industrial chemicals, cosmetic ingredients, and active substances in pesticides and biocides, metabolites and degradates are rarely tested for their toxicological effects in mammals. In the interests of animal welfare and cost-effectiveness, alternatives to animal testing are needed in the evaluation of these types of chemicals. In this report we review the current status of various types of in silico estimation methods for Absorption, Distribution, Metabolism and Excretion (ADME) properties, which are often important in discriminating between the toxicological profiles of parent compounds and their metabolites/degradation products. The review was performed in a broad sense, with emphasis on QSARs and rule-based approaches and their applicability to estimation of oral bioavailability, human intestinal absorption, blood-brain barrier penetration, plasma protein binding, and metabolism. This revealed a vast and rapidly growing literature and a range of software tools. While it is difficult to draw firm conclusions on the applicability of such tools, it is clear that many have been developed with pharmaceutical applications in mind, and as such may not be applicable to other types of chemicals (establishing this would require further investigation). On the other hand, a range of predictive methodologies has been explored and found promising, so there is merit in pursuing their applicability in the assessment of other types of chemicals and products. Many of the software tools are not transparent in terms of their predictive algorithms or underlying datasets. However, the literature identifies a set of commonly used descriptors that have been found useful in ADME prediction, so further research and model development activities could be based on such studies.

    A hybrid algorithm for Bayesian network structure learning with application to multi-label learning

    We present a novel hybrid algorithm for Bayesian network structure learning, called H2PC. It first reconstructs the skeleton of a Bayesian network and then performs a Bayesian-scoring greedy hill-climbing search to orient the edges. The algorithm is based on divide-and-conquer constraint-based subroutines to learn the local structure around a target variable. We conduct two series of experimental comparisons of H2PC against Max-Min Hill-Climbing (MMHC), which is currently the most powerful state-of-the-art algorithm for Bayesian network structure learning. First, we use eight well-known Bayesian network benchmarks with various data sizes to assess the quality of the learned structure returned by the algorithms. Our extensive experiments show that H2PC outperforms MMHC in terms of goodness of fit to new data and quality of the network structure with respect to the true dependence structure of the data. Second, we investigate H2PC's ability to solve the multi-label learning problem. We provide theoretical results to characterize and identify graphically the so-called minimal label powersets that appear as irreducible factors in the joint distribution under the faithfulness condition. The multi-label learning problem is then decomposed into a series of multi-class classification problems, where each multi-class variable encodes a label powerset. H2PC is shown to compare favorably to MMHC in terms of global classification accuracy over ten multi-label data sets covering different application domains. Overall, our experiments support the conclusion that local structural learning with H2PC, in the form of local neighborhood induction, is a theoretically well-motivated and empirically effective learning framework that is well suited to multi-label learning. The source code (in R) of H2PC as well as all data sets used for the empirical tests are publicly available.
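
    The decomposition into multi-class problems described in this abstract can be illustrated with a minimal Python sketch. It assumes the partition of labels into powersets is already known; how H2PC identifies the minimal label powersets is not shown here. The function name `encode_powerset`, the toy label matrix and the partition are hypothetical.

```python
def encode_powerset(label_rows, label_indices):
    """Encode one group of binary labels as a single multi-class target.

    Each distinct combination of values taken by the labels in
    `label_indices` is assigned its own class id, in order of first
    appearance.
    """
    class_of = {}
    encoded = []
    for row in label_rows:
        key = tuple(row[i] for i in label_indices)
        # Reuse the id of a combination seen before, else allocate a new one.
        class_id = class_of.setdefault(key, len(class_of))
        encoded.append(class_id)
    return encoded, class_of


# Toy multi-label data: four examples, three binary labels (hypothetical).
rows = [[1, 0, 1], [0, 1, 0], [1, 0, 1], [0, 0, 0]]

# Assumed partition of the label indices into two label powersets.
partition = [[0, 2], [1]]

# One multi-class target vector per label powerset.
targets = [encode_powerset(rows, group)[0] for group in partition]
```

    Each entry of `targets` can then be handed to any multi-class classifier; predicting all groups and decoding the class ids back to label combinations recovers a full multi-label prediction.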

    In-silico-Systemanalyse von Biopathways

    Chen M. In silico systems analysis of biopathways. Bielefeld (Germany): Bielefeld University; 2004. In the past decade, with the advent of high-throughput technologies, biology has migrated from a descriptive science to a predictive one. A vast amount of information on metabolism has been produced, and a number of specific genetic/metabolic databases and computational systems have been developed, which makes it possible for biologists to perform in silico analysis of metabolism. With experimental data from the laboratory, biologists wish to conduct their analysis systematically with an easy-to-use computational system. One major task is to implement molecular information systems that integrate different molecular database systems, and to design analysis tools (e.g. simulators of complex metabolic reactions). Three key problems are involved: 1) modeling and simulation of biological processes; 2) reconstruction of metabolic pathways, leading to predictions about the integrated function of the network; and 3) comparison of metabolism, providing an important way to reveal the functional relationship between a set of metabolic pathways. This dissertation addresses these problems of in silico systems analysis of biopathways. We developed a software system to integrate access to different databases, and exploited the Petri net methodology to model and simulate metabolic networks in cells. The dissertation develops a computer modeling and simulation technique based on Petri net methodology; investigates metabolic networks at a system level; proposes a markup language for biological data interchange among diverse biological simulators and Petri net tools; establishes a web-based information retrieval system for metabolic pathway prediction; presents an algorithm for metabolic pathway alignment; recommends a nomenclature of cellular signal transduction; and attempts to standardize the representation of biological pathways.
Hybrid Petri net methodology is exploited to model metabolic networks. A kinetic modeling strategy and a Petri net modeling algorithm are applied to describe how the elements function and to analyze the model. The proposed methodology can be used for all other metabolic networks or for virtual cell metabolism. Moreover, perspectives of Petri net modeling and simulation of metabolic networks are outlined. A proposal for the Biology Petri Net Markup Language (BioPNML) is presented. The concepts and terminology of the interchange format, as well as its syntax (which is based on XML), are introduced. BioPNML is designed to provide a starting point for the development of a standard interchange format for bioinformatics and Petri nets. The language makes it possible to exchange biology Petri net diagrams between all supported hardware platforms and versions. It is also designed to link Petri net models with other known metabolic simulators. A web-based metabolic information retrieval system, PathAligner, is developed in order to predict metabolic pathways from rudimentary elements of pathways. It extracts metabolic information from biological databases via the Internet, and builds metabolic pathways from data sources of genes, sequences, enzymes, metabolites, etc. The system also provides a navigation platform to investigate metabolism-related information, and transforms the output data into XML files for further modeling and simulation of the reconstructed pathway. An alignment algorithm to compare the similarity between metabolic pathways is presented. A new definition of the metabolic pathway is proposed. The pathway, defined as a linear event sequence, is practical for our alignment algorithm. The algorithm is based on strip scoring the similarity of the four-level hierarchical EC numbers involved in the pathways. The algorithm described has been implemented and is in current use in the context of the PathAligner system.
Furthermore, new methods for the classification and nomenclature of cellular signal transductions are recommended. For each type of characterized signal transduction, a unique ST number is provided. The Signal Transduction Classification Database (STCDB), based on the proposed classification and nomenclature, has been established. By merging the ST numbers with EC numbers, alignments of biopathways are possible. Finally, a detailed model of the urea cycle that includes gene regulatory networks, metabolic pathways and signal transduction is demonstrated using our approaches. A systems-biological interpretation of the observed behavior of the urea cycle and its related transcriptomics information is proposed to provide new insights for metabolic engineering and medical care.
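
    The idea of comparing pathways, represented as linear sequences of reactions, by the similarity of their four-level EC numbers can be sketched in Python as follows. The abstract does not give the dissertation's actual scoring scheme, so everything here is an assumption made for illustration: similarity is taken as the fraction of leading EC levels two numbers share, and pathways are compared with a standard global (Needleman-Wunsch style) alignment using a hypothetical gap penalty.

```python
def ec_similarity(ec_a, ec_b):
    """Fraction of leading EC levels shared by two EC numbers (0.0 to 1.0).

    Assumed scoring, not taken from the dissertation: "2.7.1.1" and
    "2.7.1.2" agree on three of four levels.
    """
    shared = 0
    for level_a, level_b in zip(ec_a.split("."), ec_b.split(".")):
        if level_a != level_b:
            break
        shared += 1
    return shared / 4.0


def align_score(path_a, path_b, gap=-0.5):
    """Global alignment score of two pathways given as lists of EC numbers.

    Standard dynamic programming; the gap penalty is a hypothetical choice.
    """
    rows, cols = len(path_a) + 1, len(path_b) + 1
    score = [[0.0] * cols for _ in range(rows)]
    for i in range(1, rows):
        score[i][0] = i * gap
    for j in range(1, cols):
        score[0][j] = j * gap
    for i in range(1, rows):
        for j in range(1, cols):
            match = score[i - 1][j - 1] + ec_similarity(path_a[i - 1], path_b[j - 1])
            score[i][j] = max(match, score[i - 1][j] + gap, score[i][j - 1] + gap)
    return score[-1][-1]


# Two toy pathways that differ only in the last level of their first step.
pathway_1 = ["2.7.1.1", "5.3.1.9"]
pathway_2 = ["2.7.1.2", "5.3.1.9"]
total = align_score(pathway_1, pathway_2)
```

    Because enzymes that share more leading EC levels catalyze more closely related reactions, a hierarchical score of this kind rewards pathways that perform similar chemistry even when the exact enzymes differ.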

    Structural and semantic similarity metrics for chemical compound classification

    Master's thesis, Biochemistry, Universidade de Lisboa, Faculdade de Ciências, 2010. Over the last few decades, the amount of data produced and made available in chemistry has grown greatly, especially since the introduction of automated analytical methods. Because of this growth, there is an increasing need for automated computational systems able to store, study and interpret these data efficiently. One of the most important tasks in cheminformatics is, in fact, the use of laboratory data in systems that compare and classify chemical compounds. The most effective current methods rest on the premise that the function of a chemical compound is closely related to its structure. Although this premise is generally correct, as current methods demonstrate, these methods can fail, especially when similar molecules perform different functions (as with L- and D-amino acids) or when different molecules perform a similar biological function (as with numerous examples of inhibitors). The work proposed in this document presents a solution to this problem through a hybrid metric that integrates at its core not only structural but also semantic information; that is, the system developed is able to exploit information about the meaning of molecules in a biochemical context. To this end, I used ChEBI as the source of semantic information and created a tool named Chym (Chemical Hybrid Metric) that can handle chemical compound classification problems.
In brief, to decide whether a chemical compound has a given characteristic, for example whether it crosses the blood-brain barrier, the system assigns the compound an activity coefficient calculated from the chemical compounds known to have that characteristic; by comparison with a cutoff value, Chym classifies the compound under study as having or not having the characteristic. The tool resulting from this thesis work was explored and validated here. The work presented shows substantial evidence supporting the effectiveness of Chym, since it gives better results than all the models with which it was compared. In particular, for three selected problems, Chym correctly classifies a compound 90.9%, 87.7% and 84.2% of the time: in the order given, these values refer to the classification of compounds as permeable to the blood-brain barrier, as P-glycoprotein substrates, or as ligands of an estrogen receptor. For comparison, these three problems had previously been solved with accuracies of 81.5%, 80.6% and 82.8%, respectively. This confirms the hypothesis of the thesis, namely that integrating semantic information into systems for comparing and classifying chemical compounds increases, sometimes substantially, the reliability of the method. The objective of the thesis was thus achieved on two fronts.
On the one hand, the thesis served to validate the hypothesis; on the other, it culminated in the creation of a chemical compound classification tool that may be used in the future in broader projects, namely in the study of the evolution of metabolic pathways, in drug development, or in the preliminary analysis of the toxicity of chemical compounds.
Over the last few decades, there has been an increasing number of attempts at creating systems capable of comparing and classifying chemical compounds based on their structure and/or physicochemical properties. While the rate of success of these approaches has been increasing, particularly with the introduction of new and ever more sophisticated methods of machine learning, there is still room for improvement. One of the problems of these methods is that they fail to consider that similar molecules may have different roles in nature, or, to a lesser extent, that disparate molecules may have similar roles. This thesis proposes the exploitation of the semantic properties of chemical compounds, as described in the ChEBI ontology, to create an efficient system able to automatically deal with the binary classification of chemical compounds. To that effect, I developed Chym (Chemical Hybrid Metric) as a tool that integrates structural and semantic information in a unique hybrid metric. The work presented here shows substantial evidence supporting the effectiveness of Chym, since it has outperformed all the models with which it was compared. In particular, it achieved accuracy values of 90.9%, 87.7% and 84.2% when solving three classification problems which, previously, had only been solved with accuracy values of 81.5%, 80.6% and 82.8% respectively. Other results show that the tool is appropriate to use even if the problem at hand is not well represented in the ChEBI ontology. Thus, Chym shows that considering the semantic properties of a compound helps solve classification problems.
Therefore, Chym can be used in projects that require the classification and/or the comparison of chemical compounds, such as the study of the evolution of metabolic pathways, drug discovery or preliminary toxicity analysis.
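
    The classification scheme outlined in this abstract (score a compound against compounds known to have the property, then compare the score with a cutoff) can be sketched in Python. The abstract does not state Chym's actual formulas, so the following are assumptions made purely for illustration: structural and semantic similarity are both computed as Jaccard overlap of feature sets, the hybrid metric is their weighted average, and the activity coefficient is the mean hybrid similarity to the known actives. All names and the toy feature sets are hypothetical.

```python
def jaccard(a, b):
    """Jaccard overlap of two sets; defined as 0.0 when both are empty."""
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


def hybrid_similarity(x, y, weight=0.5):
    """Weighted mix of structural and semantic similarity (assumed form)."""
    structural = jaccard(x["structure"], y["structure"])
    semantic = jaccard(x["semantic"], y["semantic"])
    return weight * structural + (1.0 - weight) * semantic


def activity_coefficient(compound, known_actives, weight=0.5):
    """Mean hybrid similarity of a compound to the known actives."""
    total = sum(hybrid_similarity(compound, k, weight) for k in known_actives)
    return total / len(known_actives)


def has_property(compound, known_actives, cutoff, weight=0.5):
    """Binary decision: activity coefficient at or above the cutoff."""
    return activity_coefficient(compound, known_actives, weight) >= cutoff


# Toy data: structural fragments and ontology terms are made-up labels.
actives = [
    {"structure": {"ring", "amine"}, "semantic": {"alkaloid", "base"}},
]
similar = {"structure": {"ring", "amine"}, "semantic": {"alkaloid", "base"}}
unrelated = {"structure": {"sulfate"}, "semantic": {"salt"}}

similar_is_active = has_property(similar, actives, cutoff=0.5)
unrelated_is_active = has_property(unrelated, actives, cutoff=0.5)
```

    In such a scheme the cutoff and the structural/semantic weight would normally be tuned on held-out compounds with known labels.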

    Interpretability-oriented data-driven modelling of bladder cancer via computational intelligence

    Improving the hierarchical classification of protein functions With swarm intelligence

    This thesis investigates methods to improve the performance of hierarchical classification. In this thesis, hierarchical classification is a form of supervised learning in which the classes in a data set are arranged in a tree structure. As a base for our new methods we use the TDDC (top-down divide-and-conquer) approach for hierarchical classification, where each classifier is built only to discriminate between sibling classes. Firstly, we propose a swarm intelligence technique which varies the types of classifiers used at each divide within the TDDC tree. Our technique, PSO/ACO-CS (Particle Swarm Optimisation/Ant Colony Optimisation Classifier Selection), finds combinations of classifiers to be used in the TDDC tree using the global search ability of PSO/ACO. Secondly, we propose a technique that attempts to mitigate a major drawback of the TDDC approach. The drawback is that if an example is misclassified at any point in the TDDC tree, it can never be correctly classified further down the tree. Our approach, PSO/ACO-RO (PSO/ACO-Recovery Optimisation), decides whether to redirect examples at a given classifier node using, again, the global search ability of PSO/ACO. Thirdly, we propose an ensemble-based technique, HEHRS (Hierarchical Ensembles of Hierarchical Rule Sets), which attempts to boost the accuracy at each classifier node in the TDDC tree by using information from classifiers (rule sets) in the rest of that tree. We use Particle Swarm Optimisation to weight the individual rules within each ensemble. We evaluate these three new methods on hierarchical bioinformatics data sets that we have created for this research. These data sets represent the real-world problem of protein function prediction. We find through extensive experimentation that the three proposed methods improve upon the baseline TDDC method to varying degrees.
Overall, the HEHRS and PSO/ACO-CS-RO approaches are the most effective, although they are associated with a higher computational cost.
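
    The baseline top-down divide-and-conquer scheme described in this abstract can be sketched in Python. This is a minimal illustration of the general TDDC idea only, not the thesis's implementation: the class tree, the feature names and the hand-written rules standing in for trained sibling classifiers are all hypothetical.

```python
def classify_top_down(example, children, classifiers, root="root"):
    """Walk the class tree from the root, one sibling decision per level.

    `children` maps each class to its list of child classes; `classifiers`
    maps each internal class to a callable that picks one of its children.
    Returns the path of predicted classes from the top level to a leaf.
    """
    path = []
    node = root
    while children.get(node):  # stop once a class has no child classes
        chosen = classifiers[node](example)
        if chosen not in children[node]:
            raise ValueError(f"{chosen!r} is not a child of {node!r}")
        path.append(chosen)
        node = chosen
    return path


# Hypothetical two-level class tree.
children = {
    "root": ["enzyme", "transporter"],
    "enzyme": ["kinase", "protease"],
}

# Stand-ins for trained classifiers: each only separates sibling classes.
classifiers = {
    "root": lambda x: "enzyme" if x["catalytic"] else "transporter",
    "enzyme": lambda x: "kinase" if x["binds_atp"] else "protease",
}

kinase_path = classify_top_down(
    {"catalytic": True, "binds_atp": True}, children, classifiers
)
transporter_path = classify_top_down(
    {"catalytic": False, "binds_atp": True}, children, classifiers
)
```

    The sketch also makes the drawback mentioned in the abstract visible: once the root classifier sends an example to the wrong branch, no classifier further down that branch can assign it to a class in another branch.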

    Bioinformatics

    This book is divided into different research areas relevant to bioinformatics, such as biological networks, next-generation sequencing, high-performance computing, molecular modeling, structural bioinformatics and intelligent data analysis. Each book section introduces the basic concepts and then explains their application to problems of great relevance, so both novice and expert readers can benefit from the information and research works presented here.

    Prevalent Polymorphism in Thyroid Hormone-Activating Enzyme Leaves a Genetic Fingerprint that Underlies Associated Clinical Syndromes

    Context: A common polymorphism in the gene encoding the activating deiodinase (Thr92Ala-D2) is known to be associated with quality of life in millions of patients with hypothyroidism and with several organ-specific conditions. This polymorphism results in a single amino acid change within the D2 molecule where its susceptibility to ubiquitination and proteasomal degradation is regulated. Objective: To define the molecular mechanisms underlying associated conditions in carriers of the Thr92Ala-D2 polymorphism. Design, Setting, Patients: Microarray analyses of nineteen postmortem human cerebral cortex samples were performed to establish a foundation for molecular studies via a cell model of HEK-293 cells stably expressing Thr92 or Ala92 D2. Results: The cerebral cortex of Thr92Ala-D2 carriers exhibits a transcriptional fingerprint that includes sets of genes involved in CNS diseases, ubiquitin, mitochondrial dysfunction (chromosomal genes encoding mitochondrial proteins), inflammation, apoptosis, DNA repair and growth factor signaling. Similar findings were made in Ala92-D2-expressing HEK-293 cells. In both cases there was no evidence that thyroid hormone signaling was affected, i.e. the expression level of T3-responsive genes was unchanged, but several other genes were differentially regulated. The combined microarray analyses (brain/cells) led to the development of an 81-gene classifier that correctly predicts the genotype of homozygous brain samples. In contrast to Thr92-D2, Ala92-D2 exhibits a longer half-life and was consistently found in the Golgi. A number of Golgi-related genes were down-regulated in Ala92-D2-expressing cells but were normalized after 24 h of treatment with the antioxidant N-acetylcysteine. Conclusions: Ala92-D2 accumulates in the Golgi, where its presence and/or ensuing oxidative stress disrupts basic cellular functions and increases pre-apoptosis.
These findings are reminiscent of disease mechanisms observed in other neurodegenerative disorders such as Huntington's disease, and could contribute to the unresolved neurocognitive symptoms of affected carriers.