21 research outputs found

    TI2BioP: Topological Indices to BioPolymers. A Graphical–Numerical Approach for Bioinformatics

    We developed a new graphical–numerical method called TI2BioP (Topological Indices to BioPolymers) to estimate topological indices (TIs) from two-dimensional (2D) graphical approaches for the natural biopolymers DNA, RNA and proteins. The methodology mainly turns long biopolymeric sequences into 2D artificial graphs such as Cartesian and four-color maps, but it also reads other 2D graphs from the thermodynamic folding of DNA/RNA strings inferred by other programs. The topology of such 2D graphs is encoded by either node or adjacency matrices for the calculation of the spectral moments as TIs. These numerical indices were used to build alignment-free models for the functional classification of biosequences and to calculate alignment-free distances for phylogenetic purposes. The performance of the method was evaluated in highly diverse gene/protein classes, which represent a challenge for current bioinformatics algorithms. TI2BioP generally outperformed classical bioinformatics algorithms in the functional classification of bacteriocins, ribonucleases III (RNases III), genomic internal transcribed spacer II (ITS2) and adenylation domains (A-domains) of nonribosomal peptide synthetases (NRPS), allowing the detection of new members in these target gene/protein classes. TI2BioP classification performance was contrasted with predictions from sensitive alignment-based algorithms and supported by experimental outcomes. The new ITS2 sequence isolated from Petrakia sp. was used in our graphical–numerical approach to estimate alignment-free distances for phylogenetic inferences. Although TI2BioP was developed for application in bioinformatics, it can be extended to predict interesting features of biopolymers other than DNA and protein sequences. TI2BioP version 2.0 is freely available from http://ti2biop.sourceforge.net/
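
    To illustrate the spectral moments this abstract uses as TIs: a common definition takes the k-th spectral moment of a graph as the trace of the k-th power of its adjacency matrix. The sketch below assumes a plain unweighted adjacency matrix, and the helper names `mat_mul` and `spectral_moments` are hypothetical. It is not TI2BioP's own code, which may weight nodes and edges differently.

```python
def mat_mul(a, b):
    """Multiply two square matrices given as lists of lists."""
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


def spectral_moments(adjacency, max_order):
    """Return [mu_1, ..., mu_max_order], where mu_k = trace(A^k).

    Assumption: `adjacency` is a plain (unweighted) adjacency matrix.
    """
    n = len(adjacency)
    # Start from the identity matrix so the first product gives A^1.
    power = [[1 if i == j else 0 for j in range(n)] for i in range(n)]
    moments = []
    for _ in range(max_order):
        power = mat_mul(power, adjacency)
        moments.append(sum(power[i][i] for i in range(n)))
    return moments


# Path graph on three nodes: 0 - 1 - 2
path3 = [
    [0, 1, 0],
    [1, 0, 1],
    [0, 1, 0],
]
print(spectral_moments(path3, 4))  # prints [0, 4, 0, 8]
```

    For an unweighted graph, the second moment equals the sum of the node degrees, and odd moments vanish when the graph has no odd cycles, which is why the path graph gives zeros at orders 1 and 3.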

    Atom, atom-type, and total linear indices of the "molecular pseudograph's atom adjacency matrix": Application to QSPR/QSAR studies of organic compounds

    In this paper we describe the application in QSPR/QSAR studies of a new group of molecular descriptors: atom, atom-type and total linear indices of the molecular pseudograph's atom adjacency matrix. These novel molecular descriptors were used to predict the boiling point of 28 alkyl alcohols, and the partition coefficient (log P), specific rate constant (log k), and antibacterial activity of 34 2-furylethylene derivatives. Two quantitative models were obtained to describe the boiling points of the alkyl alcohols. The first includes only two total linear indices and performed well statistically (R2 = 0.984, s = 3.78, F = 748.57, q2 = 0.981, and scv = 3.91). The second includes four variables [3 global and 1 local (heteroatom) linear indices] and improved the description of this physical property (R2 = 0.9934, s = 2.48, F = 871.96, q2 = 0.990, and scv = 2.79). Multiple linear regression analysis was then used to describe log P and log k of the 2-furylethylene derivatives. These models were statistically significant [(R2 = 0.984, s = 0.143, and F = 113.38) and (R2 = 0.973, s = 0.26 and F = 161.22), respectively] and showed very good stability to data variation in a leave-one-out (LOO) cross-validation experiment [(q2 = 0.93.8 and scv = 0.178) and (q2 = 0.948 and scv = 0.33), respectively]. Finally, a linear discriminant model for classifying the antibacterial activity of these compounds was obtained using the atom and atom-type linear indices. The global percentage of correct classification was 94.12% in the training set and 100.0% in the external test set. Comparison with other approaches (connectivity indices, total and local spectral moments, quantum chemical descriptors, topographic indices and E-state/biomolecular encounter parameters) shows that our method performs well. The approach described in this paper appears to be a very promising structural invariant, useful for QSPR/QSAR studies and computer-aided "rational" drug design. Instituto de Investigaciones Fisicoquímicas Teóricas y Aplicadas (INIFTA
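
    The q2 values quoted in this abstract come from leave-one-out (LOO) cross-validation. The sketch below shows one common way such a statistic is computed, q2 = 1 - PRESS / SS_tot. To stay short it assumes a single descriptor rather than the paper's multi-variable models; the function names and the numbers in the usage example are made up for illustration and are not taken from the paper.

```python
def fit_line(xs, ys):
    """Ordinary least squares for y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def loo_q2(xs, ys):
    """Leave-one-out q2 = 1 - PRESS / SS_tot for a one-descriptor model."""
    press = 0.0
    for i in range(len(xs)):
        # Refit the model without observation i, then predict it.
        slope, intercept = fit_line(xs[:i] + xs[i + 1:], ys[:i] + ys[i + 1:])
        press += (ys[i] - (slope * xs[i] + intercept)) ** 2
    mean_y = sum(ys) / len(ys)
    ss_tot = sum((y - mean_y) ** 2 for y in ys)
    return 1.0 - press / ss_tot


# Illustrative (invented) descriptor values and property values.
descriptor = [1.0, 2.0, 3.0, 4.0, 5.0]
prop = [2.1, 3.9, 6.2, 7.8, 10.1]
print(round(loo_q2(descriptor, prop), 3))
```

    Because each point is predicted by a model that never saw it, q2 cannot exceed the fitted R2 for an ordinary least-squares model, so a small gap between the two is usually read as a sign of stability.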

    Singlet Oxygen Generation by Porphyrins and Metalloporphyrins Revisited: a Quantitative Structure-property Relationship (QSPR) Study

    state followed by formation of singlet oxygen (1O2), which is a highly reactive species and mediates various oxidative processes. The design of advanced sensitizers based on porphyrin compounds has attracted significant attention in recent years. However, it is still difficult to predict the efficiency of singlet oxygen generation for a given structure. Our goal was to develop a quantitative structure-property relationship (QSPR) model for the fast virtual screening and prediction of singlet oxygen quantum yields for porphyrins and metalloporphyrins. We performed QSPR analysis of a dataset containing 32 compounds, including various porphyrins and their analogues (chlorins and bacteriochlorins). Quantum-chemical descriptors were calculated using Density Functional Theory (DFT) with the B3LYP and M06-2X functionals. Three different machine learning methods were used to develop QSPR models: random forest regression (RFR), support vector regression (SVR), and multiple linear regression (MLR). The optimal QSPR model «structure – singlet oxygen generation quantum yield», obtained using the RFR method, showed a high determination coefficient for the training set (R2 = 0.949) and the highest predictive ability for the test set (pred_R2 = 0.875). This indicates that the developed QSPR method is reliable and can be directly applied in studies of singlet oxygen generation both for free-base porphyrins and for their metal complexes. We believe that the QSPR approach developed in this study can be useful in the search for new porphyrin photosensitizers with enhanced singlet oxygen generation ability.

    An Alignment-Free Approach for Eukaryotic ITS2 Annotation and Phylogenetic Inference

    The ITS2 gene class shows a high sequence divergence among its members that has complicated its annotation and its use for reconstructing phylogenies at a higher taxonomical level (beyond species and genus). Several alignment strategies have been implemented to improve the ITS2 annotation quality and its use for phylogenetic inferences. Although alignment-based methods have been pushed to the limit of their complexity to tackle both issues, no alignment-free approach has yet addressed both topics successfully. By contrast, the use of simple alignment-free classifiers, like the topological indices (TIs) containing information about the sequence and structure of ITS2, may prove to be a useful approach for gene prediction and for assessing the phylogenetic relationships of the ITS2 class in eukaryotes. Thus, we used the TI2BioP (Topological Indices to BioPolymers) methodology [1], [2], freely available at http://ti2biop.sourceforge.net/, to calculate two different classes of TIs. One class was derived from the ITS2 artificial 2D structures generated from DNA strings and the other from the secondary structure inferred from RNA folding algorithms. Two alignment-free models based on Artificial Neural Networks were developed for ITS2 class prediction using the two classes of TIs referred to above. Both models showed similar performance on the training and test sets, reaching values above 95% in overall classification. Due to the importance of the ITS2 region for fungal identification, a novel ITS2 genomic sequence was isolated from Petrakia sp. This sequence and the test set were used to comparatively evaluate conventional classification models based on multiple sequence alignments, such as Hidden Markov based approaches, revealing the success of our models in identifying novel ITS2 members. The isolated sequence was assessed using traditional and alignment-free techniques applied to phylogenetic inference to complement the taxonomy of the Petrakia sp. fungal isolate.
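
    As a rough illustration of how alignment-free distances can be derived once each sequence is reduced to a vector of TIs, the sketch below builds a pairwise distance table. The Euclidean metric, the function names, the sequence labels and the TI values are all assumptions chosen for illustration; the abstract does not state which distance the authors used.

```python
import math


def euclidean(u, v):
    """Euclidean distance between two equal-length descriptor vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def distance_matrix(vectors):
    """Map every ordered pair of sequence names to its distance.

    `vectors` maps a sequence name to its list of topological indices.
    """
    names = list(vectors)
    return {(p, q): euclidean(vectors[p], vectors[q])
            for p in names for q in names}


# Hypothetical TI vectors for three sequences (invented values).
ti_vectors = {
    "seq_A": [0.0, 4.0, 0.0, 8.0],
    "seq_B": [0.0, 6.0, 6.0, 18.0],
    "seq_C": [0.0, 4.0, 0.0, 9.0],
}
distances = distance_matrix(ti_vectors)
print(distances[("seq_A", "seq_C")])  # prints 1.0
```

    A table like this can be passed to any distance-based tree-building method, with no multiple sequence alignment needed at any step.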

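    Nuevas aportaciones al desarrollo de modelos QSAR/QSPR para la predicción de la mutagenicidad de contaminantes ambientales y su interacción con sustancias activas presentes en el medio [New contributions to the development of QSAR/QSPR models for predicting the mutagenicity of environmental contaminants and their interaction with active substances present in the environment]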

    QSAR models were used to study the possible mutagenicity of substances present in the environment, such as haloacetic acids (by-products of water chlorination) and alpha,beta-unsaturated carbonyls (especially those used as monomers in the preparation of dental restorative materials), and their possible interaction with beta-cyclodextrin, which is present as an excipient in pharmaceutical products and as a stabilizer of flavors, colorants and some vitamins in foods. The main findings of this study were: - Fluoroiodoacetic acid and difluoroiodoacetic acid could be mutagenic, judging by the mutagenic potency values obtained with the models developed. These substances may be present in fluoridated waters rich in bromide/iodide, which would call into question the need to fluoridate drinking water. - Substances commonly used as dental monomers gave negative predictions for the Ames test and a mutagenic character in the mammalian cell assay, with the exception of UDMA (urethane dimethacrylate). - Regarding the possible interaction of these substances with beta-cyclodextrin, the haloacetic acids show complexation values lower than those typical of drugs or food components, so the interaction between haloacetic acids and beta-CD is expected to be of minor importance. As for the dental monomers, substances such as TEGDMA, 1,6-ADMA, 1,8-ADMA, GMR, MEPC and 6-HHMA, predicted to be mutagenic, show complexation values higher than those of drugs or food components. These substances could therefore displace drugs or food components from their complexes, which may lead to some type of interaction. Farmaci

    Machine Learning in Discrete Molecular Spaces

    The past decade has seen an explosion of machine learning in chemistry. Whether it is in property prediction, synthesis, molecular design, or any other subdivision, machine learning seems poised to become an integral, if not a dominant, component of future research efforts. This extraordinary capacity rests on the interaction between machine learning models and the underlying chemical data landscape commonly referred to as chemical space. Chemical space has multiple incarnations, but is generally considered the space of all possible molecules. In this sense, it is one example of a molecular set: an arbitrary collection of molecules. This thesis is devoted to precisely these objects, and particularly how they interact with machine learning models. This work is predicated on the idea that by better understanding the relationship between molecular sets and the models trained on them we can improve models, achieve greater interpretability, and further break down the walls between data-driven and human-centric chemistry. The hope is that this enables the full predictive power of machine learning to be leveraged while continuing to build our understanding of chemistry. The first three chapters of this thesis introduce and review the necessary machine learning theory, particularly the tools that have been specially designed for chemical problems. This is followed by an extensive literature review in which the contributions of machine learning to multiple facets of chemistry over the last two decades are explored. Chapters 4-7 explore the research conducted throughout this PhD. Here we explore how we can meaningfully describe the properties of an arbitrary set of molecules through information theory; how we can determine the most informative data points in a set of molecules; how graph signal processing can be used to understand the relationship between the chosen molecular representation, the property, and the machine learning model; and finally how this approach can be brought to bear on protein space. Each of these sub-projects briefly explores the necessary mathematical theory before leveraging it to provide approaches that resolve the posed problems. We conclude with a summary of the contributions of this work and outline fruitful avenues for further exploration.

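    Herramientas informáticas y de inteligencia artificial para el meta-análisis en la frontera entre la bioinformática y las ciencias jurídicas [Computational and artificial intelligence tools for meta-analysis at the frontier between bioinformatics and legal sciences]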

    [Abstract] Computational models known as QSPR (Quantitative Structure-Property Relationships) can be used to predict properties of complex systems. These predictions are an important application of Information and Communication Technologies (ICTs), chiefly because they reduce the cost of experimental measurement in terms of time, human resources, material resources, and/or the use of laboratory animals in the biomolecular, technical, social, and/or legal sciences. Artificial Neural Networks (ANNs) are among the most powerful computational tools for finding QSPR models. To this end, ANNs can take as input variables numerical parameters that quantify information about the structure of the system. The parameters known as Topological Indices (TIs) are among the most versatile. TIs are calculated in Graph Theory from the representation of any system as a network of interconnected nodes, from molecules to biological, technological, and social networks. The first aim of this thesis is to review and/or introduce new TIs, TI calculation software, and QSPR models using ANNs to predict bio-molecular, biological, technological-economic, and socio-legal networks, in which nodes represent biomolecules, organisms, populations, products, tax laws, or contributing causes of crimes. Moreover, the interaction between ICTs, the Biomolecular Sciences, and Law requires a framework of legal certainty that allows the proper development of ICTs and their applications in the Biomolecular Sciences. Therefore, the second aim of this thesis is to review the legal framework protecting QSAR/QSPR models of molecular systems. This research aims to demonstrate the usefulness of these models for predicting the characteristics and properties of these complex systems.