43 research outputs found
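Resolución de problemas de optimización combinatoria utilizando técnicas de computación evolutiva: una aplicación a la biomedicina [Solving combinatorial optimization problems using evolutionary computation techniques: an application to biomedicine]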
[Abstract] More data are generated every day, and both the volume of data and the number of
variables involved keep growing, which poses a problem for traditional techniques.
Furthermore, in many problems the set of possible solutions is so large that finding an
optimal solution in a reasonable amount of time is not feasible, so heuristic-based
techniques become necessary. Evolutionary computation (EC) has provided good results in
situations in which traditional techniques did not, especially when applied to biomedical
data and disease diagnosis.
Therefore, in this work, an EC-based model has been developed. Given input data labelled as
belonging to healthy or diseased subjects, the model extracts expressions from which a
classification model is built. The model has been validated on synthetic data and applied
to a real clinical data set, and its results have been compared with those of similar
methods. Notably, the proposed model extracts simple expressions and classifies both types
of data sets better than the other techniques considered. As a result, it is expected to be
very useful for clinical diagnostic support.
ATria: a novel centrality algorithm applied to biological networks
Background: The notion of centrality is used to identify "important" nodes in social networks. The importance of a node is not well defined, and many different notions exist in the literature. The challenge of defining centrality in meaningful ways when network edges can be positively or negatively weighted has not been adequately addressed. Existing centrality algorithms also have a second shortcoming: the most central nodes they report are often clustered in a specific region of the network and are not well represented across it. Methods: We address both issues by proposing Ablatio Triadum (ATria), an iterative centrality algorithm that uses the concept of "payoffs" from economic theory. Results: We compare our algorithm with other known centrality algorithms and demonstrate how ATria overcomes several of their shortcomings. We demonstrate the applicability of our algorithm to synthetic networks as well as biological networks, including bacterial co-occurrence networks, sometimes referred to as microbial social networks. Conclusions: We show evidence that ATria identifies three different kinds of "important" nodes in microbial social networks, with different potential roles in the community.
Exploring Patterns of Epigenetic Information With Data Mining Techniques
[Abstract] Data mining, a part of the Knowledge Discovery in Databases (KDD) process, is the process of extracting patterns from large data sets by combining methods from statistics and artificial intelligence with database management. Analyses of epigenetic data have evolved towards genome-wide and high-throughput approaches, generating large amounts of data for which data mining is essential. Part of these data may contain patterns of epigenetic information that are mitotically and/or meiotically heritable and that determine gene expression, cellular differentiation, and cellular fate. Epigenetic lesions and genetic mutations are acquired by individuals during their lives and accumulate with ageing. Both defects, together or individually, can result in loss of control over cell growth and thus in cancer development. Data mining techniques could then be used to extract these patterns. This work reviews some of the most important applications of data mining to epigenetics.
Funding: Programa Iberoamericano de Ciencia y Tecnología para el Desarrollo (209RT-0366); Galicia. Consellería de Economía e Industria (10SIN105004PR); Instituto de Salud Carlos III (RD07/0067/000)
Using Genetic Algorithms for Automatic Recurrent ANN Development: an Application to EEG Signal Classification
[Abstract] ANNs are among the most successful learning systems. For this reason, many techniques have been published for obtaining feed-forward networks. However, few works describe techniques for developing recurrent networks. This work uses a genetic algorithm for automatic recurrent ANN development. The system has been applied to a well-known problem: the classification of EEG signals from epileptic patients. Results show the high performance of this system and its ability to develop simple networks with a low number of neurons and connections.
Funding: Red Gallega de Investigación sobre Cáncer Colorrectal (ref. 2009/58); Programa Iberoamericano de Ciencia y Tecnología para el Desarrollo (209RT0366); Ministerio de Industria, Turismo y Comercio (TSI-020110-2009-53); Xunta de Galicia (10SIN105004PR); Instituto de Salud Carlos III (PIO52048); Instituto de Salud Carlos III (RD07/0067/000)
Random Forest Classification Based on Star Graph Topological Indices for Antioxidant Proteins
[Abstract] Aging and quality of life are an important research topic in areas such as the life sciences, chemistry, and pharmacology. People live longer and want to spend that extra time with a better quality of life. In this regard, there is a small subset of molecules in nature, named antioxidant proteins, that may influence the aging process. However, testing every single protein to identify its properties is expensive and inefficient. For this reason, this work proposes a model in which the primary structure of the protein is represented using complex network graphs, which can be used to reduce the number of proteins to be tested for antioxidant biological activity. The graph obtained as a representation helps describe the complex system by means of topological indices. More specifically, Randić's Star Networks and their associated indices, calculated with the S2SNet tool, have been used. To simulate the proportion of antioxidant proteins existing in nature, a dataset containing 1999 proteins, of which 324 are antioxidant proteins, was created. The Star Graph Topological Indices calculated from this dataset with S2SNet were then used as input to several classification techniques. Among these, Random Forest showed the best performance, with 94% of instances correctly classified. Although the target class (antioxidant proteins) is a small subset of the dataset, the proposed model correctly classifies 81.8% of the instances of this class, with a precision of 81.3%.
Funding: Galicia. Consellería de Economía e Industria (10SIN105004PR); Galicia. Consellería de Economía e Industria (O9SIN010105PR); Ministerio de Economía y Competitividad (TIN-2009-0770)
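The relationship between the overall accuracy and the per-class figures quoted for the antioxidant protein classifier can be made concrete with a small worked example. The sketch below is illustrative only: the abstract gives the dataset size (1999 proteins, 324 antioxidant) and the three percentages, but not the confusion matrix, so the counts `tp` and `fp` are hypothetical values back-calculated to be consistent with those percentages. The function name `classification_metrics` is likewise an assumption made for this example.

```python
def classification_metrics(tp, fp, fn, tn):
    """Return accuracy, precision and recall for a binary confusion matrix."""
    total = tp + fp + fn + tn
    return {
        "accuracy": (tp + tn) / total,
        "precision": tp / (tp + fp),   # of predicted positives, how many are right
        "recall": tp / (tp + fn),      # of true positives, how many are found
    }

# Figures stated in the abstract.
TOTAL = 1999       # proteins in the dataset
POSITIVES = 324    # antioxidant proteins (target class)

# Hypothetical counts, NOT reported in the abstract: chosen so that the
# resulting percentages agree with the published 81.8% / 81.3% / 94%.
tp = 265                       # antioxidant proteins correctly identified
fp = 61                        # non-antioxidant proteins flagged by mistake
fn = POSITIVES - tp            # antioxidant proteins missed
tn = TOTAL - POSITIVES - fp    # non-antioxidant proteins correctly rejected

metrics = classification_metrics(tp, fp, fn, tn)

# Accuracy of a trivial classifier that always predicts "non-antioxidant".
majority_baseline = (TOTAL - POSITIVES) / TOTAL

for name, value in metrics.items():
    print(f"{name}: {value:.1%}")
print(f"majority baseline: {majority_baseline:.1%}")
```

The baseline line shows why the per-class figures matter here: with only 324 positives out of 1999, a classifier that never predicts the target class would still classify about 83.8% of instances correctly, so accuracy alone says little about how well antioxidant proteins are detected.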
Applied Computational Techniques on Schizophrenia Using Genetic Mutations
[Abstract] Schizophrenia is a complex disease with both genetic and environmental influences. Machine learning techniques can be used to associate genetic variations at different genes with a (schizophrenic or non-schizophrenic) phenotype. Several machine learning techniques were applied to schizophrenia data to obtain the results presented in this study. Based on these data, Quantitative Genotype-Disease Relationships (QDGRs) can be used for disease prediction. One of the best machine-learning models obtained in this exhaustive comparative study, an artificial neural network (ANN), was implemented online. The resulting tool accepts Single Nucleotide Polymorphism (SNP) sequences as input and classifies the patient as schizophrenic or non-schizophrenic. Besides this comparative study, a variable selection method based on ANNs and evolutionary computation (EC) is also presented. This method uses half as many variables as the original ANN, and the variables it selects are among those reported in other publications. In the future, QDGR models based on nucleic acid information could be extended to other diseases.
Funding: Programa Iberoamericano de Ciencia y Tecnología para el Desarrollo (209RT-0366); Xunta de Galicia (10SIN105004PR); Instituto de Salud Carlos III (RD07/0067/0005); Xunta de Galicia (Ref. 2009/5
Metagenomics, Metatranscriptomics, and Metabolomics Approaches for Microbiome Analysis
Microbiomes are ubiquitous and are found in the ocean, the soil, and in/on other living organisms. Changes in the microbiome can impact the health of the environmental niche in which they reside. In order to learn more about these communities, different approaches based on data from multiple omics have been pursued. Metagenomics produces a taxonomical profile of the sample, metatranscriptomics helps us to obtain a functional profile, and metabolomics completes the picture by determining which byproducts are being released into the environment. Although each approach provides valuable information separately, we show that, when combined, they paint a more comprehensive picture. We conclude with a review of network-based approaches as applied to integrative studies, which we believe hold the key to an in-depth understanding of microbiomes.
Predicting symptom severity and contagiousness of respiratory viral infections
This work aims to predict the symptom severity and contagiousness of a person infected with a respiratory virus, using time series gene expression data. Four respiratory viruses were studied: RSV, H1N1, H3N2 and Rhinovirus. Predictive models were built for each virus at each time point. Partial least squares discriminant analysis was used for feature selection and random forest was used for classification. Certain genes were identified as biomarkers for distinguishing the subjects. Gene enrichment analysis was performed on the differentially expressed genes. Prediction accuracy was high even when expression data from early time points were analyzed. Significant genes were detected as early as 5 and 10 hours post infection, compared with 29 hours post infection in prior work. The potential biomarkers obtained with the proposed approach need to be investigated further.
SNP locator: a candidate SNP selection tool
[Abstract] This work presents a data integration approach using a federated model based on a service-oriented architecture (SOA). The BioMOBY middleware was used to implement each service in the integration process. As an example of the use of this architecture, a web tool for candidate SNP selection has been developed, with several BioMOBY services created as the model layer of the web application. Each data source has a wrapper that communicates with the federated model, that is, the BioMOBY model, and this model is the one that interacts with the client.
Funding: Red Gallega de Investigación sobre Cáncer Colorrectal (Ref. 2009/58); Programa Iberoamericano de Ciencia y Tecnología para el Desarrollo (209RT-0366); Instituto de Salud Carlos III (PIO52048); Instituto de Salud Carlos III (RD07/0067/0005); Galicia. Consellería de Economía e Industria (10SIN105004PR); Ministerio de Industria, Turismo y Comercio (TSI-020110-2009-5
Evolutionary Computation and QSAR Research
[Abstract] The successful high-throughput screening of molecule libraries for a specific biological property is one of the main improvements in drug discovery. Virtual molecular filtering and screening rely heavily on quantitative structure-activity relationship (QSAR) analysis, a mathematical model that correlates the activity of a molecule with molecular descriptors. QSAR models have the potential to reduce the costly failure of drug candidates in advanced (clinical) stages by filtering combinatorial libraries, eliminating candidates with a predicted toxic effect or poor pharmacokinetic profile, and reducing the number of experiments. To obtain a predictive and reliable QSAR model, scientists use methods from various fields such as molecular modeling, pattern recognition, machine learning and artificial intelligence. QSAR modeling relies on three main steps: codification of the molecular structure into molecular descriptors, selection of the variables relevant to the analyzed activity, and search for the optimal mathematical model that correlates the molecular descriptors with a specific activity. Since a variety of techniques from statistics and artificial intelligence can aid the variable selection and model building steps, this review focuses on the evolutionary computation methods that support these tasks. It explains the basics of genetic algorithms and genetic programming as evolutionary computation approaches, the selection methods for high-dimensional data in QSAR, the methods used to build QSAR models, the current evolutionary feature selection methods and their applications in QSAR, and the future trend towards joint or multi-task feature selection methods.
Funding: Instituto de Salud Carlos III (PIO52048); Instituto de Salud Carlos III (RD07/0067/0005); Ministerio de Industria, Comercio y Turismo (TSI-020110-2009-53); Galicia. Consellería de Economía e Industria (10SIN105004P
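The variable selection step described in the QSAR abstract above can be illustrated with a minimal genetic algorithm. This is a sketch under stated assumptions, not a method taken from the review: the data are synthetic, the "activity" depends on descriptors 0 and 3 by construction, and the fitness function (absolute Pearson correlation of the summed selected descriptors with the activity, minus a per-descriptor penalty) is a toy stand-in for training and validating a real QSAR model. All names (`pearson`, `fitness`, `evolve`) and parameter values are hypothetical.

```python
import random

def pearson(xs, ys):
    """Pearson correlation coefficient; returns 0.0 if either input is constant."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    if vx == 0 or vy == 0:
        return 0.0
    return cov / (vx * vy) ** 0.5

def fitness(mask, X, y, penalty=0.02):
    """Toy fitness: reward correlation with y, penalise each selected descriptor."""
    if not any(mask):
        return 0.0
    combined = [sum(v for v, bit in zip(row, mask) if bit) for row in X]
    return abs(pearson(combined, y)) - penalty * sum(mask)

def evolve(X, y, n_features, pop_size=30, generations=40,
           mutation_rate=0.05, seed=0):
    """Bit-mask GA with tournament selection, one-point crossover,
    bit-flip mutation and elitism. Returns (best mask, best-fitness history)."""
    rng = random.Random(seed)
    score = lambda m: fitness(m, X, y)
    pop = [[rng.randint(0, 1) for _ in range(n_features)]
           for _ in range(pop_size)]
    history = []
    for _ in range(generations):
        elite = max(pop, key=score)
        history.append(score(elite))
        next_pop = [elite[:]]  # elitism: the best individual survives unchanged
        while len(next_pop) < pop_size:
            p1 = max(rng.sample(pop, 3), key=score)   # tournament of size 3
            p2 = max(rng.sample(pop, 3), key=score)
            cut = rng.randint(1, n_features - 1)      # one-point crossover
            child = p1[:cut] + p2[cut:]
            child = [1 - b if rng.random() < mutation_rate else b
                     for b in child]                  # bit-flip mutation
            next_pop.append(child)
        pop = next_pop
    best = max(pop, key=score)
    history.append(score(best))
    return best, history

# Synthetic data (assumption): 60 "molecules", 8 descriptors,
# activity driven by descriptors 0 and 3 plus a little noise.
data_rng = random.Random(42)
X = [[data_rng.gauss(0, 1) for _ in range(8)] for _ in range(60)]
y = [row[0] + row[3] + data_rng.gauss(0, 0.1) for row in X]

best, history = evolve(X, y, n_features=8)
print("selected descriptors:", [i for i, bit in enumerate(best) if bit])
print("best fitness:", round(history[-1], 3))
```

Because of elitism, the best fitness recorded in `history` can never decrease from one generation to the next. In a real QSAR setting the fitness would typically be a cross-validated performance measure of a model trained on the selected descriptors, which is far more expensive to evaluate than this stand-in.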