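Resolución de problemas de optimización combinatoria utilizando técnicas de computación evolutiva: una aplicación a la biomedicina [Solving combinatorial optimization problems using evolutionary computation techniques: an application to biomedicine]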
[Abstract] More data are generated every day, and both their volume and the number of
variables involved keep growing, which poses a problem for traditional techniques.
Furthermore, many problems involve such a large set of possible solutions that finding the
optimal solution in a reasonable amount of time is not feasible, so techniques based on
heuristics become necessary. Evolutionary Computation (EC) has provided good results in
situations in which traditional techniques did not, especially when applied to biomedical
data and disease diagnosis.
Therefore, in this work, a model based on EC has been developed. Starting from input data
labelled as belonging to healthy or diseased subjects, the model extracts expressions with
which to build a classification model. The model proposed in this thesis has been validated
on synthetic data and applied to a real clinical data set, and its results have been
compared with those of similar techniques. It is worth pointing out that the proposed model
extracts simple expressions and classifies both types of data sets better than the other
techniques. As a result, the model is expected to be very useful for clinical diagnostic
support.
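To make the idea of evolving a simple classification expression concrete, here is a minimal sketch. It is not the model developed in the thesis: the rule form (`x[f] > t`), the synthetic data generator, the mutation scheme and all function names are assumptions chosen for illustration only.

```python
import random

N_FEATURES = 3  # assumption: three numeric variables per subject

def make_data(rng, n=200):
    """Synthetic subjects; label 1 (diseased) if and only if feature 1 exceeds 0.6."""
    data = []
    for _ in range(n):
        x = [rng.random() for _ in range(N_FEATURES)]
        data.append((x, int(x[1] > 0.6)))
    return data

def accuracy(rule, data):
    """Fraction of subjects whose label matches the rule 'x[f] > t'."""
    f, t = rule
    return sum(int(x[f] > t) == y for x, y in data) / len(data)

def mutate(rule, rng):
    """Either switch to a random feature or nudge the threshold."""
    f, t = rule
    if rng.random() < 0.2:
        f = rng.randrange(N_FEATURES)
    else:
        t = min(1.0, max(0.0, t + rng.gauss(0, 0.1)))
    return (f, t)

def evolve(data, rng, pop_size=20, generations=40):
    """Truncation selection with elitism over (feature, threshold) rules."""
    pop = [(rng.randrange(N_FEATURES), rng.random()) for _ in range(pop_size)]
    for _ in range(generations):
        pop.sort(key=lambda r: accuracy(r, data), reverse=True)
        parents = pop[: pop_size // 2]                      # keep the better half
        children = [mutate(rng.choice(parents), rng) for _ in parents]
        pop = parents + children                            # parents survive unchanged
    return max(pop, key=lambda r: accuracy(r, data))

rng = random.Random(0)          # fixed seed so the run is repeatable
data = make_data(rng)
best = evolve(data, rng)
print("best rule: feature", best[0], "threshold", round(best[1], 3))
print("training accuracy:", accuracy(best, data))
```

The search space here is deliberately tiny; a realistic system would evolve richer expressions and would evaluate them on held-out data rather than on the training set.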
ATria: a novel centrality algorithm applied to biological networks
Background: The notion of centrality is used to identify "important" nodes in social networks. The importance of a node is not well defined, and many different notions exist in the literature. The challenge of defining centrality in meaningful ways when network edges can be positively or negatively weighted has not been adequately addressed. Existing centrality algorithms also have a second shortcoming: the most central nodes they report are often clustered in a specific region of the network and do not represent the network as a whole. Methods: We address both by proposing Ablatio Triadum (ATria), an iterative centrality algorithm that uses the concept of "payoffs" from economic theory. Results: We compare our algorithm with other known centrality algorithms and demonstrate how ATria overcomes several of their shortcomings. We demonstrate the applicability of our algorithm to synthetic networks as well as biological networks, including bacterial co-occurrence networks, sometimes referred to as microbial social networks. Conclusions: We show evidence that ATria identifies three different kinds of "important" nodes in microbial social networks, with different potential roles in the community.
Exploring Patterns of Epigenetic Information With Data Mining Techniques
[Abstract] Data mining, a part of the Knowledge Discovery in Databases (KDD) process, is the process of extracting patterns from large data sets by combining methods from statistics and artificial intelligence with database management. Analyses of epigenetic data have evolved towards genome-wide and high-throughput approaches, generating large amounts of data for which data mining is essential. Part of these data may contain patterns of epigenetic information that are mitotically and/or meiotically heritable and that determine gene expression, cellular differentiation and cellular fate. Epigenetic lesions and genetic mutations are acquired by individuals during their life and accumulate with ageing. Both defects, either together or individually, can result in a loss of control over cell growth and thus in cancer development. Data mining techniques could then be used to extract these patterns. This work reviews some of the most important applications of data mining to epigenetics.
[Funding] Programa Iberoamericano de Ciencia y Tecnología para el Desarrollo (209RT-0366); Galicia. Consellería de Economía e Industria (10SIN105004PR); Instituto de Salud Carlos III (RD07/0067/000)
Using Genetic Algorithms for Automatic Recurrent ANN Development: an Application to EEG Signal Classification
[Abstract] ANNs are one of the most successful learning systems. For this reason, many techniques for obtaining feed-forward networks have been published. However, few works describe techniques for developing recurrent networks. This work uses a genetic algorithm for automatic recurrent ANN development. The system has been applied to a well-known problem: the classification of EEG signals from epileptic patients. Results show the high performance of this system and its ability to develop simple networks, with a low number of neurons and connections.
[Funding] Red Gallega de Investigación sobre Cáncer Colorrectal (ref. 2009/58); Programa Iberoamericano de Ciencia y Tecnología para el Desarrollo (209RT0366); Ministerio de Industria, Turismo y Comercio (TSI-020110-2009-53); Xunta de Galicia (10SIN105004PR); Instituto de Salud Carlos III (PIO52048); Instituto de Salud Carlos III (RD07/0067/000)
Random Forest Classification Based on Star Graph Topological Indices for Antioxidant Proteins
[Abstract] Aging and quality of life are an important research topic in areas such as life sciences, chemistry and pharmacology. People live longer and want to spend that extra time with a better quality of life. In this regard, there exists a tiny subset of molecules in nature, named antioxidant proteins, that may influence the aging process. However, testing every single protein in order to identify its properties is expensive and inefficient. For this reason, this work proposes a model in which the primary structure of the protein is represented using complex network graphs, and which can be used to reduce the number of proteins to be tested for antioxidant biological activity. The graph obtained as a representation helps to describe the complex system by means of topological indices. More specifically, Randić’s Star Networks and their associated indices have been used. In order to simulate the existing proportion of antioxidant proteins in nature, a dataset containing 1999 proteins, of which 324 are antioxidant proteins, was created. Using these data as input, Star Graph Topological Indices were calculated with the S2SNet tool. These indices were then used as input to several classification techniques. Among the techniques used, Random Forest showed the best performance, with 94% of instances correctly classified. Although the target class (antioxidant proteins) represents a tiny subset of the dataset, the proposed model correctly classifies 81.8% of the instances of this class, with a precision of 81.3%.
[Funding] Galicia. Consellería de Economía e Industria (10SIN105004PR); Galicia. Consellería de Economía e Industria (O9SIN010105PR); Ministerio de Economía y Competitividad (TIN-2009-0770)
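The three figures quoted for the Random Forest (94% overall, 81.8% of the antioxidant class recovered, 81.3% precision) can be tied together with a small calculation. The sketch below infers a confusion matrix from the rounded percentages; the individual counts are an assumption derived here, not values reported in the abstract.

```python
# Figures reported in the abstract.
total, positives = 1999, 324          # all proteins, antioxidant proteins
recall, precision = 0.818, 0.813      # per-class figures for the antioxidant class

# Inferred counts (assumption: each rounded to the nearest whole protein).
tp = round(recall * positives)        # antioxidant proteins correctly identified
fn = positives - tp                   # antioxidant proteins missed
fp = round(tp / precision) - tp       # other proteins wrongly flagged as antioxidant
tn = (total - positives) - fp         # other proteins correctly rejected

accuracy = (tp + tn) / total
baseline = (total - positives) / total   # always predicting "not antioxidant"

print("TP, FN, FP, TN:", tp, fn, fp, tn)
print("accuracy (%):", round(100 * accuracy, 1))
print("majority-class baseline (%):", round(100 * baseline, 1))
```

Under these inferred counts the overall accuracy rounds to the reported 94%, while a classifier that never predicts the antioxidant class would already reach about 84%. That gap is why the per-class recall and precision are the more informative figures for such an imbalanced dataset.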
Electronic Health Records Exploitation Using Artificial Intelligence Techniques
[Abstract] The exploitation of electronic health records (EHRs) has multiple utilities, from predictive
tasks and clinical decision support to pattern recognition. Artificial Intelligence (AI) makes it possible to extract
knowledge from EHR data in a practical way. In this study, we aim to construct a Machine Learning
model from EHR data to make predictions about patients. Specifically, we will focus our analysis on
patients suffering from respiratory problems. Then, we will try to predict whether those patients will
have a relapse in less than 6, 12 or 18 months. The main objective is to identify the characteristics that
seem to increase the relapse risk. At the same time, we propose an exploratory analysis in search
of hidden patterns among data. These patterns will help us to classify patients according to their
specific conditions for some clinical variables.
[Funding] Centro de Investigación de Galicia CITIC is funded by Consellería de Educación, Universidades e Formación Profesional from Xunta de Galicia and European Union (European Regional Development Fund, FEDER Galicia 2014-2020 Program) by grant ED431G 2019/01. Partially supported by the Spanish Ministry of Science (Challenges of Society 2019) PID2019-104323RB-C33. Xunta de Galicia (ED431G 2019/0)
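The prediction targets described above (a relapse in less than 6, 12 or 18 months) could be derived from admission dates along the following lines. This is a minimal sketch, not the study's pipeline: the function name, the input fields and the 30-day approximation of a month are all assumptions.

```python
from datetime import date

HORIZONS_MONTHS = (6, 12, 18)
DAYS_PER_MONTH = 30  # assumption: a month is approximated as 30 days

def relapse_labels(discharge, next_admission):
    """Return {horizon_in_months: 0 or 1}; 1 means a relapse within that horizon.

    `next_admission` is None when no later admission is recorded.
    """
    if next_admission is None:
        return {h: 0 for h in HORIZONS_MONTHS}
    gap_days = (next_admission - discharge).days
    # "Less than N months" is read as a strict inequality on the gap in days.
    return {h: int(0 <= gap_days < h * DAYS_PER_MONTH) for h in HORIZONS_MONTHS}

# Hypothetical patient: discharged on 10 Jan 2020, readmitted on 1 Sep 2020.
labels = relapse_labels(date(2020, 1, 10), date(2020, 9, 1))
print(labels)
```

Each horizon then defines its own binary classification task, so one model per horizon (or one multi-label model) can be trained on the same set of clinical variables.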
Applied Computational Techniques on Schizophrenia Using Genetic Mutations
[Abstract] Schizophrenia is a complex disease, with both genetic and environmental influences. Machine learning techniques can be used to associate different genetic variations at different genes with a (schizophrenic or non-schizophrenic) phenotype. Several machine learning techniques were applied to schizophrenia data to obtain the results presented in this study. Considering these data, Quantitative Genotype – Disease Relationships (QDGRs) can be used for disease prediction. One of the best machine learning-based models obtained in this exhaustive comparative study, an artificial neural network (ANN), was implemented online. The resulting tool accepts Single Nucleotide Polymorphism (SNP) sequences as input in order to classify a patient as having schizophrenia or not. Besides this comparative study, a method for variable selection based on ANNs and evolutionary computation (EC) is also presented. This method uses half as many variables as the original ANN, and the variables it selects are among those reported in other publications. In the future, QDGR models based on nucleic acid information could be extended to other diseases.
[Funding] Programa Iberoamericano de Ciencia y Tecnología para el Desarrollo (209RT-0366); Xunta de Galicia (10SIN105004PR); Instituto de Salud Carlos III (RD07/0067/0005); Xunta de Galicia (Ref. 2009/5)
Metagenomics, Metatranscriptomics, and Metabolomics Approaches for Microbiome Analysis
Microbiomes are ubiquitous and are found in the ocean, the soil, and in/on other living organisms. Changes in the microbiome can impact the health of the environmental niche in which they reside. In order to learn more about these communities, different approaches based on data from multiple omics have been pursued. Metagenomics produces a taxonomical profile of the sample, metatranscriptomics helps us to obtain a functional profile, and metabolomics completes the picture by determining which byproducts are being released into the environment. Although each approach provides valuable information separately, we show that, when combined, they paint a more comprehensive picture. We conclude with a review of network-based approaches as applied to integrative studies, which we believe hold the key to an in-depth understanding of microbiomes.
SNP locator: a candidate SNP selection tool
[Abstract] In this work, a data integration approach using a federated model based on a service oriented architecture (SOA) is presented. The BioMOBY middleware was used to implement each service which is part of the integration process. As an example of usage of this architecture, a web tool for candidate SNP selection has been developed. Thus, several BioMOBY services have been created as the model layer of the web application. Each data source has a wrapper which communicates with the federated model, that is, the BioMOBY model, and this model is the one that interacts with the client.
[Funding] Red Gallega de Investigación sobre Cáncer Colorrectal (Ref. 2009/58); Programa Iberoamericano de Ciencia y Tecnología para el Desarrollo (209RT-0366); Instituto de Salud Carlos III (PIO52048); Instituto de Salud Carlos III (RD07/0067/0005); Galicia. Consellería de Economía e Industria (10SIN105004PR); Ministerio de Industria, Turismo y Comercio (TSI-020110-2009-5)
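The wrapper-plus-federated-model arrangement described above can be illustrated with a small sketch. It does not use BioMOBY or any of its APIs: the class names, the `query` method, the in-memory sources and the SNP identifiers are hypothetical placeholders standing in for real services.

```python
class InMemoryWrapper:
    """Hypothetical wrapper around one data source (here just a dict)."""

    def __init__(self, name, records):
        self.name = name
        self.records = records          # maps a gene name to a list of SNP records

    def query(self, gene):
        # Tag every record with its origin so the federated model can merge them.
        return [dict(rec, source=self.name) for rec in self.records.get(gene, [])]


class FederatedModel:
    """Mediator layer: the client talks to this class, never to a source directly."""

    def __init__(self, wrappers):
        self.wrappers = list(wrappers)

    def candidate_snps(self, gene):
        """Merge the answers of all wrappers, grouping sources by SNP identifier."""
        merged = {}
        for wrapper in self.wrappers:
            for rec in wrapper.query(gene):
                merged.setdefault(rec["snp"], []).append(rec["source"])
        return merged


# Placeholder sources and identifiers, for illustration only.
source_a = InMemoryWrapper("source_a", {"GENE1": [{"snp": "rs1"}, {"snp": "rs2"}]})
source_b = InMemoryWrapper("source_b", {"GENE1": [{"snp": "rs2"}]})
model = FederatedModel([source_a, source_b])
result = model.candidate_snps("GENE1")
print(result)
```

The point of the design is that a new data source only needs a new wrapper exposing the same `query` interface; the client code that calls `candidate_snps` stays unchanged.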