663 research outputs found
Applied Computational Techniques on Schizophrenia Using Genetic Mutations
[Abstract] Schizophrenia is a complex disease, with both genetic and environmental influence. Machine learning techniques can be used to associate different genetic variations at different genes with a (schizophrenic or non-schizophrenic) phenotype. Several machine learning techniques were applied to schizophrenia data to obtain the results presented in this study. Considering these data, Quantitative Genotype – Disease Relationships (QDGRs) can be used for disease prediction. One of the best machine learning-based models obtained after this exhaustive comparative study was implemented online; this model is an artificial neural network (ANN). Thus, the tool makes it possible to enter Single Nucleotide Polymorphism (SNP) sequences in order to classify a patient with schizophrenia. Besides this comparative study, a method for variable selection, based on ANNs and evolutionary computation (EC), is also presented. This method uses half as many variables as the original ANN, and the variables obtained are among those found in other publications. In the future, QDGR models based on nucleic acid information could be expanded to other diseases.
Programa Iberoamericano de Ciencia y Tecnología para el Desarrollo; 209RT-0366
Xunta de Galicia; 10SIN105004PR
Instituto de Salud Carlos III; RD07/0067/0005
Xunta de Galicia; Ref. 2009/5
ATHENA: A knowledge-based hybrid backpropagation-grammatical evolution neural network algorithm for discovering epistasis among quantitative trait Loci
[Abstract]
Background: Growing interest and burgeoning technology for discovering genetic mechanisms that influence disease processes have ushered in a flood of genetic association studies over the last decade, yet little heritability in highly studied complex traits has been explained by genetic variation. Non-additive gene-gene interactions, which are not often explored, are thought to be one source of this "missing" heritability.
Methods: Stochastic methods employing evolutionary algorithms have demonstrated promise in being able to detect and model gene-gene and gene-environment interactions that influence human traits. Here we demonstrate modifications to a neural network algorithm in ATHENA (the Analysis Tool for Heritable and Environmental Network Associations) resulting in clear performance improvements for discovering gene-gene interactions that influence human traits. We employed an alternative tree-based crossover, backpropagation for locally fitting neural network weights, and incorporation of domain knowledge obtainable from publicly accessible biological databases for initializing the search for gene-gene interactions. We tested these modifications in silico using simulated datasets.
Results: We show that the alternative tree-based crossover modification resulted in a modest increase in the sensitivity of the ATHENA algorithm for discovering gene-gene interactions. The performance increase was highly statistically significant when backpropagation was used to locally fit NN weights. We also demonstrate that using domain knowledge to initialize the search for gene-gene interactions results in a large performance increase, especially when the search space is larger than the search coverage.
Conclusions: We show that a hybrid optimization procedure, alternative crossover strategies, and incorporation of domain knowledge from publicly available biological databases can result in marked increases in sensitivity and performance of the ATHENA algorithm for detecting and modelling gene-gene interactions that influence a complex human trait.
Discovering Higher-order SNP Interactions in High-dimensional Genomic Data
In this thesis, a multifactor dimensionality reduction based method on associative classification is employed to identify higher-order SNP interactions for enhancing the understanding of the genetic architecture of complex diseases. Further, this thesis explores the application of deep learning techniques, providing new clues for interaction analysis. The performance of the deep learning method is maximized by unifying deep neural networks with a random forest to achieve reliable interactions in the presence of noise.
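Resolución de problemas de optimización combinatoria utilizando técnicas de computación evolutiva: una aplicación a la biomedicina [Solving combinatorial optimization problems using evolutionary computation techniques: an application to biomedicine]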
[Abstract] Every day more data are generated, growing both in volume and in
the number of variables involved, which poses a problem for traditional techniques.
Furthermore, many problems involve such a large set of possible solutions that finding the
optimal solution in a reasonable amount of time is not feasible. Thus, using techniques
based on heuristics becomes necessary. Evolutionary Computation (EC) has provided
good results in situations in which traditional techniques did not, especially when applied to
biomedical data and disease diagnosis.
Therefore, in this work, a model based on EC has been developed. This model, based on
an input set with data that belong to healthy or diseased subjects, is capable of extracting
expressions in order to build a classification model. The model proposed in this thesis has
been validated on generated data, as well as applied to real clinical data, comparing the
results obtained with those of other similar techniques. It is worth pointing out that the
model presented extracts simple expressions and performs better when classifying both
types of data sets than other existing techniques. As a result, the model presented is
expected to be very useful for clinical diagnostic support.
Ant Colony Optimization
Ant Colony Optimization (ACO) is the best example of how studies aimed at understanding and modeling the behavior of ants and other social insects can provide inspiration for the development of computational algorithms for the solution of difficult mathematical problems. Introduced by Marco Dorigo in his PhD thesis (1992) and initially applied to the travelling salesman problem, the ACO field has experienced tremendous growth, standing today as an important nature-inspired stochastic metaheuristic for hard optimization problems. This book presents state-of-the-art ACO methods and is divided into two parts: (I) Techniques, which includes parallel implementations, and (II) Applications, where recent contributions of ACO to diverse fields, such as traffic congestion and control, structural optimization, manufacturing, and genomics, are presented.
The computational hardness of feature selection in strict-pure synthetic genetic datasets
A common task in knowledge discovery is finding a few features correlated with an
outcome in a sea of mostly irrelevant data. This task is particularly formidable in
genetic datasets containing thousands to millions of Single Nucleotide Polymorphisms
(SNPs) for each individual; the goal here is to find a small subset of SNPs correlated
with whether an individual is sick or healthy (labeled data). Although determining
a correlation between any given SNP (genotype) and a disease label (phenotype) is
relatively straightforward, detecting subsets of SNPs such that the correlation is only
apparent when the whole subset is considered seems to be much harder. In this thesis,
we study the computational hardness of this problem, in particular for a widely used
method of generating synthetic SNP datasets.
More specifically, we consider the feature selection problem in datasets generated
by "pure and strict" models, such as ones produced by the popular GAMETES software.
In these datasets, there is a high correlation between a predefined target set of
features (SNPs) and a label; however, any subset of the target set appears uncorrelated
with the outcome.
Our main result is a (linear-time, parameter-preserving) reduction from the well-known
Learning Parity with Noise (LPN) problem to feature selection in such pure
and strict datasets. This gives us a host of consequences for the complexity of feature
selection in this setting. First, not only is it NP-hard (even to approximate), it is also computationally hard on average under a standard cryptographic assumption on the
hardness of learning parity with noise; moreover, in general it is as hard for the
uniform distribution as for arbitrary distributions, and as hard for random noise as
for adversarial noise. For the worst case complexity, we get a tighter parameterized
lower bound: even in the non-noisy case, finding a parity of Hamming weight at most
k is W[1]-hard when the number of samples is relatively small (logarithmic in the
number of features).
Finally, most relevant to the development of feature selection heuristics, by the
unconditional hardness of LPN in Kearns’ statistical query model, no heuristic that
only computes statistics about the samples rather than considering samples themselves,
can successfully perform feature selection in such pure and strict datasets.
This eliminates a large class of common approaches to feature selection.
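The "pure and strict" property described in this abstract can be illustrated with a small, simplified sketch. It assumes binary features and a noiseless parity (XOR) label rather than three-valued SNP genotypes or actual GAMETES output; the target set `(1, 3, 4)` and all function names are hypothetical. Over the full truth table, every single feature and every proper subset of the target set has zero covariance with the label, while the complete target set predicts it exactly.

```python
from itertools import product

def parity_label(row, target):
    """Label = XOR (sum mod 2) of the bits of `row` at the indices in `target`."""
    return sum(row[i] for i in target) % 2

def covariance_with_label(rows, labels, subset):
    """Covariance between the XOR of the `subset` features and the label."""
    n = len(rows)
    feat = [sum(r[i] for i in subset) % 2 for r in rows]
    mean_f = sum(feat) / n
    mean_y = sum(labels) / n
    return sum((f - mean_f) * (y - mean_y) for f, y in zip(feat, labels)) / n

n_features = 6
target = (1, 3, 4)  # hypothetical target set, chosen only for illustration

# Enumerate the whole truth table so the result is exact, not sampled.
rows = list(product((0, 1), repeat=n_features))
labels = [parity_label(r, target) for r in rows]

single = [covariance_with_label(rows, labels, (i,)) for i in range(n_features)]
pair = covariance_with_label(rows, labels, (1, 3))   # proper subset of target
full = covariance_with_label(rows, labels, target)   # the whole target set

print("single-feature covariances:", single)
print("pair (1, 3):", pair, "| full target set:", full)
```

A selection heuristic that ranks features by marginal statistics sees nothing to rank in such data, which is the intuition behind the statistical-query result mentioned in the abstract.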
A Comprehensive Survey on Particle Swarm Optimization Algorithm and Its Applications
Particle swarm optimization (PSO) is a heuristic global optimization method, proposed originally by Kennedy and Eberhart in 1995. It is now one of the most commonly used optimization techniques. This survey presented a comprehensive investigation of PSO. On one hand, we provided advances with PSO, including its modifications (including quantum-behaved PSO, bare-bones PSO, chaotic PSO, and fuzzy PSO), population topology (as fully connected, von Neumann, ring, star, random, etc.), hybridization (with genetic algorithm, simulated annealing, Tabu search, artificial immune system, ant colony algorithm, artificial bee colony, differential evolution, harmonic search, and biogeography-based optimization), extensions (to multiobjective, constrained, discrete, and binary optimization), theoretical analysis (parameter selection and tuning, and convergence analysis), and parallel implementation (in multicore, multiprocessor, GPU, and cloud computing forms). On the other hand, we offered a survey on applications of PSO to the following eight fields: electrical and electronic engineering, automation control systems, communication theory, operations research, mechanical engineering, fuel and energy, medicine, chemistry, and biology. It is hoped that this survey would be beneficial for the researchers studying PSO algorithms
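As a rough illustration of the basic method this survey covers, the following is a minimal sketch of a global-best PSO with an inertia weight, which corresponds to the fully connected topology. The parameter values (`w`, `c1`, `c2`, swarm size, iteration count), the position clamping, and the sphere test function are assumptions chosen for the example, not values taken from the survey.

```python
import random

def pso(objective, dim, bounds, n_particles=30, iters=200,
        w=0.7, c1=1.5, c2=1.5, seed=0):
    """Minimize `objective` with a global-best (fully connected) PSO."""
    rng = random.Random(seed)
    lo, hi = bounds
    pos = [[rng.uniform(lo, hi) for _ in range(dim)] for _ in range(n_particles)]
    vel = [[0.0] * dim for _ in range(n_particles)]
    pbest = [p[:] for p in pos]                  # best position seen by each particle
    pbest_val = [objective(p) for p in pos]
    g = min(range(n_particles), key=pbest_val.__getitem__)
    gbest, gbest_val = pbest[g][:], pbest_val[g]  # best position seen by the swarm

    for _ in range(iters):
        for i in range(n_particles):
            for d in range(dim):
                r1, r2 = rng.random(), rng.random()
                # Inertia + pull toward personal best + pull toward global best.
                vel[i][d] = (w * vel[i][d]
                             + c1 * r1 * (pbest[i][d] - pos[i][d])
                             + c2 * r2 * (gbest[d] - pos[i][d]))
                # Move and clamp to the search bounds.
                pos[i][d] = min(hi, max(lo, pos[i][d] + vel[i][d]))
            val = objective(pos[i])
            if val < pbest_val[i]:
                pbest[i], pbest_val[i] = pos[i][:], val
                if val < gbest_val:
                    gbest, gbest_val = pos[i][:], val
    return gbest, gbest_val

def sphere(x):
    """Simple convex test function with its minimum of 0 at the origin."""
    return sum(v * v for v in x)

best_x, best_val = pso(sphere, dim=2, bounds=(-5.0, 5.0))
print(best_x, best_val)
```

Other topologies mentioned in the survey (ring, von Neumann, and so on) would replace the single `gbest` with a per-particle neighborhood best.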
Evaluation of Existing Methods for High-Order Epistasis Detection
[Abstract]
Finding epistatic interactions among loci when expressing a phenotype is a widely employed strategy to understand the genetic architecture of complex traits in GWAS. The abundance of methods dedicated to the same purpose, however, makes it increasingly difficult for scientists to decide which method is more suitable for their studies. This work compares the different epistasis detection methods published during the last decade in terms of runtime, detection power and type I error rate, with a special emphasis on high-order interactions. Results show that in terms of detection power, the only methods that perform well across all experiments are the exhaustive methods, although their computational cost may be prohibitive in large-scale studies. Regarding non-exhaustive methods, not one could consistently find epistasis interactions when marginal effects are absent. If marginal effects are present, there are methods that perform well for high-order interactions, such as BADTrees, FDHE-IW, SingleMI or SNPHarvester. As for false-positive control, only SNPHarvester, FDHE-IW and DCHE show good results. The study concludes that there is no single epistasis detection method to recommend in all scenarios. Authors should prioritize exhaustive methods when sufficient computational resources are available considering the data set size, and resort to non-exhaustive methods when the analysis time is prohibitive.
10.13039/501100010801-Xunta de Galicia (Grant Number: ED431C2016-037, ED431C2017/04 and ED431G2019/01)
10.13039/501100003176-Ministerio de Educacion Cultura y Deporte (Grant Number: FPU16/01333)
10.13039/501100003329-Ministerio de Economia y Competitividad (Grant Number: CGL2016-75482-P, PID2019-104184RB-I00, AEI/FEDER/EU, 10.13039/50110 and TIN2016-75845-P)
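The remark in the epistasis evaluation above that exhaustive methods can become prohibitively expensive follows from simple counting: an exhaustive scan of order k over n SNPs must examine every k-subset. A minimal sketch of that count follows; the figure of 1000 SNPs is a hypothetical example, not a number from the study.

```python
from math import comb

def combinations_to_test(n_snps, order):
    """Number of SNP subsets an exhaustive scan of the given order must examine."""
    return comb(n_snps, order)

n_snps = 1000  # hypothetical data set size, for illustration only
counts = {order: combinations_to_test(n_snps, order) for order in (2, 3, 4)}
for order, count in counts.items():
    print(f"order {order}: {count:,} combinations")
```

Each additional order multiplies the work by roughly n / k, which is why high-order exhaustive searches are feasible only for modest data set sizes or with substantial computing resources.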
Genetic heterogeneity analysis using genetic algorithm and network science
Through genome-wide association studies (GWAS), disease-susceptible genetic
variables can be identified by comparing the genetic data of individuals with
and without a specific disease. However, the discovery of these associations
poses a significant challenge due to genetic heterogeneity and feature
interactions. Genetic variables intertwined with these effects often exhibit
lower effect-size, and thus can be difficult to detect using machine
learning feature selection methods. To address these challenges, this paper
introduces a novel feature selection mechanism for GWAS, named Feature
Co-selection Network (FCS-Net). FCS-Net is designed to extract heterogeneous
subsets of genetic variables from a network constructed from multiple
independent feature selection runs based on a genetic algorithm (GA), an
evolutionary learning algorithm. We employ a non-linear machine learning
algorithm to detect feature interaction. We introduce the Community Risk Score
(CRS), a synthetic feature designed to quantify the collective disease
association of each variable subset. Our experiment showcases the effectiveness
of the utilized GA-based feature selection method in identifying feature
interactions through synthetic data analysis. Furthermore, we apply our novel
approach to a case-control colorectal cancer GWAS dataset. The resulting
synthetic features are then used to explain the genetic heterogeneity in an
additional case-only GWAS dataset.
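The abstract above describes a network built from multiple independent feature selection runs. One plausible reading, sketched below, is a weighted graph whose edge weights count how often two features were selected in the same run. The weighting rule, the function name `co_selection_edges`, and the SNP labels are assumptions made for illustration; the paper's actual construction and its community detection step are not reproduced here.

```python
from collections import Counter
from itertools import combinations

def co_selection_edges(runs):
    """Count, for each feature pair, the number of runs that selected both."""
    edges = Counter()
    for selected in runs:
        # Sort so that each unordered pair maps to a single (a, b) key.
        for a, b in combinations(sorted(set(selected)), 2):
            edges[(a, b)] += 1
    return edges

# Hypothetical outputs of three independent GA feature selection runs.
runs = [
    ["snp1", "snp2", "snp5"],
    ["snp2", "snp5"],
    ["snp1", "snp7"],
]
edges = co_selection_edges(runs)
for pair, weight in sorted(edges.items()):
    print(pair, weight)
```

Groups of features that are repeatedly co-selected would then appear as densely connected regions of the graph, which is where a community-level score such as the CRS described in the abstract could be computed.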