
    Applied Computational Techniques on Schizophrenia Using Genetic Mutations

    [Abstract] Schizophrenia is a complex disease with both genetic and environmental influences. Machine learning techniques can be used to associate genetic variations at different genes with a (schizophrenic or non-schizophrenic) phenotype. Several machine learning techniques were applied to schizophrenia data to obtain the results presented in this study. Considering these data, Quantitative Genotype-Disease Relationships (QDGRs) can be used for disease prediction. One of the best machine learning-based models obtained in this exhaustive comparative study, an artificial neural network (ANN), was implemented online. The tool thus makes it possible to enter Single Nucleotide Polymorphism (SNP) sequences in order to classify a patient with respect to schizophrenia. Besides this comparative study, a method for variable selection based on ANNs and evolutionary computation (EC) is also presented. This method uses half as many variables as the original ANN, and the variables obtained are among those found in other publications. In the future, QDGR models based on nucleic acid information could be expanded to other diseases.
    Funding: Programa Iberoamericano de Ciencia y Tecnología para el Desarrollo; 209RT-0366. Xunta de Galicia; 10SIN105004PR. Instituto de Salud Carlos III; RD07/0067/0005. Xunta de Galicia; Ref. 2009/5

    ATHENA: A knowledge-based hybrid backpropagation-grammatical evolution neural network algorithm for discovering epistasis among quantitative trait Loci

    Background: Growing interest and burgeoning technology for discovering genetic mechanisms that influence disease processes have ushered in a flood of genetic association studies over the last decade, yet little heritability in highly studied complex traits has been explained by genetic variation. Non-additive gene-gene interactions, which are not often explored, are thought to be one source of this "missing" heritability. Methods: Stochastic methods employing evolutionary algorithms have demonstrated promise in being able to detect and model gene-gene and gene-environment interactions that influence human traits. Here we demonstrate modifications to a neural network algorithm in ATHENA (the Analysis Tool for Heritable and Environmental Network Associations) resulting in clear performance improvements for discovering gene-gene interactions that influence human traits. We employed an alternative tree-based crossover, backpropagation for locally fitting neural network weights, and incorporation of domain knowledge obtainable from publicly accessible biological databases for initializing the search for gene-gene interactions. We tested these modifications in silico using simulated datasets. Results: We show that the alternative tree-based crossover modification resulted in a modest increase in the sensitivity of the ATHENA algorithm for discovering gene-gene interactions. The performance increase was highly statistically significant when backpropagation was used to locally fit NN weights. We also demonstrate that using domain knowledge to initialize the search for gene-gene interactions results in a large performance increase, especially when the search space is larger than the search coverage. Conclusions: We show that a hybrid optimization procedure, alternative crossover strategies, and incorporation of domain knowledge from publicly available biological databases can result in marked increases in sensitivity and performance of the ATHENA algorithm for detecting and modelling gene-gene interactions that influence a complex human trait.

    Discovering Higher-order SNP Interactions in High-dimensional Genomic Data

    In this thesis, a multifactor-dimensionality-reduction-based method built on associative classification is employed to identify higher-order SNP interactions, with the aim of improving the understanding of the genetic architecture of complex diseases. The thesis also explores the application of deep learning techniques to provide new clues for interaction analysis. The performance of the deep learning method is maximized by combining deep neural networks with a random forest, so that interactions can be detected reliably in the presence of noise.

    Resolución de problemas de optimización combinatoria utilizando técnicas de computación evolutiva: una aplicación a la biomedicina

    [Abstract] Every day more data are being generated. Not only does the volume of data increase, but so does the number of variables, which poses a problem for traditional techniques. Furthermore, many problems involve such a large set of possible solutions that finding the optimal solution in a reasonable amount of time is not feasible, so techniques based on heuristics become necessary. Evolutionary Computation (EC) has provided good results in situations in which traditional techniques did not, especially when applied to biomedical data and disease diagnosis. Therefore, in this work, a model based on EC has been developed. Given an input data set labeled as belonging to healthy or diseased subjects, this model is capable of extracting expressions from which a classification model can be built. The model proposed in this thesis has been validated on generated data and applied to real clinical data, and its results have been compared with those of other similar techniques. It is worth pointing out that the proposed model extracts simple expressions and classifies both types of data sets better than the other techniques. As a result, the model is expected to be very useful for clinical diagnostic support.

    Ant Colony Optimization

    Ant Colony Optimization (ACO) is the best example of how studies aimed at understanding and modeling the behavior of ants and other social insects can provide inspiration for the development of computational algorithms for the solution of difficult mathematical problems. Introduced by Marco Dorigo in his PhD thesis (1992) and initially applied to the travelling salesman problem, the ACO field has experienced tremendous growth, standing today as an important nature-inspired stochastic metaheuristic for hard optimization problems. This book presents state-of-the-art ACO methods and is divided into two parts: (I) Techniques, which includes parallel implementations, and (II) Applications, where recent contributions of ACO to diverse fields, such as traffic congestion and control, structural optimization, manufacturing, and genomics, are presented.

    The computational hardness of feature selection in strict-pure synthetic genetic datasets

    A common task in knowledge discovery is finding a few features correlated with an outcome in a sea of mostly irrelevant data. This task is particularly formidable in genetic datasets containing thousands to millions of Single Nucleotide Polymorphisms (SNPs) for each individual; the goal here is to find a small subset of SNPs correlated with whether an individual is sick or healthy (labeled data). Although determining a correlation between any given SNP (genotype) and a disease label (phenotype) is relatively straightforward, detecting subsets of SNPs such that the correlation is only apparent when the whole subset is considered seems to be much harder. In this thesis, we study the computational hardness of this problem, in particular for a widely used method of generating synthetic SNP datasets. More specifically, we consider the feature selection problem in datasets generated by "pure and strict" models, such as the ones produced by the popular GAMETES software. In these datasets, there is a high correlation between a predefined target set of features (SNPs) and a label; however, any proper subset of the target set appears uncorrelated with the outcome. Our main result is a (linear-time, parameter-preserving) reduction from the well-known Learning Parity with Noise (LPN) problem to feature selection in such pure and strict datasets. This gives us a host of consequences for the complexity of feature selection in this setting. First, not only is it NP-hard (even to approximate), it is also computationally hard on average under a standard cryptographic assumption on the hardness of learning parity with noise; moreover, in general it is as hard for the uniform distribution as for arbitrary distributions, and as hard for random noise as for adversarial noise. 
For the worst-case complexity, we get a tighter parameterized lower bound: even in the non-noisy case, finding a parity of Hamming weight at most k is W[1]-hard when the number of samples is relatively small (logarithmic in the number of features). Finally, and most relevant to the development of feature selection heuristics, by the unconditional hardness of LPN in Kearns’ statistical query model, no heuristic that only computes statistics about the samples, rather than considering the samples themselves, can successfully perform feature selection in such pure and strict datasets. This eliminates a large class of common approaches to feature selection.
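
The "pure and strict" property described in this abstract can be illustrated with a small parity example. The following is a minimal sketch, assuming binary features (real SNP genotypes usually take three values) and a label defined as the XOR of a hypothetical target set; the function names and parameters are illustrative and are not taken from the thesis or from GAMETES. It shows that the full target set predicts the label while each proper subset agrees with the label only half the time.

```python
import itertools
import random


def parity_label(row, subset):
    """Return the XOR (parity) of the features of `row` indexed by `subset`."""
    return sum(row[i] for i in subset) % 2


def agreement(rows, labels, subset):
    """Fraction of samples whose parity over `subset` equals the label."""
    hits = sum(parity_label(r, subset) == y for r, y in zip(rows, labels))
    return hits / len(rows)


def make_noisy_dataset(n_features, target, noise, n_samples, seed=0):
    """Sample uniform binary rows; flip each parity label with probability `noise`."""
    rng = random.Random(seed)
    rows = [[rng.randint(0, 1) for _ in range(n_features)]
            for _ in range(n_samples)]
    labels = [parity_label(r, target) ^ int(rng.random() < noise) for r in rows]
    return rows, labels


# Noise-free case over all 2^4 binary rows, with hypothetical target set {0, 2}.
TARGET = (0, 2)
all_rows = [list(bits) for bits in itertools.product([0, 1], repeat=4)]
all_labels = [parity_label(r, TARGET) for r in all_rows]

print(agreement(all_rows, all_labels, (0,)))     # 0.5: feature 0 alone looks irrelevant
print(agreement(all_rows, all_labels, (2,)))     # 0.5: feature 2 alone looks irrelevant
print(agreement(all_rows, all_labels, TARGET))   # 1.0: the full target set is decisive
```

With `make_noisy_dataset`, the agreement of the full target set drops to roughly one minus the noise rate, while single features stay near one half, which is the setting the LPN reduction addresses.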

    A Comprehensive Survey on Particle Swarm Optimization Algorithm and Its Applications

    Particle swarm optimization (PSO) is a heuristic global optimization method, proposed originally by Kennedy and Eberhart in 1995. It is now one of the most commonly used optimization techniques. This survey presents a comprehensive investigation of PSO. On the one hand, we review advances in PSO, including its modifications (quantum-behaved PSO, bare-bones PSO, chaotic PSO, and fuzzy PSO), population topology (such as fully connected, von Neumann, ring, star, random, etc.), hybridization (with genetic algorithm, simulated annealing, Tabu search, artificial immune system, ant colony algorithm, artificial bee colony, differential evolution, harmonic search, and biogeography-based optimization), extensions (to multiobjective, constrained, discrete, and binary optimization), theoretical analysis (parameter selection and tuning, and convergence analysis), and parallel implementation (in multicore, multiprocessor, GPU, and cloud computing forms). On the other hand, we offer a survey of applications of PSO to the following eight fields: electrical and electronic engineering, automation control systems, communication theory, operations research, mechanical engineering, fuel and energy, medicine, chemistry, and biology. It is hoped that this survey will be beneficial for researchers studying PSO algorithms.
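
For readers unfamiliar with the method, the core PSO update can be sketched as follows. This is a minimal illustration of a commonly used global-best (fully connected) variant with an inertia weight, not code from the survey; the parameter values `w`, `c1`, `c2`, the bounds, and the `sphere` test function are assumptions chosen only for the example.

```python
import random


def pso(objective, dim, n_particles=20, iters=200, bounds=(-5.0, 5.0),
        w=0.7, c1=1.5, c2=1.5, seed=0):
    """Minimize `objective` with a global-best particle swarm (illustrative)."""
    rng = random.Random(seed)
    lo, hi = bounds
    pos = [[rng.uniform(lo, hi) for _ in range(dim)] for _ in range(n_particles)]
    vel = [[0.0] * dim for _ in range(n_particles)]
    pbest = [p[:] for p in pos]                 # best position seen by each particle
    pbest_val = [objective(p) for p in pos]
    g = min(range(n_particles), key=lambda i: pbest_val[i])
    gbest, gbest_val = pbest[g][:], pbest_val[g]  # best position seen by the swarm

    for _ in range(iters):
        for i in range(n_particles):
            for d in range(dim):
                r1, r2 = rng.random(), rng.random()
                # Inertia term + pull toward personal best + pull toward global best.
                vel[i][d] = (w * vel[i][d]
                             + c1 * r1 * (pbest[i][d] - pos[i][d])
                             + c2 * r2 * (gbest[d] - pos[i][d]))
                # Move the particle and keep it inside the search bounds.
                pos[i][d] = min(hi, max(lo, pos[i][d] + vel[i][d]))
            val = objective(pos[i])
            if val < pbest_val[i]:
                pbest[i], pbest_val[i] = pos[i][:], val
                if val < gbest_val:
                    gbest, gbest_val = pos[i][:], val
    return gbest, gbest_val


def sphere(x):
    """Simple convex test function with its minimum of 0 at the origin."""
    return sum(v * v for v in x)


best, best_val = pso(sphere, dim=2)
print(best, best_val)
```

The topologies listed in the abstract differ mainly in which neighbors contribute the `gbest` term; the sketch uses the whole swarm.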

    Evaluation of Existing Methods for High-Order Epistasis Detection

    [Abstract] Finding epistatic interactions among loci when expressing a phenotype is a widely employed strategy to understand the genetic architecture of complex traits in GWAS. The abundance of methods dedicated to the same purpose, however, makes it increasingly difficult for scientists to decide which method is more suitable for their studies. This work compares the different epistasis detection methods published during the last decade in terms of runtime, detection power and type I error rate, with a special emphasis on high-order interactions. Results show that in terms of detection power, the only methods that perform well across all experiments are the exhaustive methods, although their computational cost may be prohibitive in large-scale studies. Regarding non-exhaustive methods, not one could consistently find epistasis interactions when marginal effects are absent. If marginal effects are present, there are methods that perform well for high-order interactions, such as BADTrees, FDHE-IW, SingleMI or SNPHarvester. As for false-positive control, only SNPHarvester, FDHE-IW and DCHE show good results. The study concludes that there is no single epistasis detection method to recommend in all scenarios. Authors should prioritize exhaustive methods when sufficient computational resources are available considering the data set size, and resort to non-exhaustive methods when the analysis time is prohibitive.
    Funding: 10.13039/501100010801-Xunta de Galicia (Grant Number: ED431C2016-037, ED431C2017/04 and ED431G2019/01); 10.13039/501100003176-Ministerio de Educacion Cultura y Deporte (Grant Number: FPU16/01333); 10.13039/501100003329-Ministerio de Economia y Competitividad (Grant Number: CGL2016-75482-P, PID2019-104184RB-I00, AEI/FEDER/EU, 10.13039/50110 and TIN2016-75845-P). Xunta de Galicia; ED431C2016-037. Xunta de Galicia; ED431G2019/01. Xunta de Galicia; ED431C 2017/0

    Genetic heterogeneity analysis using genetic algorithm and network science

    Through genome-wide association studies (GWAS), genetic variables associated with disease susceptibility can be identified by comparing the genetic data of individuals with and without a specific disease. However, the discovery of these associations poses a significant challenge due to genetic heterogeneity and feature interactions. Genetic variables intertwined with these effects often exhibit lower effect sizes and can therefore be difficult to detect using machine learning feature selection methods. To address these challenges, this paper introduces a novel feature selection mechanism for GWAS, named Feature Co-selection Network (FCSNet). FCSNet is designed to extract heterogeneous subsets of genetic variables from a network constructed from multiple independent feature selection runs based on a genetic algorithm (GA), an evolutionary learning algorithm. We employ a non-linear machine learning algorithm to detect feature interaction. We introduce the Community Risk Score (CRS), a synthetic feature designed to quantify the collective disease association of each variable subset. Our experiment showcases the effectiveness of the utilized GA-based feature selection method in identifying feature interactions through synthetic data analysis. Furthermore, we apply our novel approach to a case-control colorectal cancer GWAS dataset. The resulting synthetic features are then used to explain the genetic heterogeneity in an additional case-only GWAS dataset.
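
The idea of building a network from repeated feature selection runs can be sketched as below. This is only an illustration under stated assumptions, not the FCSNet implementation: the edge weight is assumed to be the number of runs in which two features were selected together, the `min_count` threshold is hypothetical, the SNP names are made up, and the GA runs, community detection, and Community Risk Score are not shown.

```python
from collections import Counter
from itertools import combinations


def co_selection_network(runs):
    """Count, for every feature pair, the number of runs that selected both."""
    edges = Counter()
    for selected in runs:
        # Sort so each undirected pair has one canonical key.
        for a, b in combinations(sorted(set(selected)), 2):
            edges[(a, b)] += 1
    return edges


def prune(edges, min_count):
    """Keep only pairs co-selected in at least `min_count` runs (assumed threshold)."""
    return {pair: count for pair, count in edges.items() if count >= min_count}


# Hypothetical outputs of four independent feature selection runs.
runs = [
    ["snp1", "snp2", "snp7"],
    ["snp1", "snp2"],
    ["snp3", "snp4"],
    ["snp1", "snp2", "snp4"],
]

network = co_selection_network(runs)
print(prune(network, min_count=2))  # {('snp1', 'snp2'): 3}
```

Features that are repeatedly selected together end up strongly connected, which is what lets a later step group them into candidate subsets.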