
663 research outputs found

Applied Computational Techniques on Schizophrenia Using Genetic Mutations

Author: Aguiar-Pulido Vanessa
Fernández-Lozano Carlos
Gestal M.
Munteanu Cristian-Robert
Rivero Daniel
Publication venue: Bentham Science Publishers Ltd.
Publication date: 01/01/2013

[Abstract] Schizophrenia is a complex disease, with both genetic and environmental influence. Machine learning techniques can be used to associate different genetic variations at different genes with a (schizophrenic or non-schizophrenic) phenotype. Several machine learning techniques were applied to schizophrenia data to obtain the results presented in this study. Considering these data, Quantitative Genotype – Disease Relationships (QDGRs) can be used for disease prediction. One of the best machine learning-based models obtained after this exhaustive comparative study was implemented online; this model is an artificial neural network (ANN). Thus, the tool offers the possibility to introduce Single Nucleotide Polymorphism (SNP) sequences in order to classify a patient with schizophrenia. Besides this comparative study, a method for variable selection, based on ANNs and evolutionary computation (EC), is also presented. This method uses half the number of variables as the original ANN and the variables obtained are among those found in other publications. In the future, QDGR models based on nucleic acid information could be expanded to other diseases.
Funding: Programa Iberoamericano de Ciencia y Tecnología para el Desarrollo; 209RT-0366. Xunta de Galicia; 10SIN105004PR. Instituto de Salud Carlos III; RD07/0067/0005. Xunta de Galicia; Ref. 2009/5

Repositorio da Universidade da Coruña

LAReferencia - Red Federada de Repositorios Institucionales de Publicaciones Científicas Latinoamericanas

University of Miami: Scholarship Miami

ATHENA: A knowledge-based hybrid backpropagation-grammatical evolution neural network algorithm for discovering epistasis among quantitative trait Loci

Author: A Bateman
A Freitas
AA Motsinger
AA Motsinger-Reif
B Maher
BC White
C Kooperberg
C Newton-Cheh
CJ Willer
CM Bishop
CR Porter
CS Carlson
CS Greene
CY Huang
D Ruano
DB Goldstein
E Boerwinkle
E Colucci-Guyon
ER Holzinger
F Sato
G Peng
H Shao
HJ Cordell
I Vastrik
I Xenarios
IG Sprinkhuizen-Kuyper
International hapmap consortium
J Koza
J Meiler
J Moore
J Ott
JE Dayhoff
JH Moore
JN Hirschhorn
KG Becker
KH Pietilainen
LA Hindorff
M Abney
M Ashburner
M Kanehisa
M O'Neil
Marylyn D Ritchie
MC Gruda
MD Ritchie
MR Nelson
N Killeen
N Penrod
P Cohen
P Gorry
P Lucek
R Bellman
R Culverhouse
R Linder
R Poli
R Shen
RD Finn
RJ Klein
S Itohara
S Kathiresan
S Kim
S Wright
SC Hamon
Scott M Dudek
SD Turner
SE Baranzini
SE Maxwell
Stephen D Turner
T Baba
TA Manolio
TL Edwards
TM Frayling
V Kurkova
WJ Gauderman
WS Bush
X He
X Yao
Publication venue: BioMed Central
Publication date: 01/01/2010

Abstract
Background: Growing interest and burgeoning technology for discovering genetic mechanisms that influence disease processes have ushered in a flood of genetic association studies over the last decade, yet little heritability in highly studied complex traits has been explained by genetic variation. Non-additive gene-gene interactions, which are not often explored, are thought to be one source of this "missing" heritability.
Methods: Stochastic methods employing evolutionary algorithms have demonstrated promise in being able to detect and model gene-gene and gene-environment interactions that influence human traits. Here we demonstrate modifications to a neural network algorithm in ATHENA (the Analysis Tool for Heritable and Environmental Network Associations) resulting in clear performance improvements for discovering gene-gene interactions that influence human traits. We employed an alternative tree-based crossover, backpropagation for locally fitting neural network weights, and incorporation of domain knowledge obtainable from publicly accessible biological databases for initializing the search for gene-gene interactions. We tested these modifications in silico using simulated datasets.
Results: We show that the alternative tree-based crossover modification resulted in a modest increase in the sensitivity of the ATHENA algorithm for discovering gene-gene interactions. The performance increase was highly statistically significant when backpropagation was used to locally fit NN weights. We also demonstrate that using domain knowledge to initialize the search for gene-gene interactions results in a large performance increase, especially when the search space is larger than the search coverage.
Conclusions: We show that a hybrid optimization procedure, alternative crossover strategies, and incorporation of domain knowledge from publicly available biological databases can result in marked increases in sensitivity and performance of the ATHENA algorithm for detecting and modelling gene-gene interactions that influence a complex human trait.

Crossref

Springer - Publisher Connector

Directory of Open Access Journals

PubMed Central

Discovering Higher-order SNP Interactions in High-dimensional Genomic Data

Author: Uppu Suneetha
Publication venue: Curtin University
Publication date: 01/01/2018

In this thesis, a multifactor dimensionality reduction method based on associative classification is employed to identify higher-order SNP interactions, with the aim of improving the understanding of the genetic architecture of complex diseases. The thesis also explores the application of deep learning techniques, which provide new clues for interaction analysis. The performance of the deep learning method is maximized by unifying deep neural networks with a random forest to achieve reliable interactions in the presence of noise.

espace@Curtin

Resolución de problemas de optimización combinatoria utilizando técnicas de computación evolutiva: una aplicación a la biomedicina

Author: Aguiar-Pulido Vanessa
Publication date: 01/01/2014

[Abstract] Every day more data are being generated. Not only does the volume of data increase, but so does the number of variables. This represents an issue for traditional techniques. Furthermore, many problems involve such a large set of possible solutions that finding the optimal solution in a reasonable amount of time is not feasible. Thus, using techniques based on heuristics becomes necessary. Evolutionary Computation (EC) has provided good results in situations in which traditional techniques did not, especially when applied to biomedical data and disease diagnosis. Therefore, in this work, a model based on EC has been developed. This model, based on an input set with data that belong to healthy or diseased subjects, is capable of extracting expressions in order to build a classification model. The model proposed in this thesis has been validated on generated data, as well as applied to real clinical data, comparing the results obtained with those of other similar techniques. It is worth pointing out that the model presented extracts simple expressions and performs better when classifying both types of data sets than other existing techniques. As a result, the model presented is expected to be very useful for clinical diagnostic support.

Repositorio da Universidade da Coruña

LAReferencia - Red Federada de Repositorios Institucionales de Publicaciones Científicas Latinoamericanas

Ant Colony Optimization

Publication venue: IntechOpen
Publication date: 20/04/2021

Ant Colony Optimization (ACO) is the best example of how studies aimed at understanding and modeling the behavior of ants and other social insects can provide inspiration for the development of computational algorithms for the solution of difficult mathematical problems. Introduced by Marco Dorigo in his PhD thesis (1992) and initially applied to the travelling salesman problem, the ACO field has experienced tremendous growth, standing today as an important nature-inspired stochastic metaheuristic for hard optimization problems. This book presents state-of-the-art ACO methods and is divided into two parts: (I) Techniques, which includes parallel implementations, and (II) Applications, where recent contributions of ACO to diverse fields, such as traffic congestion and control, structural optimization, manufacturing, and genomics, are presented.

Directory of Open Access Books (DOAB)

The computational hardness of feature selection in strict-pure synthetic genetic datasets

Author: Mohtasham Majid Beheshti
Publication venue: Memorial University of Newfoundland
Publication date: 01/09/2019

A common task in knowledge discovery is finding a few features correlated with an outcome in a sea of mostly irrelevant data. This task is particularly formidable in genetic datasets containing thousands to millions of Single Nucleotide Polymorphisms (SNPs) for each individual; the goal here is to find a small subset of SNPs correlated with whether an individual is sick or healthy (labeled data). Although determining a correlation between any given SNP (genotype) and a disease label (phenotype) is relatively straightforward, detecting subsets of SNPs such that the correlation is only apparent when the whole subset is considered seems to be much harder. In this thesis, we study the computational hardness of this problem, in particular for a widely used method of generating synthetic SNP datasets. More specifically, we consider the feature selection problem in datasets generated by "pure and strict" models, such as ones produced by the popular GAMETES software. In these datasets, there is a high correlation between a predefined target set of features (SNPs) and a label; however, any proper subset of the target set appears uncorrelated with the outcome. Our main result is a (linear-time, parameter-preserving) reduction from the well-known Learning Parity with Noise (LPN) problem to feature selection in such pure and strict datasets. This gives us a host of consequences for the complexity of feature selection in this setting. First, not only is it NP-hard (even to approximate), it is also computationally hard on average under a standard cryptographic assumption on the hardness of learning parity with noise; moreover, in general it is as hard for the uniform distribution as for arbitrary distributions, and as hard for random noise as for adversarial noise.
For the worst-case complexity, we get a tighter parameterized lower bound: even in the non-noisy case, finding a parity of Hamming weight at most k is W[1]-hard when the number of samples is relatively small (logarithmic in the number of features). Finally, most relevant to the development of feature selection heuristics, by the unconditional hardness of LPN in Kearns’ statistical query model, no heuristic that only computes statistics about the samples, rather than considering the samples themselves, can successfully perform feature selection in such pure and strict datasets. This eliminates a large class of common approaches to feature selection.
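The "pure and strict" property described in this abstract can be illustrated with a small sketch. It assumes a simplified toy setting that is not taken from the thesis: binary features instead of three-valued SNP genotypes, no noise, and a hypothetical two-feature target set whose parity (XOR) defines the label. The helper names `parity_label` and `agreement` are invented for illustration.

```python
from itertools import product

def parity_label(row, subset):
    """Parity (XOR) of the features of `row` indexed by `subset`."""
    return sum(row[i] for i in subset) % 2

def agreement(rows, labels, subset):
    """Fraction of rows where the parity of `subset` equals the label."""
    hits = sum(parity_label(r, subset) == y for r, y in zip(rows, labels))
    return hits / len(rows)

# All 2^4 binary feature vectors: a noise-free, uniformly distributed toy dataset.
rows = list(product([0, 1], repeat=4))
target = (0, 2)  # hypothetical target set of two "SNPs" (assumption)
labels = [parity_label(r, target) for r in rows]

# Each single feature agrees with the label only at chance level ...
single = [agreement(rows, labels, (i,)) for i in range(4)]
# ... while the full target set predicts it perfectly.
pair = agreement(rows, labels, target)
print(single, pair)
```

In this toy dataset every individual feature, including the two in the target set, matches the label in exactly half of the rows, so a filter that scores features one at a time has nothing to rank on.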

Memorial University Research Repository

A Comprehensive Survey on Particle Swarm Optimization Algorithm and Its Applications

Author: Genlin Ji
Shuihua Wang
Yudong Zhang
Publication venue: Hindawi Limited
Publication date: 01/01/2015

Particle swarm optimization (PSO) is a heuristic global optimization method, proposed originally by Kennedy and Eberhart in 1995. It is now one of the most commonly used optimization techniques. This survey presented a comprehensive investigation of PSO. On one hand, we provided advances with PSO, including its modifications (including quantum-behaved PSO, bare-bones PSO, chaotic PSO, and fuzzy PSO), population topology (as fully connected, von Neumann, ring, star, random, etc.), hybridization (with genetic algorithm, simulated annealing, Tabu search, artificial immune system, ant colony algorithm, artificial bee colony, differential evolution, harmonic search, and biogeography-based optimization), extensions (to multiobjective, constrained, discrete, and binary optimization), theoretical analysis (parameter selection and tuning, and convergence analysis), and parallel implementation (in multicore, multiprocessor, GPU, and cloud computing forms). On the other hand, we offered a survey on applications of PSO to the following eight fields: electrical and electronic engineering, automation control systems, communication theory, operations research, mechanical engineering, fuel and energy, medicine, chemistry, and biology. It is hoped that this survey would be beneficial for the researchers studying PSO algorithms
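As a rough illustration of the basic technique this survey covers, the following is a minimal sketch of global-best PSO (the fully connected topology) with an inertia weight. The parameter values, the `sphere` test objective, and all function names are assumptions chosen for illustration; they are not taken from the survey.

```python
import random

def pso(objective, dim, bounds, n_particles=30, n_iters=200,
        w=0.7, c1=1.5, c2=1.5, seed=0):
    """Minimize `objective` with a basic global-best particle swarm."""
    rng = random.Random(seed)
    lo, hi = bounds
    pos = [[rng.uniform(lo, hi) for _ in range(dim)] for _ in range(n_particles)]
    vel = [[0.0] * dim for _ in range(n_particles)]
    pbest = [p[:] for p in pos]                  # personal best positions
    pbest_val = [objective(p) for p in pos]
    g = min(range(n_particles), key=lambda i: pbest_val[i])
    gbest, gbest_val = pbest[g][:], pbest_val[g]  # swarm-wide best
    for _ in range(n_iters):
        for i in range(n_particles):
            for d in range(dim):
                r1, r2 = rng.random(), rng.random()
                # Inertia + pull toward personal best + pull toward global best.
                vel[i][d] = (w * vel[i][d]
                             + c1 * r1 * (pbest[i][d] - pos[i][d])
                             + c2 * r2 * (gbest[d] - pos[i][d]))
                # Move and clamp the position to the search bounds.
                pos[i][d] = min(hi, max(lo, pos[i][d] + vel[i][d]))
            val = objective(pos[i])
            if val < pbest_val[i]:
                pbest[i], pbest_val[i] = pos[i][:], val
                if val < gbest_val:
                    gbest, gbest_val = pos[i][:], val
    return gbest, gbest_val

def sphere(x):
    """Simple convex test objective with its minimum of 0 at the origin."""
    return sum(v * v for v in x)

best_x, best_val = pso(sphere, dim=2, bounds=(-5.0, 5.0))
print(best_x, best_val)
```

The other topologies mentioned in the abstract (ring, von Neumann, and so on) would replace the single `gbest` with a best position taken over each particle's neighbourhood.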

Crossref

Directory of Open Access Journals

Evaluation of Existing Methods for High-Order Epistasis Detection

Author: Carvajal-Rodriguez Antonio
González-Domínguez Jorge
Martín María J.
Ponte-Fernández Christian
Publication venue: Institute of Electrical and Electronics Engineers (IEEE)
Publication date: 15/10/2020

[Abstract] Finding epistatic interactions among loci when expressing a phenotype is a widely employed strategy to understand the genetic architecture of complex traits in GWAS. The abundance of methods dedicated to the same purpose, however, makes it increasingly difficult for scientists to decide which method is more suitable for their studies. This work compares the different epistasis detection methods published during the last decade in terms of runtime, detection power and type I error rate, with a special emphasis on high-order interactions. Results show that in terms of detection power, the only methods that perform well across all experiments are the exhaustive methods, although their computational cost may be prohibitive in large-scale studies. Regarding non-exhaustive methods, not one could consistently find epistasis interactions when marginal effects are absent. If marginal effects are present, there are methods that perform well for high-order interactions, such as BADTrees, FDHE-IW, SingleMI or SNPHarvester. As for false-positive control, only SNPHarvester, FDHE-IW and DCHE show good results. The study concludes that there is no single epistasis detection method to recommend in all scenarios. Authors should prioritize exhaustive methods when sufficient computational resources are available considering the data set size, and resort to non-exhaustive methods when the analysis time is prohibitive.
Funding: 10.13039/501100010801 - Xunta de Galicia (Grant Number: ED431C2016-037, ED431C2017/04 and ED431G2019/01); 10.13039/501100003176 - Ministerio de Educacion Cultura y Deporte (Grant Number: FPU16/01333); 10.13039/501100003329 - Ministerio de Economia y Competitividad (Grant Number: CGL2016-75482-P, PID2019-104184RB-I00, AEI/FEDER/EU, 10.13039/50110 and TIN2016-75845-P)
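To give a sense of why the cost of exhaustive methods can become prohibitive, the sketch below counts how many SNP combinations an exhaustive search would have to score at each interaction order. The panel size of 1000 SNPs and the function name `n_combinations` are hypothetical, chosen only for illustration; real GWAS panels are typically far larger.

```python
from math import comb

def n_combinations(n_snps, order):
    """Number of distinct SNP subsets of size `order` among `n_snps` loci."""
    return comb(n_snps, order)

# Hypothetical panel size, chosen only for illustration.
n_snps = 1000
for order in (2, 3, 4):
    print(order, n_combinations(n_snps, order))
```

Each increase in interaction order multiplies the number of candidate subsets by roughly the panel size divided by the order, which is the growth that non-exhaustive heuristics try to avoid.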

Repositorio da Universidade da Coruña

Genetic heterogeneity analysis using genetic algorithm and network science

Author: Chen Yuanzhu
Hu Ting
Sha Zhendong
Publication date: 11/08/2023

Through genome-wide association studies (GWAS), disease-susceptible genetic variables can be identified by comparing the genetic data of individuals with and without a specific disease. However, the discovery of these associations poses a significant challenge due to genetic heterogeneity and feature interactions. Genetic variables intertwined with these effects often exhibit lower effect sizes, and thus can be difficult to detect using machine learning feature selection methods. To address these challenges, this paper introduces a novel feature selection mechanism for GWAS, named Feature Co-selection Network (FCS-Net). FCS-Net is designed to extract heterogeneous subsets of genetic variables from a network constructed from multiple independent feature selection runs based on a genetic algorithm (GA), an evolutionary learning algorithm. We employ a non-linear machine learning algorithm to detect feature interaction. We introduce the Community Risk Score (CRS), a synthetic feature designed to quantify the collective disease association of each variable subset. Our experiment showcases the effectiveness of the utilized GA-based feature selection method in identifying feature interactions through synthetic data analysis. Furthermore, we apply our novel approach to a case-control colorectal cancer GWAS dataset. The resulting synthetic features are then used to explain the genetic heterogeneity in an additional case-only GWAS dataset.

arXiv.org e-Print Archive