296 research outputs found

    Artificial neural network approach for selection of susceptible single nucleotide polymorphisms and construction of prediction model on childhood allergic asthma

    Get PDF
    BACKGROUND: Screening of various gene markers such as single nucleotide polymorphism (SNP) and correlation between these markers and development of multifactorial disease have previously been studied. Here, we propose a susceptible marker-selectable artificial neural network (ANN) for predicting development of allergic disease. RESULTS: To predict development of childhood allergic asthma (CAA) and select susceptible SNPs, we used an ANN with a parameter decreasing method (PDM) to analyze 25 SNPs of 17 genes in 344 Japanese people, and select 10 susceptible SNPs of CAA. The accuracy of the ANN model with 10 SNPs was 97.7% for learning data and 74.4% for evaluation data. Important combinations were determined by effective combination value (ECV) defined in the present paper. Effective 2-SNP or 3-SNP combinations were found to be concentrated among the 10 selected SNPs. CONCLUSION: ANN can reliably select SNP combinations that are associated with CAA. Thus, the ANN can be used to characterize development of complex diseases caused by multiple factors. This is the first report of automatic selection of SNPs related to development of multifactorial disease from SNP data of more than 300 patients

    COMPARISON OF ANN METHOD AND LOGISTIC REGRESSION METHOD ON SINGLE NUCLEOTIDE POLYMORPHISM GENETIC DATA

    Get PDF
    This study aims to determine the goodness of classification using the ANN method on Asthma genetic data in the R program package, namely SNPassoc. SNP genetic data was transformed using codominant genetic traits, namely for genetic data AA, AC, CC were given a score of 0, 0.5 and 1, respectively, while CC, CT and TT were scored 0, 0.5 and 1, respectively. The scoring is based on the smallest alphabetical order given a low score. The average accuracy, precision, recall and F1 score were determined using the neural network method if the genetic code was used with variations in the proportion of test data 10%, 20%, 30% and 40% and repeated B = 1000 times. The results obtained were compared with the logistic regression method. If 20% test data is used and the ANN method is used, the accuracy, precision, recall and F1 scores are 0.7756, 0.7844, 0.9844 and 0.8728, respectively. When all information from various countries is used in the Asthma genetic data, the logistic regression method gives higher average accuracy, precision and F1 scores than the ANN method, but the average recall is the opposite. When a separate analysis is performed for each country, the logistic regression method gives higher accuracy, precision, recall and F1 scores in the ANN method compared to the logistic regression method

    A multifactorial analysis of obesity as CVD risk factor: Use of neural network based methods in a nutrigenetics context

    Get PDF
    <p>Abstract</p> <p>Background</p> <p>Obesity is a multifactorial trait, which comprises an independent risk factor for cardiovascular disease (CVD). The aim of the current work is to study the complex etiology beneath obesity and identify genetic variations and/or factors related to nutrition that contribute to its variability. To this end, a set of more than 2300 white subjects who participated in a nutrigenetics study was used. For each subject a total of 63 factors describing genetic variants related to CVD (24 in total), gender, and nutrition (38 in total), e.g. average daily intake in calories and cholesterol, were measured. Each subject was categorized according to body mass index (BMI) as normal (BMI ≤ 25) or overweight (BMI > 25). Two artificial neural network (ANN) based methods were designed and used towards the analysis of the available data. These corresponded to i) a multi-layer feed-forward ANN combined with a parameter decreasing method (PDM-ANN), and ii) a multi-layer feed-forward ANN trained by a hybrid method (GA-ANN) which combines genetic algorithms and the popular back-propagation training algorithm.</p> <p>Results</p> <p>PDM-ANN and GA-ANN were comparatively assessed in terms of their ability to identify the most important factors among the initial 63 variables describing genetic variations, nutrition and gender, able to classify a subject into one of the BMI related classes: normal and overweight. The methods were designed and evaluated using appropriate training and testing sets provided by 3-fold Cross Validation (3-CV) resampling. Classification accuracy, sensitivity, specificity and area under receiver operating characteristics curve were utilized to evaluate the resulted predictive ANN models. The most parsimonious set of factors was obtained by the GA-ANN method and included gender, six genetic variations and 18 nutrition-related variables. The corresponding predictive model was characterized by a mean accuracy equal of 61.46% in the 3-CV testing sets.</p> <p>Conclusions</p> <p>The ANN based methods revealed factors that interactively contribute to obesity trait and provided predictive models with a promising generalization ability. In general, results showed that ANNs and their hybrids can provide useful tools for the study of complex traits in the context of nutrigenetics.</p

    Machine learning in genome-wide association studies

    Full text link
    Recently, genome-wide association studies have substantially expanded our knowledge about genetic variants that influence the susceptibility to complex diseases. Although standard statistical tests for each single-nucleotide polymorphism (SNP) separately are able to capture main genetic effects, different approaches are necessary to identify SNPs that influence disease risk jointly or in complex interactions. Experimental and simulated genome-wide SNP data provided by the Genetic Analysis Workshop 16 afforded an opportunity to analyze the applicability and benefit of several machine learning methods. Penalized regression, ensemble methods, and network analyses resulted in several new findings while known and simulated genetic risk variants were also identified. In conclusion, machine learning approaches are promising complements to standard single-and multi-SNP analysis methods for understanding the overall genetic architecture of complex human diseases. However, because they are not optimized for genome-wide SNP data, improved implementations and new variable selection procedures are required. Genet. Epidemiol . 33 (Suppl. 1):S51–S57, 2009. © 2009 Wiley-Liss, Inc.Peer Reviewedhttp://deepblue.lib.umich.edu/bitstream/2027.42/64533/1/20473_ftp.pd

    Neural networks for genetic epidemiology: past, present, and future

    Get PDF
    During the past two decades, the field of human genetics has experienced an information explosion. The completion of the human genome project and the development of high throughput SNP technologies have created a wealth of data; however, the analysis and interpretation of these data have created a research bottleneck. While technology facilitates the measurement of hundreds or thousands of genes, statistical and computational methodologies are lacking for the analysis of these data. New statistical methods and variable selection strategies must be explored for identifying disease susceptibility genes for common, complex diseases. Neural networks (NN) are a class of pattern recognition methods that have been successfully implemented for data mining and prediction in a variety of fields. The application of NN for statistical genetics studies is an active area of research. Neural networks have been applied in both linkage and association analysis for the identification of disease susceptibility genes

    A genetic ensemble approach for gene-gene interaction identification

    Get PDF
    <p>Abstract</p> <p>Background</p> <p>It has now become clear that gene-gene interactions and gene-environment interactions are ubiquitous and fundamental mechanisms for the development of complex diseases. Though a considerable effort has been put into developing statistical models and algorithmic strategies for identifying such interactions, the accurate identification of those genetic interactions has been proven to be very challenging.</p> <p>Methods</p> <p>In this paper, we propose a new approach for identifying such gene-gene and gene-environment interactions underlying complex diseases. This is a hybrid algorithm and it combines genetic algorithm (GA) and an ensemble of classifiers (called genetic ensemble). Using this approach, the original problem of SNP interaction identification is converted into a data mining problem of combinatorial feature selection. By collecting various single nucleotide polymorphisms (SNP) subsets as well as environmental factors generated in multiple GA runs, patterns of gene-gene and gene-environment interactions can be extracted using a simple combinatorial ranking method. Also considered in this study is the idea of combining identification results obtained from multiple algorithms. A novel formula based on pairwise <it>double fault </it>is designed to quantify the degree of complementarity.</p> <p>Conclusions</p> <p>Our simulation study demonstrates that the proposed genetic ensemble algorithm has comparable identification power to Multifactor Dimensionality Reduction (MDR) and is slightly better than Polymorphism Interaction Analysis (PIA), which are the two most popular methods for gene-gene interaction identification. More importantly, the identification results generated by using our genetic ensemble algorithm are highly complementary to those obtained by PIA and MDR. Experimental results from our simulation studies and real world data application also confirm the effectiveness of the proposed genetic ensemble algorithm, as well as the potential benefits of combining identification results from different algorithms.</p

    TRIO LOGIC REGRESSION - DETECTION OF SNP - SNP INTERACTIONS IN CASE-PARENT TRIOS

    Get PDF
    Statistical approaches to evaluate higher order SNP-SNP and SNP-environment interactions are critical in genetic association studies, as susceptibility to complex disease is likely to be related to the interaction of multiple SNPs and environmental factors. Logic regression (Kooperberg et al., 2001; Ruczinski et al., 2003) is one such approach, where interactions between SNPs and environmental variables are assessed in a regression framework, and interactions become part of the model search space. In this manuscript we extend the logic regression methodology, originally developed for cohort and case-control studies, for studies of trios with affected probands. Trio logic regression accounts for the linkage disequilibrium (LD) structure in the genotype data, and accommodates missing genotypes via haplotype-based imputation. We also derive an efficient algorithm to simulate case-parent trios where genetic risk is determined via epistatic interactions

    Large–scale data–driven network analysis of human–plasmodium falciparum interactome: extracting essential targets and processes for malaria drug discovery

    Get PDF
    Background: Plasmodium falciparum malaria is an infectious disease considered to have great impact on public health due to its associated high mortality rates especially in sub Saharan Africa. Falciparum drugresistant strains, notably, to chloroquine and sulfadoxine-pyrimethamine in Africa is traced mainly to Southeast Asia where artemisinin resistance rate is increasing. Although careful surveillance to monitor the emergence and spread of artemisinin-resistant parasite strains in Africa is on-going, research into new drugs, particularly, for African populations, is critical since there is no replaceable drug for artemisinin combination therapies (ACTs) yet. Objective: The overall objective of this study is to identify potential protein targets through host–pathogen protein–protein functional interaction network analysis to understand the underlying mechanisms of drug failure and identify those essential targets that can play their role in predicting potential drug candidates specific to the African populations through a protein-based approach of both host and Plasmodium falciparum genomic analysis. Methods: We leveraged malaria-specific genome wide association study summary statistics data obtained from Gambia, Kenya and Malawi populations, Plasmodium falciparum selective pressure variants and functional datasets (protein sequences, interologs, host-pathogen intra-organism and host-pathogen inter-organism protein-protein interactions (PPIs)) from various sources (STRING, Reactome, HPID, Uniprot, IntAct and literature) to construct overlapping functional network for both host and pathogen. Developed algorithms and a large-scale data-driven computational framework were used in this study to analyze the datasets and the constructed networks to identify densely connected subnetworks or hubs essential for network stability and integrity. The host-pathogen network was analyzed to elucidate the influence of parasite candidate key proteins within the network and predict possible resistant pathways due to host-pathogen candidate key protein interactions. We performed biological and pathway enrichment analysis on critical proteins identified to elucidate their functions. In order to leverage disease-target-drug relationships to identify potential repurposable already approved drug candidates that could be used to treat malaria, pharmaceutical datasets from drug bank were explored using semantic similarity approach based of target–associated biological processes Results: About 600,000 significant SNPs (p-value< 0.05) from the summary statistics data were mapped to their associated genes, and we identified 79 human-associated malaria genes. The assembled parasite network comprised of 8 clusters containing 799 functional interactions between 155 reviewed proteins of which 5 clusters contained 43 key proteins (selective variants) and 2 clusters contained 2 candidate key proteins(key proteins characterized by high centrality measure), C6KTB7 and C6KTD2. The human network comprised of 32 clusters containing 4,133,136 interactions between 20,329 unique reviewed proteins of which 7 clusters contained 760 key proteins and 2 clusters contained 6 significant human malaria-associated candidate key proteins or genes P22301 (IL10), P05362 (ICAM1), P01375 (TNF), P30480 (HLA-B), P16284 (PECAM1), O00206 (TLR4). The generated host-pathogen network comprised of 31,512 functional interactions between 8,023 host and pathogen proteins. We also explored the association of pfk13 gene within the host-pathogen. We observed that pfk13 cluster with host kelch–like proteins and other regulatory genes but no direct association with our identified host candidate key malaria targets. We implemented semantic similarity based approach complemented by Kappa and Jaccard statistical measure to identify 115 malaria–similar diseases and 26 potential repurposable drug hits that can be 3 appropriated experimentally for malaria treatment. Conclusion: In this study, we reviewed existing antimalarial drugs and resistance–associated variants contributing to the diminished sensitivity of antimalarials, especially chloroquine, sulfadoxine-pyrimethamine and artemisinin combination therapy within the African population. We also described various computational techniques implemented in predicting drug targets and leads in drug research. In our data analysis, we showed that possible mechanisms of resistance to artemisinin in Africa may arise from the combinatorial effects of many resistant genes to chloroquine and sulfadoxine–pyrimethamine. We investigated the role of pfk13 within the host–pathogen network. We predicted key targets that have been proposed to be essential for malaria drug and vaccine development through structural and functional analysis of host and pathogen function networks. Based on our analysis, we propose these targets as essential co-targets for combinatorial malaria drug discovery

    Discovering Higher-order SNP Interactions in High-dimensional Genomic Data

    Get PDF
    In this thesis, a multifactor dimensionality reduction based method on associative classification is employed to identify higher-order SNP interactions for enhancing the understanding of the genetic architecture of complex diseases. Further, this thesis explored the application of deep learning techniques by providing new clues into the interaction analysis. The performance of the deep learning method is maximized by unifying deep neural networks with a random forest for achieving reliable interactions in the presence of noise
    corecore