121 research outputs found

    Capturing the Spectrum of Interaction Effects in Genetic Association Studies by Simulated Evaporative Cooling Network Analysis

    Evidence from human genetic studies of several disorders suggests that interactions between alleles at multiple genes play an important role in influencing phenotypic expression. Analytical methods for identifying Mendelian disease genes are not appropriate when applied to common multigenic diseases, because such methods investigate association with the phenotype only one genetic locus at a time. New strategies are needed that can capture the spectrum of genetic effects, from Mendelian to multifactorial epistasis. Random Forests (RF) and Relief-F are two powerful machine-learning methods that have been studied as filters for genetic case-control data due to their ability to account for the context of alleles at multiple genes when scoring the relevance of individual genetic variants to the phenotype. However, when variants interact strongly, the independence assumption of RF in the tree node-splitting criterion leads to diminished importance scores for relevant variants. Relief-F, on the other hand, was designed to detect strong interactions but is sensitive to large backgrounds of variants that are irrelevant to classification of the phenotype, which is an acute problem in genome-wide association studies. To overcome the weaknesses of these data mining approaches, we develop Evaporative Cooling (EC) feature selection, a flexible machine learning method that can integrate multiple importance scores while removing irrelevant genetic variants. To characterize detailed interactions, we construct a genetic-association interaction network (GAIN), whose edges quantify the synergy between variants with respect to the phenotype. We use simulation analysis to show that EC is able to identify a wide range of interaction effects in genetic association data. We apply the EC filter to a smallpox vaccine cohort study of single nucleotide polymorphisms (SNPs) and infer a GAIN for a collection of SNPs associated with adverse events. 
Our results suggest an important role for hubs in SNP disease susceptibility networks. The software is available at http://sites.google.com/site/McKinneyLab/software

    A Computational Model of Quantitative Chromatin Immunoprecipitation (ChIP) Analysis

    Chromatin immunoprecipitation (ChIP) analysis is widely used to identify the locations in genomes occupied by transcription factors (TFs). The approach involves chemical cross-linking of DNA with associated proteins, fragmentation of chromatin by sonication or enzymatic digestion, immunoprecipitation of the fragments containing the protein of interest, and then PCR or hybridization analysis to characterize and quantify the genomic sequences enriched. We developed a computational model of quantitative ChIP analysis to elucidate the factors contributing to the method's resolution. The most important variables identified by the model were, in order of importance, the spacing of the PCR primers, the mean length of the chromatin fragments, and, unexpectedly, the type of fragment width distribution, with very small DNA fragments and smaller amplicons providing the best resolution of TF binding. One of the major predictions of the model was also validated experimentally.

    Alternative contingency table measures improve the power and detection of multifactor dimensionality reduction

    Background: Multifactor Dimensionality Reduction (MDR) has been introduced previously as a non-parametric statistical method for detecting gene-gene interactions. MDR performs a dimensional reduction by assigning multi-locus genotypes to either high- or low-risk groups and measuring the percentage of cases and controls incorrectly labelled by this classification, known as the classification error. The combination of variables that produces the lowest classification error is selected as the best or most fit model. The correctly and incorrectly labelled cases and controls can be expressed as a two-way contingency table. We sought to improve the ability of MDR to detect gene-gene interactions by replacing classification error with a different measure to score model quality. Results: In this study, we compare the detection and power of MDR using a variety of measures for two-way contingency table analysis. We simulated 40 genetic models, varying the number of disease loci in the model (2–5), the allele frequencies of the disease loci (0.2/0.8 or 0.4/0.6), and the broad-sense heritability of the model (0.05–0.3). Overall, detection using NMI was 65.36% across all models and specific detection was 59.4%, versus 62% detection and 52.2% specific detection using classification error. Conclusion: Of the 10 measures evaluated, the likelihood ratio and normalized mutual information (NMI) consistently improve the detection and power of MDR in simulated data over classification error. These measures also reduce the inclusion of spurious variables in a multi-locus model. Thus, MDR, which has already been demonstrated as a powerful tool for detecting gene-gene interactions, can be improved with the use of alternative fitness functions.
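To make the contrast in the abstract above concrete, the following minimal Python sketch scores one two-way contingency table with both classification error and a normalized mutual information. The function names, the cell layout (tp, fp, fn, tn), and the choice to normalize by the entropy of the true case/control status are assumptions made for illustration; the abstract does not state the exact NMI normalization the authors used.

```python
from math import log2


def classification_error(tp, fp, fn, tn):
    """Fraction of cases and controls mislabelled by the high/low-risk split."""
    total = tp + fp + fn + tn
    return (fp + fn) / total


def entropy(probs):
    """Shannon entropy in bits; zero-probability cells contribute nothing."""
    return -sum(p * log2(p) for p in probs if p > 0)


def normalized_mutual_information(tp, fp, fn, tn):
    """Mutual information between predicted risk group and true status,
    divided by the entropy of the true status (an assumed normalization)."""
    total = tp + fp + fn + tn
    joint = [tp / total, fp / total, fn / total, tn / total]
    predicted = [(tp + fp) / total, (fn + tn) / total]  # high-risk, low-risk
    actual = [(tp + fn) / total, (fp + tn) / total]     # cases, controls
    mutual_info = entropy(predicted) + entropy(actual) - entropy(joint)
    h_actual = entropy(actual)
    return mutual_info / h_actual if h_actual > 0 else 0.0


if __name__ == "__main__":
    # Hypothetical table: 70 cases and 60 controls labelled correctly.
    table = dict(tp=70, fp=40, fn=30, tn=60)
    print("classification error:", classification_error(**table))
    print("NMI:", normalized_mutual_information(**table))
```

Under this sketch, a model that separates cases from controls perfectly has zero error and an NMI of 1, while a table with no association has an NMI of 0, so a higher NMI plays the role that a lower classification error plays in standard MDR.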

    Seasonal Occurrence, Horizontal Movements, and Habitat Use Patterns of Whale Sharks (Rhincodon typus) in the Gulf of Mexico

    In the northern Gulf of Mexico (GOM), whale sharks (Rhincodon typus) form large aggregations at continental shelf-edge banks during summer; however, knowledge of movements once they leave aggregation sites is limited. Here we report on the seasonal occurrence of whale sharks in the northern GOM based on over 800 whale shark sightings from 1989 to 2016, as well as the movements of 42 whale sharks tagged with satellite-linked and popup satellite archival transmitting tags from 2008 to 2015. Sightings data were most numerous during summer and fall, often with aggregations of individuals reported along the continental shelf break. Most sharks (66%) were tagged during this time at Ewing Bank, a known aggregation site off the coast of Louisiana. Whale shark track duration ranged from three to 366 days and all tagged individuals, which ranged from 4.5 to 12.0 m in total length, remained within the GOM. Sightings data revealed that whale sharks occurred primarily in continental shelf and shelf-edge waters (81%) whereas tag data revealed the sharks primarily inhabited continental slope and open ocean waters (91%) of the GOM. Much of their time spent in open ocean waters was associated with the edge of the Loop Current and associated mesoscale eddies. During cooler months, there was a net movement southward, corresponding with the time of reduced sighting reports. Several sharks migrated to the southwest GOM during fall and winter, suggesting this region could be important overwintering habitat and possibly represents another seasonal aggregation site. The three long-term tracked whale sharks exhibited interannual site fidelity, returning one year later to the vicinity where they were originally tagged. The increased habitat use of north central GOM waters by whale sharks as summer foraging grounds and potential interannual site fidelity to Ewing Bank demonstrate the importance of this region for this species.

    Identification and replication of RNA-Seq gene network modules associated with depression severity

    Genomic variation underlying major depressive disorder (MDD) likely involves the interaction and regulation of multiple genes in a network. Data-driven co-expression network module inference has the potential to account for variation within regulatory networks, reduce the dimensionality of RNA-Seq data, and detect significant gene expression modules associated with depression severity. We performed an RNA-Seq gene co-expression network analysis of mRNA data obtained from the peripheral blood mononuclear cells of unmedicated MDD (n = 78) and healthy control (HC; n = 79) subjects. Across the combined MDD and HC groups, we assigned genes into modules using hierarchical clustering with a dynamic tree cut method and projected the expression data onto a lower-dimensional module space by computing the single-sample gene set enrichment score of each module. We tested the single-sample scores of each module for association with levels of depression severity measured by the Montgomery-Åsberg Depression Scale (MADRS). Independent of MDD status, we identified 23 gene modules from the co-expression network. Two modules were significantly associated with the MADRS score after multiple comparison adjustment (adjusted p = 0.009, 0.028 at 0.05 FDR threshold), and one of these modules replicated in a previous RNA-Seq study of MDD (p = 0.03). The two MADRS-associated modules contain genes previously implicated in mood disorders and show enrichment of apoptosis and B cell receptor signaling. The genes in these modules show a correlation between network centrality and univariate association with depression, suggesting that intramodular hub genes are more likely to be related to MDD compared to other genes in a module.

    A yeast phenomic model for the gene interaction network modulating CFTR-ΔF508 protein biogenesis

    Background: The overall influence of gene interaction in human disease is unknown. In cystic fibrosis (CF) a single allele of the cystic fibrosis transmembrane conductance regulator (CFTR-ΔF508) accounts for most of the disease. In cell models, CFTR-ΔF508 exhibits defective protein biogenesis and degradation rather than proper trafficking to the plasma membrane where CFTR normally functions. Numerous genes function in the biogenesis of CFTR and influence the fate of CFTR-ΔF508. However, it is not known whether genetic variation in such genes contributes to disease severity in patients. Nor is there an easy way to study how numerous gene interactions involving CFTR-ΔF would manifest phenotypically. Methods: To gain insight into the function and evolutionary conservation of a gene interaction network that regulates biogenesis of a misfolded ABC transporter, we employed yeast genetics to develop a 'phenomic' model, in which the CFTR-ΔF508-equivalent residue of a yeast homolog is mutated (Yor1-ΔF670), and where the genome is scanned quantitatively for interaction. We first confirmed that Yor1-ΔF undergoes protein misfolding and has reduced half-life, analogous to CFTR-ΔF. Gene interaction was then assessed quantitatively by growth curves for approximately 5,000 double mutants, based on alteration in the dose response to growth inhibition by oligomycin, a toxin extruded from the cell at the plasma membrane by Yor1. Results: From a comparative genomic perspective, yeast gene interactions influencing Yor1-ΔF biogenesis were representative of human homologs previously found to modulate processing of CFTR-ΔF in mammalian cells. Additional evolutionarily conserved pathways were implicated by the study, and a ΔF-specific pro-biogenesis function of the recently discovered ER membrane complex (EMC) was evident from the yeast screen. This novel function was validated biochemically by siRNA of an EMC ortholog in a human cell line expressing CFTR-ΔF508. The precision and accuracy of quantitative high throughput cell array phenotyping (Q-HTCP), which captures tens of thousands of growth curves simultaneously, provided powerful resolution to measure gene interaction on a phenomic scale, based on discrete cell proliferation parameters. Conclusion: We propose phenomic analysis of Yor1-ΔF as a model for investigating gene interaction networks that can modulate cystic fibrosis disease severity. Although the clinical relevance of the Yor1-ΔF gene interaction network for cystic fibrosis remains to be defined, the model appears to be informative with respect to human cell models of CFTR-ΔF. Moreover, the general strategy of yeast phenomics can be employed in a systematic manner to model gene interaction for other diseases relating to pathologies that result from protein misfolding or potentially any disease involving evolutionarily conserved genetic pathways.

    A Nonlinear Simulation Framework Supports Adjusting for Age When Analyzing BrainAGE

    Several imaging modalities, including T1-weighted structural imaging, diffusion tensor imaging, and functional MRI can show chronological age related changes. Employing machine learning algorithms, an individual's imaging data can predict their age with reasonable accuracy. While details vary according to modality, the general strategy is to: (1) extract image-related features, (2) build a model on a training set that uses those features to predict an individual's age, (3) validate the model on a test dataset, producing a predicted age for each individual, (4) define the “Brain Age Gap Estimate” (BrainAGE) as the difference between an individual's predicted age and his/her chronological age, (5) estimate the relationship between BrainAGE and other variables of interest, and (6) make inferences about those variables and accelerated or delayed brain aging. For example, a group of individuals with overall positive BrainAGE may show signs of accelerated aging in other variables as well. There is inevitably an overestimation of the age of younger individuals and an underestimation of the age of older individuals due to “regression to the mean.” The correlation between chronological age and BrainAGE may significantly impact the relationship between BrainAGE and other variables of interest when they are also related to age. In this study, we examine the detectability of variable effects under different assumptions. We use empirical results from two separate datasets [training = 475 healthy volunteers, aged 18–60 years (259 female); testing = 489 participants including people with mood/anxiety, substance use, eating disorders and healthy controls, aged 18–56 years (312 female)] to inform simulation parameter selection. Outcomes in simulated and empirical data strongly support the proposal that models incorporating BrainAGE should include chronological age as a covariate. We propose either including age as a covariate in step 5 of the above framework, or employing a multistep procedure where age is regressed on BrainAGE prior to step 5, producing BrainAGE Residualized (BrainAGER) scores.
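The multistep procedure described in the abstract above can be sketched in a few lines of Python. This is an illustrative reading, not the authors' code: it assumes the residualization is an ordinary least-squares fit of BrainAGE as a linear function of chronological age, with the residuals taken as BrainAGER. The function name `residualize_brain_age_gap` and the example numbers are hypothetical.

```python
def residualize_brain_age_gap(brain_age_gap, age):
    """Return residuals of a simple linear fit of BrainAGE on chronological age.

    Assumption: BrainAGER is what remains of BrainAGE after removing its
    linear dependence on age (ordinary least squares, one predictor).
    """
    n = len(age)
    mean_age = sum(age) / n
    mean_gap = sum(brain_age_gap) / n
    # Slope and intercept of the least-squares line gap ~ intercept + slope * age.
    sxx = sum((a - mean_age) ** 2 for a in age)
    sxy = sum((a - mean_age) * (g - mean_gap) for a, g in zip(age, brain_age_gap))
    slope = sxy / sxx
    intercept = mean_gap - slope * mean_age
    return [g - (intercept + slope * a) for a, g in zip(age, brain_age_gap)]


if __name__ == "__main__":
    # Hypothetical data showing regression to the mean: younger people are
    # overestimated (positive gap), older people underestimated (negative gap).
    ages = [20, 30, 40, 50, 60]
    predicted = [31, 34, 41, 44, 50]
    gaps = [p - a for p, a in zip(predicted, ages)]  # BrainAGE = predicted - actual
    print("BrainAGE: ", gaps)
    print("BrainAGER:", residualize_brain_age_gap(gaps, ages))
```

By construction the residuals are uncorrelated with age in the sample used for the fit, which is the property that makes them safer to relate to other age-dependent variables in step 5.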

    Population Connectivity of Pelagic Megafauna in the Cuba-Mexico-United States Triangle

    The timing and extent of international crossings by billfishes, tunas, and sharks in the Cuba-Mexico-United States (U.S.) triangle was investigated using electronic tagging data from eight species that resulted in >22,000 tracking days. Transnational movements of these highly mobile marine predators were pronounced with varying levels of bi- or tri-national population connectivity displayed by each species. Billfishes and tunas moved throughout the Gulf of Mexico and all species investigated (blue marlin, white marlin, Atlantic bluefin tuna, yellowfin tuna) frequently crossed international boundaries and entered the territorial waters of Cuba and/or Mexico. Certain sharks (tiger shark, scalloped hammerhead) displayed prolonged periods of residency in U.S. waters with more limited displacements, while whale sharks and to a lesser degree shortfin mako moved through multiple jurisdictions. The spatial extent of associated movements was generally associated with their differential use of coastal and open ocean pelagic ecosystems. Species with the majority of daily positions in oceanic waters off the continental shelf showed the greatest tendency for transnational movements and typically traveled farther from initial tagging locations. Several species converged on a common seasonal movement pattern between territorial waters of the U.S. (summer) and Mexico (winter).