2 research outputs found

    Genetic association analysis of complex diseases through information theoretic metrics and linear pleiotropy

    The main goal of this thesis was to help in the identification of genetic variants that are responsible for complex traits, combining both linear and nonlinear approaches. First, two one-locus approaches were proposed. The first one defined and characterized a novel nonlinear test of genetic association, based on the mutual information measure. This test takes into account the genetic structure of the population. It was applied to the GAW17 dataset and compared to the standard linear test of association. Since the solution of the GAW17 simulation model was known, this study served to characterize the performance of the proposed nonlinear methods in comparison to the linear one. The proposed nonlinear test was able to recover the results obtained with linear methods but also detected an additional SNP in a gene related to the phenotype. In addition, the performance of both tests in terms of their accuracy in classification (AUC) was similar. In contrast, the second approach was an exploratory study on the relationship between SNP variability among species and SNP association with disease, at different genetic regions. Two sets of SNPs were compared, one containing deleterious SNPs and the other defined by neutral SNPs. Both sets were stratified depending on the region where the polymorphisms were located, a feature that may have influenced their conservation across species. It was observed that, for most functional regions, SNPs associated with diseases tend to be significantly less variable across species than neutral SNPs. Second, a novel nonlinear methodology for multi-loci genetic association was proposed with the goal of detecting association between combinations of SNPs and a phenotype. The proposed method, called MISS, was based on the mutual information of statistical significance. This approach was compared with MLR, the standard linear method used for genetic association based on multiple linear regressions. Both were applied as relevance criteria in a new multi-solution floating feature selection algorithm (MSSFFS), proposed in the context of multi-loci genetic association for complex diseases. Both were also compared with MECPM, an algorithm for searching predictive multi-loci interactions with a criterion of maximum entropy. The three methods were tested on the SNPs of the F7 gene and the FVII levels in blood, using data from the GAIT project. The proposed nonlinear method (MISS) improved the results of traditional genetic association methods, detecting new SNP-SNP interactions. Most of the obtained sets of SNPs were in concordance with functional results found in the literature, where these SNPs have been described as functional elements correlated with the phenotype. Third, a linear methodological framework for the simultaneous study of several phenotypes was proposed. The methodology consisted of building new phenotypic variables, named metaphenotypes, that capture the joint activity of sets of phenotypes involved in a metabolic pathway. These new variables were used in further association tests with the aim of identifying genetic elements related to the underlying biological process as a whole. As a practical implementation, the methodology was applied to the GAIT project dataset with the aim of identifying genetic markers that could be related to the coagulation process as a whole and thus to thrombosis. Three mathematical models were used for the definition of metaphenotypes, corresponding to one PCA and two ICA models. Using this novel approach, already known associations were retrieved, and new candidates were proposed as regulatory genes with a global effect on the coagulation pathway.

    Variability Of Pink Salmon Family Size Has Implications For Conservation And Management Models

    Thesis (Ph.D.) University of Alaska Fairbanks, 2002. In several populations of pink salmon, the short-term dynamics of population size were related to both the mean and variance of individual family sizes, because not all families were equally productive. In the marine lifestage, population increases came disproportionately from the most productive families, especially in populations with the highest average marine survival. Moreover, the trait of marine survival itself had a statistically detectable genetic component. This implies that the most favored phenotypes change from generation to generation, and that the marine environment is unpredictable and changing. These results, together with laboratory studies of freshwater survival and measurements of wild pink salmon in Prince William Sound, Alaska, seemed to indicate that family-specific variation in marine survival and variation in egg retention within the redd were the most important potential influences on variation of pink salmon family size in the studied populations, when density was controlled to intermediate levels. These results provide more justification for maintaining stock sizes at intermediate or high levels, and for protecting metapopulation structure. These results also show the importance of variation and instability in the recruitment process of Pacific salmon, and highlight the inadequacy of current models of salmon recruitment, which emphasize stability and long-term averages.