363 research outputs found

    Computing Power and Sample Size for Case-Control Association Studies with Copy Number Polymorphism: Application of Mixture-Based Likelihood Ratio Test

    Recent studies suggest that copy number polymorphisms (CNPs) may play an important role in disease susceptibility and onset. Currently, the detection of CNPs mainly depends on microarray technology. For case-control studies, conventionally, subjects are assigned to a specific CNP category based on the continuous quantitative measure produced by microarray experiments, and cases and controls are then compared using a chi-square test of independence. The purpose of this work is to specify the likelihood ratio test statistic (LRTS) for case-control sampling design based on the underlying continuous quantitative measurement, and to assess its power and relative efficiency (as compared to the chi-square test of independence on CNP counts). The sample size and power formulas of both methods are given. For the latter, the CNPs are classified using the Bayesian classification rule. The LRTS is more powerful than this chi-square test for the alternatives considered, especially alternatives in which the at-risk CNP categories have low frequencies. An example of the application of the LRTS is given for a comparison of CNP distributions in individuals of Caucasian or Taiwanese ethnicity, where the LRTS appears to be more powerful than the chi-square test, possibly due to misclassification of the most common CNP category into a less common category
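As a rough illustration of the conventional comparison this abstract describes (a chi-square test of independence on CNP category counts, not the LRTS itself), the statistic can be sketched as follows. The function name and the counts are hypothetical and do not come from the paper.

```python
def chi_square_independence(table):
    """Return (Pearson chi-square statistic, degrees of freedom) for an r x c count table."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    total = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            # Expected count under independence of disease status and CNP category
            expected = row_totals[i] * col_totals[j] / total
            stat += (observed - expected) ** 2 / expected
    dof = (len(table) - 1) * (len(table[0]) - 1)
    return stat, dof


# Hypothetical counts: rows are cases and controls, columns are three CNP categories
counts = [[30, 50, 20],
          [20, 50, 30]]
stat, dof = chi_square_independence(counts)
print(stat, dof)
```

The sketch assumes every subject has already been assigned to one category, which is exactly the classification step the abstract identifies as a possible source of lost power.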

    SNP Selection in Genome-Wide Association Studies via Penalized Support Vector Machine with MAX Test

    One of the main objectives of a genome-wide association study (GWAS) is to develop a prediction model for a binary clinical outcome using single-nucleotide polymorphisms (SNPs), which can be used for diagnostic and prognostic purposes and for better understanding of the relationship between the disease and SNPs. Penalized support vector machine (SVM) methods have been widely used toward this end. However, since investigators often ignore the genetic models of SNPs, the final model loses efficiency in predicting the clinical outcome. To overcome this problem, we propose a two-stage method in which the genetic model of each SNP is identified using the MAX test and a prediction model is then fitted using a penalized SVM method. We apply the proposed method to various penalized SVMs and compare the performance of SVMs using various penalty functions. The results from simulations and real GWAS data analysis show that the proposed method performs better than prediction methods that ignore the genetic models in terms of prediction power and selectivity
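The first stage described above can be sketched roughly as follows, assuming the common formulation of the MAX test as the largest absolute Cochran-Armitage trend statistic over recessive, additive and dominant genotype scores. This is an assumption about the general technique, not the authors' implementation. All names and counts are hypothetical, and the p-value of the MAX statistic (which must account for the correlation between the three statistics) is not computed.

```python
import math

# Scores for genotypes (aa, Aa, AA) under each assumed genetic model of risk allele A
MODEL_SCORES = {
    "recessive": (0.0, 0.0, 1.0),
    "additive": (0.0, 0.5, 1.0),
    "dominant": (0.0, 1.0, 1.0),
}


def trend_z(cases, controls, scores):
    """Cochran-Armitage trend Z statistic for genotype counts in cases and controls."""
    n_cases, n_controls = sum(cases), sum(controls)
    n = n_cases + n_controls
    totals = [r + s for r, s in zip(cases, controls)]
    t = sum(x * (n_controls * r - n_cases * s)
            for x, r, s in zip(scores, cases, controls))
    var = (n_cases * n_controls / n) * (
        n * sum(x * x * m for x, m in zip(scores, totals))
        - sum(x * m for x, m in zip(scores, totals)) ** 2
    )
    if var <= 0:
        raise ValueError("no genotype variability under these scores")
    return t / math.sqrt(var)


def max_test(cases, controls):
    """Return the model with the largest |Z| and the Z statistic for every model."""
    z = {model: trend_z(cases, controls, scores)
         for model, scores in MODEL_SCORES.items()}
    best = max(z, key=lambda model: abs(z[model]))
    return best, z


# Hypothetical genotype counts (aa, Aa, AA) for one SNP
best_model, z_values = max_test(cases=[20, 40, 40], controls=[50, 25, 25])
print(best_model, z_values)
```

In a two-stage pipeline of the kind the abstract outlines, the selected model would determine how each SNP is numerically coded before the penalized SVM is fitted.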

    Population Genetics And Mixed Stock Analysis Of Chum Salmon (Oncorhynchus Keta) With Molecular Genetics

    Thesis (Ph.D.) University of Alaska Fairbanks, 2012. Chum salmon (Oncorhynchus keta) are important for subsistence and commercial harvest in Alaska. Variability of returns to western Alaskan drainages that caused economic hardship for stakeholders has led to speculation about reasons, which may include both anthropogenic and environmental causes in the marine environment. Mixed stock analysis (MSA) compares genetic information from an individual caught at sea to a reference baseline of genotypes to assign it to its population of origin. Application of genetic baselines requires several complex steps that can introduce bias. The bias may reduce the accuracy of MSA and result in overly-optimistic evaluations of baselines. Moreover, some applications that minimize bias cannot use informative haploid mitochondrial variation. Costs of baseline development are species-specific and difficult to predict. Finally, because populations of western Alaskan chum salmon demonstrate weak genetic divergence, samples from mixtures cannot be accurately assigned to a population of origin. The chapters of this thesis address three challenges. The first chapter describes technical aspects of genetic marker development. The second chapter describes a method to evaluate the precision and accuracy of a genetic baseline that accepts any type of data and reduces bias that may have been introduced during baseline development. This chapter also includes a method that places a cost on baseline development by predicting the number of markers needed to achieve a given accuracy. The final chapter explores the reasons for the weak genetic structure of western Alaskan chum salmon populations. The results of those analyses and both geological and archaeological data suggest that recent environmental and geological processes may be involved. The methods and analyses in this thesis can be applied to any species and may be particularly useful for other western Alaskan species

    Inference Of Natural Selection In Human Populations And Cancers: Testing, Extending, And Complementing Dn/ds-Like Approaches

    Heritable traits tend to rise or fall in prevalence over time in accordance with their effect on survival and reproduction; this is the law of natural selection, the driving force behind speciation. Natural selection is both a consequence and (in cancer) a cause of disease. The new abundance of sequencing data has spurred the development of computational techniques to infer the strength of selection across a genome. One technique, dN/dS, compares mutation rates at mutation-tolerant synonymous sites with those at nonsynonymous sites to infer selection. This dissertation tests, extends, and complements dN/dS for inferring selection from sequencing data. First, I test whether the genomic community’s understanding of mutational processes is sufficient to use synonymous mutations to set expectations for nonsynonymous mutations. Second, I extend a dN/dS-like approach to the noncoding genome, where dN/dS is otherwise undefined, using conservation data among mammals. Third, I use evolutionary theory to co-develop a new technique for inferring selection within an individual patient’s tumor. Overall, this work advances our ability to infer selection pressure, prioritize disease-related genomic elements, and ultimately identify new therapeutic targets for patients suffering from a broad range of genetically-influenced diseases
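To make the dN/dS idea above concrete, here is a minimal sketch that computes the ratio from raw counts. It assumes the simplest possible form (substitutions per site, with no correction for multiple hits or mutational context), which is a simplification and not the dissertation's method. The function name and the numbers are hypothetical.

```python
def dn_ds(nonsyn_subs, nonsyn_sites, syn_subs, syn_sites):
    """Ratio of nonsynonymous to synonymous substitution rates per site (uncorrected)."""
    if nonsyn_sites <= 0 or syn_sites <= 0:
        raise ValueError("site counts must be positive")
    dn = nonsyn_subs / nonsyn_sites
    ds = syn_subs / syn_sites
    if ds == 0:
        raise ValueError("dS is zero, so the ratio is undefined")
    return dn / ds


# Hypothetical gene: 6 nonsynonymous changes over 300 sites, 10 synonymous over 100 sites
ratio = dn_ds(nonsyn_subs=6, nonsyn_sites=300, syn_subs=10, syn_sites=100)
print(ratio)
```

A ratio below 1 is conventionally read as purifying selection, near 1 as neutrality, and above 1 as positive selection. That reading holds only if synonymous sites are a fair neutral baseline, which is the assumption the first part of the dissertation sets out to test.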

    Variations in Carbon Monoxide, Nitric Oxide, and Detoxification Genes, Interactions with Maternal Smoking, and Associations with Preeclampsia: A Mother-Child Dyad Analysis

    Preeclampsia is a serious pregnancy complication with limited treatment. Etiology is hypothesized to originate during placentation, and may have both maternal and fetal contributions. There is a well-established enigmatic inverse relationship between maternal smoking and preeclampsia. A plausible biological explanation for this relationship is through response to cigarette smoke components, via vasodilation or activation of smoking detoxification pathways. Examining genes in these pathways and their modification by smoking, while incorporating maternal and child genetic contributions, could provide support for a genetic or related biological mechanism. We conducted a nested case-control study within the Norwegian Mother and Child Birth Cohort of 1,545 case-pairs and 995 control-pairs from 2,540 validated dyads (2,011 complete pairs, 529 missing mother or child genotype). For aim 1, we selected 1,518 single nucleotide polymorphisms (SNPs) in nitric oxide and carbon monoxide signaling pathways. For aim 2, we analyzed these and 397 additional SNPs in smoking detoxification pathways for their modification by maternal smoking during placentation. We used log-linear Poisson regression models and likelihood ratio tests to assess maternal and child effects and included a SNP by smoking interaction term to assess maternal and child genotype-smoking interactions. The child variant, rs12547243 in adenylate cyclase 8 (ADCY8), was associated with an increased risk (RR=1.42 [95% CI: 1.20, 1.69] for AG vs GG, RR=2.14 [1.47, 3.11] for AA vs GG, Q=0.03). We also found suggestive associations of SNPs in PDE1C for preeclampsia sub-phenotypes. We found limited evidence for multiplicative SNP by smoking interaction after correction for multiple comparisons. This study uses a novel approach to disentangle maternal and child genotypic effects of smoking-related genes on preeclampsia. 
Our findings do not provide strong support that the inverse smoking-preeclampsia relationship is due to a genetic effect in these pathways, although our power was limited due to the low prevalence of smoking in this population. Dyad methods and gene-environment interaction analysis may be useful for the study of pregnancy outcomes, particularly preeclampsia. Larger populations, such as multi-cohort consortia, combined with these evolving methods may be necessary to dissect this enigmatic association. Doctor of Philosophy

    Addressing Issues in the Detection of Gene-Environment Interaction Through the Study of Conduct Disorder

    This work addresses issues in the study of gene-environment interaction (GxE) through research of conduct disorder (CD) among adolescents and extends the recent report of significant GxE and subsequent replication studies. The analysis used a sub-sample of 1,299 individual participants/649 twin pairs and their parents from the Virginia Twin Study of Adolescent and Behavioral Development for whom Monoamine Oxidase A (MAOA) genotype, diagnosis of CD, maternal antisocial personality symptoms, and household neglect were obtained. This dissertation (1) tested for GxE by gender using MAOA and childhood adversity using multiple approaches to CD measurement and model assessment, (2) determined whether other mechanisms would explain differences in GxE by gender, and (3) identified and assessed other genes and environments related to the interaction of MAOA and childhood adversity. Using a multiple regression approach, a main effect of the low/low MAOA genotype remained after controlling for other risk factors in females. However, the effects of GxE were modest and were removed by transforming the environmental measures. In contrast, there was no significant effect of the low activity MAOA allele in males, although significant GxE was detected and remained after transformation. The sign of the interaction for males was opposite to that for females, indicating genetic sensitivity to childhood adversity may differ by gender. Upon further investigation, gender differences in GxE were due to genotype-sex interaction and may involve MAOA. A Markov Chain Monte Carlo approach including a genetic Item Response Theory model treated CD as a trait with continuous liability, since false detection of GxE may result from measurement. In males and females, the inclusion of GxE while controlling for the other covariates was appropriate, but it yielded little improvement in model fit, and effect sizes of GxE were small. 
Other candidate genes functioning in the serotonin and dopamine neurotransmitter systems were tested for interaction with MAOA to affect risk for CD. Main genetic effects of dopamine transporter genotype and MAOA in the presence of comorbidity were detected. No epistatic effects were detected. The use of random forests systematically assessed the environment and produced several interesting environments that will require more thoughtful consideration before incorporation into a model testing GxE

    Strategies for Genome-Wide Association Analyses of Raw Copy Number Variation Data

    Copy number variations (CNVs), a type of genetic variation in which a large sequence of nucleotides is repeated in tandem a variable number of times among different individuals of one population, have gained much attention with regard to human phenotypic diversity. Recent efforts to map human structural variation have shown that CNVs affect a significantly larger proportion of the human genome than single nucleotide polymorphisms (SNPs). This gave rise to the idea that CNVs play an important role in explaining some of the large proportion of the phenotypic variance in a population that is due to genetic factors and that could not yet be explained by common SNPs. Current data from SNP genotyping arrays were found to be useful not only for the genome-wide genotyping of SNPs, but also for the detection of CNVs. However, because the accuracy of CNV detection is still mostly inadequate and methods for association testing are scarce, designing a genome-wide CNV association study can be a challenge. This thesis explored four strategies for the genome-wide association analysis of raw CNV data derived from the Affymetrix Genome-Wide Human SNP Array 6.0. Initially, the two most commonly used strategic approaches are presented and applied to real data examples for the phenotypes early-onset extreme obesity and childhood attention-deficit/hyperactivity disorder (ADHD). On the one hand, raw intensity values reflecting individual copy numbers are directly tested for an association with the risk of disease, without providing or making use of any information about CNV genotypes. On the other hand, genome-wide CNV analyses are performed as a two-step procedure, first calling individual CNV genotypes and then using these to test for CNV-phenotype associations. 
Secondly, two extensions of the standard strategies are introduced, each forming its own strategy intended to overcome problems and weaknesses of the respective widely used strategy. One proposed strategy accounts for the fact that thousands of array-provided CNV markers are located in genomic regions without underlying copy number variability, and thus suggests testing only a pre-selected set of relevant and informative intensity values for associations in order to relax the multiple testing issue. The second proposed strategy addresses the known inaccuracy of CNV calling, especially in common CNV regions, which is often caused to some extent by the high CNV population frequency and the consequent inadequacy of estimating CNV genotypes relative to the sample's mean or median hybridization intensity values. Instead, the use of intensity reference values estimated in a Gaussian mixture model framework, called MCMR, is investigated in application to data examples for the HapMap and replicate samples as well as to the previously analysed obesity data set. The obesity sample was analysed with all four genome-wide CNV analysis strategies, which allowed a comparison of the strategies' applicability and performance. The four strategies were observed to vary greatly in terms of computing effort and genetic results. Whereas one of the two standard strategies was successful in identifying rare CNVs at the PARK2 locus that were genome-wide statistically significantly associated with ADHD in children, neither of these two strategies detected any CNV-obesity association. In contrast, alternative MCMR reference intensity values showed improved reliability of CNV calls compared to standard calling in terms of stability, reproducibility and false positive rates. 
As a consequence, a novel common CNV for early-onset extreme obesity on chromosome 11q11 was identified by applying the proposed analysis strategies. Moreover, a common deletion at chromosome 10q11.22, which was previously reported to be associated with body mass index (BMI), was also replicated using one of the proposed strategies. The results suggest that the choice of the genome-wide CNV association analysis strategy may greatly influence genetic results. The strategic investigations presented here give an overview of aspects to consider when planning a genome-wide CNV analysis pipeline, but do not allow general recommendations towards an optimal design

    Pharmacogenetic epidemiology of statins in an ageing population
