86 research outputs found

    A power study of bivariate LOD score analysis of a complex trait and fear/discomfort with strangers

    Complex diseases are often reported along with disease-related traits (DRT). Sometimes investigators consider the disease and DRT phenotypes separately, and sometimes they consider individuals as affected if they have either the disease or the DRT, or both. We propose instead to consider the joint distribution of the disease and the DRT and do a linkage analysis assuming a pleiotropic model. We evaluated our results through analysis of the simulated datasets provided by Genetic Analysis Workshop 14. We first conducted univariate linkage analysis of the simulated disease, Kofendrerd Personality Disorder, and one of its simulated associated traits, phenotype b (fear/discomfort with strangers). Subsequently, we considered the bivariate phenotype, which combined the information on Kofendrerd Personality Disorder and fear/discomfort with strangers. We developed a program to perform bivariate linkage analysis using an extension to the Elston-Stewart peeling method of likelihood calculation. Using this program, we considered the microsatellites within 30 cM of the gene pleiotropic for this simulated disease and DRT. Based on 100 simulations of 300 families, we observed excellent power to detect linkage within 10 cM of the disease locus using the DRT and the bivariate trait.

    Locating disease genes using Bayesian variable selection with the Haseman-Elston method

    BACKGROUND: We applied stochastic search variable selection (SSVS), a Bayesian model selection method, to the simulated data of Genetic Analysis Workshop 13. We used SSVS with the revisited Haseman-Elston method to find the markers linked to the loci determining change in cholesterol over time. To study gene-gene interaction (epistasis) and gene-environment interaction, we adopted prior structures, which incorporate the relationship among the predictors. This allows SSVS to search in the model space more efficiently and avoid the less likely models. RESULTS: In applying SSVS, instead of looking at the posterior distribution of each of the candidate models, which is sensitive to the setting of the prior, we ranked the candidate variables (markers) according to their marginal posterior probability, which was shown to be more robust to the prior. Compared with traditional methods that consider one marker at a time, our method considers all markers simultaneously and obtains more favorable results. CONCLUSIONS: We showed that SSVS is a powerful method for identifying linked markers using the Haseman-Elston method, even for weak effects. SSVS is very effective because it does a smart search over the entire model space
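
For readers unfamiliar with the regression that the SSVS search builds on, the following is a minimal sketch of the original single-marker Haseman-Elston regression. It is not the revisited variant or the SSVS procedure used in the study. It regresses the squared sib-pair trait difference on the estimated proportion of alleles shared identical by descent (IBD); a negative slope is suggestive of linkage. The function name and the toy data are hypothetical.

```python
def haseman_elston_slope(trait_pairs, ibd_share):
    """Least-squares slope of squared sib-pair trait difference on IBD sharing.

    trait_pairs: list of (trait_sib1, trait_sib2) tuples, one per sib pair.
    ibd_share:   estimated proportion of alleles shared IBD at the marker,
                 one value per sib pair (0.0, 0.5, or 1.0 when fully known).
    """
    if len(trait_pairs) != len(ibd_share) or len(ibd_share) < 2:
        raise ValueError("need matching inputs with at least two sib pairs")
    # Response: squared trait difference within each sib pair.
    y = [(a - b) ** 2 for a, b in trait_pairs]
    n = len(y)
    mean_x = sum(ibd_share) / n
    mean_y = sum(y) / n
    sxx = sum((x - mean_x) ** 2 for x in ibd_share)
    if sxx == 0:
        raise ValueError("IBD sharing has no variation across pairs")
    sxy = sum((x - mean_x) * (v - mean_y) for x, v in zip(ibd_share, y))
    return sxy / sxx


# Toy data (hypothetical): pairs sharing more alleles IBD are more alike.
pairs = [(1.0, 3.0), (2.0, 3.5), (2.0, 2.5), (4.0, 4.0)]
ibd = [0.0, 0.5, 0.5, 1.0]
slope = haseman_elston_slope(pairs, ibd)
```

In a multi-marker setting such as the one described above, the IBD-sharing values for all markers would enter as candidate predictors, and the selection method would decide which to keep.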

    A gene-model-free method for linkage analysis of a disease-related-trait based on analysis of proband/sibling pairs

    In this paper we investigate the power of finding linkage to a disease locus through analysis of disease-related traits. We propose two family-based gene-model-free linkage statistics. Both involve considering the distribution of the number of alleles identical by descent with the proband and comparing siblings with the disease-related trait to those without it. The objective is to find linkage to loci that are pleiotropic for both the disease and the disease-related traits. The power of these statistics is investigated for the Kofendrerd Personality Disorder-related traits a (joining/founding cults) and b (fear/discomfort with strangers) of the simulated data. The answers were known prior to the execution of the reported analyses. We find that both tests have very high power when applied to the samples created by combining the data of the three cities for which we have nuclear family data.

    Incorporation of genetic model parameters for cost-effective designs of genetic association studies using DNA pooling

    BACKGROUND: Studies of association methods using DNA pooling of single nucleotide polymorphisms (SNPs) have focused primarily on the effects of "machine-error", number of replicates, and the size of the pool. We use the non-centrality parameter (NCP) for the analysis of variance test to compute the approximate power for genetic association tests with DNA pooling data on cases and controls. We incorporate genetic model parameters into the computation of the NCP. Parameters involved in the power calculation are disease allele frequency, frequency of the marker SNP allele in coupling with the disease locus, disease prevalence, genotype relative risk, sample size, genetic model, number of pools, number of replicates of each pool, and the proportion of variance of the pooled frequency estimate due to machine variability. We compute power for different settings of number of replicates and total number of genotypings when the genetic model parameters are fixed. Several significance levels are considered, including stringent significance levels (due to the increasing popularity of 100 K and 500 K SNP "chip" data). We use a factorial design with two to four settings of each parameter and multiple regression analysis to assess which parameters most significantly affect power. RESULTS: The power can increase substantially as the genotyping number increases. For a fixed number of genotypings, the power is a function of the number of replicates of each pool such that there is a setting with maximum power. The four most significant parameters affecting power for association are: (1) genotype relative risk, (2) genetic model, (3) sample size, and (4) the interaction term between disease and SNP marker allele probabilities. CONCLUSION: For a fixed number of genotypings, there is an optimal number of replicates of each pool that increases as the number of genotypings increases. Power is not substantially reduced when the number of replicates is close to but not equal to the optimal setting.

    Characteristics of replicated single-nucleotide polymorphism genotypes from COGA: Affymetrix and Center for Inherited Disease Research

    Genetic Analysis Workshop 14 provided re-genotyped single-nucleotide polymorphism (SNP) data. Specifically, both Center for Inherited Disease Research (CIDR) and Affymetrix genotyped the same 11,560 SNPs from the Affymetrix GeneChip Mapping 10K Array marker set on the same 184 individuals from the Collaborative Study on the Genetics of Alcoholism database. While the inconsistency rate between CIDR and Affymetrix (two different genotypes for the same subject) was low (0.2%), the non-replication rate (two different genotypes for the same subject or one identified genotype and one missing genotype) was substantial (9.5%). The missing data could be from no-call regions, which is inconsistent with recent recommendations about the use of no-call regions in association tests. In addition, no-call regions would suggest that the actual inconsistency rate is higher than reported. A high inconsistency rate has significant impact on power in related hypothesis tests. In addition, the data are consistent with assumptions made in a recently proposed likelihood ratio test of association for re-genotyped data
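
The two rates defined in this abstract can be illustrated with a short sketch. It follows the stated definitions: a pair of calls is inconsistent when both genotypes are called and differ, and non-replicated when it is either inconsistent or has exactly one missing call. Two choices here are assumptions not stated in the abstract: the denominator is the total number of call pairs, and a pair with both calls missing counts toward neither rate. The function name and toy calls are hypothetical.

```python
def replication_rates(calls_a, calls_b):
    """Return (inconsistency_rate, non_replication_rate) for paired calls.

    calls_a, calls_b: genotype calls from two laboratories for the same
    subject/SNP combinations, with None marking a missing (no-call) genotype.
    """
    if len(calls_a) != len(calls_b) or not calls_a:
        raise ValueError("need two non-empty call lists of equal length")
    inconsistent = 0
    one_missing = 0
    for a, b in zip(calls_a, calls_b):
        if a is not None and b is not None:
            if a != b:
                inconsistent += 1  # two different called genotypes
        elif (a is None) != (b is None):
            one_missing += 1  # one called genotype, one missing
    n = len(calls_a)  # assumption: all pairs form the denominator
    return inconsistent / n, (inconsistent + one_missing) / n


# Toy calls (hypothetical) for five subject/SNP combinations.
lab_1 = ["AA", "AB", "BB", None, "AB"]
lab_2 = ["AA", "BB", "BB", "AB", None]
inconsistency_rate, non_replication_rate = replication_rates(lab_1, lab_2)
```

Separating the two counters makes it easy to see how much of the non-replication rate is driven by missing calls rather than by conflicting genotypes.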

    Power of maximum HLOD tests to detect linkage to obesity genes

    BACKGROUND: We investigate the power of the heterogeneity LOD test to detect linkage when a trait is determined by several major genes, using Genetic Analysis Workshop 13 simulated data. We consider three traits, two of which are disease-causing traits: 1) the rate of change in body mass index (BMI), 2) the maximum BMI, and 3) the disease itself (hypertension). Of interest is the power of "HLOD2", the maximum heterogeneity LOD obtained upon maximizing over the two genetic models. RESULTS: Using the trait phenotype Obesity Slope, we observe that the power to detect the two markers closest to the two genes (S1, S2) at the 0.05 level using HLOD2 is 13% and 10%. The power of HLOD2 for the Max BMI phenotype is 12% and 9%. The corresponding values for the Hypertension phenotype are 8% and 6%. CONCLUSION: The power to detect linkage to the slope genes is quite low, but the power using disease-related traits as a phenotype is greater than the power using the disease (hypertension) phenotype.
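
As background for the heterogeneity LOD statistic, here is a minimal sketch of the standard admixture form, in which the HLOD at a fixed map position is the maximum over the proportion of linked families, alpha, of the sum over families of log10(alpha * 10^LOD_i + 1 - alpha). It does not reproduce the additional maximization over two genetic models that defines "HLOD2" in the study. The grid search, the function name, and the per-family LOD values are assumptions made for illustration.

```python
import math


def hlod(family_lods, grid_steps=100):
    """Return (hlod_value, alpha_hat) for per-family LOD scores at one position.

    alpha is the assumed proportion of linked families; it is maximized over
    an evenly spaced grid on [0, 1] rather than by a numerical optimizer.
    """
    best_value, best_alpha = 0.0, 0.0  # alpha = 0 always gives HLOD = 0
    for k in range(grid_steps + 1):
        alpha = k / grid_steps
        total = sum(
            math.log10(alpha * 10 ** z + (1.0 - alpha)) for z in family_lods
        )
        if total > best_value:
            best_value, best_alpha = total, alpha
    return best_value, best_alpha


# Toy per-family LODs (hypothetical): one family supports linkage, one does not.
value, alpha_hat = hlod([2.0, -2.0])
```

Because alpha = 0 yields zero, the statistic is never negative, whereas the plain summed LOD for the two toy families above would be 0.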

    A Bayesian approach for applying Haseman-Elston methods

    The main goal of this paper is to couple the Haseman-Elston method with a simple yet effective Bayesian factor-screening approach. This approach selects markers by considering a set of multigenic models that include epistasis effects. The markers are ranked based on their marginal posterior probability. A significant improvement over our previously proposed Bayesian variable selection methodology is a simple Metropolis-Hastings algorithm that requires minimal tuning of the prior settings. The algorithm, however, is also flexible enough for us to easily incorporate our hypotheses and avoid computational pitfalls. We apply our approach to the microsatellite data of the Collaborative Study on the Genetics of Alcoholism using the coded values for the ALDX1 variable as our response.

    Mixture modeling of microarray gene expression data

    About 28% of genes appear to have an expression pattern that follows a mixture distribution. We use first- and second-order partial correlation coefficients to identify trios and quartets of non-sex-linked genes that are highly associated and that are also mixtures. We identified 18 trio and 35 quartet mixtures and evaluated their mixture distribution concordance. Concordance was defined as the proportion of observations that simultaneously fall in the component with the higher mean or simultaneously in the component with the lower mean based on their Bayesian posterior probabilities. These trios and quartets have a concordance rate greater than 80%. There are 33 genes involved in these trios and quartets. A factor analysis with varimax rotation identifies three gene groups based on their factor loadings. One group of 18 genes has a concordance rate of 56.7%, another group of 8 genes has a concordance rate of 60.8%, and a third group of 7 genes has a concordance rate of 69.6%. Each of these rates is highly significant, suggesting that there may be strong biological underpinnings for the mixture mechanisms of these genes. Bayesian factor screening confirms this hypothesis by identifying six single-nucleotide polymorphisms that are significantly associated with the expression phenotypes of the five most concordant genes in the first group
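
The concordance measure defined in this abstract can be sketched as follows. Each gene contributes, for every observation, a posterior probability of belonging to its higher-mean mixture component; an observation is concordant when all genes in the trio or quartet place it in the same component. Assigning components by thresholding the posterior at 0.5 is an assumption, as are the function name and the toy posteriors.

```python
def concordance(posteriors_high, threshold=0.5):
    """Proportion of observations assigned to the same component by all genes.

    posteriors_high: one list per gene; entry i is the posterior probability
    that observation i belongs to that gene's higher-mean component.
    threshold: cutoff for assigning the higher-mean component (assumed 0.5).
    """
    if not posteriors_high or not posteriors_high[0]:
        raise ValueError("need at least one gene and one observation")
    n_obs = len(posteriors_high[0])
    if any(len(gene) != n_obs for gene in posteriors_high):
        raise ValueError("all genes must cover the same observations")
    concordant = 0
    for i in range(n_obs):
        # Set of component labels (True = higher mean) across genes.
        labels = {gene[i] > threshold for gene in posteriors_high}
        if len(labels) == 1:
            concordant += 1
    return concordant / n_obs


# Toy trio (hypothetical posteriors) over four observations.
trio = [
    [0.9, 0.1, 0.8, 0.3],
    [0.8, 0.2, 0.4, 0.2],
    [0.7, 0.3, 0.9, 0.1],
]
rate = concordance(trio)
```

Here the third observation is split across components, so three of the four observations are concordant.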

    Using mixture models to characterize disease-related traits

    We consider 12 event-related potentials and one electroencephalogram measure as disease-related traits to compare alcohol-dependent individuals (cases) to unaffected individuals (controls). We use two approaches: 1) two-way analysis of variance (with sex and alcohol dependency as the factors), and 2) likelihood ratio tests comparing sex-adjusted values of cases to controls, assuming that within each group the trait has a 2- (or 3-) component normal mixture distribution. In the second approach, we test the null hypothesis that the parameters of the mixtures are equal for the cases and controls. Based on the two-way analysis of variance, we find that 1) males have significantly (p < 0.05) lower mean response values than females for 7 of these traits, and 2) alcohol-dependent cases have significantly lower mean response than controls for 3 traits. The mixture analysis of sex-adjusted values of 1 of these traits, the event-related potential obtained at the parietal midline channel (ttth4), indicated a 3-component normal mixture in both cases and controls. The mixtures differed in that the cases had significantly lower mean values than controls and significantly different mixing proportions in 2 of the 3 components. Implications of this study are: 1) Sex needs to be taken into account when studying risk factors for alcohol dependency to prevent finding a spurious association between alcohol dependency and the risk factor. 2) Mixture analysis indicates that for the event-related potential "ttth4", the difference observed reflects strong evidence of heterogeneity of response in both the cases and controls.

    Little genetic differentiation as assessed by uniparental markers in the presence of substantial language variation in peoples of the Cross River region of Nigeria

    BACKGROUND: The Cross River region in Nigeria is an extremely diverse area linguistically with over 60 distinct languages still spoken today. It is also a region of great historical importance, being a) adjacent to the likely homeland from which Bantu-speaking people migrated across most of sub-Saharan Africa 3000-5000 years ago and b) the location of Calabar, one of the largest centres during the Atlantic slave trade. Over 1000 DNA samples from 24 clans representing speakers of the six most prominent languages in the region were collected and typed for Y-chromosome (SNPs and microsatellites) and mtDNA markers (Hypervariable Segment 1) in order to examine whether there has been substantial gene flow between groups speaking different languages in the region. In addition the Cross River region was analysed in the context of a larger geographical scale by comparison to bordering Igbo speaking groups as well as neighbouring Cameroon populations and more distant Ghanaian communities. RESULTS: The Cross River region was shown to be extremely homogenous for both Y-chromosome and mtDNA markers with language spoken having no noticeable effect on the genetic structure of the region, consistent with estimates of inter-language gene flow of 10% per generation based on sociological data. However the groups in the region could clearly be differentiated from others in Cameroon and Ghana (and to a lesser extent Igbo populations). Significant correlations between genetic distance and both geographic and linguistic distance were observed at this larger scale. CONCLUSIONS: Previous studies have found significant correlations between genetic variation and language in Africa over large geographic distances, often across language families. However the broad sampling strategies of these datasets have limited their utility for understanding the relationship within language families. This is the first study to show that at very fine geographic/linguistic scales language differences can be maintained in the presence of substantial gene flow over an extended period of time and demonstrates the value of dense sampling strategies and having DNA of known and detailed provenance, a practice that is generally rare when investigating sub-Saharan African demographic processes using genetic data.