104 research outputs found

    A rapid conditional enumeration haplotyping method in pedigrees

    Haplotyping in pedigrees provides valuable information for genetic studies (e.g., linkage analysis and association studies). To identify a set of haplotype configurations with the highest likelihoods for a large pedigree with a large number of linked loci, we previously proposed a conditional enumeration haplotyping method. That method sets a threshold on the conditional probabilities of the possible ordered genotypes at every unordered individual-marker, deletes ordered genotypes with low conditional probabilities, and thereby eliminates haplotype configurations with low likelihoods. In this article we present a rapid haplotyping algorithm that modifies our previous method by setting an additional threshold on the ratio of the conditional probability of a haplotype configuration to the largest conditional probability among all haplotype configurations, in order to eliminate configurations with relatively low conditional probabilities. The new algorithm is much more efficient than our previous method and the widely used software SimWalk2.
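
The ratio threshold described in this abstract can be illustrated with a minimal sketch. It assumes the conditional probabilities of the candidate configurations have already been computed; the function name, the dictionary layout, and the example numbers are hypothetical and are not taken from the paper or from SimWalk2.

```python
def prune_configurations(cond_probs, ratio_threshold):
    """Keep configurations whose conditional probability is at least
    ratio_threshold times the largest conditional probability.

    cond_probs: dict mapping a configuration label to its conditional
    probability (hypothetical layout, assumed precomputed).
    """
    if not 0.0 <= ratio_threshold <= 1.0:
        raise ValueError("ratio_threshold must lie in [0, 1]")
    if not cond_probs:
        return {}
    best = max(cond_probs.values())
    if best <= 0.0:
        return {}
    return {
        config: prob
        for config, prob in cond_probs.items()
        if prob / best >= ratio_threshold
    }


# Illustrative values only.
configs = {"config_A": 0.50, "config_B": 0.30, "config_C": 0.004}
kept = prune_configurations(configs, ratio_threshold=0.01)
print(sorted(kept))
```

A larger ratio threshold discards more configurations and so trades completeness of the enumerated set for speed.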

    Case-control association analysis of rheumatoid arthritis with candidate genes using related cases

    We performed a case-control association analysis of rheumatoid arthritis (RA) for several candidate genes using the North American Rheumatoid Arthritis Consortium (NARAC) data provided in Genetic Analysis Workshop 15. We conducted the analysis using all related cases and unrelated controls and compared the results with those from an analysis using only one randomly selected case from each family and all unrelated controls. For both analyses we used a weighted composite likelihood ratio test based on single-nucleotide polymorphism (SNP) markers or haplotypes, accounting for the correlation among samples within a family. Several SNPs, including R620W in the candidate gene PTPN22, showed an association with RA status, confirming previously reported results. Several other SNPs in candidate genes such as CTLA4, HAVCR1, and SUMO4 also had rather small p-values (<0.05), suggesting associations between them and RA. The p-values obtained from the analysis including all related cases were generally smaller than those obtained from the analysis including only one randomly selected case per family. These results, together with results based on simulated data, showed that higher power could be achieved using all related cases.

    Rare Variant Association Testing by Adaptive Combination of P-values

    With the development of next-generation sequencing technology, there is a great demand for powerful statistical methods to detect rare variants (minor allele frequencies (MAFs)-MidP method (Cheung et al., 2012, Genet Epidemiol 36: 675–685) and propose an approach (named ‘adaptive combination of P-values for rare variant association testing’, abbreviated as ‘ADA’) that adaptively combines per-site P-values with weights based on MAFs. Before combining P-values, we first imposed a truncation threshold upon the per-site P-values, to guard against the noise caused by the inclusion of neutral variants. This ADA method is shown to outperform popular burden tests and non-burden tests under many scenarios. ADA is recommended for next-generation sequencing data analysis where many neutral variants may be included in a functional region.
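
The idea of truncating per-site P-values before a MAF-weighted combination can be sketched as follows. This is only an illustration for one fixed truncation threshold. The weight 1/sqrt(MAF(1 - MAF)) and the Fisher-style -log(p) score are assumptions made here, since the abstract does not give the formulas. How the threshold is chosen adaptively and how significance is assessed are not shown.

```python
import math


def truncated_weighted_statistic(p_values, mafs, truncation_threshold):
    """Sum of weight * -log(p) over sites with p below the threshold.

    The weight 1 / sqrt(maf * (1 - maf)) is an assumed, illustrative
    choice that up-weights rarer variants.
    """
    if len(p_values) != len(mafs):
        raise ValueError("p_values and mafs must have the same length")
    total = 0.0
    for p, maf in zip(p_values, mafs):
        if not 0.0 < p <= 1.0:
            raise ValueError("each P-value must lie in (0, 1]")
        if not 0.0 < maf < 1.0:
            raise ValueError("each MAF must lie in (0, 1)")
        if p >= truncation_threshold:
            continue  # treated as likely neutral and left out
        weight = 1.0 / math.sqrt(maf * (1.0 - maf))
        total += weight * -math.log(p)
    return total


# Illustrative per-site values only.
site_p_values = [0.001, 0.20, 0.04]
site_mafs = [0.005, 0.01, 0.002]
strict = truncated_weighted_statistic(site_p_values, site_mafs, 0.05)
loose = truncated_weighted_statistic(site_p_values, site_mafs, 0.50)
print(strict, loose)
```

With the stricter threshold the second site (P = 0.20) does not contribute, which is the mechanism the abstract describes for limiting noise from neutral variants.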

    Application of imputation methods to the analysis of rheumatoid arthritis data in genome-wide association studies

    Most genetic association studies genotype only a small proportion of cataloged single-nucleotide polymorphisms (SNPs) in regions of interest. With catalogs of high-density SNP data (e.g., HapMap) now available to researchers, it has become possible to impute genotypes at untyped SNPs. This in turn allows us to test those untyped SNPs, the motivation being to increase power in association studies. Several imputation methods and corresponding software packages have been developed for this purpose. The objective of our study is to apply three widely used imputation methods and corresponding software packages to data from a genome-wide association study of rheumatoid arthritis from the North American Rheumatoid Arthritis Consortium in Genetic Analysis Workshop 16, to compare the performance of the three methods, to evaluate their strengths and weaknesses, and to identify additional susceptibility loci underlying rheumatoid arthritis. The software packages used in this paper were a program for Bayesian imputation-based association mapping (BIMBAM), a program for imputing unobserved genotypes in case-control association studies (IMPUTE), and a program for testing untyped alleles (TUNA). We found some untyped SNPs that showed significant association with rheumatoid arthritis. A few of them were not located near any typed SNP found to be significant and thus may be worth further investigation.

    Testing for differences in distribution tails to test for differences in 'maximum' lifespan

    Background: Investigators are actively testing interventions intended to increase lifespan and wish to test whether the interventions increase maximum lifespan. Based on the fact that one cannot be assured of observing population maximum lifespans in finite samples, in previous work, we constructed and validated several tests of difference in the upper parts of lifespan distributions between a treatment group and a control group by testing whether the probabilities that observations are above some threshold defining 'old' or being in the tail of the survival distribution are equal in the two groups. However, a limitation of these tests is that they do not consider how much above the threshold any particular observation is. Methods: In this article we propose new methods which improve upon our previous tests by considering not only whether an observation is above some threshold, but also the magnitudes by which observations exceed the threshold. Results: Simulations show that the new methods control type I error rates quite well and that the power of the new methods is usually higher than that of the tests we previously proposed. In illustrative analyses of two real datasets involving rodents, when setting the threshold equal to 110 (100) weeks for the first (second) datasets, the new methods detected differences in 'maximum lifespan' between groups at nominal alpha levels of 0.01 (0.05) for the first (second) datasets and provided more significant results than competitor tests. Conclusion: The new methods not only have good performance in controlling the type I error rates but also improve the power compared with the tests we previously proposed.
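
One way to use the magnitudes by which observations exceed a threshold is sketched below as a one-sided permutation test on mean exceedance. This is an assumed construction for illustration and is not necessarily the test statistic proposed in the paper. The function names, the toy lifespans, and the seed are all hypothetical.

```python
import random


def exceedances(lifespans, threshold):
    """Amount by which each lifespan exceeds the threshold (0 if it does not)."""
    return [max(0.0, x - threshold) for x in lifespans]


def tail_permutation_test(treated, control, threshold,
                          n_permutations=2000, seed=0):
    """One-sided permutation test on the difference in mean exceedance.

    Returns (observed_difference, p_value). Group labels are shuffled
    with a seeded generator so the result is reproducible.
    """
    rng = random.Random(seed)
    pooled = exceedances(treated, threshold) + exceedances(control, threshold)
    n_treated = len(treated)

    def mean_difference(values):
        first, second = values[:n_treated], values[n_treated:]
        return sum(first) / len(first) - sum(second) / len(second)

    observed = mean_difference(pooled)
    hits = 0
    for _ in range(n_permutations):
        rng.shuffle(pooled)
        if mean_difference(pooled) >= observed:
            hits += 1
    # Add-one correction keeps the estimated p-value strictly positive.
    return observed, (hits + 1) / (n_permutations + 1)


# Toy lifespans in weeks, illustrative only.
treated_weeks = [120, 130, 140, 150, 125, 135]
control_weeks = [90, 95, 100, 105, 98, 102]
difference, p_value = tail_permutation_test(treated_weeks, control_weeks, 110)
print(difference, p_value)
```

Observations at or below the threshold all contribute zero, so the statistic reflects both how many observations fall in the tail and how far into it they extend.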

    MethylPCA: a toolkit to control for confounders in methylome-wide association studies

    Background: In methylome-wide association studies (MWAS) there are many possible differences between cases and controls (e.g., related to lifestyle, diet, and medication use) that may affect the methylome and produce false positive findings. An effective approach to control for these confounders is to first capture the major sources of variation in the methylation data and then regress out these components in the association analyses. This approach is, however, computationally very challenging due to the extremely large number of methylation sites in the human genome. Results: We introduce MethylPCA, which is specifically designed to control for potential confounders in studies where the number of methylation sites is extremely large. MethylPCA offers a complete and flexible data analysis including 1) an adaptive method that performs data reduction prior to PCA by empirically combining methylation data of neighboring sites, 2) an efficient algorithm that performs a principal component analysis (PCA) on the ultra-high-dimensional data matrix, and 3) association tests. To accomplish this, MethylPCA allows for parallel execution of tasks, uses C++ for CPU- and I/O-intensive calculations, and stores intermediate results to avoid computing the same statistics multiple times or keeping results in memory. Through simulations and an analysis of a real whole-methylome MBD-seq study of 1,500 subjects we show that MethylPCA effectively controls for potential confounders. Conclusions: MethylPCA provides users with a convenient tool to perform MWAS. The software effectively handles the memory and speed challenges of tasks that would be impossible to accomplish using existing software when millions of sites are interrogated with the sample sizes required for MWAS.
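
The two-step idea in this abstract (capture a major component of variation, then regress it out) can be sketched on a toy matrix. This pure-Python version extracts only the first principal component by power iteration on the sample-by-sample Gram matrix. It is an illustrative assumption and not MethylPCA's C++ implementation, and all function names and data are hypothetical.

```python
import math


def center_columns(matrix):
    """Subtract each column (site) mean; rows are samples."""
    n = len(matrix)
    means = [sum(col) / n for col in zip(*matrix)]
    return [[v - m for v, m in zip(row, means)] for row in matrix]


def first_pc_scores(centered, iterations=200):
    """Unit-length per-sample scores of the first principal component,
    found by power iteration on the Gram matrix of the centered data."""
    n = len(centered)
    gram = [[sum(a * b for a, b in zip(ri, rj)) for rj in centered]
            for ri in centered]
    vec = [float(i + 1) for i in range(n)]  # fixed, deterministic start
    for _ in range(iterations):
        new = [sum(g * v for g, v in zip(row, vec)) for row in gram]
        norm = math.sqrt(sum(x * x for x in new))
        if norm == 0.0:
            return [0.0] * n
        vec = [x / norm for x in new]
    return vec


def regress_out(values, scores):
    """Residuals of one centered site after least-squares fit on the scores."""
    denom = sum(s * s for s in scores)
    if denom == 0.0:
        return list(values)
    beta = sum(s * v for s, v in zip(scores, values)) / denom
    return [v - beta * s for v, s in zip(values, scores)]


# Toy data: three sites driven entirely by one hidden confounder.
confounder = [-2.0, -1.0, 0.0, 1.0, 2.0]
data = [[c * 1.0, c * 2.0, c * -0.5] for c in confounder]
centered = center_columns(data)
scores = first_pc_scores(centered)
site_0 = [row[0] for row in centered]
residual_0 = regress_out(site_0, scores)
print(residual_0)
```

Working on the sample-by-sample Gram matrix keeps the eigenproblem at the size of the sample count rather than the number of sites, which is the usual reason this formulation is chosen when sites vastly outnumber samples.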

    Discrimination of benign from malignant breast lesions in dense breasts with model-based analysis of regions-of-interest using directional diffusion-weighted images.

    BACKGROUND: There is an increasing interest in non-contrast-enhanced magnetic resonance imaging (MRI) for detecting and evaluating breast lesions. We present a methodology utilizing lesion core and periphery region of interest (ROI) features derived from directional diffusion-weighted imaging (DWI) data to evaluate performance in discriminating benign from malignant lesions in dense breasts. METHODS: We accrued 55 dense-breast cases with 69 lesions (31 benign; 38 cancer) at a single institution in a prospective study; cases with ROIs exceeding 7.50 cm RESULTS: The region-growing algorithm for 3D lesion model generation improved inter-observer variability over hand-drawn ROIs (DSC: 0.66 vs 0.56 (p < 0.001), with substantial agreement (DSC > 0.8) in 46% vs 13% of cases, respectively (p < 0.001)). The overall classifier improved discrimination over mean ADC (ROC area under the curve (AUC): 0.85 vs 0.75 and 0.83 vs 0.74, respectively, for the two readers). CONCLUSIONS: A classifier generated from directional DWI information using lesion core and lesion periphery information separately can improve lesion discrimination in dense breasts over mean ADC and should be considered for inclusion in computer-aided diagnosis algorithms. Our model-based ROIs could facilitate standardization of breast MRI computer-aided diagnostics (CADx).
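
The abstract reports inter-observer agreement as DSC without expanding the abbreviation. Assuming it denotes the Dice similarity coefficient, 2|A ∩ B| / (|A| + |B|), agreement between two readers' ROIs can be computed as in this minimal sketch. Representing an ROI as a set of voxel coordinates is an assumption made here, and the example voxels are invented.

```python
def dice_coefficient(roi_a, roi_b):
    """Dice similarity coefficient between two ROIs given as
    collections of voxel coordinates."""
    a, b = set(roi_a), set(roi_b)
    if not a and not b:
        return 1.0  # two empty ROIs agree trivially
    return 2.0 * len(a & b) / (len(a) + len(b))


# Hypothetical voxel sets drawn by two readers.
reader_1 = {(0, 0, 0), (0, 1, 0), (1, 0, 0), (1, 1, 0)}
reader_2 = {(0, 1, 0), (1, 1, 0), (2, 1, 0)}
agreement = dice_coefficient(reader_1, reader_2)
print(agreement)
```

A value of 1 means the two ROIs coincide exactly and 0 means they share no voxels, so the 0.8 cutoff quoted in the abstract corresponds to heavily overlapping ROIs.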