    Mapping Haplotype-haplotype Interactions with Adaptive LASSO

    Abstract
    Background: The genetic etiology of complex diseases in humans is commonly viewed as a process in which genetic and environmental factors act together in a complicated manner. Interactions among genetic variants often play major roles in determining an individual's susceptibility to a particular disease. Statistical methods for modeling interactions between single genetic variants (e.g. single nucleotide polymorphisms, or SNPs) underlying complex diseases have been extensively studied. Recently, haplotype-based analysis has gained popularity in genetic association studies. When multiple sequence or haplotype interactions are involved in determining an individual's susceptibility to a disease, statistical modeling and testing of the interaction effects become daunting, largely because of the higher-order epistatic complexity.
    Results: In this article, we propose a new strategy for modeling haplotype-haplotype interactions under the penalized logistic regression framework with an adaptive L1-penalty. We consider interactions of sequence variants between haplotype blocks. The adaptive L1-penalty allows simultaneous effect estimation and variable selection in a single model. We propose a new parameter estimation method that estimates and selects parameters by a modified Gauss-Seidel method nested within the EM algorithm. Simulation studies show that it has a low false positive rate and reasonable power in detecting haplotype interactions. The method is applied to test haplotype interactions between mother and offspring genomes in a small for gestational age (SGA) neonates data set, and significant interactions between the two genomes are detected.
    Conclusions: As demonstrated by the simulation studies and real data analysis, the approach provides an efficient tool for modeling and testing haplotype interactions. The R code implementing the method can be freely downloaded from http://www.stt.msu.edu/~cui/software.html.
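
For orientation, the usual adaptive-lasso form of a penalized logistic objective can be written as below. This is a generic sketch, not the paper's exact formulation: the symbols (the log-likelihood \ell, the initial estimates \tilde\beta_j, the exponent \gamma and the tuning parameter \lambda) are assumptions introduced here, and the abstract does not state how the weights are chosen.

```latex
% Generic adaptive L1-penalized logistic regression (illustrative notation)
\hat{\beta} \;=\; \arg\min_{\beta}\;
  \Bigl\{ -\ell(\beta) \;+\; \lambda \sum_{j=1}^{p} w_j \,\lvert \beta_j \rvert \Bigr\},
\qquad
w_j \;=\; \frac{1}{\lvert \tilde{\beta}_j \rvert^{\gamma}}, \quad \gamma > 0 .
```

Coefficients with small initial estimates receive large weights and are shrunk to exactly zero more readily, which is how one model can estimate effects and select variables at the same time.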

    Conservation and implications of eukaryote transcriptional regulatory regions across multiple species

    Abstract
    Background: Increasing evidence shows that the whole genomes of eukaryotes are almost entirely transcribed into both protein-coding genes and an enormous number of non-protein-coding RNAs (ncRNAs). Revealing the regulatory mechanisms underlying these transcripts is therefore imperative. However, a complete understanding of transcriptional regulatory mechanisms requires identifying the regions in which they are found. We call these transcriptional regulation regions, or TRRs: functional regions containing a cluster of regulatory elements that cooperatively recruit transcription factors for binding and then regulate the expression of transcripts.
    Results: We constructed a hierarchical stochastic language (HSL) model for the identification of core TRRs in yeast based on regulatory cooperation among TRR elements. The HSL model trained on yeast achieved comparable accuracy in predicting TRRs in other species, e.g., fruit fly, human, and rice, thus demonstrating the conservation of TRRs across species. The HSL model was also used to identify the TRRs of genes, such as p53 or OsALYL1, as well as of microRNAs. In addition, the ENCODE regions were examined with HSL, and TRRs were found to be located pervasively throughout the genome.
    Conclusion: Our findings indicate that 1) the HSL model can be used to accurately predict core TRRs of transcripts across species and 2) core TRRs identified by HSL are proper candidates for further scrutiny of specific regulatory elements and mechanisms. Meanwhile, the regulatory activity taking place in the abundant ncRNAs might account for the ubiquitous presence of TRRs across the genome. We also found that the TRRs of protein-coding genes and ncRNAs are similar in structure, with the latter being more conserved than the former.

    An aggregating U-Test for a genetic association study of quantitative traits

    We propose a novel aggregating U-test for gene-based association analysis. The method considers both rare and common variants. It adaptively searches for potential disease-susceptibility rare variants and collapses them into a single “supervariant.” A forward U-test is then used to assess the joint association of the supervariant and other common variants with quantitative traits. Using 200 simulated replicates from the Genetic Analysis Workshop 17 mini-exome data, we compare the performance of the proposed method with that of a commonly used approach, QuTie. We find that our method has equivalent or greater power than QuTie to detect nine genes that influence the quantitative trait Q1. This new approach provides a powerful tool for detecting both common and rare variants associated with quantitative traits.

    Hybridization modeling of oligonucleotide SNP arrays for accurate DNA copy number estimation

    Affymetrix SNP arrays have been widely used for single-nucleotide polymorphism (SNP) genotype calling and DNA copy number variation inference. Although numerous methods have achieved high accuracy in these fields, most studies have paid little attention to the modeling of hybridization of probes to off-target allele sequences, which can affect the accuracy greatly. In this study, we address this issue and demonstrate that hybridization with mismatch nucleotides (HWMMN) occurs in all SNP probe-sets and has a critical effect on the estimation of allelic concentrations (ACs). We study sequence binding through binding free energy and then binding affinity, and develop a probe intensity composite representation (PICR) model. The PICR model allows the estimation of ACs at a given SNP through statistical regression. Furthermore, we demonstrate with cell-line data of known true copy numbers that the PICR model can achieve reasonable accuracy in copy number estimation at a single SNP locus, by using the ratio of the estimated AC of each sample to that of the reference sample, and can reveal subtle genotype structure of SNPs at abnormal loci. We also demonstrate with HapMap data that the PICR model yields accurate SNP genotype calls consistently across samples, laboratories and even across array platforms.

    A statistical shrinkage model and its applications

    grantor: University of Toronto (Ph.D.)
    Bridge regression, a special type of penalized regression with penalty function Σ|β_j|^γ, γ ≥ 1, is considered. The Bridge estimator is obtained by solving the penalized score equations via the modified Newton-Raphson method for γ > 1 or the Shooting method for γ = 1. The Bridge estimator yields small variance at the cost of a little bias, and thus achieves small mean squared error and small prediction error when collinearity is present among regressors in a linear regression model. The concept of penalization is generalized via the penalized score equations, which allow the implementation of penalization regardless of the existence of joint likelihood functions. Penalization is then applied to generalized linear models and generalized estimating equations (GEE). The penalty parameter γ and the tuning parameter λ are selected via generalized cross-validation (GCV). A quasi-GCV is developed to select the parameters for the penalized GEE. Simulation studies show that the Bridge estimator performs well compared to the estimators of ridge regression (γ = 2) and the Lasso (γ = 1). Several data sets from public health studies are analyzed using the Bridge penalty model in the statistical settings of a linear regression model, a logistic regression model and a GEE model for binary outcomes.
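
The γ = 1 (Lasso) case mentioned in this abstract can be illustrated with a coordinate-wise update of the kind usually described as a shooting algorithm. The block below is a minimal sketch under stated assumptions, not the thesis's own procedure: it assumes squared-error loss with no intercept, the objective sum_i (y_i - x_i·β)^2 + λ Σ|β_j|, and the hypothetical function names `soft_threshold` and `lasso_shooting`.

```python
def soft_threshold(z, t):
    """Shrink z toward zero by t; return 0.0 when |z| <= t."""
    if z > t:
        return z - t
    if z < -t:
        return z + t
    return 0.0


def lasso_shooting(X, y, lam, n_sweeps=200, tol=1e-10):
    """Minimise sum_i (y_i - x_i . beta)^2 + lam * sum_j |beta_j|.

    X is a list of rows, y a list of responses (illustrative interface).
    Each sweep updates one coefficient at a time, holding the others fixed.
    """
    n, p = len(X), len(X[0])
    beta = [0.0] * p
    resid = list(y)  # residual y - X beta; beta starts at zero
    col_ss = [sum(X[i][j] ** 2 for i in range(n)) for j in range(p)]
    for _ in range(n_sweeps):
        max_change = 0.0
        for j in range(p):
            if col_ss[j] == 0.0:
                continue  # an all-zero column carries no information
            # Inner product of column j with the residual that excludes beta_j.
            rho = sum(X[i][j] * resid[i] for i in range(n)) + col_ss[j] * beta[j]
            new_bj = soft_threshold(rho, lam / 2.0) / col_ss[j]
            delta = new_bj - beta[j]
            if delta != 0.0:
                for i in range(n):
                    resid[i] -= X[i][j] * delta
                beta[j] = new_bj
            max_change = max(max_change, abs(delta))
        if max_change < tol:
            break  # no coefficient moved appreciably in this sweep
    return beta


if __name__ == "__main__":
    # Orthogonal toy design: the small second effect is shrunk to zero.
    print(lasso_shooting([[1.0, 0.0], [0.0, 1.0]], [3.0, 0.2], lam=1.0))
```

With λ = 0 the updates reduce to ordinary least squares by coordinate descent; increasing λ moves more coefficients to exactly zero, which is the variable-selection behaviour that distinguishes γ = 1 from ridge regression (γ = 2).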