54 research outputs found

    Association Analysis of Rare Variants in Sequencing Studies

    Recent advances in sequencing technologies have made it possible to explore the influence of rare variants on complex diseases and traits. Large-scale sequencing studies provide the opportunity to examine the proportion of the missing heritability that is attributable to rare variants. They also pose a range of analytical and computational challenges that cannot be adequately addressed with existing methods. For the association analysis of rare variants, it is customary to aggregate rare mutations within a gene to perform gene-level association analysis. In the first part of the dissertation, we develop asymptotic and resampling gene-level association tests for a variety of traits and study designs. We employ score statistics under appropriate statistical models to achieve numerical stability and computational efficiency. The resulting software SCORE-Seq features a large collection of utilities devoted to performing gene-level association analysis in different scenarios. Trait-dependent sampling has been adopted in many sequencing projects to reduce cost. In the second part, we provide a valid and efficient maximum likelihood framework for analyzing binary secondary traits under such a sampling strategy. We produce the commonly used gene-level association tests and compare our methods with the naive methods ignoring the trait-dependent sampling. A single sequencing study is often underpowered to detect modest genetic effects of rare variants. Several methods are available to conduct meta-analysis for rare variants under fixed-effects models, which assume that the genetic effects are the same across all studies. In practice, genetic associations are likely to be heterogeneous among studies because of differences in population composition, environmental factors, phenotype and genotype measurements, or analysis methods. In the third part, we propose a general framework for meta-analysis of sequencing studies that allows the genetic effects to vary among studies. We produce the fixed-effects and random-effects versions of all commonly used gene-level association tests. Our methods take score statistics, rather than individual participant data, as input and thus can accommodate any study design and any phenotype. We demonstrate through extensive simulation studies that our tests are more powerful than the existing ones in a wide range of practical situations. Doctor of Philosoph
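To make the idea of meta-analyzing score statistics concrete, the following is a minimal sketch of a fixed-effects burden test, assuming each study reports a score vector U_k and its covariance matrix V_k for the variants in one gene. The function name `burden_meta_fixed`, the argument layout, and the use of a plain weighted burden statistic are illustrative assumptions; this is not the dissertation's framework or the SCORE-Seq interface, and it does not cover the random-effects versions.

```python
import math

def burden_meta_fixed(scores, covs, weights):
    """Fixed-effects burden test from per-study score statistics (sketch).

    scores  : list of per-study score vectors U_k, each of length m
    covs    : list of per-study m x m covariance matrices V_k
    weights : length-m variant weights w (hypothetical; e.g. all ones)

    Returns (statistic, p_value), where the statistic
    (sum_k w'U_k)^2 / (sum_k w'V_k w) is referred to a chi-square
    distribution with 1 degree of freedom.
    """
    m = len(weights)
    num = 0.0  # pooled weighted score across studies
    var = 0.0  # pooled variance of the weighted score
    for U, V in zip(scores, covs):
        num += sum(weights[j] * U[j] for j in range(m))
        var += sum(weights[i] * V[i][j] * weights[j]
                   for i in range(m) for j in range(m))
    stat = num * num / var
    # Upper tail of chi-square(1): P(X > t) = erfc(sqrt(t / 2))
    p_value = math.erfc(math.sqrt(stat / 2.0))
    return stat, p_value
```

Because only U_k and V_k are needed, each study can compute them under its own design and trait model, and no individual participant data has to be shared.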

    Groupwise Learning to Rank Algorithm with Introduction of Activated Weighting

    Learning to rank (LtR) applies supervised machine learning (SML) techniques to ranking problems, aiming to optimize the relevance ordering of an input document list. In previous deep ranking models, the relevance of each document in the list is calculated independently, without considering interactions between documents. In recent years, some methods have been devoted to mining these interactions, such as the groupwise scoring function (GSF), which learns a multivariate scoring function to judge relevance jointly; however, most of these methods ignore differences in the interactions between documents and also incur a high computational cost. To solve this problem, this paper proposes a weighted groupwise deep ranking model (W-GSF). Drawing on the deep interest network from the recommendation field, this paper introduces the idea of adjusting the weights of the historical behavior sequence according to the candidate products. Building on the multivariate scoring method from the learning-to-rank field, the method uses a multi-layer feed-forward neural network as its main structure and adds an activation unit before the input module, so that the network adaptively adjusts the weights of the multiple input variables and thereby mines the differences in cross-document relationships. Experiments on the public benchmark dataset MSLR verify the effectiveness of the method. Compared with baseline ranking models, the activation strategy brings a significant improvement in ranking metrics, and the computational complexity is greatly reduced compared with learning-to-rank methods of similar effectiveness.
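The abstract does not give the architecture in detail, so the following is only a rough sketch of the general idea: an activation unit reweights each document in a group before the concatenated group is passed through a feed-forward network that emits one score per document. The sigmoid-of-linear activation unit, the function names, and the parameter shapes are all assumptions made for illustration, not the W-GSF model itself.

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def activation_weights(group, a):
    """Hypothetical activation unit: one weight in (0, 1) per document,
    computed as sigmoid(a . features)."""
    return [sigmoid(sum(ai * xi for ai, xi in zip(a, doc))) for doc in group]

def dense(x, W, b, relu=True):
    """One fully connected layer; W is a list of rows, b a list of biases."""
    out = [sum(w * xi for w, xi in zip(row, x)) + bi for row, bi in zip(W, b)]
    return [max(0.0, v) for v in out] if relu else out

def groupwise_scores(group, a, W1, b1, W2, b2):
    """Score all documents of a group jointly (sketch).

    group  : list of g feature vectors, each of length d
    a      : length-d parameter vector of the assumed activation unit
    W1, b1 : hidden layer mapping the g*d concatenated input to h units
    W2, b2 : output layer mapping h units to g scores (one per document)
    """
    weights = activation_weights(group, a)
    # Reweight each document, then concatenate the group into one input.
    x = [w * xi for w, doc in zip(weights, group) for xi in doc]
    hidden = dense(x, W1, b1)
    return dense(hidden, W2, b2, relu=False)
```

In a groupwise setup of this kind, a document's final score would typically be aggregated over the groups it appears in; that aggregation step is left out here.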

    Integrated study of copy number states and genotype calls using high-density SNP arrays

    We propose a statistical framework, named genoCN, to simultaneously dissect copy number states and genotypes using high-density SNP (single nucleotide polymorphism) arrays. There are at least two types of genomic DNA copy number differences: copy number variations (CNVs) and copy number aberrations (CNAs). While CNVs are naturally occurring and inheritable, CNAs are acquired somatic alterations most often observed in tumor tissues only. CNVs tend to be short and more sparsely located in the genome compared with CNAs. GenoCN consists of two components, genoCNV and genoCNA, designed for CNV and CNA studies, respectively. In contrast to most existing methods, genoCN is more flexible in that the model parameters are estimated from the data instead of being decided a priori. GenoCNA also incorporates two important strategies for CNA studies. First, the effects of tissue contamination are explicitly modeled. Second, if SNP arrays are performed for both tumor and normal tissues of one individual, the genotype calls from normal tissue are used to study CNAs in tumor tissue. We evaluated genoCN by applications to 162 HapMap individuals and a brain tumor (glioblastoma) dataset and showed that our method can successfully identify both types of copy number differences and produce high-quality genotype calls

    High expression of RNF169 is associated with poor prognosis in pancreatic adenocarcinoma by regulating tumour immune infiltration

    Background: Pancreatic adenocarcinoma (PAAD) is a highly deadly and aggressive tumour with a poor prognosis. However, the prognostic value of RNF169 and its related mechanisms in PAAD have not been elucidated. In this study, we aimed to explore prognosis-related genes, especially RNF169, in PAAD and to identify novel potential prognostic predictors of PAAD. Methods: The GEPIA and UALCAN databases were used to investigate the expression and prognostic value of RNF169 in PAAD. The correlation between RNF169 expression and immune infiltration was determined by using TIMER and TISIDB. Correlation analysis with starBase was performed to identify a potential regulatory axis of lncRNA-miRNA-RNF169. Results: The data showed that the level of RNF169 mRNA expression in PAAD tissues was higher than that in normal tissues. High RNF169 expression was correlated with poor prognosis in PAAD. In addition, analysis with the TISIDB and TIMER databases revealed that RNF169 expression was positively correlated with tumour immune infiltration in PAAD. Correlation analysis suggested that the long non-coding RNA (lncRNA) AL049555.1 and the microRNA (miRNA) hsa-miR-324-5p were involved in the expression of RNF169, composing a potential regulatory axis to control the progression of PAAD. Gene Ontology (GO) and Kyoto Encyclopedia of Genes and Genomes (KEGG) enrichment analyses indicated that RNF169 plays a role in PAAD through pathways such as TNF, Hippo, JAK-STAT and Toll-like receptor signaling. Conclusion: In summary, the upregulation of RNF169 expression mediated by ncRNAs might influence immune cell infiltration in the microenvironment; thus, it can be used as a prognostic biomarker and a potential therapeutic target in PAAD.

    Robust and powerful differential composition tests on clustered microbiome data

    Clustered microbiome data have become prevalent in recent years from designs such as longitudinal studies, family studies, and matched case-control studies. The within-cluster dependence compounds the challenge of the microbiome data analysis. Methods that properly accommodate intra-cluster correlation and features of the microbiome data are needed. We develop robust and powerful differential composition tests for clustered microbiome data. The methods do not rely on any distributional assumptions on the microbial compositions, which provides flexibility to model various correlation structures among taxa and among samples within a cluster. By leveraging the adjusted sandwich covariance estimate, the methods properly accommodate sample dependence within a cluster. Different types of confounding variables can be easily adjusted for in the methods. We perform extensive simulation studies under commonly-adopted clustered data designs to evaluate the methods. The usefulness of the proposed methods is further demonstrated with a real dataset from a longitudinal microbiome study on pregnant women. Non UBC. Unreviewed. Author affiliation: University Wisconsin - Madison. Researche
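As a small illustration of why a sandwich covariance estimate accommodates within-cluster dependence, here is a minimal sketch of the basic cluster-robust sandwich variance for a scalar sample mean. It shows the unadjusted estimator only; the adjustment used in the paper is not described in the abstract, and the function name and data layout are assumptions.

```python
def cluster_sandwich_variance(clusters):
    """Cluster-robust sandwich variance of the overall mean (sketch).

    clusters : list of lists, each inner list holding the observations
               of one cluster (e.g. repeated samples from one subject)

    Residuals are summed within each cluster before squaring, so any
    within-cluster correlation is carried into the variance estimate.
    Returns (mean, variance_of_mean).
    """
    n = sum(len(c) for c in clusters)
    mu = sum(sum(c) for c in clusters) / n
    # "Meat": squared cluster-level sums of residuals.
    meat = sum(sum(y - mu for y in c) ** 2 for c in clusters)
    # "Bread" for the mean is 1/n on each side, giving meat / n^2.
    return mu, meat / n ** 2
```

For example, with clusters `[[1.0, 3.0], [5.0, 7.0]]` the function returns a mean of 4.0 and a variance of 2.0, whereas treating the four observations as independent (squaring each residual separately with the same 1/n² scaling) would give 1.25.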