
    Small sample properties of rare variant analysis methods

    We are now well into the sequencing era of genetic analysis, and methods to investigate rare variants associated with disease remain in high demand. Currently, the most common rare variant analysis methods are burden tests and variance component tests. This report introduces a burden test known as the modified replication-based sum statistic and evaluates its performance, along with that of other common burden and variance component tests, in a small-sample setting (103 cases and controls in total) using the Genetic Analysis Workshop 18 simulated data, for which the simulation model is fully known. Specifically, we examine the variable threshold sum statistic, replication-based sum statistics, the C-alpha test, and the sequence kernel association test. Using minor allele frequency thresholds of less than 0.05, we find that the modified replication-based sum statistic is competitive with all other methods, and that with only 103 individuals all methods are vastly underpowered. Much larger sample sizes are needed to confidently identify truly associated genes.
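The abstract refers to burden tests without showing how one works. Below is a minimal sketch of a basic collapsing burden test, assuming genotypes coded as minor-allele counts (0, 1 or 2), the MAF threshold of 0.05 mentioned above, and a permutation p-value. The function names `burden_scores` and `permutation_burden_test` are hypothetical; this is a generic illustration, not the modified replication-based sum statistic evaluated in the report.

```python
import random
from statistics import mean
from typing import List, Sequence, Tuple


def burden_scores(genotypes: Sequence[Sequence[int]], maf_threshold: float = 0.05) -> List[int]:
    """Per-individual count of minor alleles at variants with 0 < MAF < threshold.

    genotypes[i][j] is the minor-allele count (0, 1 or 2) of individual i at variant j.
    """
    n = len(genotypes)
    n_variants = len(genotypes[0])
    # Frequency of the coded (assumed minor) allele, estimated from the pooled sample.
    mafs = [sum(row[j] for row in genotypes) / (2 * n) for j in range(n_variants)]
    rare = [j for j, f in enumerate(mafs) if 0 < f < maf_threshold]
    return [sum(row[j] for j in rare) for row in genotypes]


def permutation_burden_test(
    genotypes: Sequence[Sequence[int]],
    is_case: Sequence[bool],
    maf_threshold: float = 0.05,
    n_perm: int = 1000,
    seed: int = 0,
) -> Tuple[float, float]:
    """Difference in mean burden (cases minus controls) and a two-sided permutation p-value."""
    scores = burden_scores(genotypes, maf_threshold)

    def mean_difference(labels: Sequence[bool]) -> float:
        cases = [s for s, c in zip(scores, labels) if c]
        controls = [s for s, c in zip(scores, labels) if not c]
        return mean(cases) - mean(controls)

    observed = mean_difference(is_case)
    rng = random.Random(seed)
    labels = list(is_case)
    extreme = 0
    for _ in range(n_perm):
        rng.shuffle(labels)  # shuffling case/control labels simulates the null hypothesis
        if abs(mean_difference(labels)) >= abs(observed):
            extreme += 1
    # Add-one correction keeps the p-value strictly positive.
    return observed, (extreme + 1) / (n_perm + 1)
```

A permutation p-value is used here because, with samples as small as the 103 individuals described above, asymptotic null distributions may be unreliable; real burden implementations typically also adjust for covariates, which this sketch omits.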

    Statistical methods for incorporating biological knowledge into association tests of sequencing data

    Many rare variant analysis methods have recently been proposed. However, each method has its own advantages and disadvantages depending on the properties of the data, so there is no uniformly most powerful test for rare variant analysis. In this work, I propose a statistical framework to improve the statistical power of existing rare variant analysis methods. Specifically, I incorporate computational biological knowledge into existing rare variant analysis methods. For whole exome sequencing (WES) data, the sources of biological knowledge I use are SIFT, Polyphen2, PhyloP, and GERP++. For whole genome sequencing (WGS) data, I use RegulomeDB, which is based on the Encyclopedia of DNA Elements (ENCODE) project. In addition, since RegulomeDB scores are grouped into 6 categories, I propose transforming the categories into numerical scores to use as weights in association tests of WGS data. I evaluate and compare the proposed methods with existing methods using extensive simulation studies, as well as applications to the Genetic Analysis Workshop (GAW) 17 mini-exome sequencing data and the GAW 19 WGS data. I also show how to combine multiple sources of biological knowledge and discuss how extreme scores in the transformation of categories can lead to false positive discoveries.
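Turning RegulomeDB categories into numerical weights for a WGS association test can be sketched as a weighted burden score. This minimal sketch assumes a simple linear mapping in which category 1 (strongest evidence of regulatory function) receives weight 1 and category 6 receives 1/6. The mapping and the helper names `category_to_weight` and `weighted_burden` are hypothetical illustrations, not the transformation proposed in this work.

```python
from typing import List, Sequence


def category_to_weight(category: int, n_levels: int = 6) -> float:
    """Hypothetical linear mapping: category 1 -> 1.0, category n_levels -> 1/n_levels."""
    if not 1 <= category <= n_levels:
        raise ValueError(f"category must be in 1..{n_levels}, got {category}")
    return (n_levels + 1 - category) / n_levels


def weighted_burden(genotypes: Sequence[Sequence[int]], categories: Sequence[int]) -> List[float]:
    """Per-individual sum of minor-allele counts, each variant scaled by its category weight.

    genotypes[i][j] is the minor-allele count of individual i at variant j;
    categories[j] is the (assumed) RegulomeDB category of variant j.
    """
    weights = [category_to_weight(c) for c in categories]
    return [sum(w * g for w, g in zip(weights, row)) for row in genotypes]
```

A gently sloped linear scale is chosen here so that no single category dominates the score; the abstract notes that extreme transformed scores can lead to false positive discoveries.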

    Packer Detection for Multi-Layer Executables Using Entropy Analysis

    Packing algorithms are widely used to evade anti-malware systems, and the proportion of packed malware has been growing rapidly. However, only a few studies have systematically examined the detection of various types of packing algorithms. To address this gap, we develop a method that classifies the packing of a given executable into one of three categories: single-layer packing, re-packing, or multi-layer packing. We convert the entropy values of the executable file, as loaded into memory, into symbolic representations using SAX (Symbolic Aggregate Approximation). In experiments on 2196 programs and 19 packing algorithms, our method achieves high precision (97.7%), accuracy (97.5%), and recall (96.8%), confirming that entropy analysis is applicable to identifying packing algorithms.
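The entropy-to-symbol step can be sketched as follows: compute the Shannon entropy of fixed-size byte blocks, then convert the resulting series to a SAX string. This is a minimal sketch of generic SAX (z-normalization, Piecewise Aggregate Approximation, Gaussian breakpoints); the block size, segment count, alphabet size, and the function names `block_entropies` and `sax` are assumptions, and the abstract does not specify how its entropy series was sampled from memory.

```python
import math
from bisect import bisect_left
from collections import Counter
from statistics import NormalDist, mean, pstdev
from typing import List


def block_entropies(data: bytes, block_size: int = 256) -> List[float]:
    """Shannon entropy, in bits per byte (0 to 8), of each full consecutive block."""
    entropies = []
    for start in range(0, len(data) - block_size + 1, block_size):
        counts = Counter(data[start:start + block_size])
        entropies.append(
            -sum((c / block_size) * math.log2(c / block_size) for c in counts.values())
        )
    return entropies


def sax(series: List[float], n_segments: int, alphabet_size: int) -> str:
    """Symbolic Aggregate Approximation: z-normalize, reduce with PAA, discretize."""
    if not 0 < n_segments <= len(series):
        raise ValueError("n_segments must be between 1 and len(series)")
    mu, sigma = mean(series), pstdev(series)
    # A constant series has zero spread; map every point to 0 instead of dividing by zero.
    z = [(x - mu) / sigma if sigma > 0 else 0.0 for x in series]
    n = len(z)
    # Piecewise Aggregate Approximation: mean of each near-equal-length segment.
    paa = [mean(z[i * n // n_segments:(i + 1) * n // n_segments]) for i in range(n_segments)]
    # Breakpoints cut the standard normal into alphabet_size equiprobable regions.
    breakpoints = [NormalDist().inv_cdf(k / alphabet_size) for k in range(1, alphabet_size)]
    return "".join(chr(ord("a") + bisect_left(breakpoints, v)) for v in paa)
```

For example, a buffer of 1024 zero bytes followed by 1024 bytes cycling through all 256 values yields four blocks of entropy 0 and four of entropy 8, which `sax(..., 2, 3)` encodes as a low symbol followed by a high one. Packed or encrypted regions tend to show high entropy, which is why an entropy profile can help distinguish packing layers.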

    Association of clonal hematopoiesis mutations with clinical outcomes: A systematic review and meta‐analysis

    Clonal hematopoiesis (CH) mutations are common among individuals without known hematologic disease. CH mutations have been associated with numerous adverse clinical outcomes across many different studies. We systematically reviewed the available literature for clinical outcomes associated with CH mutations in patients without hematologic disease. We searched PubMed, EMBASE, and Scopus for eligible studies. Three investigators independently extracted the data, and each study was verified by a second author. Risk of bias was assessed using the Newcastle-Ottawa Scale. We identified 32 studies with 56 cohorts that examined the association between CH mutations and clinical outcomes. We conducted meta-analyses comparing outcomes among individuals with and without detectable CH mutations. We conducted meta-analyses for cardiovascular diseases (nine studies; HR = 1.61, 95% CI = 1.26-2.07, p = .0002), hematologic malignancies (seven studies; HR = 5.59, 95% CI = 3.31-9.45, p < .0001), therapy-related myeloid neoplasms (four studies; HR = 7.55, 95% CI = 4.3-13.57, p < .001), and death (nine studies; HR = 1.34, 95% CI = 1.2-1.5, p < .0001). The cardiovascular disease analysis was further stratified by variant allele fraction (VAF) and gene, which showed a statistically significant association only with a VAF of ≥ 10% (HR = 1.42, 95% CI = 1.24-1.62, p < .0001), as well as statistically significant associations for each gene examined, with the largest magnitude of effect found for CH mutations in JAK2 (HR = 3.5, 95% CI = 1.84-6.68, p < .0001). Analysis of the association of CH mutations with hematologic malignancy demonstrated a numeric stepwise increase in risk with increasing VAF thresholds. This analysis strongly supports the association of CH mutations with a clinically meaningful increased risk of adverse clinical outcomes among individuals without hematologic disease, particularly with increasing VAF thresholds.