25 research outputs found

    Rare variant testing across methods and thresholds using the multi-kernel sequence kernel association test (MK-SKAT)

    Analysis of rare genetic variants has focused on region-based analysis wherein a subset of the variants within a genomic region is tested for association with a complex trait. Two important practical challenges have emerged. First, it is difficult to choose which test to use. Second, it is unclear which group of variants within a region should be tested. Both depend on the unknown true state of nature. Therefore, we develop the Multi-Kernel SKAT (MK-SKAT) which tests across a range of rare variant tests and groupings. Specifically, we demonstrate that several popular rare variant tests are special cases of the sequence kernel association test which compares pair-wise similarity in trait value to similarity in the rare variant genotypes between subjects as measured through a kernel function. Choosing a particular test is equivalent to choosing a kernel. Similarly, choosing which group of variants to test also reduces to choosing a kernel. Thus, MK-SKAT uses perturbation to test across a range of kernels. Simulations and real data analyses show that our framework controls type I error while maintaining high power across settings: MK-SKAT loses power when compared to the kernel for a particular scenario but has much greater power than poor choices
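    The claim that choosing a test amounts to choosing a kernel can be illustrated with a small sketch. It is not the MK-SKAT implementation: the function names, the toy data, and the mean-centred residuals are assumptions made for illustration, and a plain permutation test stands in for the perturbation procedure named in the abstract.

```python
import random

def centered(y):
    """Residuals under an intercept-only null model (assumption: no covariates)."""
    m = sum(y) / len(y)
    return [v - m for v in y]

def linear_kernel(G, w):
    """Weighted linear kernel: K[i][j] = sum_k w[k] * G[i][k] * G[j][k]."""
    n = len(G)
    return [[sum(wk * a * b for wk, a, b in zip(w, G[i], G[j]))
             for j in range(n)] for i in range(n)]

def burden_kernel(G, w):
    """Burden (collapsing) kernel: K[i][j] = b_i * b_j, b_i = sum_k w[k] * G[i][k]."""
    b = [sum(wk * g for wk, g in zip(w, row)) for row in G]
    return [[bi * bj for bj in b] for bi in b]

def q_stat(y, K):
    """Kernel score statistic Q = r' K r, comparing trait and genotype similarity."""
    r = centered(y)
    n = len(r)
    return sum(r[i] * K[i][j] * r[j] for i in range(n) for j in range(n))

def permutation_pvalue(y, K, n_perm=500, seed=0):
    """Permutation p-value for Q (a stand-in for perturbation, not the paper's method)."""
    rng = random.Random(seed)
    q_obs = q_stat(y, K)
    y_perm = list(y)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(y_perm)
        if q_stat(y_perm, K) >= q_obs:
            hits += 1
    return (hits + 1) / (n_perm + 1)

# Hypothetical toy data: 4 subjects, 3 rare variants (minor-allele counts).
G = [[0, 1, 0], [1, 0, 0], [0, 0, 2], [0, 0, 0]]
y = [1.0, 2.0, 4.0, 0.5]
w = [1.0, 1.0, 1.0]

q_variance_component = q_stat(y, linear_kernel(G, w))
q_burden = q_stat(y, burden_kernel(G, w))
```

    The same `q_stat` function yields a variance-component style statistic or a burden style statistic depending only on the kernel passed in. Restricting the columns of `G` before building the kernel would correspond to choosing which group of variants to test.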

    VCSEL: PRIORITIZING SNP-SET BY PENALIZED VARIANCE COMPONENT SELECTION.

    Single nucleotide polymorphism (SNP) set analysis aggregates both common and rare variants and tests for association between phenotype(s) of interest and a set. However, multiple SNP-sets, such as genes, pathways, or sliding windows, are usually investigated across the whole genome, with all groups tested separately and then adjusted for multiple testing. We propose a novel method to prioritize SNP-sets in a joint multivariate variance component model. Each SNP-set corresponds to a variance component (or kernel), and model selection is achieved by incorporating either convex or nonconvex penalties. The distinguishing feature of this variance component selection framework, which we call VCSEL, is that it naturally encompasses multivariate traits (VCSEL-M) and SNP-set-treatment or -environment interactions (VCSEL-I). We devise an optimization algorithm scalable to many variance components, based on the majorization-minimization (MM) principle. Simulation studies demonstrate the superiority of our methods in model selection performance, as measured by the area under the precision-recall (PR) curve, compared to the commonly used marginal testing and group penalization methods. Finally, we apply our methods to a real pharmacogenomics study and a real whole exome sequencing study. Some genes ranked highly by VCSEL are found non-significant by the marginal test methods, which emphasize formal inference on individual genes with a strict significance threshold. This provides alternative insights for biologists to prioritize follow-up studies and develop polygenic risk score models.

    Multi-population European validation results.

    Medium heterogeneity non-thresholded and thresholded cross-validation results for HLA*IMP:02 and HLA*IMP:01: GS&HLARES_EU 2/3 is used to impute GS&HLARES_EU 1/3. Accuracy (PPV) is measured at 4-digit resolution. “# Validated” refers to the number of validated alleles (pre-thresholding).

    Features of haplotype graph models.

    Illustration of the features of haplotype graph models. Haplotype graphs are a subclass of connected directed graphs and belong to the class of acyclic probabilistic finite automata. Their most important properties are illustrated here: 1) They are leveled, i.e. each vertex has an associated positive level number (at least 1), and all edges emanating from a vertex at a given level lead to a vertex at the next level and represent the same genetic locus. Vertices at the last level are final vertices with no outgoing edges, and there is a path from every vertex in the graph to one of the final vertices. 2) Edges carry “emission symbols” which are emitted when an edge is traversed (in the figure: the symbols after the “|” character adjacent to the edges), and no two edges emanating from the same vertex carry the same symbol. 3) Each vertex has an edge probability distribution over its attached edges (in the figure: the numbers in front of the “|” character adjacent to the edges), according to which an edge is selected conditional on being at that vertex.
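    The three properties listed in this caption can be sketched as a small data structure. This is a toy illustration, not the HLA*IMP:02 implementation: the graph layout, vertex names, symbols, probabilities, and function names are all hypothetical.

```python
import random

# Hypothetical toy graph. graph[i] holds the vertices of one level; each vertex
# maps to its outgoing edges as (probability, emission_symbol, next_vertex).
# Edges from the last listed level lead to the final vertex "end".
graph = [
    {"v0": [(0.7, "A", "v1a"), (0.3, "G", "v1b")]},
    {"v1a": [(1.0, "C", "end")],
     "v1b": [(0.5, "C", "end"), (0.5, "T", "end")]},
]

def check_graph(graph, tol=1e-9):
    """Check the caption's properties: leveled edges, unique symbols, valid probabilities."""
    for i, level in enumerate(graph):
        for vertex, edges in level.items():
            symbols = [s for _, s, _ in edges]
            if len(symbols) != len(set(symbols)):
                raise ValueError(f"duplicate emission symbol at {vertex}")
            if abs(sum(p for p, _, _ in edges) - 1.0) > tol:
                raise ValueError(f"edge probabilities at {vertex} do not sum to 1")
            if i + 1 < len(graph):
                for _, _, nxt in edges:
                    if nxt not in graph[i + 1]:
                        raise ValueError(f"edge from {vertex} skips a level")
    return True

def sample_haplotype(graph, start, rng):
    """Walk from the start vertex, choosing each edge by its probability."""
    vertex, emitted = start, []
    for level in graph:
        edges = level[vertex]
        weights = [p for p, _, _ in edges]
        _, symbol, vertex = rng.choices(edges, weights=weights, k=1)[0]
        emitted.append(symbol)
    return "".join(emitted)

def haplotype_probability(graph, start, symbols):
    """Probability of emitting `symbols` (one per level); unique symbols fix the path."""
    vertex, prob = start, 1.0
    for level, symbol in zip(graph, symbols):
        match = [(p, nxt) for p, s, nxt in level[vertex] if s == symbol]
        if not match:
            return 0.0
        p, vertex = match[0]
        prob *= p
    return prob

check_graph(graph)
example = sample_haplotype(graph, "v0", random.Random(0))
```

    Because no two edges leaving a vertex share a symbol, a given haplotype corresponds to at most one path through the graph, which is why `haplotype_probability` can follow a single path without summing over alternatives.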

    Missing data in the inference panel.

    4-digit resolution accuracies (PPV) when 70% and 90% of the inference panel SNP genotypes (GS&HLARES_EU 1/3) in the second experiment are randomly set to “missing”. No call threshold is employed. “# Validated” refers to the number of validated alleles.