108 research outputs found

    Searching for a trail of evidence in a maze

    Consider a graph with a set of vertices and oriented edges connecting pairs of vertices. Each vertex is associated with a random variable and these are assumed to be independent. In this setting, suppose we wish to solve the following hypothesis testing problem: under the null, the random variables have common distribution $N(0,1)$, while under the alternative, there is an unknown path along which the random variables have distribution $N(\mu,1)$, $\mu > 0$, and distribution $N(0,1)$ away from it. For which values of the mean shift $\mu$ can one reliably detect such a path, and for which values is this impossible? Consider, for example, the usual regular lattice with vertices of the form $\{(i,j) : 0 \le i,\ -i \le j \le i,\ j \text{ has the parity of } i\}$ and oriented edges $(i,j) \to (i+1,j+s)$, where $s = \pm 1$. We show that for paths of length $m$ starting at the origin, the hypotheses become distinguishable (in a minimax sense) if $\mu_m \gg 1/\sqrt{\log m}$, while they are not if $\mu_m \ll 1/\log m$. We derive equivalent results in a Bayesian setting where one assumes that all paths are equally likely; there, the asymptotic threshold is $\mu_m \approx m^{-1/4}$. We obtain corresponding results for trees (where the threshold is of order 1 and independent of the size of the tree), for distributions other than the Gaussian and for other graphs. The concept of the predictability profile, first introduced by Benjamini, Pemantle and Peres, plays a crucial role in our analysis. Comment: Published at http://dx.doi.org/10.1214/07-AOS526 in the Annals of Statistics (http://www.imstat.org/aos/) by the Institute of Mathematical Statistics (http://www.imstat.org)
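The lattice setting described in this abstract can be simulated in a few lines. The sketch below is illustrative only and is not taken from the paper: it assumes a path of m vertices starting at the origin, and it uses the maximum path sum (computed by dynamic programming) as one natural scan statistic. The function names `random_path`, `sample_field` and `max_path_sum` are hypothetical.

```python
import random

def lattice_vertices(m):
    """Vertices (i, j) with 0 <= i < m, -i <= j <= i, j having the parity of i."""
    return [(i, j) for i in range(m) for j in range(-i, i + 1, 2)]

def random_path(m, rng):
    """A uniformly random oriented path of m vertices starting at the origin."""
    path, j = [(0, 0)], 0
    for i in range(1, m):
        j += rng.choice((-1, 1))  # edge (i-1, j) -> (i, j + s), s = +/-1
        path.append((i, j))
    return path

def sample_field(m, mu, path, rng):
    """Independent N(mu, 1) variables on the path and N(0, 1) elsewhere."""
    on_path = set(path)
    return {v: rng.gauss(mu if v in on_path else 0.0, 1.0)
            for v in lattice_vertices(m)}

def max_path_sum(x, m):
    """Largest sum of x over oriented paths of m vertices from the origin."""
    best = {(0, 0): x[(0, 0)]}
    for i in range(1, m):
        for j in range(-i, i + 1, 2):
            # Predecessors of (i, j) are (i-1, j-1) and (i-1, j+1), when present.
            preds = [best[(i - 1, j + s)] for s in (-1, 1)
                     if (i - 1, j + s) in best]
            best[(i, j)] = x[(i, j)] + max(preds)
    return max(best[(m - 1, j)] for j in range(-(m - 1), m, 2))

if __name__ == "__main__":
    rng = random.Random(0)  # fixed seed, chosen arbitrarily
    m, mu = 50, 1.0         # assumed values for illustration
    null_stat = max_path_sum(sample_field(m, 0.0, [], rng), m)
    alt_stat = max_path_sum(sample_field(m, mu, random_path(m, rng), rng), m)
    print(null_stat / m, alt_stat / m)
```

Comparing the normalized statistic under the null and under the alternative over many repetitions gives a rough empirical feel for how large the mean shift must be before the two become separable.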

    Nonparametric Detection and Estimation of Highly Oscillatory Signals

    This thesis considers the problem of detecting and estimating highly oscillatory signals from noisy measurements. These signals are often referred to as chirps in the literature; they are found everywhere in nature, and frequently arise in scientific and engineering problems. Mathematically, they can be written in the general form $A(t) \exp(i \lambda \varphi(t))$, where $\lambda$ is a large constant base frequency, the phase $\varphi(t)$ is time-varying, and the envelope $A(t)$ is slowly varying. Given a sequence of noisy measurements, we study two problems separately: 1) the problem of testing whether or not there is a chirp hidden in the noisy data, and 2) the problem of estimating this chirp from the data. This thesis introduces novel, flexible and practical strategies for addressing these important nonparametric statistical problems. The main idea is to calculate, in a first step, correlations of the data with a rich family of local templates, the multiscale chirplets, and in a second step, to search for meaningful aggregations or chains of chirplets which provide a good global fit to the data. From a physical viewpoint, these chains correspond to realistic signals since they model arbitrary chirps. From an algorithmic viewpoint, these chains are identified as paths in a convenient graph. The key point is that this underlying graph structure makes it possible to use very effective algorithms, such as network flow algorithms, for finding those chains which achieve a near-optimal trade-off between goodness of fit and complexity. Our estimation procedures provide provably near-optimal performance over a wide range of chirps, and numerical experiments show that both our detection and estimation procedures perform exceptionally well over a broad class of chirps. This thesis also introduces general strategies for extracting signals of unknown duration in long streams of data when we have no idea where these signals may be. The approach leverages testing methods designed to detect the presence of signals with known time support. Underlying our methods is a general abstraction which postulates an abstract statistical problem of detecting paths in graphs which have random variables attached to their vertices. The formulation of this problem was inspired by our chirp detection methods and is of great independent interest.
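The first step described in this abstract, correlating the data with local templates, can be sketched as follows. This is a minimal illustration and not the thesis's implementation: it assumes a template with linearly varying instantaneous frequency, $\exp(i(a t + b t^2/2))$, restricted to an interval, and the names `chirp` and `chirplet_correlation` are hypothetical.

```python
import cmath
import math

def chirp(ts, lam, phase, envelope):
    """Samples of A(t) * exp(i * lam * phase(t)) at the times in ts."""
    return [envelope(t) * cmath.exp(1j * lam * phase(t)) for t in ts]

def chirplet_correlation(y, ts, a, b, t0, t1):
    """Normalized correlation |<y, c>| / ||c|| with the assumed template
    c(t) = exp(i * (a * t + b * t**2 / 2)) supported on [t0, t1)."""
    idx = [k for k, t in enumerate(ts) if t0 <= t < t1]
    if not idx:
        return 0.0
    inner = sum(y[k] * cmath.exp(-1j * (a * ts[k] + 0.5 * b * ts[k] ** 2))
                for k in idx)
    return abs(inner) / math.sqrt(len(idx))

if __name__ == "__main__":
    n = 256
    ts = [k / n for k in range(n)]
    # Noiseless chirp with phase t^2 / 2 and base frequency 200 (assumed values).
    y = chirp(ts, 200.0, lambda t: 0.5 * t * t, lambda t: 1.0)
    matched = chirplet_correlation(y, ts, 0.0, 200.0, 0.0, 1.0)
    mismatched = chirplet_correlation(y, ts, 0.0, 100.0, 0.0, 1.0)
    print(matched, mismatched)
```

A template whose frequency slope matches the signal correlates much more strongly than a mismatched one, which is what makes chaining well-correlated local templates into a path a plausible way to track the signal.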

    Smoothing Windows for the Synthesis of Gaussian Stationary Random Fields Using Circulant Matrix Embedding

    When generating Gaussian stationary random fields, a standard method based on circulant matrix embedding usually fails because some of the associated eigenvalues are negative. The eigenvalues can be shown to be nonnegative in the limit of increasing sample size. Computationally feasible large sample sizes, however, rarely lead to nonnegative eigenvalues. Another solution is to suitably extend the covariance function of interest so that the eigenvalues of the embedded circulant matrix become nonnegative in theory. Though such extensions have been found for a number of examples of stationary fields, the method depends on nontrivial constructions in specific cases. In this work, the embedded circulant matrix is smoothed at the boundary by using a cutoff window or overlapping windows over a transition region. The windows are not specific to particular examples of stationary fields. The resulting method modifies the standard circulant embedding and is easy to use. It is shown that this straightforward approach works for many examples of interest, with the overlapping windows performing consistently better. The method even outperforms covariance extension in the cases where the latter leads to nonnegative eigenvalues in theory, in the sense that the transition region is considerably smaller. The Matlab code implementing the method is publicly available at www.hermir.org
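The failure mode mentioned in this abstract, negative eigenvalues of the embedded circulant matrix, can be checked numerically. The sketch below covers only the standard one-dimensional embedding and does not reproduce the paper's smoothing windows; it relies on the fact that the eigenvalues of a circulant matrix are the discrete Fourier transform of its first row. The function names and the example covariance are assumptions made for illustration.

```python
import cmath
import math

def embed_first_row(r):
    """First row of the minimal circulant embedding of a covariance sequence
    r(0), ..., r(n-1): [r(0), ..., r(n-1), r(n-2), ..., r(1)]."""
    return list(r) + list(r[-2:0:-1])

def circulant_eigenvalues(first_row):
    """Eigenvalues of the circulant matrix with the given first row, computed
    as a plain O(n^2) DFT. They are real when the row is symmetric."""
    n = len(first_row)
    return [sum(c * cmath.exp(-2j * math.pi * j * k / n)
                for j, c in enumerate(first_row)).real
            for k in range(n)]

def embedding_is_valid(r, tol=1e-12):
    """True if every eigenvalue of the embedding is nonnegative up to tol."""
    return min(circulant_eigenvalues(embed_first_row(r))) >= -tol

if __name__ == "__main__":
    n, scale = 64, 10.0  # assumed sample size and correlation length
    gaussian_cov = [math.exp(-(k / scale) ** 2) for k in range(n)]
    eig = circulant_eigenvalues(embed_first_row(gaussian_cov))
    print(min(eig), embedding_is_valid(gaussian_cov))
```

When the smallest eigenvalue is negative, the embedded matrix is not a valid covariance matrix and the standard sampling step cannot be applied as is, which is the situation the windowing approach is meant to address.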

    Whole genome characterization of sequence diversity of 15,220 Icelanders

    Understanding of sequence diversity is the cornerstone of analysis of genetic disorders, population genetics, and evolutionary biology. Here, we present an update of our sequencing set to 15,220 Icelanders whom we sequenced to an average genome-wide coverage of 34X. We identified 39,020,168 autosomal variants passing GATK filters: 31,079,378 SNPs and 7,940,790 indels. Calling de novo mutations (DNMs) is a formidable challenge given the high false positive rate in sequencing datasets relative to the mutation rate. Here we addressed this issue by using segregation of alleles in three-generation families. Using this transmission assay, we controlled the false positive rate and identified 108,778 high quality DNMs. Furthermore, we used our extended family structure and read pair tracing of DNMs to a panel of phased SNPs to determine the parent of origin of 42,961 DNMs. Peer Reviewe

    Common and rare variants associated with kidney stones and biochemical traits.

    This article is open access. Kidney stone disease is a complex disorder with a strong genetic component. We conducted a genome-wide association study of 28.3 million sequence variants detected through whole-genome sequencing of 2,636 Icelanders that were imputed into 5,419 kidney stone cases, including 2,172 cases with a history of recurrent kidney stones, and 279,870 controls. We identify sequence variants associating with kidney stones at ALPL (rs1256328[T], odds ratio (OR) = 1.21, P = 5.8 × 10^{-10}) and a suggestive association at CASR (rs7627468[A], OR = 1.16, P = 2.0 × 10^{-8}). Focusing our analysis on coding sequence variants in 63 genes with preferential kidney expression, we identify two rare missense variants, SLC34A1 p.Tyr489Cys (OR = 2.38, P = 2.8 × 10^{-5}) and TRPV5 p.Leu530Arg (OR = 3.62, P = 4.1 × 10^{-5}), associating with recurrent kidney stones. We also observe associations of the identified kidney stone variants with biochemical traits in a large population set, indicating a potential biological mechanism. Rare Kidney Stone Consortium 5U54DK083908-07 National Center for Advancing Translational Sciences (NCATS) Rare Diseases Clinical Research Network (RDCRN) Rare Kidney Stone Consortiu

    Insertion of an SVA-E retrotransposon into the CASP8 gene is associated with protection against prostate cancer

    This article is open access. Transcriptional and splicing anomalies have been observed in intron 8 of the CASP8 gene (encoding procaspase-8) in association with cutaneous basal-cell carcinoma (BCC) and linked to a germline SNP rs700635. Here, we show that the rs700635[C] allele, which is associated with increased risk of BCC and breast cancer, is protective against prostate cancer [odds ratio (OR) = 0.91, P = 1.0 × 10^{-6}]. rs700635[C] is also associated with failures to correctly splice out CASP8 intron 8 in breast and prostate tumours and in corresponding normal tissues. Investigation of rs700635[C] carriers revealed that they have a human-specific short interspersed element-variable number of tandem repeat-Alu (SINE-VNTR-Alu), subfamily-E retrotransposon (SVA-E) inserted into CASP8 intron 8. The SVA-E shows evidence of prior activity, because it has transduced some CASP8 sequences during subsequent retrotransposition events. Whole-genome sequence (WGS) data were used to tag the SVA-E with a surrogate SNP rs1035142[T] (r^2 = 0.999), which showed associations with both the splicing anomalies (P = 6.5 × 10^{-32}) and with protection against prostate cancer (OR = 0.91, P = 3.8 × 10^{-7}). National Cancer Research Institute (NCRI) G0500966/75466 Department of Health, Medical Research Council Cancer Research UK University of Cambridge NIHR Department of Health Anniversary Fund of the Austrian National Bank 15079 Medical and Scientific Fund of the Mayor of the City of Vienna 10077 Common Fund of the Office of the Director of the National Institutes of Health NCI NHGRI NHLBI NIDA NIMH NINDS NCI\SAIC-Frederick, Inc. (SAIC-F) 10XS170 Roswell Park Cancer Institute 10XS171 Science Care, Inc. 
X10S172 SAIC-F 10ST1035 HHSN261200800001E deCODE genetics/AMGEN HHSN268201000029C DA006227 DA033684 N01MH000028 MH090941 MH101814 MH090951 MH090937 MH101820 MH101825 MH090936 MH101819 MH090948 MH101782 MH101810 MH10182

    A sequence variant associating with educational attainment also affects childhood cognition

    Only a few common variants in the sequence of the genome have been shown to impact cognitive traits. Here we demonstrate that polygenic scores of educational attainment predict specific aspects of childhood cognition, as measured with IQ. Recently, three sequence variants were shown to associate with educational attainment, a confluence phenotype of genetic and environmental factors contributing to academic success. We show that one of these variants associating with educational attainment, rs4851266-T, also associates with Verbal IQ in dyslexic children (P = 4.3 × 10^{-4}, beta = 0.16 s.d.). The effect of 0.16 s.d. corresponds to 1.4 IQ points for heterozygotes and 2.8 IQ points for homozygotes. We verified this association in independent samples consisting of adults (P = 8.3 × 10^{-5}, beta = 0.12 s.d.; combined P = 2.2 × 10^{-7}, beta = 0.14 s.d.). Childhood cognition is unlikely to be affected by education attained later in life, and the variant explains a greater fraction of the variance in verbal IQ than in educational attainment (0.7% vs 0.12%, P = 1.0 × 10^{-5}).