
    Comparison of tests for spatial heterogeneity on data with global clustering patterns and outliers

    <p>Abstract</p> <p>Background</p> <p>The ability to evaluate geographic heterogeneity of cancer incidence and mortality is important in cancer surveillance. Many statistical methods for evaluating global clustering and local cluster patterns have been developed and examined in simulation studies. However, the performance of these methods on two extreme cases (global clustering evaluation and local anomaly (outlier) detection) has not been thoroughly investigated.</p> <p>Methods</p> <p>We compare methods for global clustering evaluation, including Tango's Index, Moran's <it>I</it>, and Oden's <it>I</it>*<sub><it>pop</it></sub>, and cluster detection methods, such as local Moran's <it>I </it>and the elliptic version of SaTScan, on simulated count data that mimic global clustering patterns and outliers for cancer cases in the continental United States. We examine the power and precision of the selected methods in the purely spatial analysis. We illustrate Tango's MEET and the elliptic version of SaTScan on 1987-2004 HIV and 1950-1969 lung cancer mortality data in the United States.</p> <p>Results</p> <p>For simulated data with outlier patterns, Tango's MEET, Moran's <it>I </it>and <it>I</it>*<sub><it>pop </it></sub>had powers less than 0.2, and SaTScan had powers around 0.97. For simulated data with global clustering patterns, Tango's MEET and <it>I</it>*<sub><it>pop </it></sub>(with 50% of total population as the maximum search window) had powers close to 1. SaTScan had powers around 0.7-0.8 and Moran's <it>I </it>had powers around 0.2-0.3. In the real data example, Tango's MEET indicated the existence of global clustering patterns in both the HIV and lung cancer mortality data. SaTScan found a large cluster for HIV mortality rates, which is consistent with the finding from Tango's MEET. SaTScan also found clusters and outliers in the lung cancer mortality data.</p> <p>Conclusion</p> <p>The elliptic version of SaTScan is more efficient for outlier detection than the other methods evaluated in this article. Tango's MEET and Oden's <it>I</it>*<sub><it>pop </it></sub>perform best in global clustering scenarios among the selected methods. SaTScan should be used with caution on data with global clustering patterns, since it may reveal an incorrect spatial pattern even though it has enough power to reject a null hypothesis of homogeneous relative risk. Tango's method should be used for global clustering evaluation instead of SaTScan.</p>
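To make the global clustering statistics above more concrete, the following is a minimal sketch of the textbook global Moran's I for a set of regions. It is not the code used in the study: the function name `morans_i`, the binary adjacency weights, and the toy four-region chain are illustrative assumptions.

```python
def morans_i(values, weights):
    """Global Moran's I for regional values and a spatial weight matrix.

    values  -- list of n observations (e.g. regional rates); hypothetical input
    weights -- n x n list of lists, weights[i][j] > 0 when regions i and j
               are treated as neighbours (assumed symmetric, zero diagonal)
    """
    n = len(values)
    mean = sum(values) / n
    dev = [v - mean for v in values]
    w_sum = sum(sum(row) for row in weights)
    # Cross-product of deviations over all neighbouring pairs.
    num = sum(weights[i][j] * dev[i] * dev[j]
              for i in range(n) for j in range(n))
    den = sum(d * d for d in dev)
    return (n / w_sum) * (num / den)


# Toy example: four regions in a chain, each adjacent to the next one.
chain = [
    [0, 1, 0, 0],
    [1, 0, 1, 0],
    [0, 1, 0, 1],
    [0, 0, 1, 0],
]
smooth = morans_i([1, 2, 3, 4], chain)       # similar neighbours, positive I
alternating = morans_i([1, 0, 1, 0], chain)  # dissimilar neighbours, negative I
```

In practice the significance of the statistic is usually judged against a permutation or Monte Carlo reference distribution, which is also how power is estimated in simulation comparisons of this kind.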

    Event Texture Search for Phase Transitions in Pb+Pb Collisions

    NA44 uses a 512 channel Si pad array covering $1.5 < \eta < 3.3$ to study charged hadron production in 158 A GeV Pb+Pb collisions at the CERN SPS. We apply a multiresolution analysis, based on a Discrete Wavelet Transformation, to probe the texture of particle distributions event-by-event, allowing simultaneous localization of features in space and scale. Scanning a broad range of multiplicities, we search for signals of clustering and of critical behavior in the power spectra of local density fluctuations. The data are compared with detailed simulations of detector response, using heavy ion event generators, and with a reference sample created via event mixing. An upper limit is set on the probability and magnitude of dynamical fluctuations.
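As an illustration of what a wavelet power spectrum of local density fluctuations looks like, here is a minimal one-dimensional sketch. It assumes a Haar basis and a single row of pad occupancies; the actual analysis works on a two-dimensional pad array, and the wavelet basis is not stated in the abstract, so both choices (and the name `haar_power_spectrum`) are assumptions.

```python
import math


def haar_power_spectrum(signal):
    """Mean squared Haar detail coefficient per scale, finest scale first.

    signal -- list of per-channel occupancies for one event (hypothetical
              1D stand-in for a detector pad array); length must be a
              power of two.
    """
    n = len(signal)
    if n == 0 or n & (n - 1):
        raise ValueError("signal length must be a power of two")
    approx = [float(x) for x in signal]
    power = []
    while len(approx) > 1:
        pairs = list(zip(approx[0::2], approx[1::2]))
        # Detail coefficients capture fluctuations at the current scale.
        detail = [(a - b) / math.sqrt(2) for a, b in pairs]
        # Approximation coefficients carry the smoothed signal to the next scale.
        approx = [(a + b) / math.sqrt(2) for a, b in pairs]
        power.append(sum(d * d for d in detail) / len(detail))
    return power


flat_event = haar_power_spectrum([3] * 8)                    # no texture at any scale
striped_event = haar_power_spectrum([1, 0, 1, 0, 1, 0, 1, 0])  # texture only at the finest scale
```

Comparing such per-scale power between real events and a mixed-event reference is one way to isolate fluctuations that are not purely statistical.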

    Deciphering the large-scale environment of radio galaxies in the local Universe: where do they born, grow and die?

    The role of the large-scale environment in the nuclear activity of radio galaxies (RGs) is still not completely understood. Accretion mode, jet power and galaxy evolution are connected with their large-scale environment from tens to hundreds of kpc. Here we present a detailed statistical analysis of the large-scale environment for two samples of RGs up to redshifts $z_\mathrm{src}=0.15$. The main advantage of our study over those already present in the literature is the extremely homogeneous selection criteria of the catalogs adopted for our investigation, coupled with the use of several clustering algorithms. We performed a direct search for galaxy-rich environments around RGs, using them as beacons. To perform this study we also developed a new method that does not appear to suffer from as strong a $z_\mathrm{src}$ dependence as other algorithms. We conclude that, regardless of their radio morphological (FR I vs FR II) and/or optical (HERG vs LERG) classification, RGs in the local Universe tend to live in galaxy-rich large-scale environments having similar characteristics and richness. We highlight that the fraction of FR I LERGs inhabiting galaxy-rich environments appears larger than that of FR II LERGs. We also found that 5 out of 7 FR II HERGs with $z_\mathrm{src} \leq 0.11$ lie in groups/clusters of galaxies. However, we recognize that, despite the high level of completeness of our catalogs, when restricting to the local Universe, the low number of HERGs ($\sim$10% of the total FR IIs investigated) prevents us from drawing a strong statistical conclusion about this source class. Comment: 21 pages, 25 figures, accepted for publication on the Astrophysical Journal Supplement Series - pre-proof version

    Entropy-scaling search of massive biological data

    Many datasets exhibit a well-defined structure that can be exploited to design faster search tools, but it is not always clear when such acceleration is possible. Here, we introduce a framework for similarity search based on characterizing a dataset's entropy and fractal dimension. We prove that searching scales in time with metric entropy (number of covering hyperspheres), if the fractal dimension of the dataset is low, and scales in space with the sum of metric entropy and information-theoretic entropy (randomness of the data). Using these ideas, we present accelerated versions of standard tools, with no loss in specificity and little loss in sensitivity, for use in three domains: high-throughput drug screening (Ammolite, 150x speedup), metagenomics (MICA, 3.5x speedup of DIAMOND [3,700x BLASTX]), and protein structure search (esFragBag, 10x speedup of FragBag). Our framework can be used to achieve "compressive omics," and the general theory can be readily applied to data science problems outside of biology. Comment: Including supplement: 41 pages, 6 figures, 4 tables, 1 box
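The idea of searching over covering hyperspheres can be sketched as a two-stage lookup: a coarse pass over cluster centers, then a fine pass inside the clusters that could contain a hit. The code below is a simplified illustration under stated assumptions, not the implementation behind Ammolite, MICA, or esFragBag: it uses Euclidean distance, a greedy cover, and hypothetical function names (`build_cover`, `search`).

```python
import math


def euclid(a, b):
    """Euclidean distance between two equal-length tuples."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def build_cover(points, cluster_radius, dist=euclid):
    """Greedily cover the data with spheres of radius cluster_radius.

    Returns a list of (center, members); every member lies within
    cluster_radius of its center.
    """
    clusters = []
    for p in points:
        for center, members in clusters:
            if dist(p, center) <= cluster_radius:
                members.append(p)
                break
        else:
            clusters.append((p, [p]))
    return clusters


def search(query, clusters, radius, cluster_radius, dist=euclid):
    """Return all points within radius of query.

    Coarse step: by the triangle inequality, a cluster can only hold a hit
    if its center is within radius + cluster_radius of the query.
    Fine step: scan only the members of those clusters.
    """
    hits = []
    for center, members in clusters:
        if dist(query, center) <= radius + cluster_radius:
            hits.extend(m for m in members if dist(query, m) <= radius)
    return hits


points = [(0.0, 0.0), (0.1, 0.0), (5.0, 5.0), (5.1, 5.2), (10.0, 0.0)]
cover = build_cover(points, cluster_radius=1.0)
hits = search((0.2, 0.0), cover, radius=0.5, cluster_radius=1.0)
```

With a true metric, the coarse filter never discards a cluster that contains a hit, so the work saved depends on how few covering spheres are needed, which is the quantity the abstract calls metric entropy.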