
    Localizzazione di comunita per similarita su reti peer to peer basate su DHT

    The thesis presents the definition and implementation of a system for similarity search over community profiles on DHT-based networks. The system uses LSH functions based on min-wise independent permutations to index the profiles stored on the DHT, with the aim of reducing bandwidth consumption and storage space.

    Probabilistic existence of regular combinatorial structures

    We show the existence of regular combinatorial objects which previously were not known to exist. Specifically, for a wide range of the underlying parameters, we show the existence of non-trivial orthogonal arrays, t-designs, and t-wise permutations. In all cases, the sizes of the objects are optimal up to polynomial overhead. The proof of existence is probabilistic. We show that a randomly chosen structure has the required properties with positive yet tiny probability. Our method also allows us to give rather precise estimates on the number of objects of a given size, and this is applied to count the number of orthogonal arrays, t-designs and regular hypergraphs. The main technical ingredient is a special local central limit theorem for suitable lattice random walks with finitely many steps. Comment: An extended abstract of this work [arXiv:1111.0492] appeared in STOC 2012. This version expands the literature discussion

    Interval Selection in the Streaming Model

    A set of intervals is independent when the intervals are pairwise disjoint. In the interval selection problem we are given a set $\mathbb{I}$ of intervals and we want to find an independent subset of intervals of largest cardinality. Let $\alpha(\mathbb{I})$ denote the cardinality of an optimal solution. We discuss the estimation of $\alpha(\mathbb{I})$ in the streaming model, where we only have one-time, sequential access to the input intervals, the endpoints of the intervals lie in $\{1,...,n\}$, and the amount of memory is constrained. For intervals of different sizes, we provide an algorithm in the data stream model that computes an estimate $\hat\alpha$ of $\alpha(\mathbb{I})$ that, with probability at least $2/3$, satisfies $\tfrac 12(1-\varepsilon) \alpha(\mathbb{I}) \le \hat\alpha \le \alpha(\mathbb{I})$. For same-length intervals, we provide another algorithm in the data stream model that computes an estimate $\hat\alpha$ of $\alpha(\mathbb{I})$ that, with probability at least $2/3$, satisfies $\tfrac 23(1-\varepsilon) \alpha(\mathbb{I}) \le \hat\alpha \le \alpha(\mathbb{I})$. The space used by our algorithms is bounded by a polynomial in $\varepsilon^{-1}$ and $\log n$. We also show that no better estimations can be achieved using $o(n)$ bits of storage. We also develop new, approximate solutions to the interval selection problem, where we want to report a feasible solution, that use $O(\alpha(\mathbb{I}))$ space. Our algorithms for the interval selection problem match the optimal results by Emek, Halldórsson and Rosén [Space-Constrained Interval Selection, ICALP 2012], but are much simpler. Comment: Minor correction
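
For orientation, the quantity being estimated can be computed exactly when the whole input fits in memory. The sketch below is the classical offline greedy (sort by right endpoint, keep every interval that starts after the last kept one ends). It is not the streaming algorithm from the abstract, and the function name and the toy input are assumptions made for illustration. Intervals are treated as closed, so two intervals sharing an endpoint are not disjoint.

```python
def max_independent_intervals(intervals):
    """Return a largest set of pairwise disjoint closed intervals.

    Offline greedy: scan intervals by increasing right endpoint and keep
    each one that starts strictly after the last kept interval ends.
    """
    chosen = []
    last_end = None
    for left, right in sorted(intervals, key=lambda iv: iv[1]):
        if last_end is None or left > last_end:
            chosen.append((left, right))
            last_end = right
    return chosen


# Hypothetical toy input with endpoints in {1, ..., 10}.
intervals = [(1, 4), (2, 3), (3, 5), (6, 8), (7, 9), (9, 10)]
selected = max_independent_intervals(intervals)
alpha = len(selected)  # exact optimum, usable as a reference value
print(selected, alpha)
```

A streaming estimate for the same input could then be compared against `alpha` to see whether it falls inside the approximation interval stated in the abstract.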

    Accelerating Permutation Testing in Voxel-wise Analysis through Subspace Tracking: A new plugin for SnPM

    Permutation testing is a non-parametric method for obtaining the max null distribution used to compute corrected $p$-values that provide strong control of false positives. In neuroimaging, however, the computational burden of running such an algorithm can be significant. We find that by viewing the permutation testing procedure as the construction of a very large permutation testing matrix, $T$, one can exploit structural properties derived from the data and the test statistics to reduce the runtime under certain conditions. In particular, we see that $T$ is low-rank plus a low-variance residual. This makes $T$ a good candidate for low-rank matrix completion, where only a very small number of entries of $T$ ($\sim 0.35\%$ of all entries in our experiments) have to be computed to obtain a good estimate. Based on this observation, we present RapidPT, an algorithm that efficiently recovers the max null distribution commonly obtained through regular permutation testing in voxel-wise analysis. We present an extensive validation on a synthetic dataset and four datasets of varying size against two baselines: Statistical NonParametric Mapping (SnPM13) and a standard permutation testing implementation (referred to as NaivePT). We find that RapidPT achieves its best runtime performance on medium-sized datasets ($50 \leq n \leq 200$), with speedups of 1.5x - 38x (vs. SnPM13) and 20x - 1000x (vs. NaivePT). For larger datasets ($n \geq 200$), RapidPT outperforms NaivePT (6x - 200x) on all datasets, and provides large speedups over SnPM13 (2x - 15x) when more than 10000 permutations are needed. The implementation is a standalone toolbox and is also integrated within SnPM13, able to leverage multi-core architectures when available. Comment: 36 pages, 16 figures
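
To make the max null distribution concrete, here is a minimal sketch of plain (unaccelerated) permutation testing on a two-group design. It is neither RapidPT nor the SnPM13 code: the statistic (absolute difference of group means), the function names, and the toy data are all assumptions chosen to keep the example short. Each permutation contributes one row of the permutation testing matrix, and only the row maximum is kept.

```python
import random


def mean_diff(values, labels):
    """Absolute difference between the two group means for one voxel."""
    g1 = [v for v, lab in zip(values, labels) if lab == 1]
    g0 = [v for v, lab in zip(values, labels) if lab == 0]
    return abs(sum(g1) / len(g1) - sum(g0) / len(g0))


def max_null_distribution(data, labels, n_perm, rng):
    """data[v][s] is the value of voxel v for subject s.

    For each random relabelling, compute the statistic at every voxel
    and record the maximum over voxels.
    """
    shuffled = list(labels)
    max_stats = []
    for _ in range(n_perm):
        rng.shuffle(shuffled)
        max_stats.append(max(mean_diff(voxel, shuffled) for voxel in data))
    return max_stats


def corrected_p_values(data, labels, n_perm=500, seed=0):
    """Corrected p-value per voxel: share of max statistics >= observed."""
    rng = random.Random(seed)
    null = max_null_distribution(data, labels, n_perm, rng)
    observed = [mean_diff(voxel, labels) for voxel in data]
    return [(1 + sum(m >= t for m in null)) / (1 + n_perm) for t in observed]


# Hypothetical toy data: 3 voxels, 8 subjects, only voxel 0 has a group effect.
labels = [1, 1, 1, 1, 0, 0, 0, 0]
data = [
    [10, 11, 12, 13, 0, 1, 2, 3],
    [1, 0, 1, 0, 1, 0, 1, 0],
    [0, 1, 0, 1, 0, 1, 0, 1],
]
p_values = corrected_p_values(data, labels)
print(p_values)
```

The cost is one full pass over all voxels per permutation, which is the burden the abstract proposes to reduce by computing only a small fraction of the matrix entries.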

    Increasing power for voxel-wise genome-wide association studies : the random field theory, least square kernel machines and fast permutation procedures

    Imaging traits are thought to have more direct links to genetic variation than diagnostic measures based on cognitive or clinical assessments and provide a powerful substrate to examine the influence of genetics on human brains. Although imaging genetics has attracted growing attention and interest, most brain-wide genome-wide association studies focus on voxel-wise single-locus approaches, without taking advantage of the spatial information in images or combining the effect of multiple genetic variants. In this paper we present a fast implementation of voxel- and cluster-wise inferences based on the random field theory to fully use the spatial information in images. The approach is combined with a multi-locus model based on least square kernel machines to associate the joint effect of several single nucleotide polymorphisms (SNPs) with imaging traits. A fast permutation procedure is also proposed which significantly reduces the number of permutations needed relative to the standard empirical method and provides accurate small p-value estimates based on parametric tail approximation. We explored the relation between 448,294 single nucleotide polymorphisms and 18,043 genes in 31,662 voxels of the entire brain across 740 elderly subjects from the Alzheimer's Disease Neuroimaging Initiative (ADNI). Structural MRI scans were analyzed using tensor-based morphometry (TBM) to compute 3D maps of regional brain volume differences compared to an average template image based on healthy elderly subjects. We find the method to be more sensitive than voxel-wise single-locus approaches. A number of genes were identified as having significant associations with volumetric changes. The most associated gene was GRIN2B, which encodes the N-methyl-d-aspartate (NMDA) glutamate receptor NR2B subunit and affects both the parietal and temporal lobes in human brains. Its role in Alzheimer's disease has been widely acknowledged and studied, suggesting the validity of the approach. The various advantages over existing approaches indicate the great potential of this novel framework to detect genetic influences on human brains.

    Quantum to Classical Randomness Extractors

    The goal of randomness extraction is to distill (almost) perfect randomness from a weak source of randomness. When the source yields a classical string X, many extractor constructions are known. Yet, when considering a physical randomness source, X is itself ultimately the result of a measurement on an underlying quantum system. When characterizing the power of a source to supply randomness it is hence a natural question to ask, how much classical randomness we can extract from a quantum system. To tackle this question we here take on the study of quantum-to-classical randomness extractors (QC-extractors). We provide constructions of QC-extractors based on measurements in a full set of mutually unbiased bases (MUBs), and certain single qubit measurements. As the first application, we show that any QC-extractor gives rise to entropic uncertainty relations with respect to quantum side information. Such relations were previously only known for two measurements. As the second application, we resolve the central open question in the noisy-storage model [Wehner et al., PRL 100, 220502 (2008)] by linking security to the quantum capacity of the adversary's storage device. Comment: 6+31 pages, 2 tables, 1 figure, v2: improved converse parameters, typos corrected, new discussion, v3: new reference

    b-Bit Minwise Hashing

    This paper establishes the theoretical framework of b-bit minwise hashing. The original minwise hashing method has become a standard technique for estimating set similarity (e.g., resemblance) with applications in information retrieval, data management, social networks and computational advertising. By only storing the lowest $b$ bits of each (minwise) hashed value (e.g., b=1 or 2), one can gain substantial advantages in terms of computational efficiency and storage space. We prove the basic theoretical results and provide an unbiased estimator of the resemblance for any b. We demonstrate that, even in the least favorable scenario, using b=1 may reduce the storage space at least by a factor of 21.3 (or 10.7) compared to using b=64 (or b=32), if one is interested in resemblance > 0.5.
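
The idea of keeping only the lowest b bits of each minwise hash can be sketched as follows. This is an illustration under stated assumptions, not the paper's estimator: salted BLAKE2b hashes stand in for random permutations, and the correction assumes the sets are tiny relative to the hash range, so that two different minima agree on their low b bits with probability about 2^-b. The function names, the number of hashes `k`, and the toy sets are hypothetical.

```python
import hashlib


def hash64(item, seed):
    """64-bit hash of item; the seed selects one member of the hash family."""
    digest = hashlib.blake2b(
        repr(item).encode(), digest_size=8, salt=seed.to_bytes(8, "little")
    ).digest()
    return int.from_bytes(digest, "little")


def b_bit_signature(items, k, b):
    """Keep only the lowest b bits of each of k minwise hash values."""
    mask = (1 << b) - 1
    return [min(hash64(x, seed) for x in items) & mask for seed in range(k)]


def estimate_resemblance(sig1, sig2, b):
    """Simplified correction (assumption: sets are sparse in the hash range).

    Match probability is modelled as 2^-b + (1 - 2^-b) * R, then solved for R.
    """
    p_hat = sum(u == v for u, v in zip(sig1, sig2)) / len(sig1)
    floor = 1.0 / (1 << b)
    return (p_hat - floor) / (1.0 - floor)


# Hypothetical toy sets with true resemblance 150 / 450 = 1/3.
set1 = set(range(0, 300))
set2 = set(range(150, 450))
k, b = 1024, 1
sig1 = b_bit_signature(set1, k, b)
sig2 = b_bit_signature(set2, k, b)
estimate = estimate_resemblance(sig1, sig2, b)
print(estimate)
```

Each signature here occupies `k * b` bits instead of `k * 64`. The price is a noisier estimate per hash, so in practice more hashes may be needed for the same accuracy.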