Similarity-based localization of communities on DHT-based peer-to-peer networks
The thesis presents the definition and implementation of a system for similarity search over community profiles on DHT-based networks. The system uses LSH functions based on min-wise independent permutations to index the profiles stored on the DHT, in order to reduce bandwidth consumption and storage space.
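As a rough illustration of how min-wise hashing can turn a profile into lookup keys, the sketch below builds a MinHash signature and groups it into bands, so that similar profiles are likely to share at least one key. This is not the thesis's implementation: the function names, the use of SHA-1 as a stand-in for random permutations, and the band layout (20 hashes, 2 rows per band) are all assumptions made for the example.

```python
import hashlib


def minhash_signature(profile, num_hashes):
    """One minimum per salted hash; SHA-1 stands in for a random permutation."""
    signature = []
    for i in range(num_hashes):
        signature.append(min(
            int.from_bytes(hashlib.sha1(f"{i}:{item}".encode()).digest()[:8], "big")
            for item in profile
        ))
    return signature


def dht_keys(profile, num_hashes=20, rows_per_band=2):
    """Hypothetical helper: one lookup key per band of the signature."""
    signature = minhash_signature(profile, num_hashes)
    keys = set()
    for band, start in enumerate(range(0, num_hashes, rows_per_band)):
        chunk = signature[start:start + rows_per_band]
        # Two profiles get the same key only if the whole band agrees.
        keys.add(hashlib.sha1(f"{band}:{chunk}".encode()).hexdigest())
    return keys
```

A profile would be stored under each of its keys, and a query would look up the keys of the query profile; only the short keys travel over the network, not the full profiles.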
Probabilistic existence of regular combinatorial structures
We show the existence of regular combinatorial objects which previously were
not known to exist. Specifically, for a wide range of the underlying
parameters, we show the existence of non-trivial orthogonal arrays, t-designs,
and t-wise permutations. In all cases, the sizes of the objects are optimal up
to polynomial overhead. The proof of existence is probabilistic. We show that a
randomly chosen structure has the required properties with positive yet tiny
probability. Our method also gives rather precise estimates on the
number of objects of a given size, and this is applied to count the number of
orthogonal arrays, t-designs and regular hypergraphs. The main technical
ingredient is a special local central limit theorem for suitable lattice random
walks with finitely many steps.
Comment: An extended abstract of this work [arXiv:1111.0492] appeared in STOC
2012. This version expands the literature discussion.
Interval Selection in the Streaming Model
A set of intervals is independent when the intervals are pairwise disjoint.
In the interval selection problem we are given a set $\mathbb{I}$ of intervals
and we want to find an independent subset of intervals of largest cardinality.
Let $\alpha(\mathbb{I})$ denote the cardinality of an optimal solution. We
discuss the estimation of $\alpha(\mathbb{I})$ in the streaming model, where we
only have one-time, sequential access to the input intervals, the endpoints of
the intervals lie in $\{1,\dots,n\}$, and the amount of memory is
constrained.
For intervals of different sizes, we provide an algorithm in the data stream
model that computes an estimate $\hat\alpha$ of $\alpha(\mathbb{I})$ that, with
probability at least $2/3$, satisfies
$\tfrac{1}{2}(1-\varepsilon)\,\alpha(\mathbb{I}) \le \hat\alpha \le \alpha(\mathbb{I})$. For same-length
intervals, we provide another algorithm in the data stream model that computes
an estimate $\hat\alpha$ of $\alpha(\mathbb{I})$ that, with probability at
least $2/3$, satisfies
$\tfrac{2}{3}(1-\varepsilon)\,\alpha(\mathbb{I}) \le \hat\alpha \le \alpha(\mathbb{I})$. The space used by our algorithms is bounded
by a polynomial in $\varepsilon^{-1}$ and $\log n$. We also show that no better
estimations can be achieved using $o(n)$ bits of storage.
We also develop new, approximate solutions to the interval selection problem,
where we want to report a feasible solution, that use $O(\alpha(\mathbb{I}))$
space. Our algorithms for the interval selection problem match the optimal
results by Emek, Halldórsson and Rosén [Space-Constrained Interval
Selection, ICALP 2012], but are much simpler.
Comment: Minor correction
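For orientation, the quantity being estimated can be computed exactly offline with the classical greedy rule that repeatedly takes the interval with the smallest right endpoint. The sketch below is that offline baseline only, not one of the streaming algorithms from the abstract: it keeps every interval in memory, which is exactly what the streaming model forbids. The function name and the closed-interval convention are assumptions for the example.

```python
def max_independent_intervals(intervals):
    """Greedy by earliest right endpoint; intervals are closed (left, right) pairs."""
    chosen = []
    last_right = None
    for left, right in sorted(intervals, key=lambda iv: iv[1]):
        # Keep the interval only if it starts strictly after the last one ends.
        if last_right is None or left > last_right:
            chosen.append((left, right))
            last_right = right
    return chosen
```

The length of the returned list is the size of a largest independent subset for the given input.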
Accelerating Permutation Testing in Voxel-wise Analysis through Subspace Tracking: A new plugin for SnPM
Permutation testing is a non-parametric method for obtaining the max null
distribution used to compute corrected p-values that provide strong control
of false positives. In neuroimaging, however, the computational burden of
running such an algorithm can be significant. We find that by viewing the
permutation testing procedure as the construction of a very large permutation
testing matrix, $T$, one can exploit structural properties derived from the
data and the test statistics to reduce the runtime under certain conditions. In
particular, we see that $T$ is low-rank plus a low-variance residual. This
makes $T$ a good candidate for low-rank matrix completion, where only a very
small number of entries of $T$ ($\sim 0.35$-$0.7\%$ of all entries in our experiments)
have to be computed to obtain a good estimate. Based on this observation, we
present RapidPT, an algorithm that efficiently recovers the max null
distribution commonly obtained through regular permutation testing in
voxel-wise analysis. We present an extensive validation on a synthetic dataset
and four varying sized datasets against two baselines: Statistical
NonParametric Mapping (SnPM13) and a standard permutation testing
implementation (referred to as NaivePT). We find that RapidPT achieves its best
runtime performance on medium sized datasets ($50 \leq n \leq 200$), with
speedups of 1.5x - 38x (vs. SnPM13) and 20x-1000x (vs. NaivePT). For larger
datasets ($n \geq 200$) RapidPT outperforms NaivePT (6x - 200x) on all
datasets, and provides large speedups over SnPM13 when more than 10000
permutations (2x - 15x) are needed. The implementation is a standalone toolbox
and is also integrated within SnPM13, able to leverage multi-core architectures
when available.
Comment: 36 pages, 16 figures
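To make the max null distribution concrete, here is a minimal sketch of the plain permutation procedure that the abstract treats as its baseline, not of RapidPT itself. It assumes a two-group design and uses a one-sided difference of means per voxel instead of a t-statistic to stay short; the function name, parameters, and the "+1" counting convention are choices made for this example.

```python
import random


def max_null_corrected_pvalues(group_a, group_b, num_perms=500, seed=0):
    """group_a, group_b: lists of subjects, each a list of per-voxel values."""
    rng = random.Random(seed)
    subjects = group_a + group_b
    n_a = len(group_a)
    num_voxels = len(subjects[0])

    def mean_diff(rows_a, rows_b):
        # One statistic per voxel: mean of group A minus mean of group B.
        return [
            sum(r[v] for r in rows_a) / len(rows_a)
            - sum(r[v] for r in rows_b) / len(rows_b)
            for v in range(num_voxels)
        ]

    observed = mean_diff(group_a, group_b)
    max_null = []
    for _ in range(num_perms):
        shuffled = subjects[:]
        rng.shuffle(shuffled)  # relabel subjects at random
        # Keep only the largest statistic over all voxels for this relabelling.
        max_null.append(max(mean_diff(shuffled[:n_a], shuffled[n_a:])))

    # Corrected p-value: share of relabellings whose maximum reaches the
    # observed statistic; the +1 counts the observed labelling itself.
    return [
        (1 + sum(1 for m in max_null if m >= obs)) / (num_perms + 1)
        for obs in observed
    ]
```

Each permutation contributes one row of per-voxel statistics, so the full procedure implicitly fills a permutations-by-voxels matrix. The cost of computing every entry of that matrix is what the abstract's matrix-completion approach aims to avoid.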
Increasing power for voxel-wise genome-wide association studies: the random field theory, least square kernel machines and fast permutation procedures
Imaging traits are thought to have more direct links to genetic variation than diagnostic measures based on cognitive or clinical assessments and provide a powerful substrate to examine the influence of genetics on human brains. Although imaging genetics has attracted growing attention and interest, most brain-wide genome-wide association studies focus on voxel-wise single-locus approaches, without taking advantage of the spatial information in images or combining the effect of multiple genetic variants. In this paper we present a fast implementation of voxel- and cluster-wise inferences based on the random field theory to fully use the spatial information in images. The approach is combined with a multi-locus model based on least square kernel machines to associate the joint effect of several single nucleotide polymorphisms (SNPs) with imaging traits. A fast permutation procedure is also proposed which significantly reduces the number of permutations needed relative to the standard empirical method and provides accurate small p-value estimates based on parametric tail approximation. We explored the relation between 448,294 single nucleotide polymorphisms and 18,043 genes in 31,662 voxels of the entire brain across 740 elderly subjects from the Alzheimer's Disease Neuroimaging Initiative (ADNI). Structural MRI scans were analyzed using tensor-based morphometry (TBM) to compute 3D maps of regional brain volume differences compared to an average template image based on healthy elderly subjects. We find our method to be more sensitive than voxel-wise single-locus approaches. A number of genes were identified as having significant associations with volumetric changes. The most associated gene was GRIN2B, which encodes the N-methyl-d-aspartate (NMDA) glutamate receptor NR2B subunit and affects both the parietal and temporal lobes in human brains. Its role in Alzheimer's disease has been widely acknowledged and studied, suggesting the validity of the approach.
The various advantages over existing approaches indicate a great potential offered by this novel framework to detect genetic influences on human brains.
Quantum to Classical Randomness Extractors
The goal of randomness extraction is to distill (almost) perfect randomness
from a weak source of randomness. When the source yields a classical string X,
many extractor constructions are known. Yet, when considering a physical
randomness source, X is itself ultimately the result of a measurement on an
underlying quantum system. When characterizing the power of a source to supply
randomness it is hence a natural question to ask, how much classical randomness
we can extract from a quantum system. To tackle this question we here take on
the study of quantum-to-classical randomness extractors (QC-extractors). We
provide constructions of QC-extractors based on measurements in a full set of
mutually unbiased bases (MUBs), and certain single qubit measurements. As the
first application, we show that any QC-extractor gives rise to entropic
uncertainty relations with respect to quantum side information. Such relations
were previously only known for two measurements. As the second application, we
resolve the central open question in the noisy-storage model [Wehner et al.,
PRL 100, 220502 (2008)] by linking security to the quantum capacity of the
adversary's storage device.
Comment: 6+31 pages, 2 tables, 1 figure, v2: improved converse parameters,
typos corrected, new discussion, v3: new reference
b-Bit Minwise Hashing
This paper establishes the theoretical framework of b-bit minwise hashing.
The original minwise hashing method has become a standard technique for
estimating set similarity (e.g., resemblance) with applications in information
retrieval, data management, social networks and computational advertising.
By only storing the lowest b bits of each (minwise) hashed value (e.g., b=1
or 2), one can gain substantial advantages in terms of computational efficiency
and storage space. We prove the basic theoretical results and provide an
unbiased estimator of the resemblance for any b. We demonstrate that, even in
the least favorable scenario, using b=1 may reduce the storage space at least
by a factor of 21.3 (or 10.7) compared to using b=64 (or b=32), if one is
interested in resemblance > 0.5.
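A small sketch may help show what storing only the lowest b bits looks like in practice. It is a simplification, not the paper's estimator: it assumes idealized 64-bit hashing, so that two different minima agree on their lowest b bits with probability 2^-b, which gives a match probability of P = 2^-b + (1 - 2^-b) R for resemblance R and lets R be recovered by inverting that line. The paper's unbiased estimator includes correction terms that depend on the set sizes; those are omitted here. The mixing function (a SplitMix64-style finalizer) and all names are assumptions for the example.

```python
import random

MASK64 = (1 << 64) - 1


def mix64(x):
    """SplitMix64-style finalizer, used as a stand-in for a random hash."""
    x = (x + 0x9E3779B97F4A7C15) & MASK64
    x = ((x ^ (x >> 30)) * 0xBF58476D1CE4E5B9) & MASK64
    x = ((x ^ (x >> 27)) * 0x94D049BB133111EB) & MASK64
    return x ^ (x >> 31)


def b_bit_signature(items, salts, b):
    """For each salted hash, keep only the lowest b bits of the minimum."""
    mask = (1 << b) - 1
    return [min(mix64(x ^ salt) for x in items) & mask for salt in salts]


def estimate_resemblance(sig_a, sig_b, b):
    """Invert P = 2^-b + (1 - 2^-b) * R (sparse-data simplification)."""
    matches = sum(1 for u, v in zip(sig_a, sig_b) if u == v)
    p_hat = matches / len(sig_a)
    c = 1.0 / (1 << b)
    return (p_hat - c) / (1.0 - c)


# Example: two sets of non-negative integers with true resemblance 1/3.
rng = random.Random(0)
salts = [rng.getrandbits(64) for _ in range(1024)]
set_a, set_b = set(range(200)), set(range(100, 300))
estimate = estimate_resemblance(
    b_bit_signature(set_a, salts, 2), b_bit_signature(set_b, salts, 2), 2
)
```

With b=2 each stored value takes 2 bits instead of 64, at the price of a noisier estimate for the same number of hashes, which is the trade-off the abstract quantifies.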