Noise-Resilient Group Testing: Limitations and Constructions
We study combinatorial group testing schemes for learning $d$-sparse Boolean
vectors using highly unreliable disjunctive measurements. We consider an
adversarial noise model that only limits the number of false observations, and
show that any noise-resilient scheme in this model can only approximately
reconstruct the sparse vector. On the positive side, we take this barrier to
our advantage and show that approximate reconstruction (within a satisfactory
degree of approximation) allows us to break the information-theoretic lower
bound of $\tilde{\Omega}(d^2 \log n)$ that is known for exact reconstruction of
$d$-sparse vectors of length $n$ via non-adaptive measurements, by a
multiplicative factor of roughly $d$.
Specifically, we give simple randomized constructions of non-adaptive
measurement schemes, with $m = O(d \log n)$ measurements, that allow efficient
reconstruction of $d$-sparse vectors up to $O(d)$ false positives even in the
presence of a constant fraction of false positives and $\Omega(m/d)$ false
negatives within the measurement outcomes. We show that, information
theoretically, none of these parameters can be substantially improved without
dramatically affecting the others. Furthermore, we obtain several explicit
constructions, in particular one matching the randomized trade-off but using
slightly more measurements. We also obtain explicit constructions that allow
fast reconstruction in time $\mathrm{poly}(m)$, which would be sublinear in $n$
for sufficiently sparse vectors. The main tool used in our constructions is
the list-decoding view of randomness condensers and extractors.
Comment: Full version. A preliminary summary of this work appears (under the
same title) in the proceedings of the 17th International Symposium on
Fundamentals of Computation Theory (FCT 2009).
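As a concrete illustration of the setting (not the paper's construction, which derives pools from randomness condensers), here is a minimal NumPy sketch of noisy non-adaptive group testing: a random pooling design, disjunctive (OR) outcomes with a small fraction of adversarially flipped results, and a naive distance decoder that reports the support up to a few false positives. All parameter choices are illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)
n, d, m = 2000, 5, 300        # universe size, sparsity, number of pools

# Random design: each item joins each pool independently with probability 1/d.
A = rng.random((m, n)) < 1.0 / d

x = np.zeros(n, dtype=bool)                       # hidden d-sparse vector
x[rng.choice(n, size=d, replace=False)] = True

y = (A.astype(int) @ x.astype(int)) > 0           # noiseless OR outcomes
y[rng.choice(m, size=m // 20, replace=False)] ^= True  # flip 5% of outcomes

# Toy distance decoder: an item is suspect if it lies in few negative pools;
# declare roughly 2d items positive, i.e., recovery up to O(d) false positives.
neg_hits = A[~y].sum(axis=0)
cutoff = np.sort(neg_hits)[2 * d]
estimate = neg_hits <= cutoff

print("true positives recovered:", int((estimate & x).sum()), "of", d)
print("false positives:", int((estimate & ~x).sum()))
```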
A New Approach to Speeding Up Topic Modeling
Latent Dirichlet allocation (LDA) is a widely used probabilistic topic
modeling paradigm that has recently found many applications in computer vision
and computational biology. In this paper, we propose a fast and accurate batch
algorithm, active belief propagation (ABP), for training LDA. Batch LDA
algorithms usually require repeatedly scanning the entire corpus and searching
the complete topic space. To process massive corpora with a large number of
topics, the training iterations of batch LDA algorithms are often inefficient
and time-consuming. To accelerate training, ABP actively scans a subset of the
corpus and searches a subset of the topic space, thereby saving substantial
training time in each iteration. To ensure accuracy, ABP selects only those
documents and topics that contribute the largest residuals within the residual
belief propagation (RBP) framework. On four real-world corpora, ABP runs
markedly faster than state-of-the-art batch LDA algorithms with comparable
topic modeling accuracy.
Comment: 14 pages, 12 figures
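The abstract does not spell out the update equations, but the residual-driven selection logic can be sketched as follows. Here `theta`, `update_fn`, and the L1 residual are illustrative stand-ins, not the authors' implementation.

```python
import numpy as np

def active_bp_sweep(theta, residuals, update_fn, doc_frac=0.3, topic_frac=0.5):
    """One schematic ABP-style sweep: visit only the documents with the
    largest residuals and, within each, only a subset of topics.
    `update_fn(doc, topics)` is a hypothetical callback returning the
    document's refreshed topic distribution."""
    n_docs, n_topics = theta.shape
    top_docs = np.argsort(residuals)[-max(1, int(doc_frac * n_docs)):]
    for doc in top_docs:
        topics = np.argsort(theta[doc])[-max(1, int(topic_frac * n_topics)):]
        new_row = update_fn(doc, topics)
        residuals[doc] = np.abs(new_row - theta[doc]).sum()  # RBP-style residual
        theta[doc] = new_row
    return theta, residuals
```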
Hypergraph-based Source Codes for Function Computation Under Maximal Distortion
This work investigates functional source coding problems with maximal
distortion, motivated by approximate function computation in many modern
applications. The maximal distortion treats an imprecise reconstruction of a
function value as being as good as perfect computation if it deviates by less
than a tolerance level, while treating a reconstruction that differs by more than that
level as a failure. Using a geometric understanding of the maximal distortion,
we propose a hypergraph-based source coding scheme for function computation
that is constructive in the sense that it gives an explicit procedure for
defining auxiliary random variables. Moreover, we find that the
hypergraph-based coding scheme achieves the optimal rate-distortion function in
the setting of coding for computing with side information and the Berger-Tung
sum-rate inner bound in the setting of distributed source coding for computing.
It also achieves the El Gamal-Cover inner bound for multiple description coding
for computing and is optimal for successive refinement and cascade multiple
description problems for computing. Lastly, the benefit of the reduced
complexity of finding a forward test channel is shown for a class of Markov
sources.
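In symbols, with $f(x)$ the target function value, $\hat{z}$ the reconstruction, and $D$ the tolerance, the maximal distortion described above can be written as follows (the notation here is assumed for illustration, not taken verbatim from the paper):

```latex
% Maximal distortion: within-tolerance reconstructions count as perfect,
% anything farther off counts as a failure (notation is illustrative).
\[
  d_{\max}(x, \hat{z}) =
  \begin{cases}
    0, & \text{if } \lvert f(x) - \hat{z} \rvert \le D, \\
    1, & \text{otherwise.}
  \end{cases}
\]
```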
Point Cloud in the Air
Acquisition and processing of point clouds (PCs) is a crucial enabler for
many emerging applications reliant on 3D spatial data, such as robot
navigation, autonomous vehicles, and augmented reality. In most scenarios, PCs
acquired by remote sensors must be transmitted to an edge server for fusion,
segmentation, or inference. Wireless transmission of PCs not only puts an
increased burden on the already congested wireless spectrum, but also confronts
a unique set of challenges arising from the irregular and unstructured nature
of PCs. In this paper, we meticulously delineate these challenges and offer a
comprehensive examination of existing solutions while candidly acknowledging
their inherent limitations. In response to these intricacies, we proffer four
pragmatic solution frameworks, spanning advanced techniques, hybrid schemes,
and distributed data aggregation approaches. In doing so, our goal is to chart
a path toward efficient, reliable, and low-latency wireless PC transmission.
COMPARISON BETWEEN (RLE AND HUFFMAN) ALGORITHMS FOR LOSSLESS DATA COMPRESSION
The multimedia field is distinguished from other areas by its need for massive storage volumes. This causes many problems, particularly slow file reading during transmission and reception, and increased costs as required capacities grow, so methods were needed to eliminate the problems resulting from this growth in size. One of the successful solutions is file compression algorithms. This paper aims to compare the RLE and Huffman algorithms, both lossless compression algorithms, on text files, according to the file-size criterion. The comparison between the original file size and the file size after compression with the RLE and Huffman algorithms was carried out on more than 30 text files. We used a C++ program to compress the files and Microsoft Excel for the descriptive analysis, in order to calculate the compression ratio and other measures. The study points to the effectiveness of the Huffman algorithm in reducing file size.
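The paper's experiments used a C++ program; purely as an illustration of what such a comparison measures, here is a minimal Python sketch of byte-level RLE together with the original-to-compressed size ratio (Huffman coding is omitted for brevity):

```python
def rle_encode(data: bytes) -> bytes:
    """Byte-level run-length encoding: each run becomes (count, value),
    with runs capped at 255 so the count fits in one byte."""
    out = bytearray()
    i = 0
    while i < len(data):
        run = 1
        while i + run < len(data) and data[i + run] == data[i] and run < 255:
            run += 1
        out += bytes([run, data[i]])
        i += run
    return bytes(out)

text = b"aaaaabbbccccccccdd" * 100
encoded = rle_encode(text)
# Compression ratio reported here as original size / compressed size.
print(f"compression ratio: {len(text) / len(encoded):.2f}")
```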
A spatial mediator model for integrating heterogeneous spatial data
The complexity and richness of geospatial data create specific problems for heterogeneous data integration. To deal with this type of data integration, we propose a spatial mediator embedded in a large distributed mobile environment (GeoGrid). The spatial mediator takes a user request from a field application and uses the request to select the appropriate data sources, constructs subqueries for the selected data sources, defines the process of combining the results from the subqueries, and develops an integration script that controls the integration process in order to respond to the request. The spatial mediator uses ontologies both to support searches for geographic locations based on symbolic terms and to provide a term-based index to spatial data sources based on the relational model. In our approach, application designers need only minimal knowledge of the queries required to supply users with the data they need. The key part of this research has been the development of a spatial mediator that can dynamically respond to requests within the GeoGrid environment for geographic maps and related relational spatial data.
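As a rough sketch of the mediator's control flow described above (the names below are illustrative stand-ins, not GeoGrid's actual API):

```python
from dataclasses import dataclass, field

@dataclass
class DataSource:
    name: str
    terms: set = field(default_factory=set)  # ontology terms indexing the source

def mediate(request_terms, sources, run_subquery, integrate):
    """Schematic mediator flow: select sources whose ontology index overlaps
    the request, issue one subquery per selected source, then combine the
    partial results; `run_subquery` and `integrate` are hypothetical hooks."""
    selected = [s for s in sources if s.terms & request_terms]
    partial = [run_subquery(s, request_terms) for s in selected]
    return integrate(partial)
```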
Applications of Derandomization Theory in Coding
Randomized techniques play a fundamental role in theoretical computer science
and discrete mathematics, in particular for the design of efficient algorithms
and construction of combinatorial objects. The basic goal in derandomization
theory is to eliminate or reduce the need for randomness in such randomized
constructions. In this thesis, we explore some applications of the fundamental
notions in derandomization theory to problems outside the core of theoretical
computer science, and in particular, certain problems related to coding theory.
First, we consider the wiretap channel problem, which involves a communication
system in which an intruder can eavesdrop on a limited portion of the
transmissions, and construct efficient and information-theoretically optimal
communication protocols for this model. Then we consider the combinatorial
group testing problem. In this classical problem, one aims to determine a set
of defective items within a large population by asking a number of queries,
where each query reveals whether a defective item is present within a specified
group of items. We use randomness condensers to explicitly construct optimal,
or nearly optimal, group testing schemes for a setting where the query outcomes
can be highly unreliable, as well as the threshold model, where a query returns
positive only if the number of defectives in the group passes a certain
threshold. Finally, we
design ensembles of error-correcting codes that achieve the
information-theoretic capacity of a large class of communication channels, and
then use the obtained ensembles to construct explicit capacity-achieving
codes.
[This is a shortened version of the actual abstract in the thesis.]
Comment: EPFL PhD Thesis
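For concreteness, the two group testing query models mentioned above differ only in when a pool reads positive; a toy predicate (with an assumed threshold parameter `u`) makes the distinction explicit:

```python
def classical_query(group: set, defectives: set) -> bool:
    """Classical model: positive iff the pool contains any defective item."""
    return len(group & defectives) >= 1

def threshold_query(group: set, defectives: set, u: int) -> bool:
    """Threshold model (schematic): positive only when at least u
    defectives fall inside the queried group."""
    return len(group & defectives) >= u
```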