
    Noise-Resilient Group Testing: Limitations and Constructions

    We study combinatorial group testing schemes for learning $d$-sparse Boolean vectors using highly unreliable disjunctive measurements. We consider an adversarial noise model that only limits the number of false observations, and show that any noise-resilient scheme in this model can only approximately reconstruct the sparse vector. On the positive side, we take this barrier to our advantage and show that approximate reconstruction (within a satisfactory degree of approximation) allows us to break the information-theoretic lower bound of $\tilde{\Omega}(d^2 \log n)$ that is known for exact reconstruction of $d$-sparse vectors of length $n$ via non-adaptive measurements, by a multiplicative factor $\tilde{\Omega}(d)$. Specifically, we give simple randomized constructions of non-adaptive measurement schemes, with $m = O(d \log n)$ measurements, that allow efficient reconstruction of $d$-sparse vectors up to $O(d)$ false positives even in the presence of $\delta m$ false positives and $O(m/d)$ false negatives within the measurement outcomes, for any constant $\delta < 1$. We show that, information-theoretically, none of these parameters can be substantially improved without dramatically affecting the others. Furthermore, we obtain several explicit constructions, in particular one matching the randomized trade-off but using $m = O(d^{1+o(1)} \log n)$ measurements. We also obtain explicit constructions that allow fast reconstruction in time $\mathrm{poly}(m)$, which would be sublinear in $n$ for sufficiently sparse vectors. The main tool used in our construction is the list-decoding view of randomness condensers and extractors. Comment: Full version. A preliminary summary of this work appears (under the same title) in the proceedings of the 17th International Symposium on Fundamentals of Computation Theory (FCT 2009).
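
    To make the measurement model concrete, here is a minimal Python sketch of a randomized non-adaptive scheme with disjunctive (OR) outcomes and a simple threshold decoder. It illustrates the general setting only, not the paper's condenser-based construction; the constants, the decoding threshold, and the helper names (make_scheme, measure, decode) are assumptions chosen for readability.

    import numpy as np

    rng = np.random.default_rng(0)

    def make_scheme(n, d, c=4):
        # Randomized non-adaptive design: each item joins each test with probability ~1/d,
        # using m = O(d log n) tests in total.
        m = c * d * int(np.ceil(np.log2(n)))
        return rng.random((m, n)) < 1.0 / d

    def measure(A, x, false_neg=0, false_pos=0):
        # Disjunctive (OR) outcomes, then flip some positives to negatives and vice versa
        # to mimic unreliable measurements.
        y = (A.astype(int) @ x.astype(int)) > 0
        pos, neg = np.flatnonzero(y), np.flatnonzero(~y)
        y[rng.choice(pos, size=min(false_neg, len(pos)), replace=False)] = False
        y[rng.choice(neg, size=min(false_pos, len(neg)), replace=False)] = True
        return y

    def decode(A, y, thresh):
        # Keep every item that appears in at most `thresh` negative tests; with noisy
        # outcomes this gives approximate reconstruction (some false positives survive).
        neg_hits = A[~y].sum(axis=0)
        return neg_hits <= thresh

    A plausible choice of thresh is on the order of the number of allowed false negatives (roughly m/d here): a defective item can fail at most that many of its tests, while a typical non-defective item tends to land in noticeably more of the remaining negative tests.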

    A New Approach to Speeding Up Topic Modeling

    Latent Dirichlet allocation (LDA) is a widely used probabilistic topic modeling paradigm that has recently found many applications in computer vision and computational biology. In this paper, we propose a fast and accurate batch algorithm, active belief propagation (ABP), for training LDA. Batch LDA algorithms usually require repeated scanning of the entire corpus and searching of the complete topic space, so for massive corpora with a large number of topics each training iteration is often inefficient and time-consuming. To accelerate training, ABP actively scans a subset of the corpus and searches a subset of the topic space in each iteration, saving substantial training time. To ensure accuracy, ABP selects only those documents and topics that contribute to the largest residuals within the residual belief propagation (RBP) framework. On four real-world corpora, ABP runs around 10 to 100 times faster than state-of-the-art batch LDA algorithms with comparable topic modeling accuracy. Comment: 14 pages, 12 figures.
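
    The selection mechanism can be sketched generically. The Python skeleton below shows a residual-driven schedule in the spirit of ABP: each sweep refreshes only the documents and topics with the largest residuals, with the actual LDA message update abstracted behind update_fn. The function names, the selection fractions, and the use of summed residuals as scores are assumptions for illustration, not the authors' exact algorithm.

    import numpy as np

    def abp_style_schedule(update_fn, messages, n_sweeps=50, doc_frac=0.2, topic_frac=0.3):
        # messages: (documents x topics) array of current message/belief values.
        # update_fn(messages, docs, topics) -> new values for the selected block;
        # it stands in for the real LDA message-passing update.
        D, K = messages.shape
        residual = np.full((D, K), np.inf)   # untouched entries keep infinite residual,
                                             # so every block gets visited at least once
        for _ in range(n_sweeps):
            docs = np.argsort(residual.sum(axis=1))[-max(1, int(doc_frac * D)):]
            topics = np.argsort(residual[docs].sum(axis=0))[-max(1, int(topic_frac * K)):]
            block = np.ix_(docs, topics)
            new_vals = update_fn(messages, docs, topics)
            residual[block] = np.abs(new_vals - messages[block])
            messages[block] = new_vals
        return messages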

    Hypergraph-based Source Codes for Function Computation Under Maximal Distortion

    This work investigates functional source coding problems with maximal distortion, motivated by approximate function computation in many modern applications. The maximal distortion treats imprecise reconstruction of a function value as good as perfect computation if it deviates by less than a tolerance level, while treating reconstruction that differs by more than that level as a failure. Using a geometric understanding of the maximal distortion, we propose a hypergraph-based source coding scheme for function computation that is constructive in the sense that it gives an explicit procedure for defining auxiliary random variables. Moreover, we find that the hypergraph-based coding scheme achieves the optimal rate-distortion function in the setting of coding for computing with side information, and the Berger-Tung sum-rate inner bound in the setting of distributed source coding for computing. It also achieves the El Gamal-Cover inner bound for multiple description coding for computing and is optimal for successive refinement and cascade multiple description problems for computing. Lastly, the benefit of reduced complexity in finding a forward test channel is shown for a class of Markov sources.
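
    One hedged reading of this criterion (the symbols $\Delta$ for the tolerance and $\hat{f}$ for the reconstruction are illustrative, not necessarily the paper's notation): define the per-sample distortion
    $$ d\big(f(x), \hat{f}\big) = \begin{cases} 0, & |f(x) - \hat{f}| \le \Delta, \\ 1, & \text{otherwise}, \end{cases} $$
    and require the maximal distortion $\max d\big(f(x), \hat{f}\big) = 0$ over all realizations, so that every reconstructed value lies within the tolerance and any larger deviation counts as a failure.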

    Point Cloud in the Air

    Acquisition and processing of point clouds (PCs) is a crucial enabler for many emerging applications reliant on 3D spatial data, such as robot navigation, autonomous vehicles, and augmented reality. In most scenarios, PCs acquired by remote sensors must be transmitted to an edge server for fusion, segmentation, or inference. Wireless transmission of PCs not only places an increased burden on the already congested wireless spectrum, but also confronts a unique set of challenges arising from the irregular and unstructured nature of PCs. In this paper, we meticulously delineate these challenges and offer a comprehensive examination of existing solutions while candidly acknowledging their inherent limitations. In response to these intricacies, we propose four pragmatic solution frameworks, spanning advanced techniques, hybrid schemes, and distributed data aggregation approaches. In doing so, our goal is to chart a path toward efficient, reliable, and low-latency wireless PC transmission.

    COMPARISON BETWEEN (RLE AND HUFFMAN) ALGORITHMS FOR LOSSLESS DATA COMPRESSION

    The multimedia field is distinguished from other areas by its need for massive storage volumes. This causes many problems, particularly slow file transmission and reception and the increased cost of storage capacity, so methods were needed to address the problems that arise from growing file sizes; compression algorithms are one of the successful solutions. This paper compares the RLE and Huffman algorithms, both lossless compression algorithms applied to text files, using file size as the criterion. The comparison between the original file size and the file size after compression with the RLE and Huffman algorithms was carried out on more than 30 text files. We used a C++ program to compress the files and Microsoft Excel for the analysis, computing the compression ratio and other measures. The study points to the effectiveness of the Huffman algorithm in reducing file size.
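
    As a rough illustration of the measurement reported here, the following Python sketch (the paper's own experiments used C++) computes the compression ratio, taken here as original size divided by compressed size, for RLE and for a Huffman payload on a byte string. It is a minimal toy: the Huffman part only derives code lengths to size the payload, ignores the code table, assumes non-empty input, and the function names are assumptions.

    import heapq
    from collections import Counter

    def rle_encode(data: bytes) -> bytes:
        # Run-length encoding as (count, byte) pairs, runs capped at 255.
        out, i = bytearray(), 0
        while i < len(data):
            run = 1
            while i + run < len(data) and data[i + run] == data[i] and run < 255:
                run += 1
            out += bytes([run, data[i]])
            i += run
        return bytes(out)

    def huffman_code_lengths(data: bytes) -> dict:
        # Build Huffman code lengths per byte value (enough to size the compressed payload).
        freq = Counter(data)
        if len(freq) == 1:
            return {next(iter(freq)): 1}
        heap = [(f, i, [s]) for i, (s, f) in enumerate(freq.items())]
        heapq.heapify(heap)
        lengths, tick = {s: 0 for s in freq}, len(heap)
        while len(heap) > 1:
            f1, _, s1 = heapq.heappop(heap)
            f2, _, s2 = heapq.heappop(heap)
            for s in s1 + s2:
                lengths[s] += 1            # every merge adds one bit to the merged symbols
            heapq.heappush(heap, (f1 + f2, tick, s1 + s2))
            tick += 1
        return lengths

    def compression_ratios(data: bytes):
        # Ratio = original size / compressed size.
        rle_size = len(rle_encode(data))
        lengths = huffman_code_lengths(data)
        huff_bits = sum(lengths[b] for b in data)
        return len(data) / rle_size, len(data) / ((huff_bits + 7) // 8)

    For example, compression_ratios(open("sample.txt", "rb").read()) would report both ratios for a hypothetical text file sample.txt.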

    A spatial mediator model for integrating heterogeneous spatial data

    The complexity and richness of geospatial data create specific problems in heterogeneous data integration. To deal with this type of data integration, we propose a spatial mediator embedded in a large distributed mobile environment (GeoGrid). The spatial mediator takes a user request from a field application and uses the request to select the appropriate data sources, constructs subqueries for the selected data sources, defines the process of combining the results from the subqueries, and develops an integration script that controls the integration process in order to respond to the request. The spatial mediator uses ontologies both to support searching for geographic locations based on symbolic terms and to provide a term-based index into spatial data sources based on the relational model. In our approach, application designers need to know only a minimal amount about the queries required to supply users with the data they need. The key part of this research has been the development of a spatial mediator that can dynamically respond to requests within the GeoGrid environment for geographic maps and related relational spatial data.
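
    A toy Python sketch of the pipeline described above, useful only to fix ideas: the mediator expands the request terms through an ontology, selects matching sources via a term index, issues subqueries, and merges the results. All class and field names (DataSource, SpatialMediator, terms, query) are invented for this sketch and are not the GeoGrid API.

    from dataclasses import dataclass
    from typing import Callable, Dict, List, Set

    @dataclass
    class DataSource:
        name: str
        terms: Set[str]                            # term-based index entries this source answers
        query: Callable[[Set[str]], List[dict]]    # runs a subquery, returns result rows

    class SpatialMediator:
        def __init__(self, sources: List[DataSource], ontology: Dict[str, Set[str]]):
            self.sources = sources
            self.ontology = ontology               # symbolic term -> related terms

        def select_sources(self, request: Set[str]) -> List[DataSource]:
            # Expand the request through the ontology, then keep sources whose
            # term index overlaps the expanded request.
            expanded = set(request)
            for term in request:
                expanded |= self.ontology.get(term, set())
            return [s for s in self.sources if s.terms & expanded]

        def answer(self, request: Set[str]) -> List[dict]:
            # Pass the request to each selected source as its subquery and concatenate
            # the rows; a real mediator would emit an integration script instead.
            results: List[dict] = []
            for src in self.select_sources(request):
                results.extend(src.query(request))
            return results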

    Applications of Derandomization Theory in Coding

    Randomized techniques play a fundamental role in theoretical computer science and discrete mathematics, in particular for the design of efficient algorithms and construction of combinatorial objects. The basic goal in derandomization theory is to eliminate or reduce the need for randomness in such randomized constructions. In this thesis, we explore some applications of the fundamental notions in derandomization theory to problems outside the core of theoretical computer science, and in particular, certain problems related to coding theory. First, we consider the wiretap channel problem, which involves a communication system in which an intruder can eavesdrop on a limited portion of the transmissions, and construct efficient and information-theoretically optimal communication protocols for this model. Then we consider the combinatorial group testing problem. In this classical problem, one aims to determine a set of defective items within a large population by asking a number of queries, where each query reveals whether a defective item is present within a specified group of items. We use randomness condensers to explicitly construct optimal, or nearly optimal, group testing schemes for a setting where the query outcomes can be highly unreliable, as well as the threshold model where a query returns positive if the number of defectives passes a certain threshold. Finally, we design ensembles of error-correcting codes that achieve the information-theoretic capacity of a large class of communication channels, and then use the obtained ensembles for construction of explicit capacity-achieving codes. [This is a shortened version of the actual abstract in the thesis.] Comment: EPFL PhD thesis.