12,315 research outputs found

    Privacy-Preserving Shortest Path Computation

    Navigation is one of the most popular cloud computing services. But in virtually all cloud-based navigation systems, the client must reveal her location and destination to the cloud service provider in order to learn the fastest route. In this work, we present a cryptographic protocol for navigation on city streets that provides privacy for both the client's location and the service provider's routing data. Our key ingredient is a novel method for compressing the next-hop routing matrices in networks such as city street maps. Applying our compression method to the map of Los Angeles, for example, we achieve over tenfold reduction in the representation size. In conjunction with other cryptographic techniques, this compressed representation results in an efficient protocol suitable for fully-private real-time navigation on city streets. We demonstrate the practicality of our protocol by benchmarking it on real street map data for major cities such as San Francisco and Washington, D.C. Comment: Extended version of NDSS 2016 paper.
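
    For readers unfamiliar with the term, the sketch below shows what a next-hop routing matrix is, using the standard Floyd-Warshall algorithm on a toy graph. It is not the paper's compression method or its cryptographic protocol; the function names `next_hop_matrix` and `route` and the example graph are assumptions made for illustration. The matrix holds one entry per (source, destination) pair, so it grows quadratically with the number of nodes, which is why a compact representation matters for city-scale maps.

```python
import math


def next_hop_matrix(n, edges):
    """Compute shortest-path distances and next hops with Floyd-Warshall.

    n     -- number of nodes, labelled 0..n-1
    edges -- iterable of directed (u, v, weight) triples (illustrative input)

    Returns (dist, nxt), where nxt[s][t] is the neighbour of s that lies on
    a shortest path to t, or None if s == t or t is unreachable from s.
    """
    dist = [[math.inf] * n for _ in range(n)]
    nxt = [[None] * n for _ in range(n)]
    for v in range(n):
        dist[v][v] = 0
    for u, v, w in edges:
        if w < dist[u][v]:  # keep the cheapest of any parallel edges
            dist[u][v] = w
            nxt[u][v] = v
    for k in range(n):
        for i in range(n):
            for j in range(n):
                if dist[i][k] + dist[k][j] < dist[i][j]:
                    dist[i][j] = dist[i][k] + dist[k][j]
                    nxt[i][j] = nxt[i][k]  # first step toward k is first step toward j
    return dist, nxt


def route(nxt, s, t):
    """Follow next-hop entries from s to t; return None if t is unreachable."""
    path = [s]
    while s != t:
        s = nxt[s][t]
        if s is None:
            return None
        path.append(s)
    return path


if __name__ == "__main__":
    # Toy two-way street network: a slow direct road 0-3 and a faster detour.
    roads = [(0, 1, 1), (1, 2, 1), (2, 3, 1), (0, 3, 5)]
    edges = [e for u, v, w in roads for e in ((u, v, w), (v, u, w))]
    dist, nxt = next_hop_matrix(4, edges)
    print(route(nxt, 0, 3), dist[0][3])
```

    A client that only needs directions can query one next-hop entry at a time rather than the whole route, which is the access pattern a next-hop matrix supports.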

    Sam2bam: High-Performance Framework for NGS Data Preprocessing Tools

    This paper introduces a high-throughput software tool framework called sam2bam that enables users to significantly speed up pre-processing for next-generation sequencing (NGS) data. sam2bam is especially efficient on single-node, multi-core, large-memory systems. It can reduce the runtime of data pre-processing in marking duplicate reads on a single-node system by 156-186x compared with de facto standard tools. sam2bam consists of parallel software components that can fully utilize multiple processors, available memory, high-bandwidth storage, and hardware compression accelerators if available. As a basic feature, sam2bam provides file format conversion between well-known genome file formats, from SAM to BAM. Additional features such as analyzing, filtering, and converting the input data are provided by plug-in tools, e.g., duplicate marking, which can be attached to sam2bam at runtime. We demonstrated that sam2bam could significantly reduce the runtime of NGS data pre-processing from about two hours to about one minute for a whole-exome data set on a 16-core single-node system using up to 130 GB of memory. sam2bam could reduce the runtime for whole-genome sequencing data from about 20 hours to about nine minutes on the same system using up to 711 GB of memory.

    Compressed Representations of Conjunctive Query Results

    Relational queries, and in particular join queries, often generate large output results when executed over a huge dataset. In such cases, it is often infeasible to store the whole materialized output if we plan to reuse it further down a data processing pipeline. Motivated by this problem, we study the construction of space-efficient compressed representations of the output of conjunctive queries, with the goal of supporting the efficient access of the intermediate compressed result for a given access pattern. In particular, we initiate the study of an important tradeoff: minimizing the space necessary to store the compressed result, versus minimizing the answer time and delay for an access request over the result. Our main contribution is a novel parameterized data structure, which can be tuned to trade off space for answer time. The tradeoff allows us to control the space requirement of the data structure precisely, and depends both on the structure of the query and the access pattern. We show how we can use the data structure in conjunction with query decomposition techniques, in order to efficiently represent the outputs for several classes of conjunctive queries. Comment: To appear in PODS'18; 35 pages; comments welcome.

    Reversible Embedding to Covers Full of Boundaries

    In reversible data embedding, boundary pixels are recorded as side information before embedding in order to avoid overflow and underflow; this side information may be losslessly compressed. Existing algorithms often assume that a natural image has few boundary pixels, so that the side information is small and a relatively high pure payload can be achieved. However, a natural image may actually contain many boundary pixels, in which case the side information can be very large. Directly applying the existing algorithms may then yield insufficient pure embedding capacity. To address this problem, we present a new and efficient framework for reversible data embedding in images that have many boundary pixels. The core idea is to losslessly preprocess the boundary pixels so as to significantly reduce the side information. Experimental results show the superiority and applicability of our work.
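
    The link between boundary pixels and side-information size can be illustrated with a rough sketch. It assumes 8-bit grayscale pixels, treats values 0 and 255 as the boundary values, and uses zlib as a stand-in lossless compressor; none of these choices, nor the helper names `boundary_location_map` and `side_info_size`, come from the paper, and the paper's own preprocessing step is not reproduced here.

```python
import random
import zlib


def boundary_location_map(pixels, low=0, high=255):
    """Flag each pixel that sits at the edge of the value range.

    Assumption: a pixel is a "boundary pixel" if changing it by +/-1 could
    leave the valid range, i.e. its value is <= low or >= high.
    Returns one byte per pixel: 1 for a boundary pixel, 0 otherwise.
    """
    return bytes(1 if p <= low or p >= high else 0 for p in pixels)


def side_info_size(pixels):
    """Size in bytes of the zlib-compressed location map (illustrative only)."""
    return len(zlib.compress(boundary_location_map(pixels), 9))


if __name__ == "__main__":
    rng = random.Random(0)  # fixed seed so the comparison is repeatable
    mid_tones = [rng.randint(50, 200) for _ in range(4096)]      # no boundary pixels
    saturated = [rng.choice((0, 128, 255)) for _ in range(4096)]  # many boundary pixels
    print("few boundary pixels: ", side_info_size(mid_tones), "bytes")
    print("many boundary pixels:", side_info_size(saturated), "bytes")
```

    An all-zero location map compresses to almost nothing, while a map with many scattered flags does not, which is the situation the abstract describes as eating into the pure payload.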

    User data dissemination concepts for earth resources

    Domestic data dissemination networks for earth-resources data in the 1985-1995 time frame were evaluated. The following topics were addressed: (1) earth-resources data sources and expected data volumes, (2) future user demand in terms of data volume and timeliness, (3) space-to-space and earth point-to-point transmission link requirements and implementation, (4) preprocessing requirements and implementation, (5) network costs, and (6) technological development to support this implementation. This study was parametric in that the data input (supply) was varied by a factor of about fifteen while the user request (demand) was varied by a factor of about nineteen. Correspondingly, the time from observation to delivery to the user was varied. This parametric evaluation was performed by a computer simulation that was based on network alternatives and resulted in preliminary transmission and preprocessing requirements. The earth-resource data sources considered were: shuttle sorties, synchronous satellites (e.g., SEOS), aircraft, and satellites in polar orbits.

    Making Queries Tractable on Big Data with Preprocessing

    A query class is traditionally considered tractable if there exists a polynomial-time (PTIME) algorithm to answer its queries. When it comes to big data, however, PTIME algorithms often become infeasible in practice. A traditional and effective approach to coping with this is to preprocess data off-line, so that queries in the class can be subsequently evaluated on the data efficiently. This paper aims to provide a formal foundation for this approach in terms of computational complexity. (1) We propose a set of Π-tractable queries, denoted by ΠT0Q, to characterize classes of queries that can be answered in parallel poly-logarithmic time (NC) after PTIME preprocessing. (2) We show that several natural query classes are Π-tractable and are feasible on big data. (3) We also study a set ΠTQ of query classes that can be effectively converted to Π-tractable queries by re-factorizing its data and queries for preprocessing. We introduce a form of NC reductions to characterize such conversions. (4) We show that a natural query class is complete for ΠTQ. (5) We also show that ΠT0Q ⊂ P unless P = NC, i.e., the set ΠT0Q of all Π-tractable queries is properly contained in the set P of all PTIME queries. Nonetheless, ΠTQ = P, i.e., all PTIME query classes can be made Π-tractable via proper re-factorizations. This work is a step towards understanding the tractability of queries in the context of big data.
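
    The "preprocess once, then answer quickly" pattern in this abstract can be seen in a very small example: sort the data off-line in polynomial time, then answer membership queries by binary search in logarithmic time. This is only a sequential stand-in for the parallel poly-logarithmic (NC) evaluation the paper formalizes, and the function names `preprocess` and `contains` are assumptions chosen for illustration rather than anything defined in the paper.

```python
from bisect import bisect_left


def preprocess(data):
    """One-off, off-line step: build a sorted index in O(n log n) time."""
    return sorted(data)


def contains(index, key):
    """Answer a membership query on the sorted index in O(log n) time."""
    i = bisect_left(index, key)
    return i < len(index) and index[i] == key


if __name__ == "__main__":
    index = preprocess([42, 7, 19, 3, 88])  # paid once, before any query arrives
    for key in (19, 20):
        print(key, contains(index, key))
```

    Without the index, each query would scan the whole dataset; with it, the per-query cost depends only logarithmically on the data size, which is the kind of gap that matters at big-data scale.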