12,315 research outputs found
Privacy-Preserving Shortest Path Computation
Navigation is one of the most popular cloud computing services. But in
virtually all cloud-based navigation systems, the client must reveal her
location and destination to the cloud service provider in order to learn the
fastest route. In this work, we present a cryptographic protocol for navigation
on city streets that provides privacy for both the client's location and the
service provider's routing data. Our key ingredient is a novel method for
compressing the next-hop routing matrices in networks such as city street maps.
Applying our compression method to the map of Los Angeles, for example, we
achieve over tenfold reduction in the representation size. In conjunction with
other cryptographic techniques, this compressed representation results in an
efficient protocol suitable for fully-private real-time navigation on city
streets. We demonstrate the practicality of our protocol by benchmarking it on
real street map data for major cities such as San Francisco and Washington,
D.C.Comment: Extended version of NDSS 2016 pape
Sam2bam: High-Performance Framework for NGS Data Preprocessing Tools
This paper introduces a high-throughput software tool framework called {\it
sam2bam} that enables users to significantly speedup pre-processing for
next-generation sequencing data. The sam2bam is especially efficient on
single-node multi-core large-memory systems. It can reduce the runtime of data
pre-processing in marking duplicate reads on a single node system by 156-186x
compared with de facto standard tools. The sam2bam consists of parallel
software components that can fully utilize the multiple processors, available
memory, high-bandwidth of storage, and hardware compression accelerators if
available.
The sam2bam provides file format conversion between well-known genome file
formats, from SAM to BAM, as a basic feature. Additional features such as
analyzing, filtering, and converting the input data are provided by {\it
plug-in} tools, e.g., duplicate marking, which can be attached to sam2bam at
runtime.
We demonstrated that sam2bam could significantly reduce the runtime of NGS
data pre-processing from about two hours to about one minute for a whole-exome
data set on a 16-core single-node system using up to 130 GB of memory. The
sam2bam could reduce the runtime for whole-genome sequencing data from about 20
hours to about nine minutes on the same system using up to 711 GB of memory
Compressed Representations of Conjunctive Query Results
Relational queries, and in particular join queries, often generate large
output results when executed over a huge dataset. In such cases, it is often
infeasible to store the whole materialized output if we plan to reuse it
further down a data processing pipeline. Motivated by this problem, we study
the construction of space-efficient compressed representations of the output of
conjunctive queries, with the goal of supporting the efficient access of the
intermediate compressed result for a given access pattern. In particular, we
initiate the study of an important tradeoff: minimizing the space necessary to
store the compressed result, versus minimizing the answer time and delay for an
access request over the result. Our main contribution is a novel parameterized
data structure, which can be tuned to trade off space for answer time. The
tradeoff allows us to control the space requirement of the data structure
precisely, and depends both on the structure of the query and the access
pattern. We show how we can use the data structure in conjunction with query
decomposition techniques, in order to efficiently represent the outputs for
several classes of conjunctive queries.Comment: To appear in PODS'18; 35 pages; comments welcom
Reversible Embedding to Covers Full of Boundaries
In reversible data embedding, to avoid overflow and underflow problem, before
data embedding, boundary pixels are recorded as side information, which may be
losslessly compressed. The existing algorithms often assume that a natural
image has little boundary pixels so that the size of side information is small.
Accordingly, a relatively high pure payload could be achieved. However, there
actually may exist a lot of boundary pixels in a natural image, implying that,
the size of side information could be very large. Therefore, when to directly
use the existing algorithms, the pure embedding capacity may be not sufficient.
In order to address this problem, in this paper, we present a new and efficient
framework to reversible data embedding in images that have lots of boundary
pixels. The core idea is to losslessly preprocess boundary pixels so that it
can significantly reduce the side information. Experimental results have shown
the superiority and applicability of our work
User data dissemination concepts for earth resources
Domestic data dissemination networks for earth-resources data in the 1985-1995 time frame were evaluated. The following topics were addressed: (1) earth-resources data sources and expected data volumes, (2) future user demand in terms of data volume and timeliness, (3) space-to-space and earth point-to-point transmission link requirements and implementation, (4) preprocessing requirements and implementation, (5) network costs, and (6) technological development to support this implementation. This study was parametric in that the data input (supply) was varied by a factor of about fifteen while the user request (demand) was varied by a factor of about nineteen. Correspondingly, the time from observation to delivery to the user was varied. This parametric evaluation was performed by a computer simulation that was based on network alternatives and resulted in preliminary transmission and preprocessing requirements. The earth-resource data sources considered were: shuttle sorties, synchronous satellites (e.g., SEOS), aircraft, and satellites in polar orbits
Making Queries Tractable on Big Data with Preprocessing
A query class is traditionally considered tractable if there exists a polynomial-time (PTIME) algorithm to answer its queries. When it comes to big data, however, PTIME al-gorithms often become infeasible in practice. A traditional and effective approach to coping with this is to preprocess data off-line, so that queries in the class can be subsequently evaluated on the data efficiently. This paper aims to pro-vide a formal foundation for this approach in terms of com-putational complexity. (1) We propose a set of Π-tractable queries, denoted by ΠT0Q, to characterize classes of queries that can be answered in parallel poly-logarithmic time (NC) after PTIME preprocessing. (2) We show that several natu-ral query classes are Π-tractable and are feasible on big data. (3) We also study a set ΠTQ of query classes that can be ef-fectively converted to Π-tractable queries by re-factorizing its data and queries for preprocessing. We introduce a form of NC reductions to characterize such conversions. (4) We show that a natural query class is complete for ΠTQ. (5) We also show that ΠT0Q ⊂ P unless P = NC, i.e., the set ΠT0Q of all Π-tractable queries is properly contained in the set P of all PTIME queries. Nonetheless, ΠTQ = P, i.e., all PTIME query classes can be made Π-tractable via proper re-factorizations. This work is a step towards understanding the tractability of queries in the context of big data. 1
- …