48 research outputs found
On the Benefit of Merging Suffix Array Intervals for Parallel Pattern Matching
We present parallel algorithms for exact and approximate pattern matching
with suffix arrays, using a CREW-PRAM with processors. Given a static text
of length , we first show how to compute the suffix array interval of a
given pattern of length in
time for . For approximate pattern matching with differences or
mismatches, we show how to compute all occurrences of a given pattern in
time, where is the size of the alphabet
and . The workhorse of our algorithms is a data structure
for merging suffix array intervals quickly: Given the suffix array intervals
for two patterns and , we present a data structure for computing the
interval of in sequential time, or in
parallel time. All our data structures are of size bits (in addition to
the suffix array)
Fast antijamming timing acquisition using multilayer synchronization sequence
Pseudonoise (PN) sequences are widely used as preamble sequences to establish timing synchronization in military wireless communication systems. At the receiver, searching and detection techniques, such as the full parallel search (FPS) and the serial search (SS), are usually adopted to acquire correct timing position. However, the synchronization sequence has to be very long to combat jamming that reduces the signal-to-noise ratio (SNR) to an extremely low level. In this adverse scenario, the FPS scheme becomes too complex to implement, whereas the SS method suffers from the drawback of long mean acquisition time (MAT). In this paper, a fast timing acquisition method is proposed, using the multilayer synchronization sequence based on cyclical codes. Specifically, the transmitted preamble is the Kronecker product of Bose–Chaudhuri-Hocquenghem (BCH) codewords and PN sequences. At the receiver, the cyclical nature of BCH codes is exploited to test only a part of the entire sequence, resulting in shorter acquisition time. The algorithm is evaluated using the metrics of MAT and detection probability (DP). Theoretical expressions of MAT and DP are derived from the constant false-alarm rate (CFAR) criterion. Theoretical analysis and simulation results show that our proposed scheme dramatically reduces the acquisition time while achieving similar DP performance and maintaining a reasonably low real-time hardware implementation complexity, in comparison with the SS schem
An Efficient Multiway Mergesort for GPU Architectures
Sorting is a primitive operation that is a building block for countless
algorithms. As such, it is important to design sorting algorithms that approach
peak performance on a range of hardware architectures. Graphics Processing
Units (GPUs) are particularly attractive architectures as they provides massive
parallelism and computing power. However, the intricacies of their compute and
memory hierarchies make designing GPU-efficient algorithms challenging. In this
work we present GPU Multiway Mergesort (MMS), a new GPU-efficient multiway
mergesort algorithm. MMS employs a new partitioning technique that exposes the
parallelism needed by modern GPU architectures. To the best of our knowledge,
MMS is the first sorting algorithm for the GPU that is asymptotically optimal
in terms of global memory accesses and that is completely free of shared memory
bank conflicts.
We realize an initial implementation of MMS, evaluate its performance on
three modern GPU architectures, and compare it to competitive implementations
available in state-of-the-art GPU libraries. Despite these implementations
being highly optimized, MMS compares favorably, achieving performance
improvements for most random inputs. Furthermore, unlike MMS, state-of-the-art
algorithms are susceptible to bank conflicts. We find that for certain inputs
that cause these algorithms to incur large numbers of bank conflicts, MMS can
achieve up to a 37.6% speedup over its fastest competitor. Overall, even though
its current implementation is not fully optimized, due to its efficient use of
the memory hierarchy, MMS outperforms the fastest comparison-based sorting
implementations available to date
Learning Character Strings via Mastermind Queries, with a Case Study Involving mtDNA
We study the degree to which a character string, , leaks details about
itself any time it engages in comparison protocols with a strings provided by a
querier, Bob, even if those protocols are cryptographically guaranteed to
produce no additional information other than the scores that assess the degree
to which matches strings offered by Bob. We show that such scenarios allow
Bob to play variants of the game of Mastermind with so as to learn the
complete identity of . We show that there are a number of efficient
implementations for Bob to employ in these Mastermind attacks, depending on
knowledge he has about the structure of , which show how quickly he can
determine . Indeed, we show that Bob can discover using a number of
rounds of test comparisons that is much smaller than the length of , under
reasonable assumptions regarding the types of scores that are returned by the
cryptographic protocols and whether he can use knowledge about the distribution
that comes from. We also provide the results of a case study we performed
on a database of mitochondrial DNA, showing the vulnerability of existing
real-world DNA data to the Mastermind attack.Comment: Full version of related paper appearing in IEEE Symposium on Security
and Privacy 2009, "The Mastermind Attack on Genomic Data." This version
corrects the proofs of what are now Theorems 2 and 4
Fast Parallel Operations on Search Trees
Using (a,b)-trees as an example, we show how to perform a parallel split with
logarithmic latency and parallel join, bulk updates, intersection, union (or
merge), and (symmetric) set difference with logarithmic latency and with
information theoretically optimal work. We present both asymptotically optimal
solutions and simplified versions that perform well in practice - they are
several times faster than previous implementations
GPUSCAN:Efficient Structural Graph Clustering on GPUs
Structural clustering is one of the most popular graph clustering methods,
which has achieved great performance improvement by utilizing GPUs. Even
though, the state-of-the-art GPU-based structural clustering algorithm,
GPUSCAN, still suffers from efficiency issues since lots of extra costs are
introduced for parallelization. Moreover, GPUSCAN assumes that the graph is
resident in the GPU memory. However, the GPU memory capacity is limited
currently while many real-world graphs are big and cannot fit in the GPU
memory, which makes GPUSCAN unable to handle large graphs. Motivated by this,
we present a new GPU-based structural clustering algorithm, GPUSCAN++, in this
paper. To address the efficiency issue, we propose a new progressive clustering
method tailored for GPUs that not only avoid high parallelization costs but
also fully exploits the computing resources of GPUs. To address the GPU memory
limitation issue, we propose a partition-based algorithm for structural
clustering that can process large graphs with limited GPU memory. We conduct
experiments on real graphs, and the experimental results demonstrate that our
algorithm can achieve up to 168 times speedup compared with the
state-of-the-art GPU-based algorithm when the graph can be resident in the GPU
memory. Moreover, our algorithm is scalable to handle large graphs. As an
example, our algorithm can finish the structural clustering on a graph with 1.8
billion edges using less than 2 GB GPU memory