FLASH: Randomized Algorithms Accelerated over CPU-GPU for Ultra-High Dimensional Similarity Search
We present FLASH (Fast LSH Algorithm for
Similarity search accelerated with HPC), a similarity search
system for ultra-high dimensional datasets on a single machine, that does not
require similarity computations and is tailored for high-performance computing
platforms. By leveraging an LSH-style randomized indexing procedure and
combining it with several principled techniques, such as reservoir sampling,
recent advances in one-pass minwise hashing, and count based estimations, we
reduce the computational and parallelization costs of similarity search, while
retaining sound theoretical guarantees.
We evaluate FLASH on several real, high-dimensional datasets from different
domains, including text, malicious URL, click-through prediction, social
networks, etc. Our experiments shed new light on the difficulties associated
with datasets having several million dimensions. Current state-of-the-art
implementations either fail on the presented scale or are orders of magnitude
slower than FLASH. FLASH is capable of computing an approximate k-NN graph,
from scratch, over the full webspam dataset (1.3 billion nonzeros) in less than
10 seconds. Computing a full k-NN graph in less than 10 seconds on the webspam
dataset, using brute-force ($n^2D$), will require at least 20 teraflops. We
provide CPU and GPU implementations of FLASH for replicability of our results.
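Two of the building blocks named in the FLASH abstract, minwise hashing and reservoir sampling, can be illustrated with a short sketch. This is not the FLASH implementation: it uses classic minwise hashing with linear hash functions rather than the one-pass variant, and every name here (`make_hashes`, `minhash_signature`, `estimate_jaccard`, `ReservoirBucket`) is hypothetical. Sparse vectors are assumed to be represented by the set of their nonzero indices.

```python
import random

PRIME = (1 << 61) - 1  # Mersenne prime used as the hash modulus (assumption)


def make_hashes(num_hashes, seed=0):
    """Draw (a, b) pairs defining hash functions h(i) = (a*i + b) mod PRIME."""
    rng = random.Random(seed)
    return [(rng.randrange(1, PRIME), rng.randrange(PRIME))
            for _ in range(num_hashes)]


def minhash_signature(nonzero_indices, hashes):
    """One min-hash value per hash function over the set of nonzero indices."""
    return [min((a * i + b) % PRIME for i in nonzero_indices)
            for a, b in hashes]


def estimate_jaccard(sig_a, sig_b):
    """Fraction of matching signature slots, an estimate of Jaccard similarity."""
    matches = sum(1 for x, y in zip(sig_a, sig_b) if x == y)
    return matches / len(sig_a)


class ReservoirBucket:
    """Fixed-capacity hash bucket holding a uniform sample of inserted ids."""

    def __init__(self, capacity, rng):
        self.capacity = capacity
        self.rng = rng
        self.items = []
        self.seen = 0  # total number of ids offered to this bucket

    def add(self, item):
        self.seen += 1
        if len(self.items) < self.capacity:
            self.items.append(item)
        else:
            # Keep the new id with probability capacity / seen.
            j = self.rng.randrange(self.seen)
            if j < self.capacity:
                self.items[j] = item
```

Capping each bucket at a fixed capacity bounds memory per bucket regardless of how skewed the data is, while signature agreement gives a similarity estimate without computing distances between the original vectors.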
Application of graphics processing units to search pipelines for gravitational waves from coalescing binaries of compact objects
We report a novel application of a graphics processing unit (GPU) to accelerate the search pipelines for gravitational waves from coalescing binaries of compact objects. A total speed-up of 16-fold has been achieved with an NVIDIA GeForce 8800 Ultra GPU card compared with one core of a 2.5 GHz Intel Q9300 central processing unit (CPU). We show that substantial improvements are possible and discuss the reduction in CPU count required for the detection of inspiral sources afforded by the use of GPUs.
Implementation Aspects of a Transmitted-Reference UWB Receiver
In this paper, we discuss the design issues of an ultra wide band (UWB) receiver targeting a single-chip CMOS implementation for low data-rate applications such as ad hoc wireless sensor networks. A non-coherent transmitted reference (TR) receiver is chosen because of its low complexity compared to other architectures. After a brief recapitulation of the UWB fundamentals and a short discussion of the major differences between coherent and non-coherent receivers, we discuss issues, challenges and possible design solutions. Several simulation results obtained by means of a behavioral model are presented, together with an analysis of the trade-off between performance and complexity in an integrated circuit implementation.
GPU-based Iterative Cone Beam CT Reconstruction Using Tight Frame Regularization
X-ray imaging dose from serial cone-beam CT (CBCT) scans raises a clinical
concern in most image guided radiation therapy procedures. It is the goal of
this paper to develop a fast GPU-based algorithm to reconstruct high quality
CBCT images from undersampled and noisy projection data so as to lower the
imaging dose. For this purpose, we have developed an iterative tight frame (TF)
based CBCT reconstruction algorithm. A condition that a real CBCT image has a
sparse representation under a TF basis is imposed in the iteration process as
regularization to the solution. To speed up the computation, a multi-grid
method is employed. Our GPU implementation has achieved high computational
efficiency and a CBCT image of resolution $512\times512\times70$ can be
reconstructed in ~5 min. We have tested our algorithm on a digital NCAT phantom
and a physical Catphan phantom. It is found that our TF-based algorithm is able
to reconstruct CBCT images in the context of undersampling and low mAs levels. We have
also quantitatively analyzed the reconstructed CBCT image quality in terms of
modulation-transfer-function and contrast-to-noise ratio under various scanning
conditions. The results confirm the high CBCT image quality obtained from our
TF algorithm. Moreover, our algorithm has also been validated in a real
clinical context using a head-and-neck patient case. Comparisons of the
developed TF algorithm and the current state-of-the-art TV algorithm have also
been made in various cases studied in terms of reconstructed image quality and
computation efficiency. Comment: 24 pages, 8 figures, accepted by Phys. Med. Biol.
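The idea of imposing sparsity under a tight frame during the iteration can be illustrated with a toy sketch. This is not the authors' algorithm: there is no CBCT projector, no multi-grid scheme, and no GPU code. As stated assumptions, a one-level orthonormal Haar transform stands in for the tight frame (an orthonormal basis is the simplest tight frame), a 1D sampling mask stands in for the undersampled projection operator, and all function names are hypothetical.

```python
import math

SQRT2 = math.sqrt(2.0)


def haar_forward(x):
    """One-level orthonormal Haar analysis of an even-length list."""
    half = len(x) // 2
    avg = [(x[2 * i] + x[2 * i + 1]) / SQRT2 for i in range(half)]
    det = [(x[2 * i] - x[2 * i + 1]) / SQRT2 for i in range(half)]
    return avg, det


def haar_inverse(avg, det):
    """Synthesis step; exactly inverts haar_forward."""
    x = []
    for a, d in zip(avg, det):
        x.extend([(a + d) / SQRT2, (a - d) / SQRT2])
    return x


def soft_threshold(c, t):
    """Shrink a coefficient toward zero by t (zero if |c| <= t)."""
    return math.copysign(max(abs(c) - t, 0.0), c)


def tf_regularize(x, threshold):
    """Promote sparsity by shrinking the detail coefficients only."""
    avg, det = haar_forward(x)
    det = [soft_threshold(d, threshold) for d in det]
    return haar_inverse(avg, det)


def reconstruct(y, mask, threshold=0.1, iters=200):
    """Estimate a signal from samples y observed where mask[i] == 1."""
    x = [0.0] * len(y)
    for _ in range(iters):
        # Data-fidelity step: unit-step gradient update on the observed samples.
        x = [xi + m * (yi - xi) for xi, yi, m in zip(x, y, mask)]
        # Regularization step: shrink the frame coefficients.
        x = tf_regularize(x, threshold)
    return x
```

For a signal that is constant on each pair of samples, such as `[1.0, 1.0, 4.0, 4.0]` observed only at even positions, the loop fills in the missing samples because the detail coefficients of the true signal are zero.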
Anode-Coupled Readout for Light Collection in Liquid Argon TPCs
This paper will discuss a new method of signal read-out from photon detectors
in ultra-large, underground liquid argon time projection chambers. In this
design, the signal from the light collection system is coupled via capacitive
plates to the TPC wire-planes. This signal is then read out using the same
cabling and electronics as the charge information. This greatly benefits light
collection: it eliminates the need for an independent readout, substantially
reducing cost; it reduces the number of cables in the vapor region of the TPC
that can produce impurities; and it cuts down on the number of feed-throughs in
the cryostat wall that can cause heat-leaks and potential points of failure. We
present experimental results that demonstrate the sensitivity of a LArTPC wire
plane to photon detector signals. We also simulate the effect of a 1 $\mu$s
shaping time and a 2 MHz sampling rate on these signals in the presence of
noise, and find that a single photoelectron timing resolution of 30 ns
can be achieved. Comment: 16 pages, 15 figures