59,549 research outputs found

    FLASH: Randomized Algorithms Accelerated over CPU-GPU for Ultra-High Dimensional Similarity Search

    Full text link
    We present FLASH (\textbf{F}ast \textbf{L}SH \textbf{A}lgorithm for \textbf{S}imilarity search accelerated with \textbf{H}PC), a similarity search system for ultra-high dimensional datasets on a single machine, that does not require similarity computations and is tailored for high-performance computing platforms. By leveraging a LSH style randomized indexing procedure and combining it with several principled techniques, such as reservoir sampling, recent advances in one-pass minwise hashing, and count based estimations, we reduce the computational and parallelization costs of similarity search, while retaining sound theoretical guarantees. We evaluate FLASH on several real, high-dimensional datasets from different domains, including text, malicious URL, click-through prediction, social networks, etc. Our experiments shed new light on the difficulties associated with datasets having several million dimensions. Current state-of-the-art implementations either fail on the presented scale or are orders of magnitude slower than FLASH. FLASH is capable of computing an approximate k-NN graph, from scratch, over the full webspam dataset (1.3 billion nonzeros) in less than 10 seconds. Computing a full k-NN graph in less than 10 seconds on the webspam dataset, using brute-force (n2Dn^2D), will require at least 20 teraflops. We provide CPU and GPU implementations of FLASH for replicability of our results

    Application of graphics processing units to search pipelines for gravitational waves from coalescing binaries of compact objects

    Get PDF
    We report a novel application of a graphics processing unit (GPU) for the purpose of accelerating the search pipelines for gravitational waves from coalescing binaries of compact objects. A speed-up of 16-fold in total has been achieved with an NVIDIA GeForce 8800 Ultra GPU card compared with one core of a 2.5 GHz Intel Q9300 central processing unit (CPU). We show that substantial improvements are possible and discuss the reduction in CPU count required for the detection of inspiral sources afforded by the use of GPUs

    Implementation Aspects of a Transmitted-Reference UWB Receiver

    Get PDF
    In this paper, we discuss the design issues of an ultra wide band (UWB) receiver targeting a single-chip CMOS implementation for low data-rate applications like ad hoc wireless sensor networks. A non-coherent transmitted reference (TR) receiver is chosen because of its small complexity compared to other architectures. After a brief recapitulation of the UWB fundamentals and a short discussion on the major differences between coherent and non-coherent receivers, we discuss issues, challenges and possible design solutions. Several simulation results obtained by means of a behavioral model are presented, together with an analysis of the trade-off between performance and complexity in an integrated circuit implementation

    GPU-based Iterative Cone Beam CT Reconstruction Using Tight Frame Regularization

    Full text link
    X-ray imaging dose from serial cone-beam CT (CBCT) scans raises a clinical concern in most image guided radiation therapy procedures. It is the goal of this paper to develop a fast GPU-based algorithm to reconstruct high quality CBCT images from undersampled and noisy projection data so as to lower the imaging dose. For this purpose, we have developed an iterative tight frame (TF) based CBCT reconstruction algorithm. A condition that a real CBCT image has a sparse representation under a TF basis is imposed in the iteration process as regularization to the solution. To speed up the computation, a multi-grid method is employed. Our GPU implementation has achieved high computational efficiency and a CBCT image of resolution 512\times512\times70 can be reconstructed in ~5 min. We have tested our algorithm on a digital NCAT phantom and a physical Catphan phantom. It is found that our TF-based algorithm is able to reconstrct CBCT in the context of undersampling and low mAs levels. We have also quantitatively analyzed the reconstructed CBCT image quality in terms of modulation-transfer-function and contrast-to-noise ratio under various scanning conditions. The results confirm the high CBCT image quality obtained from our TF algorithm. Moreover, our algorithm has also been validated in a real clinical context using a head-and-neck patient case. Comparisons of the developed TF algorithm and the current state-of-the-art TV algorithm have also been made in various cases studied in terms of reconstructed image quality and computation efficiency.Comment: 24 pages, 8 figures, accepted by Phys. Med. Bio

    Anode-Coupled Readout for Light Collection in Liquid Argon TPCs

    Get PDF
    This paper will discuss a new method of signal read-out from photon detectors in ultra-large, underground liquid argon time projection chambers. In this design, the signal from the light collection system is coupled via capacitive plates to the TPC wire-planes. This signal is then read out using the same cabling and electronics as the charge information. This greatly benefits light collection: it eliminates the need for an independent readout, substantially reducing cost; It reduces the number of cables in the vapor region of the TPC that can produce impurities; And it cuts down on the number of feed-throughs in the cryostat wall that can cause heat-leaks and potential points of failure. We present experimental results that demonstrate the sensitivity of a LArTPC wire plane to photon detector signals. We also simulate the effect of a 1 Ī¼\mus shaping time and a 2 MHz sampling rate on these signals in the presence of noise, and find that a single photoelectron timing resolution of āˆ¼\sim30 ns can be achieved.Comment: 16 pages, 15 figure
    • ā€¦
    corecore