18,820 research outputs found
Comprehensive characterization of an open source document search engine
This work performs a thorough characterization and analysis of the open source Lucene search library. The article describes in detail the architecture, functionality, and micro-architectural behavior of the search engine, and investigates prominent online document search research issues. In particular, we study how intra-server index partitioning affects response time and throughput, explore the potential use of low-power servers for document search, and examine the sources of performance degradation and the causes of tail latencies. Some of our main conclusions are the following: (a) intra-server index partitioning can reduce tail latencies, but with diminishing benefits as incoming query traffic increases; (b) low-power servers, given enough partitioning, can provide the same average and tail response times as conventional high-performance servers; (c) index search is a CPU-intensive, cache-friendly application; and (d) C-states are the main culprits for performance degradation in document search.
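Conclusion (a), that partitioning reduces tail latency with diminishing returns, can be illustrated with a toy Monte Carlo model (an illustrative assumption, not the paper's methodology): a query fans out to all partitions and completes only when the slowest partition finishes.

```python
# Toy Monte Carlo: effect of intra-server index partitioning on tail latency.
# Assumed model (not from the paper): each partition processes its share of
# the work with random multiplicative skew plus a fixed fan-out overhead,
# and a query completes when its slowest partition does.
import random

def simulate_p99(partitions, queries=20000, work=10.0, overhead=0.2, seed=1):
    rng = random.Random(seed)
    latencies = []
    for _ in range(queries):
        shards = [(work / partitions) * rng.lognormvariate(0.0, 0.5) + overhead
                  for _ in range(partitions)]
        latencies.append(max(shards))  # the slowest shard gates the query
    latencies.sort()
    return latencies[int(0.99 * len(latencies))]

for p in (1, 2, 4, 8, 16):
    print(p, round(simulate_p99(p), 3))
```

In this toy model the p99 latency drops sharply from one partition to a few, then flattens, because the max over more shards eats into the per-shard speedup.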
An automated system for lung nodule detection in low-dose computed tomography
A computer-aided detection (CAD) system for the identification of pulmonary
nodules in low-dose multi-detector helical Computed Tomography (CT) images was
developed in the framework of the MAGIC-5 Italian project. One of the main
goals of this project is to build a distributed database of lung CT scans in
order to enable automated image analysis through a data and CPU GRID
infrastructure. The basic modules of our lung-CAD system, a dot-enhancement
filter for nodule candidate selection and a neural classifier for
false-positive finding reduction, are described. The system was designed and
tested for both internal and sub-pleural nodules. The results obtained on the
collected database of low-dose thin-slice CT scans are shown in terms of free
response receiver operating characteristic (FROC) curves and discussed.

Comment: 9 pages, 9 figures; Proceedings of the SPIE Medical Imaging Conference, 17-22 February 2007, San Diego, California, USA, Vol. 6514, 65143
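The FROC curves used to report the results above plot sensitivity against false positives per scan as the candidate-acceptance threshold is swept. The sketch below is a generic illustration of that computation (the field layout and example data are assumptions, not the MAGIC-5 system's code):

```python
# Generic FROC computation from per-candidate classifier scores.
# candidates: list of (score, is_true_nodule) pairs pooled over all scans.
def froc_points(candidates, n_scans, n_nodules):
    points = []
    # Sweep the acceptance threshold over every distinct score, high to low.
    for t in sorted({s for s, _ in candidates}, reverse=True):
        tp = sum(1 for s, ok in candidates if s >= t and ok)
        fp = sum(1 for s, ok in candidates if s >= t and not ok)
        points.append((fp / n_scans, tp / n_nodules))  # (FP per scan, sensitivity)
    return points

pts = froc_points([(0.9, True), (0.8, False), (0.7, True), (0.4, False)],
                  n_scans=2, n_nodules=2)
print(pts)
```

Each lowered threshold admits more candidates, trading extra false positives per scan for higher sensitivity.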
CUDASW++2.0: enhanced Smith-Waterman protein database search on CUDA-enabled GPUs based on SIMT and virtualized SIMD abstractions
<p>Abstract</p> <p>Background</p> <p>Due to its high sensitivity, the Smith-Waterman algorithm is widely used for biological database searches. Unfortunately, the quadratic time complexity of this algorithm makes it highly time-consuming. The exponential growth of biological databases further exacerbates the problem. To accelerate this algorithm, many efforts have been made to develop techniques for high performance architectures, especially the recently emerging many-core architectures and their associated programming models.</p> <p>Findings</p> <p>This paper describes the latest release of the CUDASW++ software, CUDASW++ 2.0, which makes new contributions to Smith-Waterman protein database searches using compute unified device architecture (CUDA). A parallel Smith-Waterman algorithm is proposed to further optimize the performance of CUDASW++ 1.0 based on the single instruction, multiple thread (SIMT) abstraction. For the first time, we have investigated a partitioned vectorized Smith-Waterman algorithm using CUDA based on the virtualized single instruction, multiple data (SIMD) abstraction. The optimized SIMT and the partitioned vectorized algorithms were benchmarked, and remarkably, have similar performance characteristics. CUDASW++ 2.0 achieves a performance improvement over CUDASW++ 1.0 of as much as 1.74 (1.72) times using the optimized SIMT algorithm and up to 1.77 (1.66) times using the partitioned vectorized algorithm, with a performance of up to 17 (30) billion cell updates per second (GCUPS) on a single-GPU GeForce GTX 280 (dual-GPU GeForce GTX 295) graphics card.</p> <p>Conclusions</p> <p>CUDASW++ 2.0 is publicly available open-source software, written in the CUDA and C++ programming languages. It obtains a significant performance improvement over CUDASW++ 1.0 using either the optimized SIMT algorithm or the partitioned vectorized algorithm for Smith-Waterman protein database searches by fully exploiting the compute capability of commonly used CUDA-enabled low-cost GPUs.</p>
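For reference, the quadratic-time recurrence that CUDASW++ parallelizes on the GPU can be stated in a few lines of CPU code. This pure-Python sketch uses a simple linear gap penalty and scalar match/mismatch scores for brevity (CUDASW++ itself supports substitution matrices and affine gaps):

```python
# Pure-Python Smith-Waterman local alignment score (linear gap penalty).
# H[i][j] holds the best score of any local alignment ending at a[i-1], b[j-1];
# the max(0, ...) resets negative-scoring prefixes, making the alignment local.
def smith_waterman(a, b, match=2, mismatch=-1, gap=-1):
    rows, cols = len(a) + 1, len(b) + 1
    h = [[0] * cols for _ in range(rows)]
    best = 0
    for i in range(1, rows):
        for j in range(1, cols):
            diag = h[i - 1][j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            h[i][j] = max(0, diag, h[i - 1][j] + gap, h[i][j - 1] + gap)
            best = max(best, h[i][j])
    return best

print(smith_waterman("GATTACA", "GATTACA"))  # 7 matches * 2 = 14
```

The doubly nested loop is the quadratic cost the abstract refers to; the GPU versions evaluate anti-diagonals of `h` in parallel, since cell (i, j) depends only on (i-1, j-1), (i-1, j), and (i, j-1).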
Delay-Optimal User Scheduling and Inter-Cell Interference Management in Cellular Network via Distributive Stochastic Learning
In this paper, we propose a distributive queue-aware intra-cell user
scheduling and inter-cell interference (ICI) management control design for a
delay-optimal cellular downlink system with M base stations (BSs) and K users
in each cell. Each BS has K downlink queues, one per user, with
heterogeneous arrivals and delay requirements. The ICI management control is
adaptive to the joint queue state information (QSI) over a slow time scale, while
the user scheduling control is adaptive to both the joint QSI and the joint
channel state information (CSI) over a faster time scale. We show that the
problem can be modeled as an infinite-horizon average-cost Partially Observed
Markov Decision Problem (POMDP), which is NP-hard in general. By exploiting the
special structure of the problem, we derive an equivalent Bellman
equation to solve the POMDP. To address the distributive requirement
and the issues of dimensionality and computational complexity, we derive a
distributive online stochastic learning algorithm that requires only local
QSI and local CSI at each of the M BSs. We show that the proposed learning
algorithm converges almost surely (with probability 1) and achieves significant
gains over various baselines. The proposed solution has only linear
complexity, O(MK).
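Almost-sure convergence claims of this kind typically rest on stochastic-approximation arguments with diminishing step sizes. A minimal Robbins-Monro-style online update (a generic illustration, not the paper's actual per-BS algorithm) looks like this:

```python
# Generic stochastic-approximation update: V_{n+1} = V_n + a_n * (x_n - V_n).
# With step size a_n = 1/(n+1) this is exactly a running mean of the observed
# cost stream, which converges almost surely to its expectation. The paper's
# algorithm applies updates of this flavor to local per-QSI value estimates.
def online_value_learning(samples, step=lambda n: 1.0 / (n + 1)):
    v = 0.0
    for n, x in enumerate(samples):
        v += step(n) * (x - v)
    return v

print(online_value_learning([1.0, 2.0, 3.0, 4.0]))  # running mean -> 2.5
```

The key requirement for convergence is that the step sizes sum to infinity while their squares sum to a finite value, which 1/(n+1) satisfies.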
Computer-aided detection of pulmonary nodules in low-dose CT
A computer-aided detection (CAD) system for the identification of pulmonary
nodules in low-dose multi-detector helical CT images with 1.25 mm slice
thickness is being developed in the framework of the INFN-supported MAGIC-5
Italian project. The basic modules of our lung-CAD system, a dot enhancement
filter for nodule candidate selection and a voxel-based neural classifier for
false-positive finding reduction, are described. Preliminary results obtained
on the so-far collected database of lung CT scans are discussed.

Comment: 3 pages, 4 figures; Proceedings of the CompIMAGE - International Symposium on Computational Modelling of Objects Represented in Images: Fundamentals, Methods and Applications, 20-21 Oct. 2006, Coimbra, Portugal
Forecasting the cost of processing multi-join queries via hashing for main-memory databases (Extended version)
Database management systems (DBMSs) carefully optimize complex multi-join
queries to avoid expensive disk I/O. As servers today feature tens or hundreds
of gigabytes of RAM, a significant fraction of many analytic databases becomes
memory-resident. Even after careful tuning for an in-memory environment, a
linear disk I/O model such as the one implemented in PostgreSQL may choose
multi-join query plans whose response time is up to 2X slower than that of the
optimal plan over memory-resident data. This paper introduces a memory I/O cost
model to identify good evaluation strategies for complex query plans with
multiple hash-based equi-joins over memory-resident data. The proposed cost
model is carefully validated for accuracy using three different systems,
including an Amazon EC2 instance, to control for hardware-specific differences.
Prior work in parallel query evaluation has advocated right-deep and bushy
trees for multi-join queries due to their greater parallelization and
pipelining potential. A surprising finding is that the conventional wisdom from
shared-nothing disk-based systems does not directly apply to the modern
shared-everything memory hierarchy. As corroborated by our model, the
performance gap between the optimal left-deep and right-deep query plan can
grow to about 10X as the number of joins in the query increases.

Comment: 15 pages, 8 figures, extended version of the paper to appear in SoCC'1
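The hash-based equi-join whose memory traffic the cost model captures can be sketched in a few lines (an illustrative in-memory implementation; the systems studied use tuned variants):

```python
# Minimal in-memory hash equi-join: build a hash table on one input, then
# stream the other input through it. Row and key names are illustrative.
def hash_join(build_rows, probe_rows, build_key, probe_key):
    # Build phase: hash the (usually smaller) build side on its join key.
    table = {}
    for row in build_rows:
        table.setdefault(row[build_key], []).append(row)
    # Probe phase: each probe row looks up matching build rows.
    out = []
    for row in probe_rows:
        for match in table.get(row[probe_key], []):
            out.append({**match, **row})
    return out

result = hash_join([{"id": 1, "x": "a"}, {"id": 2, "x": "b"}],
                   [{"id": 1, "y": 10}, {"id": 3, "y": 30}],
                   "id", "id")
print(result)
```

In a left-deep plan of such joins, each join's output feeds the next join in sequence; the paper's finding is that over a shared-everything memory hierarchy this can beat the right-deep and bushy shapes favored by shared-nothing disk-based systems.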