Content-Aware DataGuides for Indexing Large Collections of XML Documents
XML is well-suited for modelling structured data with
textual content. However, most indexing approaches perform
structure and content matching independently, combining
the retrieved path and keyword occurrences in a third
step. This paper shows that retrieval in XML documents can
be accelerated significantly by processing text and structure
simultaneously during all retrieval phases. To this end,
the Content-Aware DataGuide (CADG) enhances the well-known
DataGuide with (1) simultaneous keyword and path
matching and (2) a precomputed content/structure join. Extensive
experiments prove the CADG to be 50-90% faster
than the DataGuide for various kinds of queries and documents,
including difficult cases such as poorly structured
queries and recursive document paths. A new query classification
scheme identifies the query characteristics that predominantly
influence the performance of the individual
indices. The experiments show that the CADG is applicable
to many real-world applications, in particular large
collections of heterogeneously structured XML documents.
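As a rough illustration of idea (2), the precomputed content/structure join, consider the following Python sketch. The class and method names are hypothetical: the actual CADG augments a DataGuide tree, whereas this sketch only shows the core idea of indexing path/keyword pairs jointly, so that a single lookup replaces separate path matching, keyword matching, and a join.

```python
# Hedged sketch, not the CADG itself: a joint path/keyword posting index.
from collections import defaultdict

class ContentAwarePathIndex:
    def __init__(self):
        # (label path, keyword) -> set of document ids. Structure and content
        # are joined at indexing time rather than at query time.
        self._postings = defaultdict(set)

    def add(self, doc_id, path, text):
        for keyword in text.lower().split():
            self._postings[(path, keyword)].add(doc_id)

    def query(self, path, keyword):
        # One lookup matches path and keyword simultaneously, so no separate
        # path and keyword result sets need to be intersected afterwards.
        return self._postings.get((path, keyword), set())

index = ContentAwarePathIndex()
index.add(1, "/book/title", "XML indexing in practice")
index.add(2, "/book/author", "indexing pioneers")
print(index.query("/book/title", "indexing"))  # {1}
```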
Approximate MIMO Iterative Processing with Adjustable Complexity Requirements
Always targeting the best achievable bit error rate (BER) performance in
iterative receivers operating over multiple-input multiple-output (MIMO)
channels may result in significant waste of resources, especially when the
achievable BER is orders of magnitude better than the target performance (e.g.,
under good channel conditions and at high signal-to-noise ratio (SNR)). In
contrast to typical iterative schemes, a practical iterative decoding
framework is proposed that approximates the soft-information exchange,
allowing reduced-complexity sphere and channel decoding adjustable to the
transmission conditions and the required bit error rate. With the proposed
approximate soft-information exchange, the performance of the exact exchange
can still be reached with significant complexity gains.
Comment: The final version of this paper appears in IEEE Transactions on
Vehicular Technology.
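The central idea, spending only as much decoding effort as the target bit error rate requires, can be caricatured as a control loop. The sketch below is purely illustrative: detect() is a toy stand-in for a soft detector whose accuracy grows with its search effort, and the real framework adapts sphere and channel decoding internals rather than a single list-size knob.

```python
# Hedged sketch of adjustable-complexity iterative processing. All names and
# the detect() model are invented; they are not the paper's algorithms.
import random

def detect(received, list_size):
    # Toy stand-in: a larger candidate list yields a lower BER estimate.
    return max(0.0, 0.1 / list_size - 0.001 * random.random())

def adjustable_iterative_receiver(received, target_ber, max_list_size=64):
    list_size = max_list_size
    for _ in range(8):  # fixed number of detector/decoder iterations
        ber_estimate = detect(received, list_size)
        if ber_estimate <= target_ber and list_size > 1:
            # Good conditions: halve the search effort instead of wasting it.
            list_size //= 2
        elif ber_estimate > target_ber:
            # Falling short of the target: restore some search effort.
            list_size = min(max_list_size, list_size * 2)
    return list_size

random.seed(7)
print(adjustable_iterative_receiver(received=None, target_ber=1e-3))
```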
Multidimensional Range Queries on Modern Hardware
Range queries over multidimensional data are an important part of database
workloads in many applications. Their execution may be accelerated by using
multidimensional index structures (MDIS), such as kd-trees or R-trees. As for
most index structures, the usefulness of this approach depends on the
selectivity of the queries, and common wisdom held that a simple scan beats
MDIS for queries accessing more than 15%-20% of a dataset. However, this wisdom
is largely based on evaluations that are almost two decades old, performed on
disk-resident data, using IO-optimized data structures, and running on
single-core systems. The question is whether this rule of thumb still holds
when multidimensional range queries (MDRQ) are performed on modern
architectures with large main memories holding all data, multi-core CPUs and
data-parallel instruction sets. In this paper, we study whether
and how much modern hardware influences the performance ratio between index
structures and scans for MDRQ. To this end, we conservatively adapted three
popular MDIS, namely the R*-tree, the kd-tree, and the VA-file, to exploit
features of modern servers and compared their performance to different flavors
of parallel scans using multiple (synthetic and real-world) analytical
workloads over multiple (synthetic and real-world) datasets of varying size,
dimensionality, and skew. We find that all approaches benefit considerably from
using main memory and parallelization, yet to varying degrees. Our evaluation
indicates that, on current machines, scanning should be favored over parallel
versions of classical MDIS, even for very selective queries.
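A minimal sketch of the parallel-scan baseline the study favors might look as follows. NumPy and a process pool stand in here for the SIMD instructions and multi-core execution mentioned above; the function names are assumptions, not the paper's code.

```python
# Hedged sketch: a data-parallel scan answering a multidimensional range query.
import numpy as np
from concurrent.futures import ProcessPoolExecutor

def scan_chunk(args):
    chunk, lower, upper = args
    # A point qualifies if it lies inside the query box in every dimension.
    mask = np.all((chunk >= lower) & (chunk <= upper), axis=1)
    return np.nonzero(mask)[0]

def parallel_range_scan(data, lower, upper, workers=4):
    chunks = np.array_split(data, workers)
    offsets = np.cumsum([0] + [len(c) for c in chunks[:-1]])
    with ProcessPoolExecutor(max_workers=workers) as pool:
        results = pool.map(scan_chunk, [(c, lower, upper) for c in chunks])
    # Shift chunk-local indices back into the global coordinate space.
    return np.concatenate([ids + off for ids, off in zip(results, offsets)])

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    points = rng.random((100_000, 3))
    hits = parallel_range_scan(points, np.array([0.2] * 3), np.array([0.4] * 3))
    print(len(hits), "points in range")
```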
Search algorithms as a framework for the optimization of drug combinations
Combination therapies are often needed for effective clinical outcomes in the
management of complex diseases, but presently they are generally based on
empirical clinical experience. Here we suggest a novel application of search
algorithms, originally developed for digital communication, modified to
optimize combinations of therapeutic interventions. In biological experiments
measuring the restoration of the decline with age in heart function and
exercise capacity in Drosophila melanogaster, we found that search algorithms
correctly identified optimal combinations of four drugs with only one third of
the tests performed in a fully factorial search. In experiments identifying
combinations of three doses of up to six drugs for selective killing of human
cancer cells, search algorithms resulted in a highly significant enrichment of
selective combinations compared with random searches. In simulations using a
network model of cell death, we found that the search algorithms identified the
optimal combinations of 6-9 interventions in 80-90% of tests, compared with
15-30% for an equivalent random search. These findings suggest that modified
search algorithms from information theory have the potential to enhance the
discovery of novel therapeutic drug combinations. This report also helps to
frame a biomedical problem that will benefit from an interdisciplinary effort
and suggests a general strategy for its solution.
Comment: 36 pages, 10 figures, revised version.
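To make the search idea concrete, here is a hedged sketch of a coordinate-wise search over a grid of three dose levels for six drugs, matching the experimental setup above. The fitness function is a toy surrogate for a biological readout, and this particular search is my illustration; the paper's algorithms are adapted from digital-communication search schemes and differ in detail.

```python
# Hedged sketch: greedy coordinate-wise search over drug-dose combinations.
import random

DOSES = [0, 1, 2]   # three dose levels per drug, as in the abstract
NUM_DRUGS = 6

def fitness(combo):
    # Toy response surface with a single optimum; stands in for an assay.
    target = (2, 1, 0, 2, 1, 0)
    return -sum((c - t) ** 2 for c, t in zip(combo, target))

def coordinate_search(rounds=3):
    random.seed(1)
    combo = [random.choice(DOSES) for _ in range(NUM_DRUGS)]
    tests = 0
    for _ in range(rounds):
        for drug in range(NUM_DRUGS):
            # Test every dose of one drug while holding the others fixed.
            scored = []
            for dose in DOSES:
                trial = combo[:drug] + [dose] + combo[drug + 1:]
                scored.append((fitness(trial), dose))
                tests += 1
            combo[drug] = max(scored)[1]
    return combo, tests

best, tests = coordinate_search()
print(best, "found with", tests, "tests;",
      len(DOSES) ** NUM_DRUGS, "needed for a fully factorial search")
```

Even this naive variant needs only 54 assay calls per restart against 729 fully factorial tests, which is the kind of saving the abstract reports.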
The STRESS Method for Boundary-point Performance Analysis of End-to-end Multicast Timer-Suppression Mechanisms
Evaluation of Internet protocols usually uses random scenarios or scenarios
based on designers' intuition. Such an approach may be useful for average-case
analysis but does not cover boundary-point (worst or best-case) scenarios. To
synthesize boundary-point scenarios, a more systematic approach is needed. In
this paper, we present a method for automatic synthesis of worst and best case
scenarios for protocol boundary-point evaluation.
Our method uses a fault-oriented test generation (FOTG) algorithm for
searching the protocol and system state space to synthesize these scenarios.
The algorithm is based on a global finite state machine (FSM) model. We extend
the algorithm with timing semantics to handle end-to-end delays and address
performance criteria. We introduce the notion of a virtual LAN to represent
delays of the underlying multicast distribution tree. The algorithms used in
our method utilize implicit backward search using branch and bound techniques
and start from given target events. This aims to reduce the search complexity
drastically. As a case study, we use our method to evaluate variants of the
timer suppression mechanism, used in various multicast protocols, with respect
to two performance criteria: overhead of response messages and response time.
Simulation results for reliable multicast protocols show that our method
provides a scalable way for synthesizing worst-case scenarios automatically.
Results obtained using stress scenarios differ dramatically from those obtained
through average-case analyses. We hope for our method to serve as a model for
applying systematic scenario generation to other multicast protocols.
Comment: 24 pages, 10 figures, IEEE/ACM Transactions on Networking (ToN) [To
appear].
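The implicit backward search with branch and bound can be sketched as follows. The finite state machine below is a toy stand-in for the paper's global FSM model; the states, events, and costs are invented for illustration only.

```python
# Hedged sketch: backward branch-and-bound search from a given target event.
# Each entry maps a state to (predecessor state, triggering event, cost).
BACKWARD = {
    "duplicate_response": [("timer_expired_twice", "send", 1)],
    "timer_expired_twice": [("suppression_lost", "delay", 2),
                            ("timers_desynced", "jitter", 1)],
    "timers_desynced": [("start", "init", 1)],
    "suppression_lost": [("start", "init", 3)],
}

def backward_search(target, bound):
    best = None
    stack = [(target, [], 0)]
    while stack:
        state, trace, cost = stack.pop()
        if cost >= bound:            # branch and bound: prune costly branches
            continue
        if state == "start":
            best, bound = trace, cost    # tighten the bound on each success
            continue
        for pred, event, step in BACKWARD.get(state, []):
            # Prepend so the recovered trace reads forwards from "start".
            stack.append((pred, [event] + trace, cost + step))
    return best

print(backward_search("duplicate_response", bound=10))
# ['init', 'jitter', 'send']
```

Starting from the target event and working backwards keeps the search focused on the few traces that can actually produce the boundary-point behaviour, which is where the complexity reduction comes from.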