New application of failure functions
Several algorithms are presented whose operation is governed by a principle of failure functions: when searching for an extremal value within a sequence, it suffices to consider only the subsequence of items each of which is the first possible improvement of its predecessor. These algorithms are more efficient than their more traditional counterparts.
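The principle can be illustrated with a minimal sketch (hypothetical code, not taken from the paper): a single scan that keeps only each first improvement produces a short chain whose last element is the extremum.

```python
def improvement_chain(seq):
    """Keep only items that strictly improve on the previously kept item;
    the last kept item is the maximum of the whole sequence."""
    chain = []
    for x in seq:
        if not chain or x > chain[-1]:
            chain.append(x)  # x is the first improvement of its predecessor
    return chain

# The maximum can be read off the (usually much shorter) chain:
print(improvement_chain([3, 1, 4, 1, 5, 9, 2, 6]))  # [3, 4, 5, 9]
```

Only the kept subsequence ever needs to be examined when the extremal value is queried, which is the source of the efficiency gain the abstract refers to.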
The Ubiquitous B-tree: Volume II
Major developments relating to the B-tree from early 1979 through the fall of 1986 are presented. This updates the well-known article, The Ubiquitous B-Tree by Douglas Comer (Computing Surveys, June 1979). After a basic overview of B and B+ trees, recent research is cited, along with descriptions of nine B-tree variants developed since Comer's article. The advantages and disadvantages of each variant over the basic B-tree are emphasized. Also included are a discussion of concurrency control issues in B-trees and a speculation on the future of B-trees.
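For readers new to the structure, the basic B-tree lookup that all the surveyed variants share can be sketched as follows (a generic illustration, not code from the survey; names are hypothetical):

```python
class BTreeNode:
    def __init__(self, keys, children=None):
        self.keys = keys                  # sorted list of keys
        self.children = children or []    # internal node: len(children) == len(keys) + 1

def btree_search(node, key):
    """Find the first slot whose key >= target; report a match or descend."""
    i = 0
    while i < len(node.keys) and key > node.keys[i]:
        i += 1
    if i < len(node.keys) and node.keys[i] == key:
        return True
    if not node.children:                 # leaf: nowhere left to look
        return False
    return btree_search(node.children[i], key)

# A tiny two-level B-tree: root [10, 20] with three leaf children.
root = BTreeNode([10, 20],
                 [BTreeNode([2, 5]), BTreeNode([12, 17]), BTreeNode([25, 30])])
print(btree_search(root, 17), btree_search(root, 3))  # True False
```

Because every node holds many keys, the tree stays shallow and a lookup touches only O(log n) nodes, which is what makes the structure so well suited to disk-based storage.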
Efficient Management of Short-Lived Data
Motivated by the increasing prominence of loosely-coupled systems, such as
mobile and sensor networks, which are characterised by intermittent
connectivity and volatile data, we study the tagging of data with so-called
expiration times. More specifically, when data are inserted into a database,
they may be tagged with time values indicating when they expire, i.e., when
they are regarded as stale or invalid and thus are no longer considered part of
the database. In a number of applications, expiration times are known and can
be assigned at insertion time. We present data structures and algorithms for
online management of data tagged with expiration times. The algorithms are
based on fully functional, persistent treaps, which are a combination of binary
search trees with respect to a primary attribute and heaps with respect to a
secondary attribute. The primary attribute implements primary keys, and the
secondary attribute stores expiration times in a minimum heap, thus keeping a
priority queue of tuples to expire. A detailed and comprehensive experimental
study demonstrates the well-behavedness and scalability of the approach as well
as its efficiency with respect to a number of competitors.
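The core data-structure idea can be sketched in a minimal, ephemeral form (the paper uses fully functional, persistent treaps; the code and names below are a hypothetical illustration): a binary search tree ordered on the primary key that simultaneously maintains min-heap order on expiration times, so the next tuple to expire always sits at the root.

```python
class Node:
    def __init__(self, key, expires):
        self.key, self.expires = key, expires
        self.left = self.right = None

def rotate_right(n):
    l = n.left
    n.left, l.right = l.right, n
    return l

def rotate_left(n):
    r = n.right
    n.right, r.left = r.left, n
    return r

def insert(root, key, expires):
    """BST order on key; min-heap order on expiration time."""
    if root is None:
        return Node(key, expires)
    if key < root.key:
        root.left = insert(root.left, key, expires)
        if root.left.expires < root.expires:
            root = rotate_right(root)      # restore heap order
    else:
        root.right = insert(root.right, key, expires)
        if root.right.expires < root.expires:
            root = rotate_left(root)
    return root

def merge(a, b):
    """Merge two treaps where every key in a is less than every key in b."""
    if a is None:
        return b
    if b is None:
        return a
    if a.expires < b.expires:
        a.right = merge(a.right, b)
        return a
    b.left = merge(a, b.left)
    return b

def expire(root, now):
    """Pop expired tuples; the earliest expiration is always at the root."""
    while root is not None and root.expires <= now:
        root = merge(root.left, root.right)
    return root

def inorder(root):
    return [] if root is None else inorder(root.left) + [root.key] + inorder(root.right)
```

Insertion costs O(log n) expected, and expiring stale tuples costs O(log n) expected per expired tuple, since each removal is a root deletion followed by a merge along one path.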
Alphabet-based Multisensory Data Fusion and Classification using Factor Graphs
The method of multisensory data integration is a crucial step in any data fusion approach. Different physical types of sensors (optical, thermal, acoustic, or radar) with different resolutions, and different types of GIS digital data (elevation, vector maps), require a proper method of integration. The incommensurability of the data may rule out conventional statistical methods for fusing and processing it. A correct and established way of integrating multisensory data is therefore required, as employing an inappropriate methodology may introduce errors into the fusion process. Several strategies have been developed for proper multisensory data fusion (Bayesian approaches, linear and log-linear opinion pools, neural networks, and fuzzy logic). The use of these approaches is motivated by weighted consensus theory, which leads to fusion processes that are performed correctly across the variety of data properties.
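One of the fusion strategies the abstract names, the log-linear opinion pool, is simple enough to sketch (a hypothetical illustration, not the paper's factor-graph method): each sensor's class posterior is raised to a consensus weight and the weighted products are renormalized.

```python
import math

def log_linear_pool(posteriors, weights):
    """Log-linear opinion pool: fused(c) is proportional to
    prod_i p_i(c) ** w_i, renormalized over the classes.
    posteriors: list of dicts mapping class -> probability, one per sensor."""
    classes = posteriors[0].keys()
    fused = {c: math.prod(p[c] ** w for p, w in zip(posteriors, weights))
             for c in classes}
    z = sum(fused.values())
    return {c: v / z for c, v in fused.items()}

# Two sensors, two classes; equal consensus weights.
print(log_linear_pool([{'a': 0.8, 'b': 0.2},
                       {'a': 0.6, 'b': 0.4}], [1.0, 1.0]))
```

The weights let more reliable (or higher-resolution) sensors dominate the consensus, which is one way such pools cope with incommensurable inputs.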
A survey of outlier detection methodologies
Outlier detection has been used for centuries to detect and, where appropriate, remove anomalous observations from data. Outliers arise from mechanical faults, changes in system behaviour, fraudulent behaviour, human error, instrument error, or simply natural deviations in populations. Their detection can identify system faults and fraud before they escalate with potentially catastrophic consequences. It can also identify errors and remove their contaminating effect on the data set, thus purifying the data for processing. The original outlier detection methods were arbitrary, but principled and systematic techniques are now used, drawn from the full gamut of computer science and statistics. In this paper, we present a survey of contemporary techniques for outlier detection, identify their respective motivations, and distinguish their advantages and disadvantages in a comparative review.
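As a concrete example of one classical statistical technique in the family such surveys cover (a minimal sketch, not code from the paper): flag any point lying more than a threshold number of standard deviations from the sample mean.

```python
import statistics

def zscore_outliers(data, threshold=3.0):
    """Flag points whose absolute z-score exceeds `threshold`;
    a classical statistical outlier test, one of many such methods."""
    mu = statistics.fmean(data)
    sigma = statistics.stdev(data)
    return [x for x in data if abs(x - mu) / sigma > threshold]

print(zscore_outliers([10, 12, 11, 9, 10, 11, 10, 9, 100], threshold=2.0))
```

The z-score test assumes roughly normal data and is itself distorted by the outliers it hunts (they inflate the mean and standard deviation), which is precisely the kind of limitation a comparative review weighs against more robust alternatives.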
Asymptotic normality of plug-in level set estimates
We establish the asymptotic normality of the measure of the symmetric
difference between the level set and a plug-in-type estimator of it formed by
replacing the density in the definition of the level set by a kernel density
estimator. Our proof will highlight the efficacy of Poissonization methods in
the treatment of large sample theory problems of this kind. Published in the Annals of Applied Probability (http://www.imstat.org/aap/) by the Institute of Mathematical Statistics; DOI: http://dx.doi.org/10.1214/08-AAP569.
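In standard notation (assumed here, not quoted from the paper), the object of study is the measure of the symmetric difference between a density level set and its kernel plug-in estimate:

```latex
% Level set of a density f at height c, and its plug-in estimate:
\mathcal{L}(c) = \{x : f(x) \ge c\}, \qquad
\hat{\mathcal{L}}_n(c) = \{x : \hat{f}_n(x) \ge c\},
\quad \text{where } \hat{f}_n(x) = \frac{1}{n h_n^d}
  \sum_{i=1}^{n} K\!\left(\frac{x - X_i}{h_n}\right).
% The abstract concerns asymptotic normality (after centering and scaling) of
\mu\bigl(\hat{\mathcal{L}}_n(c) \,\triangle\, \mathcal{L}(c)\bigr).
```

Here $K$ is a kernel, $h_n$ the bandwidth, and $\mu$ the reference measure on $\mathbb{R}^d$; the symbols are generic placeholders for the setup the abstract describes.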