
    Top-Down Skiplists

    We describe todolists (top-down skiplists), a variant of skiplists (Pugh 1990) that can execute searches using at most $\log_{2-\varepsilon} n + O(1)$ binary comparisons per search and that have amortized update time $O(\varepsilon^{-1}\log n)$. A variant of todolists, called working-todolists, can execute a search for any element $x$ using $\log_{2-\varepsilon} w(x) + o(\log w(x))$ binary comparisons and has amortized search time $O(\varepsilon^{-1}\log w(x))$. Here, $w(x)$ is the "working-set number" of $x$. No previous data structure is known to achieve a bound better than $4\log_2 w(x)$ comparisons. We show through experiments that, if implemented carefully, todolists are comparable to other common dictionary implementations in terms of insertion times and outperform them in terms of search times. Comment: 18 pages, 5 figures.
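    The abstract omits the top-down restructuring itself, but the baseline it refines is the classic skiplist search: descend level by level, moving right while the next key is still smaller than the target. Below is a minimal Pugh-style sketch in Python; all class and method names are illustrative, and todolists additionally bound the comparisons per search, which this sketch does not.

        import random

        class Node:
            def __init__(self, key, height):
                self.key = key
                self.next = [None] * height  # next[i] = successor at level i

        class SkipList:
            """Plain Pugh-style skiplist; todolists restructure this kind of
            hierarchy top-down to bound comparisons, which is omitted here."""
            MAX_HEIGHT = 32

            def __init__(self):
                self.head = Node(None, self.MAX_HEIGHT)
                self.height = 1

            def search(self, key):
                node = self.head
                # Descend from the top level to the bottom one, moving right
                # while the next key is still smaller than the target.
                for level in range(self.height - 1, -1, -1):
                    while node.next[level] is not None and node.next[level].key < key:
                        node = node.next[level]
                candidate = node.next[0]
                return candidate is not None and candidate.key == key

            def insert(self, key):
                # Choose a geometric height, then splice the new node into
                # every level it participates in while descending.
                height = 1
                while height < self.MAX_HEIGHT and random.random() < 0.5:
                    height += 1
                self.height = max(self.height, height)
                new = Node(key, height)
                node = self.head
                for level in range(self.height - 1, -1, -1):
                    while node.next[level] is not None and node.next[level].key < key:
                        node = node.next[level]
                    if level < height:
                        new.next[level] = node.next[level]
                        node.next[level] = new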

    The Fresh-Finger Property

    The unified property roughly states that searching for an element is fast when the current access is close to a recent access. Here, "close" refers to rank distance measured among all elements stored by the dictionary. We show that distance need not be measured this way: in fact, it is only necessary to consider a small working-set of elements to measure this rank distance. The result is a data structure whose access time improves upon that offered by the unified property for many query sequences.
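    As a concrete illustration of the working-set quantities involved, the sketch below computes a working-set number for each access in a sequence: the number of distinct elements touched since the previous access to the same element. Conventions differ on whether the element itself is counted and on how first accesses are handled; this version is one plausible choice, not the paper's definition.

        def working_set_numbers(accesses):
            """For each access, report a working-set number: the count of
            distinct elements accessed since the previous access to the same
            element (on a first access, the number of distinct elements seen
            so far)."""
            last_seen = {}  # element -> index of its previous access
            out = []
            for i, x in enumerate(accesses):
                if x in last_seen:
                    # Distinct elements touched strictly between the two
                    # accesses to x, plus x itself.
                    window = set(accesses[last_seen[x] + 1 : i])
                    out.append(len(window) + 1)
                else:
                    out.append(len(set(accesses[: i + 1])))
                last_seen[x] = i
            return out

        print(working_set_numbers(list("abcba")))  # [1, 2, 3, 2, 3]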

    A Static Optimality Transformation with Applications to Planar Point Location

    Over the last decade, there have been several data structures that, given a planar subdivision and a probability distribution over the plane, provide a way to answer point location queries that is fine-tuned for the distribution. All these methods suffer from the requirement that the query distribution must be known in advance. We present a new data structure for point location queries in planar triangulations. Our structure is asymptotically as fast as the optimal structures, but it requires no prior information about the queries. This is a 2D analogue of the jump from Knuth's optimum binary search trees (discovered in 1971) to the splay trees of Sleator and Tarjan in 1985. While the former must know the query distribution in advance, the latter are statically optimal. This means that we can adapt to the query sequence and achieve the same asymptotic performance as an optimum static structure, without needing any additional information. Comment: 13 pages, 1 figure; a preliminary version appeared at SoCG 2011.
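    A useful way to quantify "optimal for a distribution" is the entropy lower bound: an optimum static structure answers queries drawn from distribution $p$ in expected time proportional to the Shannon entropy $H(p)$, and a statically optimal structure matches this without knowing $p$. A small sketch, with the distribution and region count made up for illustration:

        import math

        def entropy_bound(probs):
            """Shannon entropy of a query distribution; static optimality
            means matching expected cost O(H + 1) per query without knowing
            `probs` in advance."""
            return sum(p * math.log2(1.0 / p) for p in probs if p > 0)

        # A skewed distribution over 4 regions: an optimal static structure
        # answers in about H = 1.75 comparisons on average, versus
        # log2(4) = 2 for a balanced one that ignores the distribution.
        print(entropy_bound([0.5, 0.25, 0.125, 0.125]))  # 1.75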

    Fast Supervised Hashing with Decision Trees for High-Dimensional Data

    Supervised hashing aims to map the original features to compact binary codes that preserve label-based similarity in the Hamming space. Non-linear hash functions have demonstrated an advantage over linear ones due to their powerful generalization capability. In the literature, kernel functions are typically used to achieve non-linearity in hashing; they deliver encouraging retrieval performance at the price of slow evaluation and training. Here we propose to use boosted decision trees to achieve non-linearity in hashing; they are fast to train and evaluate, and hence better suited to hashing high-dimensional data. In our approach, we first propose submodular formulations of the binary code inference problem and an efficient GraphCut-based block search method for solving large-scale inference. Then we learn hash functions by training boosted decision trees to fit the binary codes. Experiments demonstrate that our proposed method significantly outperforms most state-of-the-art methods in retrieval precision and training time. Especially for high-dimensional data, our method is orders of magnitude faster than many methods in terms of training time. Comment: Appearing in Proc. IEEE Conf. Computer Vision and Pattern Recognition, 2014, Ohio, USA.
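    A rough sketch of the two-step pipeline described above, using scikit-learn's gradient-boosted trees as a stand-in for the paper's boosted decision trees. The binary-code inference step (the block GraphCut search) is assumed to have been run already and its output passed in as `codes`; the function names and the brute-force Hamming search are illustrative, not the paper's implementation.

        import numpy as np
        from sklearn.ensemble import GradientBoostingClassifier

        def fit_hash_functions(X, codes, n_trees=50):
            """Fit one boosted-tree classifier per bit of the target binary
            codes (codes: n_samples x n_bits, entries in {0, 1})."""
            return [GradientBoostingClassifier(n_estimators=n_trees).fit(X, codes[:, b])
                    for b in range(codes.shape[1])]

        def hash_points(models, X):
            # Each learned model contributes one bit of the binary code.
            return np.stack([m.predict(X) for m in models], axis=1).astype(np.uint8)

        def hamming_search(query_code, db_codes, k=5):
            # Brute-force k-nearest neighbours in Hamming space.
            dists = (db_codes != query_code).sum(axis=1)
            return np.argsort(dists)[:k]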

    Cost-Sensitive Classification: Empirical Evaluation of a Hybrid Genetic Decision Tree Induction Algorithm

    This paper introduces ICET, a new algorithm for cost-sensitive classification. ICET uses a genetic algorithm to evolve a population of biases for a decision tree induction algorithm. The fitness function of the genetic algorithm is the average cost of classification when using the decision tree, including both the costs of tests (features, measurements) and the costs of classification errors. ICET is compared here with three other algorithms for cost-sensitive classification (EG2, CS-ID3, and IDX) and also with C4.5, which classifies without regard to cost. The five algorithms are evaluated empirically on five real-world medical datasets. Three sets of experiments are performed. The first set examines the baseline performance of the five algorithms on the five datasets and establishes that ICET performs significantly better than its competitors. The second set tests the robustness of ICET under a variety of conditions and shows that ICET maintains its advantage. The third set looks at ICET's search in bias space and discovers a way to improve the search. Comment: See http://www.jair.org/ for any accompanying files.
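    The fitness function is what makes ICET cost-sensitive, so a sketch of how such an average cost could be computed may help. The interface here (`tree_classify` returning both a prediction and the set of tests consulted) is hypothetical, not ICET's actual implementation.

        def average_cost(tree_classify, examples, test_costs, error_costs):
            """Cost-sensitive fitness in the spirit of ICET: for each example,
            charge the cost of every test (feature) the decision tree consults,
            plus the cost of any classification error, then average.
            `tree_classify(features)` is assumed to return
            (prediction, tests_used); all names here are illustrative."""
            total = 0.0
            for features, label in examples:
                prediction, tests_used = tree_classify(features)
                total += sum(test_costs[t] for t in tests_used)
                if prediction != label:
                    total += error_costs[(prediction, label)]
            return total / len(examples)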