252,536 research outputs found

    Indexing Metric Spaces for Exact Similarity Search

    Full text link
    With the continued digitalization of societal processes, we are seeing an explosion in available data. This is referred to as big data. In a research setting, three aspects of the data are often viewed as the main sources of challenges when attempting to enable value creation from big data: volume, velocity and variety. Many studies address volume or velocity, while much fewer studies concern the variety. Metric space is ideal for addressing variety because it can accommodate any type of data as long as its associated distance notion satisfies the triangle inequality. To accelerate search in metric space, a collection of indexing techniques for metric data have been proposed. However, existing surveys each offers only a narrow coverage, and no comprehensive empirical study of those techniques exists. We offer a survey of all the existing metric indexes that can support exact similarity search, by i) summarizing all the existing partitioning, pruning and validation techniques used for metric indexes, ii) providing the time and storage complexity analysis on the index construction, and iii) report on a comprehensive empirical comparison of their similarity query processing performance. Here, empirical comparisons are used to evaluate the index performance during search as it is hard to see the complexity analysis differences on the similarity query processing and the query performance depends on the pruning and validation abilities related to the data distribution. This article aims at revealing different strengths and weaknesses of different indexing techniques in order to offer guidance on selecting an appropriate indexing technique for a given setting, and directing the future research for metric indexes

    Managing Unbounded-Length Keys in Comparison-Driven Data Structures with Applications to On-Line Indexing

    Full text link
    This paper presents a general technique for optimally transforming any dynamic data structure that operates on atomic and indivisible keys by constant-time comparisons, into a data structure that handles unbounded-length keys whose comparison cost is not a constant. Examples of these keys are strings, multi-dimensional points, multiple-precision numbers, multi-key data (e.g.~records), XML paths, URL addresses, etc. The technique is more general than what has been done in previous work as no particular exploitation of the underlying structure of is required. The only requirement is that the insertion of a key must identify its predecessor or its successor. Using the proposed technique, online suffix tree can be constructed in worst case time O(logn)O(\log n) per input symbol (as opposed to amortized O(logn)O(\log n) time per symbol, achieved by previously known algorithms). To our knowledge, our algorithm is the first that achieves O(logn)O(\log n) worst case time per input symbol. Searching for a pattern of length mm in the resulting suffix tree takes O(min(mlogΣ,m+logn)+tocc)O(\min(m\log |\Sigma|, m + \log n) + tocc) time, where tocctocc is the number of occurrences of the pattern. The paper also describes more applications and show how to obtain alternative methods for dealing with suffix sorting, dynamic lowest common ancestors and order maintenance

    Dynamic load balancing in parallel KD-tree k-means

    Get PDF
    One among the most influential and popular data mining methods is the k-Means algorithm for cluster analysis. Techniques for improving the efficiency of k-Means have been largely explored in two main directions. The amount of computation can be significantly reduced by adopting geometrical constraints and an efficient data structure, notably a multidimensional binary search tree (KD-Tree). These techniques allow to reduce the number of distance computations the algorithm performs at each iteration. A second direction is parallel processing, where data and computation loads are distributed over many processing nodes. However, little work has been done to provide a parallel formulation of the efficient sequential techniques based on KD-Trees. Such approaches are expected to have an irregular distribution of computation load and can suffer from load imbalance. This issue has so far limited the adoption of these efficient k-Means variants in parallel computing environments. In this work, we provide a parallel formulation of the KD-Tree based k-Means algorithm for distributed memory systems and address its load balancing issue. Three solutions have been developed and tested. Two approaches are based on a static partitioning of the data set and a third solution incorporates a dynamic load balancing policy

    Model-based dependability analysis : state-of-the-art, challenges and future outlook

    Get PDF
    Abstract: Over the past two decades, the study of model-based dependability analysis has gathered significant research interest. Different approaches have been developed to automate and address various limitations of classical dependability techniques to contend with the increasing complexity and challenges of modern safety-critical system. Two leading paradigms have emerged, one which constructs predictive system failure models from component failure models compositionally using the topology of the system. The other utilizes design models - typically state automata - to explore system behaviour through fault injection. This paper reviews a number of prominent techniques under these two paradigms, and provides an insight into their working mechanism, applicability, strengths and challenges, as well as recent developments within these fields. We also discuss the emerging trends on integrated approaches and advanced analysis capabilities. Lastly, we outline the future outlook for model-based dependability analysis

    Space Saving by Dynamic Algebraization

    Full text link
    Dynamic programming is widely used for exact computations based on tree decompositions of graphs. However, the space complexity is usually exponential in the treewidth. We study the problem of designing efficient dynamic programming algorithm based on tree decompositions in polynomial space. We show how to construct a tree decomposition and extend the algebraic techniques of Lokshtanov and Nederlof such that the dynamic programming algorithm runs in time O(2h)O^*(2^h), where hh is the maximum number of vertices in the union of bags on the root to leaf paths on a given tree decomposition, which is a parameter closely related to the tree-depth of a graph. We apply our algorithm to the problem of counting perfect matchings on grids and show that it outperforms other polynomial-space solutions. We also apply the algorithm to other set covering and partitioning problems.Comment: 14 pages, 1 figur
    corecore