11 research outputs found

    Similarity measure models and algorithms for hierarchical cases

    Many business situations, such as events, products, and services, are often described in a hierarchical structure. When case-based reasoning (CBR) techniques are used to support business decision-making, a hierarchical-CBR technique is required that can effectively compare and measure the similarity between two hierarchical cases. This study first defines hierarchical case trees (HC-trees) and discusses their features. It then develops a similarity evaluation model that takes into account all the information on nodes' structures, concepts, weights, and values in order to comprehensively compare two hierarchical case trees. A similarity measure algorithm for HC-trees is proposed, which includes a node concept correspondence degree computation algorithm and a maximum correspondence tree mapping construction algorithm. We provide two illustrative examples to demonstrate the effectiveness of the proposed hierarchical case similarity evaluation model and algorithms, and discuss possible applications in CBR systems.
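    For intuition, the sketch below shows one simple way such a tree similarity could be computed: nodes carry a concept, a weight, and a value; children are matched by concept; and local and structural similarity are blended recursively. The node fields, the matching rule, and the 50/50 blend are illustrative assumptions, not the paper's actual model.

```python
# Illustrative sketch of comparing two hierarchical case trees (HC-trees).
# This is NOT the paper's exact model; the node fields (concept, weight,
# value) and the recursive blending rule are simplifying assumptions.

from dataclasses import dataclass, field

@dataclass
class CaseNode:
    concept: str                      # node concept label
    weight: float = 1.0               # importance weight of this node
    value: float = 0.0                # numeric attribute value
    children: list = field(default_factory=list)

def value_sim(a: CaseNode, b: CaseNode) -> float:
    """Similarity of two node values, normalized to [0, 1]."""
    denom = max(abs(a.value), abs(b.value), 1e-9)
    return 1.0 - abs(a.value - b.value) / denom

def tree_sim(a: CaseNode, b: CaseNode) -> float:
    """Recursive similarity: local value similarity at corresponding
    concepts, blended with a weighted average over children that are
    matched by concept (duplicate concepts are not handled here)."""
    if a.concept != b.concept:
        return 0.0                    # no concept correspondence
    local = value_sim(a, b)
    b_kids = {c.concept: c for c in b.children}
    matched = [(c, b_kids[c.concept]) for c in a.children
               if c.concept in b_kids]
    if not matched:
        return local
    total_w = sum(ca.weight for ca, _ in matched)
    child = sum(ca.weight * tree_sim(ca, cb) for ca, cb in matched) / total_w
    return 0.5 * (local + child)      # blend local and structural similarity

# Example: two product cases described hierarchically.
p1 = CaseNode("product", children=[CaseNode("price", 2.0, 100.0),
                                   CaseNode("rating", 1.0, 4.5)])
p2 = CaseNode("product", children=[CaseNode("price", 2.0, 120.0),
                                   CaseNode("rating", 1.0, 4.0)])
print(f"similarity = {tree_sim(p1, p2):.3f}")
```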

    On the Approximation of Largest Common Subtrees and Largest Common Point Sets

    This paper considers the approximability of the largest common subtree and the largest common subgraph problems, which have applications in molecular biology. It is shown that approximating the problems within a factor of n^ε is NP-complete, while a general search algorithm which approximates both problems within a factor of O(n / log n) is presented. Moreover, several variants of the largest common subtree problem are studied.

    1 Introduction

    In computational biology and chemistry, there is a frequent need to extract a common pattern from multiple data. A good example is the automatic extraction of a common pattern from multiple amino acid sequences, which has been studied extensively from both theoretical and practical viewpoints. We consider two such applications. One involves finding a common pattern from multiple molecules. Several methods have been developed for finding the largest common connected subgraph of multiple graphs, and have been applied to study the relation…
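    The hardness results above concern the general problem; to make the object of study concrete, the sketch below solves one easy special case exactly: the largest common subtree of two rooted, ordered, labeled trees, computed with a weighted-LCS dynamic program over children sequences. The tree representation and the restriction to the ordered, labeled case are assumptions for illustration only.

```python
# Largest common subtree of two rooted, ordered, labeled trees: matched
# nodes must share a label and child order is preserved. This easy variant
# only illustrates the flavor of the (much harder) problems in the paper.

from functools import lru_cache

class T:
    def __init__(self, label, *children):
        self.label, self.children = label, children

def largest_common_subtree(r1: T, r2: T) -> int:
    """Size of the largest common subtree rooted anywhere in r1 and r2."""

    @lru_cache(maxsize=None)
    def rooted(u: T, v: T) -> int:
        # Score when u is matched to v; 0 if the labels differ.
        if u.label != v.label:
            return 0
        # Weighted LCS over the two children sequences: pairing child i
        # of u with child j of v contributes rooted(a[i], b[j]).
        a, b = u.children, v.children
        dp = [[0] * (len(b) + 1) for _ in range(len(a) + 1)]
        for i in range(1, len(a) + 1):
            for j in range(1, len(b) + 1):
                dp[i][j] = max(dp[i - 1][j], dp[i][j - 1],
                               dp[i - 1][j - 1] + rooted(a[i - 1], b[j - 1]))
        return 1 + dp[len(a)][len(b)]

    def all_nodes(t: T):
        yield t
        for c in t.children:
            yield from all_nodes(c)

    return max(rooted(u, v) for u in all_nodes(r1) for v in all_nodes(r2))

t1 = T("a", T("b"), T("c", T("d")))
t2 = T("a", T("c", T("d")), T("b"))
print(largest_common_subtree(t1, t2))  # 3: matches a, c, d (order preserved)
```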

    Fine-grained Complexity Meets IP = PSPACE

    In this paper we study the fine-grained complexity of finding exact and approximate solutions to problems in P. Our main contribution is showing reductions from exact to approximate solution for a host of such problems. As one (notable) example, we show that the Closest-LCS-Pair problem (given two sets of strings A and B, compute exactly the maximum LCS(a, b) with (a, b) ∈ A × B) is equivalent to its approximation version (under near-linear time reductions, and with a constant approximation factor). More generally, we identify a class of problems, which we call BP-Pair-Class, comprising both exact and approximate solutions, and show that they are all equivalent under near-linear time reductions. Exploring this class and its properties, we also show:

    • Under the NC-SETH assumption (a significantly more relaxed assumption than SETH), solving any of the problems in this class requires essentially quadratic time.
    • Modest improvements on the running time of known algorithms (shaving log factors) would imply that NEXP is not in non-uniform NC^1.
    • Finally, we leverage our techniques to show new barriers for deterministic approximation algorithms for LCS.

    At the heart of these new results is a deep connection between interactive proof systems for bounded-space computations and the fine-grained complexity of exact and approximate solutions to problems in P. In particular, our results build on the proof techniques from the classical IP = PSPACE result.
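    To make Closest-LCS-Pair concrete, here is the naive exact algorithm: run the textbook quadratic LCS dynamic program on every pair in A × B. Its essentially quadratic running time is exactly what the NC-SETH-based lower bound above says cannot be substantially improved. This is a plain illustration, not a construction from the paper.

```python
# Naive exact algorithm for Closest-LCS-Pair: run the classic quadratic
# LCS dynamic program on every pair in A x B. For total input size n this
# takes essentially quadratic time, matching the conditional lower bound.

def lcs_length(s: str, t: str) -> int:
    """Length of the longest common subsequence of s and t (O(|s||t|) DP)."""
    prev = [0] * (len(t) + 1)
    for ch in s:
        cur = [0] * (len(t) + 1)
        for j in range(1, len(t) + 1):
            if ch == t[j - 1]:
                cur[j] = prev[j - 1] + 1          # extend a common subsequence
            else:
                cur[j] = max(prev[j], cur[j - 1])  # drop a char from s or t
        prev = cur
    return prev[-1]

def closest_lcs_pair(A, B):
    """The pair (a, b) in A x B with maximum LCS(a, b), plus that length."""
    return max(((lcs_length(a, b), a, b) for a in A for b in B),
               key=lambda triple: triple[0])

length, a, b = closest_lcs_pair(["abcde", "axcye"], ["abfde", "xyz"])
print(length, a, b)  # 4 abcde abfde  (a longest common subsequence is "abde")
```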

    Efficient similarity computations on parallel machines using data shaping

    Similarity computation is a fundamental operation in all forms of data. Big Data is typically characterized by attributes such as volume, velocity, variety, and veracity. Big Data variety generally appears in structured, semi-structured, or unstructured forms, and the volume of Big Data in general, and of semi-structured data in particular, is increasing at a phenomenal rate. The Big Data phenomenon poses a new set of challenges to the similarity computation problems that occur in semi-structured data. Technology and processor architecture trends strongly suggest that future processors will have tens of thousands of cores (hardware threads). Another crucial trend is that the ratio of on-chip and off-chip memory to core count is decreasing. State-of-the-art parallel computing platforms such as General Purpose Graphics Processors (GPUs) and MICs are promising for high-performance as well as high-throughput computing. However, processing the semi-structured component of Big Data efficiently using parallel computing systems (e.g. GPUs) is challenging, because most of the emerging platforms are organized as Single Instruction Multiple Thread/Data machines: they are highly structured, with several cores (streaming processors) operating in lock-step, or they require a high degree of task-level parallelism.

    We argue that effective and efficient solutions to key similarity computation problems need to operate in synergy with the underlying computing hardware. Moreover, semi-structured input data needs to be shaped, or reorganized, with the goal of exploiting the enormous computing power of state-of-the-art highly threaded architectures such as GPUs. For example, shaping input data (via encoding) to minimize data dependence can facilitate flexible and concurrent computation on high-throughput accelerators/co-processors such as GPUs and MICs.

    We consider various instances of traditional and futuristic problems occurring at the intersection of semi-structured data and data analytics. Preprocessing is an operation common to the initial stages of data processing pipelines, typically involving operations such as data extraction and data selection. In the context of semi-structured data, twig filtering is used to identify (and extract) data of interest. Duplicate detection and record linkage operations are useful in preprocessing tasks such as data cleaning and data fusion, and also in data mining, in order to find similar tree objects. Likewise, tree edit distance is a fundamental metric for tree problems, and similarity computation between trees is another key problem in the context of Big Data.

    This dissertation makes a case for platform-centric data shaping as a potent mechanism to tackle the data- and architecture-borne issues arising in semi-structured data processing on GPUs and GPU-like parallel architectures. We propose several data shaping techniques for tree matching problems occurring in semi-structured data and experiment with real-world datasets. The experimental results reveal that the proposed platform-centric data shaping approach is effective for computing similarities between tree objects using GPGPUs, with performance gains of up to three orders of magnitude, depending on the problem and platform.
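    One common flavor of such data shaping is to serialize each pointer-based tree into contiguous, fixed-layout arrays (here, preorder labels plus subtree sizes) so that GPU threads can traverse tree pairs by array indexing rather than pointer chasing. The host-side sketch below is a simplified illustration of that idea, not the dissertation's actual encoding scheme.

```python
# Illustrative host-side "data shaping": flatten pointer-based trees into
# contiguous arrays (preorder labels + subtree sizes) so a GPU kernel can
# traverse them with array indexing instead of pointer chasing. The layout
# is a simplified stand-in for the dissertation's encodings.

import numpy as np

def flatten(tree):
    """tree = (label, [children]); returns parallel arrays:
    labels[i] : label of the i-th node in preorder
    sizes[i]  : size of the subtree rooted at preorder index i
    (node i's subtree occupies the contiguous slice [i, i + sizes[i]))."""
    labels, sizes = [], []

    def walk(node):
        label, children = node
        idx = len(labels)
        labels.append(label)
        sizes.append(0)                 # placeholder, patched below
        count = 1
        for c in children:
            count += walk(c)
        sizes[idx] = count
        return count

    walk(tree)
    return np.asarray(labels), np.asarray(sizes, dtype=np.int32)

# Example: shape a small XML-like tree for downstream parallel matching.
doc = ("book", [("title", []),
                ("chapter", [("section", []), ("section", [])])])
labels, sizes = flatten(doc)
print(labels)   # ['book' 'title' 'chapter' 'section' 'section']
print(sizes)    # [5 1 3 1 1]
```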