224 research outputs found

    Graph edit distance from spectral seriation

    Get PDF
    This paper is concerned with computing graph edit distance. One of the criticisms that can be leveled at existing methods for computing graph edit distance is that they lack some of the formality and rigor of the computation of string edit distance. Hence, our aim is to convert graphs to string sequences so that string matching techniques can be used. To do this, we use a graph spectral seriation method to convert the adjacency matrix into a string or sequence order. We show how the serial ordering can be established using the leading eigenvector of the graph adjacency matrix. We pose the problem of graph-matching as a maximum a posteriori probability (MAP) alignment of the seriation sequences for pairs of graphs. This treatment leads to an expression in which the edit cost is the negative logarithm of the a posteriori sequence alignment probability. We compute the edit distance by finding the sequence of string edit operations which minimizes the cost of the path traversing the edit lattice. The edit costs are determined by the components of the leading eigenvectors of the adjacency matrix and by the edge densities of the graphs being matched. We demonstrate the utility of the edit distance on a number of graph clustering problems

    Distributed Algorithm for Parallel Edit Distance Computation

    Get PDF
    The edit distance is the measure that quantifies the difference between two strings. It is an important concept because it has its usage in many domains such as natural language processing, spell checking, genome matching, and pattern recognition. Edit distance is also known as Levenshtein distance. Sequentially, the edit distance is computed by using dynamic programming based strategy that may not provide results in reasonable time when input strings are large. In this work, a distributed algorithm is presented for parallel edit distance computation. The proposed algorithm is both time and space efficient. It is evaluated on a hybrid setup of distributed and shared memory systems. Results suggest that the proposed algorithm achieves significant performance gain over the existing parallel approach

    Accelerating pairwise sequence alignment on GPUs using the Wavefront Algorithm

    Get PDF
    Advances in genomics and sequencing technologies demand faster and more scalable analysis methods that can process longer sequences with higher accuracy. However, classical pairwise alignment methods, based on dynamic programming (DP), impose impractical computational requirements to align long and noisy sequences like those produced by PacBio, and Nanopore technologies. The recently proposed Wavefront Alignment (WFA) algorithm paves the way for more efficient alignment tools, improving time and memory complexity over previous methods. Notwithstanding the advantages of the WFA algorithm, modern high performance computing (HPC) platforms rely on accelerator-based architectures that exploit parallel computing resources to improve over classical computing CPUs. Hence, a GPU-enabled implementation of the WFA could exploit the hardware resources of modern GPUs and further accelerate sequence alignment in current genome analysis pipelines. This thesis presents two GPU-accelerated implementations based on the WFA for fast pairwise DNA sequence alignment: eWFA-GPU and WFA-GPU. Our first proposal, eWFA-GPU, computes the exact edit-distance alignment between two short sequences (up to a few thousand bases), taking full advantage of the massive parallel capabilities of modern GPUs. We propose a succinct representation of the alignment data that successfully reduces the overall amount of memory required, allowing the exploitation of the fast on-chip memory of a GPU. Our results show that eWFA-GPU outperforms by 3-9X the edit-distance WFA implementation running on a 20 core machine. Compared to other state-of-the-art tools computing the edit-distance, eWFA-GPU is up to 265X faster than CPU tools and up to 56 times faster than other GPU-enabled implementations. Our second contribution, the WFA-GPU tool, extends the work of eWFA-GPU to compute the exact gap-affine distance (i.e., a more general alignment problem) between arbitrary long sequences. In this work, we propose a CPU-GPU co-design capable of performing inter and intra-sequence parallel alignment of multiple sequences, combining a succinct WFA-data representation with an efficient GPU implementation. As a result, we demonstrate that our implementation outperforms the original WFA implementation between 1.5-7.7X times when computing the alignment path, and between 2.6-16X when computing only the alignment score. Moreover, compared to other state-of-the-art tools, the WFA-GPU is up to 26.7X faster than other GPU implementations and up to four orders of magnitude faster than other CPU implementations

    String editing on an SIMD hypercube multicomputer☆

    Get PDF

    An Algorithm for the Longest Common Subsequence and Substring Problem

    Full text link
    In this note, we first introduce a new problem called the longest common subsequence and substring problem. Let XX and YY be two strings over an alphabet Σ\Sigma. The longest common subsequence and substring problem for XX and YY is to find the longest string which is a subsequence of XX and a substring of YY. We propose an algorithm to solve the problem

    Analytical tools for optimizing the error correction performance of arithmetic codes

    No full text
    International audienceIn joint source-channel arithmetic coding (JSCAC) schemes, additional redundancy may be introduced into an arithmetic source code in order to be more robust against transmission errors. The purpose of this work is to provide analytical tools to predict and evaluate the effectiveness of that redundancy. Integer binary Arithmetic Coding (AC) is modeled by a reduced-state automaton in order to obtain a bit-clock trellis describing the encoding process. Considering AC as a trellis code, distance spectra are then derived. In particular, an algorithm to compute the free distance of an arithmetic code is proposed. The obtained code properties allow to compute upper bounds on both bit error and symbol error probabilities and thus provide an objective criterion to analyze the behavior of JSCAC schemes when used on noisy channels. This criterion is then exploited to design efficient error-correcting arithmetic codes. Simulation results highlight the validity of the theoretical error bounds and show that for equivalent rate and complexity, a simple optimization yields JSCACs that outperform classical tandem schemes at low to medium SNR

    One-class classifiers based on entropic spanning graphs

    Get PDF
    One-class classifiers offer valuable tools to assess the presence of outliers in data. In this paper, we propose a design methodology for one-class classifiers based on entropic spanning graphs. Our approach takes into account the possibility to process also non-numeric data by means of an embedding procedure. The spanning graph is learned on the embedded input data and the outcoming partition of vertices defines the classifier. The final partition is derived by exploiting a criterion based on mutual information minimization. Here, we compute the mutual information by using a convenient formulation provided in terms of the α\alpha-Jensen difference. Once training is completed, in order to associate a confidence level with the classifier decision, a graph-based fuzzy model is constructed. The fuzzification process is based only on topological information of the vertices of the entropic spanning graph. As such, the proposed one-class classifier is suitable also for data characterized by complex geometric structures. We provide experiments on well-known benchmarks containing both feature vectors and labeled graphs. In addition, we apply the method to the protein solubility recognition problem by considering several representations for the input samples. Experimental results demonstrate the effectiveness and versatility of the proposed method with respect to other state-of-the-art approaches.Comment: Extended and revised version of the paper "One-Class Classification Through Mutual Information Minimization" presented at the 2016 IEEE IJCNN, Vancouver, Canad
    • …
    corecore