1,055 research outputs found

    Graph edit distance from spectral seriation

    This paper is concerned with computing graph edit distance. One of the criticisms that can be leveled at existing methods for computing graph edit distance is that they lack some of the formality and rigor of the computation of string edit distance. Hence, our aim is to convert graphs to string sequences so that string matching techniques can be used. To do this, we use a graph spectral seriation method to convert the adjacency matrix into a string or sequence order. We show how the serial ordering can be established using the leading eigenvector of the graph adjacency matrix. We pose the problem of graph matching as a maximum a posteriori probability (MAP) alignment of the seriation sequences for pairs of graphs. This treatment leads to an expression in which the edit cost is the negative logarithm of the a posteriori sequence alignment probability. We compute the edit distance by finding the sequence of string edit operations which minimizes the cost of the path traversing the edit lattice. The edit costs are determined by the components of the leading eigenvectors of the adjacency matrix and by the edge densities of the graphs being matched. We demonstrate the utility of the edit distance on a number of graph clustering problems.
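The seriation idea in the abstract above can be sketched roughly as follows. This is a simplified illustration, not the paper's method: it orders nodes by their leading-eigenvector component, labels each node with its degree (an assumption made here for illustration), and uses unit-cost string edit distance instead of the MAP-derived edit costs. The function names are hypothetical.

```python
def leading_eigenvector(adj, iters=200):
    """Approximate the leading eigenvector of a symmetric 0/1 adjacency matrix."""
    n = len(adj)
    v = [1.0] * n
    for _ in range(iters):
        # Power iteration on (A + I): same eigenvectors as A, but the shift
        # avoids oscillation on bipartite graphs.
        w = [v[i] + sum(adj[i][j] * v[j] for j in range(n)) for i in range(n)]
        norm = sum(x * x for x in w) ** 0.5
        v = [x / norm for x in w]
    return v


def seriate(adj):
    """Return node degrees listed in order of decreasing eigenvector component."""
    v = leading_eigenvector(adj)
    # Ties are broken by node index so the ordering is deterministic.
    order = sorted(range(len(adj)), key=lambda i: (-round(v[i], 9), i))
    return [sum(adj[i]) for i in order]


def edit_distance(a, b):
    """Unit-cost Levenshtein distance between two sequences (dynamic programming)."""
    prev = list(range(len(b) + 1))
    for i, x in enumerate(a, 1):
        cur = [i]
        for j, y in enumerate(b, 1):
            cur.append(min(prev[j] + 1,              # delete x
                           cur[j - 1] + 1,           # insert y
                           prev[j - 1] + (x != y)))  # substitute or match
        prev = cur
    return prev[-1]


path = [[0, 1, 0], [1, 0, 1], [0, 1, 0]]
triangle = [[0, 1, 1], [1, 0, 1], [1, 1, 0]]
distance = edit_distance(seriate(path), seriate(triangle))
```

For the three-node path and the triangle, the sketch produces the degree strings [2, 1, 1] and [2, 2, 2], which differ by two substitutions.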

    A New Computational Framework for Efficient Parallelization and Optimization of Large Scale Graph Matching

    Many applications in data fusion, comparison, and recognition require a robust and efficient algorithm to match features across multiple images. To improve accuracy and obtain more stable results, it is important to take into account both the local appearance and the pairwise relationships of features. Graphs are a powerful and flexible data structure for describing complex relationships between data elements: nodes correspond to salient features, and edges correspond to relational aspects between features. The problem of graph matching is therefore to find a mapping between two sets of nodes that preserves the relationships between them as much as possible. This graph matching (GM) problem is formulated mathematically as an integer quadratic programming (IQP) problem, which is NP-hard to solve, and exact optima can be obtained only for very small data. Handling large-scale scientific visual data is therefore quite limited, necessitating both efficient serial algorithms and scalable parallel formulations. In this thesis, we first focused on techniques that reduce the computation cost and memory usage of pairwise graph matching by adopting a heuristic pruning strategy together with a redundancy pattern suppression scheme. We also modified the structure of the affinity matrix to minimize memory requirements, and we parallelized our algorithm using CPU- and GPU-accelerated libraries. Any pair of features with a similar distance in the first image results in the same sub-matrices; therefore, instead of constructing the whole affinity matrix, we built sub-blocks of the affinity matrix only for the distinct feature distances. This scheme not only saved a large amount of memory and greatly reduced computation time, but also allowed the matrix-vector multiplication in the gradient computation to be performed in parallel, with each block-vector product computed independently and without synchronization. Accelerated libraries such as MKL, cuSPARSE, cuBLAS, and Thrust were applied to solve the GM problem, following the scheme of the spectral matching algorithm. We also extended our work to multi-graph imaging, since many tasks require finding correspondences across multiple images, and considering more graphs improves the matching accuracy. Most algorithms obtain approximate solutions to the NP-hard GM problem, which results in weakly optimal solutions. We therefore proposed a new solver that iteratively modifies the affinity matrix and binarizes the solution by optimizing the original problem with its integer constraints.
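The spectral matching scheme that the abstract above builds on can be sketched in plain Python. This is a minimal serial illustration of the general technique (principal eigenvector of a pairwise affinity matrix, followed by greedy discretization), not the thesis's block-structured or GPU implementation. The Gaussian affinity, the `sigma` value, and the function name are assumptions made for this example.

```python
import math
from itertools import product


def spectral_match(points_a, points_b, sigma=0.2, iters=100):
    """Match 2D points by how well candidate pairs preserve pairwise distances."""
    # Every (i, k) pair is a candidate assignment: point i in A to point k in B.
    cands = list(product(range(len(points_a)), range(len(points_b))))
    m = len(cands)

    # Affinity between two candidate assignments: high when the distance
    # between the two A points equals the distance between the two B points.
    M = [[0.0] * m for _ in range(m)]
    for s, (i, k) in enumerate(cands):
        for t, (j, l) in enumerate(cands):
            if i != j and k != l:
                diff = (math.dist(points_a[i], points_a[j])
                        - math.dist(points_b[k], points_b[l]))
                M[s][t] = math.exp(-(diff ** 2) / sigma ** 2)

    # Power iteration for the principal eigenvector of M.
    x = [1.0] * m
    for _ in range(iters):
        y = [sum(M[s][t] * x[t] for t in range(m)) for s in range(m)]
        norm = math.sqrt(sum(v * v for v in y)) or 1.0
        x = [v / norm for v in y]

    # Greedy discretization: accept the strongest candidates that do not
    # conflict with an assignment already accepted (one-to-one constraint).
    matches, used_b = {}, set()
    for s in sorted(range(m), key=lambda s: -x[s]):
        i, k = cands[s]
        if i not in matches and k not in used_b:
            matches[i] = k
            used_b.add(k)
    return matches


# A scalene triangle and a translated, reordered copy of it.
shape_a = [(0, 0), (4, 0), (1, 2)]
shape_b = [(6, 7), (5, 5), (9, 5)]
result = spectral_match(shape_a, shape_b)
```

The dense `M` here grows with the square of the number of candidate assignments, which is the memory cost the abstract's sub-block construction is meant to avoid.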

    PharmacoNet: Accelerating Large-Scale Virtual Screening by Deep Pharmacophore Modeling

    As the size of accessible compound libraries expands to over 10 billion, the need for more efficient structure-based virtual screening methods is emerging. Different pre-screening methods have been developed for rapid screening, but there is still a lack of structure-based methods applicable to various proteins that perform protein-ligand binding conformation prediction and scoring in an extremely short time. Here, we describe for the first time a deep-learning framework for structure-based pharmacophore modeling to address this challenge. We frame pharmacophore modeling as an instance segmentation problem to determine each protein hotspot and the location of corresponding pharmacophores, and protein-ligand binding pose prediction as a graph-matching problem. PharmacoNet is significantly faster than state-of-the-art structure-based approaches, yet reasonably accurate with a simple scoring function. Furthermore, we show the promising result that PharmacoNet effectively retains hit candidates even under high pre-screening filtration rates. Overall, our study uncovers the hitherto untapped potential of a pharmacophore modeling approach in deep learning-based drug discovery. Comment: 21 pages, 5 figures

    Image Understanding by Hierarchical Symbolic Representation and Inexact Matching of Attributed Graphs

    We study the symbolic representation of imagery information by a powerful global representation scheme in the form of the Attributed Relational Graph (ARG), and propose new techniques for extracting such a representation from spatial-domain images and for performing image understanding through analysis of the extracted ARG representation. To achieve practical image understanding tasks, the system needs to comprehend the imagery information in a global form. Therefore, we propose a multi-layer hierarchical scheme for the extraction of a global symbolic representation from spatial-domain images. The proposed scheme produces a symbolic mapping of the input data in terms of an output alphabet, whose elements are defined over global subimages. The proposed scheme uses a combination of model-driven and data-driven concepts. The model-driven principle is represented by a graph transducer, which is used to specify the alphabet at each layer in the scheme. A symbolic mapping is driven by the input data to map the input local alphabet into the output global alphabet. Through the iterative application of the symbolic transformational mapping at different levels of the hierarchy, the system extracts a global representation from the image in the form of attributed relational graphs. Further processing and interpretation of the imagery information can then be performed on the ARG representation. We also propose an efficient approach for calculating a distance measure and finding the best inexact matching configuration between attributed relational graphs. For two ARGs, we define sequences of weighted error-transformations which, when performed on one ARG (or a subgraph of it), will produce the other ARG. A distance measure between two ARGs is defined as the weight of the sequence with minimum total weight. Moreover, this minimum-total-weight sequence defines the best inexact matching configuration between the two ARGs. The global minimization over the possible sequences is performed by a dynamic programming technique; the approach shows good results for ARGs of practical sizes. The proposed system is capable of inferring the alphabets of the ARG representation which it uses. In the inference phase, the hierarchical scheme is usually driven by the input data only, which normally consist of images of model objects. It extracts the global alphabet of the ARG representation of the models. The extracted model representation is then used in the operation phase of the system to perform the mapping in the multi-layer scheme. We present our experimental results for utilizing the proposed system for locating objects in complex scenes.

    Vascular Tree Structure: Fast Curvature Regularization and Validation

    This work addresses the challenging problem of accurate vessel structure analysis in high-resolution 3D biomedical images. Typical segmentation methods fail on recent micro-CT data sets resolving near-capillary vessels due to limitations of standard first-order regularization models. While regularization is needed to address noise and partial volume issues in the data, we argue that extraction of thin tubular structures requires higher-order curvature-based regularization. There are no standard segmentation methods regularizing surface curvature in 3D that could be applied to large 3D volumes. However, we observe that standard measures of vessel structure are more concerned with topology, bifurcation angles, and other parameters that can be directly addressed without segmentation. We propose a novel methodology for reconstructing the tree structure of the vessels using a new centerline curvature regularization technique. Our high-order regularization model is based on a recent curvature estimation method. We developed a Levenberg-Marquardt optimization scheme and an efficient GPU-based implementation of our algorithm. We also propose a validation mechanism based on synthetic vessel images. Our preliminary results on real ultra-resolution micro-CT volumes are promising.

    A Geometric Approach for Deciphering Protein Structure from Cryo-EM Volumes

    Electron Cryo-Microscopy, or cryo-EM, is an area that has received much attention in the recent past. Compared to the traditional methods of X-Ray Crystallography and NMR Spectroscopy, cryo-EM can be used to image much larger complexes, in many different conformations, and under a wide range of biochemical conditions. This is because it does not require the complex to be crystallisable. However, cryo-EM reconstructions are limited to intermediate resolutions, with the state of the art being 3.6 Å, where secondary structure elements can be visually identified but not individual amino acid residues. This lack of atomic-level resolution creates new computational challenges for protein structure identification. In this dissertation, we present a suite of geometric algorithms to address several aspects of protein modeling using cryo-EM density maps. Specifically, we develop novel methods to capture the shape of density volumes as geometric skeletons. We then use these skeletons to find the secondary structure elements (SSEs) of a given protein, to identify the correspondence between these SSEs and those predicted from the primary sequence, and to register high-resolution protein structures onto the density volume. In addition, we designed and developed Gorgon, an interactive molecular modeling system that integrates the above methods with other interactive routines to generate reliable and accurate protein backbone models.

    A Survey on Image Mining Techniques: Theory and Applications

    Image mining is a vital technique used to mine knowledge directly from images. Image segmentation is the primary phase in image mining. Image mining is simply an extension of data mining to the field of image processing. It deals with the extraction of hidden knowledge, image data associations, and additional patterns that are not explicitly stored in the images. It is an interdisciplinary field that integrates techniques from computer vision, image processing, data mining, machine learning, databases, and artificial intelligence. The most important function of the mining is to generate all significant patterns without prior information about the patterns. Rule mining has been adapted to large image databases. Mining has been done on integrated collections of images and their related data. Numerous studies have been carried out on image mining. This paper presents a survey of various image mining techniques that have been proposed in the literature. It also provides a brief overview of directions for future research and improvements. Keywords: Data Mining, Image Mining, Knowledge Discovery, Segmentation, Machine Learning, Artificial Intelligence, Rule Mining, Datasets

    Evaluating the usability and security of a video CAPTCHA

    A CAPTCHA is a variation of the Turing test, in which a challenge is used to distinguish humans from computers ('bots') on the internet. They are commonly used to prevent the abuse of online services. CAPTCHAs discriminate using hard artificial intelligence problems: the most common type requires a user to transcribe distorted characters displayed within a noisy image. Unfortunately, many users find them frustrating, and break rates as high as 60% have been reported (for Microsoft's Hotmail). We present a new CAPTCHA in which users provide three words ('tags') that describe a video. A challenge is passed if a user's tag belongs to a set of automatically generated ground-truth tags. In an experiment, we were able to increase human pass rates for our video CAPTCHAs from 69.7% to 90.2% (184 participants over 20 videos). Under the same conditions, the pass rate for an attack submitting the three most frequent tags (estimated over 86,368 videos) remained nearly constant (5% over the 20 videos, roughly 12.9% over a separate sample of 5,146 videos). Challenge videos were taken from YouTube.com. For each video, 90 tags were added from related videos to the ground-truth set; security was maintained by pruning all tags with a frequency 0.6%. Tag stemming and approximate matching were also used to increase human pass rates. Only 20.1% of participants preferred text-based CAPTCHAs, while 58.2% preferred our video-based alternative. Finally, we demonstrate how our technique for extending the ground-truth tags allows for different usability/security trade-offs, and discuss how it can be applied to other types of CAPTCHAs.
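The pass rule described in the abstract above (a challenge is passed if any submitted tag belongs to the ground-truth set, with approximate matching to help human users) might be sketched as follows. The similarity measure, the `threshold` value, and the function name are assumptions for illustration; the abstract does not specify the stemming or matching method actually used.

```python
from difflib import SequenceMatcher


def passes_challenge(user_tags, ground_truth, threshold=0.8):
    """Return True if any user tag approximately matches a ground-truth tag."""
    truth = {t.strip().lower() for t in ground_truth}
    for tag in user_tags:
        tag = tag.strip().lower()
        # SequenceMatcher.ratio() is 1.0 for identical strings; the threshold
        # (a hypothetical value) tolerates small differences such as plurals.
        if any(SequenceMatcher(None, tag, t).ratio() >= threshold for t in truth):
            return True
    return False


ground_truth_tags = {"dog", "frisbee"}
human_attempt = passes_challenge(["Dogs", "park", "sun"], ground_truth_tags)
unrelated_attempt = passes_challenge(["cat", "car", "sun"], ground_truth_tags)
```

Lowering the threshold or enlarging the ground-truth set makes the challenge easier for humans but also easier for a frequent-tag attack, which is the usability/security trade-off the abstract refers to.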