973 research outputs found

    Structural Pattern Recognition for Chemical-Compound Virtual Screening

    Molecules are naturally structured as networks, which makes them well suited to study through their graph representations, where nodes represent atoms and edges represent chemical bonds. An alternative to this straightforward representation is the extended reduced graph, which summarizes chemical structures using pharmacophore-type node descriptions to encode the relevant molecular properties. Once we have a suitable way to represent molecules as graphs, we need to choose the right tool to compare and analyze them. Graph edit distance is used to solve error-tolerant graph matching; this methodology estimates a distance between two graphs by determining the minimum number of modifications required to transform one graph into the other. These modifications (known as edit operations) have an associated edit cost (also known as transformation cost), which must be determined depending on the problem. This study investigates the effectiveness of a purely graph-driven molecular comparison that employs extended reduced graphs and graph edit distance as a tool for ligand-based virtual screening applications. Those applications estimate the bioactivity of a chemical from the bioactivity of similar compounds. An essential part of this study focuses on using machine learning and natural language processing techniques to optimize the transformation costs used in the molecular comparisons with the graph edit distance. Overall, this work presents a framework that combines graph reduction and comparison with optimization tools and natural language processing to identify bioactivity similarities in a structurally diverse group of molecules. We confirm the efficiency of this framework with several chemoinformatic tests applied to regression and classification problems over different publicly available datasets.
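
The role of edit costs in the graph edit distance can be illustrated with a minimal sketch. It is not the method of the study above: it is a brute-force search over node mappings that is only practical for very small graphs, and the function name `graph_edit_distance`, the cost parameters, and the toy heavy-atom graphs are all assumptions made for illustration.

```python
from itertools import permutations


def graph_edit_distance(nodes1, edges1, nodes2, edges2,
                        node_sub=1.0, node_indel=1.0, edge_indel=1.0):
    """Exact graph edit distance by brute force (tiny graphs only).

    nodes1, nodes2: dict mapping node id -> label (e.g. an atom symbol).
    edges1, edges2: iterables of 2-tuples; edges are undirected and
    unlabeled, and self-loops are assumed not to occur.
    The three cost parameters are hypothetical, uniform edit costs.
    """
    e1 = {frozenset(e) for e in edges1}
    e2 = {frozenset(e) for e in edges2}
    ids1 = list(nodes1)
    # Pad the targets with None so that any node of graph 1 may be deleted.
    targets = list(nodes2) + [None] * len(ids1)
    best = float("inf")
    for perm in set(permutations(targets, len(ids1))):
        mapping = dict(zip(ids1, perm))
        cost = 0.0
        # Node deletions and substitutions.
        for u, v in mapping.items():
            if v is None:
                cost += node_indel
            elif nodes1[u] != nodes2[v]:
                cost += node_sub
        # Nodes of graph 2 that nothing maps to must be inserted.
        used = {v for v in mapping.values() if v is not None}
        cost += node_indel * (len(nodes2) - len(used))
        # Edges of graph 1 are either preserved or deleted.
        preserved = set()
        for e in e1:
            u, w = tuple(e)
            image = frozenset((mapping[u], mapping[w]))
            if None not in image and image in e2:
                preserved.add(image)
            else:
                cost += edge_indel
        # Edges of graph 2 that were not preserved must be inserted.
        cost += edge_indel * (len(e2) - len(preserved))
        best = min(best, cost)
    return best


# Toy heavy-atom graphs: C-C-O versus C-C-N.
ethanol = ({0: "C", 1: "C", 2: "O"}, [(0, 1), (1, 2)])
ethylamine = ({0: "C", 1: "C", 2: "N"}, [(0, 1), (1, 2)])

cheap_sub = graph_edit_distance(*ethanol, *ethylamine)
costly_sub = graph_edit_distance(*ethanol, *ethylamine, node_sub=5.0)
print(cheap_sub, costly_sub)
```

With unit costs, the cheapest edit path substitutes O with N. When substitution is made expensive, deleting the O node with its bond and inserting the N node with its bond becomes cheaper, which is why the choice of transformation costs matters for the resulting distance.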

    Learning templates from fuzzy examples in structural pattern recognition

    The Fuzzy-Attribute Graph (FAG) was proposed to handle fuzziness in the pattern primitives in structural pattern recognition. FAG has the advantage that several possible definitions can be combined into a single template. However, the template must be defined by a human expert. In this paper, we propose an algorithm that can, from a number of fuzzy instances, find a template that can be matched to the patterns by the original matching metric.

    Irregular graph pyramids and representative cocycles of cohomology generators

    Structural pattern recognition describes and classifies data based on the relationships of features and parts. Topological invariants, like the Euler number, characterize the structure of objects of any dimension. Cohomology can provide more refined algebraic invariants of a topological space than homology does. It assigns 'quantities' to the chains used in homology to characterize holes of any dimension. Graph pyramids can be used to describe subdivisions of the same object at multiple levels of detail. This paper presents cohomology in the context of structural pattern recognition and introduces an algorithm to efficiently compute representative cocycles (the basic elements of cohomology) in 2D using a graph pyramid. Extension to nD and application in the context of pattern recognition are discussed.
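
As a small illustration of the topological invariants mentioned above, the following sketch computes the Euler number and the first two Betti numbers of a graph, treated as a 1-dimensional complex. It does not implement the paper's cocycle algorithm or graph pyramids; the function names and the example graphs are assumptions chosen for illustration.

```python
def betti_numbers(num_vertices, edges):
    """Return (b0, b1) for an undirected graph on vertices 0..num_vertices-1.

    b0 counts connected components and b1 counts independent cycles
    ("holes" of the 1-dimensional complex). Components are found with
    a simple union-find structure.
    """
    parent = list(range(num_vertices))

    def find(x):
        # Follow parent links to the root, halving the path on the way.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    components = num_vertices
    for u, v in edges:
        root_u, root_v = find(u), find(v)
        if root_u != root_v:
            parent[root_u] = root_v
            components -= 1
    b0 = components
    b1 = len(edges) - num_vertices + components
    return b0, b1


def euler_number(num_vertices, num_edges, num_faces=0):
    """Euler number V - E + F of a 2D cell complex."""
    return num_vertices - num_edges + num_faces


# Boundary of a square: one component enclosing one hole.
square = [(0, 1), (1, 2), (2, 3), (3, 0)]
square_betti = betti_numbers(4, square)
square_euler = euler_number(4, len(square))

# Two triangles sharing an edge: one component, two independent cycles.
two_triangles = [(0, 1), (1, 2), (2, 0), (1, 3), (3, 2)]
triangles_betti = betti_numbers(4, two_triangles)
triangles_euler = euler_number(4, len(two_triangles))

print(square_betti, square_euler, triangles_betti, triangles_euler)
```

For a graph without faces, the Euler number equals b0 - b1, so both examples can be checked by hand. Filling the square with one face removes the hole and raises the Euler number from 0 to 1.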

    Invariant Representative Cocycles of Cohomology Generators using Irregular Graph Pyramids

    Structural pattern recognition describes and classifies data based on the relationships of features and parts. Topological invariants, like the Euler number, characterize the structure of objects of any dimension. Cohomology can provide more refined algebraic invariants of a topological space than homology does. It assigns 'quantities' to the chains used in homology to characterize holes of any dimension. Graph pyramids can be used to describe subdivisions of the same object at multiple levels of detail. This paper presents cohomology in the context of structural pattern recognition and introduces an algorithm to efficiently compute representative cocycles (the basic elements of cohomology) in 2D using a graph pyramid. An extension to obtain scanning and rotation invariant cocycles is given. Comment: Special issue on Graph-Based Representations in Computer Vision

    Toward a multilevel representation of protein molecules: comparative approaches to the aggregation/folding propensity problem

    This paper builds upon the fundamental work of Niwa et al. [34], which provides the unique possibility to analyze the relative aggregation/folding propensity of the elements of the entire Escherichia coli (E. coli) proteome in a cell-free standardized microenvironment. The difficulty of the problem stems from the superposition of the driving forces of intra- and inter-molecular interactions, and it is mirrored by evidence of a shift from folding to aggregation phenotypes caused by single-point mutations [10]. Here we apply several state-of-the-art classification methods from the field of structural pattern recognition, with the aim of comparing different representations of the same proteins gathered from the Niwa et al. database; such representations include sequences and labeled (contact) graphs enriched with chemico-physical attributes. Through this comparison, we are also able to identify some interesting general properties of proteins. Notably, (i) we suggest a threshold of around 250 residues that discriminates "easily foldable" from "hardly foldable" molecules, consistent with other independent experiments, and (ii) we highlight the relevance of contact graph spectra for folding behavior discrimination and characterization of the E. coli solubility data. The soundness of the experimental results presented in this paper is demonstrated by the statistically relevant relationships discovered between the chemico-physical description of proteins and the developed substitution cost matrix used in the various discrimination systems. Comment: 17 pages, 3 figures, 46 references

    GEDLIB: A C++ library for computing the graph edit distance

    The graph edit distance (GED) is a flexible graph dissimilarity measure widely used within the structural pattern recognition field. In this paper, we present GEDLIB, a C++ library for exactly or approximately computing GED. Many existing algorithms for GED are already implemented in GEDLIB. Moreover, GEDLIB is designed to be easily extensible: to implement new edit cost functions and GED algorithms, it suffices to implement abstract classes contained in the library. For implementing these extensions, the user has access to a wide range of utilities, such as deep neural networks, support vector machines, mixed integer linear programming solvers, a blackbox optimizer, and solvers for the linear sum assignment problem with and without error-correction.