785 research outputs found

    Graph-Based Approaches to Protein StructureComparison - From Local to Global Similarity

    Get PDF
    The comparative analysis of protein structure data is a central aspect of structural bioinformatics. Drawing upon structural information allows the inference of function for unknown proteins even in cases where no apparent homology can be found on the sequence level. Regarding the function of an enzyme, the overall fold topology might less important than the specific structural conformation of the catalytic site or the surface region of a protein, where the interaction with other molecules, such as binding partners, substrates and ligands occurs. Thus, a comparison of these regions is especially interesting for functional inference, since structural constraints imposed by the demands of the catalyzed biochemical function make them more likely to exhibit structural similarity. Moreover, the comparative analysis of protein binding sites is of special interest in pharmaceutical chemistry, in order to predict cross-reactivities and gain a deeper understanding of the catalysis mechanism. From an algorithmic point of view, the comparison of structured data, or, more generally, complex objects, can be attempted based on different methodological principles. Global methods aim at comparing structures as a whole, while local methods transfer the problem to multiple comparisons of local substructures. In the context of protein structure analysis, it is not a priori clear, which strategy is more suitable. In this thesis, several conceptually different algorithmic approaches have been developed, based on local, global and semi-global strategies, for the task of comparing protein structure data, more specifically protein binding pockets. The use of graphs for the modeling of protein structure data has a long standing tradition in structural bioinformatics. Recently, graphs have been used to model the geometric constraints of protein binding sites. The algorithms developed in this thesis are based on this modeling concept, hence, from a computer scientist's point of view, they can also be regarded as global, local and semi-global approaches to graph comparison. The developed algorithms were mainly designed on the premise to allow for a more approximate comparison of protein binding sites, in order to account for the molecular flexibility of the protein structures. A main motivation was to allow for the detection of more remote similarities, which are not apparent by using more rigid methods. Subsequently, the developed approaches were applied to different problems typically encountered in the field of structural bioinformatics in order to assess and compare their performance and suitability for different problems. Each of the approaches developed during this work was capable of improving upon the performance of existing methods in the field. Another major aspect in the experiments was the question, which methodological concept, local, global or a combination of both, offers the most benefits for the specific task of protein binding site comparison, a question that is addressed throughout this thesis

    Training Optimization for Artificial Neural Networks

    Get PDF
    Debido a la habilidad para modelar problemas complejos, actualmente las Redes Neuronales Artificiales (nn) son muy populares en Reconocimiento de Patrones, MinerĂ­a de Datos y Aprendizaje AutomĂĄtico. No obstante, el elevado costo computacional asociado a la fase en entrenamiento, cuando grandes bases de datos son utilizados, es su principal desventaja. Con la intenciĂłn de disminuir el costo computacional e incrementar la convergencia de la nn, el presente trabajo analiza la conveniencia de realizar pre-procesamiento a los conjuntos de datos. De forma especĂ­fica, se evalĂșan los mĂ©todos de grafo de vecindad relativa (rng), grafo de Gabriel (gg) y el mĂ©todo basado en los vecinos envolventes k-ncn. Los resultados experimentales muestran la factibilidad y las mĂșltiples ventajas de esas metodologĂ­as para solventar los problemas descritos previamente.Debido a la habilidad para modelar problemas complejos, actualmente las Redes Neuronales ArtiÀciales (nn) son muy populares en Reconocimiento de Patrones, MinerĂ­a de Datos y Aprendizaje AutomĂĄtico. No obstante, el elevado costo computacional asociado a la fase en entrenamiento, cuando grandes bases de datos son utilizados, es su principal desventaja. Con la intenciĂłn de disminuir el costo computacional e incrementar la convergencia de la nn, el presente trabajo analiza la conveniencia de realizar pre-procesamiento a los conjuntos de datos. De forma especíÀca, se evalĂșan los mĂ©todos de grafo de vecindad relativa (rng), grafo de Gabriel (gg) y el mĂ©todo basado en los vecinos envolventes k-ncn. Los resultados experimentales muestran la factibilidad y las mĂșltiples ventajas de esas metodologĂ­as para solventar los problemas descritos previament

    Predictive Pattern Discovery in Dynamic Data Systems

    Get PDF
    This dissertation presents novel methods for analyzing nonlinear time series in dynamic systems. The purpose of the newly developed methods is to address the event prediction problem through modeling of predictive patterns. Firstly, a novel categorization mechanism is introduced to characterize different underlying states in the system. A new hybrid method was developed utilizing both generative and discriminative models to address the event prediction problem through optimization in multivariate systems. Secondly, in addition to modeling temporal dynamics, a Bayesian approach is employed to model the first-order Markov behavior in the multivariate data sequences. Experimental evaluations demonstrated superior performance over conventional methods, especially when the underlying system is chaotic and has heterogeneous patterns during state transitions. Finally, the concept of adaptive parametric phase space is introduced. The equivalence between time-domain phase space and associated parametric space is theoretically analyzed

    Biometric Authentication using Nonparametric Methods

    Full text link
    The physiological and behavioral trait is employed to develop biometric authentication systems. The proposed work deals with the authentication of iris and signature based on minimum variance criteria. The iris patterns are preprocessed based on area of the connected components. The segmented image used for authentication consists of the region with large variations in the gray level values. The image region is split into quadtree components. The components with minimum variance are determined from the training samples. Hu moments are applied on the components. The summation of moment values corresponding to minimum variance components are provided as input vector to k-means and fuzzy kmeans classifiers. The best performance was obtained for MMU database consisting of 45 subjects. The number of subjects with zero False Rejection Rate [FRR] was 44 and number of subjects with zero False Acceptance Rate [FAR] was 45. This paper addresses the computational load reduction in off-line signature verification based on minimal features using k-means, fuzzy k-means, k-nn, fuzzy k-nn and novel average-max approaches. FRR of 8.13% and FAR of 10% was achieved using k-nn classifier. The signature is a biometric, where variations in a genuine case, is a natural expectation. In the genuine signature, certain parts of signature vary from one instance to another. The system aims to provide simple, fast and robust system using less number of features when compared to state of art works.Comment: 20 page

    Computational Analysis of Structure-Activity Relationships : From Prediction to Visualization Methods

    Get PDF
    Understanding how structural modifications affect the biological activity of small molecules is one of the central themes in medicinal chemistry. By no means is structure-activity relationship (SAR) analysis a priori dependent on computational methods. However, as molecular data sets grow in size, we quickly approach our limits to access and compare structures and associated biological properties so that computational data processing and analysis often become essential. Here, different types of approaches of varying complexity for the analysis of SAR information are presented, which can be applied in the context of screening and chemical optimization projects. The first part of this thesis is dedicated to machine-learning strategies that aim at de novo ligand prediction and the preferential detection of potent hits in virtual screening. High emphasis is put on benchmarking of different strategies and a thorough evaluation of their utility in practical applications. However, an often claimed disadvantage of these prediction methods is their "black box" character because they do not necessarily reveal which structural features are associated with biological activity. Therefore, these methods are complemented by more descriptive SAR analysis approaches showing a higher degree of interpretability. Concepts from information theory are adapted to identify activity-relevant structure-derived descriptors. Furthermore, compound data mining methods exploring prespecified properties of available bioactive compounds on a large scale are designed to systematically relate molecular transformations to activity changes. Finally, these approaches are complemented by graphical methods that primarily help to access and visualize SAR data in congeneric series of compounds and allow the formulation of intuitive SAR rules applicable to the design of new compounds. The compendium of SAR analysis tools introduced in this thesis investigates SARs from different perspectives
    • 

    corecore