Graph-Based Approaches to Protein Structure Comparison - From Local to Global Similarity
The comparative analysis of protein structure data is a central aspect of structural bioinformatics. Drawing upon structural information allows the inference of function for unknown proteins even in cases where no apparent homology can be found on the sequence level.
Regarding the function of an enzyme, the overall fold topology might be less important than the specific structural conformation of the catalytic site or the surface region of a protein, where the interaction with other molecules, such as binding partners, substrates and ligands, occurs. Thus, a comparison of these regions is especially interesting for functional inference, since structural constraints imposed by the demands of the catalyzed biochemical function make them more likely to exhibit structural similarity. Moreover, the comparative analysis of protein binding sites is of special interest in pharmaceutical chemistry, in order to predict cross-reactivities and gain a deeper understanding of the catalysis mechanism.
From an algorithmic point of view, the comparison of structured data, or, more generally, complex objects, can be attempted based on different methodological principles. Global methods aim at comparing structures as a whole, while local methods transfer the problem to multiple comparisons of local substructures. In the context of protein structure analysis, it is not a priori clear which strategy is more suitable.
In this thesis, several conceptually different algorithmic approaches, based on local, global and semi-global strategies, have been developed for comparing protein structure data, more specifically protein binding pockets. The use of graphs for modeling protein structure data has a long-standing tradition in structural bioinformatics. Recently, graphs have been used to model the geometric constraints of protein binding sites. The algorithms developed in this thesis are based on this modeling concept; hence, from a computer scientist's point of view, they can also be regarded as global, local and semi-global approaches to graph comparison. The algorithms were designed mainly to allow for a more approximate comparison of protein binding sites, in order to account for the molecular flexibility of protein structures. A main motivation was to enable the detection of more remote similarities that are not apparent when using more rigid methods. Subsequently, the developed approaches were applied to different problems typically encountered in structural bioinformatics in order to assess and compare their performance and suitability.
Each of the approaches developed during this work was capable of improving upon the performance of existing methods in the field. Another major aspect of the experiments was the question of which methodological concept (local, global or a combination of both) offers the most benefits for the specific task of protein binding site comparison, a question that is addressed throughout this thesis.
Training Optimization for Artificial Neural Networks
Owing to their ability to model complex problems, Artificial Neural Networks (NN) are currently very popular in Pattern Recognition, Data Mining and Machine Learning. However, the high computational cost of the training phase when large databases are used is their main drawback. With the aim of reducing the computational cost and speeding up the convergence of the NN, this work analyzes the suitability of pre-processing the data sets. Specifically, the relative neighborhood graph (RNG), the Gabriel graph (GG) and the method based on surrounding neighbors (k-NCN) are evaluated. The experimental results show the feasibility and the multiple advantages of these methodologies for addressing the problems described above.
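As an illustration of the two proximity graphs named in this abstract, the following minimal sketch builds the Gabriel graph and the relative neighborhood graph by brute force and then reduces a labeled training set. The reduction rule shown (keep only samples that have a graph neighbor of a different class) is an assumption chosen for illustration; the abstract does not state which editing or condensing rule the authors use. The names `proximity_edges` and `border_samples` are hypothetical.

```python
from itertools import combinations


def sq_dist(a, b):
    """Squared Euclidean distance between two points given as tuples."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def proximity_edges(points, kind="gabriel"):
    """Return the edge set of the Gabriel graph or the RNG (brute force, O(n^3))."""
    if kind not in ("gabriel", "rng"):
        raise ValueError("kind must be 'gabriel' or 'rng'")
    edges = set()
    n = len(points)
    for i, j in combinations(range(n), 2):
        d_ij = sq_dist(points[i], points[j])
        blocked = False
        for k in range(n):
            if k in (i, j):
                continue
            d_ik = sq_dist(points[i], points[k])
            d_jk = sq_dist(points[j], points[k])
            if kind == "gabriel":
                # k lies strictly inside the ball whose diameter is segment ij
                inside = d_ik + d_jk < d_ij
            else:
                # RNG: k is strictly closer to both i and j than they are to each other
                inside = max(d_ik, d_jk) < d_ij
            if inside:
                blocked = True
                break
        if not blocked:
            edges.add((i, j))
    return edges


def border_samples(edges, labels):
    """Assumed reduction rule: keep samples with a neighbor of another class."""
    keep = set()
    for i, j in edges:
        if labels[i] != labels[j]:
            keep.update((i, j))
    return sorted(keep)


# Toy data: four collinear points, two per class.
points = [(0.0, 0.0), (1.0, 0.0), (2.0, 0.0), (3.0, 0.0)]
labels = ["a", "a", "b", "b"]
edges = proximity_edges(points, kind="gabriel")
reduced = border_samples(edges, labels)  # indices of the retained samples
```

On the toy data only the two samples next to the class boundary are retained, which is how this kind of pre-processing could shrink the training set before the network is trained.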
Predictive Pattern Discovery in Dynamic Data Systems
This dissertation presents novel methods for analyzing nonlinear time series in dynamic systems. The purpose of the newly developed methods is to address the event prediction problem through modeling of predictive patterns. Firstly, a novel categorization mechanism is introduced to characterize different underlying states in the system. A new hybrid method was developed utilizing both generative and discriminative models to address the event prediction problem through optimization in multivariate systems.
Secondly, in addition to modeling temporal dynamics, a Bayesian approach is employed to model the first-order Markov behavior in the multivariate data sequences. Experimental evaluations demonstrated superior performance over conventional methods, especially when the underlying system is chaotic and has heterogeneous patterns during state transitions.
Finally, the concept of adaptive parametric phase space is introduced. The equivalence between time-domain phase space and associated parametric space is theoretically analyzed.
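To make the Bayesian treatment of first-order Markov behavior mentioned above more concrete, here is a minimal sketch. It assumes the data have already been discretized into a sequence of symbolic states and places a symmetric Dirichlet prior on each row of the transition matrix; both choices, and the function name `posterior_transition_matrix`, are illustrative assumptions and not the dissertation's actual model.

```python
from collections import Counter


def posterior_transition_matrix(sequence, states, alpha=1.0):
    """Posterior-mean transition probabilities under a Dirichlet(alpha) prior.

    Each row s gives P(next = t | current = s), estimated from transition
    counts plus alpha pseudo-counts per target state.
    """
    counts = Counter(zip(sequence, sequence[1:]))
    matrix = {}
    for s in states:
        row_total = sum(counts[(s, t)] for t in states) + alpha * len(states)
        matrix[s] = {t: (counts[(s, t)] + alpha) / row_total for t in states}
    return matrix


# Toy state sequence with transitions A->A, A->B, B->A.
sequence = list("AABA")
matrix = posterior_transition_matrix(sequence, states=["A", "B"])
```

The pseudo-counts keep every transition probability strictly positive, so transitions never observed in a short training sequence are still assigned some probability.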
Biometric Authentication using Nonparametric Methods
Physiological and behavioral traits are employed to develop biometric
authentication systems. The proposed work deals with the authentication of iris
and signature based on minimum variance criteria. The iris patterns are
preprocessed based on area of the connected components. The segmented image
used for authentication consists of the region with large variations in the
gray level values. The image region is split into quadtree components. The
components with minimum variance are determined from the training samples. Hu
moments are applied to the components. The sum of the moment values
corresponding to the minimum-variance components is provided as the input vector to
k-means and fuzzy k-means classifiers. The best performance was obtained for the MMU
database consisting of 45 subjects. The number of subjects with zero False
Rejection Rate [FRR] was 44 and the number of subjects with zero False Acceptance
Rate [FAR] was 45. This paper addresses the computational load reduction in
off-line signature verification based on minimal features using k-means, fuzzy
k-means, k-nn, fuzzy k-nn and novel average-max approaches. An FRR of 8.13% and
an FAR of 10% were achieved using the k-nn classifier. The signature is a biometric
in which variation among genuine samples is naturally expected. In a genuine
signature, certain parts vary from one instance to another. The
system aims to be simple, fast and robust, using fewer
features than state-of-the-art works.
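The quadtree and minimum-variance steps described in this abstract can be sketched as follows. This is a simplified illustration under stated assumptions: images are square lists of gray levels, the tree is split to a fixed depth, the mean intensity of each block stands in for the Hu moments, and "minimum variance" is read as the smallest variance of that feature across the training samples. The abstract does not spell out these details, and all function names are hypothetical.

```python
from statistics import mean, pvariance


def quadtree_blocks(size, depth):
    """Split a size x size image into 4**depth blocks (row, col, side)."""
    def split(r0, c0, side, d):
        if d == 0:
            return [(r0, c0, side)]
        half = side // 2
        blocks = []
        for dr in (0, half):
            for dc in (0, half):
                blocks += split(r0 + dr, c0 + dc, half, d - 1)
        return blocks
    return split(0, 0, size, depth)


def block_mean(image, block):
    """Mean gray level of one block (assumed stand-in for Hu moments)."""
    r0, c0, side = block
    return mean(image[r][c]
                for r in range(r0, r0 + side)
                for c in range(c0, c0 + side))


def min_variance_blocks(training_images, depth, n_keep):
    """Blocks whose feature varies least across the training samples."""
    blocks = quadtree_blocks(len(training_images[0]), depth)

    def spread(block):
        return pvariance([block_mean(img, block) for img in training_images])

    return sorted(blocks, key=spread)[:n_keep]


# Two toy 4x4 training images that differ only in the top-left quadrant.
img_a = [[0] * 4 for _ in range(4)]
img_b = [[8, 8, 0, 0], [8, 8, 0, 0], [0, 0, 0, 0], [0, 0, 0, 0]]
stable = min_variance_blocks([img_a, img_b], depth=1, n_keep=3)
```

In this toy case the top-left quadrant is the only block that changes between samples, so it is excluded, while the three stable quadrants would supply the features passed to the classifier.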
Computational Analysis of Structure-Activity Relationships: From Prediction to Visualization Methods
Understanding how structural modifications affect the biological activity of small molecules is one of the central themes in medicinal chemistry. By no means is structure-activity relationship (SAR) analysis a priori dependent on computational methods. However, as molecular data sets grow in size, we quickly approach our limits to access and compare structures and associated biological properties, so that computational data processing and analysis often become essential. Here, different types of approaches of varying complexity for the analysis of SAR information are presented, which can be applied in the context of screening and chemical optimization projects. The first part of this thesis is dedicated to machine-learning strategies that aim at de novo ligand prediction and the preferential detection of potent hits in virtual screening. Strong emphasis is placed on benchmarking different strategies and thoroughly evaluating their utility in practical applications. However, an often-claimed disadvantage of these prediction methods is their "black box" character, because they do not necessarily reveal which structural features are associated with biological activity. Therefore, these methods are complemented by more descriptive SAR analysis approaches showing a higher degree of interpretability. Concepts from information theory are adapted to identify activity-relevant structure-derived descriptors. Furthermore, compound data mining methods exploring prespecified properties of available bioactive compounds on a large scale are designed to systematically relate molecular transformations to activity changes. Finally, these approaches are complemented by graphical methods that primarily help to access and visualize SAR data in congeneric series of compounds and allow the formulation of intuitive SAR rules applicable to the design of new compounds. The compendium of SAR analysis tools introduced in this thesis investigates SARs from different perspectives.
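One way information theory can be used to flag activity-relevant descriptors is to rank them by their mutual information with the activity label. The sketch below assumes discretized (here binary) descriptors and a binary active/inactive label; it is a generic illustration, not the specific measure adapted in the thesis, and the descriptor names are invented for the example.

```python
from collections import Counter
from math import log2


def mutual_information(xs, ys):
    """Mutual information (in bits) between two discrete sequences."""
    n = len(xs)
    px, py, pxy = Counter(xs), Counter(ys), Counter(zip(xs, ys))
    return sum(
        (c / n) * log2((c / n) / ((px[x] / n) * (py[y] / n)))
        for (x, y), c in pxy.items()
    )


def rank_descriptors(descriptors, activity):
    """Sort descriptor names by decreasing mutual information with activity."""
    scores = {name: mutual_information(values, activity)
              for name, values in descriptors.items()}
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


# Hypothetical binary descriptors for four compounds (1 = feature present).
activity = [1, 1, 0, 0]
descriptors = {
    "has_ring": [1, 1, 0, 0],   # tracks activity perfectly
    "has_halogen": [1, 0, 1, 0],  # independent of activity
}
ranking = rank_descriptors(descriptors, activity)
```

A descriptor that splits the compounds exactly along the activity classes scores one bit, while a descriptor that is statistically independent of activity scores zero.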