1,209 research outputs found

    Quasi-Convex Scoring Functions in Branch-and-Bound Ranked Search

    For answering top-k queries in which attributes are aggregated to a scalar value that defines a ranking, the well-known branch-and-bound principle can usually be used for efficient query answering. Standard algorithms (e.g., Branch-and-Bound Ranked Search, BRS for short) require scoring functions to be monotone, so that a top-k ranking can be computed in sublinear time in the average case. If monotonicity cannot be guaranteed, efficient query answering algorithms are not known. To make branch-and-bound effective with descending or ascending rankings (maximum top-k or minimum top-k queries, respectively), BRS must be able to identify bounds for exploring search partitions, and this is trivial only for monotonic ranking functions. In this paper, we investigate the class of quasi-convex functions used for scoring objects, and we examine how bounds for exploring data partitions can be computed correctly and efficiently for quasi-convex functions in BRS for maximum top-k queries. Given that quasi-convex scoring functions can usefully be employed for ranking objects in a variety of applications, the mathematical findings presented in this paper are significant for practical top-k query answering.
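A minimal sketch of why quasi-convexity helps with maximum top-k bounds, not the paper's own procedure: a quasi-convex function on a box (such as an R-tree bounding rectangle) attains its maximum at one of the box's corners, so an upper bound for a partition can be obtained by evaluating the corners. The names `upper_bound_on_box` and `example_score` are hypothetical, and the example score is an assumed convex (hence quasi-convex) function chosen only for illustration.

```python
from itertools import product
from typing import Callable, Sequence, Tuple


def upper_bound_on_box(
    score: Callable[[Tuple[float, ...]], float],
    lower: Sequence[float],
    upper: Sequence[float],
) -> float:
    """Maximum of a quasi-convex score over an axis-aligned box.

    Every point of the box is a convex combination of its corners, and a
    quasi-convex function never exceeds the larger of its endpoint values,
    so checking the 2^d corners is sufficient.
    """
    corners = product(*zip(lower, upper))
    return max(score(corner) for corner in corners)


def example_score(point: Tuple[float, ...]) -> float:
    # Hypothetical score: convex in both attributes, therefore quasi-convex.
    x, y = point
    return abs(x - 2.0) + (y - 1.0) ** 2


# Bound for the partition [0, 3] x [0, 4].
bound = upper_bound_on_box(example_score, (0.0, 0.0), (3.0, 4.0))
print(bound)
```

Corner enumeration costs 2^d evaluations for d attributes, so this sketch is only practical for low-dimensional scoring functions.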

    Branch-and-Bound Ranked Search by Minimizing Parabolic Polynomials

    The Branch-and-Bound Ranked Search algorithm (BRS) is an efficient method for answering top-k queries based on R-trees using multivariate scoring functions. To make BRS effective with ascending rankings, the algorithm must be able to identify lower bounds of the scoring functions for exploring search partitions. This paper presents BRS supporting parabolic polynomials. These functions are commonly used to minimize combined scores over different attributes and cover a variety of applications. To the best of our knowledge, the problem of developing an algorithm that computes such lower bounds for the BRS method has not yet been well addressed.
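As an illustration of what a lower bound over a search partition can look like, the sketch below assumes a separable form f(x) = c + sum_i (a_i x_i^2 + b_i x_i) with every a_i > 0. That form is an assumption made here for simplicity; the abstract does not define "parabolic polynomial" precisely, and the paper's algorithm may differ. Under the assumption, each attribute can be minimized independently by clamping the parabola's vertex to the partition's interval. The function name `parabola_lower_bound` is hypothetical.

```python
from typing import Sequence, Tuple


def parabola_lower_bound(
    coeffs: Sequence[Tuple[float, float]],
    lower: Sequence[float],
    upper: Sequence[float],
    constant: float = 0.0,
) -> float:
    """Minimum of c + sum(a_i*x_i^2 + b_i*x_i) over an axis-aligned box.

    Assumes a_i > 0 for every attribute, so each term is an upward-opening
    parabola whose minimum on [lo, hi] lies at the vertex clamped to it.
    """
    total = constant
    for (a, b), lo, hi in zip(coeffs, lower, upper):
        if a <= 0:
            raise ValueError("each quadratic coefficient must be positive")
        vertex = -b / (2.0 * a)
        x = min(max(vertex, lo), hi)  # clamp the vertex into the interval
        total += a * x * x + b * x
    return total


# f(x, y) = (x - 2)^2 + 2*(y + 1)^2 = x^2 - 4x + 2y^2 + 4y + 6
# evaluated over the partition [3, 5] x [0, 2].
bound = parabola_lower_bound([(1.0, -4.0), (2.0, 4.0)], (3.0, 0.0), (5.0, 2.0), 6.0)
print(bound)
```

In the example both vertices lie outside the partition, so the minimum is attained at the corner (3, 0).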

    Optimization in bioinformatics

    In this work, we present novel optimization approaches for important bioinformatics problems. The first part deals mainly with the local optimization of molecular structures and its applications to molecular docking, while the second part discusses discrete global optimization. In the first part, we present a novel algorithm for an old task: find the next local optimum in a given direction on a molecular potential energy function (line search). We show that replacing a standard line search method with the new algorithm reduced the number of function/gradient evaluations in our test runs down to 47.7% (down to 85% on average). Then, we include this method in our novel approach for locally optimizing flexible ligands in the presence of their receptors, which we describe in detail, avoiding the singularity problem of orientational parameters. We extend this approach to a full ligand-receptor docking program using a Lamarckian genetic algorithm. Our validation runs show that we gained an up to tenfold speedup in comparison to other tested methods. Then, we further incorporate side chain flexibility of the receptor into our approach and introduce limited backbone flexibility by interpolating between known extremal conformations using spherical linear extrapolation. Our results show that this approach is very promising for flexible ligand-receptor docking. However, the drawback is that we need known extremal backbone conformations for the interpolation. In the last section of the first part, we allow a loop region to be fully flexible. We present a new method to find all possible conformations using the Go-Scheraga ring closure equations and interval arithmetic. Our results show that this algorithm reliably finds alternative conformations and is able to identify promising loop/ligand complexes of the studied example. In the second part of this work, we describe the bond order assignment problem for molecular structures. We present our novel linear 0-1-programming formulation for the very efficient computation of all optimal and suboptimal bond order assignments and show that our approach outperforms not only the original heuristic approach of Wang et al. but also commonly used software for determining bond orders on our test set, considering all optimal results. This test set consists of 761 thoroughly prepared drug-like molecules that were originally used for the validation of the Merck Molecular Force Field. Then, we present our filter method for feature subset selection that is based on mutual information and uses second-order information. We show our mathematically well-motivated criterion and, in contrast to other methods, solve the resulting optimization problem exactly by quadratic 0-1-programming. In the validation runs, our method achieved the best classification accuracies in 18 out of 21 test scenarios. In the last section, we give our integer linear programming formulation for the detection of deregulated subgraphs in regulatory networks using expression profiles. Our approach identifies the subnetwork of a certain size of the regulatory network with the highest sum of node scores. To demonstrate the capabilities of our algorithm, we analyzed expression profiles from nonmalignant primary mammary epithelial cells derived from BRCA1 mutation carriers and epithelial cells without BRCA1 mutation. Our results suggest that oxidative stress plays an important role in epithelial cells with BRCA1 mutations that may contribute to the later development of breast cancer. The application of our algorithm to already published data can yield new insights. As expression data and network data are still growing, methods such as our algorithm will be valuable for detecting deregulated subgraphs in different conditions and will contribute to a better understanding of diseases.

    Software for molecular docking: a review

    Molecular docking methodology explores the behavior of small molecules in the binding site of a target protein. As more protein structures are determined experimentally using X-ray crystallography or nuclear magnetic resonance (NMR) spectroscopy, molecular docking is increasingly used as a tool in drug discovery. Docking against homology-modeled targets also becomes possible for proteins whose structures are not known. With docking strategies, the druggability of compounds and their specificity against a particular target can be calculated for further lead optimization processes. Molecular docking programs perform a search algorithm in which the conformation of the ligand is evaluated recursively until convergence to the minimum energy is reached. Finally, an affinity scoring function, ΔG [U total in kcal/mol], is employed to rank the candidate poses as the sum of the electrostatic and van der Waals energies. The driving forces for these specific interactions in biological systems aim toward complementarities between the shape and electrostatics of the binding site surfaces and the ligand or substrate.
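To make the "sum of the electrostatic and van der Waals energies" concrete, here is a minimal sketch of one common way such a score is written: a Coulomb term plus a 12-6 Lennard-Jones term summed over ligand-receptor atom pairs. It is not the scoring function of any particular docking program. The atom tuple layout, the function names, the Lorentz-Berthelot combining rules, the constant dielectric, and the Coulomb constant of roughly 332.06 kcal·Å/(mol·e²) are all assumptions made for illustration.

```python
import math
from typing import Sequence, Tuple

# Assumed atom layout: (x, y, z, charge [e], epsilon [kcal/mol], sigma [Å]).
Atom = Tuple[float, float, float, float, float, float]

# Approximate Coulomb constant in kcal*Å/(mol*e^2); force fields differ slightly.
COULOMB_KCAL = 332.06


def pair_energy(q1: float, q2: float, r: float,
                epsilon: float, sigma: float, dielectric: float = 1.0) -> float:
    """Coulomb plus 12-6 Lennard-Jones energy for one atom pair at distance r."""
    electrostatic = COULOMB_KCAL * q1 * q2 / (dielectric * r)
    sr6 = (sigma / r) ** 6
    van_der_waals = 4.0 * epsilon * (sr6 * sr6 - sr6)
    return electrostatic + van_der_waals


def score_pose(ligand: Sequence[Atom], receptor: Sequence[Atom]) -> float:
    """Sum pair energies over all ligand-receptor atom pairs (lower is better)."""
    total = 0.0
    for lx, ly, lz, lq, leps, lsig in ligand:
        for rx, ry, rz, rq, reps, rsig in receptor:
            r = math.dist((lx, ly, lz), (rx, ry, rz))
            epsilon = math.sqrt(leps * reps)   # geometric mean (assumed rule)
            sigma = 0.5 * (lsig + rsig)        # arithmetic mean (assumed rule)
            total += pair_energy(lq, rq, r, epsilon, sigma)
    return total


# Two oppositely charged atoms 3 Å apart attract, giving a negative score.
ligand_pose = [(0.0, 0.0, 0.0, 1.0, 0.1, 3.0)]
receptor_site = [(3.0, 0.0, 0.0, -1.0, 0.1, 3.0)]
print(score_pose(ligand_pose, receptor_site))
```

Ranking candidate poses then amounts to sorting them by this score in ascending order.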

    Monotonicity-based consensus states for the monometric rationalisation of ranking rules with application in decision making

    Development and Application of Pseudoreceptor Modeling

    Quantitative Structure-Activity Relationship (QSAR) methods are a commonly used tool in the drug discovery process. These methods attempt to form a statistical model that relates descriptor properties of a ligand to the activity of that ligand towards a specific desired physiological response. QSAR methods are ligand-based: they use information from ligands and not protein structural data. Pseudoreceptor methods are derived from QSAR methods but go beyond standard QSAR by building a model representation of the protein pocket. However, the ability of pseudoreceptors to accurately replicate natural protein surfaces has not been studied. The goal of this thesis work is to investigate the descriptors necessary to map a protein binding pocket and a method to accurately recreate the 3-D spatial structure of the binding pocket. In addition, further applications of existing pseudoreceptor methods are explored.

    Bottom-up Object Segmentation for Visual Recognition

    Automatic recognition and segmentation of objects in images is a central open problem in computer vision. Most previous approaches have pursued either sliding-window object detection or dense classification of overlapping local image patches. In contrast, the framework introduced in this thesis attempts to identify the spatial extent of objects prior to recognition, using bottom-up computational processes and mid-level selection cues. After a set of plausible object hypotheses is identified, a sequential recognition process is executed, based on continuous estimates of the spatial overlap between the image segment hypotheses and each putative class. The object hypotheses are represented as figure-ground segmentations, and are extracted automatically, without prior knowledge of the properties of individual object classes, by solving a sequence of constrained parametric min-cut problems (CPMC) on a regular image grid. It is shown that CPMC significantly outperforms the state of the art for low-level segmentation in the PASCAL VOC 2009 and 2010 datasets. Results beyond the current state of the art for image classification, object detection and semantic segmentation are also demonstrated on a number of challenging datasets including Caltech-101, ETHZ-Shape as well as PASCAL VOC 2009-11. These results suggest that a greater emphasis on grouping and image organization may be valuable for making progress in high-level tasks such as object recognition and scene understanding.

    New approaches to protein docking

    In the first part of this work, we propose new methods for protein docking. First, we present two approaches to protein docking with flexible side chains. The first approach is a fast greedy heuristic, while the second is a branch-&-cut algorithm that yields optimal solutions. For a test set of protease-inhibitor complexes, both approaches correctly predict the true complex structure. Another, largely unsolved, problem in protein docking is the prediction of the binding free energy, which is the final step of many protein docking algorithms. Therefore, we propose a new approach that avoids the expensive and difficult calculation of the binding free energy and, instead, employs a scoring function that is based on the similarity of the proton nuclear magnetic resonance spectra of the tentative complexes with the experimental spectrum. Using this method, we could even predict the structure of a very difficult protein-peptide complex that could not be solved using any energy-based scoring functions. The second part of this work presents BALL (Biochemical ALgorithms Library), a framework for Rapid Application Development in the field of Molecular Modeling. BALL provides an extensive set of data structures as well as classes for Molecular Mechanics, advanced solvation methods, comparison and analysis of protein structures, file import/export, NMR shift prediction, and visualization. BALL has been carefully designed to be robust, easy to use, and open to extensions. Its extensibility in particular, which results from an object-oriented and generic programming approach, distinguishes it from other software packages.