187 research outputs found

    Computational Approaches to Simulation and Analysis of Large Conformational Transitions in Proteins

    In a typical living cell, millions to billions of proteins—nanomachines that fluctuate and cycle among many conformational states—convert available free energy into mechanochemical work. A fundamental goal of biophysics is to ascertain how 3D protein structures encode specific functions, such as catalyzing chemical reactions or transporting nutrients into a cell. Protein dynamics span femtosecond timescales (i.e., covalent bond oscillations) to large conformational transition timescales in, and beyond, the millisecond regime (e.g., glucose transport across a phospholipid bilayer). Actual transition events are fast but rare, occurring orders of magnitude faster than typical metastable equilibrium waiting times. Equilibrium molecular dynamics (EqMD) can capture atomistic detail and solute-solvent interactions, but even the microseconds of sampling attainable today still fall orders of magnitude short of transition timescales, especially for large systems, rendering observations of such "rare events" difficult or effectively impossible. Advanced path-sampling methods exploit reduced physical models or biasing to produce plausible transitions while balancing accuracy and efficiency, but quantifying their accuracy relative to other numerical and experimental data has been challenging. Indeed, new horizons in elucidating protein function necessitate that present methodologies be revised to more seamlessly and quantitatively integrate a spectrum of methods, both numerical and experimental. In this dissertation, experimental and computational methods are put into perspective using the enzyme adenylate kinase (AdK) as an illustrative example. We introduce Path Similarity Analysis (PSA)—an integrative computational framework developed to quantify transition path similarity.
PSA not only reliably distinguished AdK transitions by the originating method, but also traced pathway differences between two methods back to charge-charge interactions (neglected by the stereochemical model, but not by the all-atom force field) in several conserved salt bridges. Cryo-electron microscopy maps of the transporter Bor1p are directly incorporated into EqMD simulations using MD flexible fitting to produce viable structural models and infer a plausible transport mechanism. Conforming to the theme of integration, a short compendium of an exploratory project, developing a hybrid atomistic-continuum method, is presented, including initial results and a novel fluctuating hydrodynamics model and corresponding numerical code.
Doctoral Dissertation, Physics, 201
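The core idea of comparing transition paths can be illustrated with the Hausdorff distance between two paths, each represented as a sequence of configuration-space frames. This is a minimal NumPy sketch of one path metric, not the PSA implementation itself; the toy 2-D "paths" below stand in for high-dimensional protein configurations:

```python
import numpy as np

def hausdorff(P, Q):
    """Symmetric Hausdorff distance between two paths.

    P, Q: (n, d) and (m, d) arrays of configuration-space frames
    (e.g. flattened Cartesian coordinates of selected atoms).
    """
    # Pairwise distances between every frame of P and every frame of Q.
    D = np.linalg.norm(P[:, None, :] - Q[None, :, :], axis=-1)
    # Directed distances: how far the farthest frame of one path is
    # from the nearest frame of the other.
    d_PQ = D.min(axis=1).max()
    d_QP = D.min(axis=0).max()
    return max(d_PQ, d_QP)

# Two toy "transition paths" in a 2-D configuration space.
P = np.array([[0.0, 0.0], [1.0, 0.0], [2.0, 0.0]])
Q = np.array([[0.0, 1.0], [1.0, 1.0], [2.0, 1.0]])
print(hausdorff(P, Q))  # 1.0
```

A framework like PSA computes such pairwise path distances for an ensemble of transitions and clusters the resulting distance matrix to group paths by generating method.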

    STATISTICS IN THE BILLERA-HOLMES-VOGTMANN TREESPACE

    This dissertation is an effort to adapt two classical non-parametric statistical techniques, kernel density estimation (KDE) and principal components analysis (PCA), to the Billera-Holmes-Vogtmann (BHV) metric space for phylogenetic trees. This adaptation gives a more general framework for developing and testing various hypotheses about apparent differences or similarities between sets of phylogenetic trees than currently exists. For example, while the majority of gene histories found in a clade of organisms are expected to be generated by a common evolutionary process, numerous other coexisting processes (e.g. horizontal gene transfers, gene duplication and subsequent neofunctionalization) will cause some genes to exhibit a history quite distinct from the histories of the majority of genes. Such “outlying” gene trees are considered to be biologically interesting, and identifying these genes has become an important problem in phylogenetics. The R software package kdetrees, developed in Chapter 2, contains an implementation of the kernel density estimation method. The primary theoretical difficulty involved in this adaptation concerns the normalization of the kernel functions in the BHV metric space. This problem is addressed in Chapter 3. In both chapters, the software package is applied to both simulated and empirical datasets to demonstrate the properties of the method. A few first theoretical steps in the adaptation of principal components analysis to the BHV space are presented in Chapter 4. It becomes necessary to generalize the notion of a set of perpendicular vectors in Euclidean space to the BHV metric space, but there is some ambiguity about how best to proceed. We show that convex hulls are one reasonable approach to the problem. The Nye PCA algorithm provides a method of projecting onto arbitrary convex hulls in BHV space, providing the core of a modified PCA-type method.
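The KDE-based outlier idea generalizes to any metric space, since the estimator needs only pairwise distances. The sketch below is illustrative only: plain numbers and absolute difference stand in for trees and the BHV geodesic distance, and the Gaussian kernel is left unnormalised (the normalization question is precisely what Chapter 3 addresses):

```python
import math

def kde_score(x, sample, dist, h):
    """Unnormalised Gaussian-kernel density estimate at x, using only a
    metric dist on the space (for trees, the BHV geodesic distance)."""
    return sum(math.exp(-0.5 * (dist(x, t) / h) ** 2) for t in sample) / len(sample)

def outliers(sample, dist, h, frac=0.05):
    """Flag the lowest-density members of sample (leave-one-out), the
    same idea kdetrees uses to find outlying gene trees."""
    scored = []
    for i, t in enumerate(sample):
        rest = sample[:i] + sample[i + 1:]
        scored.append((kde_score(t, rest, dist, h), t))
    scored.sort(key=lambda p: p[0])
    k = max(1, int(frac * len(sample)))
    return [t for _, t in scored[:k]]

# Toy example: numbers on a line stand in for trees, |a - b| for the
# BHV distance. The isolated point is flagged as the "outlying tree".
pts = [0.0, 0.1, 0.2, 0.15, 5.0]
print(outliers(pts, lambda a, b: abs(a - b), h=0.5))  # [5.0]
```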

    Tighter Connections Between Formula-SAT and Shaving Logs

    A noticeable fraction of Algorithms papers in the last few decades improve the running time of well-known algorithms for fundamental problems by logarithmic factors. For example, the O(n^2) dynamic programming solution to the Longest Common Subsequence problem (LCS) was improved to O(n^2 / log^2 n) in several ways and using a variety of ingenious tricks. This line of research, also known as "the art of shaving log factors", lacks a tool for proving negative results. Specifically, how can we show that it is unlikely that LCS can be solved in time O(n^2 / log^3 n)? Perhaps the only approach for such results was suggested in a recent paper of Abboud, Hansen, Vassilevska W. and Williams (STOC'16). The authors blame the hardness of shaving logs on the hardness of solving satisfiability on Boolean formulas (Formula-SAT) faster than exhaustive search. They show that an O(n^2 / log^{1000} n) algorithm for LCS would imply a major advance in circuit lower bounds. Whether this approach can lead to tighter barriers was unclear. In this paper, we push this approach to its limit and, in particular, prove that a well-known barrier from complexity theory stands in the way of shaving five additional log factors for fundamental combinatorial problems. For LCS, regular expression pattern matching, as well as the Fréchet distance problem from Computational Geometry, we show that an O(n^2 / log^{7+ε} n) runtime would imply new Formula-SAT algorithms. Our main result is a reduction from SAT on formulas of size s over n variables to LCS on sequences of length N = 2^{n/2} · s^{1+o(1)}. Our reduction is essentially as efficient as possible, and it greatly improves the previously known reduction for LCS with N = 2^{n/2} · s^c, for some c ≥ 100.
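The baseline being "shaved" is the textbook quadratic dynamic program for LCS. The sketch below shows the standard recurrence (not code from the paper); the polylog-faster variants speed up this table, e.g. with Four-Russians-style bit tricks, without changing the recurrence:

```python
def lcs_length(a, b):
    """Classic O(n^2) dynamic program for Longest Common Subsequence.

    dp[i][j] = length of an LCS of the prefixes a[:i] and b[:j].
    """
    n, m = len(a), len(b)
    dp = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            if a[i - 1] == b[j - 1]:
                # Matching characters extend the best common subsequence.
                dp[i][j] = dp[i - 1][j - 1] + 1
            else:
                # Otherwise drop a character from one of the strings.
                dp[i][j] = max(dp[i - 1][j], dp[i][j - 1])
    return dp[n][m]

print(lcs_length("ABCBDAB", "BDCABA"))  # 4
```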

    Linear-time protein 3-D structure searching with insertions and deletions

    Background: Two biomolecular 3-D structures are said to be similar if the RMSD (root mean square deviation) between the two molecules' sequences of 3-D coordinates is less than or equal to some given constant bound. Tools for searching for similar structures in biomolecular 3-D structure databases are becoming increasingly important in the structural biology of the post-genomic era.
Results: We consider an important, fundamental problem: reporting all substructures in a 3-D structure database of chain molecules (such as proteins) which are similar to a given query 3-D structure, with consideration of indels (i.e., insertions and deletions). This problem has been believed to be very difficult, but its exact computational complexity has not been known. In this paper, we first prove that the problem in unbounded dimensions is NP-hard. We then propose a new algorithm that dramatically improves the average-case time complexity of the problem in 3-D when the number of indels k is bounded by a constant. Our algorithm solves the above problem for a query of size m and a database of size N in average-case O(N) time, whereas the time complexity of the previously best algorithm was O(Nm^{k+1}).
Conclusions: Our results show that although the problem of searching for similar structures in a database based on the RMSD measure with indels is NP-hard in the case of unbounded dimensions, it can be solved in 3-D by a simple average-case linear time algorithm when the number of indels is bounded by a constant.
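For intuition about the similarity measure involved, the RMSD between two equal-length coordinate chains after optimal superposition is classically computed with the Kabsch algorithm. The NumPy sketch below is illustrative only; the paper's algorithm additionally handles indels and database-scale search:

```python
import numpy as np

def rmsd(P, Q):
    """Minimum RMSD between two equal-length coordinate chains after
    optimal superposition (Kabsch algorithm). P, Q: (n, 3) arrays."""
    P = P - P.mean(axis=0)                      # remove translation
    Q = Q - Q.mean(axis=0)
    U, S, Vt = np.linalg.svd(P.T @ Q)           # cross-covariance SVD
    d = np.sign(np.linalg.det(Vt.T @ U.T))      # guard against reflection
    R = Vt.T @ np.diag([1.0, 1.0, d]) @ U.T     # optimal rotation
    return float(np.sqrt(((P @ R.T - Q) ** 2).sum() / len(P)))

# A chain and a rotated-plus-translated copy of it have RMSD ~ 0.
P = np.array([[0.0, 0.0, 0.0], [1.0, 0.0, 0.0],
              [1.0, 1.0, 0.0], [1.0, 1.0, 1.0]])
t = np.pi / 3
Rz = np.array([[np.cos(t), -np.sin(t), 0.0],
               [np.sin(t),  np.cos(t), 0.0],
               [0.0, 0.0, 1.0]])
print(round(rmsd(P, P @ Rz.T + 2.0), 6))  # 0.0
```

A naive database search would slide a window of the query's length along every chain and report windows with rmsd below the bound; the difficulty addressed in the paper is doing this efficiently while also allowing up to k insertions and deletions.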

    Fine-grained complexity and algorithm engineering of geometric similarity measures

    Point sets and sequences are fundamental geometric objects that arise in any application that considers movement data, geometric shapes, and much more. A crucial task on these objects is to measure their similarity. This thesis therefore presents results on algorithms, complexity lower bounds, and algorithm engineering for the most important point set and sequence similarity measures: the Fréchet distance, the Fréchet distance under translation, and the Hausdorff distance under translation. As an extension of the mere computation of similarity, the approximate near neighbor problem for the continuous Fréchet distance on time series is also considered, and matching upper and lower bounds are shown.
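As a concrete baseline for these measures, the discrete Fréchet distance admits a simple O(nm) dynamic program due to Eiter and Mannila; the continuous and under-translation variants studied in the thesis build considerably more machinery on top of this idea. A minimal sketch:

```python
import math
from functools import lru_cache

def discrete_frechet(P, Q):
    """Discrete Fréchet distance between two polygonal curves given as
    sequences of point tuples (Eiter-Mannila O(nm) dynamic program)."""

    @lru_cache(maxsize=None)
    def c(i, j):
        d = math.dist(P[i], Q[j])       # cost of matching P[i] with Q[j]
        if i == 0 and j == 0:
            return d
        if i == 0:
            return max(c(0, j - 1), d)
        if j == 0:
            return max(c(i - 1, 0), d)
        # Advance along P, along Q, or along both; keep the best history.
        return max(min(c(i - 1, j), c(i, j - 1), c(i - 1, j - 1)), d)

    return c(len(P) - 1, len(Q) - 1)

P = [(0.0, 0.0), (1.0, 0.0), (2.0, 0.0)]
Q = [(0.0, 1.0), (1.0, 2.0), (2.0, 1.0)]
print(discrete_frechet(P, Q))  # 2.0
```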

    Computing the Fréchet distance between uncertain curves in one dimension.

    We consider the problem of computing the Fréchet distance between two curves for which the exact locations of the vertices are unknown. Each vertex may be placed in a given uncertainty region for that vertex, and the objective is to place vertices so as to minimise the Fréchet distance. This problem was recently shown to be NP-hard in 2D, and it is unclear how to compute an optimal vertex placement at all. We present the first general algorithmic framework for this problem. We prove that it results in a polynomial-time algorithm for curves in 1D with intervals as uncertainty regions. In contrast, we show that the problem is NP-hard in 1D in the case that vertices are placed to maximise the Fréchet distance. We also study the weak Fréchet distance between uncertain curves. While finding the optimal placement of vertices seems more difficult than the regular Fréchet distance—and indeed we can easily prove that the problem is NP-hard in 2D—the optimal placement of vertices in 1D can be computed in polynomial time. Finally, we investigate the discrete weak Fréchet distance, for which, somewhat surprisingly, the problem is NP-hard already in 1D.
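To make the setting concrete: with intervals as uncertainty regions, any placement of the vertices yields pairwise distances at least as large as the gaps between the corresponding intervals, so running the standard discrete-Fréchet recurrence on those gaps lower-bounds the minimal distance over all placements. The sketch below is only this simple bound for intuition, not the polynomial-time optimal-placement algorithm of the paper:

```python
def interval_gap(I, J):
    """Smallest possible |x - y| with x in interval I and y in J."""
    (a, b), (c, d) = I, J
    return max(0.0, c - b, a - d)

def frechet_lower_bound(P, Q):
    """Lower bound on the min-placement discrete Fréchet distance of two
    uncertain 1-D curves whose vertices are intervals (lo, hi)."""
    n, m = len(P), len(Q)
    INF = float("inf")
    dp = [[INF] * m for _ in range(n)]
    for i in range(n):
        for j in range(m):
            g = interval_gap(P[i], Q[j])
            if i == 0 and j == 0:
                dp[i][j] = g
            else:
                # Standard discrete-Fréchet predecessors.
                prev = min(dp[i - 1][j] if i else INF,
                           dp[i][j - 1] if j else INF,
                           dp[i - 1][j - 1] if i and j else INF)
                dp[i][j] = max(prev, g)
    return dp[-1][-1]

P = [(0.0, 1.0), (2.0, 3.0)]
Q = [(0.5, 0.5), (5.0, 6.0)]   # a degenerate interval is an exact vertex
print(frechet_lower_bound(P, Q))  # 2.0
```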

    Geometrical Road Segmentation and Clustering

    Region-based analysis is fundamental and crucial in many geospatial applications and research themes, such as traffic analysis, human mobility study and urban planning. This thesis examines various methods for road segmentation, together with structures that can identify the similarities and the morphology of the generated segments. To achieve these tasks, a research study was conducted to detect possible approaches that can lead to a successful division. Compared to previous studies that focus on segmenting road trajectories, in this research the segmentation of roads is supported by tracking the junctions and the variation of curvature among the roads. Data structures such as hash tables and sets of objects were implemented in order to divide the roads into segments. The basic criteria for road comparison emerge from the application of locality-sensitive hashing and cluster analysis. Moreover, for the process of aligning segments by translation and rotation, we designed a set of methods that examine the deviation in their morphology. Finally, a number of experiments were conducted that retrieve the segments by dividing the roads and determine the suitable heuristic for each classification. We compared the findings from our experiments and concluded that the best results for high-performance roads were achieved when segmentation by junctions was applied. For low-performance or link roads, the curvature-based heuristic offered the best results.
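The alignment step can be sketched with plain 2-D orthogonal Procrustes: translate both segments to their centroids, solve for the best rotation angle in closed form, and report the residual deviation. This is a hypothetical stand-in for the thesis's morphology comparison, with our own function names and the assumption that both segments are sampled with the same number of points:

```python
import math

def align(seg_a, seg_b):
    """Align 2-D polyline seg_b to seg_a by translation and rotation
    and return the root-mean-square residual deviation."""
    n = len(seg_a)
    # Centre both segments on their centroids (removes translation).
    ax = sum(x for x, _ in seg_a) / n; ay = sum(y for _, y in seg_a) / n
    bx = sum(x for x, _ in seg_b) / n; by = sum(y for _, y in seg_b) / n
    A = [(x - ax, y - ay) for x, y in seg_a]
    B = [(x - bx, y - by) for x, y in seg_b]
    # Closed-form optimal rotation angle from the 2-D cross-covariance.
    num = sum(ya * xb - xa * yb for (xa, ya), (xb, yb) in zip(A, B))
    den = sum(xa * xb + ya * yb for (xa, ya), (xb, yb) in zip(A, B))
    theta = math.atan2(num, den)
    c, s = math.cos(theta), math.sin(theta)
    B_rot = [(x * c - y * s, x * s + y * c) for x, y in B]
    # Residual deviation in morphology after alignment.
    return math.sqrt(sum((xa - xb) ** 2 + (ya - yb) ** 2
                         for (xa, ya), (xb, yb) in zip(A, B_rot)) / n)

seg_a = [(0.0, 0.0), (1.0, 0.0), (1.0, 1.0)]
seg_b = [(5.0, 5.0), (5.0, 6.0), (4.0, 6.0)]   # seg_a rotated 90° and shifted
print(round(align(seg_a, seg_b), 9))  # 0.0
```

Segments whose residual falls below a threshold would then be placed in the same morphological cluster.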

    Efficient Fréchet distance queries for segments

    We study the problem of constructing a data structure that can store a two-dimensional polygonal curve P, such that for any query segment ab one can efficiently compute the Fréchet distance between P and ab. First we present a data structure of size O(n log n) that can compute the Fréchet distance between P and a horizontal query segment ab in O(log n) time, where n is the number of vertices of P. In comparison to prior work, this significantly reduces the required space. We extend the type of queries allowed, as we allow a query to be a horizontal segment ab together with two points s, t ∈ P (not necessarily vertices), and ask for the Fréchet distance between ab and the curve of P in between s and t. Using O(n log^2 n) storage, such queries take O(log^3 n) time, simplifying and significantly improving previous results. We then generalize our results to query segments of arbitrary orientation. We present an O(nk^{3+ϵ} + n^2) size data structure, where k ∈ [1, n] is a parameter the user can choose, and ϵ > 0 is an arbitrarily small constant, such that given any segment ab and two points s, t ∈ P we can compute the Fréchet distance between ab and the curve of P in between s and t in O((n/k) log^2 n + log^4 n) time. This is the first result that allows efficient exact Fréchet distance queries for arbitrarily oriented segments. We also present two applications of our data structure. First, we show that our data structure allows us to compute a local δ-simplification (with respect to the Fréchet distance) of a polygonal curve in O(n^{5/2+ϵ}) time, improving a previous O(n^3) time algorithm. Second, we show that we can efficiently find a translation of an arbitrary query segment ab that minimizes the Fréchet distance with respect to a subcurve of P.