187 research outputs found

    Computational Approaches to Simulation and Analysis of Large Conformational Transitions in Proteins

    In a typical living cell, millions to billions of proteins—nanomachines that fluctuate and cycle among many conformational states—convert available free energy into mechanochemical work. A fundamental goal of biophysics is to ascertain how 3D protein structures encode specific functions, such as catalyzing chemical reactions or transporting nutrients into a cell. Protein dynamics span femtosecond timescales (i.e., covalent bond oscillations) to large conformational transition timescales in, and beyond, the millisecond regime (e.g., glucose transport across a phospholipid bilayer). Actual transition events are fast but rare, occurring orders of magnitude faster than typical metastable equilibrium waiting times. Equilibrium molecular dynamics (EqMD) can capture atomistic detail and solute-solvent interactions, but even the microseconds of sampling attainable today still fall orders of magnitude short of transition timescales, especially for large systems, rendering observations of such "rare events" difficult or effectively impossible. Advanced path-sampling methods exploit reduced physical models or biasing to produce plausible transitions while balancing accuracy and efficiency, but quantifying their accuracy relative to other numerical and experimental data has been challenging. Indeed, new horizons in elucidating protein function necessitate that present methodologies be revised to more seamlessly and quantitatively integrate a spectrum of methods, both numerical and experimental. In this dissertation, experimental and computational methods are put into perspective using the enzyme adenylate kinase (AdK) as an illustrative example. We introduce Path Similarity Analysis (PSA)—an integrative computational framework developed to quantify transition path similarity.
PSA not only reliably distinguished AdK transitions by the originating method, but also traced pathway differences between two methods back to charge-charge interactions (neglected by the stereochemical model, but not by the all-atom force field) in several conserved salt bridges. Cryo-electron microscopy maps of the transporter Bor1p are directly incorporated into EqMD simulations using MD flexible fitting to produce viable structural models and infer a plausible transport mechanism. Conforming to the theme of integration, a short compendium of an exploratory project, developing a hybrid atomistic-continuum method, is presented, including initial results and a novel fluctuating hydrodynamics model and corresponding numerical code.
Doctoral Dissertation, Physics, 201
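The core idea of comparing transition paths can be illustrated with the Hausdorff distance between two paths, each represented as a sequence of configuration-space frames. This is a minimal NumPy sketch of one path metric, not the PSA implementation itself; the toy 2-D "paths" below stand in for high-dimensional protein configurations:

```python
import numpy as np

def hausdorff(P, Q):
    """Symmetric Hausdorff distance between two paths.

    P, Q: (n, d) and (m, d) arrays of configuration-space frames
    (e.g. flattened Cartesian coordinates of selected atoms).
    """
    # Pairwise distances between every frame of P and every frame of Q.
    D = np.linalg.norm(P[:, None, :] - Q[None, :, :], axis=-1)
    # Directed distances: how far the farthest frame of one path is
    # from the nearest frame of the other.
    d_PQ = D.min(axis=1).max()
    d_QP = D.min(axis=0).max()
    return max(d_PQ, d_QP)

# Two toy "transition paths" in a 2-D configuration space.
P = np.array([[0.0, 0.0], [1.0, 0.0], [2.0, 0.0]])
Q = np.array([[0.0, 1.0], [1.0, 1.0], [2.0, 1.0]])
print(hausdorff(P, Q))  # 1.0
```

A framework like PSA computes such pairwise path distances for an ensemble of transitions and clusters the resulting distance matrix to group paths by generating method.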

    STATISTICS IN THE BILLERA-HOLMES-VOGTMANN TREESPACE

    This dissertation is an effort to adapt two classical non-parametric statistical techniques, kernel density estimation (KDE) and principal components analysis (PCA), to the Billera-Holmes-Vogtmann (BHV) metric space for phylogenetic trees. This adaptation gives a more general framework for developing and testing various hypotheses about apparent differences or similarities between sets of phylogenetic trees than currently exists. For example, while the majority of gene histories found in a clade of organisms are expected to be generated by a common evolutionary process, numerous other coexisting processes (e.g. horizontal gene transfers, gene duplication and subsequent neofunctionalization) will cause some genes to exhibit a history quite distinct from the histories of the majority of genes. Such “outlying” gene trees are considered to be biologically interesting, and identifying these genes has become an important problem in phylogenetics. The R software package kdetrees, developed in Chapter 2, contains an implementation of the kernel density estimation method. The primary theoretical difficulty involved in this adaptation concerns the normalization of the kernel functions in the BHV metric space. This problem is addressed in Chapter 3. In both chapters, the software package is applied to both simulated and empirical datasets to demonstrate the properties of the method. A few first theoretical steps in the adaptation of principal components analysis to the BHV space are presented in Chapter 4. It becomes necessary to generalize the notion of a set of perpendicular vectors in Euclidean space to the BHV metric space, but there is some ambiguity about how best to proceed. We show that convex hulls are one reasonable approach to the problem. The Nye PCA algorithm provides a method of projecting onto arbitrary convex hulls in BHV space, providing the core of a modified PCA-type method.
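The KDE-based outlier idea generalizes to any metric space, since the estimator needs only pairwise distances. The sketch below is illustrative only: plain numbers and absolute difference stand in for trees and the BHV geodesic distance, and the Gaussian kernel is left unnormalised (the normalization question is precisely what Chapter 3 addresses):

```python
import math

def kde_score(x, sample, dist, h):
    """Unnormalised Gaussian-kernel density estimate at x, using only a
    metric dist on the space (for trees, the BHV geodesic distance)."""
    return sum(math.exp(-0.5 * (dist(x, t) / h) ** 2) for t in sample) / len(sample)

def outliers(sample, dist, h, frac=0.05):
    """Flag the lowest-density members of sample (leave-one-out), the
    same idea kdetrees uses to find outlying gene trees."""
    scored = []
    for i, t in enumerate(sample):
        rest = sample[:i] + sample[i + 1:]
        scored.append((kde_score(t, rest, dist, h), t))
    scored.sort(key=lambda p: p[0])
    k = max(1, int(frac * len(sample)))
    return [t for _, t in scored[:k]]

# Toy example: numbers on a line stand in for trees, |a - b| for the
# BHV distance. The isolated point is flagged as the "outlying tree".
pts = [0.0, 0.1, 0.2, 0.15, 5.0]
print(outliers(pts, lambda a, b: abs(a - b), h=0.5))  # [5.0]
```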

    Tighter Connections Between Formula-SAT and Shaving Logs

    A noticeable fraction of Algorithms papers in the last few decades improve the running time of well-known algorithms for fundamental problems by logarithmic factors. For example, the O(n^2) dynamic programming solution to the Longest Common Subsequence problem (LCS) was improved to O(n^2 / log^2 n) in several ways and using a variety of ingenious tricks. This line of research, also known as "the art of shaving log factors", lacks a tool for proving negative results. Specifically, how can we show that it is unlikely that LCS can be solved in time O(n^2 / log^3 n)? Perhaps the only approach for such results was suggested in a recent paper of Abboud, Hansen, Vassilevska W. and Williams (STOC'16). The authors blame the hardness of shaving logs on the hardness of solving satisfiability on Boolean formulas (Formula-SAT) faster than exhaustive search. They show that an O(n^2 / log^{1000} n) algorithm for LCS would imply a major advance in circuit lower bounds. Whether this approach can lead to tighter barriers was unclear. In this paper, we push this approach to its limit and, in particular, prove that a well-known barrier from complexity theory stands in the way of shaving five additional log factors for fundamental combinatorial problems. For LCS, regular expression pattern matching, as well as the Fréchet distance problem from Computational Geometry, we show that an O(n^2 / log^{7+ε} n) runtime would imply new Formula-SAT algorithms. Our main result is a reduction from SAT on formulas of size s over n variables to LCS on sequences of length N = 2^{n/2} · s^{1+o(1)}. Our reduction is essentially as efficient as possible, and it greatly improves the previously known reduction for LCS with N = 2^{n/2} · s^c, for some c ≥ 100.
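The baseline being "shaved" is the textbook quadratic dynamic program for LCS. The sketch below shows the standard recurrence (not code from the paper); the polylog-faster variants speed up this table, e.g. with Four-Russians-style bit tricks, without changing the recurrence:

```python
def lcs_length(a, b):
    """Classic O(n^2) dynamic program for Longest Common Subsequence.

    dp[i][j] = length of an LCS of the prefixes a[:i] and b[:j].
    """
    n, m = len(a), len(b)
    dp = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            if a[i - 1] == b[j - 1]:
                # Matching characters extend the best common subsequence.
                dp[i][j] = dp[i - 1][j - 1] + 1
            else:
                # Otherwise drop a character from one of the strings.
                dp[i][j] = max(dp[i - 1][j], dp[i][j - 1])
    return dp[n][m]

print(lcs_length("ABCBDAB", "BDCABA"))  # 4
```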

    Linear-time protein 3-D structure searching with insertions and deletions

    Background: Two biomolecular 3-D structures are said to be similar if the RMSD (root mean square deviation) between the two molecules' sequences of 3-D coordinates is less than or equal to some given constant bound. Tools for searching for similar structures in biomolecular 3-D structure databases are becoming increasingly important in the structural biology of the post-genomic era.
Results: We consider an important, fundamental problem: reporting all substructures in a 3-D structure database of chain molecules (such as proteins) which are similar to a given query 3-D structure, with consideration of indels (i.e., insertions and deletions). This problem has been believed to be very difficult, but its exact computational complexity has not been known. In this paper, we first prove that the problem in unbounded dimensions is NP-hard. We then propose a new algorithm that dramatically improves the average-case time complexity of the problem in 3-D when the number of indels k is bounded by a constant. Our algorithm solves the above problem for a query of size m and a database of size N in average-case O(N) time, whereas the time complexity of the previously best algorithm was O(Nm^{k+1}).
Conclusions: Our results show that although the problem of searching for similar structures in a database based on the RMSD measure with indels is NP-hard in the case of unbounded dimensions, it can be solved in 3-D by a simple average-case linear time algorithm when the number of indels is bounded by a constant.
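For intuition about the similarity measure involved, the RMSD between two equal-length coordinate chains after optimal superposition is classically computed with the Kabsch algorithm. The NumPy sketch below is illustrative only; the paper's algorithm additionally handles indels and database-scale search:

```python
import numpy as np

def rmsd(P, Q):
    """Minimum RMSD between two equal-length coordinate chains after
    optimal superposition (Kabsch algorithm). P, Q: (n, 3) arrays."""
    P = P - P.mean(axis=0)                      # remove translation
    Q = Q - Q.mean(axis=0)
    U, S, Vt = np.linalg.svd(P.T @ Q)           # cross-covariance SVD
    d = np.sign(np.linalg.det(Vt.T @ U.T))      # guard against reflection
    R = Vt.T @ np.diag([1.0, 1.0, d]) @ U.T     # optimal rotation
    return float(np.sqrt(((P @ R.T - Q) ** 2).sum() / len(P)))

# A chain and a rotated-plus-translated copy of it have RMSD ~ 0.
P = np.array([[0.0, 0.0, 0.0], [1.0, 0.0, 0.0],
              [1.0, 1.0, 0.0], [1.0, 1.0, 1.0]])
t = np.pi / 3
Rz = np.array([[np.cos(t), -np.sin(t), 0.0],
               [np.sin(t),  np.cos(t), 0.0],
               [0.0, 0.0, 1.0]])
print(round(rmsd(P, P @ Rz.T + 2.0), 6))  # 0.0
```

A naive database search would slide a window of the query's length along every chain and report windows with rmsd below the bound; the difficulty addressed in the paper is doing this efficiently while also allowing up to k insertions and deletions.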

    Fine-grained complexity and algorithm engineering of geometric similarity measures

    Point sets and sequences are fundamental geometric objects that arise in any application that considers movement data, geometric shapes, and much more. A crucial task on these objects is to measure their similarity. This thesis therefore presents results on algorithms, complexity lower bounds, and algorithm engineering for the most important point set and sequence similarity measures: the Fréchet distance, the Fréchet distance under translation, and the Hausdorff distance under translation. As an extension of the mere computation of similarity, the approximate near neighbor problem for the continuous Fréchet distance on time series is also considered, and matching upper and lower bounds are shown.
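As a concrete baseline for these measures, the discrete Fréchet distance admits a simple O(nm) dynamic program due to Eiter and Mannila; the continuous and under-translation variants studied in the thesis build considerably more machinery on top of this idea. A minimal sketch:

```python
import math
from functools import lru_cache

def discrete_frechet(P, Q):
    """Discrete Fréchet distance between two polygonal curves given as
    sequences of point tuples (Eiter-Mannila O(nm) dynamic program)."""

    @lru_cache(maxsize=None)
    def c(i, j):
        d = math.dist(P[i], Q[j])       # cost of matching P[i] with Q[j]
        if i == 0 and j == 0:
            return d
        if i == 0:
            return max(c(0, j - 1), d)
        if j == 0:
            return max(c(i - 1, 0), d)
        # Advance along P, along Q, or along both; keep the best history.
        return max(min(c(i - 1, j), c(i, j - 1), c(i - 1, j - 1)), d)

    return c(len(P) - 1, len(Q) - 1)

P = [(0.0, 0.0), (1.0, 0.0), (2.0, 0.0)]
Q = [(0.0, 1.0), (1.0, 2.0), (2.0, 1.0)]
print(discrete_frechet(P, Q))  # 2.0
```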

    Computing the Fréchet distance between uncertain curves in one dimension.

    We consider the problem of computing the Fréchet distance between two curves for which the exact locations of the vertices are unknown. Each vertex may be placed in a given uncertainty region for that vertex, and the objective is to place vertices so as to minimise the Fréchet distance. This problem was recently shown to be NP-hard in 2D, and it is unclear how to compute an optimal vertex placement at all. We present the first general algorithmic framework for this problem. We prove that it results in a polynomial-time algorithm for curves in 1D with intervals as uncertainty regions. In contrast, we show that the problem is NP-hard in 1D in the case that vertices are placed to maximise the Fréchet distance. We also study the weak Fréchet distance between uncertain curves. While finding the optimal placement of vertices seems more difficult than the regular Fréchet distance—and indeed we can easily prove that the problem is NP-hard in 2D—the optimal placement of vertices in 1D can be computed in polynomial time. Finally, we investigate the discrete weak Fréchet distance, for which, somewhat surprisingly, the problem is NP-hard already in 1D.
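To make the setting concrete: with intervals as uncertainty regions, any placement of the vertices yields pairwise distances at least as large as the gaps between the corresponding intervals, so running the standard discrete-Fréchet recurrence on those gaps lower-bounds the minimal distance over all placements. The sketch below is only this simple bound for intuition, not the polynomial-time optimal-placement algorithm of the paper:

```python
def interval_gap(I, J):
    """Smallest possible |x - y| with x in interval I and y in J."""
    (a, b), (c, d) = I, J
    return max(0.0, c - b, a - d)

def frechet_lower_bound(P, Q):
    """Lower bound on the min-placement discrete Fréchet distance of two
    uncertain 1-D curves whose vertices are intervals (lo, hi)."""
    n, m = len(P), len(Q)
    INF = float("inf")
    dp = [[INF] * m for _ in range(n)]
    for i in range(n):
        for j in range(m):
            g = interval_gap(P[i], Q[j])
            if i == 0 and j == 0:
                dp[i][j] = g
            else:
                # Standard discrete-Fréchet predecessors.
                prev = min(dp[i - 1][j] if i else INF,
                           dp[i][j - 1] if j else INF,
                           dp[i - 1][j - 1] if i and j else INF)
                dp[i][j] = max(prev, g)
    return dp[-1][-1]

P = [(0.0, 1.0), (2.0, 3.0)]
Q = [(0.5, 0.5), (5.0, 6.0)]   # a degenerate interval is an exact vertex
print(frechet_lower_bound(P, Q))  # 2.0
```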

    Geometrical Road Segmentation and Clustering

    Region-based analysis is fundamental and crucial in many geospatial applications and research themes, such as traffic analysis, human mobility study and urban planning. This thesis examines various methods for road segmentation, together with structures that can identify the similarities and the morphology of the generated segments. To achieve these tasks, a research study was conducted to detect possible approaches that can lead to a successful division. Compared to previous studies that focus on segmenting road trajectories, in this research the segmentation of roads is supported by tracking the junctions and the variation of curvature among the roads. Data structures such as hash tables and sets of objects were implemented in order to divide the roads into segments. The basic criteria for road comparison emerge from the application of locality-sensitive hashing and cluster analysis. Moreover, for the process of aligning segments by translation and rotation, we designed a set of methods that examine the deviation in their morphology. Finally, a number of experiments were conducted that retrieve the segments by dividing the roads and determine the suitable heuristic for each classification. We compared the findings from our experiments and concluded that the best results for high-performance roads were achieved when segmentation by junctions was applied. For low-performance or link roads, the curvature-based heuristic offered the best results.
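The alignment step can be sketched with plain 2-D orthogonal Procrustes: translate both segments to their centroids, solve for the best rotation angle in closed form, and report the residual deviation. This is a hypothetical stand-in for the thesis's morphology comparison, with our own function names and the assumption that both segments are sampled with the same number of points:

```python
import math

def align(seg_a, seg_b):
    """Align 2-D polyline seg_b to seg_a by translation and rotation
    and return the root-mean-square residual deviation."""
    n = len(seg_a)
    # Centre both segments on their centroids (removes translation).
    ax = sum(x for x, _ in seg_a) / n; ay = sum(y for _, y in seg_a) / n
    bx = sum(x for x, _ in seg_b) / n; by = sum(y for _, y in seg_b) / n
    A = [(x - ax, y - ay) for x, y in seg_a]
    B = [(x - bx, y - by) for x, y in seg_b]
    # Closed-form optimal rotation angle from the 2-D cross-covariance.
    num = sum(ya * xb - xa * yb for (xa, ya), (xb, yb) in zip(A, B))
    den = sum(xa * xb + ya * yb for (xa, ya), (xb, yb) in zip(A, B))
    theta = math.atan2(num, den)
    c, s = math.cos(theta), math.sin(theta)
    B_rot = [(x * c - y * s, x * s + y * c) for x, y in B]
    # Residual deviation in morphology after alignment.
    return math.sqrt(sum((xa - xb) ** 2 + (ya - yb) ** 2
                         for (xa, ya), (xb, yb) in zip(A, B_rot)) / n)

seg_a = [(0.0, 0.0), (1.0, 0.0), (1.0, 1.0)]
seg_b = [(5.0, 5.0), (5.0, 6.0), (4.0, 6.0)]   # seg_a rotated 90° and shifted
print(round(align(seg_a, seg_b), 9))  # 0.0
```

Segments whose residual falls below a threshold would then be placed in the same morphological cluster.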

    Efficient Fréchet distance queries for segments

    We study the problem of constructing a data structure that can store a two-dimensional polygonal curve P, such that for any query segment ab one can efficiently compute the Fréchet distance between P and ab. First we present a data structure of size O(n log n) that can compute the Fréchet distance between P and a horizontal query segment ab in O(log n) time, where n is the number of vertices of P. In comparison to prior work, this significantly reduces the required space. We extend the type of queries allowed, as we allow a query to be a horizontal segment ab together with two points s, t ∈ P (not necessarily vertices), and ask for the Fréchet distance between ab and the curve of P in between s and t. Using O(n log^2 n) storage, such queries take O(log^3 n) time, simplifying and significantly improving previous results. We then generalize our results to query segments of arbitrary orientation. We present an O(nk^{3+ϵ} + n^2) size data structure, where k ∈ [1, n] is a parameter the user can choose, and ϵ > 0 is an arbitrarily small constant, such that given any segment ab and two points s, t ∈ P we can compute the Fréchet distance between ab and the curve of P in between s and t in O((n/k) log^2 n + log^4 n) time. This is the first result that allows efficient exact Fréchet distance queries for arbitrarily oriented segments. We also present two applications of our data structure. First, we show that our data structure allows us to compute a local δ-simplification (with respect to the Fréchet distance) of a polygonal curve in O(n^{5/2+ϵ}) time, improving a previous O(n^3) time algorithm. Second, we show that we can efficiently find a translation of an arbitrary query segment ab that minimizes the Fréchet distance with respect to a subcurve of P.