
    New error measures and methods for realizing protein graphs from distance data

    The interval Distance Geometry Problem (iDGP) consists in finding a realization in $\mathbb{R}^K$ of a simple undirected graph $G=(V,E)$ with nonnegative intervals assigned to the edges in such a way that, for each edge, the Euclidean distance between the realizations of the adjacent vertices is within the edge interval bounds. In this paper, we focus on the application to the conformation of proteins in space, which is a basic step in determining protein function: given interval estimations of some of the inter-atomic distances, find their shape. Among different families of methods for accomplishing this task, we look at mathematical programming based methods, which are well suited for dealing with intervals. The basic question we want to answer is: what is the best such method for the problem? The most meaningful error measure for evaluating solution quality is the coordinate root mean square deviation. We first introduce a new error measure which addresses a particular feature of protein backbones, i.e. many partial reflections also yield acceptable backbones. We then present a set of new and existing quadratic and semidefinite programming formulations of this problem, and a set of new and existing methods for solving these formulations. Finally, we perform a computational evaluation of all the feasible solver+formulation combinations according to new and existing error measures, finding that the best methodology is a new heuristic method based on multiplicative weights updates.
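The iDGP feasibility condition described in the abstract above can be illustrated with a minimal sketch. The function names (`interval_violations`, `is_realization`), the tolerance, and the toy data are assumptions made for illustration; this per-edge violation is a simple check of the interval bounds, not the error measure introduced in the paper.

```python
import math

def interval_violations(coords, intervals):
    """Return, for each edge, how far the realized distance falls outside its interval.

    coords:    dict mapping vertex -> tuple of K floats (a candidate realization)
    intervals: dict mapping (u, v) -> (lower, upper), with 0 <= lower <= upper
    """
    violations = {}
    for (u, v), (lower, upper) in intervals.items():
        d = math.dist(coords[u], coords[v])
        # Zero when lower <= d <= upper, otherwise the gap to the nearest bound.
        violations[(u, v)] = max(lower - d, 0.0, d - upper)
    return violations

def is_realization(coords, intervals, tol=1e-9):
    """True if every edge distance lies within its interval (up to tol)."""
    return all(v <= tol for v in interval_violations(coords, intervals).values())

# Hypothetical toy instance in R^3 with three vertices and three edges.
coords = {"a": (0.0, 0.0, 0.0), "b": (1.0, 0.0, 0.0), "c": (0.0, 2.0, 0.0)}
intervals = {("a", "b"): (0.9, 1.1), ("a", "c"): (1.5, 1.8), ("b", "c"): (2.0, 2.5)}

violations = interval_violations(coords, intervals)
feasible = is_realization(coords, intervals)
```

In this toy instance the edge `("a", "c")` has realized distance 2.0, which exceeds its upper bound of 1.8, so the candidate is not a valid realization.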

    Polynomial-Time Amoeba Neighborhood Membership and Faster Localized Solving

    We derive efficient algorithms for coarse approximation of algebraic hypersurfaces, useful for estimating the distance between an input polynomial zero set and a given query point. Our methods work best on sparse polynomials of high degree (in any number of variables) but are nevertheless completely general. The underlying ideas, which we take the time to describe in an elementary way, come from tropical geometry. We thus reduce a hard algebraic problem to high-precision linear optimization, proving new upper and lower complexity estimates along the way. Comment: 15 pages, 9 figures. Submitted to a conference proceeding.

    Geometric Methods in Machine Learning and Data Mining

    In machine learning, the standard goal is to find an appropriate statistical model from a model space based on the training data from a data space, while in data mining, the goal is to find interesting patterns in the data from a data space. In both fields, these spaces carry geometric structures that can be exploited by methods designed around them (we shall call them geometric methods), or the problems themselves can be formulated in a way that naturally appeals to these methods. In such cases, studying these geometric structures and then using appropriate geometric methods not only gives insight into existing algorithms, but also helps build new and better algorithms. In my research, I develop methods that exploit the geometric structure of a variety of machine learning and data mining problems, and provide strong theoretical and empirical evidence in favor of using them. My dissertation is divided into two parts. In the first part, I develop algorithms to solve a well-known problem in data mining, namely the distance embedding problem. In particular, I use tools from computational geometry to build a unified framework for solving a distance embedding problem known as multidimensional scaling (MDS). This geometry-inspired framework results in algorithms that can solve different variants of MDS better than previous state-of-the-art methods. In addition, these algorithms come with many other attractive properties: they are simple, intuitive, easily parallelizable, scalable, and can handle missing data. Furthermore, I extend my unified MDS framework to build scalable algorithms for dimensionality reduction, and also to solve a sensor network localization problem for mobile sensors. Experimental results show the effectiveness of this framework across all problems.
In the second part of my dissertation, I turn to problems in machine learning. In particular, I use geometry to reason about conjugate priors, develop a model that hybridizes discriminative and generative frameworks, and build a new set of generative-process-driven kernels. More specifically, this part of my dissertation is devoted to the study of the geometry of the space of probabilistic models associated with statistical generative processes. This study, based on theory well grounded in information geometry, allows me to reason about the appropriateness of conjugate priors from a geometric perspective, and hence gain insight into the large number of existing models that rely on these priors. Furthermore, I use this study to build hybrid models more naturally, i.e., by combining discriminative and generative methods using the geometry underlying them, and also to build a family of kernels called generative kernels that can be used as an off-the-shelf tool in any kernel learning method such as support vector machines. My experiments with generative kernels demonstrate their effectiveness, providing further evidence in favor of using geometric methods.
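To make the multidimensional scaling problem named in the first part of this abstract concrete, here is a minimal sketch of the raw-stress objective with plain gradient descent. This is a generic textbook approach, swapped in purely for illustration; it is not the dissertation's geometric framework. The function names, the step size, and the toy data are assumptions, and missing distances are marked with `None` to mirror the abstract's mention of missing data.

```python
import math

def stress(points, target):
    """Raw stress: sum over known pairs of (embedded distance - target distance)^2."""
    total = 0.0
    n = len(points)
    for i in range(n):
        for j in range(i + 1, n):
            if target[i][j] is None:  # missing distance: skip this pair
                continue
            total += (math.dist(points[i], points[j]) - target[i][j]) ** 2
    return total

def gradient_step(points, target, step=0.05):
    """Move every point one step along the negative gradient of the raw stress."""
    n, k = len(points), len(points[0])
    new_points = []
    for i in range(n):
        grad = [0.0] * k
        for j in range(n):
            if i == j or target[i][j] is None:
                continue
            d = math.dist(points[i], points[j])
            if d == 0.0:  # coincident points: gradient undefined, skip the pair
                continue
            coeff = 2.0 * (d - target[i][j]) / d
            for c in range(k):
                grad[c] += coeff * (points[i][c] - points[j][c])
        new_points.append([points[i][c] - step * grad[c] for c in range(k)])
    return new_points

# Hypothetical target: a unit square with one diagonal distance missing.
r2 = math.sqrt(2.0)
target = [
    [0.0, 1.0, r2, 1.0],
    [1.0, 0.0, 1.0, None],
    [r2, 1.0, 0.0, 1.0],
    [1.0, None, 1.0, 0.0],
]
points = [[0.0, 0.0], [0.8, 0.1], [1.2, 0.9], [-0.1, 1.3]]

initial_stress = stress(points, target)
for _ in range(200):
    points = gradient_step(points, target)
final_stress = stress(points, target)
```

Comparing `final_stress` with `initial_stress` shows whether the embedding moved closer to the target distances; the fixed step size is a simplification that a practical solver would replace with a line search or a majorization scheme.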

    Protein structure determination via an efficient geometric build-up algorithm

    Background: A protein structure can be determined by solving a so-called distance geometry problem whenever a set of inter-atomic distances is available and sufficient. However, the problem is intractable in general and has been proved to be NP-hard. An updated geometric build-up algorithm (UGB) has been developed recently that controls numerical errors and is efficient in protein structure determination for cases where only sparse exact distance data is available. In this paper, the UGB method has been improved and revised with the aim of solving distance geometry problems more efficiently and effectively. Methods: An efficient algorithm, called the revised updated geometric build-up algorithm (RUGB), to build up a protein structure from atomic distance data is presented and provides an effective way of determining a protein structure with sparse exact distance data. In the algorithm, the condition to determine an unpositioned atom iteratively is relaxed (when compared with the UGB algorithm) and data structure techniques are used to make the algorithm more efficient and effective. The algorithm is tested on a set of proteins selected randomly from the Protein Structure Database (PDB). Results: We test a set of proteins selected randomly from the Protein Structure Database (PDB). We show that the numerical errors produced by the new RUGB algorithm are smaller than those of the UGB algorithm and that the RUGB algorithm has a significantly smaller runtime than the UGB algorithm. Conclusions: The RUGB algorithm relaxes the condition for updating and incorporates a data structure for accessing the neighbours of an atom. The revisions result in an improvement over the UGB algorithm in two important areas: a reduction in the overall runtime and a decrease in the numerical error. Peer Reviewed
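The basic positioning step behind geometric build-up methods can be sketched as follows: an unpositioned atom with exact distances to four already positioned, non-coplanar atoms is located by solving a 3x3 linear system obtained by subtracting one squared-distance equation from the other three. This is a minimal illustration of that standard step only, not the UGB or RUGB algorithm; the names `det3` and `place_atom`, the coplanarity threshold, and the toy coordinates are assumptions.

```python
import math

def det3(m):
    """Determinant of a 3x3 matrix given as a list of rows."""
    return (m[0][0] * (m[1][1] * m[2][2] - m[1][2] * m[2][1])
            - m[0][1] * (m[1][0] * m[2][2] - m[1][2] * m[2][0])
            + m[0][2] * (m[1][0] * m[2][1] - m[1][1] * m[2][0]))

def place_atom(anchors, dists):
    """Locate a point in R^3 from exact distances to four non-coplanar anchors.

    From |x - p_i|^2 = d_i^2, subtracting the i = 0 equation gives the linear system
    2 (p_i - p_0) . x = |p_i|^2 - |p_0|^2 - d_i^2 + d_0^2   for i = 1, 2, 3.
    """
    p0, d0 = anchors[0], dists[0]
    sq0 = sum(c * c for c in p0)
    A, b = [], []
    for p, d in zip(anchors[1:], dists[1:]):
        A.append([2.0 * (p[c] - p0[c]) for c in range(3)])
        b.append(sum(c * c for c in p) - sq0 - d * d + d0 * d0)
    det = det3(A)
    if abs(det) < 1e-12:
        raise ValueError("anchor atoms are (nearly) coplanar")
    # Cramer's rule: replace one column of A by b at a time.
    x = []
    for col in range(3):
        Ac = [row[:] for row in A]
        for r in range(3):
            Ac[r][col] = b[r]
        x.append(det3(Ac) / det)
    return tuple(x)

# Hypothetical example: recover a known position from its four distances.
anchors = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (0.0, 1.0, 0.0), (0.0, 0.0, 1.0)]
true_position = (1.0, 2.0, 3.0)
dists = [math.dist(true_position, p) for p in anchors]
recovered = place_atom(anchors, dists)
```

Repeating this step atom by atom accumulates rounding error, which is the kind of numerical error the abstract says the UGB and RUGB algorithms are designed to control.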

    The Friendly Settlement of Human Rights Abuses in the Americas

    We present a new method for estimation of seismic coda shape. It falls into the same class of methods as non-parametric shape reconstruction with the use of neural network techniques, where data are split into training and validation data sets. We particularly pursue the well-known problem of image reconstruction, formulated in this case as shape isolation in the presence of a broadly defined noise. This combined approach is enabled by an intrinsic feature of the seismogram, which can be divided objectively into a pre-signal seismic noise lacking the target shape, and the remainder that contains scattered waveforms compounding the coda shape. In short, we separately apply a shape restoration procedure to the pre-signal seismic noise and the event record, which provides successful delineation of the coda shape in the form of a smooth, almost non-oscillating function of time. The new algorithm uses a recently developed generalization of the classical computational-geometry tool of alpha-shape. The generalization essentially yields robust shape estimation by locally ignoring a number of points treated as extreme values, noise or non-relevant data. Our algorithm is conceptually simple and enables the desired or pre-determined level of shape detail, constrainable by arbitrary data-fit criteria. The proposed tool for coda shape delineation provides an alternative to moving averaging and/or other smoothing techniques frequently used for this purpose. The new algorithm is illustrated with an application to the problem of estimating the coda duration after a local event. The obtained relation coefficient between coda duration and epicentral distance is consistent with the earlier findings in the region of interest.