872 research outputs found

    The ARTS web server for aligning RNA tertiary structures

    Get PDF
    RNA molecules with common structural features may share similar functional properties. Structural comparison of RNAs and detection of common substructures is, thus, a highly important task. Nevertheless, the current available tools in the RNA community provide only a partial solution, since they either work at the 2D level or are suitable for detecting predefined or local contiguous tertiary motifs only. Here, we describe a web server built around ARTS, a method for aligning tertiary structures of nucleic acids (both RNA and DNA). ARTS receives a pair of 3D nucleic acid structures and searches for a priori unknown common substructures. The search is truly 3D and irrespective of the order of the nucleotides on the chain. The identified common substructures can be large global folds with hundreds and even thousands of nucleotides as well as small local motifs with at least two successive base pairs. The method is highly efficient and has been used to conduct an all-against-all comparison of all the RNA structures in the Protein Data Bank. The web server together with a software package for download are freely accessible at

    Learning Discriminative Stein Kernel for SPD Matrices and Its Applications

    Full text link
    Stein kernel has recently shown promising performance on classifying images represented by symmetric positive definite (SPD) matrices. It evaluates the similarity between two SPD matrices through their eigenvalues. In this paper, we argue that directly using the original eigenvalues may be problematic because: i) Eigenvalue estimation becomes biased when the number of samples is inadequate, which may lead to unreliable kernel evaluation; ii) More importantly, eigenvalues only reflect the property of an individual SPD matrix. They are not necessarily optimal for computing Stein kernel when the goal is to discriminate different sets of SPD matrices. To address the two issues in one shot, we propose a discriminative Stein kernel, in which an extra parameter vector is defined to adjust the eigenvalues of the input SPD matrices. The optimal parameter values are sought by optimizing a proxy of classification performance. To show the generality of the proposed method, three different kernel learning criteria that are commonly used in the literature are employed respectively as a proxy. A comprehensive experimental study is conducted on a variety of image classification tasks to compare our proposed discriminative Stein kernel with the original Stein kernel and other commonly used methods for evaluating the similarity between SPD matrices. The experimental results demonstrate that, the discriminative Stein kernel can attain greater discrimination and better align with classification tasks by altering the eigenvalues. This makes it produce higher classification performance than the original Stein kernel and other commonly used methods.Comment: 13 page

    Effective 3D Geometric Matching for Data Restoration and Its Forensic Application

    Get PDF
    3D geometric matching is the technique to detect the similar patterns among multiple objects. It is an important and fundamental problem and can facilitate many tasks in computer graphics and vision, including shape comparison and retrieval, data fusion, scene understanding and object recognition, and data restoration. For example, 3D scans of an object from different angles are matched and stitched together to form the complete geometry. In medical image analysis, the motion of deforming organs is modeled and predicted by matching a series of CT images. This problem is challenging and remains unsolved, especially when the similar patterns are 1) small and lack geometric saliency; 2) incomplete due to the occlusion of the scanning and damage of the data. We study the reliable matching algorithm that can tackle the above difficulties and its application in data restoration. Data restoration is the problem to restore the fragmented or damaged model to its original complete state. It is a new area and has direct applications in many scientific fields such as Forensics and Archeology. In this dissertation, we study novel effective geometric matching algorithms, including curve matching, surface matching, pairwise matching, multi-piece matching and template matching. We demonstrate its applications in an integrated digital pipeline of skull reassembly, skull completion, and facial reconstruction, which is developed to facilitate the state-of-the-art forensic skull/facial reconstruction processing pipeline in law enforcement

    Bayesian nonparametric clusterings in relational and high-dimensional settings with applications in bioinformatics.

    Get PDF
    Recent advances in high throughput methodologies offer researchers the ability to understand complex systems via high dimensional and multi-relational data. One example is the realm of molecular biology where disparate data (such as gene sequence, gene expression, and interaction information) are available for various snapshots of biological systems. This type of high dimensional and multirelational data allows for unprecedented detailed analysis, but also presents challenges in accounting for all the variability. High dimensional data often has a multitude of underlying relationships, each represented by a separate clustering structure, where the number of structures is typically unknown a priori. To address the challenges faced by traditional clustering methods on high dimensional and multirelational data, we developed three feature selection and cross-clustering methods: 1) infinite relational model with feature selection (FIRM) which incorporates the rich information of multirelational data; 2) Bayesian Hierarchical Cross-Clustering (BHCC), a deterministic approximation to Cross Dirichlet Process mixture (CDPM) and to cross-clustering; and 3) randomized approximation (RBHCC), based on a truncated hierarchy. An extension of BHCC, Bayesian Congruence Measuring (BCM), is proposed to measure incongruence between genes and to identify sets of congruent loci with identical evolutionary histories. We adapt our BHCC algorithm to the inference of BCM, where the intended structure of each view (congruent loci) represents consistent evolutionary processes. We consider an application of FIRM on categorizing mRNA and microRNA. The model uses latent structures to encode the expression pattern and the gene ontology annotations. We also apply FIRM to recover the categories of ligands and proteins, and to predict unknown drug-target interactions, where latent categorization structure encodes drug-target interaction, chemical compound similarity, and amino acid sequence similarity. BHCC and RBHCC are shown to have improved predictive performance (both in terms of cluster membership and missing value prediction) compared to traditional clustering methods. Our results suggest that these novel approaches to integrating multi-relational information have a promising future in the biological sciences where incorporating data related to varying features is often regarded as a daunting task

    Geometric, Feature-based and Graph-based Approaches for the Structural Analysis of Protein Binding Sites : Novel Methods and Computational Analysis

    Get PDF
    In this thesis, protein binding sites are considered. To enable the extraction of information from the space of protein binding sites, these binding sites must be mapped onto a mathematical space. This can be done by mapping binding sites onto vectors, graphs or point clouds. To finally enable a structure on the mathematical space, a distance measure is required, which is introduced in this thesis. This distance measure eventually can be used to extract information by means of data mining techniques

    Euclidean distance geometry and applications

    Full text link
    Euclidean distance geometry is the study of Euclidean geometry based on the concept of distance. This is useful in several applications where the input data consists of an incomplete set of distances, and the output is a set of points in Euclidean space that realizes the given distances. We survey some of the theory of Euclidean distance geometry and some of the most important applications: molecular conformation, localization of sensor networks and statics.Comment: 64 pages, 21 figure

    Decomposing and packing polygons / Dania el-Khechen.

    Get PDF
    In this thesis, we study three different problems in the field of computational geometry: the partitioning of a simple polygon into two congruent components, the partitioning of squares and rectangles into equal area components while minimizing the perimeter of the cuts, and the packing of the maximum number of squares in an orthogonal polygon. To solve the first problem, we present three polynomial time algorithms which given a simple polygon P partitions it, if possible, into two congruent and possibly nonsimple components P 1 and P 2 : an O ( n 2 log n ) time algorithm for properly congruent components and an O ( n 3 ) time algorithm for mirror congruent components. In our analysis of the second problem, we experimentally find new bounds on the optimal partitions of squares and rectangles into equal area components. The visualization of the best determined solutions allows us to conjecture some characteristics of a class of optimal solutions. Finally, for the third problem, we present three linear time algorithms for packing the maximum number of unit squares in three subclasses of orthogonal polygons: the staircase polygons, the pyramids and Manhattan skyline polygons. We also study a special case of the problem where the given orthogonal polygon has vertices with integer coordinates and the squares to pack are (2 {604} 2) squares. We model the latter problem with a binary integer program and we develop a system that produces and visualizes optimal solutions. The observation of such solutions aided us in proving some characteristics of a class of optimal solutions

    Alignment-free local structural search by writhe decomposition

    Get PDF
    Motivation: Rapid methods for protein structure search enable biological discoveries based on flexibly defined structural similarity, unleashing the power of the ever greater number of solved protein structures. Projection methods show promise for the development of fast structural database search solutions. Projection methods map a structure to a point in a high-dimensional space and compare two structures by measuring distance between their projected points. These methods offer a tremendous increase in speed over residue-level structural alignment methods. However, current projection methods are not practical, partly because they are unable to identify local similarities
    corecore