1,707 research outputs found
NASSAM: a server to search for and annotate tertiary interactions and motifs in three-dimensional structures of complex RNA molecules
Similarities in the 3D patterns of RNA base interactions or arrangements can provide insights into their functions and roles in stabilization of the RNA 3D structure. Nucleic Acids Search for Substructures and Motifs (NASSAM) is a graph theoretical program that can search for 3D patterns of base arrangements by representing the bases as pseudo-atoms. The geometric relationship of the pseudo-atoms to each other as a pattern can be represented as a labeled graph where the pseudo-atoms are the graph's nodes while the edges are the inter-pseudo-atomic distances. The input files for NASSAM are PDB formatted 3D coordinates. This web server can be used to identify matches of base arrangement patterns in a query structure to annotated patterns that have been reported in the literature or that have possible functional and structural stabilization implications. The NASSAM program is freely accessible without any login requirement at http://mfrlab.org/grafss/nassam/
Visual and computational analysis of structure-activity relationships in high-throughput screening data
Novel analytic methods are required to assimilate the large volumes of structural and bioassay data generated by combinatorial chemistry and high-throughput screening programmes in the pharmaceutical and agrochemical industries. This paper reviews recent work in visualisation and data mining that can be used to develop structure-activity relationships from such chemical/biological datasets
A data science approach to pattern discovery in complex structures with applications in bioinformatics
Pattern discovery aims to find interesting, non-trivial, implicit, previously unknown and potentially useful patterns in data. This dissertation presents a data science approach for discovering patterns or motifs from complex structures, particularly complex RNA structures. RNA secondary and tertiary structure motifs are very important in biological molecules, which play multiple vital roles in cells. A lot of work has been done on RNA motif annotation. However, pattern discovery in RNA structure is less studied. In the first part of this dissertation, an ab initio algorithm, named DiscoverR, is introduced for pattern discovery in RNA secondary structures. This algorithm works by representing RNA secondary structures as ordered labeled trees and performs tree pattern discovery using a quadratic time dynamic programming algorithm. The algorithm is able to identify and extract the largest common substructures from two RNA molecules of different sizes, without prior knowledge of locations and topologies of these substructures.
One application of DiscoverR is to locate the RNA structural elements in genomes. Experimental results show that this tool complements the currently used approaches for mining conserved structural RNAs in the human genome. DiscoverR can also be extended to find repeated regions in an RNA secondary structure. Specifically, this extended method is used to detect structural repeats in the 3\u27-untranslated region of a protein kinase gene
Analysis of Three-Dimensional Protein Images
A fundamental goal of research in molecular biology is to understand protein
structure. Protein crystallography is currently the most successful method for
determining the three-dimensional (3D) conformation of a protein, yet it
remains labor intensive and relies on an expert's ability to derive and
evaluate a protein scene model. In this paper, the problem of protein structure
determination is formulated as an exercise in scene analysis. A computational
methodology is presented in which a 3D image of a protein is segmented into a
graph of critical points. Bayesian and certainty factor approaches are
described and used to analyze critical point graphs and identify meaningful
substructures, such as alpha-helices and beta-sheets. Results of applying the
methodologies to protein images at low and medium resolution are reported. The
research is related to approaches to representation, segmentation and
classification in vision, as well as to top-down approaches to protein
structure prediction.Comment: See http://www.jair.org/ for any accompanying file
The ARTS web server for aligning RNA tertiary structures
RNA molecules with common structural features may share similar functional properties. Structural comparison of RNAs and detection of common substructures is, thus, a highly important task. Nevertheless, the current available tools in the RNA community provide only a partial solution, since they either work at the 2D level or are suitable for detecting predefined or local contiguous tertiary motifs only. Here, we describe a web server built around ARTS, a method for aligning tertiary structures of nucleic acids (both RNA and DNA). ARTS receives a pair of 3D nucleic acid structures and searches for a priori unknown common substructures. The search is truly 3D and irrespective of the order of the nucleotides on the chain. The identified common substructures can be large global folds with hundreds and even thousands of nucleotides as well as small local motifs with at least two successive base pairs. The method is highly efficient and has been used to conduct an all-against-all comparison of all the RNA structures in the Protein Data Bank. The web server together with a software package for download are freely accessible at
Significant Subgraph Mining with Multiple Testing Correction
The problem of finding itemsets that are statistically significantly enriched
in a class of transactions is complicated by the need to correct for multiple
hypothesis testing. Pruning untestable hypotheses was recently proposed as a
strategy for this task of significant itemset mining. It was shown to lead to
greater statistical power, the discovery of more truly significant itemsets,
than the standard Bonferroni correction on real-world datasets. An open
question, however, is whether this strategy of excluding untestable hypotheses
also leads to greater statistical power in subgraph mining, in which the number
of hypotheses is much larger than in itemset mining. Here we answer this
question by an empirical investigation on eight popular graph benchmark
datasets. We propose a new efficient search strategy, which always returns the
same solution as the state-of-the-art approach and is approximately two orders
of magnitude faster. Moreover, we exploit the dependence between subgraphs by
considering the effective number of tests and thereby further increase the
statistical power.Comment: 18 pages, 5 figure, accepted to the 2015 SIAM International
Conference on Data Mining (SDM15
Mining Maximal Cliques from an Uncertain Graph
We consider mining dense substructures (maximal cliques) from an uncertain
graph, which is a probability distribution on a set of deterministic graphs.
For parameter 0 < {\alpha} < 1, we present a precise definition of an
{\alpha}-maximal clique in an uncertain graph. We present matching upper and
lower bounds on the number of {\alpha}-maximal cliques possible within an
uncertain graph. We present an algorithm to enumerate {\alpha}-maximal cliques
in an uncertain graph whose worst-case runtime is near-optimal, and an
experimental evaluation showing the practical utility of the algorithm.Comment: ICDE 201
- …