19 research outputs found

    fRNAdb: a platform for mining/annotating functional RNA candidates from non-coding RNA sequences

    There is an abundance of transcripts that code for no particular protein and that remain functionally uncharacterized. Some of these transcripts may have novel functions while others might be junk transcripts. Unfortunately, the experimental validation of such transcripts to find functional non-coding RNA candidates is very costly. Therefore, our primary interest is to computationally mine candidate functional transcripts from a pool of uncharacterized transcripts. We introduce fRNAdb: a novel database service that hosts a large collection of non-coding transcripts, including annotated and non-annotated sequences from the H-inv database, NONCODE and RNAdb. A set of computational analyses has been performed on the included sequences. These analyses include RNA secondary structure motif discovery, EST support evaluation, cis-regulatory element search, protein homology search, etc. fRNAdb provides an efficient interface that helps users filter transcripts by their own criteria to single out functional RNA candidates. fRNAdb is available a

    Idiographica: a general-purpose web application to build idiograms on-demand for human, mouse and rat

    Summary: We have launched a web server which serves as a general purpose idiogram rendering service, and allows users to generate high-quality idiograms with custom annotation according to their own genome-wide mapping/annotation data through an easy-to-use interface. The generated idiograms are suitable not only for visualizing summaries of genome-wide analysis but also for many types of presentation material including web pages, conference posters, oral presentations, etc. Availability: Idiographica is freely available a

    Protein Classification via Kernel Matrix Completion

    The three-dimensional structure of a protein provides crucial information for predictin

    Marginalized Kernels for Biological Sequences

    Motivation: Kernel methods such as support vector machines require a kernel function between objects to be defined a priori. Several studies have derived kernels from probability distributions, e.g. the Fisher kernel. However, a general methodology for designing a kernel has not been fully developed. Results: We propose a reasonable way of designing a kernel when objects are generated from latent variable models (e.g. HMM). First, a joint kernel is designed for complete data, which include both visible and hidden variables. Then a marginalized kernel for visible data is obtained by taking the expectation with respect to hidden variables. We show that the Fisher kernel is a special case of marginalized kernels, which gives another viewpoint on Fisher kernel theory. Although our approach can be applied to any object, we derive in particular several marginalized kernels useful for biological sequences (e.g. DNA and proteins). The effectiveness of marginalized kernels is illustrated in the task of classifying bacterial gyrase subunit B (gyrB) amino acid sequences.
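
    The marginalization step described in this abstract can be illustrated with a small sketch. It assumes a simple count-based joint kernel (the inner product of normalized counts of symbol and hidden-state pairs) and assumes that the per-position posteriors P(h_i | x) are already available, for example from an HMM forward-backward pass that is not shown. Because this joint kernel is linear in the counts, taking the expectation over hidden variables reduces to an inner product of expected counts. The function names and the input layout are hypothetical and are not taken from the paper.

```python
from collections import defaultdict


def marginalized_counts(seq, posterior):
    """Expected counts of (symbol, hidden state) pairs, divided by len(seq).

    posterior[i][h] is assumed to hold P(h_i = h | seq); computing it
    (e.g. with an HMM) is outside the scope of this sketch.
    """
    counts = defaultdict(float)
    for symbol, row in zip(seq, posterior):
        for state, prob in enumerate(row):
            counts[(symbol, state)] += prob / len(seq)
    return counts


def marginalized_count_kernel(seq_a, post_a, seq_b, post_b):
    """Inner product of the expected count vectors of two sequences."""
    counts_a = marginalized_counts(seq_a, post_a)
    counts_b = marginalized_counts(seq_b, post_b)
    return sum(v * counts_b[k] for k, v in counts_a.items() if k in counts_b)


if __name__ == "__main__":
    # Toy posteriors over two hidden states, invented for illustration.
    certain = [[1.0, 0.0], [0.0, 1.0]]
    uncertain = [[0.5, 0.5], [0.5, 0.5]]
    print(marginalized_count_kernel("AC", certain, "AC", certain))
    print(marginalized_count_kernel("AA", uncertain, "AC", certain))
```

    When every posterior row puts all its mass on one state, the expected counts equal the ordinary counts, so the sketch falls back to the joint kernel for fully observed data.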

    Marginalized Kernels for RNA Sequence Data Analysis (Genome Informatics 13: 112–122, 2002)

    We present novel kernels that measure the similarity of two RNA sequences, taking their secondary structures into account. Two types of kernels are presented. One is for RNA sequences with known secondary structures, the other for those without known secondary structures. The latter employs stochastic context-free grammar (SCFG) for estimating the secondary structure. We call the latter the marginalized count kernel (MCK). We show computational experiments for MCK using 74 sets of human tRNA sequence data: (i) kernel principal component analysis (PCA) for visualizing tRNA similarities, (ii) supervised classification with support vector machines (SVMs). Both types of experiment show promising results for MCKs.

    A fast structural multiple alignment method for long RNA sequences

    Background: Aligning multiple RNA sequences is essential for analyzing non-coding RNAs. Although many alignment methods for non-coding RNAs, including Sankoff's algorithm for strict structural alignments, have been proposed, they are either inaccurate or computationally too expensive. Faster methods with reasonable accuracies are required for genome-scale analyses. Results: We propose a fast algorithm for multiple structural alignments of RNA sequences that is an extension of our pairwise structural alignment method (implemented in SCARNA). The accuracies of the implemented software, MXSCARNA, are at least as favorable as those of state-of-the-art algorithms that are computationally much more expensive in time and memory. Conclusion: The proposed method for structural alignment of multiple RNA sequences is fast enough for large-scale analyses with accuracies at least comparable to those of existing algorithms. The source code of MXSCARNA and its web server are available at http://mxscarna.ncrna.org.

    Minimizing the cross validation error to mix kernel matrices of heterogeneous biological data, Neural Process

    Abstract. In biological data, objects are often described in two or more representations. To perform classification based on such data, we have to combine them in some way. In the context of kernel machines, this task amounts to mixing several kernel matrices into one. In this paper, we present two ways to mix kernel matrices, where the mixing weights are optimized to minimize the cross validation error. In bacteria classification and gene function prediction experiments, our methods significantly outperformed single kernel classifiers in most cases. Key words. bacteria classification, bioinformatics, kernel machines, mixing kernel matrices
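
    The idea of choosing mixing weights by cross validation error can be sketched as follows. The abstract does not say how the two proposed methods optimize the weights, so this sketch substitutes two stand-ins that are not the paper's: a grid search over a single convex weight w for two matrices, and a leave-one-out nearest-neighbour rule in kernel similarity as the classifier. All function names are hypothetical.

```python
def mix(k1, k2, w):
    """Convex combination w * k1 + (1 - w) * k2 of two square matrices."""
    n = len(k1)
    return [[w * k1[i][j] + (1 - w) * k2[i][j] for j in range(n)]
            for i in range(n)]


def loo_error(kernel, labels):
    """Leave-one-out error of a nearest-neighbour rule in kernel similarity.

    Each object is assigned the label of its most similar other object.
    This simple rule is a stand-in classifier, not the paper's.
    """
    n = len(labels)
    errors = 0
    for i in range(n):
        nearest = max((j for j in range(n) if j != i),
                      key=lambda j: kernel[i][j])
        errors += labels[nearest] != labels[i]
    return errors / n


def best_weight(k1, k2, labels, grid=11):
    """Grid-search the weight in [0, 1] with the lowest leave-one-out error."""
    candidates = [g / (grid - 1) for g in range(grid)]
    return min(candidates, key=lambda w: loo_error(mix(k1, k2, w), labels))
```

    A grid search is workable for two matrices but scales poorly as more representations are added, which is one reason to prefer a direct optimization of the weights.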