17 research outputs found

    CellTree: an R/bioconductor package to infer the hierarchical structure of cell populations from single-cell RNA-seq data

    GO BP Terms for myoblast data. Full table of enriched GO BP terms for each topic in myoblast data. (PDF 36 kb)

    SiBIC: A Tool for Generating a Network of Biclusters Captured by Maximal Frequent Itemset Mining

    Biclustering extracts genes that are coexpressed under certain experimental conditions, providing more precise insight into genetic behavior than one-dimensional clustering. To understand the biological features of genes in a single bicluster, visualizations such as heatmaps or parallel coordinate plots and tools for enrichment analysis are widely used. However, handling many biclusters simultaneously remains a challenge. We therefore developed a web service named SiBIC, which uses maximal frequent itemset mining to exhaustively discover significant biclusters and turns them into networks of overlapping biclusters, where nodes are gene sets and edges show their overlaps in the detected biclusters. SiBIC provides a graphical user interface for manipulating a gene set network, in which users can find target gene sets based on the enriched network. This chapter provides a user guide for SiBIC, along with background on the development of the software. SiBIC is available at http://utrecht.kuicr.kyoto-u.ac.jp:8080/sibic/faces/index.jsp
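    The network idea in this abstract (gene sets as nodes, overlaps as edges) can be illustrated with a minimal sketch. This is not SiBIC's implementation: the function name `overlap_network`, the `min_shared` threshold, and the toy gene sets are all hypothetical, and the itemset-mining step that discovers the biclusters is not shown.

```python
from itertools import combinations


def overlap_network(gene_sets, min_shared=1):
    """Return edges (name_a, name_b, n_shared) for gene sets sharing genes.

    gene_sets maps a (hypothetical) bicluster name to a set of gene IDs.
    An edge is emitted when two sets share at least `min_shared` genes.
    """
    edges = []
    # Sort by name so the edge order is deterministic.
    for (a, genes_a), (b, genes_b) in combinations(sorted(gene_sets.items()), 2):
        shared = len(genes_a & genes_b)
        if shared >= min_shared:
            edges.append((a, b, shared))
    return edges


# Toy data, invented for illustration only.
biclusters = {
    "B1": {"g1", "g2", "g3"},
    "B2": {"g2", "g3", "g4"},
    "B3": {"g5", "g6"},
}
edges = overlap_network(biclusters)
```

    Here only B1 and B2 share genes, so the toy network has a single edge between them, and B3 remains an isolated node.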

    Calpain Cleavage Prediction Using Multiple Kernel Learning

    Calpain, an intracellular Ca²⁺-dependent cysteine protease, is known to play a role in a wide range of metabolic pathways through limited proteolysis of its substrates. However, only a limited number of these substrates are currently known, and the exact mechanism of substrate recognition and cleavage by calpain remains largely unknown. While previous research has successfully applied standard machine-learning algorithms to accurately predict substrate cleavage by other similar types of proteases, that approach does not extend well to calpain, possibly due to its particular mode of proteolytic action and the limited amount of experimental data. Through the use of Multiple Kernel Learning, a recent extension to the classic Support Vector Machine framework, we were able to train complex models based on rich, heterogeneous feature sets, leading to significantly improved prediction quality (6% over the highest AUC score produced by state-of-the-art methods). In addition to producing a stronger machine-learning model for the prediction of calpain cleavage, we were able to highlight the importance and role of each feature of substrate sequences in defining specificity: primary sequence, secondary structure and solvent accessibility. Most notably, we showed that significant specificity differences exist across calpain sub-types, despite previous assumptions to the contrary. Prediction accuracy was further validated using, as an unbiased test set, mutated sequences of calpastatin (the endogenous inhibitor of calpain) modified to no longer block calpain's proteolytic action. An online implementation of our prediction tool is available at http://calpain.org
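    The core idea behind Multiple Kernel Learning, combining several kernels into one weighted kernel, can be sketched as follows. This is an illustration under stated assumptions, not the authors' method: the weights below are fixed by hand (a real MKL algorithm learns them during training), and the two kernels, the `gamma` value, and the toy samples are hypothetical.

```python
import math


def linear_kernel(x, y):
    """Dot product of two equal-length feature vectors."""
    return sum(a * b for a, b in zip(x, y))


def rbf_kernel(x, y, gamma=0.5):
    """Gaussian (RBF) kernel; gamma is an arbitrary illustrative value."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-gamma * sq_dist)


def combined_gram(samples, kernels, weights):
    """Gram matrix K[i][j] = sum_m weights[m] * kernels[m](x_i, x_j)."""
    n = len(samples)
    return [
        [
            sum(w * k(samples[i], samples[j]) for k, w in zip(kernels, weights))
            for j in range(n)
        ]
        for i in range(n)
    ]


# Toy samples and hand-picked weights (assumptions, not learned values).
samples = [(1.0, 0.0), (0.0, 1.0)]
K = combined_gram(samples, [linear_kernel, rbf_kernel], [0.7, 0.3])
```

    The resulting matrix could be passed to any kernel method that accepts a precomputed Gram matrix. In a setting like the one described above, each kernel would typically operate on a different feature set rather than on the same vector.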

    Schematic representation of contact region between calpain and substrate sequence.

    Domain II is the protease domain of calpain, while domain III binds Ca²⁺. Amino acid sequences of domain III are less conserved than those of domain II, which are highly conserved not only between μ- and m-calpains but also among all calpain family members.

    Schematic structures of major calpain homologues.

    “Conventional” calpains (μ- and m-calpain) are composed of larger catalytic subunits (calpain-1 and -2) and a smaller regulatory subunit. Some homologues, such as skeletal muscle-specific calpain (calpain-3/p94), have slightly diverged properties, including unique insertion sequences (NS, IS1 and IS2) and no requirement for a small subunit. Symbols used are: I: N-terminal domain with little homology; IIa and IIb: protease sub-domains containing the active sites Cys and His/Asn, respectively; III: C2-like Ca²⁺-binding domain; IV and VI: 5-EF-hand Ca²⁺-binding domain; V: Gly-rich hydrophobic domain; NS, IS1 and IS2: p94-specific sequences.

    AUC as function of cleavage extension length.

    AUC values produced by the MKL prediction method when varying the extension length for one feature set at a time (all other parameters at their optimal values). See table 2 (http://www.plosone.org/article/info:doi/10.1371/journal.pone.0019035#pone-0019035-t002) and table 3 (http://www.plosone.org/article/info:doi/10.1371/journal.pone.0019035#pone-0019035-t003) for notation.

    MKL weights.

    Optimal training weights obtained for each combination of kernels (on the full calpain set) using the MKL training algorithm described in [39] (http://www.plosone.org/article/info:doi/10.1371/journal.pone.0019035#pone.0019035-Sonnenburg1).