
    Sequential Quantiles via Hermite Series Density Estimation

    Sequential quantile estimation refers to incorporating observations into quantile estimates in an incremental fashion, thus furnishing an online estimate of one or more quantiles at any given point in time. Sequential quantile estimation is also known as online quantile estimation. This area is relevant to the analysis of data streams and to the one-pass analysis of massive data sets. Applications include network traffic and latency analysis, real-time fraud detection, and high-frequency trading. We introduce new techniques for online quantile estimation based on Hermite series estimators in the settings of static quantile estimation and dynamic quantile estimation. In the static quantile estimation setting we apply the existing Gauss-Hermite expansion in a novel manner. In particular, we exploit the fact that Gauss-Hermite coefficients can be updated in a sequential manner. To treat dynamic quantile estimation we introduce a novel expansion with an exponentially weighted estimator for the Gauss-Hermite coefficients, which we term the Exponentially Weighted Gauss-Hermite (EWGH) expansion. These algorithms go beyond existing sequential quantile estimation algorithms in that they allow arbitrary quantiles (as opposed to pre-specified quantiles) to be estimated at any point in time. In doing so we provide a solution to online distribution function and online quantile function estimation on data streams. In particular, we derive an analytical expression for the CDF and prove consistency results for the CDF under certain conditions. In addition, we analyse the associated quantile estimator. Simulation studies and tests on real data reveal the Gauss-Hermite based algorithms to be competitive with a leading existing algorithm. Comment: 43 pages, 9 figures. Improved version incorporating referee comments, as appears in Electronic Journal of Statistics.
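
The sequential coefficient update mentioned in this abstract can be illustrated with a minimal sketch. It assumes the standard orthonormal Hermite functions and a sample-mean coefficient estimator, which may differ from the paper's exact normalization. The class name `SequentialHermite`, the parameter `lam`, and the choice to initialise the exponentially weighted variant with the first observation are all hypothetical. The sketch stops at the density estimate and does not reproduce the paper's analytical CDF or quantile estimator.

```python
import math


def hermite_functions(x, order):
    """Return [h_0(x), ..., h_order(x)], the orthonormal Hermite functions,
    computed with the usual three-term recurrence."""
    h = [math.pi ** -0.25 * math.exp(-0.5 * x * x)]
    if order >= 1:
        h.append(math.sqrt(2.0) * x * h[0])
    for k in range(1, order):
        h.append(math.sqrt(2.0 / (k + 1)) * x * h[k]
                 - math.sqrt(k / (k + 1)) * h[k - 1])
    return h


class SequentialHermite:
    """Hypothetical one-pass estimator of Hermite series coefficients.

    lam=None  -> running mean (static setting).
    lam=0.05  -> exponentially weighted update (dynamic setting, assumed form).
    """

    def __init__(self, order, lam=None):
        self.order = order
        self.lam = lam
        self.n = 0
        self.coeffs = [0.0] * (order + 1)

    def update(self, x):
        self.n += 1
        if self.lam is None or self.n == 1:
            w = 1.0 / self.n  # running mean; also initialises the EW variant
        else:
            w = self.lam
        h = hermite_functions(x, self.order)
        self.coeffs = [(1.0 - w) * a + w * hk
                       for a, hk in zip(self.coeffs, h)]

    def density(self, x):
        """Truncated series estimate sum_k a_k h_k(x); may dip below zero."""
        h = hermite_functions(x, self.order)
        return sum(a * hk for a, hk in zip(self.coeffs, h))
```

Each observation costs O(order) time and the state is a fixed-length coefficient vector, which is what makes a one-pass treatment of a data stream possible in this sketch.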

    Attend and Interact: Higher-Order Object Interactions for Video Understanding

    Full text link
    Human actions often involve complex interactions across several inter-related objects in the scene. However, existing approaches to fine-grained video understanding or visual relationship detection often rely on single object representation or pairwise object relationships. Furthermore, learning interactions across multiple objects in hundreds of frames for video is computationally infeasible and performance may suffer since a large combinatorial space has to be modeled. In this paper, we propose to efficiently learn higher-order interactions between arbitrary subgroups of objects for fine-grained video understanding. We demonstrate that modeling object interactions significantly improves accuracy for both action recognition and video captioning, while saving more than 3 times the computation over traditional pairwise relationships. The proposed method is validated on two large-scale datasets: Kinetics and ActivityNet Captions. Our SINet and SINet-Caption achieve state-of-the-art performance on both datasets even though the videos are sampled at a maximum of 1 FPS. To the best of our knowledge, this is the first work modeling object interactions on open domain large-scale video datasets, and we additionally model higher-order object interactions, which improves the performance with low computational costs. Comment: CVPR 201

    Combining classifiers for improved classification of proteins from sequence or structure

    Background: Predicting a protein's structural or functional class from its amino acid sequence or structure is a fundamental problem in computational biology. Recently, there has been considerable interest in using discriminative learning algorithms, in particular support vector machines (SVMs), for classification of proteins. However, because sufficiently many positive examples are required to train such classifiers, all SVM-based methods are hampered by limited coverage. Results: In this study, we develop a hybrid machine learning approach for classifying proteins, and we apply the method to the problem of assigning proteins to structural categories based on their sequences or their 3D structures. The method combines a full-coverage but lower accuracy nearest neighbor method with higher accuracy but reduced coverage multiclass SVMs to produce a full coverage classifier with overall improved accuracy. The hybrid approach is based on the simple idea of "punting" from one method to another using a learned threshold. Conclusion: In cross-validated experiments on the SCOP hierarchy, the hybrid methods consistently outperform the individual component methods at all levels of coverage. Code and data sets are available at http://noble.gs.washington.edu/proj/sabretooth
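
The "punting" idea in this abstract can be sketched as follows. This is an illustrative reading only. It assumes the hybrid trusts the SVM when its score clears a threshold and otherwise punts to the nearest neighbor prediction (the abstract does not state the direction), and that the threshold is picked to maximise accuracy on held-out examples. The function names `hybrid_predict` and `learn_threshold` and the tuple layout are hypothetical, not taken from the authors' code.

```python
def hybrid_predict(svm_label, svm_score, nn_label, threshold):
    """Use the SVM label when its score clears the threshold,
    otherwise punt to the nearest neighbor label (assumed direction)."""
    return svm_label if svm_score >= threshold else nn_label


def learn_threshold(examples):
    """Pick the threshold with the highest held-out accuracy.

    examples: list of (svm_label, svm_score, nn_label, true_label) tuples.
    Candidates are the observed SVM scores plus +inf (always punt).
    Ties keep the smallest candidate, i.e. the one that punts least.
    """
    candidates = sorted({score for _, score, _, _ in examples})
    candidates.append(float("inf"))
    best_t, best_correct = None, -1
    for t in candidates:
        correct = sum(
            hybrid_predict(svm, score, nn, t) == truth
            for svm, score, nn, truth in examples
        )
        if correct > best_correct:
            best_t, best_correct = t, correct
    return best_t


# Toy held-out set: the SVM is right when confident, wrong when not.
held_out = [
    ("a", 0.9, "b", "a"),
    ("a", 0.8, "a", "a"),
    ("b", 0.2, "c", "c"),
    ("c", 0.1, "d", "d"),
]
threshold = learn_threshold(held_out)
```

Because every example receives a label from one method or the other, the combined classifier in this sketch keeps full coverage while using the SVM only where it is confident.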

    Rankprop: a web server for protein remote homology detection

    Summary: We present a large-scale implementation of the Rankprop protein homology ranking algorithm in the form of an openly accessible web server. We use the NRDB40 PSI-BLAST all-versus-all protein similarity network of 1.1 million proteins to construct the graph for the Rankprop algorithm, whereas previously, results were only reported for a database of 108 000 proteins. We also describe two algorithmic improvements to the original algorithm, propagation from multiple homologs of the query and better normalization of ranking scores, that lead to higher accuracy and to scores with a probabilistic interpretation.

    SVM-Fold: a tool for discriminative multi-class protein fold and superfamily recognition

    Background: Predicting a protein's structural class from its amino acid sequence is a fundamental problem in computational biology. Much recent work has focused on developing new representations for protein sequences, called string kernels, for use with support vector machine (SVM) classifiers. However, while some of these approaches exhibit state-of-the-art performance on the binary protein classification problem, i.e. discriminating between a particular protein class and all other classes, few of these studies have addressed the real problem of multi-class superfamily or fold recognition. Moreover, there are only limited software tools and systems for SVM-based protein classification available to the bioinformatics community. Results: We present a new multi-class SVM-based protein fold and superfamily recognition system and web server called SVM-Fold, which can be found at http://svm-fold.c2b2.columbia.edu. Our system uses an efficient implementation of a state-of-the-art string kernel for sequence profiles, called the profile kernel, where the underlying feature representation is a histogram of inexact matching k-mer frequencies. We also employ a novel machine learning approach to solve the difficult multi-class problem of classifying a sequence of amino acids into one of many known protein structural classes. Binary one-vs-the-rest SVM classifiers that are trained to recognize individual structural classes yield prediction scores that are not comparable, so that standard "one-vs-all" classification fails to perform well. Moreover, SVMs for classes at different levels of the protein structural hierarchy may make useful predictions, but one-vs-all does not try to combine these multiple predictions. To deal with these problems, our method learns relative weights between one-vs-the-rest classifiers and encodes information about the protein structural hierarchy for multi-class prediction. In large-scale benchmark results based on the SCOP database, our code weighting approach significantly improves on the standard one-vs-all method for both the superfamily and fold prediction in the remote homology setting and on the fold recognition problem. Moreover, our code weight learning algorithm strongly outperforms nearest-neighbor methods based on PSI-BLAST in terms of prediction accuracy on every structure classification problem we consider. Conclusion: By combining state-of-the-art SVM kernel methods with a novel multi-class algorithm, the SVM-Fold system delivers efficient and accurate protein fold and superfamily recognition.

    A versatile approach to multiple gene RNA interference using microRNA-based short hairpin RNAs

    Background: Effective and stable knockdown of multiple gene targets by RNA interference is often necessary to overcome isoform redundancy, but it remains a technical challenge when working with intractable cell systems. Results: We have developed a flexible platform using RNA polymerase II promoter-driven expression of microRNA-like short hairpin RNAs, which permits robust depletion of multiple target genes from a single transcript. Recombination-based subcloning permits expression of multi-shRNA transcripts from a comprehensive range of plasmid or viral vectors. Retroviral delivery of transcripts targeting isoforms of cAMP-dependent protein kinase in the RAW264.7 murine macrophage cell line emphasizes the utility of this approach and provides insight into cAMP-dependent transcription. Conclusion: We demonstrate functional consequences of depleting multiple endogenous target genes using miR-shRNAs, and highlight the versatility of the described vector platform for multiple target gene knockdown in mammalian cells.

    Deciphering Signaling Outcomes from a System of Complex Networks

    Cellular signal transduction machinery integrates information from multiple inputs to actuate discrete cellular behaviors. Interaction complexity exists when an input modulates the output behavior that results from other inputs. To address whether this machinery is iteratively complex (that is, whether increasing numbers of inputs produce exponential increases in discrete cellular behaviors), we examined the modulated secretion of six cytokines from macrophages in response to up to five-way combinations of an agonist of Toll-like receptor 4, three cytokines, and conditions that activated the cyclic adenosine monophosphate pathway. Although all of the selected ligands showed synergy in paired combinations, few examples of nonadditive outputs were found in response to higher-order combinations. This suggests that most potential interactions are not realized and that unique cellular responses are limited to discrete subsets of ligands and pathways that enhance specific cellular functions.

    The Alliance for Cellular Signaling Plasmid Collection: A Flexible Resource for Protein Localization Studies and Signaling Pathway Analysis

    Cellular responses to inputs that vary both temporally and spatially are determined by complex relationships between the components of cell signaling networks. Analysis of these relationships requires access to a wide range of experimental reagents and techniques, including the ability to express the protein components of the model cells in a variety of contexts. As part of the Alliance for Cellular Signaling, we developed a robust method for cloning large numbers of signaling ORFs into Gateway® entry vectors, and we created a wide range of compatible expression platforms for proteomics applications. To date, we have generated over 3000 plasmids that are available to the scientific community via the American Type Culture Collection. We have established a website at www.signaling-gateway.org/data/plasmid/ that allows users to browse, search, and BLAST Alliance for Cellular Signaling plasmids. The collection primarily contains murine signaling ORFs with an emphasis on kinases and G protein signaling genes. Here we describe the cloning, databasing, and application of this proteomics resource for large-scale subcellular localization screens in mammalian cell lines.

    Synergistic Ca^(2+) Responses by Gα_i- and Gα_q-coupled G-protein-coupled Receptors Require a Single PLCβ Isoform That Is Sensitive to Both Gβγ and Gα_q

    Cross-talk between Gα_i- and Gα_q-linked G-protein-coupled receptors yields synergistic Ca^(2+) responses in a variety of cell types. Prior studies have shown that synergistic Ca^(2+) responses from macrophage G-protein-coupled receptors are primarily dependent on phospholipase Cβ3 (PLCβ3), with a possible contribution of PLCβ2, whereas signaling through PLCβ4 interferes with synergy. We here show that synergy can be induced by the combination of Gβγ and Gα_q activation of a single PLCβ isoform. Synergy was absent in macrophages lacking both PLCβ2 and PLCβ3, but it was fully reconstituted following transduction with PLCβ3 alone. Mechanisms of PLCβ-mediated synergy were further explored in NIH-3T3 cells, which express little if any PLCβ2. RNAi-mediated knockdown of endogenous PLCβs demonstrated that synergy in these cells was dependent on PLCβ3, but PLCβ1 and PLCβ4 did not contribute, and overexpression of either isoform inhibited Ca^(2+) synergy. When synergy was blocked by RNAi of endogenous PLCβ3, it could be reconstituted by expression of either human PLCβ3 or mouse PLCβ2. In contrast, it could not be reconstituted by human PLCβ3 with a mutation of the Y box, which disrupted activation by Gβγ, and it was only partially restored by human PLCβ3 with a mutation of the C terminus, which partly disrupted activation by Gα_q. Thus, both Gβγ and Gα_q contribute to activation of PLCβ3 in cells for Ca^(2+) synergy. We conclude that Ca^(2+) synergy between Gα_i-coupled and Gα_q-coupled receptors requires the direct action of both Gβγ and Gα_q on PLCβ and is mediated primarily by PLCβ3, although PLCβ2 is also competent.