    The Bayesian two-sample t-test

    In this article we show how the pooled-variance two-sample t-statistic arises from a Bayesian formulation of the two-sided point null testing problem, with emphasis on teaching. We identify a reasonable and useful prior giving a closed-form Bayes factor that can be written in terms of the distribution of the two-sample t-statistic under the null and alternative hypotheses, respectively. This provides a Bayesian motivation for the two-sample t-statistic, which has heretofore been buried as a special case of more complex linear models, or given only roughly via analytic or Monte Carlo approximations. The resulting formulation of the Bayesian test is easy to apply in practice, and also easy to teach in an introductory course that emphasizes Bayesian methods. The priors are easy to use and simple to elicit, and the posterior probabilities are easily computed using available software, in some cases using spreadsheets.
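
For reference, the pooled-variance two-sample t-statistic named in the abstract above can be computed as in the following minimal Python sketch. It covers only the classical textbook statistic, not the article's prior or Bayes factor; the function name `pooled_t_statistic` and the sample data are illustrative assumptions and do not come from the article.

```python
from math import sqrt
from statistics import mean, variance


def pooled_t_statistic(x, y):
    """Return (t, df) for the classical pooled-variance two-sample t-test.

    Hypothetical helper for illustration only; assumes both samples
    have at least two observations and a common population variance.
    """
    n1, n2 = len(x), len(y)
    df = n1 + n2 - 2
    # Pooled estimate of the common variance (sample variances use n - 1).
    pooled_var = ((n1 - 1) * variance(x) + (n2 - 1) * variance(y)) / df
    # Standard error of the difference between the two sample means.
    std_err = sqrt(pooled_var * (1 / n1 + 1 / n2))
    return (mean(x) - mean(y)) / std_err, df


# Made-up example data: t is about -1.2247 with 4 degrees of freedom.
t, df = pooled_t_statistic([1.0, 2.0, 3.0], [2.0, 3.0, 4.0])
print(t, df)
```

The statistic and its degrees of freedom are the inputs a Bayes factor of the kind described in the abstract would be built from; the choice of prior is left out here because the abstract does not specify it.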

    Coupling multiple views of relations for recommendation

    © Springer International Publishing Switzerland 2015. Learning user/item relations is a key issue in recommender systems, and existing methods mostly measure the user/item relation from one particular aspect, e.g., historical ratings. However, the relations between users/items could be influenced by multifaceted factors, so any single type of measure can capture only a partial view of them. Thus it is more advisable to integrate measures from different aspects to estimate the underlying user/item relation. Furthermore, the estimation of the underlying user/item relation should be optimal for the current task. To this end, we propose a novel model to couple multiple relations measured on different aspects, and determine the optimal user/item relations by learning the optimal way of integrating these relation measures. Specifically, the matrix factorization model is extended in this paper by considering the relations between latent factors of different users/items. Experiments are conducted, and our method shows good performance and outperforms other baseline methods.

    Link Mining for Kernel-based Compound-Protein Interaction Predictions Using a Chemogenomics Approach

    Virtual screening (VS) is widely used during computational drug discovery to reduce costs. Chemogenomics-based virtual screening (CGBVS) can be used to predict new compound-protein interactions (CPIs) from known CPI network data using several methods, including machine learning and data mining. Although CGBVS facilitates highly efficient and accurate CPI prediction, it has poor performance for prediction of new compounds for which CPIs are unknown. The pairwise kernel method (PKM) is a state-of-the-art CGBVS method and shows high accuracy for prediction of new compounds. In this study, on the basis of link mining, we improved the PKM by combining link indicator kernel (LIK) and chemical similarity and evaluated the accuracy of these methods. The proposed method obtained an average area under the precision-recall curve (AUPR) value of 0.562, which was higher than that achieved by the conventional Gaussian interaction profile (GIP) method (0.425), and the calculation time increased by only a few percent.

    A Comprehensive Benchmark Framework for Active Learning Methods in Entity Matching

    Entity Matching (EM) is a core data cleaning task, aiming to identify different mentions of the same real-world entity. Active learning is one way to address the challenge of scarce labeled data in practice, by dynamically collecting the necessary examples to be labeled by an Oracle and refining the learned model (classifier) upon them. In this paper, we build a unified active learning benchmark framework for EM that allows users to easily combine different learning algorithms with applicable example selection algorithms. The goal of the framework is to enable concrete guidelines for practitioners as to what active learning combinations will work well for EM. Towards this, we perform comprehensive experiments on publicly available EM datasets from product and publication domains to evaluate active learning methods, using a variety of metrics including EM quality, number of labels, and example selection latencies. Our most surprising finding is that active learning with fewer labels can learn a classifier of comparable quality to supervised learning. In fact, for several of the datasets, we show that there is an active learning combination that beats the state-of-the-art supervised learning result. Our framework also includes novel optimizations that improve the quality of the learned model by roughly 9% in terms of F1-score and reduce example selection latencies by up to 10x without affecting the quality of the model. Comment: accepted for publication in ACM-SIGMOD 2020, 15 pages.

    Numerous proteins with unique characteristics are degraded by the 26S proteasome following monoubiquitination

    The "canonical" proteasomal degradation signal is a substrate-anchored polyubiquitin chain. However, a handful of proteins were shown to be targeted following monoubiquitination. In this study, we established, in both human and yeast cells, a systematic approach for the identification of monoubiquitination-dependent proteasomal substrates. The cellular wild-type polymerizable ubiquitin was replaced with ubiquitin that cannot form chains. Using proteomic analysis, we screened for substrates that are nevertheless degraded under these conditions compared with those that are stabilized, and therefore require polyubiquitination for their degradation. For randomly sampled representative substrates, we confirmed that their cellular stability is in agreement with our screening prediction. Importantly, the two groups display unique features: monoubiquitinated substrates are smaller than the polyubiquitinated ones, are enriched in specific pathways, and, in humans, are structurally less disordered. We suggest that monoubiquitination-dependent degradation is more widespread than assumed previously, and plays key roles in various cellular processes.

    A human cell atlas of fetal gene expression

    The gene expression program underlying the specification of human cell types is of fundamental interest. We generated human cell atlases of gene expression and chromatin accessibility in fetal tissues. For gene expression, we applied three-level combinatorial indexing to >110 samples representing 15 organs, ultimately profiling ~4 million single cells. We leveraged the literature and other atlases to identify and annotate hundreds of cell types and subtypes, both within and across tissues. Our analyses focused on organ-specific specializations of broadly distributed cell types (such as blood, endothelial, and epithelial), sites of fetal erythropoiesis (which notably included the adrenal gland), and integration with mouse developmental atlases (such as conserved specification of blood cells). These data represent a rich resource for the exploration of in vivo human gene expression in diverse tissues and cell types.

    MiRNA-Mediated Control of HLA-G Expression and Function

    HLA-G is a non-classical HLA class-Ib molecule expressed mainly by the extravillous cytotrophoblasts (EVT) of the placenta. The expression of HLA-G on these fetal cells protects the EVT cells from immune rejection and is therefore important for a healthy pregnancy. The mechanisms controlling HLA-G expression are largely unknown. Here we demonstrate that miR-148a and miR-152 down-regulate HLA-G expression by binding its 3′UTR and that this down-regulation of HLA-G affects LILRB1 recognition and consequently abolishes the LILRB1-mediated inhibition of NK cell killing. We further demonstrate that the C/G polymorphism at position +3142 of the HLA-G 3′UTR has no effect on the miRNA targeting of HLA-G. We show that in the placenta both miR-148a and miR-152 are expressed at relatively low levels compared to other healthy tissues, and that the mRNA levels of HLA-G are particularly high; we therefore suggest that this might enable the tissue-specific expression of HLA-G.