
    Ridge Fusion in Statistical Learning

    We propose a penalized likelihood method to jointly estimate multiple precision matrices for use in quadratic discriminant analysis and model-based clustering. A ridge penalty and a ridge fusion penalty are used to introduce shrinkage and to promote similarity between precision matrix estimates. Block-wise coordinate descent is used for optimization, and validation likelihood is used for tuning parameter selection. Our method is applied to quadratic discriminant analysis and semi-supervised model-based clustering. (Comment: 24 pages, 9 tables, 3 figures.)
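    To make the shape of this objective concrete, here is a minimal sketch (not the authors' implementation) of a two-class version: a Gaussian negative log-likelihood in the precision matrices plus a ridge term and a ridge fusion term that pulls the two estimates together. The function names, argument names, and exact scaling of the penalties are illustrative assumptions, not the paper's notation.

```python
import numpy as np

def ridge_fusion_objective(Omega1, Omega2, S1, S2, n1, n2, lam_ridge, lam_fuse):
    """Sketch of a ridge-fused penalized negative log-likelihood (two classes).

    S1, S2              : class sample covariance matrices
    n1, n2              : class sample sizes
    lam_ridge, lam_fuse : ridge and ridge-fusion tuning parameters
    Names and scaling are illustrative assumptions.
    """
    def neg_loglik(Omega, S, n):
        # Gaussian negative log-likelihood up to constants: n * (tr(S Omega) - log det Omega)
        _, logdet = np.linalg.slogdet(Omega)
        return n * (np.trace(S @ Omega) - logdet)

    ridge = lam_ridge * (np.sum(Omega1 ** 2) + np.sum(Omega2 ** 2))   # shrinkage
    fusion = lam_fuse * np.sum((Omega1 - Omega2) ** 2)                # similarity between estimates
    return neg_loglik(Omega1, S1, n1) + neg_loglik(Omega2, S2, n2) + ridge + fusion
```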

    Statistical relational learning with soft quantifiers

    Quantification in statistical relational learning (SRL) is either existential or universal, whereas humans may be more inclined to express knowledge using soft quantifiers such as "most" and "a few". In this paper, we define the syntax and semantics of PSL^Q, a new SRL framework that supports reasoning with soft quantifiers, and present its most probable explanation (MPE) inference algorithm. To the best of our knowledge, PSL^Q is the first SRL framework that combines soft quantifiers with first-order logic rules for modelling uncertain relational data. Our experimental results for link prediction in social trust networks demonstrate that the use of soft quantifiers not only allows for a natural and intuitive formulation of domain knowledge, but also improves the accuracy of inferred results.
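    As a toy illustration of the general idea of a soft quantifier (not the PSL^Q semantics defined in the paper), the sketch below assigns "most" a truth degree by mapping the average truth value of the groundings through a piecewise-linear ramp; the threshold values are made-up assumptions.

```python
def most(degrees, lower=0.5, upper=0.9):
    """Toy soft quantifier: truth degree of "most x satisfy P(x)".

    `degrees` are truth values in [0, 1] for each grounding of P.
    The thresholds `lower`/`upper` are illustrative, not from the paper.
    """
    if not degrees:
        return 0.0
    frac = sum(degrees) / len(degrees)   # average satisfaction over groundings
    if frac <= lower:
        return 0.0
    if frac >= upper:
        return 1.0
    return (frac - lower) / (upper - lower)

# Example: most([1, 1, 1, 0.8, 0.2]) -> truth degree close to 1
```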

    Learning Arbitrary Statistical Mixtures of Discrete Distributions

    We study the problem of learning, from unlabeled samples, very general statistical mixture models on large finite sets. Specifically, the model to be learned, ϑ, is a probability distribution over probability distributions p, where each such p is a probability distribution over [n] = {1, 2, ..., n}. When we sample from ϑ, we do not observe p directly, but only indirectly and in a very noisy fashion, by sampling from [n] repeatedly and independently K times from the distribution p. The problem is to infer ϑ to high accuracy in transportation (earthmover) distance. We give the first efficient algorithms for learning this mixture model without making any restricting assumptions on the structure of the distribution ϑ. We bound the quality of the solution as a function of the sample size K and the number of samples used. Our model and results have applications to a variety of unsupervised learning scenarios, including learning topic models and collaborative filtering. (Comment: 23 pages. A preliminary version appeared in the Proceedings of the 47th ACM Symposium on the Theory of Computing (STOC15).)
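    The sampling model itself is easy to simulate. The sketch below assumes, purely for illustration, that ϑ is a uniform mixture over a small list of component distributions; the learner only ever sees the K-sample count vectors, never p itself.

```python
import numpy as np

rng = np.random.default_rng(0)

def sample_observations(components, num_samples, K):
    """Simulate the abstract's sampling model.

    components  : list of probability vectors p over [n] (here ϑ is assumed,
                  for illustration only, to be a uniform mixture over them)
    num_samples : number of draws from ϑ
    K           : number of i.i.d. observations drawn from each sampled p
    Returns the K-sample count vectors that the learner actually observes.
    """
    observations = []
    for _ in range(num_samples):
        p = components[rng.integers(len(components))]  # latent draw from ϑ
        counts = rng.multinomial(K, p)                  # noisy, indirect view of p
        observations.append(counts)
    return np.array(observations)

# Example: two components over [3]
obs = sample_observations([[0.7, 0.2, 0.1], [0.1, 0.1, 0.8]], num_samples=5, K=10)
```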

    Transforming Graph Representations for Statistical Relational Learning

    Relational data representations have become an increasingly important topic due to the recent proliferation of network datasets (e.g., social, biological, information networks) and a corresponding increase in the application of statistical relational learning (SRL) algorithms to these domains. In this article, we examine a range of representation issues for graph-based relational data. Since the choice of relational data representation for the nodes, links, and features can dramatically affect the capabilities of SRL algorithms, we survey approaches and opportunities for relational representation transformation designed to improve the performance of these algorithms. This leads us to introduce an intuitive taxonomy for data representation transformations in relational domains that incorporates link transformation and node transformation as symmetric representation tasks. In particular, the transformation tasks for both nodes and links include (i) predicting their existence, (ii) predicting their label or type, (iii) estimating their weight or importance, and (iv) systematically constructing their relevant features. We motivate our taxonomy through detailed examples and use it to survey and compare competing approaches for each of these tasks. We also discuss general conditions for transforming links, nodes, and features. Finally, we highlight challenges that remain to be addressed.
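    As one concrete instance of the "predict link existence" task in this taxonomy, the sketch below scores candidate links by counting common neighbors; this is a generic baseline chosen for illustration, not a method proposed in the article.

```python
def common_neighbor_scores(adj, candidates):
    """Score candidate (u, v) links by the number of shared neighbors.

    adj        : dict mapping each node to a set of its neighbors
    candidates : iterable of (u, v) node pairs not currently linked
    A simple link-existence baseline used only to illustrate the taxonomy's
    "predict link existence" task.
    """
    return {(u, v): len(adj.get(u, set()) & adj.get(v, set()))
            for u, v in candidates}

# Example:
# adj = {"a": {"b", "c"}, "b": {"a", "c"}, "c": {"a", "b", "d"}, "d": {"c"}}
# common_neighbor_scores(adj, [("a", "d")]) -> {("a", "d"): 1}
```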

    Statistical Mechanics of Time Domain Ensemble Learning

    Conventional ensemble learning combines students in the space domain. In this paper, by contrast, we combine students in the time domain and call this time domain ensemble learning. We analyze the generalization performance of time domain ensemble learning in the framework of online learning using a statistical mechanical method. We treat a model in which both the teacher and the student are linear perceptrons with noise. Time domain ensemble learning is shown to be twice as effective as conventional space domain ensemble learning. (Comment: 10 pages, 10 figures.)
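    A rough sketch of the setting, under simplifying assumptions not taken from the paper (a linear teacher with additive output noise, a single linear student trained online, illustrative dimension, step size, and noise level): the time domain ensemble is formed by averaging snapshots of the same student taken at different times, rather than averaging independently trained students.

```python
import numpy as np

rng = np.random.default_rng(1)
N = 500                        # input dimension (illustrative)
teacher = rng.normal(size=N)   # teacher weight vector
eta = 0.1                      # learning rate (illustrative)

student = np.zeros(N)
snapshots = []                 # the same student at different times forms the ensemble

for t in range(2000):
    x = rng.normal(size=N) / np.sqrt(N)
    y = teacher @ x + 0.3 * rng.normal()      # noisy teacher output
    student += eta * (y - student @ x) * x    # online gradient-style update
    if t % 100 == 0:
        snapshots.append(student.copy())

# Time-domain ensemble: average the stored snapshots instead of
# independently trained (space-domain) students.
ensemble = np.mean(snapshots, axis=0)
```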

    Fast rates in statistical and online learning

    The speed with which a learning algorithm converges as it is presented with more data is a central problem in machine learning: a fast rate of convergence means less data is needed for the same level of performance. The pursuit of fast rates in online and statistical learning has led to the discovery of many conditions in learning theory under which fast learning is possible. We show that most of these conditions are special cases of a single, unifying condition that comes in two forms: the central condition for 'proper' learning algorithms that always output a hypothesis in the given model, and stochastic mixability for online algorithms that may make predictions outside of the model. We show that under surprisingly weak assumptions both conditions are, in a certain sense, equivalent. The central condition has a re-interpretation in terms of convexity of a set of pseudoprobabilities, linking it to density estimation under misspecification. For bounded losses, we show how the central condition enables a direct proof of fast rates, and we prove its equivalence to the Bernstein condition, itself a generalization of the Tsybakov margin condition, both of which have played a central role in obtaining fast rates in statistical learning. Yet, while the Bernstein condition is two-sided, the central condition is one-sided, making it more suitable for dealing with unbounded losses. In its stochastic mixability form, our condition generalizes both a stochastic exp-concavity condition identified by Juditsky, Rigollet and Tsybakov and Vovk's notion of mixability. Our unifying conditions thus provide a substantial step towards a characterization of fast rates in statistical learning, similar to how classical mixability characterizes constant regret in the sequential prediction with expert advice setting. (Comment: 69 pages, 3 figures.)
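    For reference, the two conditions named above are commonly stated as follows (loss ℓ, model F, comparator f*, constants η > 0, B > 0, β ∈ (0,1]); the paper's exact formulations may differ in details.

```latex
% eta-central condition: some f* in the model F satisfies, for all f in F,
\mathbb{E}_{Z}\!\left[\exp\!\bigl(-\eta\,(\ell_f(Z) - \ell_{f^*}(Z))\bigr)\right] \;\le\; 1 .

% Bernstein condition (a generalization of the Tsybakov margin condition):
\mathbb{E}\bigl[(\ell_f(Z) - \ell_{f^*}(Z))^2\bigr]
  \;\le\; B\,\bigl(\mathbb{E}[\ell_f(Z) - \ell_{f^*}(Z)]\bigr)^{\beta}
  \quad \text{for all } f \in \mathcal{F} .
```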