
    Centroid-Based Clustering with αβ-Divergences

    Centroid-based clustering is a widely used technique in unsupervised learning across many research fields. The success of any centroid-based clustering method relies on the choice of the similarity measure in use. In recent years, most studies have focused on incorporating various divergence measures into the traditional hard k-means algorithm. In this article, we consider the problem of centroid-based clustering using the family of αβ-divergences, which is governed by two parameters, α and β. We propose a new iterative algorithm, αβ-k-means, giving closed-form solutions for the computation of the sided centroids. The algorithm can be fine-tuned by means of this pair of values, yielding a wide range of the most frequently used divergences. Moreover, it is guaranteed to converge to local minima for a wide range of values of the pair (α, β). Our theoretical contribution has been validated by several experiments performed with synthetic and real data, exploring the (α, β) plane. The numerical results obtained confirm the quality of the algorithm and its suitability for use in several practical applications.
    MINECO TEC2017-82807-
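
    As a rough illustration of the centroid-based scheme described above, the sketch below alternates a divergence-based assignment step and a centroid step. It is a minimal sketch under stated assumptions: the generalized Kullback-Leibler divergence and the mean-based centroid update are stand-ins, not the paper's αβ-divergence or its closed-form sided centroids, and all function names are chosen for illustration.

import numpy as np

def generalized_kl(x, c, eps=1e-12):
    """Generalized KL divergence D(x || c) for nonnegative vectors.

    Stand-in for the paper's alpha-beta divergence family; it corresponds
    to only one particular choice of the (alpha, beta) parameters.
    """
    x = x + eps
    c = c + eps
    return np.sum(x * np.log(x / c) - x + c)

def divergence_kmeans(X, k, divergence=generalized_kl, n_iter=100, seed=0):
    """Lloyd-style centroid-based clustering with a pluggable divergence.

    Assignment step: each point goes to the centroid minimizing the divergence.
    Centroid step: plain mean of the assigned points (exact for squared
    Euclidean distance; the paper instead derives closed-form sided centroids
    for the whole alpha-beta family).
    """
    rng = np.random.default_rng(seed)
    centroids = X[rng.choice(len(X), size=k, replace=False)]
    for _ in range(n_iter):
        # Assign every point to its closest centroid under the divergence.
        labels = np.array([
            np.argmin([divergence(x, c) for c in centroids]) for x in X
        ])
        # Recompute centroids; keep the old one if a cluster is empty.
        new_centroids = np.array([
            X[labels == j].mean(axis=0) if np.any(labels == j) else centroids[j]
            for j in range(k)
        ])
        if np.allclose(new_centroids, centroids):
            break
        centroids = new_centroids
    return labels, centroids

# Toy usage on nonnegative synthetic data.
rng = np.random.default_rng(1)
X = np.vstack([
    np.abs(rng.normal([1.0, 5.0], 0.5, size=(50, 2))),
    np.abs(rng.normal([6.0, 1.0], 0.5, size=(50, 2))),
])
labels, centroids = divergence_kmeans(X, k=2)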

    The Diagonalized Newton Algorithm for Nonnegative Matrix Factorization

    Non-negative matrix factorization (NMF) has become a popular machine learning approach to many problems in text mining, speech and image processing, bioinformatics and seismic data analysis, to name a few. In NMF, a matrix of non-negative data is approximated by the low-rank product of two matrices with non-negative entries. In this paper, the approximation quality is measured by the Kullback-Leibler divergence between the data and its low-rank reconstruction. The existence of the simple multiplicative update (MU) algorithm for computing the matrix factors has contributed to the success of NMF. Despite the availability of algorithms showing faster convergence, MU remains popular due to its simplicity. In this paper, a diagonalized Newton algorithm (DNA) is proposed, showing faster convergence while the implementation remains simple and suitable for high-rank problems. The DNA algorithm is applied to various publicly available data sets, showing a substantial speed-up on modern hardware.
    Comment: 8 pages + references; International Conference on Learning Representations, 201
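
    For context, the multiplicative-update (MU) baseline mentioned in the abstract can be sketched in a few lines. This is the standard Lee-Seung-style update for the Kullback-Leibler objective, not the diagonalized Newton algorithm proposed in the paper; names and parameter values are illustrative.

import numpy as np

def kl_nmf_mu(V, rank, n_iter=200, eps=1e-10, seed=0):
    """Multiplicative updates (MU) for KL-divergence NMF: V ~ W @ H.

    Standard Lee-Seung-style baseline; the paper's diagonalized Newton
    algorithm (DNA) replaces these updates with faster-converging ones.
    """
    rng = np.random.default_rng(seed)
    m, n = V.shape
    W = rng.random((m, rank)) + eps
    H = rng.random((rank, n)) + eps
    for _ in range(n_iter):
        # H update: H <- H * (W^T (V / WH)) / (W^T 1)
        WH = W @ H + eps
        H *= (W.T @ (V / WH)) / (W.T @ np.ones_like(V) + eps)
        # W update: W <- W * ((V / WH) H^T) / (1 H^T)
        WH = W @ H + eps
        W *= ((V / WH) @ H.T) / (np.ones_like(V) @ H.T + eps)
    return W, H

# Toy usage: factorize a small random nonnegative matrix.
V = np.random.default_rng(1).random((20, 30))
W, H = kl_nmf_mu(V, rank=5)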

    Automatic Differentiation Variational Inference

    Probabilistic modeling is iterative. A scientist posits a simple model, fits it to her data, refines it according to her analysis, and repeats. However, fitting complex models to large data is a bottleneck in this process. Deriving algorithms for new models can be both mathematically and computationally challenging, which makes it difficult to efficiently cycle through the steps. To this end, we develop automatic differentiation variational inference (ADVI). Using our method, the scientist only provides a probabilistic model and a dataset, nothing else. ADVI automatically derives an efficient variational inference algorithm, freeing the scientist to refine and explore many models. ADVI supports a broad class of models; no conjugacy assumptions are required. We study ADVI across ten different models and apply it to a dataset with millions of observations. ADVI is integrated into Stan, a probabilistic programming system; it is available for immediate use.
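
    To make the idea concrete, the following toy sketch applies the ingredients ADVI builds on (a Gaussian variational family on an unconstrained parameter, reparameterization gradients, and stochastic maximization of the ELBO) to a one-parameter model whose exact posterior is known. It is an illustrative sketch, not Stan's implementation; the model and all names are assumptions.

import numpy as np

# Toy ADVI-style mean-field sketch: data y_i ~ Normal(mu, 1), prior mu ~ Normal(0, 1).
# The variational family is q(mu) = Normal(m, exp(s)^2) on the (already unconstrained)
# parameter mu; full ADVI would first map constrained parameters to an unconstrained space.

rng = np.random.default_rng(0)
y = rng.normal(2.0, 1.0, size=500)            # synthetic observations

def grad_log_joint(mu):
    """d/dmu [ log p(y | mu) + log p(mu) ] for the toy model."""
    return np.sum(y - mu) - mu

m, s = 0.0, 0.0                               # variational mean and log-std
lr, n_steps, n_mc = 1e-3, 2000, 8             # step size, iterations, MC samples
for _ in range(n_steps):
    eps = rng.normal(size=n_mc)
    mu = m + np.exp(s) * eps                  # reparameterization trick
    g = np.array([grad_log_joint(u) for u in mu])
    grad_m = g.mean()                         # ELBO gradient w.r.t. m
    grad_s = (g * eps * np.exp(s)).mean() + 1.0   # plus entropy term d/ds
    m += lr * grad_m                          # stochastic gradient ascent
    s += lr * grad_s

print("posterior approx: mean %.3f, sd %.3f" % (m, np.exp(s)))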

    Automatic Variational Inference in Stan

    Variational inference is a scalable technique for approximate Bayesian inference. Deriving variational inference algorithms requires tedious model-specific calculations; this makes it difficult to automate. We propose an automatic variational inference algorithm, automatic differentiation variational inference (ADVI). The user only provides a Bayesian model and a dataset; nothing else. We make no conjugacy assumptions and support a broad class of models. The algorithm automatically determines an appropriate variational family and optimizes the variational objective. We implement ADVI in Stan (code available now), a probabilistic programming framework. We compare ADVI to MCMC sampling across hierarchical generalized linear models, nonconjugate matrix factorization, and a mixture model. We train the mixture model on a quarter million images. With ADVI we can use variational inference on any model we write in Stan.
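
    A minimal sketch of invoking Stan's variational method from Python via CmdStanPy follows; the Stan program is the classic eight-schools example, and the exact CmdStanPy argument and attribute names are assumptions that may differ across versions.

# Minimal sketch, assuming CmdStanPy and CmdStan are installed.
from pathlib import Path
from cmdstanpy import CmdStanModel

# Classic eight-schools model, written out so the example is self-contained.
stan_code = """
data {
  int<lower=0> J;
  vector[J] y;
  vector<lower=0>[J] sigma;
}
parameters {
  real mu;
  real<lower=0> tau;
  vector[J] theta;
}
model {
  mu ~ normal(0, 10);
  tau ~ cauchy(0, 10);
  theta ~ normal(mu, tau);
  y ~ normal(theta, sigma);
}
"""
Path("eight_schools.stan").write_text(stan_code)

model = CmdStanModel(stan_file="eight_schools.stan")
data = {
    "J": 8,
    "y": [28, 8, -3, 7, -1, 1, 18, 12],
    "sigma": [15, 10, 16, 11, 9, 11, 10, 18],
}

# ADVI is exposed as the "variational" method (method=variational in CmdStan).
fit = model.variational(data=data)
print(fit.variational_params_dict)  # means of the fitted variational approximation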