
    Consistent Learning by Composite Proximal Thresholding

    We investigate the modeling and the numerical solution of machine learning problems with prediction functions which are linear combinations of elements of a possibly infinite-dimensional dictionary. We propose a novel flexible composite regularization model, which makes it possible to incorporate various priors on the coefficients of the prediction function, including sparsity and hard constraints. We show that the estimators obtained by minimizing the regularized empirical risk are consistent in a statistical sense, and we design an error-tolerant composite proximal thresholding algorithm for computing such estimators. New results on the asymptotic behavior of the proximal forward-backward splitting method are derived and exploited to establish the convergence properties of the proposed algorithm. In particular, our method features an o(1/m) convergence rate in objective values.
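    As a minimal illustration of the general idea (not the paper's error-tolerant composite algorithm), the sketch below runs plain proximal forward-backward splitting on a least-squares risk with an l1 regularizer, whose proximity operator is componentwise soft-thresholding. The data, the regularization weight lam, and the step-size rule are illustrative assumptions.

```python
import numpy as np

def soft_threshold(v, t):
    """Proximity operator of t*||.||_1 (componentwise soft-thresholding)."""
    return np.sign(v) * np.maximum(np.abs(v) - t, 0.0)

def forward_backward_lasso(X, y, lam=0.1, n_iter=500):
    """Minimize (1/2n)||Xw - y||^2 + lam*||w||_1 by forward-backward splitting."""
    n, d = X.shape
    step = n / np.linalg.norm(X, 2) ** 2                  # 1 / Lipschitz constant of the gradient
    w = np.zeros(d)
    for _ in range(n_iter):
        grad = X.T @ (X @ w - y) / n                      # forward (gradient) step on the smooth risk
        w = soft_threshold(w - step * grad, step * lam)   # backward (proximal) step
    return w

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 50))
w_true = np.zeros(50); w_true[:5] = 1.0
y = X @ w_true + 0.01 * rng.normal(size=200)
print(np.round(forward_backward_lasso(X, y), 2)[:8])     # recovers the sparse leading coefficients
```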

    Regularized Learning Schemes in Feature Banach Spaces

    This paper proposes a unified framework for the investigation of constrained learning theory in reflexive Banach spaces of features via regularized empirical risk minimization. The focus is placed on Tikhonov-like regularization with totally convex functions. This broad class of regularizers provides a flexible model for various priors on the features, including in particular hard constraints and powers of Banach norms. In this context, the main results establish a new general form of the representer theorem and the consistency of the corresponding learning schemes under general conditions on the loss function, the geometry of the feature space, and the modulus of total convexity of the regularizer. In addition, the proposed analysis gives new insight into basic tools such as reproducing Banach spaces, feature maps, and universality. Even when specialized to Hilbert spaces, this framework yields new results that extend the state of the art.
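    To make the Hilbert-space special case concrete, here is a small sketch of Tikhonov-regularized empirical risk minimization with the square loss, where a representer theorem reduces the infinite-dimensional problem to a finite linear system. The Gaussian kernel and the parameter choices are illustrative assumptions, not the paper's Banach-space setting.

```python
import numpy as np

def gaussian_kernel(A, B, gamma=1.0):
    """Gaussian kernel matrix between the rows of A and the rows of B."""
    sq = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)
    return np.exp(-gamma * sq)

def kernel_ridge_fit(X, y, lam=1e-2, gamma=1.0):
    """Square-loss ERM with Tikhonov regularization in an RKHS.

    By the representer theorem the minimizer is f = sum_i alpha_i k(x_i, .),
    where alpha solves the finite linear system (K + n*lam*I) alpha = y.
    """
    n = len(X)
    K = gaussian_kernel(X, X, gamma)
    return np.linalg.solve(K + n * lam * np.eye(n), y)

def kernel_ridge_predict(alpha, X_train, X_new, gamma=1.0):
    return gaussian_kernel(X_new, X_train, gamma) @ alpha

rng = np.random.default_rng(0)
X = rng.uniform(-3, 3, size=(100, 1))
y = np.sin(X[:, 0]) + 0.1 * rng.normal(size=100)
alpha = kernel_ridge_fit(X, y)
print(kernel_ridge_predict(alpha, X, X[:5]))
```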

    Robust Unsupervised Flexible Auto-weighted Local-Coordinate Concept Factorization for Image Clustering

    We investigate the high-dimensional data clustering problem by proposing a novel unsupervised representation learning model called Robust Flexible Auto-weighted Local-coordinate Concept Factorization (RFA-LCF). RFA-LCF integrates robust flexible CF, robust sparse local-coordinate coding and adaptive reconstruction weighting learning into a unified model. The adaptive weighting is driven by including joint manifold preserving constraints on the recovered clean data, the basis concepts and the new representation. Specifically, our RFA-LCF uses an L2,1-norm based flexible residue to encode the mismatch between the clean data and its reconstruction, and also applies robust adaptive sparse local-coordinate coding to represent the data using a few nearby basis concepts, which makes the factorization more accurate and robust to noise. The robust flexible factorization is also performed in the recovered clean data space to enhance the representations. RFA-LCF further preserves the local manifold structures of the clean data space, the basis concept space and the new coordinate space jointly in an adaptive manner. Extensive comparisons show that RFA-LCF can deliver enhanced clustering results. Comment: Accepted at the 44th IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP 2019).
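    For intuition about the L2,1-norm residual the abstract relies on, the small sketch below (not the RFA-LCF algorithm itself) computes it as the sum of the per-sample l2 reconstruction errors, which penalizes heavily corrupted samples less aggressively than the squared Frobenius norm. The matrix names X, U, V and their shapes are illustrative assumptions.

```python
import numpy as np

def l21_norm(R):
    """||R||_{2,1}: sum over columns of the column-wise l2 norms."""
    return np.linalg.norm(R, axis=0).sum()

rng = np.random.default_rng(0)
X = rng.random((50, 200))        # data matrix, one sample per column
U = rng.random((50, 10))         # basis concepts
V = rng.random((10, 200))        # new representation (coefficients)
print("L2,1 residual :", l21_norm(X - U @ V))
print("Frobenius^2   :", np.linalg.norm(X - U @ V) ** 2)
```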

    Bayesian Inference with Posterior Regularization and applications to Infinite Latent SVMs

    Existing Bayesian models, especially nonparametric Bayesian methods, rely on specially conceived priors to incorporate domain knowledge for discovering improved latent representations. While priors can affect posterior distributions through Bayes' rule, imposing posterior regularization is arguably more direct and in some cases more natural and general. In this paper, we present regularized Bayesian inference (RegBayes), a novel computational framework that performs posterior inference with a regularization term on the desired post-data posterior distribution under an information-theoretic formulation. RegBayes is more flexible than the procedure that elicits expert knowledge via priors, and it covers both directed Bayesian networks and undirected Markov networks whose Bayesian formulation results in hybrid chain graph models. When the regularization is induced from a linear operator on the posterior distributions, such as the expectation operator, we present a general convex-analysis theorem to characterize the solution of RegBayes. Furthermore, we present two concrete examples of RegBayes, infinite latent support vector machines (iLSVM) and multi-task infinite latent support vector machines (MT-iLSVM), which explore the large-margin idea in combination with a nonparametric Bayesian model for discovering predictive latent features for classification and multi-task learning, respectively. We present efficient inference methods and report empirical studies on several benchmark datasets, which appear to demonstrate the merits inherited from both large-margin learning and Bayesian nonparametrics. Such results were not available until now, and they help push forward the interface between these two important subfields, which have largely been treated as isolated in the community. Comment: 49 pages, 11 figures.
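    The convex-analysis characterization has a simple flavor in the finite, discrete case: projecting a Bayes posterior, in KL divergence, onto a set defined by a linear expectation constraint yields an exponentially tilted posterior. The toy sketch below (three hypotheses, one constraint, bisection on the Lagrange multiplier) is an illustrative assumption of such a setup, not the iLSVM construction.

```python
import numpy as np

def regularized_posterior(p, psi, b, mu_max=100.0, tol=1e-8):
    """argmin_q KL(q||p) subject to E_q[psi] <= b, for a finite hypothesis set.

    The solution has the exponential-tilting form q ∝ p * exp(-mu * psi);
    mu >= 0 is found by bisection on the constraint value.
    """
    def q_of(mu):
        w = p * np.exp(-mu * psi)
        return w / w.sum()
    if q_of(0.0) @ psi <= b:          # Bayes posterior already satisfies the constraint
        return p
    lo, hi = 0.0, mu_max              # bisection on the multiplier
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        lo, hi = (mid, hi) if q_of(mid) @ psi > b else (lo, mid)
    return q_of(hi)

p = np.array([0.5, 0.3, 0.2])         # Bayes posterior over three hypotheses
psi = np.array([3.0, 1.0, 0.0])       # feature whose posterior expectation we constrain
q = regularized_posterior(p, psi, b=1.0)
print(q, q @ psi)                     # tilted posterior and its (constrained) expectation
```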

    Deep Generative Models with Learnable Knowledge Constraints

    The broad set of deep generative models (DGMs) has achieved remarkable advances. However, it is often difficult to incorporate rich structured domain knowledge into end-to-end DGMs. Posterior regularization (PR) offers a principled framework to impose structured constraints on probabilistic models, but it has limited applicability to the diverse DGMs that can lack a Bayesian formulation or even explicit density evaluation. PR also requires constraints to be fully specified a priori, which is impractical or suboptimal for complex knowledge with learnable uncertain parts. In this paper, we establish a mathematical correspondence between PR and reinforcement learning (RL), and, based on this connection, expand PR to learn constraints as the extrinsic reward in RL. The resulting algorithm is model-agnostic, so it applies to any DGM, and it flexibly adapts arbitrary constraints jointly with the model. Experiments on human image generation and templated sentence generation show that models with knowledge constraints learned by our algorithm improve greatly over the base generative models. Comment: Neural Information Processing Systems (NeurIPS) 2018.
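    The PR-to-RL correspondence can be illustrated at toy scale: treat a constraint score as a reward and update a simple categorical "generator" by policy gradient, so the constraint never has to be written in closed form inside the model. Everything below (the constraint function, the 10-way categorical generator, the learning rates) is an illustrative assumption, not the paper's algorithm.

```python
import numpy as np

def constraint_reward(sample):
    """Stand-in for a learned constraint network: rewards outputs near 7."""
    return -abs(sample - 7.0)

rng = np.random.default_rng(0)
logits = np.zeros(10)                             # toy generator: categorical over 10 outputs
baseline = 0.0
for _ in range(3000):
    p = np.exp(logits - logits.max()); p /= p.sum()
    s = rng.choice(10, p=p)                       # sample from the generator
    r = constraint_reward(s)                      # constraint score used as extrinsic reward
    baseline = 0.9 * baseline + 0.1 * r           # running baseline to reduce variance
    grad_logp = -p; grad_logp[s] += 1.0           # gradient of log p(s) w.r.t. the logits
    logits += 0.1 * (r - baseline) * grad_logp    # REINFORCE update
print("most probable output:", int(np.argmax(logits)))   # should concentrate near 7
```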

    Decoding the Encoding of Functional Brain Networks: an fMRI Classification Comparison of Non-negative Matrix Factorization (NMF), Independent Component Analysis (ICA), and Sparse Coding Algorithms

    Brain networks in fMRI are typically identified using spatial independent component analysis (ICA), yet mathematical constraints such as sparse coding and positivity both provide alternate biologically plausible frameworks for generating brain networks. Non-negative Matrix Factorization (NMF) would suppress negative BOLD signal by enforcing positivity. Spatial sparse coding algorithms (L1-regularized learning and K-SVD) would impose local specialization and a discouragement of multitasking, where the total observed activity in a single voxel originates from a restricted number of possible brain networks. The assumptions of independence, positivity, and sparsity to encode task-related brain networks are compared; the resulting brain networks for different constraints are used as basis functions to encode the observed functional activity at a given time point. These encodings are decoded using machine learning to compare both the algorithms and their assumptions, using the time series weights to predict whether a subject is viewing a video, listening to an audio cue, or at rest, in 304 fMRI scans from 51 subjects. For classifying cognitive activity, the sparse coding algorithm of L1-regularized learning consistently outperformed 4 variations of ICA across different numbers of networks and noise levels (p<0.001). The NMF algorithms, which suppressed negative BOLD signal, had the poorest accuracy. Within each algorithm, encodings using sparser spatial networks (containing more zero-valued voxels) had higher classification accuracy (p<0.001). The success of sparse coding algorithms may suggest that algorithms which enforce sparse coding, discourage multitasking, and promote local specialization better capture the underlying source processes than those which allow inexhaustible local processes, such as ICA.
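    For a concrete sense of the pipeline (decompose, then decode the time-series weights), the sketch below runs the three families of decompositions on synthetic nonnegative data and scores a simple classifier on the resulting encodings. The toy data, the component count, and the logistic-regression decoder are illustrative assumptions, not the paper's fMRI protocol.

```python
import numpy as np
from sklearn.decomposition import NMF, FastICA, DictionaryLearning
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(0)
X = np.abs(rng.normal(size=(120, 300)))     # 120 "time points" x 300 "voxels" (toy data)
y = rng.integers(0, 3, size=120)            # 3 cognitive conditions (toy labels)

models = {
    "NMF":           NMF(n_components=10, max_iter=500, random_state=0),
    "ICA":           FastICA(n_components=10, random_state=0),
    "Sparse coding": DictionaryLearning(n_components=10, alpha=1.0, max_iter=100,
                                        transform_algorithm="lasso_lars", random_state=0),
}
for name, model in models.items():
    encodings = model.fit_transform(X)      # per-time-point weights on the learned networks
    acc = cross_val_score(LogisticRegression(max_iter=1000), encodings, y, cv=5).mean()
    print(f"{name}: decoding accuracy {acc:.2f}")
```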

    Deep clustering: On the link between discriminative models and K-means

    In the context of recent deep clustering studies, discriminative models dominate the literature and report the most competitive performances. These models learn a deep discriminative neural network classifier in which the labels are latent. Typically, they use multinomial logistic regression posteriors and parameter regularization, as is very common in supervised learning. It is generally acknowledged that discriminative objective functions (e.g., those based on the mutual information or the KL divergence) are more flexible than generative approaches (e.g., K-means) in the sense that they make fewer assumptions about the data distributions and, typically, yield much better unsupervised deep learning results. On the surface, several recent discriminative models may seem unrelated to K-means. This study shows that these models are, in fact, equivalent to K-means under mild conditions and common posterior models and parameter regularization. We prove that, for the commonly used logistic regression posteriors, maximizing the L2-regularized mutual information via an approximate alternating direction method (ADM) is equivalent to a soft and regularized K-means loss. Our theoretical analysis not only directly connects several recent state-of-the-art discriminative models to K-means, but also leads to a new soft and regularized deep K-means algorithm, which yields competitive performance on several image clustering benchmarks.
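    A minimal sketch of the kind of soft, regularized K-means objective such an analysis arrives at: responsibilities are a softmax over negative squared distances (an entropy-regularized assignment step), followed by weighted mean updates of the centers. The temperature beta, the initialization, and the synthetic data are illustrative assumptions, not the paper's ADM scheme.

```python
import numpy as np

def soft_kmeans(X, k, beta=5.0, n_iter=50, seed=0):
    """Soft K-means: softmax responsibilities, then weighted mean center updates."""
    rng = np.random.default_rng(seed)
    centers = X[rng.choice(len(X), k, replace=False)]
    for _ in range(n_iter):
        d2 = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(-1)   # squared distances
        logits = -beta * d2
        logits -= logits.max(axis=1, keepdims=True)                 # numerically stable softmax
        R = np.exp(logits); R /= R.sum(axis=1, keepdims=True)       # soft responsibilities
        centers = (R.T @ X) / R.sum(axis=0)[:, None]                # weighted mean update
    return R.argmax(axis=1), centers

rng = np.random.default_rng(1)
X = np.vstack([rng.normal(loc=c, size=(100, 2)) for c in ([0, 0], [4, 4], [0, 5])])
labels, centers = soft_kmeans(X, k=3)
print(np.bincount(labels), centers.round(2))
```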

    Efficient Constrained Tensor Factorization by Alternating Optimization with Primal-Dual Splitting

    Tensor factorization with hard and/or soft constraints has played an important role in signal processing and data analysis. However, existing algorithms for constrained tensor factorization have two drawbacks: (i) they require matrix inversion; and (ii) they cannot handle structured regularizations, or can do so only with great difficulty. We propose a new tensor factorization algorithm that circumvents these drawbacks. The proposed method is built upon alternating optimization, and each subproblem is solved by a primal-dual splitting algorithm, yielding an efficient and flexible algorithmic framework for constrained tensor factorization. The advantages of the proposed method over a state-of-the-art constrained tensor factorization algorithm, called AO-ADMM, are demonstrated on regularized nonnegative tensor factorization. Comment: 5 pages, submitted to ICASSP201
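    As a rough illustration of why primal-dual splitting is attractive here, the sketch below solves one inversion-free subproblem of the general form min_{x >= 0} 0.5*||Ax - b||^2 + lam*||Dx||_1 with a Condat-Vu style iteration. The operator D, the step-size rule, and the data are illustrative assumptions; this is not the paper's alternating-optimization framework.

```python
import numpy as np

def primal_dual_nonneg_l1(A, b, D, lam=0.1, n_iter=1000):
    """Condat-Vu iteration for min_{x>=0} 0.5||Ax-b||^2 + lam*||Dx||_1 (no matrix inversion)."""
    L = np.linalg.norm(A, 2) ** 2                 # Lipschitz constant of the smooth term's gradient
    nD2 = np.linalg.norm(D, 2) ** 2
    sigma = 1.0
    tau = 0.9 / (L / 2 + sigma * nD2)             # step sizes satisfying the convergence condition
    x = np.zeros(A.shape[1]); u = np.zeros(D.shape[0])
    for _ in range(n_iter):
        grad = A.T @ (A @ x - b)
        x_new = np.maximum(x - tau * (grad + D.T @ u), 0.0)        # prox of the nonnegativity constraint
        u = np.clip(u + sigma * D @ (2 * x_new - x), -lam, lam)    # prox of the conjugate of lam*||.||_1
        x = x_new
    return x

rng = np.random.default_rng(0)
A = rng.normal(size=(60, 40)); x_true = np.maximum(rng.normal(size=40), 0)
b = A @ x_true + 0.01 * rng.normal(size=60)
D = np.eye(40)                                    # identity gives plain l1; swap in a structured operator
print(primal_dual_nonneg_l1(A, b, D)[:8].round(3))
```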

    Harnessing Deep Neural Networks with Logic Rules

    Combining deep neural networks with structured logic rules is desirable to harness flexibility and reduce the uninterpretability of neural models. We propose a general framework capable of enhancing various types of neural networks (e.g., CNNs and RNNs) with declarative first-order logic rules. Specifically, we develop an iterative distillation method that transfers the structured information of logic rules into the weights of neural networks. We deploy the framework on a CNN for sentiment analysis and an RNN for named entity recognition. With a few highly intuitive rules, we obtain substantial improvements and achieve results that are state-of-the-art or comparable to previous best-performing systems. Comment: Fix typos in appendix. ACL 2016.
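    A rough numpy sketch of a distillation objective of this kind: the student network is trained against a mixture of the ground-truth labels and a rule-informed "teacher" distribution. The teacher construction shown here (reweighting the student's own predictions by a rule score) and the mixing weight pi are illustrative assumptions, not the paper's exact projection step.

```python
import numpy as np

def softmax(z):
    z = z - z.max(axis=1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)

def distillation_loss(logits, y_onehot, rule_scores, pi=0.6, C=1.0):
    """Mix cross-entropy to the true labels with cross-entropy to a rule-informed teacher."""
    p = softmax(logits)                              # student predictions
    teacher = p * np.exp(C * rule_scores)            # tilt predictions toward rule-consistent labels
    teacher /= teacher.sum(axis=1, keepdims=True)
    ce_true = -(y_onehot * np.log(p + 1e-12)).sum(axis=1).mean()
    ce_teacher = -(teacher * np.log(p + 1e-12)).sum(axis=1).mean()
    return (1 - pi) * ce_true + pi * ce_teacher

logits = np.array([[2.0, 0.5], [0.2, 1.5]])          # toy student outputs for two examples
y = np.array([[1.0, 0.0], [0.0, 1.0]])               # ground-truth labels
rules = np.array([[1.0, 0.0], [0.0, 1.0]])           # which labels the logic rules favor
print(distillation_loss(logits, y, rules))
```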

    Accelerated Parallel and Distributed Algorithm using Limited Internal Memory for Nonnegative Matrix Factorization

    Nonnegative matrix factorization (NMF) is a powerful technique for dimension reduction, extracting latent factors and learning part-based representations. For large datasets, NMF performance depends on several major issues: fast algorithms, fully parallel and distributed feasibility, and limited internal memory. This research aims to design a fast, fully parallel and distributed algorithm using limited internal memory to achieve high NMF performance for large datasets. In particular, we propose a flexible accelerated algorithm for NMF and all its L1 and L2 regularized variants based on full decomposition, which combines an anti-lopsided algorithm with a fast block coordinate descent algorithm. The proposed algorithm takes advantage of both of these algorithms to achieve a linear convergence rate of $\mathcal{O}(1-\frac{1}{\|Q\|_2})^k$ in optimizing each factor matrix while the other factor matrix is fixed, in the subspace of passive variables, where $r$ is the number of latent components and $\sqrt{r} \leq \|Q\|_2 \leq r$. In addition, the algorithm can exploit data sparseness to run on large datasets with the limited internal memory of machines. Furthermore, our experimental results are highly competitive with 7 state-of-the-art methods in three significant aspects: convergence, optimality, and average iteration number. Therefore, the proposed algorithm is superior to fast block coordinate descent methods and accelerated methods.
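    As a point of reference (not the paper's accelerated anti-lopsided algorithm), here is a minimal regularized-NMF baseline using classical multiplicative updates with an optional l2 penalty on the factors. The rank r, the penalty weight, and the synthetic data are illustrative assumptions.

```python
import numpy as np

def nmf_multiplicative(X, r=10, l2=0.0, n_iter=300, eps=1e-9, seed=0):
    """Factor X ≈ W H with W, H >= 0 via multiplicative updates (l2-regularized variant)."""
    rng = np.random.default_rng(seed)
    m, n = X.shape
    W = rng.random((m, r)); H = rng.random((r, n))
    for _ in range(n_iter):
        H *= (W.T @ X) / (W.T @ W @ H + l2 * H + eps)     # update H with W fixed
        W *= (X @ H.T) / (W @ (H @ H.T) + l2 * W + eps)   # update W with H fixed
    return W, H

rng = np.random.default_rng(1)
X = np.abs(rng.normal(size=(100, 80)))
W, H = nmf_multiplicative(X, r=10, l2=0.01)
print("relative error:", np.linalg.norm(X - W @ H) / np.linalg.norm(X))
```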