Consistent Learning by Composite Proximal Thresholding
We investigate the modeling and the numerical solution of machine learning
problems with prediction functions which are linear combinations of elements of
a possibly infinite-dimensional dictionary. We propose a novel flexible
composite regularization model, which makes it possible to incorporate various
priors on the coefficients of the prediction function, including sparsity and
hard constraints. We show that the estimators obtained by minimizing the
regularized empirical risk are consistent in a statistical sense, and we design
an error-tolerant composite proximal thresholding algorithm for computing such
estimators. New results on the asymptotic behavior of the proximal
forward-backward splitting method are derived and exploited to establish the
convergence properties of the proposed algorithm. In particular, our method
features an $o(1/n)$ convergence rate in objective values, where $n$ is the iteration index.
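To make the algorithmic idea concrete, here is a minimal sketch of proximal forward-backward (thresholding) iterations for an $\ell_1$-regularized least-squares instance; the function names and the specific penalty are illustrative choices, not the paper's general composite model.

```python
import numpy as np

def soft_threshold(v, t):
    """Proximity operator of t*||.||_1 (componentwise soft-thresholding)."""
    return np.sign(v) * np.maximum(np.abs(v) - t, 0.0)

def forward_backward(X, y, lam, step, n_iter=500):
    """Minimize (1/2n)||Xw - y||^2 + lam*||w||_1 by forward-backward splitting."""
    n, d = X.shape
    w = np.zeros(d)
    for _ in range(n_iter):
        grad = X.T @ (X @ w - y) / n                     # forward (gradient) step
        w = soft_threshold(w - step * grad, step * lam)  # backward (proximal) step
    return w
```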
Regularized Learning Schemes in Feature Banach Spaces
This paper proposes a unified framework for the investigation of constrained
learning theory in reflexive Banach spaces of features via regularized
empirical risk minimization. The focus is placed on Tikhonov-like
regularization with totally convex functions. This broad class of regularizers
provides a flexible model for various priors on the features, including in
particular hard constraints and powers of Banach norms. In such context, the
main results establish a new general form of the representer theorem and the
consistency of the corresponding learning schemes under general conditions on
the loss function, the geometry of the feature space, and the modulus of total
convexity of the regularizer. In addition, the proposed analysis gives new
insight into basic tools such as reproducing Banach spaces, feature maps, and
universality. Even when specialized to Hilbert spaces, this framework yields
new results that extend the state of the art.
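Schematically, the learning schemes studied in this line of work take the form below (a generic rendering with placeholder symbols, not the paper's exact notation):

```latex
\min_{w \in \mathcal{B}} \;\; \frac{1}{n}\sum_{i=1}^{n} \ell\big(y_i,\, \langle w, \Phi(x_i)\rangle\big) \;+\; \lambda\, F(w)
```

where $\mathcal{B}$ is a reflexive Banach space of features, $\Phi$ a feature map, $\ell$ a loss, and $F$ a totally convex regularizer, for instance a power of the Banach norm $\|w\|^p$ or the indicator of a hard constraint set.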
Robust Unsupervised Flexible Auto-weighted Local-Coordinate Concept Factorization for Image Clustering
We investigate the high-dimensional data clustering problem by proposing a
novel and unsupervised representation learning model called Robust Flexible
Auto-weighted Local-coordinate Concept Factorization (RFA-LCF). RFA-LCF
integrates the robust flexible CF, robust sparse local-coordinate coding and
the adaptive reconstruction weighting learning into a unified model. The
adaptive weighting is driven by joint manifold-preserving constraints on the
recovered clean data, the basis concepts, and the new representation.
Specifically, RFA-LCF uses an L2,1-norm-based flexible residue to encode the
mismatch between clean data and its reconstruction, and also applies the robust
adaptive sparse local-coordinate coding to represent the data using a few
nearby basis concepts, which can make the factorization more accurate and
robust to noise. The robust flexible factorization is also performed in the
recovered clean data space for enhancing representations. RFA-LCF also
considers preserving the local manifold structures of clean data space, basis
concept space, and the new coordinate space jointly in an adaptive manner.
Extensive comparisons show that RFA-LCF can deliver enhanced clustering
results.
Comment: Accepted at the 44th IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP 2019).
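As a small illustration of the L2,1-norm residue mentioned in the abstract (a generic sketch, not the authors' code), the L2,1 norm sums per-sample Euclidean norms, so reconstruction errors are penalized sample-wise, which is what makes the residue robust to outlying samples:

```python
import numpy as np

def l21_norm(R):
    """L2,1 norm: sum of the Euclidean norms of the columns of R."""
    return np.linalg.norm(R, axis=0).sum()

# Toy example: residue between data X (features x samples) and a
# reconstruction from basis concepts U and codes V.
rng = np.random.default_rng(0)
X = rng.standard_normal((20, 100))
U, V = rng.random((20, 5)), rng.random((5, 100))
print(l21_norm(X - U @ V))
```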
Bayesian Inference with Posterior Regularization and applications to Infinite Latent SVMs
Existing Bayesian models, especially nonparametric Bayesian methods, rely on
specially conceived priors to incorporate domain knowledge for discovering
improved latent representations. While priors can affect posterior
distributions through Bayes' rule, imposing posterior regularization is
arguably more direct and in some cases more natural and general. In this paper,
we present regularized Bayesian inference (RegBayes), a novel computational
framework that performs posterior inference with a regularization term on the
desired post-data posterior distribution under an information theoretical
formulation. RegBayes is more flexible than the procedure that elicits expert
knowledge via priors, and it covers both directed Bayesian networks and
undirected Markov networks whose Bayesian formulation results in hybrid chain
graph models. When the regularization is induced from a linear operator on the
posterior distributions, such as the expectation operator, we present a general
convex-analysis theorem to characterize the solution of RegBayes. Furthermore,
we present two concrete examples of RegBayes, infinite latent support vector
machines (iLSVM) and multi-task infinite latent support vector machines
(MT-iLSVM), which explore the large-margin idea in combination with a
nonparametric Bayesian model for discovering predictive latent features for
classification and multi-task learning, respectively. We present efficient
inference methods and report empirical studies on several benchmark datasets,
which appear to demonstrate the merits inherited from both large-margin
learning and Bayesian nonparametrics. Such results were not previously
available, and they contribute to pushing forward the interface between these
two important subfields, which have largely been treated as isolated in the community.
Comment: 49 pages, 11 figures.
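For orientation, the variational form of RegBayes can be written roughly as follows (a schematic rendering based on the abstract; the symbols are placeholders):

```latex
\inf_{q(M)\,\in\,\mathcal{P}} \;\; \mathrm{KL}\big(q(M)\,\|\,\pi(M)\big) \;-\; \mathbb{E}_{q(M)}\!\big[\log p(\mathcal{D}\mid M)\big] \;+\; c\,\Omega\big(q(M)\big)
```

Without the regularization term $\Omega$, the optimum is the standard Bayes posterior $p(M\mid\mathcal{D})$; $\Omega$, e.g. induced by expectation constraints, acts directly on the post-data posterior rather than indirectly through the prior $\pi$.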
Deep Generative Models with Learnable Knowledge Constraints
The broad set of deep generative models (DGMs) has achieved remarkable
advances. However, it is often difficult to incorporate rich structured domain
knowledge with the end-to-end DGMs. Posterior regularization (PR) offers a
principled framework to impose structured constraints on probabilistic models,
but has limited applicability to the diverse DGMs that can lack a Bayesian
formulation or even explicit density evaluation. PR also requires constraints
to be fully specified a priori, which is impractical or suboptimal for complex
knowledge with learnable uncertain parts. In this paper, we establish a
mathematical correspondence between PR and reinforcement learning (RL) and,
based on this connection, extend PR to learn constraints as the extrinsic
reward in RL. The resulting algorithm is model-agnostic, applying to any DGM,
and is flexible enough to adapt arbitrary constraints jointly with the model. Experiments on
human image generation and templated sentence generation show that models with
knowledge constraints learned by our algorithm improve greatly over the base
generative models.
Comment: Neural Information Processing Systems (NeurIPS) 2018.
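A minimal sketch of the posterior-regularization idea underlying this line of work, assuming a batch of candidate samples with model log-probabilities and learned constraint scores (the names and the reweighting scheme are illustrative, not the authors' exact algorithm):

```python
import numpy as np

def pr_reweight(log_p, constraint_scores, lam=1.0):
    """Weights for q(x) ∝ p(x) * exp(lam * f(x)): samples that satisfy
    the constraint f are up-weighted, acting as an extrinsic-reward signal."""
    logits = log_p + lam * constraint_scores
    logits -= logits.max()          # numerical stability
    w = np.exp(logits)
    return w / w.sum()

log_p = np.array([-1.0, -1.2, -0.8, -2.0])   # model log-probs of 4 samples
f = np.array([0.1, 0.9, 0.3, 0.7])           # learned constraint scores
print(pr_reweight(log_p, f, lam=2.0))
```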
Decoding the Encoding of Functional Brain Networks: an fMRI Classification Comparison of Non-negative Matrix Factorization (NMF), Independent Component Analysis (ICA), and Sparse Coding Algorithms
Brain networks in fMRI are typically identified using spatial independent
component analysis (ICA), yet mathematical constraints such as sparse coding
and positivity both provide alternate biologically-plausible frameworks for
generating brain networks. Non-negative Matrix Factorization (NMF) would
suppress negative BOLD signal by enforcing positivity. Spatial sparse coding
algorithms (L1 Regularized Learning and K-SVD) would impose local
specialization and a discouragement of multitasking, where the total observed
activity in a single voxel originates from a restricted number of possible
brain networks.
The assumptions of independence, positivity, and sparsity for encoding
task-related brain networks are compared; the resulting brain networks for
different constraints are used as basis functions to encode the observed
functional activity at a given time point. These encodings are decoded using
machine learning to compare both the algorithms and their assumptions, using
the time series weights to predict whether a subject is viewing a video,
listening to an audio cue, or at rest, in 304 fMRI scans from 51 subjects.
For classifying cognitive activity, the sparse coding algorithm of
L1 Regularized Learning consistently outperformed 4 variations of ICA across
different numbers of networks and noise levels (p < 0.001). The NMF algorithms,
which suppressed negative BOLD signal, had the poorest accuracy. Within each
algorithm, encodings using sparser spatial networks (containing more
zero-valued voxels) had higher classification accuracy (p < 0.001). The success
of sparse coding algorithms may suggest that algorithms which enforce sparse
coding, discourage multitasking, and promote local specialization capture the
underlying source processes better than those, such as ICA, which allow
inexhaustible local processes.
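The encode-then-decode pipeline described above can be sketched as follows with scikit-learn, assuming synthetic stand-in data rather than the paper's fMRI scans (the network count and hyperparameters are arbitrary):

```python
import numpy as np
from sklearn.decomposition import DictionaryLearning
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(0)
X = rng.standard_normal((200, 50))   # time points x voxels (synthetic)
y = rng.integers(0, 3, size=200)     # labels: video / audio / rest

# 1) Learn a sparse spatial dictionary; components play the role of networks.
dl = DictionaryLearning(n_components=8, alpha=1.0, max_iter=200, random_state=0)
codes = dl.fit_transform(X)          # per-time-point network weights

# 2) Decode the cognitive state from the time-series weights.
clf = LogisticRegression(max_iter=1000)
print(cross_val_score(clf, codes, y, cv=5).mean())
```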
Deep clustering: On the link between discriminative models and K-means
In the context of recent deep clustering studies, discriminative models
dominate the literature and report the most competitive performances. These
models learn a deep discriminative neural network classifier in which the
labels are latent. Typically, they use multinomial logistic regression
posteriors and parameter regularization, as is very common in supervised
learning. It is generally acknowledged that discriminative objective functions
(e.g., those based on the mutual information or the KL divergence) are more
flexible than generative approaches (e.g., K-means) in the sense that they make
fewer assumptions about the data distributions and, typically, yield much
better unsupervised deep learning results. On the surface, several recent
discriminative models may seem unrelated to K-means. This study shows that
these models are, in fact, equivalent to K-means under mild conditions and
common posterior models and parameter regularization. We prove that, for the
commonly used logistic regression posteriors, maximizing the regularized
mutual information via an approximate alternating direction method (ADM) is
equivalent to minimizing a soft and regularized K-means loss. Our theoretical analysis not
only connects directly several recent state-of-the-art discriminative models to
K-means, but also leads to a new soft and regularized deep K-means algorithm,
which yields competitive performance on several image clustering benchmarks.
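To ground the equivalence, here is a minimal soft K-means sketch in which the E-step is a softmax over negative squared distances, mirroring logistic-regression posteriors (an illustrative implementation, not the paper's ADM algorithm):

```python
import numpy as np

def soft_kmeans(X, k, beta=5.0, n_iter=50, seed=0):
    """Soft K-means: softmax responsibilities (E-step), then cluster
    means re-estimated under those soft assignments (M-step)."""
    rng = np.random.default_rng(seed)
    mu = X[rng.choice(len(X), size=k, replace=False)]  # init from data points
    for _ in range(n_iter):
        d2 = ((X[:, None, :] - mu[None, :, :]) ** 2).sum(-1)  # squared distances
        logits = -beta * d2
        logits -= logits.max(axis=1, keepdims=True)    # numerical stability
        r = np.exp(logits)
        r /= r.sum(axis=1, keepdims=True)              # soft assignments
        mu = (r.T @ X) / r.sum(axis=0)[:, None]        # weighted means
    return mu, r
```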
Efficient Constrained Tensor Factorization by Alternating Optimization with Primal-Dual Splitting
Tensor factorization with hard and/or soft constraints has played an
important role in signal processing and data analysis. However, existing
algorithms for constrained tensor factorization have two drawbacks: (i) they
require matrix inversion; and (ii) they cannot, or only with great difficulty,
handle structured regularizations. We propose a new tensor factorization
algorithm that circumvents these drawbacks. The proposed method is built upon
alternating optimization, and each subproblem is solved by a primal-dual
splitting algorithm, yielding an efficient and flexible algorithmic framework
for constrained tensor factorization. The advantages of the proposed method over
a state-of-the-art constrained tensor factorization algorithm, called AO-ADMM,
are demonstrated on regularized nonnegative tensor factorization.
Comment: 5 pages, submitted to ICASSP 2017.
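As an indication of how a matrix-inversion-free primal-dual subproblem solver looks, here is a sketch of Condat-Vu splitting applied to a generic nonnegative, structured-regularized least-squares problem; this is a stand-in for one block update, not the authors' exact algorithm:

```python
import numpy as np

def condat_vu(A, b, D, lam, n_iter=500):
    """min_x 0.5||Ax - b||^2 + i_{x>=0}(x) + lam*||Dx||_1
    via Condat-Vu primal-dual splitting (no matrix inversion)."""
    Lf = np.linalg.norm(A, 2) ** 2          # Lipschitz constant of the gradient
    nD = np.linalg.norm(D, 2)
    tau = 1.0 / (Lf / 2 + nD)
    sigma = (1.0 / tau - Lf / 2) / nD ** 2  # satisfies the step-size condition
    x, u = np.zeros(A.shape[1]), np.zeros(D.shape[0])
    for _ in range(n_iter):
        grad = A.T @ (A @ x - b)
        x_new = np.maximum(x - tau * (grad + D.T @ u), 0.0)        # prox of i_{>=0}
        u = np.clip(u + sigma * (D @ (2 * x_new - x)), -lam, lam)  # prox of (lam||.||_1)^*
        x = x_new
    return x
```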
Harnessing Deep Neural Networks with Logic Rules
Combining deep neural networks with structured logic rules is desirable in
order to harness flexibility and reduce the uninterpretability of neural models. We
propose a general framework capable of enhancing various types of neural
networks (e.g., CNNs and RNNs) with declarative first-order logic rules.
Specifically, we develop an iterative distillation method that transfers the
structured information of logic rules into the weights of neural networks. We
deploy the framework on a CNN for sentiment analysis, and an RNN for named
entity recognition. With a few highly intuitive rules, we obtain substantial
improvements and achieve state-of-the-art or comparable results to previous
best-performing systems.
Comment: Fix typos in appendix. ACL 2016.
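A minimal sketch of the rule-distillation step described above, assuming per-class rule scores in [0, 1] and a mixing weight pi (the teacher construction here is a simplified schematic of the paper's projection):

```python
import numpy as np

def teacher_distribution(student_probs, rule_scores, C=1.0):
    """Rule-projected teacher: q(y|x) ∝ p_theta(y|x) * exp(C * rule(x, y))."""
    q = student_probs * np.exp(C * rule_scores)
    return q / q.sum(axis=1, keepdims=True)

def distillation_targets(student_probs, rule_scores, labels_onehot, pi=0.6):
    """Training targets: mix the teacher's soft labels with the true labels."""
    q = teacher_distribution(student_probs, rule_scores)
    return pi * q + (1.0 - pi) * labels_onehot
```

The student network is then trained toward these targets with the usual cross-entropy loss, so the structured information in the rules is gradually transferred into the network weights.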
Accelerated Parallel and Distributed Algorithm using Limited Internal Memory for Nonnegative Matrix Factorization
Nonnegative matrix factorization (NMF) is a powerful technique for dimension
reduction, extracting latent factors and learning part-based representation.
For large datasets, NMF performance depends on some major issues: fast
algorithms, fully parallel distributed feasibility and limited internal memory.
This research aims to design a fast fully parallel and distributed algorithm
using limited internal memory to reach high NMF performance for large datasets.
In particular, we propose a flexible accelerated algorithm for NMF and all its
regularized variants, based on full decomposition; it combines an anti-lopsided
algorithm with a fast block coordinate descent algorithm. The proposed
algorithm takes advantage of both components to achieve a linear convergence
rate in optimizing each factor matrix while the other factor is held fixed,
within the sub-space of passive variables, with a rate constant governed by
$r$, the number of latent components. In addition, the algorithm can exploit the data
sparseness to run on large datasets with limited internal memory of machines.
Furthermore, our experimental results are highly competitive with 7
state-of-the-art methods on three significant aspects: convergence,
optimality, and the average number of iterations. Hence, the proposed
algorithm is superior to fast block coordinate descent methods and accelerated
methods.
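For reference, a compact HALS-style block coordinate descent for plain NMF, the kind of inner solver the abstract builds on (a generic sketch without the anti-lopsided transformation, parallelism, or regularized variants):

```python
import numpy as np

def nmf_hals(V, r, n_iter=200, eps=1e-10, seed=0):
    """NMF V ≈ W H: update each row of H and column of W in closed
    form while all other blocks are held fixed."""
    rng = np.random.default_rng(seed)
    m, n = V.shape
    W, H = rng.random((m, r)), rng.random((r, n))
    for _ in range(n_iter):
        WtW, WtV = W.T @ W, W.T @ V
        for j in range(r):  # update row j of H
            H[j] = np.maximum(H[j] + (WtV[j] - WtW[j] @ H) / max(WtW[j, j], eps), 0.0)
        HHt, VHt = H @ H.T, V @ H.T
        for j in range(r):  # update column j of W
            W[:, j] = np.maximum(W[:, j] + (VHt[:, j] - W @ HHt[:, j]) / max(HHt[j, j], eps), 0.0)
    return W, H
```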