Unsupervised Transductive Domain Adaptation
Supervised learning with large scale labeled datasets and deep layered models
has made a paradigm shift in diverse areas in learning and recognition.
However, this approach still suffers from generalization issues in the presence
of a domain shift between the training and test data distributions. In this
regard, unsupervised domain adaptation algorithms have been proposed to
directly address the domain shift problem. In this paper, we approach the
problem from a transductive perspective. We incorporate the domain shift and
the transductive target inference into our framework by jointly solving for an
asymmetric similarity metric and the optimal transductive target label
assignment. We also show that our model can easily be extended for deep feature
learning in order to learn features which are discriminative in the target
domain. Our experiments show that the proposed method outperforms
state-of-the-art algorithms by a large margin in both object recognition and
digit classification tasks.
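As a rough illustration of the alternating structure this abstract describes, here is a minimal numpy sketch that alternates between transductive target label assignment and an update of an asymmetric linear similarity. The function name, the prototype-based labeling, and the squared-error surrogate are assumptions for illustration, not the paper's actual joint objective.

```python
import numpy as np

def transductive_da(Xs, ys, Xt, n_iter=10, lr=0.1):
    """Toy alternating scheme: (1) transductively assign target labels under
    the current asymmetric similarity, (2) update the linear map W so that
    mapped source class prototypes move toward their assigned target points."""
    d = Xs.shape[1]
    W = np.eye(d)               # asymmetric similarity: sim(s, t) = (W @ s) @ t
    classes = np.unique(ys)
    yt = None
    for _ in range(n_iter):
        # Step 1: label each target point by its most similar mapped prototype.
        protos = np.stack([W @ Xs[ys == c].mean(axis=0) for c in classes])
        yt = classes[np.argmax(Xt @ protos.T, axis=1)]
        # Step 2: one gradient step on 0.5*||W mu_s(c) - mu_t(c)||^2 per class.
        for i, c in enumerate(classes):
            if np.any(yt == c):
                mu_s = Xs[ys == c].mean(axis=0)
                diff = W @ mu_s - Xt[yt == c].mean(axis=0)
                W -= lr * np.outer(diff, mu_s)
    return W, yt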
Domain Adaptation in Highly Imbalanced and Overlapping Datasets
In many machine learning domains, datasets are characterized by highly
imbalanced and overlapping classes. Particularly in the medical domain, a
specific list of symptoms can be labeled as one of various different
conditions. Some of these conditions may be more prevalent than others by
several orders of magnitude. Here we present a novel unsupervised domain
adaptation scheme for such datasets. The scheme, based on a specific type of
Quantification, is designed to work under both label and conditional shifts. It
is demonstrated on datasets generated from electronic health records and
provides high quality results for both Quantification and Domain Adaptation in
very challenging scenarios. Potential benefits of using this scheme in the
current COVID-19 outbreak, for estimation of prevalence and probability of
infection, are discussed.
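For background on the quantification primitive such schemes build on, here is a minimal sketch of the classic adjusted classify-and-count estimator for binary prevalence under label shift; this is standard quantification machinery, not the paper's exact method, and it assumes a classifier `clf` already trained on the source domain.

```python
import numpy as np

def adjusted_classify_and_count(clf, X_target, tpr, fpr):
    """Binary Adjusted Classify & Count: correct the classifier's raw
    positive rate on the target set using its source-domain true- and
    false-positive rates, then clip to a valid prevalence."""
    raw = clf.predict(X_target).mean()          # naive prevalence estimate
    # Under label shift, E[raw] = tpr * p + fpr * (1 - p); solve for p.
    p = (raw - fpr) / (tpr - fpr)
    return float(np.clip(p, 0.0, 1.0))
```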
On the achievability of blind source separation for high-dimensional nonlinear source mixtures
For many years, a combination of principal component analysis (PCA) and
independent component analysis (ICA) has been used for blind source separation
(BSS). However, it remains unclear why these linear methods work well with
real-world data that involve nonlinear source mixtures. This work theoretically
validates that a cascade of linear PCA and ICA can solve a nonlinear BSS
problem accurately when the sensory inputs are generated from hidden sources
via a nonlinear mapping with sufficient dimensionality. Our proposed theorem,
termed the asymptotic linearization theorem, theoretically guarantees that
applying linear PCA to the inputs can reliably extract a subspace spanned by
the linear projections from every hidden source as the major components, and
thus projecting the inputs onto their major eigenspace can effectively recover
a linear transformation of the hidden sources. Then, subsequent application of
linear ICA can separate all the true independent hidden sources accurately.
Zero-element-wise-error nonlinear BSS is asymptotically attained when the
source dimensionality is large and the input dimensionality is larger than the
source dimensionality. Our proposed theorem is validated analytically and
numerically. Moreover, the same computation can be performed by using
Hebbian-like plasticity rules, implying the biological plausibility of this
nonlinear BSS strategy. Our results highlight the utility of linear PCA and ICA
for accurately and reliably recovering nonlinearly mixed sources, and further
suggest the importance of employing sensors with sufficient dimensionality to
identify the true hidden sources of real-world data.
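The claimed cascade is easy to try end to end; the following sketch mixes toy sources through a tanh nonlinearity and checks recovery with scikit-learn's PCA and FastICA. The mixture, dimensions, and correlation check are illustrative choices, and the theorem's guarantee is asymptotic in the dimensionalities.

```python
import numpy as np
from sklearn.decomposition import PCA, FastICA

rng = np.random.default_rng(0)
Ns, Nx, T = 10, 500, 20000        # source dim; input dim >> source dim; samples
s = rng.uniform(-1, 1, size=(T, Ns))              # independent hidden sources
A = rng.standard_normal((Ns, Nx)) / np.sqrt(Ns)   # random mixing directions
x = np.tanh(s @ A)                                # toy nonlinear source mixture

# Cascade: linear PCA extracts the Ns major components, linear ICA unmixes them.
z = PCA(n_components=Ns).fit_transform(x)
s_hat = FastICA(n_components=Ns, random_state=0).fit_transform(z)

# Recovery holds up to permutation and sign/scale: check matched correlations.
corr = np.abs(np.corrcoef(s.T, s_hat.T)[:Ns, Ns:])
print(np.sort(corr.max(axis=1)))                  # ideally all close to 1
```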
Spectral-graph Based Classifications: Linear Regression for Classification and Normalized Radial Basis Function Network
Spectral graph theory has been widely applied in unsupervised and
semi-supervised learning. In this paper, we find, for the first time to our
knowledge, that it also plays a concrete role in supervised classification. It
turns out that two classifiers are inherently related to the theory: linear
regression for classification (LRC) and normalized radial basis function
network (nRBFN), corresponding to linear and nonlinear kernel respectively. The
spectral graph theory provides us with a new insight into a fundamental aspect
of classification: the tradeoff between fitting error and overfitting risk.
With the theory, ideal working conditions for LRC and nRBFN are presented,
which ensure not only zero fitting error but also low overfitting risk. For
quantitative analysis, two concepts, the fitting error and the spectral risk
(indicating overfitting), have been defined. Their bounds for nRBFN and LRC are
derived. A special result shows that the spectral risk of nRBFN is lower
bounded by the number of classes and upper bounded by the size of radial basis.
When the conditions are not met exactly, the classifiers will pursue the
minimum fitting error, running into the risk of overfitting. It turns out that
$\ell_2$-norm regularization can be applied to control overfitting. Its effect
is explored under the spectral context. It is found that the two terms in the
$\ell_2$-regularized objective correspond one-to-one to the fitting error
and the spectral risk, revealing a tradeoff between the two quantities.
Concerning practical performance, we devise a basis selection strategy to
address the main problem hindering the applications of (n)RBFN. With the
strategy, nRBFN is easy to implement yet flexible. Experiments on 14 benchmark
data sets show the performance of nRBFN is comparable to that of SVM, whereas
the parameter tuning of nRBFN is much easier, leading to a reduction in model
selection time.
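A minimal sketch of the two classifiers as described: ridge-regressed one-hot targets for LRC, and normalized Gaussian RBF features feeding the same solver for nRBFN. The function names are hypothetical, and the paper's basis selection strategy is omitted.

```python
import numpy as np

def lrc_fit(X, y, lam=1e-2):
    """Linear regression for classification (LRC): ridge-regress one-hot
    class indicators on the features; lam is the l2 penalty weight."""
    classes, idx = np.unique(y, return_inverse=True)
    Y = np.eye(len(classes))[idx]                    # one-hot targets
    W = np.linalg.solve(X.T @ X + lam * np.eye(X.shape[1]), X.T @ Y)
    return W, classes

def nrbfn_features(X, centers, gamma=1.0):
    """Normalized RBF features: Gaussian kernel responses normalized to sum
    to one per sample; feeding these to lrc_fit gives an nRBFN-style
    classifier (center/basis selection is omitted here)."""
    d2 = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(-1)
    K = np.exp(-gamma * d2)
    return K / K.sum(axis=1, keepdims=True)

def lrc_predict(X, W, classes):
    return classes[np.argmax(X @ W, axis=1)]
```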
Quantifying Mental Health from Social Media with Neural User Embeddings
Mental illnesses adversely affect a significant proportion of the population
worldwide. However, the methods traditionally used for estimating and
characterizing the prevalence of mental health conditions are time-consuming
and expensive. Consequently, best-available estimates concerning the prevalence
of mental health conditions are often years out of date. Automated approaches
to supplement these survey methods with broad, aggregated information derived
from social media content provide a potential means for near real-time
estimates at scale. These may, in turn, provide grist for supporting,
evaluating and iteratively improving upon public health programs and
interventions.
We propose a novel model for automated mental health status quantification
that incorporates user embeddings. This builds upon recent work exploring
representation learning methods that induce embeddings by leveraging social
media post histories. Such embeddings capture latent characteristics of
individuals (e.g., political leanings) and encode a soft notion of homophily.
In this paper, we investigate whether user embeddings learned from Twitter post
histories encode information that correlates with mental health statuses. To
this end, we estimated user embeddings for a set of users known to be affected
by depression and post-traumatic stress disorder (PTSD), and for a set of
demographically matched 'control' users. We then evaluated these embeddings
with respect to: (i) their ability to capture homophilic relations with respect
to mental health status; and (ii) the performance of downstream mental health
prediction models based on these features. Our experimental results demonstrate
that the user embeddings capture similarities between users with respect to
mental health conditions, and are predictive of mental health status.
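The two evaluations lend themselves to a compact sketch, assuming user embeddings E have already been induced from post histories; the k-nearest-neighbour homophily score and the cross-validated logistic regression below are illustrative choices, not necessarily the paper's exact protocol.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import NearestNeighbors

def evaluate_user_embeddings(E, labels, k=10):
    """E: (n_users, dim) user embeddings; labels: 1 = condition, 0 = control.
    Returns (i) a homophily score: the fraction of each user's k nearest
    neighbours sharing their label, and (ii) cross-validated ROC AUC of a
    logistic regression predicting status from the embeddings."""
    labels = np.asarray(labels)
    nn = NearestNeighbors(n_neighbors=k + 1).fit(E)
    _, idx = nn.kneighbors(E)                   # idx[:, 0] is the point itself
    homophily = (labels[idx[:, 1:]] == labels[:, None]).mean()
    auc = cross_val_score(LogisticRegression(max_iter=1000), E, labels,
                          cv=5, scoring="roc_auc").mean()
    return homophily, auc
```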
Learning for Multi-Model and Multi-Type Fitting
Multi-model fitting has been extensively studied from the random sampling and
clustering perspectives. Most methods assume that only a single type/class of model is
present and their generalizations to fitting multiple types of
models/structures simultaneously are non-trivial. The inherent challenges
include choice of types and numbers of models, sampling imbalance and parameter
tuning, all of which render conventional approaches ineffective. In this work,
we formulate the multi-model multi-type fitting problem as one of learning deep
feature embedding that is clustering-friendly. In other words, points of the
same clusters are embedded closer together through the network. For inference,
we apply K-means to cluster the data in the embedded feature space and model
selection is enabled by analyzing the K-means residuals. Experiments are
carried out on both synthetic and real world multi-type fitting datasets,
producing state-of-the-art results. Comparisons are also made on single-type
multi-model fitting tasks, with promising results as well.
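A minimal sketch of the inference stage as described: K-means in the learned embedding space, with the number of models chosen from the residual curve. The crude elbow criterion used here is a simple stand-in for the paper's residual analysis.

```python
import numpy as np
from sklearn.cluster import KMeans

def cluster_and_select(Z, k_range=range(2, 10)):
    """Cluster points in the learned embedding space Z with K-means, choosing
    the number of models via a crude elbow on the residual (inertia) curve."""
    ks = list(k_range)
    inertias = np.array([KMeans(n_clusters=k, n_init=10, random_state=0)
                         .fit(Z).inertia_ for k in ks])
    drops = -np.diff(inertias) / inertias[:-1]   # relative residual drop
    best_k = ks[int(np.argmax(drops)) + 1]
    labels = KMeans(n_clusters=best_k, n_init=10,
                    random_state=0).fit_predict(Z)
    return labels, best_k
```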
Towards a theory of machine learning
We define a neural network as a septuple consisting of (1) a state vector,
(2) an input projection, (3) an output projection, (4) a weight matrix, (5) a
bias vector, (6) an activation map and (7) a loss function. We argue that the
loss function can be imposed either on the boundary (i.e. input and/or output
neurons) or in the bulk (i.e. hidden neurons) for both supervised and
unsupervised systems. We apply the principle of maximum entropy to derive a
canonical ensemble of the state vectors subject to a constraint imposed on the
bulk loss function by a Lagrange multiplier (or an inverse temperature
parameter). We show that in equilibrium the canonical partition function
must be a product of two factors: a function of the temperature and a function
of the bias vector and weight matrix. Consequently, the total Shannon entropy
consists of two terms which represent respectively a thermodynamic entropy and
a complexity of the neural network. We derive the first and second laws of
learning: during learning the total entropy must decrease until the system
reaches an equilibrium (i.e. the second law), and the increment in the loss
function must be proportional to the increment in the thermodynamic entropy
plus the increment in the complexity (i.e. the first law). We calculate the
entropy destruction to show that the efficiency of learning is given by the
Laplacian of the total free energy which is to be maximized in an optimal
neural architecture, and explain why the optimization condition is better
satisfied in a deep network with a large number of hidden layers. The key
properties of the model are verified numerically by training a supervised
feedforward neural network using the method of stochastic gradient descent. We
also discuss the possibility that the entire universe at its most fundamental
level is a neural network.
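In generic notation (not necessarily the paper's symbols), the maximum-entropy construction and the claimed factorization can be sketched as follows.

```latex
% Maximum-entropy (canonical) ensemble over state vectors x with bulk loss H,
% in generic notation. Maximizing the Shannon entropy of p subject to a fixed
% expected loss gives the Boltzmann form with inverse temperature \beta:
\[
  p(x) \;=\; \frac{e^{-\beta H(x)}}{Z(\beta)},
  \qquad
  Z(\beta) \;=\; \int e^{-\beta H(x)}\, dx .
\]
% The abstract's equilibrium claim: Z factorizes into a temperature-dependent
% part and a part depending only on the weights W and biases b, so the total
% Shannon entropy splits into a thermodynamic term plus a complexity term:
\[
  Z \;=\; Z_{\beta}(\beta)\, Z_{wb}(W, b)
  \;\;\Longrightarrow\;\;
  S_{\text{total}} \;=\; S_{\text{thermo}}(\beta) + C(W, b).
\]
```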
A Plug&Play P300 BCI Using Information Geometry
This paper presents a new classification method for event-related potentials
(ERPs) based on an information geometry framework. Through a new estimation of
covariance matrices, this work extends the use of Riemannian geometry, which
was previously limited to SMR-based BCIs, to the problem of ERP classification.
Compared to the state of the art, the new method increases performance, reduces
the amount of data needed for calibration, and features good generalisation
across sessions and subjects. The method is illustrated on data recorded with
the P300-based game Brain Invaders. Finally, an online and adaptive
implementation is described, where the BCI is initialized with generic
parameters derived from a database and continuously adapts to the individual,
allowing the user to play the game without any calibration while keeping a high
accuracy.
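A sketch of the two core ingredients in numpy/scipy: a template-augmented ("super-trial") covariance estimate and the affine-invariant Riemannian distance used for minimum-distance-to-mean classification. Riemannian mean estimation and the online adaptation are omitted, and the function names are illustrative.

```python
import numpy as np
from scipy.linalg import eigh

def erp_covariance(trial, template):
    """'Super-trial' covariance for ERP classification: stack the
    class-average ERP template on top of the trial so the covariance also
    captures the trial's temporal correlation with the template.
    trial, template: (n_channels, n_samples) arrays."""
    return np.cov(np.vstack([template, trial]))

def riemann_dist(A, B):
    """Affine-invariant Riemannian distance between SPD matrices, computed
    from the generalized eigenvalues of the pencil (A, B)."""
    w = eigh(A, B, eigvals_only=True)
    return np.sqrt((np.log(w) ** 2).sum())

def classify_mdm(cov, class_means):
    """Minimum-distance-to-mean: pick the class whose mean covariance is
    Riemannian-closest to the trial's covariance."""
    return min(class_means, key=lambda c: riemann_dist(cov, class_means[c]))
```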
Robust Multiple Manifolds Structure Learning
We present a robust multiple manifolds structure learning (RMMSL) scheme to
robustly estimate data structures under the assumption of multiple manifolds
with low intrinsic dimension. In the local learning stage, RMMSL efficiently estimates
local tangent space by weighted low-rank matrix factorization. In the global
learning stage, we propose a robust manifold clustering method based on local
structure learning results. The proposed clustering method is designed to
obtain the flattest manifold clusters by introducing a novel curved-level similarity
function. Our approach is evaluated and compared to state-of-the-art methods on
synthetic data, handwritten digit images, human motion capture data and
motorbike videos. We demonstrate the effectiveness of the proposed approach,
which yields higher clustering accuracy, and produces promising results for
challenging tasks of human motion segmentation and motion flow learning from
videos.
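As a stand-in for the local learning stage, here is a weighted-PCA sketch of local tangent space estimation at a point; the Gaussian weighting and SVD route approximate, but are not identical to, the paper's weighted low-rank matrix factorization.

```python
import numpy as np

def local_tangent(X, i, k=12, d=2, sigma=1.0):
    """Estimate the d-dimensional tangent space at point i from its k nearest
    neighbours, down-weighting far neighbours with a Gaussian kernel.
    X: (n_points, dim) data matrix; returns (d, dim) tangent basis rows."""
    dists = np.linalg.norm(X - X[i], axis=1)
    nbrs = np.argsort(dists)[1:k + 1]                # skip the point itself
    w = np.exp(-dists[nbrs] ** 2 / (2 * sigma ** 2))
    Xc = (X[nbrs] - X[i]) * np.sqrt(w)[:, None]      # weighted, centered
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
    return Vt[:d]                                    # rows span the tangent space
```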
Subspace Network: Deep Multi-Task Censored Regression for Modeling Neurodegenerative Diseases
Over the past decade, a wide spectrum of machine learning models has been
developed to model neurodegenerative diseases, associating biomarkers,
especially non-intrusive neuroimaging markers, with key clinical scores
measuring the cognitive status of patients. Multi-task learning (MTL) has been
commonly utilized by these studies to address high dimensionality and small
cohort size challenges. However, most existing MTL approaches are based on
linear models and suffer from two major limitations: 1) they cannot explicitly
consider upper/lower bounds in these clinical scores; 2) they lack the
capability to capture complicated non-linear interactions among the variables.
In this paper, we propose Subspace Network, an efficient deep modeling approach
for non-linear multi-task censored regression. Each layer of the subspace
network performs a multi-task censored regression to improve upon the
predictions of the previous layer via sketching a low-dimensional subspace to
perform knowledge transfer among learning tasks. Under mild assumptions, for
each layer the parametric subspace can be recovered using only one pass of
training data. Empirical results demonstrate that the proposed subspace network
quickly picks up the correct parameter subspaces, and outperforms
state-of-the-art methods in predicting neurodegenerative clinical scores using
information in brain imaging.
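A toy sketch of one layer's idea: multi-task regression with weights confined to a rank-d subspace (W = U @ V) and predictions censored to the score range [lo, hi]. The paper recovers the subspace with sketching in a single pass over the data; the gradient descent below is just an illustrative substitute.

```python
import numpy as np

def censored_layer(X, Y, lo, hi, d=4, n_iter=500, lr=0.01, seed=0):
    """One toy layer: multi-task regression with weights W = U @ V restricted
    to a rank-d subspace, and predictions censored (clipped) to [lo, hi].
    X: (n, p) features; Y: (n, t) clinical scores for t tasks."""
    rng = np.random.default_rng(seed)
    n, p = X.shape
    t = Y.shape[1]
    U = 0.01 * rng.standard_normal((p, d))       # shared low-dim subspace
    V = 0.01 * rng.standard_normal((d, t))       # per-task coefficients
    for _ in range(n_iter):
        Z = X @ U @ V
        P = np.clip(Z, lo, hi)                   # censored predictions
        G = (P - Y) * ((Z > lo) & (Z < hi)) / n  # no gradient where clipped
        U -= lr * (X.T @ G @ V.T)
        V -= lr * (U.T @ X.T @ G)
    return U, V
```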