Dynamic sampling schemes for optimal noise learning under multiple nonsmooth constraints
We consider the bilevel optimisation approach proposed by De Los Reyes and
Sch\"onlieb (2013) for learning the optimal parameters in a Total Variation
(TV) denoising model featuring multiple noise distributions. In applications,
the use of databases (dictionaries) allows an accurate estimation of the
parameters, but results in high computational costs due to the size of the
databases and to the nonsmooth nature of the PDE constraints. To overcome this
computational barrier, we propose an optimisation algorithm that, by sampling
dynamically from the set of constraints and using a quasi-Newton method,
solves the problem accurately and efficiently.
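The dynamic sampling strategy can be illustrated on a toy problem: minimise an average of many simple per-example losses by drawing a batch from the constraint set at each iteration and letting the batch grow. This is a one-dimensional sketch under illustrative assumptions, with quadratic losses and plain gradient descent standing in for the paper's nonsmooth PDE constraints and quasi-Newton method:

```python
import random

random.seed(0)

# Toy illustration of dynamic sampling: minimise the average of many
# per-example losses f_i(lam) = (lam - t_i)^2 by drawing a batch each
# iteration and growing the batch size over time.  The quadratic losses
# are stand-ins for the paper's nonsmooth constraints.

targets = [random.gauss(2.0, 0.5) for _ in range(10_000)]  # the "database"

def sampled_grad(lam, batch):
    # gradient of the average of the sampled losses
    return sum(2.0 * (lam - t) for t in batch) / len(batch)

lam, step = 0.0, 0.1
for k in range(1, 201):
    batch_size = min(len(targets), 10 * k)   # dynamic: sample more as we go
    batch = random.sample(targets, batch_size)
    lam -= step * sampled_grad(lam, batch)

full_mean = sum(targets) / len(targets)      # exact minimiser of the full sum
print(abs(lam - full_mean))                  # small, at a fraction of the cost
```

Early iterations touch only a few constraints, so most of the computational saving comes from never evaluating the full database until the iterate is already close.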
A latent variable ranking model for content-based retrieval
34th European Conference on IR Research, ECIR 2012, Barcelona, Spain, April 1-5, 2012. Proceedings.
Since their introduction, ranking SVM models [11] have become a powerful tool for training content-based retrieval systems. All we need for training a model are retrieval examples in the form of triplet constraints, i.e. examples specifying that, relative to some query, a database item a should be ranked higher than database item b. These constraints can be obtained from feedback of users of the retrieval system. Most previous ranking models learn either a global combination of elementary similarity functions or a combination defined with respect to a single database item. Instead, we propose a “coarse to fine” ranking model where, given a query, we first compute a distribution over “coarse” classes and then use the linear combination that has been optimized for queries of that class. These coarse classes are hidden and need to be induced by the training algorithm. We propose a latent variable ranking model that induces both the latent classes and the weights of the linear combination for each class from ranking triplets. Our experiments over two large image datasets and a text retrieval dataset show the advantages of our model over learning a global combination as well as a combination for each test point (i.e. the transductive setting). Furthermore, compared to the transductive approach, our model has a clear computational advantage since it does not need to be retrained for each test query.
Spanish Ministry of Science and Innovation (JCI-2009-04240); EU PASCAL2 Network of Excellence (FP7-ICT-216886)
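At query time, such a model scores items as a class-posterior-weighted mixture of per-class linear combinations. A minimal sketch of this scoring rule, with made-up classes, weights, and similarity values (the training of the latent classes from triplets is omitted):

```python
import math

# Sketch of the "coarse to fine" scoring rule at query time.  The latent
# classes, per-class weights, and similarity values are invented for
# illustration; learning them from ranking triplets is omitted.

def softmax(scores):
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    z = sum(exps)
    return [e / z for e in exps]

# p(c | query): distribution over the coarse latent classes for this query
p_class = softmax([1.2, -0.3, 0.1])

# one weight vector over the elementary similarity functions per class
weights = [[0.7, 0.3], [0.2, 0.8], [0.5, 0.5]]

def rank_score(similarities):
    # mix the per-class linear combinations by the class posterior
    return sum(p * sum(w * s for w, s in zip(w_c, similarities))
               for p, w_c in zip(p_class, weights))

# relative to this query, item a should outrank item b
score_a = rank_score([0.9, 0.2])
score_b = rank_score([0.4, 0.3])
print(score_a > score_b)   # True
```

A global model would use one fixed weight vector for every query; here the effective weights change with the query's class posterior.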
A Neural Networks Committee for the Contextual Bandit Problem
This paper presents a new contextual bandit algorithm, NeuralBandit, which
requires no stationarity assumption on contexts and rewards. Several neural
networks are trained to model the value of rewards given the context. Two
variants, based on a multi-expert approach, are proposed to choose online the
parameters of the multi-layer perceptrons. The proposed algorithms are
successfully tested on a large dataset with and without stationarity of
rewards.
Comment: 21st International Conference on Neural Information Processing
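The underlying idea can be sketched as one reward model per arm plus epsilon-greedy play on the predicted rewards. Here a single online linear model per arm stands in for the paper's multi-layer perceptron committee, and the environment is invented:

```python
import random

random.seed(1)

# Minimal sketch: keep one reward model per arm, predict each arm's
# reward from the observed context, and play epsilon-greedy on the
# predictions.  A linear model per arm stands in for the paper's
# multi-layer perceptrons; the reward structure below is hypothetical.

N_ARMS, DIM, EPS, LR = 3, 2, 0.1, 0.05
weights = [[0.0] * DIM for _ in range(N_ARMS)]
true_w = [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]]    # hidden reward structure

def predict(arm, x):
    return sum(w * xi for w, xi in zip(weights[arm], x))

total = 0.0
for t in range(5000):
    x = [random.random(), random.random()]        # context for this round
    if random.random() < EPS:                     # explore
        arm = random.randrange(N_ARMS)
    else:                                         # exploit the models
        arm = max(range(N_ARMS), key=lambda a: predict(a, x))
    reward = sum(w * xi for w, xi in zip(true_w[arm], x)) + random.gauss(0, 0.1)
    total += reward
    err = predict(arm, x) - reward                # online SGD update
    for i in range(DIM):
        weights[arm][i] -= LR * err * x[i]

print(total / 5000)   # beats the ~0.5 average of uniformly random play
```

The committee aspect of the paper amounts to running several such learners with different hyperparameters and weighting them online by their performance.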
DeepWalk: Online Learning of Social Representations
We present DeepWalk, a novel approach for learning latent representations of
vertices in a network. These latent representations encode social relations in
a continuous vector space, which is easily exploited by statistical models.
DeepWalk generalizes recent advancements in language modeling and unsupervised
feature learning (or deep learning) from sequences of words to graphs. DeepWalk
uses local information obtained from truncated random walks to learn latent
representations by treating walks as the equivalent of sentences. We
demonstrate DeepWalk's latent representations on several multi-label network
classification tasks for social networks such as BlogCatalog, Flickr, and
YouTube. Our results show that DeepWalk outperforms challenging baselines which
are allowed a global view of the network, especially in the presence of missing
information. DeepWalk's representations can provide scores up to 10%
higher than competing methods when labeled data is sparse. In some experiments,
DeepWalk's representations are able to outperform all baseline methods while
using 60% less training data. DeepWalk is also scalable. It is an online
learning algorithm which builds useful incremental results, and is trivially
parallelizable. These qualities make it suitable for a broad class of real
world applications such as network classification and anomaly detection.
Comment: 10 pages, 5 figures, 4 tables
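The first stage, generating a corpus of truncated random walks to treat as sentences, can be sketched as follows; the toy graph is illustrative, and the word2vec-style skip-gram training on the walks is omitted:

```python
import random

random.seed(42)

# Sketch of DeepWalk's first stage: build a corpus of truncated random
# walks, each walk playing the role of a "sentence" of vertex ids.  The
# walks would then be fed to a word2vec-style skip-gram model.

graph = {                        # small undirected graph, adjacency lists
    0: [1, 2], 1: [0, 2], 2: [0, 1, 3], 3: [2, 4], 4: [3],
}

def random_walk(start, length):
    walk = [start]
    while len(walk) < length:
        walk.append(random.choice(graph[walk[-1]]))  # one local step
    return walk

# several truncated walks per starting vertex
corpus = [random_walk(v, length=6) for v in graph for _ in range(10)]

print(len(corpus), len(corpus[0]))   # 50 6
```

Because each walk depends only on local neighbourhoods, walk generation parallelises trivially and new vertices can be folded in incrementally, which is what makes the approach online and scalable.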
BlinkML: Efficient Maximum Likelihood Estimation with Probabilistic Guarantees
The rising volume of datasets has made training machine learning (ML) models
a major computational cost in the enterprise. Given the iterative nature of
model and parameter tuning, many analysts use a small sample of their entire
data during their initial stage of analysis to make quick decisions (e.g., what
features or hyperparameters to use) and use the entire dataset only in later
stages (i.e., when they have converged to a specific model). This sampling,
however, is performed in an ad-hoc fashion. Most practitioners cannot precisely
capture the effect of sampling on the quality of their model, and eventually on
their decision-making process during the tuning phase. Moreover, without
systematic support for sampling operators, many optimizations and reuse
opportunities are lost.
In this paper, we introduce BlinkML, a system for fast, quality-guaranteed ML
training. BlinkML allows users to make error-computation tradeoffs: instead of
training a model on their full data (i.e., full model), BlinkML can quickly
train an approximate model with quality guarantees using a sample. The quality
guarantees ensure that, with high probability, the approximate model makes the
same predictions as the full model. BlinkML currently supports any ML model
that relies on maximum likelihood estimation (MLE), which includes Generalized
Linear Models (e.g., linear regression, logistic regression, max entropy
classifier, Poisson regression) as well as PPCA (Probabilistic Principal
Component Analysis). Our experiments show that BlinkML can speed up the
training of large-scale ML tasks by 6.26x-629x while guaranteeing the same
predictions, with 95% probability, as the full model.
Comment: 22 pages, SIGMOD 201
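The sample-versus-full idea can be illustrated with the simplest MLE, a Bernoulli success probability: fit on a sample, then bound the distance to the full-data estimate with the normal approximation. BlinkML's actual guarantee concerns the predictions of general MLE models and uses more careful machinery; this toy only conveys the statistical flavour, on synthetic data:

```python
import math
import random

random.seed(7)

# Toy version of the sample-versus-full idea for the simplest MLE, a
# Bernoulli success probability: fit on a small sample and bound the
# distance to the full-data estimate via the normal approximation.
# (The data is synthetic; BlinkML's real guarantee is about model
# predictions, not a single parameter.)

full = [1 if random.random() < 0.3 else 0 for _ in range(1_000_000)]

n = 10_000
sample = random.sample(full, n)

p_sample = sum(sample) / n                        # MLE on the sample
half_width = 1.96 * math.sqrt(p_sample * (1 - p_sample) / n)  # 95% bound

p_full = sum(full) / len(full)                    # MLE on the full data
print(abs(p_full - p_sample) <= half_width)       # holds w.p. ~95%
```

The sample fit costs a hundredth of the full fit here, and the bound tells the analyst in advance how far the full-data answer can plausibly be.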
Clustering Nominal and Numerical Data: A New Distance Concept for a Hybrid Genetic Algorithm
Since intrinsic structure, such as the number of clusters, is a major issue of the clustering problem for real data, we propose in this paper CHyGA (Clustering Hybrid Genetic Algorithm), a hybrid genetic algorithm for clustering. CHyGA treats the clustering problem as an optimization problem and searches for an optimal number of clusters characterized by an optimal distribution of instances into the clusters. CHyGA introduces a new representation of solutions and uses dedicated operators, such as one iteration of K-means as a mutation operator. In order to deal with nominal data, we propose a new definition of the cluster center concept and demonstrate its properties. Experimental results on classical benchmarks are given.
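The mixed-data ingredients can be sketched as a cluster "center" that takes the mean over numerical attributes and the mode over nominal ones, together with a distance that adds squared numerical differences and nominal mismatches. This is a simplified reading of the construction, on invented data:

```python
from collections import Counter

# Sketch of mixed-data clustering ingredients: a cluster "center" that
# takes the mean over numerical attributes and the mode over nominal
# ones, with a distance that adds squared numerical differences and
# nominal mismatches.  The data is invented for illustration.

def center(points):
    nums = [p[0] for p in points]                # numerical attribute
    noms = [p[1] for p in points]                # nominal attribute
    return (sum(nums) / len(nums),               # mean for the numerical part
            Counter(noms).most_common(1)[0][0])  # mode for the nominal part

def dist(p, c):
    return (p[0] - c[0]) ** 2 + (0 if p[1] == c[1] else 1)

cluster = [(1.0, "red"), (2.0, "red"), (3.0, "blue")]
c = center(cluster)
print(c)                        # (2.0, 'red')
print(dist((1.5, "blue"), c))   # 0.25 + 1 = 1.25
```

One K-means-style step over such centers (reassign each instance to its nearest center, then recompute the centers) is the kind of operation the paper uses as a mutation.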
Statistically Significant Detection of Linguistic Change
We propose a new computational approach for tracking and detecting
statistically significant linguistic shifts in the meaning and usage of words.
Such linguistic shifts are especially prevalent on the Internet, where the
rapid exchange of ideas can quickly change a word's meaning. Our meta-analysis
approach constructs property time series of word usage, and then uses
statistically sound change point detection algorithms to identify significant
linguistic shifts.
We consider and analyze three approaches of increasing complexity to generate
such linguistic property time series, the culmination of which uses
distributional characteristics inferred from word co-occurrences. Using
recently proposed deep neural language models, we first train vector
representations of words for each time period. Second, we warp the vector
spaces into one unified coordinate system. Finally, we construct a
distance-based distributional time series for each word to track its
linguistic displacement over time.
We demonstrate that our approach is scalable by tracking linguistic change
across years of micro-blogging using Twitter, a decade of product reviews using
a corpus of movie reviews from Amazon, and a century of written books using the
Google Book-ngrams. Our analysis reveals interesting patterns of language usage
change commensurate with each medium.
Comment: 11 pages, 7 figures, 4 tables
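The final stage, flagging a significant shift in a word's displacement series, can be sketched with a simple two-segment split statistic; this stands in for the statistically sound change point detection methods the paper relies on, and the series below is invented:

```python
# Sketch of the last stage: given a word's distance-based displacement
# series, flag the point where the mean level shifts.  A simple
# two-segment split statistic stands in for the statistically sound
# change point detection machinery; the series is invented.

def change_point(series):
    # return the split index maximising the gap between the mean of the
    # left segment and the mean of the right segment
    best_i, best_gap = None, 0.0
    for i in range(2, len(series) - 1):
        left, right = series[:i], series[i:]
        gap = abs(sum(left) / len(left) - sum(right) / len(right))
        if gap > best_gap:
            best_i, best_gap = i, gap
    return best_i, best_gap

# stable usage, then a sudden shift in the word's meaning
series = [0.1, 0.12, 0.09, 0.11, 0.5, 0.55, 0.52, 0.53]
print(change_point(series))   # split at index 4, where the shift happens
```

A statistically sound detector would additionally test the gap against a null distribution before declaring the shift significant.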
The need for open source software in machine learning
Open source tools have recently reached a level of maturity which makes them suitable for building
large-scale real-world systems. At the same time, the field of machine learning has developed a
large body of powerful learning algorithms for diverse applications. However,
the true potential of these methods is not realized, since existing
implementations are not openly shared, resulting in software with low
usability and weak interoperability. We argue that this situation can be significantly
improved by increasing incentives for researchers to publish their software under an open source
model. Additionally, we outline the problems authors are faced with when trying to publish algorithmic
implementations of machine learning methods. We believe that a resource of peer reviewed
software accompanied by short articles would be highly valuable to both the machine learning and
the general scientific community.
Regularizing Portfolio Optimization
The optimization of large portfolios displays an inherent instability to
estimation error. This poses a fundamental problem, because solutions that are
not stable under sample fluctuations may look optimal for a given sample, but
are, in effect, very far from optimal with respect to the average risk. In this
paper, we approach the problem from the point of view of statistical learning
theory. The occurrence of the instability is intimately related to over-fitting
which can be avoided using known regularization methods. We show how
regularized portfolio optimization with the expected shortfall as a risk
measure is related to support vector regression. The budget constraint dictates
a modification. We present the resulting optimization problem and discuss the
solution. The L2 norm of the weight vector is used as a regularizer, which
corresponds to a diversification "pressure". This means that diversification,
besides counteracting downward fluctuations in some assets by upward
fluctuations in others, is also crucial because it improves the stability of
the solution. The approach we provide here allows for the simultaneous
treatment of optimization and diversification in one framework that enables the
investor to trade off between the two, depending on the size of the available
data set.
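The diversification effect of the L2 regularizer can be illustrated on a two-asset, ridge-regularised minimum-variance portfolio, a deliberate simplification of the paper's expected-shortfall formulation: minimising w'Cw + lam*||w||^2 subject to the budget constraint gives weights proportional to (C + lam*I)^{-1} 1, and increasing lam pushes them towards the uniform portfolio:

```python
# Two-asset ridge-regularised minimum-variance portfolio: minimise
# w'Cw + lam*||w||^2 subject to w1 + w2 = 1.  The unnormalised solution
# is proportional to (C + lam*I)^{-1} 1.  This is a simplified stand-in
# for the paper's expected-shortfall formulation.

def solve2(a, b, c, d, r1, r2):
    # solve the 2x2 system [[a, b], [c, d]] w = [r1, r2] by Cramer's rule
    det = a * d - b * c
    return ((r1 * d - b * r2) / det, (a * r2 - c * r1) / det)

def min_var_weights(cov, lam):
    w1, w2 = solve2(cov[0][0] + lam, cov[0][1],
                    cov[1][0], cov[1][1] + lam, 1.0, 1.0)
    s = w1 + w2
    return (w1 / s, w2 / s)           # renormalise to the budget constraint

cov = [[0.04, 0.01], [0.01, 0.25]]    # asset 2 is much riskier

print(min_var_weights(cov, 0.0))      # concentrates on the safer asset
print(min_var_weights(cov, 1.0))      # the penalty pushes towards 50/50
```

The regularizer thus acts exactly as the "diversification pressure" described above: it damps extreme weights that would otherwise be fitted to sample noise.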