104 research outputs found
Relative Pairwise Relationship Constrained Non-negative Matrix Factorisation
Non-negative Matrix Factorisation (NMF) has been extensively used in machine
learning and data analytics applications. Most existing variations of NMF only
consider how each row/column vector of factorised matrices should be shaped,
and ignore the relationship among pairwise rows or columns. In many cases, such
pairwise relationship enables better factorisation, for example, image
clustering and recommender systems. In this paper, we propose an algorithm
named Relative Pairwise Relationship constrained Non-negative Matrix
Factorisation (RPR-NMF), which places constraints over relative pairwise
distances amongst features by imposing penalties in a triplet form. Two
distance measures, squared Euclidean distance and Symmetric divergence, are
used, and exponential and hinge loss penalties are adopted for the two measures
respectively. It is well known that the so-called "multiplicative update rules"
result in much faster convergence than gradient descent for matrix
factorisation. However, applying such update rules to RPR-NMF and also proving
its convergence is not straightforward. Thus, we use reasonable approximations
to relax the complexity brought by the penalties, which are practically
verified. Experiments on both synthetic and real datasets demonstrate that
our algorithms achieve close approximations, satisfy a high proportion of the
expected constraints, and deliver superior performance compared with other
algorithms.
Comment: 13 pages, 10 figures
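The abstract builds on the standard multiplicative update rules for NMF, adding triplet penalties on top of them. As context, here is a minimal sketch of the unconstrained base updates (the Lee–Seung rules for the squared Euclidean objective); the paper's RPR-NMF modifies these with penalty terms, so this is not the authors' algorithm itself:

```python
import numpy as np

def nmf_multiplicative(V, rank, n_iter=500, eps=1e-9, seed=0):
    """Plain multiplicative-update NMF for min ||V - WH||_F^2, W,H >= 0.
    RPR-NMF (per the abstract) adds relative-pairwise-distance penalties
    to updates of this general form; only the base rules are shown here."""
    rng = np.random.default_rng(seed)
    m, n = V.shape
    W = rng.random((m, rank))
    H = rng.random((rank, n))
    for _ in range(n_iter):
        # multiplicative updates preserve non-negativity by construction
        H *= (W.T @ V) / (W.T @ W @ H + eps)
        W *= (V @ H.T) / (W @ H @ H.T + eps)
    return W, H
```

Because each update multiplies by a non-negative ratio, no projection step is needed to keep the factors non-negative.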
Distributionally Robust Optimization
This chapter presents a class of distributionally robust optimization problems in which a decision-maker has to choose an action in an uncertain environment. The decision-maker has a continuous action space and aims to learn her optimal strategy. The true distribution of the uncertainty is unknown to the decision-maker. This chapter provides alternative ways to select a distribution based on empirical observations of the decision-maker. This leads to a distributionally robust optimization problem. Simple algorithms, whose dynamics are inspired by gradient flows, are proposed to find local optima. The method is extended to a class of optimization problems with orthogonal constraints and coupled constraints over the simplex set and polytopes. The designed dynamics do not use the projection operator and are able to satisfy both upper- and lower-bound constraints. The convergence rate of the algorithm to a generalized evolutionarily stable strategy is derived using a mean regret estimate. Illustrative examples are provided.
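A key point in the abstract is that the dynamics stay feasible on the simplex without a projection operator. One classical way to achieve this is replicator-type dynamics, where each coordinate is scaled by its own value so that non-negativity and the sum-to-one constraint are preserved along the flow. This is a generic sketch of that idea, not the chapter's specific algorithm or robust objective:

```python
import numpy as np

def replicator_descent(grad, x0, steps=1000, dt=0.01):
    """Minimise f over the probability simplex without projection.
    Discretised replicator flow: dx_i/dt = x_i * (x . g - g_i), where
    g = grad f(x). Coordinates with below-average gradient grow, others
    shrink; the iterate stays non-negative and on the simplex."""
    x = np.asarray(x0, dtype=float)
    for _ in range(steps):
        g = grad(x)
        x = x + dt * x * (x @ g - g)
        x = x / x.sum()  # renormalise to absorb discretisation drift
    return x
```

For a linear cost, the dynamics concentrate mass on the cheapest coordinate, which is the simplex-constrained optimum.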
Initializing Models with Larger Ones
Weight initialization plays an important role in neural network training.
Widely used initialization methods are proposed and evaluated for networks that
are trained from scratch. However, the growing number of pretrained models now
offers new opportunities for tackling this classical problem of weight
initialization. In this work, we introduce weight selection, a method for
initializing smaller models by selecting a subset of weights from a pretrained
larger model. This enables the transfer of knowledge from pretrained weights to
smaller models. Our experiments demonstrate that weight selection can
significantly enhance the performance of small models and reduce their training
time. Notably, it can also be used together with knowledge distillation. Weight
selection offers a new approach to leverage the power of pretrained models in
resource-constrained settings, and we hope it can be a useful tool for training
small models in the large-model era. Code is available at
https://github.com/OscarXZQ/weight-selection
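The abstract describes initializing a smaller model by selecting a subset of a larger pretrained model's weights. One simple instantiation is to crop the leading slice of each matching parameter tensor down to the smaller model's shape; the paper's actual selection rule (see the linked repository) may differ, so treat this purely as an illustrative sketch:

```python
import numpy as np

def weight_select(large_params, small_shapes):
    """Build an initialisation for a smaller model by cropping each
    matching parameter of a larger pretrained model to the small shape.
    large_params: dict of name -> ndarray from the pretrained model.
    small_shapes: dict of name -> target shape for the small model."""
    small = {}
    for name, shape in small_shapes.items():
        big = large_params[name]
        # take the leading slice along every dimension (one possible
        # selection rule; others pick channels by importance)
        slices = tuple(slice(0, s) for s in shape)
        small[name] = big[slices].copy()
    return small
```

The small model then starts from pretrained values instead of a random draw, which is the mechanism the abstract credits for faster training.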