74,667 research outputs found
A review of domain adaptation without target labels
Domain adaptation has become a prominent problem setting in machine learning
and related fields. This review asks the question: how can a classifier learn
from a source domain and generalize to a target domain? We present a
categorization of approaches, divided into, what we refer to as, sample-based,
feature-based and inference-based methods. Sample-based methods focus on
weighting individual observations during training based on their importance to
the target domain. Feature-based methods revolve around on mapping, projecting
and representing features such that a source classifier performs well on the
target domain and inference-based methods incorporate adaptation into the
parameter estimation procedure, for instance through constraints on the
optimization procedure. Additionally, we review a number of conditions that
allow for formulating bounds on the cross-domain generalization error. Our
categorization highlights recurring ideas and raises questions important to
further research.Comment: 20 pages, 5 figure
Recommended from our members
A Goal-Directed Bayesian Framework for Categorization
Categorization is a fundamental ability for efficient behavioral control. It allows organisms to remember the correct responses to categorical cues and not for every stimulus encountered (hence eluding computational cost or complexity), and to generalize appropriate responses to novel stimuli dependant on category assignment. Assuming the brain performs Bayesian inference, based on a generative model of the external world and future goals, we propose a computational model of categorization in which important properties emerge. These properties comprise the ability to infer latent causes of sensory experience, a hierarchical organization of latent causes, and an explicit inclusion of context and action representations. Crucially, these aspects derive from considering the environmental statistics that are relevant to achieve goals, and from the fundamental Bayesian principle that any generative model should be preferred over alternative models based on an accuracy-complexity trade-off. Our account is a step toward elucidating computational principles of categorization and its role within the Bayesian brain hypothesis
Multilayer Aggregation with Statistical Validation: Application to Investor Networks
Multilayer networks are attracting growing attention in many fields,
including finance. In this paper, we develop a new tractable procedure for
multilayer aggregation based on statistical validation, which we apply to
investor networks. Moreover, we propose two other improvements to their
analysis: transaction bootstrapping and investor categorization. The
aggregation procedure can be used to integrate security-wise and time-wise
information about investor trading networks, but it is not limited to finance.
In fact, it can be used for different applications, such as gene,
transportation, and social networks, were they inferred or observable.
Additionally, in the investor network inference, we use transaction
bootstrapping for better statistical validation. Investor categorization allows
for constant size networks and having more observations for each node, which is
important in the inference especially for less liquid securities. Furthermore,
we observe that the window size used for averaging has a substantial effect on
the number of inferred relationships. We apply this procedure by analyzing a
unique data set of Finnish shareholders during the period 2004-2009. We find
that households in the capital have high centrality in investor networks,
which, under the theory of information channels in investor networks suggests
that they are well-informed investors
A probabilistic model of cross-categorization
Most natural domains can be represented in multiple ways: we can categorize foods in terms of their nutritional content or social role, animals in terms of their taxonomic groupings or their ecological niches, and musical instruments in terms of their taxonomic categories or social uses. Previous approaches to modeling human categorization have largely ignored the problem of cross-categorization, focusing on learning just a single system of categories that explains all of the features. Cross-categorization presents a difficult problem: how can we infer categories without first knowing which features the categories are meant to explain? We present a novel model that suggests that human cross-categorization is a result of joint inference about multiple systems of categories and the features that they explain. We also formalize two commonly proposed alternative explanations for cross-categorization behavior: a features-first and an objects-first approach. The features-first approach suggests that cross-categorization is a consequence of attentional processes, where features are selected by an attentional mechanism first and categories are derived second. The objects-first approach suggests that cross-categorization is a consequence of repeated, sequential attempts to explain features, where categories are derived first, then features that are poorly explained are recategorized. We present two sets of simulations and experiments testing the models’ predictions about human categorization. We find that an approach based on joint inference provides the best fit to human categorization behavior, and we suggest that a full account of human category learning will need to incorporate something akin to these capabilities
Algorithms for item categorization based on ordinal ranking data
We present a new method for identifying the latent categorization of items
based on their rankings. Complimenting a recent work that uses a Dirichlet
prior on preference vectors and variational inference, we show that this
problem can be effectively dealt with using existing community detection
algorithms, with the communities corresponding to item categories. In
particular we convert the bipartite ranking data to a unipartite graph of item
affinities, and apply community detection algorithms. In this context we modify
an existing algorithm - namely the label propagation algorithm to a variant
that uses the distance between the nodes for weighting the label propagation -
to identify the categories. We propose and analyze a synthetic ordinal ranking
model and show its relation to the recently much studied stochastic block
model. We test our algorithms on synthetic data and compare performance with
several popular community detection algorithms. We also test the method on real
data sets of movie categorization from the Movie Lens database. In all of the
cases our algorithm is able to identify the categories for a suitable choice of
tuning parameter.Comment: To appear in IEEE Allerton conference on computing, communications
and control, 201
- …