RankMerging: A supervised learning-to-rank framework to predict links in large social network
Uncovering unknown or missing links in social networks is a difficult task
because of their sparsity and because links may represent different types of
relationships, characterized by different structural patterns. In this paper,
we define a simple yet efficient supervised learning-to-rank framework, called
RankMerging, which aims at combining information provided by various
unsupervised rankings. We illustrate our method on three different kinds of
social networks and show that it substantially improves the performances of
unsupervised metrics of ranking. We also compare it to other combination
strategies based on standard methods. Finally, we explore various aspects of
RankMerging, such as feature selection and parameter estimation and discuss its
area of relevance: the prediction of an adjustable number of links on large
networks. Comment: 43 pages, published in Machine Learning Journa
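To make the idea of combining several unsupervised rankings concrete, here is a minimal sketch of Borda count, a classic unsupervised merging rule. It is not the RankMerging algorithm, and the abstract does not say which standard methods serve as baselines; the function name `borda_merge` and the toy link labels are assumptions made for illustration.

```python
def borda_merge(rankings):
    """Merge ranked lists with Borda count (illustrative baseline only).

    Each ranking is a list of candidate items, best first. An item at
    position p in a list of length n earns n - p points from that list;
    items absent from a list earn nothing from it.
    """
    scores = {item: 0 for item in set().union(*rankings)}
    for ranking in rankings:
        n = len(ranking)
        for position, item in enumerate(ranking):
            scores[item] += n - position
    # Highest total first; ties are broken alphabetically for determinism.
    return sorted(scores, key=lambda item: (-scores[item], str(item)))


# Hypothetical candidate links ranked by three unsupervised metrics.
by_metric_1 = ["a-b", "a-c", "b-d"]
by_metric_2 = ["a-c", "a-b", "c-d"]
by_metric_3 = ["a-c", "b-d", "a-b"]

merged = borda_merge([by_metric_1, by_metric_2, by_metric_3])
top_k = merged[:2]  # predict an adjustable number of links
```

A supervised scheme such as the one described in the abstract would instead learn from known links how much to trust each ranking, rather than weighting them equally as Borda count does.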
Unsupervised Graph-based Rank Aggregation for Improved Retrieval
This paper presents a robust and comprehensive graph-based rank aggregation
approach, used to combine results of isolated ranker models in retrieval tasks.
The method follows an unsupervised scheme, which is independent of how the
isolated ranks are formulated. Our approach is able to combine arbitrary
models, defined in terms of different ranking criteria, such as those based on
textual, image or hybrid content representations.
We reformulate the ad-hoc retrieval problem as a document retrieval based on
fusion graphs, which we propose as a new unified representation model capable
of merging multiple ranks and expressing inter-relationships of retrieval
results automatically. By doing so, we claim that the retrieval system can
benefit from learning the manifold structure of datasets, thus leading to more
effective results. Another contribution is that our graph-based aggregation
formulation, unlike existing approaches, allows for encapsulating contextual
information encoded from multiple ranks, which can be directly used for
ranking, without further computations and post-processing steps over the
graphs. Based on the graphs, a novel similarity retrieval score is formulated
using an efficient computation of minimum common subgraphs. Finally, another
benefit over existing approaches is the absence of hyperparameters.
A comprehensive experimental evaluation was conducted considering diverse
well-known public datasets, composed of textual, image, and multimodal
documents. Performed experiments demonstrate that our method reaches top
performance, yielding better effectiveness scores than state-of-the-art
baseline methods and promoting large gains over the rankers being fused, thus
demonstrating the successful capability of the proposal in representing queries
based on a unified graph-based model of rank fusions.
ForestHash: Semantic Hashing With Shallow Random Forests and Tiny Convolutional Networks
Hash codes are efficient data representations for coping with the ever
growing amounts of data. In this paper, we introduce a random forest semantic
hashing scheme that embeds tiny convolutional neural networks (CNN) into
shallow random forests, with near-optimal information-theoretic code
aggregation among trees. We start with a simple hashing scheme, where random
trees in a forest act as hashing functions by setting '1' for the visited tree
leaf, and '0' for the rest. We show that traditional random forests fail to
generate hashes that preserve the underlying similarity between the trees,
rendering the random forests approach to hashing challenging. To address this,
we propose to first randomly group arriving classes at each tree split node
into two groups, obtaining a significantly simplified two-class classification
problem, which can be handled using a light-weight CNN weak learner. Such a
random class grouping scheme enables code uniqueness by enforcing each class to
share its code with different classes in different trees. A non-conventional
low-rank loss is further adopted for the CNN weak learners to encourage code
consistency by minimizing intra-class variations and maximizing inter-class
distance for the two random class groups. Finally, we introduce an
information-theoretic approach for aggregating codes of individual trees into a
single hash code, producing a near-optimal unique hash for each class. The
proposed approach significantly outperforms state-of-the-art hashing methods
for image retrieval tasks on large-scale public datasets, and performs at
the level of other state-of-the-art image classification techniques while
utilizing a more compact and efficient scalable representation. This work
proposes a principled and robust procedure to train and deploy in parallel an
ensemble of light-weight CNNs, instead of simply going deeper. Comment: Accepted to ECCV 201
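The simple starting scheme in the abstract above, where each tree writes '1' for the visited leaf and '0' elsewhere, can be sketched as follows. This is only a toy illustration under stated assumptions: the trees are hand-written decision stumps rather than trained trees with CNN weak learners, and the names `stump`, `tree_code`, `forest_code`, and `hamming` are hypothetical. The random class grouping, the low-rank loss, and the information-theoretic aggregation are not shown.

```python
def stump(feature, threshold):
    """Hypothetical stand-in for a trained tree: a two-leaf decision stump.

    Returns (leaf_fn, n_leaves), where leaf_fn maps a sample to a leaf index.
    """
    return (lambda x: 0 if x[feature] <= threshold else 1), 2


def tree_code(leaf_fn, n_leaves, x):
    """One-hot code for one tree: 1 at the visited leaf, 0 for the rest."""
    visited = leaf_fn(x)
    return [1 if leaf == visited else 0 for leaf in range(n_leaves)]


def forest_code(forest, x):
    """Concatenate the per-tree codes into a single binary hash."""
    code = []
    for leaf_fn, n_leaves in forest:
        code.extend(tree_code(leaf_fn, n_leaves, x))
    return code


def hamming(a, b):
    """Number of positions at which two codes differ."""
    return sum(u != v for u, v in zip(a, b))


forest = [stump(0, 0.5), stump(1, 0.5), stump(0, 0.2)]
code_a = forest_code(forest, [0.1, 0.9])
code_b = forest_code(forest, [0.3, 0.8])  # close to the first sample
code_c = forest_code(forest, [0.9, 0.1])  # far from the first sample
```

In this toy setting, samples that fall into the same leaves in most trees end up with a small Hamming distance, which is the property a retrieval hash needs.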
HodgeRank with Information Maximization for Crowdsourced Pairwise Ranking Aggregation
Recently, crowdsourcing has emerged as an effective paradigm for
human-powered large scale problem solving in various domains. However, a task
requester usually has a limited budget, so it is desirable to have
a policy that allocates the budget wisely to achieve better quality. In this
paper, we study the principle of information maximization for active sampling
strategies in the framework of HodgeRank, an approach based on Hodge
Decomposition of pairwise ranking data with multiple workers. The principle
exhibits two scenarios of active sampling: Fisher information maximization that
leads to unsupervised sampling based on a sequential maximization of graph
algebraic connectivity without considering labels; and Bayesian information
maximization that selects samples with the largest information gain from prior
to posterior, which gives a supervised sampling involving the labels collected.
Experiments show that the proposed methods boost the sampling efficiency as
compared to traditional sampling schemes and are thus valuable to practical
crowdsourcing experiments. Comment: Accepted by AAAI201
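As background for the abstract above, the global ranking in HodgeRank is commonly obtained as the least-squares fit of item scores to the observed pairwise margins. The sketch below minimises the sum of (s_i - s_j - y_ij)^2 over all collected comparisons by coordinate descent. It is a minimal illustration, not the paper's active sampling method: the function name `least_squares_scores`, the sweep count, and the toy data are assumptions, and the comparison graph is assumed to be connected.

```python
def least_squares_scores(n_items, comparisons, n_sweeps=200):
    """Fit scores s so that s[i] - s[j] approximates each observed margin y.

    comparisons: list of (i, j, y), meaning a worker preferred item i over
    item j by margin y. Repeated pairs from several workers are simply
    listed several times.
    """
    neighbours = [[] for _ in range(n_items)]
    for i, j, y in comparisons:
        neighbours[i].append((j, y))
        neighbours[j].append((i, -y))

    scores = [0.0] * n_items
    for _ in range(n_sweeps):
        for i in range(n_items):
            if neighbours[i]:
                # Exact minimiser over s[i] with all other scores held fixed.
                scores[i] = sum(scores[j] + y for j, y in neighbours[i]) / len(neighbours[i])

    # Scores are only defined up to a constant, so centre them at zero.
    mean = sum(scores) / n_items
    return [s - mean for s in scores]


# Toy data: three items with consistent margins from hypothetical workers.
votes = [(0, 1, 1.0), (1, 2, 1.0), (0, 2, 2.0)]
scores = least_squares_scores(3, votes)
ranking = sorted(range(3), key=lambda i: -scores[i])
```

An active sampling policy of the kind the abstract describes would choose which pair to ask about next; this sketch only covers the scoring step applied to whatever comparisons have been collected.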
Typical Phone Use Habits: Intense Use Does Not Predict Negative Well-Being
Not all smartphone owners use their device in the same way. In this work, we
uncover broad, latent patterns of mobile phone use behavior. We conducted a
study where, via a dedicated logging app, we collected daily mobile phone
activity data from a sample of 340 participants for a period of four weeks.
Through an unsupervised learning approach and a methodologically rigorous
analysis, we reveal five generic phone use profiles which describe at least 10%
of the participants each: limited use, business use, power use, and
personality- & externally induced problematic use. We provide evidence that
intense mobile phone use alone does not predict negative well-being. Instead,
our approach automatically revealed two groups with tendencies for lower
well-being, which are characterized by nightly phone use sessions. Comment: 10 pages, 6 figures, conference pape