Soft ranking in clustering
Due to the diffusion of large-dimensional data sets (e.g., in DNA microarray or document organization and retrieval applications), there is a growing interest in clustering methods based on a proximity matrix. These have the advantage of being based on a data structure whose size only depends on cardinality, not dimensionality. In this paper, we propose a clustering technique based on fuzzy ranks. The use of ranks helps to overcome several issues of large-dimensional data sets, whereas the fuzzy formulation is useful in encoding the information contained in the smallest entries of the proximity matrix. Comparative experiments are presented, using several standard hierarchical clustering techniques as a reference.
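To make the idea of a fuzzy rank concrete, here is a minimal sketch of one possible way to soften ranks in a row of a proximity matrix. The abstract does not give the formulation, so the sigmoid-based definition, the function name `soft_ranks`, and the `temperature` parameter are all assumptions made for illustration.

```python
import math


def soft_ranks(row, temperature=0.1):
    """Soft rank of each entry in one row of a distance matrix.

    A hard rank counts how many other entries are smaller than entry i.
    Here that step function is replaced by a logistic sigmoid, so entries
    that are nearly tied contribute fractional amounts.
    ASSUMPTION: this is an illustrative definition, not necessarily the
    formulation used in the paper.
    """
    ranks = []
    for i, d_i in enumerate(row):
        total = 0.0
        for j, d_j in enumerate(row):
            if i == j:
                continue
            x = (d_i - d_j) / temperature
            # Sigmoid written with tanh so large |x| cannot overflow.
            total += 0.5 * (1.0 + math.tanh(0.5 * x))
        ranks.append(total)
    return ranks


if __name__ == "__main__":
    distances = [0.10, 0.12, 0.90]  # hypothetical row of a proximity matrix
    print(soft_ranks(distances, temperature=0.1))
```

With a very small `temperature` the values approach the ordinary integer ranks; larger values blur the difference between nearly equal small distances, which is where the abstract says the fuzzy formulation is useful.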
DarkRank: Accelerating Deep Metric Learning via Cross Sample Similarities Transfer
We have witnessed rapid evolution of deep neural network architecture design
in the past years. This recent progress greatly facilitates developments
in various areas such as computer vision and natural language processing.
However, along with their extraordinary performance, these state-of-the-art
models also bring expensive computational cost. Directly deploying these
models in applications with real-time requirements is still infeasible.
Recently, Hinton et al. have shown that the dark knowledge within a powerful
teacher model can significantly help the training of a smaller and faster
student network. This knowledge is vastly beneficial for improving the
generalization ability of the student model. Inspired by their work, we
introduce a new type of knowledge, cross-sample similarities, for model
compression and acceleration. This knowledge can be naturally derived from a deep
metric learning model. To transfer it, we bring the "learning to rank"
technique into the deep metric learning formulation. We test our proposed DarkRank
method on various metric learning tasks including pedestrian re-identification,
image retrieval and image clustering. The results are quite encouraging. Our
method can improve over the baseline method by a large margin. Moreover, it is
fully compatible with other existing methods. When combined, the performance
can be further boosted.
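As a rough illustration of transferring cross-sample similarities with a learning-to-rank loss, the sketch below uses a ListNet-style listwise loss: the teacher's and the student's similarity scores between one query and a list of candidates are turned into probability distributions and compared with a KL divergence. The abstract does not specify the loss, so this particular choice, the function names, and the example scores are assumptions.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of scores."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    z = sum(exps)
    return [e / z for e in exps]


def listwise_transfer_loss(teacher_scores, student_scores):
    """KL(P_teacher || P_student) over per-candidate probabilities.

    Each score is a similarity between a query sample and one candidate
    (for example, a negative embedding distance).
    ASSUMPTION: a ListNet-style loss chosen for illustration; it is not
    claimed to be the exact objective of the paper.
    """
    p = softmax(teacher_scores)
    q = softmax(student_scores)
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)


if __name__ == "__main__":
    teacher = [-0.2, -0.9, -1.5]    # hypothetical teacher similarities
    student = [-0.3, -0.8, -1.4]    # student roughly agrees on the order
    reversed_student = [-1.5, -0.9, -0.2]
    print(listwise_transfer_loss(teacher, student))
    print(listwise_transfer_loss(teacher, reversed_student))
```

The loss is zero when the student reproduces the teacher's score distribution and grows as the student's ordering of the candidates departs from the teacher's.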
A nonuniform popularity-similarity optimization (nPSO) model to efficiently generate realistic complex networks with communities
The hidden metric space behind complex network topologies is a fervid topic
in current network science and the hyperbolic space is one of the most studied,
because it seems associated with the structural organization of many real complex
systems. The Popularity-Similarity-Optimization (PSO) model simulates how
random geometric graphs grow in the hyperbolic space, reproducing strong
clustering and scale-free degree distribution; however, it fails to reproduce
an important feature of real complex networks, which is the community
organization. The Geometrical-Preferential-Attachment (GPA) model was recently
developed to give the PSO a community structure as well, which is obtained by
forcing different angular regions of the hyperbolic disk to have variable levels
of attractiveness. However, the number and size of the communities cannot be
explicitly controlled in the GPA, which is a clear limitation for real
applications. Here, we introduce the nonuniform PSO (nPSO) model that,
unlike GPA, forces heterogeneous angular node attractiveness by
sampling the angular coordinates from a tailored nonuniform probability
distribution, for instance a mixture of Gaussians. The nPSO differs from GPA in
three other aspects: it allows the number and size of communities to be fixed
explicitly; it allows their mixing property to be tuned through the network
temperature; and it is efficient at generating networks with high clustering. After
several tests we propose the nPSO as a valid and efficient model to generate
networks with communities in the hyperbolic space, which can be adopted as a
realistic benchmark for different tasks such as community detection and link
prediction.
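The angular sampling step described in this abstract can be sketched as follows: each node draws its angular coordinate from a mixture of Gaussians on the circle, with one component per community. The equal mixing proportions, equidistant means, default standard deviation, and the function name `sample_angles` are assumptions made for illustration; the radial coordinates and the connection rule of the model are not covered here.

```python
import math
import random


def sample_angles(n_nodes, n_communities, sigma=None, seed=0):
    """Sample angular coordinates from a Gaussian mixture on [0, 2*pi].

    ASSUMPTIONS (not stated in the abstract): components are equally
    likely, their means are equally spaced around the circle, and the
    default sigma is one sixth of the spacing between adjacent means.
    """
    rng = random.Random(seed)
    spacing = 2.0 * math.pi / n_communities
    means = [k * spacing for k in range(n_communities)]
    if sigma is None:
        sigma = spacing / 6.0
    angles, labels = [], []
    for _ in range(n_nodes):
        k = rng.randrange(n_communities)   # pick a community uniformly
        theta = rng.gauss(means[k], sigma) # draw around its mean angle
        angles.append(theta % (2.0 * math.pi))  # wrap onto the circle
        labels.append(k)
    return angles, labels


if __name__ == "__main__":
    angles, labels = sample_angles(n_nodes=10, n_communities=3)
    for a, c in zip(angles, labels):
        print(f"community {c}: angle {a:.3f} rad")
```

The number of mixture components fixes the number of communities, and unequal mixing proportions would control their relative sizes, which is the kind of explicit control the abstract attributes to nPSO.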
Learning Task Relatedness in Multi-Task Learning for Images in Context
Multimedia applications often require concurrent solutions to multiple tasks.
These tasks hold clues to each other's solutions; however, as these relations can
be complex, this remains a rarely utilized property. When task relations are
explicitly defined based on domain knowledge, multi-task learning (MTL) offers
such concurrent solutions, while exploiting relatedness between multiple tasks
performed over the same dataset. In most cases, however, this relatedness is not
explicitly defined and the domain expert knowledge that defines it is not
available. To address this issue, we introduce Selective Sharing, a method that
learns the inter-task relatedness from secondary latent features while the
model trains. Using this insight, we can automatically group tasks and allow
them to share knowledge in a mutually beneficial way. We support our method
with experiments on 5 datasets in classification, regression, and ranking tasks,
and compare to strong baselines and state-of-the-art approaches, showing a
consistent improvement in terms of accuracy and parameter counts. In addition,
we perform an activation region analysis showing how Selective Sharing affects
the learned representation. Comment: To appear in ICMR 2019 (Oral + Lightning Talk + Poster)
Soft Seeded SSL Graphs for Unsupervised Semantic Similarity-based Retrieval
Semantic similarity based retrieval is playing an increasingly important role
in many IR systems such as modern web search, question-answering, similar
document retrieval, etc. Improvements in retrieval of semantically similar
content are very significant to applications like Quora, Stack Overflow, Siri,
etc. We propose a novel unsupervised model for semantic similarity based
content retrieval, where we construct semantic flow graphs for each query, and
introduce the concept of "soft seeding" in graph based semi-supervised learning
(SSL) to convert this into an unsupervised model.
We demonstrate the effectiveness of our model on an equivalent question
retrieval problem on the Stack Exchange QA dataset, where our unsupervised
approach significantly outperforms the state-of-the-art unsupervised models,
and produces comparable results to the best supervised models. Our research
provides a method to tackle semantic similarity based retrieval without any
training data, and allows seamless extension to different domain QA
communities, as well as to other semantic equivalence tasks. Comment: Published in Proceedings of the 2017 ACM Conference on Information
and Knowledge Management (CIKM '17)