Temporal Ordered Clustering in Dynamic Networks: Unsupervised and Semi-supervised Learning Algorithms
In temporal ordered clustering, given a single snapshot of a dynamic network
in which nodes arrive at distinct time instants, we aim at partitioning its
nodes into $K$ ordered clusters $C_1 \prec \cdots \prec C_K$ such that for $i < j$, nodes in cluster $C_i$ arrived
before nodes in cluster $C_j$, with $K$ being a data-driven parameter
and not known upfront. Such a problem is of considerable significance in many
applications ranging from tracking the expansion of fake news to mapping the
spread of information. We first formulate our problem for a general dynamic
graph, and propose an integer programming framework that finds the optimal
clustering, represented as a strict partial order set, achieving the best
precision (i.e., fraction of successfully ordered node pairs) for a fixed
density (i.e., fraction of comparable node pairs). We then develop a sequential
importance procedure and design unsupervised and semi-supervised algorithms to
find temporal ordered clusters that efficiently approximate the optimal
solution. To illustrate the techniques, we apply our methods to the vertex
copying (duplication-divergence) model which exhibits some edge-case challenges
in inferring the clusters as compared to other network models. Finally, we
validate the performance of the proposed algorithms on synthetic and real-world
networks.
Comment: 14 pages, 9 figures, and 3 tables. This version is submitted to a
journal. A shorter version of this work is published in the proceedings of
IEEE International Symposium on Information Theory (ISIT), 2020. The first
two authors contributed equall
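The two quality measures named in the abstract above, precision and density, can be illustrated with a minimal sketch. It assumes that precision is measured over the comparable pairs only (pairs placed in different clusters) and that the true arrival times are available for evaluation; the function name `precision_and_density` and the toy data are hypothetical and not taken from the paper.

```python
from itertools import combinations


def precision_and_density(clusters, arrival_time):
    """Evaluate an ordered clustering against known arrival times.

    clusters: list of lists of nodes, ordered from earliest to latest cluster.
    arrival_time: dict mapping each node to its (distinct) true arrival time.
    Assumption: precision is the fraction of comparable pairs ordered correctly.
    """
    # Map each node to the index of the cluster that contains it.
    cluster_of = {v: k for k, members in enumerate(clusters) for v in members}
    nodes = sorted(cluster_of)
    total_pairs = comparable = correct = 0
    for u, v in combinations(nodes, 2):
        total_pairs += 1
        if cluster_of[u] == cluster_of[v]:
            continue  # same cluster: the pair is left unordered
        comparable += 1
        # The clustering claims the node in the lower-index cluster came first.
        first, second = (u, v) if cluster_of[u] < cluster_of[v] else (v, u)
        if arrival_time[first] < arrival_time[second]:
            correct += 1
    density = comparable / total_pairs if total_pairs else 0.0
    precision = correct / comparable if comparable else 0.0
    return precision, density


# Toy example: four nodes that arrived in the order a, b, c, d.
arrival = {"a": 1, "b": 2, "c": 3, "d": 4}
precision, density = precision_and_density([["a", "c"], ["b", "d"]], arrival)
print(precision, density)
```

In the toy example, four of the six node pairs are comparable and three of those four are ordered correctly. Using fewer, larger clusters lowers density but usually makes high precision easier to reach, which is the trade-off the abstract describes.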
Learned versus Hand-Designed Feature Representations for 3d Agglomeration
For image recognition and labeling tasks, recent results suggest that machine
learning methods that rely on manually specified feature representations may be
outperformed by methods that automatically derive feature representations based
on the data. Yet for problems that involve analysis of 3d objects, such as mesh
segmentation, shape retrieval, or neuron fragment agglomeration, there remains
a strong reliance on hand-designed feature descriptors. In this paper, we
evaluate a large set of hand-designed 3d feature descriptors alongside features
learned from the raw data using both end-to-end and unsupervised learning
techniques, in the context of agglomeration of 3d neuron fragments. By
combining unsupervised learning techniques with a novel dynamic pooling scheme,
we show how pure learning-based methods are for the first time competitive with
hand-designed 3d shape descriptors. We investigate data augmentation strategies
for dramatically increasing the size of the training set, and show how
combining both learned and hand-designed features leads to the highest
accuracy.
Unsupervised Graph-based Rank Aggregation for Improved Retrieval
This paper presents a robust and comprehensive graph-based rank aggregation
approach, used to combine results of isolated ranker models in retrieval tasks.
The method follows an unsupervised scheme, which is independent of how the
isolated ranks are formulated. Our approach is able to combine arbitrary
models, defined in terms of different ranking criteria, such as those based on
textual, image or hybrid content representations.
We reformulate the ad-hoc retrieval problem as a document retrieval based on
fusion graphs, which we propose as a new unified representation model capable
of merging multiple ranks and expressing inter-relationships of retrieval
results automatically. By doing so, we claim that the retrieval system can
benefit from learning the manifold structure of datasets, thus leading to more
effective results. Another contribution is that our graph-based aggregation
formulation, unlike existing approaches, allows for encapsulating contextual
information encoded from multiple ranks, which can be directly used for
ranking, without further computations and post-processing steps over the
graphs. Based on the graphs, a novel similarity retrieval score is formulated
using an efficient computation of minimum common subgraphs. Finally, another
benefit over existing approaches is the absence of hyperparameters.
A comprehensive experimental evaluation was conducted considering diverse
well-known public datasets, composed of textual, image, and multimodal
documents. Performed experiments demonstrate that our method reaches top
performance, yielding better effectiveness scores than state-of-the-art
baseline methods and promoting large gains over the rankers being fused, thus
demonstrating the successful capability of the proposal in representing queries
based on a unified graph-based model of rank fusions.
Unsupervised learning of human motion
An unsupervised learning algorithm that can obtain a probabilistic model of an object composed of a collection of parts (a moving human body in our examples) automatically from unlabeled training data is presented. The training data include both useful "foreground" features and features that arise from irrelevant background clutter; the correspondence between parts and detected features is unknown. The joint probability density function of the parts is represented by a mixture of decomposable triangulated graphs which allow for fast detection. To learn the model structure as well as model parameters, an EM-like algorithm is developed where the labeling of the data (part assignments) is treated as hidden variables. The unsupervised learning technique is not limited to decomposable triangulated graphs. The efficiency and effectiveness of our algorithm are demonstrated by applying it to generate models of human motion automatically from unlabeled image sequences, and testing the learned models on a variety of sequences.
Robust Temporally Coherent Laplacian Protrusion Segmentation of 3D Articulated Bodies
In motion analysis and understanding it is important to be able to fit a
suitable model or structure to the temporal series of observed data, in order
to describe motion patterns in a compact way, and to discriminate between them.
In an unsupervised context, i.e., no prior model of the moving object(s) is
available, such a structure has to be learned from the data in a bottom-up
fashion. In recent times, volumetric approaches in which the motion is captured
from a number of cameras and a voxel-set representation of the body is built
from the camera views, have gained ground due to attractive features such as
inherent view-invariance and robustness to occlusions. Automatic, unsupervised
segmentation of moving bodies along entire sequences, in a temporally-coherent
and robust way, has the potential to provide a means of constructing a
bottom-up model of the moving body, and track motion cues that may be later
exploited for motion classification. Spectral methods such as locally linear
embedding (LLE) can be useful in this context, as they preserve "protrusions",
i.e., high-curvature regions of the 3D volume, of articulated shapes, while
improving their separation in a lower dimensional space, making them in this
way easier to cluster. In this paper we therefore propose a spectral approach
to unsupervised and temporally-coherent body-protrusion segmentation along time
sequences. Volumetric shapes are clustered in an embedding space, clusters are
propagated in time to ensure coherence, and merged or split to accommodate
changes in the body's topology. Experiments on both synthetic and real
sequences of dense voxel-set data are shown. This supports the ability of the
proposed method to cluster body-parts consistently over time in a totally
unsupervised fashion, its robustness to sampling density and shape quality, and
its potential for bottom-up model construction.
Comment: 31 pages, 26 figure
Identification of functionally related enzymes by learning-to-rank methods
Enzyme sequences and structures are routinely used in the biological sciences
as queries to search for functionally related enzymes in online databases. To
this end, one usually departs from some notion of similarity, comparing two
enzymes by looking for correspondences in their sequences, structures or
surfaces. For a given query, the search operation results in a ranking of the
enzymes in the database, from very similar to dissimilar enzymes, while
information about the biological function of annotated database enzymes is
ignored.
In this work we show that rankings of that kind can be substantially improved
by applying kernel-based learning algorithms. This approach enables the
detection of statistical dependencies between similarities of the active cleft
and the biological function of annotated enzymes. This is in contrast to
search-based approaches, which do not take annotated training data into
account. Similarity measures based on the active cleft are known to outperform
sequence-based or structure-based measures under certain conditions. We
consider the Enzyme Commission (EC) classification hierarchy for obtaining
annotated enzymes during the training phase. The results of a set of sizeable
experiments indicate a consistent and significant improvement for a set of
similarity measures that exploit information about small cavities in the
surface of enzymes.
HodgeRank with Information Maximization for Crowdsourced Pairwise Ranking Aggregation
Recently, crowdsourcing has emerged as an effective paradigm for
human-powered large scale problem solving in various domains. However, a task
requester usually has a limited budget, so it is desirable to have
a policy to wisely allocate the budget to achieve better quality. In this
paper, we study the principle of information maximization for active sampling
strategies in the framework of HodgeRank, an approach based on Hodge
Decomposition of pairwise ranking data with multiple workers. The principle
exhibits two scenarios of active sampling: Fisher information maximization that
leads to unsupervised sampling based on a sequential maximization of graph
algebraic connectivity without considering labels; and Bayesian information
maximization that selects samples with the largest information gain from prior
to posterior, which gives a supervised sampling involving the labels collected.
Experiments show that the proposed methods boost the sampling efficiency as
compared to traditional sampling schemes and are thus valuable to practical
crowdsourcing experiments.
Comment: Accepted by AAAI201
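As background for the abstract above, the least-squares ranking step usually associated with HodgeRank can be sketched as follows. This is only an illustration under stated assumptions: each comparison `(i, j, y)` is taken to mean that a worker judged item `j` to beat item `i` by margin `y`, all comparisons have unit weight, and the problem is solved by plain gradient descent. The function name `hodgerank_scores` is hypothetical, and the active sampling strategies studied in the paper are not shown.

```python
def hodgerank_scores(n_items, comparisons, iterations=500):
    """Least-squares global scores from pairwise comparisons.

    Minimizes sum over (i, j, y) of (s[j] - s[i] - y)**2 by gradient descent.
    comparisons: list of (i, j, y); assumed convention: j beats i by margin y.
    Starting from zeros keeps the scores centered (they sum to zero).
    """
    scores = [0.0] * n_items
    if not comparisons:
        return scores
    # Number of comparisons touching each item (degree in the comparison graph).
    degree = [0] * n_items
    for i, j, _ in comparisons:
        degree[i] += 1
        degree[j] += 1
    # Conservative step size: the Hessian is 2L, with eigenvalues <= 4 * max degree.
    step = 1.0 / (4 * max(degree))
    for _ in range(iterations):
        grad = [0.0] * n_items
        for i, j, y in comparisons:
            residual = scores[j] - scores[i] - y
            grad[j] += 2 * residual
            grad[i] -= 2 * residual
        scores = [s - step * g for s, g in zip(scores, grad)]
    return scores


# Consistent toy data generated from hidden scores [0, 1, 3].
scores = hodgerank_scores(3, [(0, 1, 1.0), (1, 2, 2.0), (0, 2, 3.0)])
print(scores)
```

Because the toy comparisons are mutually consistent, the recovered scores match the hidden ones up to a constant shift. With noisy or cyclic comparisons the same procedure returns the best least-squares fit on each connected component of the comparison graph.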