Graph Laplacian for Semi-Supervised Learning
Semi-supervised learning is highly useful in common scenarios where labeled
data is scarce but unlabeled data is abundant. The graph (or nonlocal)
Laplacian is a fundamental smoothing operator for solving various learning
tasks. For unsupervised clustering, a spectral embedding is often used, based
on graph-Laplacian eigenvectors. For semi-supervised problems, the common
approach is to solve a constrained optimization problem, regularized by a
Dirichlet energy, based on the graph-Laplacian. However, as supervision
decreases, Dirichlet optimization becomes suboptimal. We therefore would like
to obtain a smooth transition between unsupervised clustering and
low-supervised graph-based classification. In this paper, we propose a new type
of graph-Laplacian which is adapted for Semi-Supervised Learning (SSL)
problems. It is based on both density and contrastive measures and allows the
encoding of the labeled data directly in the operator. Thus, we can
successfully perform semi-supervised learning using spectral clustering. The
benefits of our approach are illustrated for several SSL problems.
Comment: 12 pages, 6 figures
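For orientation, the two standard ingredients this abstract builds on can be sketched in a few lines: the unnormalized graph Laplacian L = D - W, the Dirichlet energy u^T L u, and a two-way spectral split from the second eigenvector. This is a minimal illustration on a made-up toy graph, not the modified Laplacian the paper proposes; the function names and the weight matrix are assumptions.

```python
import numpy as np

def graph_laplacian(W):
    """Unnormalized graph Laplacian L = D - W for a symmetric weight matrix W."""
    return np.diag(W.sum(axis=1)) - W

def dirichlet_energy(W, u):
    """Dirichlet energy u^T L u, equal to 1/2 * sum_ij W_ij * (u_i - u_j)^2."""
    return float(u @ graph_laplacian(W) @ u)

def fiedler_partition(W):
    """Two-way split from the sign of the eigenvector of the second-smallest eigenvalue."""
    _, vecs = np.linalg.eigh(graph_laplacian(W))  # eigenvalues come back in ascending order
    return vecs[:, 1] > 0

# Toy graph (assumed for illustration): two triangles joined by one weak edge.
W = np.zeros((6, 6))
for i, j in [(0, 1), (0, 2), (1, 2), (3, 4), (3, 5), (4, 5)]:
    W[i, j] = W[j, i] = 1.0
W[2, 3] = W[3, 2] = 0.1

labels = fiedler_partition(W)
indicator = np.array([1.0, 1.0, 1.0, 0.0, 0.0, 0.0])
energy = dirichlet_energy(W, indicator)  # only the weak edge contributes
```

The indicator of one triangle has low Dirichlet energy because only the weak bridging edge is cut, which is why smooth (low-energy) functions tend to respect cluster boundaries.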
Evolutionary constraints on the complexity of genetic regulatory networks allow predictions of the total number of genetic interactions
Genetic regulatory networks (GRNs) have been widely studied, yet there is a
lack of understanding with regard to the final size and properties of these
networks, mainly because no network is currently complete. In this study, we
analyzed the distribution of GRN structural properties across a large set of
distinct prokaryotic organisms and found a set of constrained characteristics
such as network density and number of regulators. Our results allowed us to
estimate the number of interactions that complete networks would have, a
valuable insight that could aid in the daunting task of network curation,
prediction, and validation. Using state-of-the-art statistical approaches, we
also provided new evidence to settle a previously stated controversy that
raised the possibility of complete biological networks being random and
therefore attributing the observed scale-free properties to an artifact
emerging from the sampling process during network discovery. Furthermore, we
identified a set of properties that enabled us to assess the consistency of the
connectivity distribution for various GRNs against different alternative
statistical distributions. Our results favor the hypothesis that highly
connected nodes (hubs) are not a consequence of network incompleteness.
Finally, an interaction coverage computed for the GRNs as a proxy for
completeness revealed that high-throughput based reconstructions of GRNs could
yield biased networks with a low average clustering coefficient, showing that
classical targeted discovery of interactions is still needed.
Comment: 28 pages, 5 figures, 12 pages supplementary information
Clustering and Community Detection with Imbalanced Clusters
Spectral clustering methods, which are frequently used in clustering and
community detection applications, are sensitive to the specific graph
construction, particularly when imbalanced clusters are present. We show that
ratio cut (RCut) or normalized cut (NCut) objectives are not tailored to
imbalanced cluster sizes since they tend to emphasize cut sizes over cut
values. We propose a graph partitioning problem that seeks minimum cut
partitions under minimum size constraints on partitions to deal with imbalanced
cluster sizes. Our approach parameterizes a family of graphs by adaptively
modulating node degrees on a fixed node set, yielding a set of parameter
dependent cuts reflecting varying levels of imbalance. The solution to our
problem is then obtained by optimizing over these parameters. We present
rigorous limit cut analysis results to justify our approach and demonstrate the
superiority of our method through experiments on synthetic and real datasets
for data clustering, semi-supervised learning and community detection.
Comment: Extended version of arXiv:1309.2303 with new applications. Accepted
to IEEE TSIP
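The claim that RCut and NCut favor balanced partitions can be checked on a small example using the standard textbook definitions, RCut = cut(A, B) * (1/|A| + 1/|B|) and NCut = cut(A, B) * (1/vol(A) + 1/vol(B)). The sketch below is only an illustration of those objectives on an assumed four-node path graph; it is not the size-constrained method the paper proposes, and the function names are hypothetical.

```python
import numpy as np

def cut_value(W, mask):
    """Total weight of edges crossing between the masked set and its complement."""
    return float(W[mask][:, ~mask].sum())

def rcut(W, mask):
    """Ratio cut: cut value scaled by the inverse cardinalities of both sides."""
    return cut_value(W, mask) * (1.0 / mask.sum() + 1.0 / (~mask).sum())

def ncut(W, mask):
    """Normalized cut: cut value scaled by the inverse volumes (degree sums)."""
    deg = W.sum(axis=1)
    return cut_value(W, mask) * (1.0 / deg[mask].sum() + 1.0 / deg[~mask].sum())

# Assumed toy graph: path 0-1-2-3 where the first edge is slightly weaker.
W = np.zeros((4, 4))
for i, j, w in [(0, 1, 0.8), (1, 2, 1.0), (2, 3, 1.0)]:
    W[i, j] = W[j, i] = w

small = np.array([True, False, False, False])    # imbalanced split {0} | {1, 2, 3}
balanced = np.array([True, True, False, False])  # balanced split {0, 1} | {2, 3}
```

Here the imbalanced split has the smaller cut value (0.8 versus 1.0), yet both RCut and NCut score the balanced split lower, so a minimizer of either objective would pick the balanced partition.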
Spectral Clustering with Imbalanced Data
Spectral clustering is sensitive to how graphs are constructed from data,
particularly when proximal and imbalanced clusters are present. We show that
Ratio-Cut (RCut) or normalized cut (NCut) objectives are not tailored to
imbalanced data since they tend to emphasize cut sizes over cut values. We
propose a graph partitioning problem that seeks minimum cut partitions under
minimum size constraints on partitions to deal with imbalanced data. Our
approach parameterizes a family of graphs, by adaptively modulating node
degrees on a fixed node set, to yield a set of parameter dependent cuts
reflecting varying levels of imbalance. The solution to our problem is then
obtained by optimizing over these parameters. We present rigorous limit cut
analysis results to justify our approach. We demonstrate the superiority of our
method through unsupervised and semi-supervised experiments on synthetic and
real data sets.
Comment: 24 pages, 7 figures. arXiv admin note: substantial text overlap with
arXiv:1302.513
Modularity-Based Clustering for Network-Constrained Trajectories
We present a novel clustering approach for moving object trajectories that
are constrained by an underlying road network. The approach builds a similarity
graph based on these trajectories, then uses modularity-optimization hierarchical
graph clustering to regroup trajectories with similar profiles. Our
experimental study shows the superiority of the proposed approach over classic
hierarchical clustering and gives brief insight into the visualization of the
clustering results.
Comment: 20th European Symposium on Artificial Neural Networks, Computational
Intelligence and Machine Learning (ESANN 2012), Bruges, Belgium (2012)
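As background for the modularity criterion mentioned in this abstract, the standard Newman modularity of a partition, Q = (1/2m) * sum_ij (A_ij - k_i k_j / 2m) * [c_i = c_j], can be computed directly from an adjacency matrix. The sketch below only evaluates Q for a given labeling on an assumed toy graph; it does not implement the hierarchical optimization or the trajectory similarity used in the paper.

```python
import numpy as np

def modularity(A, labels):
    """Newman modularity of a partition of an undirected (possibly weighted) graph."""
    labels = np.asarray(labels)
    k = A.sum(axis=1)                       # node degrees (strengths)
    two_m = A.sum()                         # twice the total edge weight
    B = A - np.outer(k, k) / two_m          # modularity matrix
    same = labels[:, None] == labels[None, :]
    return float((B * same).sum() / two_m)

# Assumed toy graph: two triangles joined by a single edge (7 edges in total).
A = np.zeros((6, 6))
for i, j in [(0, 1), (0, 2), (1, 2), (3, 4), (3, 5), (4, 5), (2, 3)]:
    A[i, j] = A[j, i] = 1.0

q_split = modularity(A, [0, 0, 0, 1, 1, 1])  # one community per triangle
q_single = modularity(A, [0, 0, 0, 0, 0, 0])  # everything in one community
```

Splitting the graph into its two triangles gives Q = 5/14, while the trivial single-community partition gives Q = 0, which is the kind of comparison a modularity-optimizing procedure relies on.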
Co-Clustering Network-Constrained Trajectory Data
Clustering moving-object trajectories has recently attracted growing interest
from both the data mining and machine learning communities. This problem,
however, has been studied mainly in the setting where moving objects can move
freely in Euclidean space. In this paper, we study the problem of
clustering trajectories of vehicles whose movement is restricted by the
underlying road network. We model relations between these trajectories and road
segments as a bipartite graph and we try to cluster its vertices. We
demonstrate our approaches on synthetic data and show how they could be useful
in inferring knowledge about the flow dynamics and the behavior of the drivers
using the road network.
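The bipartite model described here (trajectories on one side, road segments on the other) can be represented by a biadjacency matrix. The sketch below shows one plausible construction under stated assumptions: the trajectory lists, segment ids, and visit-count weighting are all hypothetical, and the abstract does not specify how edges are weighted or which clustering algorithm is applied to the resulting graph.

```python
import numpy as np

# Hypothetical input: each trajectory is the list of road-segment ids it visits.
trajectories = [[0, 1, 2], [1, 2], [3, 4], [4, 3, 5]]
n_segments = 6

def biadjacency(trajs, n_seg):
    """Rows are trajectories, columns are segments; entries count visits (an assumed weighting)."""
    B = np.zeros((len(trajs), n_seg))
    for t, segs in enumerate(trajs):
        for s in segs:
            B[t, s] += 1.0
    return B

def bipartite_adjacency(B):
    """Full symmetric adjacency [[0, B], [B^T, 0]] over trajectory and segment vertices."""
    n_t, n_s = B.shape
    return np.block([[np.zeros((n_t, n_t)), B],
                     [B.T, np.zeros((n_s, n_s))]])

B = biadjacency(trajectories, n_segments)
A = bipartite_adjacency(B)
```

Any graph clustering method that accepts a symmetric adjacency matrix could then be run on `A`, grouping trajectories and segments simultaneously.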
Laplacian Mixture Modeling for Network Analysis and Unsupervised Learning on Graphs
Laplacian mixture models identify overlapping regions of influence in
unlabeled graph and network data in a scalable and computationally efficient
way, yielding useful low-dimensional representations. By combining Laplacian
eigenspace and finite mixture modeling methods, they provide probabilistic or
fuzzy dimensionality reductions or domain decompositions for a variety of input
data types, including mixture distributions, feature vectors, and graphs or
networks. Provable optimal recovery using the algorithm is analytically shown
for a nontrivial class of cluster graphs. Heuristic approximations for scalable
high-performance implementations are described and empirically tested.
Connections to PageRank and community detection in network analysis demonstrate
the wide applicability of this approach. The origins of fuzzy spectral methods,
beginning with generalized heat or diffusion equations in physics, are reviewed
and summarized. Comparisons to other dimensionality reduction and clustering
methods for challenging unsupervised machine learning problems are also
discussed.
Comment: 13 figures, 35 references