920 research outputs found
A systematic density-based clustering method using anchor points
National Research Foundation (NRF) Singapore under its AI Singapore Programme; Singapore Ministry of Health under its National Innovation Challenge on Active and Confident Ageing
Mining topological structure in graphs through forest representations
We consider the problem of inferring simplified topological substructures—which we term backbones—in metric and non-metric graphs. Intuitively, these are subgraphs with ‘few’ nodes, multifurcations, and cycles that model the topology of the original graph well. We present a multistep procedure for inferring these backbones. First, we encode local (geometric) information of each vertex in the original graph by means of the boundary coefficient (BC) to identify ‘core’ nodes in the graph. Next, we construct a forest representation of the graph, termed an f-pine, that connects every node of the graph to a local ‘core’ node. The final backbone is then inferred from the f-pine through CLOF (Constrained Leaves Optimal subForest), a novel graph optimization problem we introduce in this paper. On a theoretical level, we show that CLOF is NP-hard for general graphs. However, we prove that CLOF can be efficiently solved for forest graphs, a surprising fact given that CLOF induces a nontrivial monotone submodular set function maximization problem on tree graphs. This result is the basis of our method for mining backbones in graphs through forest representation. We qualitatively and quantitatively confirm the applicability, effectiveness, and scalability of our method for discovering backbones in a variety of graph-structured data, such as social networks, earthquake locations scattered across the Earth, and high-dimensional cell trajectory data.
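CLOF's exact objective is defined in the paper rather than in this abstract; as a minimal illustrative sketch (our simplification, not the authors' algorithm), the special case of a backbone with exactly two leaves in a tree reduces to finding the tree's longest path, which two breadth-first searches recover:

```python
from collections import deque

def farthest(adj, src):
    # BFS from src; the last node dequeued is at maximum distance.
    # Returns that node and the parent map for path reconstruction.
    parent = {src: None}
    queue = deque([src])
    last = src
    while queue:
        last = queue.popleft()
        for nb in adj[last]:
            if nb not in parent:
                parent[nb] = last
                queue.append(nb)
    return last, parent

def backbone_path(adj):
    # Two-leaf "backbone" of a tree: its diameter path (double-BFS trick).
    u, _ = farthest(adj, next(iter(adj)))
    v, parent = farthest(adj, u)
    path = []
    while v is not None:
        path.append(v)
        v = parent[v]
    return path
```

Both passes run in linear time; the paper's CLOF handles the general constrained-leaves case, which this sketch does not.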
Networking - A Statistical Physics Perspective
Efficient networking has a substantial economic and societal impact in a
broad range of areas including transportation systems, wired and wireless
communications and a range of Internet applications. As transportation and
communication networks become increasingly complex, the ever-increasing demands
for congestion control, higher traffic capacity, quality of service,
robustness, and reduced energy consumption require new tools and methods to meet
these conflicting requirements. The new methodology should serve both to gain a
better understanding of the properties of networking systems at the macroscopic
level and to develop new principled optimization and
management algorithms at the microscopic level. Methods of statistical physics
seem best placed to provide new approaches as they have been developed
specifically to deal with non-linear large scale systems. This paper aims at
presenting an overview of tools and methods that have been developed within the
statistical physics community and that can be readily applied to address the
emerging problems in networking. These include diffusion processes, methods
from disordered systems and polymer physics, and probabilistic inference, all of
which have direct relevance to network routing, file and frequency distribution,
the exploration of network structures and vulnerability, and various other
practical networking applications. Comment: (Review article) 71 pages, 14 figures
Energy-Based Clustering: Fast and Robust Clustering of Data with Known Likelihood Functions
Clustering has become an indispensable tool in the presence of increasingly
large and complex data sets. Most clustering algorithms depend, either
explicitly or implicitly, on the sampled density. However, estimated densities
are fragile due to the curse of dimensionality and finite sampling effects, for
instance in molecular dynamics simulations. To avoid the dependence on
estimated densities, an energy-based clustering (EBC) algorithm based on the
Metropolis acceptance criterion is developed in this work. In the proposed
formulation, EBC can be considered a generalization of spectral clustering in
the limit of large temperatures. Taking the potential energy of a sample
explicitly into account alleviates requirements regarding the distribution of
the data. In addition, it permits the subsampling of densely sampled regions,
which can result in significant speed-ups and sublinear scaling. The algorithm
is validated on a range of test systems including molecular dynamics
trajectories of alanine dipeptide and the Trp-cage miniprotein. Our results
show that including information about the potential-energy surface can largely
decouple clustering from the sampling density.
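The abstract leaves the algorithmic details to the paper; the following toy sketch (our reading, with the neighborhood radius and acceptance threshold as assumptions) merges neighboring samples whenever the Metropolis acceptance probability exp(-|ΔE|/T) for moving between them is high, then reads clusters off the connected components:

```python
import math

def metropolis_clusters(points, energies, radius, temperature):
    # Link neighboring 1-D samples whose mutual Metropolis acceptance
    # probability is high, then return connected-component labels.
    n = len(points)
    parent = list(range(n))

    def find(a):
        while parent[a] != a:
            parent[a] = parent[parent[a]]
            a = parent[a]
        return a

    for i in range(n):
        for j in range(i + 1, n):
            if abs(points[i] - points[j]) > radius:
                continue
            # Metropolis criterion for the uphill move between the samples.
            accept = min(1.0, math.exp(-abs(energies[i] - energies[j]) / temperature))
            if accept > 0.5:  # threshold is our assumption
                parent[find(i)] = find(j)
    roots = [find(i) for i in range(n)]
    remap = {}
    return [remap.setdefault(r, len(remap)) for r in roots]
```

Because the decision uses the potential energies directly, the partition does not change when densely sampled regions are subsampled, which is the decoupling the abstract refers to.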
A GDP-driven model for the binary and weighted structure of the International Trade Network
Recent events such as the global financial crisis have renewed the interest
in the topic of economic networks. One of the main channels of shock
propagation among countries is the International Trade Network (ITN). Two
important models for the ITN structure, the classical gravity model of trade
(more popular among economists) and the fitness model (more popular among
network scientists), are both limited to the characterization of only one
representation of the ITN. The gravity model satisfactorily predicts the volume
of trade between connected countries, but cannot reproduce the observed missing
links (i.e. the topology). On the other hand, the fitness model can
successfully replicate the topology of the ITN, but cannot predict the volumes.
This paper takes an important step toward unifying these
two frameworks by proposing a new GDP-driven model that can simultaneously
reproduce the binary and the weighted properties of the ITN. Specifically, we
adopt a maximum-entropy approach in which both the degree and the strength of each
node are preserved. We then identify strong nonlinear relationships between the
GDP and the parameters of the model. This ultimately results in a weighted
generalization of the fitness model of trade, where the GDP plays the role of a
`macroeconomic fitness' shaping the binary and the weighted structure of the
ITN simultaneously. Our model mathematically highlights an important asymmetry
in the role of binary and weighted network properties, namely the fact that
binary properties can be inferred without the knowledge of weighted ones, while
the opposite is not true.
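For reference, the fitness model mentioned here assigns each pair of countries a connection probability of the standard maximum-entropy form p_ij = z x_i x_j / (1 + z x_i x_j), where x_i is the fitness (in this paper's setting, driven by GDP). A minimal sketch follows; the bisection calibration of z to a target link count is our own illustration, not the paper's estimation procedure:

```python
import math

def link_prob(z, xi, xj):
    # Maximum-entropy / fitness-model connection probability.
    t = z * xi * xj
    return t / (1.0 + t)

def expected_links(z, fitnesses):
    n = len(fitnesses)
    return sum(link_prob(z, fitnesses[i], fitnesses[j])
               for i in range(n) for j in range(i + 1, n))

def calibrate_z(fitnesses, target_links, lo=1e-12, hi=1e12, iters=200):
    # Geometric bisection: expected_links is monotone increasing in z.
    for _ in range(iters):
        mid = math.sqrt(lo * hi)
        if expected_links(mid, fitnesses) < target_links:
            lo = mid
        else:
            hi = mid
    return math.sqrt(lo * hi)
```

Matching the expected number of links to the observed one is the usual way the single free parameter z of the fitness model is fixed.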
Describing Images by Semantic Modeling using Attributes and Tags
This dissertation addresses the problem of describing images using visual attributes and textual tags, a fundamental task that narrows the semantic gap between the visual reasoning of humans and machines. Automatic image annotation assigns relevant textual tags to images. In this dissertation, we propose a query-specific formulation based on Weighted Multi-view Non-negative Matrix Factorization to perform automatic image annotation. Our proposed technique seamlessly adapts to changes in the training data, naturally solves the problem of feature fusion, and handles the challenge of rare tags. Unlike tags, attributes are category-agnostic, hence their combinations model an exponential number of semantic labels. Motivated by the fact that most attributes describe local properties, we propose exploiting localization cues, through semantic parsing of the human face and body, to improve person-related attribute prediction. We also demonstrate that image-level attribute labels can be effectively used as weak supervision for the task of semantic segmentation. Next, we analyze selfie images by utilizing tags and attributes. We collect the first large-scale selfie dataset and annotate it with attributes covering characteristics such as gender, age, race, facial gestures, and hairstyle. We then study the popularity and sentiments of selfies given an estimated appearance of various semantic concepts. In brief, we automatically infer what makes a good selfie. Despite its extensive usage, the deep learning literature falls short in understanding the characteristics and behavior of batch normalization. We conclude this dissertation by providing a fresh view, in light of information geometry and Fisher kernels, of why batch normalization works.
We propose Mixture Normalization, which disentangles modes of variation in the underlying distribution of the layer outputs, and confirm that it effectively accelerates training of different batch-normalized architectures, including Inception-V3, Densely Connected Networks, and Deep Convolutional Generative Adversarial Networks, while achieving better generalization error.
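The abstract gives no formulas; a rough one-dimensional sketch of the normalization step follows (our reading: each sample is whitened by a responsibility-weighted combination of per-component statistics, and we assume the mixture weights, means, and variances have already been estimated, e.g. by EM):

```python
import math

def gaussian_pdf(x, mean, var):
    return math.exp(-(x - mean) ** 2 / (2.0 * var)) / math.sqrt(2.0 * math.pi * var)

def mixture_normalize(samples, weights, means, variances, eps=1e-5):
    # For each sample, compute the posterior responsibility of every
    # mixture component, then normalize by a responsibility-weighted
    # combination of the per-component statistics.
    out = []
    for x in samples:
        dens = [w * gaussian_pdf(x, m, v)
                for w, m, v in zip(weights, means, variances)]
        total = sum(dens)
        resp = [d / total for d in dens]
        out.append(sum(r * (x - m) / math.sqrt(v + eps)
                       for r, m, v in zip(resp, means, variances)))
    return out
```

In the paper the idea is applied to layer activations inside the network; this sketch only illustrates the arithmetic on scalars.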