920 research outputs found

    A systematic density-based clustering method using anchor points

    Get PDF
    National Research Foundation (NRF) Singapore under its AI Singapore Programme; Singapore Ministry of Health under its National Innovation Challenge on Active and Confident Agein

    Mining topological structure in graphs through forest representations

    Get PDF
    We consider the problem of inferring simplified topological substructures—which we term backbones—in metric and non-metric graphs. Intuitively, these are subgraphs with ‘few’ nodes, multifurcations, and cycles, that model the topology of the original graph well. We present a multistep procedure for inferring these backbones. First, we encode local (geometric) information of each vertex in the original graph by means of the boundary coefficient (BC) to identify ‘core’ nodes in the graph. Next, we construct a forest representation of the graph, termed an f-pine, that connects every node of the graph to a local ‘core’ node. The final backbone is then inferred from the f-pine through CLOF (Constrained Leaves Optimal subForest), a novel graph optimization problem we introduce in this paper. On a theoretical level, we show that CLOF is NP-hard for general graphs. However, we prove that CLOF can be efficiently solved for forest graphs, a surprising fact given that CLOF induces a nontrivial monotone submodular set function maximization problem on tree graphs. This result is the basis of our method for mining backbones in graphs through forest representation. We qualitatively and quantitatively confirm the applicability, effectiveness, and scalability of our method for discovering backbones in a variety of graph-structured data, such as social networks, earthquake locations scattered across the Earth, and high-dimensional cell trajectory dat

    Topological inference in graphs and images

    Get PDF

    Networking - A Statistical Physics Perspective

    Get PDF
    Efficient networking has a substantial economic and societal impact in a broad range of areas including transportation systems, wired and wireless communications and a range of Internet applications. As transportation and communication networks become increasingly more complex, the ever increasing demand for congestion control, higher traffic capacity, quality of service, robustness and reduced energy consumption require new tools and methods to meet these conflicting requirements. The new methodology should serve for gaining better understanding of the properties of networking systems at the macroscopic level, as well as for the development of new principled optimization and management algorithms at the microscopic level. Methods of statistical physics seem best placed to provide new approaches as they have been developed specifically to deal with non-linear large scale systems. This paper aims at presenting an overview of tools and methods that have been developed within the statistical physics community and that can be readily applied to address the emerging problems in networking. These include diffusion processes, methods from disordered systems and polymer physics, probabilistic inference, which have direct relevance to network routing, file and frequency distribution, the exploration of network structures and vulnerability, and various other practical networking applications.Comment: (Review article) 71 pages, 14 figure

    Energy-Based Clustering: Fast and Robust Clustering of Data with Known Likelihood Functions

    Full text link
    Clustering has become an indispensable tool in the presence of increasingly large and complex data sets. Most clustering algorithms depend, either explicitly or implicitly, on the sampled density. However, estimated densities are fragile due to the curse of dimensionality and finite sampling effects, for instance in molecular dynamics simulations. To avoid the dependence on estimated densities, an energy-based clustering (EBC) algorithm based on the Metropolis acceptance criterion is developed in this work. In the proposed formulation, EBC can be considered a generalization of spectral clustering in the limit of large temperatures. Taking the potential energy of a sample explicitly into account alleviates requirements regarding the distribution of the data. In addition, it permits the subsampling of densely sampled regions, which can result in significant speed-ups and sublinear scaling. The algorithm is validated on a range of test systems including molecular dynamics trajectories of alanine dipeptide and the Trp-cage miniprotein. Our results show that including information about the potential-energy surface can largely decouple clustering from the sampling density

    A GDP-driven model for the binary and weighted structure of the International Trade Network

    Get PDF
    Recent events such as the global financial crisis have renewed the interest in the topic of economic networks. One of the main channels of shock propagation among countries is the International Trade Network (ITN). Two important models for the ITN structure, the classical gravity model of trade (more popular among economists) and the fitness model (more popular among networks scientists), are both limited to the characterization of only one representation of the ITN. The gravity model satisfactorily predicts the volume of trade between connected countries, but cannot reproduce the observed missing links (i.e. the topology). On the other hand, the fitness model can successfully replicate the topology of the ITN, but cannot predict the volumes. This paper tries to make an important step forward in the unification of those two frameworks, by proposing a new GDP-driven model which can simultaneously reproduce the binary and the weighted properties of the ITN. Specifically, we adopt a maximum-entropy approach where both the degree and the strength of each node is preserved. We then identify strong nonlinear relationships between the GDP and the parameters of the model. This ultimately results in a weighted generalization of the fitness model of trade, where the GDP plays the role of a `macroeconomic fitness' shaping the binary and the weighted structure of the ITN simultaneously. Our model mathematically highlights an important asymmetry in the role of binary and weighted network properties, namely the fact that binary properties can be inferred without the knowledge of weighted ones, while the opposite is not true

    Describing Images by Semantic Modeling using Attributes and Tags

    Get PDF
    This dissertation addresses the problem of describing images using visual attributes and textual tags, a fundamental task that narrows down the semantic gap between the visual reasoning of humans and machines. Automatic image annotation assigns relevant textual tags to the images. In this dissertation, we propose a query-specific formulation based on Weighted Multi-view Non-negative Matrix Factorization to perform automatic image annotation. Our proposed technique seamlessly adapt to the changes in training data, naturally solves the problem of feature fusion and handles the challenge of the rare tags. Unlike tags, attributes are category-agnostic, hence their combination models an exponential number of semantic labels. Motivated by the fact that most attributes describe local properties, we propose exploiting localization cues, through semantic parsing of human face and body to improve person-related attribute prediction. We also demonstrate that image-level attribute labels can be effectively used as weak supervision for the task of semantic segmentation. Next, we analyze the Selfie images by utilizing tags and attributes. We collect the first large-scale Selfie dataset and annotate it with different attributes covering characteristics such as gender, age, race, facial gestures, and hairstyle. We then study the popularity and sentiments of the selfies given an estimated appearance of various semantic concepts. In brief, we automatically infer what makes a good selfie. Despite its extensive usage, the deep learning literature falls short in understanding the characteristics and behavior of the Batch Normalization. We conclude this dissertation by providing a fresh view, in light of information geometry and Fisher kernels to why the batch normalization works. We propose Mixture Normalization that disentangles modes of variation in the underlying distribution of the layer outputs and confirm that it effectively accelerates training of different batch-normalized architectures including Inception-V3, Densely Connected Networks, and Deep Convolutional Generative Adversarial Networks while achieving better generalization error
    • …
    corecore