6,376 research outputs found

    Optimal construction of k-nearest neighbor graphs for identifying noisy clusters

    Get PDF
    We study clustering algorithms based on neighborhood graphs on a random sample of data points. The question we ask is how such a graph should be constructed in order to obtain optimal clustering results. Which type of neighborhood graph should one choose, mutual k-nearest neighbor or symmetric k-nearest neighbor? What is the optimal parameter k? In our setting, clusters are defined as connected components of the t-level set of the underlying probability distribution. Clusters are said to be identified in the neighborhood graph if connected components in the graph correspond to the true underlying clusters. Using techniques from random geometric graph theory, we prove bounds on the probability that clusters are identified successfully, both in a noise-free and in a noisy setting. Those bounds lead to several conclusions. First, k has to be chosen surprisingly high (rather of the order n than of the order log n) to maximize the probability of cluster identification. Secondly, the major difference between the mutual and the symmetric k-nearest neighbor graph occurs when one attempts to detect the most significant cluster only.Comment: 31 pages, 2 figure

    Recurrence-based time series analysis by means of complex network methods

    Full text link
    Complex networks are an important paradigm of modern complex systems sciences which allows quantitatively assessing the structural properties of systems composed of different interacting entities. During the last years, intensive efforts have been spent on applying network-based concepts also for the analysis of dynamically relevant higher-order statistical properties of time series. Notably, many corresponding approaches are closely related with the concept of recurrence in phase space. In this paper, we review recent methodological advances in time series analysis based on complex networks, with a special emphasis on methods founded on recurrence plots. The potentials and limitations of the individual methods are discussed and illustrated for paradigmatic examples of dynamical systems as well as for real-world time series. Complex network measures are shown to provide information about structural features of dynamical systems that are complementary to those characterized by other methods of time series analysis and, hence, substantially enrich the knowledge gathered from other existing (linear as well as nonlinear) approaches.Comment: To be published in International Journal of Bifurcation and Chaos (2011

    Algorithms for Stable Matching and Clustering in a Grid

    Full text link
    We study a discrete version of a geometric stable marriage problem originally proposed in a continuous setting by Hoffman, Holroyd, and Peres, in which points in the plane are stably matched to cluster centers, as prioritized by their distances, so that each cluster center is apportioned a set of points of equal area. We show that, for a discretization of the problem to an n×nn\times n grid of pixels with kk centers, the problem can be solved in time O(n2log5n)O(n^2 \log^5 n), and we experiment with two slower but more practical algorithms and a hybrid method that switches from one of these algorithms to the other to gain greater efficiency than either algorithm alone. We also show how to combine geometric stable matchings with a kk-means clustering algorithm, so as to provide a geometric political-districting algorithm that views distance in economic terms, and we experiment with weighted versions of stable kk-means in order to improve the connectivity of the resulting clusters.Comment: 23 pages, 12 figures. To appear (without the appendices) at the 18th International Workshop on Combinatorial Image Analysis, June 19-21, 2017, Plovdiv, Bulgari

    Outlier Edge Detection Using Random Graph Generation Models and Applications

    Get PDF
    Outliers are samples that are generated by different mechanisms from other normal data samples. Graphs, in particular social network graphs, may contain nodes and edges that are made by scammers, malicious programs or mistakenly by normal users. Detecting outlier nodes and edges is important for data mining and graph analytics. However, previous research in the field has merely focused on detecting outlier nodes. In this article, we study the properties of edges and propose outlier edge detection algorithms using two random graph generation models. We found that the edge-ego-network, which can be defined as the induced graph that contains two end nodes of an edge, their neighboring nodes and the edges that link these nodes, contains critical information to detect outlier edges. We evaluated the proposed algorithms by injecting outlier edges into some real-world graph data. Experiment results show that the proposed algorithms can effectively detect outlier edges. In particular, the algorithm based on the Preferential Attachment Random Graph Generation model consistently gives good performance regardless of the test graph data. Further more, the proposed algorithms are not limited in the area of outlier edge detection. We demonstrate three different applications that benefit from the proposed algorithms: 1) a preprocessing tool that improves the performance of graph clustering algorithms; 2) an outlier node detection algorithm; and 3) a novel noisy data clustering algorithm. These applications show the great potential of the proposed outlier edge detection techniques.Comment: 14 pages, 5 figures, journal pape

    Analysis of a large-scale weighted network of one-to-one human communication

    Get PDF
    We construct a connected network of 3.9 million nodes from mobile phone call records, which can be regarded as a proxy for the underlying human communication network at the societal level. We assign two weights on each edge to reflect the strength of social interaction, which are the aggregate call duration and the cumulative number of calls placed between the individuals over a period of 18 weeks. We present a detailed analysis of this weighted network by examining its degree, strength, and weight distributions, as well as its topological assortativity and weighted assortativity, clustering and weighted clustering, together with correlations between these quantities. We give an account of motif intensity and coherence distributions and compare them to a randomized reference system. We also use the concept of link overlap to measure the number of common neighbors any two adjacent nodes have, which serves as a useful local measure for identifying the interconnectedness of communities. We report a positive correlation between the overlap and weight of a link, thus providing strong quantitative evidence for the weak ties hypothesis, a central concept in social network analysis. The percolation properties of the network are found to depend on the type and order of removed links, and they can help understand how the local structure of the network manifests itself at the global level. We hope that our results will contribute to modeling weighted large-scale social networks, and believe that the systematic approach followed here can be adopted to study other weighted networks.Comment: 25 pages, 17 figures, 2 table

    One-class classifiers based on entropic spanning graphs

    Get PDF
    One-class classifiers offer valuable tools to assess the presence of outliers in data. In this paper, we propose a design methodology for one-class classifiers based on entropic spanning graphs. Our approach takes into account the possibility to process also non-numeric data by means of an embedding procedure. The spanning graph is learned on the embedded input data and the outcoming partition of vertices defines the classifier. The final partition is derived by exploiting a criterion based on mutual information minimization. Here, we compute the mutual information by using a convenient formulation provided in terms of the α\alpha-Jensen difference. Once training is completed, in order to associate a confidence level with the classifier decision, a graph-based fuzzy model is constructed. The fuzzification process is based only on topological information of the vertices of the entropic spanning graph. As such, the proposed one-class classifier is suitable also for data characterized by complex geometric structures. We provide experiments on well-known benchmarks containing both feature vectors and labeled graphs. In addition, we apply the method to the protein solubility recognition problem by considering several representations for the input samples. Experimental results demonstrate the effectiveness and versatility of the proposed method with respect to other state-of-the-art approaches.Comment: Extended and revised version of the paper "One-Class Classification Through Mutual Information Minimization" presented at the 2016 IEEE IJCNN, Vancouver, Canad
    corecore