9 research outputs found

    Optimal construction of k-nearest neighbor graphs for identifying noisy clusters

    We study clustering algorithms based on neighborhood graphs on a random sample of data points. The question we ask is how such a graph should be constructed in order to obtain optimal clustering results. Which type of neighborhood graph should one choose, mutual k-nearest neighbor or symmetric k-nearest neighbor? What is the optimal parameter k? In our setting, clusters are defined as connected components of the t-level set of the underlying probability distribution. Clusters are said to be identified in the neighborhood graph if connected components in the graph correspond to the true underlying clusters. Using techniques from random geometric graph theory, we prove bounds on the probability that clusters are identified successfully, both in a noise-free and in a noisy setting. Those bounds lead to several conclusions. First, k has to be chosen surprisingly high (of order n rather than of order log n) to maximize the probability of cluster identification. Second, the major difference between the mutual and the symmetric k-nearest neighbor graph occurs when one attempts to detect the most significant cluster only. Comment: 31 pages, 2 figures
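
    The two graph types compared in this abstract can be sketched in a few lines. This is a minimal illustration, not the authors' construction or experimental setup: the function names (`knn_lists`, `knn_graph`, `connected_components`, `sample_two_clusters`) and the toy data (two well-separated uniform boxes) are assumptions made here for demonstration.

```python
import math
import random


def knn_lists(points, k):
    """For each point index, return the set of indices of its k nearest neighbors."""
    neighbors = []
    for i, p in enumerate(points):
        order = sorted(
            (j for j in range(len(points)) if j != i),
            key=lambda j: math.dist(p, points[j]),
        )
        neighbors.append(set(order[:k]))
    return neighbors


def knn_graph(points, k, mutual):
    """Build an undirected kNN graph as a list of adjacency sets.

    mutual=True : edge i-j only if each point is among the other's k nearest neighbors.
    mutual=False: symmetric graph, edge i-j if at least one direction holds.
    """
    nbrs = knn_lists(points, k)
    adj = [set() for _ in points]
    for i in range(len(points)):
        for j in nbrs[i]:
            if not mutual or i in nbrs[j]:
                adj[i].add(j)
                adj[j].add(i)
    return adj


def connected_components(adj):
    """Return the connected components of the graph as a list of index sets."""
    seen, components = set(), []
    for start in range(len(adj)):
        if start in seen:
            continue
        stack, comp = [start], set()
        seen.add(start)
        while stack:
            v = stack.pop()
            comp.add(v)
            for w in adj[v]:
                if w not in seen:
                    seen.add(w)
                    stack.append(w)
        components.append(comp)
    return components


def sample_two_clusters(n_per_cluster, seed):
    """Toy data (assumption): two uniform boxes of side 2, centered 10 apart."""
    rng = random.Random(seed)
    left = [(rng.uniform(-1, 1), rng.uniform(-1, 1)) for _ in range(n_per_cluster)]
    right = [(rng.uniform(9, 11), rng.uniform(-1, 1)) for _ in range(n_per_cluster)]
    return left + right


if __name__ == "__main__":
    pts = sample_two_clusters(10, seed=0)
    for k in (1, 2, 3, 5, 9):
        n_mutual = len(connected_components(knn_graph(pts, k, mutual=True)))
        n_symmetric = len(connected_components(knn_graph(pts, k, mutual=False)))
        print(f"k={k}: mutual={n_mutual} components, symmetric={n_symmetric} components")
```

    In this sketch, clusters count as "identified" when the components coincide with the two boxes. Because every mutual edge is also a symmetric edge, the mutual graph can never have fewer components than the symmetric one for the same k.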

    A Flexible Outlier Detector Based on a Topology Given by Graph Communities

    Transformative agreement CRUE-CSIC. Outlier detection is essential for optimal performance of machine learning methods and statistical predictive models. Detecting outliers is especially critical in small-sample-size, unbalanced problems, since in such settings outliers become highly influential and significantly bias models. These experimental settings are common in medical applications, such as diagnosis of rare pathologies, outcomes of experimental personalized treatments, or pandemic emergencies. In contrast to population-based methods, neighborhood-based local approaches, which compute an outlier score from the neighbors of each sample, are simple, flexible methods that have the potential to perform well in small-sample-size, unbalanced problems. A main concern of local approaches is the impact that the computation of each sample's neighborhood has on the method's performance. Most approaches use a distance in the feature space to define a single neighborhood, which requires careful selection of several parameters, such as the number of neighbors. This work presents a local approach based on a local measure of the heterogeneity of sample labels in the feature space, considered as a topological manifold. The topology is computed using the communities of a weighted graph encoding mutual nearest neighbors in the feature space. In this way, we provide a set of multiple neighborhoods able to describe the structure of complex spaces without parameter fine-tuning. Extensive experiments on real-world and synthetic data sets show that our approach outperforms both local and global strategies in multi-view and single-view settings

    Automatically Selecting Parameters for Graph-Based Clustering

    Data streams present a number of challenges, caused by change in stream concepts over time. In this thesis we present a novel method for detecting concept drift within data streams by analysing geometric features of the clustering algorithm RepStream. Further, we present novel methods for automatically adjusting critical input parameters over time and for generating self-organising nearest-neighbour graphs, improving robustness and decreasing the need for domain-specific knowledge in the face of stream evolution

    Spectrum sharing and management techniques in mobile networks

    Radio spectrum has emerged as a scarce resource that must be carefully considered when designing 5G communication systems. Mobile Network Operators (MNOs) will need to revisit business models that were not previously of interest to them (e.g., Cognitive Radio) or consider adopting emerging ones (e.g., Licensed Shared Access) in order to cover growing capacity needs. Spectrum sharing is considered unavoidable for 5G systems, and this thesis provides a solution for adaptive spectrum sharing under multiple authorization regimes, based on a novel architectural framework that enables network elements to make decisions on spectrum acquisition. The proposed decision-making process is a novel Adaptive Spectrum Sharing technique that uses fuzzy logic controllers to determine the most suitable spectrum sharing option and reinforcement learning to tune the fuzzy logic rules. It aims to find an optimal policy that an MNO should follow in order to offer the desired Quality of Service to its users while preserving resources (either economic or radio) where possible. The final contribution of this thesis is a mechanism that ensures fair access to spectrum among users in scenarios in which granting a spectrum license is not a prerequisite

    Clustering with neighborhood graphs

    Graph clustering methods are defined for general weighted graphs. If data is given in the form of points and distances between them, a neighborhood graph, such as the r-graph or kNN-graphs, is constructed and graph clustering is applied to this graph. We investigate the influence of the type and parameter of the neighborhood graph on the clustering results, when n sample points are drawn independently from a density in Euclidean space. In Chapter 2 we study "cluster identification": the true clusters are the connected components of density level sets, and a cluster is identified if its points form a connected component of the graph. We compare (modifications of) the mutual and the symmetric kNN-graph. They behave differently if the goal is to identify the "most significant" clusters, whereas there is no difference if the goal is to identify all clusters. We give the range of k for which the clusters are identified in the graphs and derive the optimal choice of k, which, surprisingly, is linear in n. In Chapter 3 we study the convergence of the normalized cut (Ncut) and the ratio cut as n → ∞ for cuts in the kNN- and the r-graph induced by a hyperplane. The limits differ; consequently Ncut on a kNN-graph does something systematically different than Ncut on an r-graph! This can be experimentally observed on toy and real data sets. Therefore, graph clustering criteria cannot be studied independently of the type of graph to which they are applied
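
    For readers unfamiliar with the two criteria named in this abstract, their standard textbook definitions are given below. The notation (edge weights $w_{ij}$, degrees $d_i$, a vertex subset $A$ with complement $\bar{A}$) is an assumption introduced here and is not taken from the thesis itself.

```latex
\[
\operatorname{cut}(A,\bar{A}) = \sum_{i \in A,\; j \in \bar{A}} w_{ij},
\qquad
\operatorname{vol}(A) = \sum_{i \in A} d_i,
\qquad
d_i = \sum_{j} w_{ij},
\]
\[
\operatorname{RatioCut}(A,\bar{A})
  = \operatorname{cut}(A,\bar{A})
    \left( \frac{1}{|A|} + \frac{1}{|\bar{A}|} \right),
\qquad
\operatorname{Ncut}(A,\bar{A})
  = \operatorname{cut}(A,\bar{A})
    \left( \frac{1}{\operatorname{vol}(A)} + \frac{1}{\operatorname{vol}(\bar{A})} \right).
\]
```

    One informal way to see why the graph type could matter: in an unweighted kNN-graph every vertex has degree close to k, whereas in an r-graph the degree grows with the local density, so $\operatorname{vol}(A)$ weights the same point set differently in the two graphs. This is offered only as intuition, not as a summary of the thesis's argument.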

    Cluster Identification in Nearest-Neighbor Graphs

    Assume we are given a sample of points from some underlying distribution which contains several distinct clusters. Our goal is to construct a neighborhood graph on the sample points such that clusters are "identified": that is, the subgraph induced by points from the same cluster is connected, while subgraphs corresponding to different clusters are not connected to each other. We derive bounds on the probability that cluster identification is successful, and use them to predict "optimal" values of k for the mutual and symmetric k-nearest-neighbor graphs. We point out different properties of the mutual and symmetric nearest-neighbor graphs related to the cluster identification problem