19 research outputs found

    Decision Manifolds: Classification Inspired by Self-Organization

    We present a classifier algorithm that approximates the decision surface of labeled data by a patchwork of separating hyperplanes. The hyperplanes are arranged in a way inspired by how Self-Organizing Maps are trained. We take advantage of the fact that the boundaries can often be approximated by linear ones connected by a low-dimensional nonlinear manifold. The resulting classifier allows for a voting scheme that averages over the classification results of neighboring hyperplanes. Our algorithm is computationally efficient both in terms of training and classification. Further, we present a model selection framework for estimating the parameters of the classification boundary, and show results for artificial and real-world data sets.
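    The abstract does not spell out the training procedure, but the general idea of a SOM-style arrangement of local hyperplanes with neighborhood voting can be sketched as below. The class name, the 1-D chain topology, the perceptron-style local updates, and the k-nearest-node voting are illustrative assumptions, not the authors' exact Decision Manifolds algorithm.

```python
# Illustrative sketch only: a chain of local linear classifiers whose positions
# and hyperplanes are updated with a SOM-like neighborhood function, and whose
# predictions are combined by voting over the nearest nodes. The topology,
# update rule, and voting scheme are assumptions made for this example.
import numpy as np

class HyperplanePatchwork:
    def __init__(self, n_nodes=10, dim=2, lr=0.05, sigma=2.0, epochs=50, seed=0):
        rng = np.random.default_rng(seed)
        self.centers = rng.normal(size=(n_nodes, dim))  # node positions in input space
        self.w = rng.normal(size=(n_nodes, dim))        # local hyperplane normals
        self.b = np.zeros(n_nodes)                      # local hyperplane offsets
        self.grid = np.arange(n_nodes)                  # 1-D chain topology
        self.lr, self.sigma, self.epochs = lr, sigma, epochs

    def fit(self, X, y):  # y in {-1, +1}
        for _ in range(self.epochs):
            for x, t in zip(X, y):
                bmu = np.argmin(np.linalg.norm(self.centers - x, axis=1))
                h = np.exp(-((self.grid - bmu) ** 2) / (2 * self.sigma ** 2))
                # SOM-style: pull node centers toward the sample, scaled by
                # their topological distance to the best-matching node
                self.centers += self.lr * h[:, None] * (x - self.centers)
                # perceptron-style correction of misclassifying local
                # hyperplanes, weighted by the same neighborhood function
                wrong = np.sign(self.w @ x + self.b) != t
                self.w[wrong] += (self.lr * h[wrong])[:, None] * t * x
                self.b[wrong] += self.lr * h[wrong] * t
        return self

    def predict(self, X, k=3):
        preds = []
        for x in X:
            near = np.argsort(np.linalg.norm(self.centers - x, axis=1))[:k]
            vote = np.sign(self.w[near] @ x + self.b[near]).sum()  # neighboring hyperplanes vote
            preds.append(1.0 if vote >= 0 else -1.0)
        return np.array(preds)
```

    In this sketch both training and prediction cost grows linearly with the number of nodes per sample, which is in the spirit of the efficiency claim above; the paper's model selection framework for choosing the boundary parameters is not reproduced here.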

    Advanced data exploration methods based on Self-Organizing Maps

    Self-Organizing Maps (SOMs) are an important data mining method for extracting information from large data sets. In this thesis, three techniques based on SOMs are introduced to help understand large amounts of data. Two of them are visualization techniques for SOMs, while the third is a classification method for two-class problems inspired by the SOM training algorithm. The first of the proposed methods relates the data set a SOM has been trained on to the codebook vectors that define the SOM. Starting from a graph that reflects the mutual distances between data vectors, a set of lines is plotted on top of the output-space visualization of the SOM. This shows the density of the areas of the map, violations of the topology due to the projection-induced dimensionality loss, and the locations of outliers. The second contribution is a visualization technique that shows the clustering structure of a SOM at various levels of detail. A parameter is provided to adjust the desired granularity of the information shown. For displaying the results, a vector field representation has been chosen in order to provide a metaphor that appeals to specialists with engineering backgrounds. This method is extended to a setting that contrasts groups of contributing variables in order to single out their influence on the clustering structure. The third contribution is a machine learning method for binary classification problems. It consists of an ensemble of linear classifiers that each cover a portion of the input space. The training algorithm that places these local classifiers is influenced by the SOM algorithm: it exploits the SOM principle of aligning nearby units according to a superimposed topology structure. The theoretical background of these methods is described in the thesis. Empirical evaluations on a series of artificial, benchmark, and real-world data sets show their applicability, and their strengths and weaknesses are discussed. Much effort has been dedicated to designing meaningful artificial data sets that address specific abilities of supervised and unsupervised learning methods.
    Georg Pölzlbauer. Wien, Techn. Univ., Diss., 2008. Summary in German.
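    As an illustration of the first visualization technique, the sketch below draws a line between the best-matching units of every pair of data points that are near each other in input space. It assumes the trained SOM is supplied as a (rows, cols, dim) codebook array and that the input-space graph is a simple k-nearest-neighbor graph; the thesis may build and render the graph differently.

```python
# Sketch of the graph-overlay idea, assuming the SOM codebook is a
# (rows, cols, dim) array and the input-space graph is a k-nearest-neighbor
# graph. Dense line bundles indicate dense map regions; long lines across the
# map hint at topology violations or outliers.
import numpy as np
import matplotlib.pyplot as plt

def bmu_position(x, codebook):
    """Grid coordinates (row, col) of the best-matching unit for sample x."""
    rows, cols, dim = codebook.shape
    d = np.linalg.norm(codebook.reshape(-1, dim) - x, axis=1)
    return divmod(int(np.argmin(d)), cols)

def plot_neighbor_graph_on_som(X, codebook, k=3):
    """Connect the BMUs of input-space k-nearest neighbors on the SOM grid."""
    bmus = [bmu_position(x, codebook) for x in X]
    for i, x in enumerate(X):
        d = np.linalg.norm(X - x, axis=1)
        for j in np.argsort(d)[1:k + 1]:        # skip index 0: the point itself
            (r1, c1), (r2, c2) = bmus[i], bmus[j]
            plt.plot([c1, c2], [r1, r2], color="steelblue", alpha=0.2)
    plt.gca().invert_yaxis()                    # match the usual map orientation
    plt.title("Input-space neighborhood graph drawn on the SOM grid")
    plt.show()
```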

    Survey and Comparison of Quality Measures for Self-Organizing Maps

    Self-Organizing Maps have a wide range of beneficial properties for data mining, such as vector quantization and projection. Several measures exist that quantify the quality of either of these properties. The scope of this work is to describe and compare some of the most well-known measures. This is done by conducting a series of experiments for different map topologies on several well-known data sets. The measures are examined as to whether they are suited to determining hyperparameters such as the optimal map size, how well each measure is suited to comparing different maps, and whether they allow comparison with algorithms similar to the SOM (e.g., Sammon's Mapping).
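    Although the abstract does not list the measures by name, two of the most common examples of the kind surveyed here are the quantization error (vector-quantization quality) and the topographic error (projection quality). A minimal sketch, assuming the map is given as a (rows, cols, dim) codebook array:

```python
# Two standard SOM quality measures, assuming the map is given as a
# (rows, cols, dim) codebook array and the data X as an (n, dim) matrix.
import numpy as np

def quantization_error(X, codebook):
    """Mean distance between each sample and its best-matching unit."""
    flat = codebook.reshape(-1, codebook.shape[-1])
    d = np.linalg.norm(X[:, None, :] - flat[None, :, :], axis=2)
    return d.min(axis=1).mean()

def topographic_error(X, codebook):
    """Fraction of samples whose two closest units are not adjacent on the grid."""
    rows, cols, dim = codebook.shape
    flat = codebook.reshape(-1, dim)
    d = np.linalg.norm(X[:, None, :] - flat[None, :, :], axis=2)
    best2 = np.argsort(d, axis=1)[:, :2]            # indices of the two closest units
    r, c = np.divmod(best2, cols)                   # their grid coordinates
    adjacent = np.abs(np.diff(r, axis=1)) + np.abs(np.diff(c, axis=1)) <= 1
    return 1.0 - adjacent.mean()                    # 4-neighborhood adjacency
```

    Lower values are better for both; note that the quantization error tends to keep decreasing as the map grows, so it cannot be used on its own to pick the optimal map size.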