An Optimization Framework for Generalized Relevance Learning Vector Quantization with Application to Z-Wave Device Fingerprinting
Z-Wave is a low-power, low-cost Wireless Personal Area Network (WPAN) technology supporting Critical Infrastructure (CI) systems that are interconnected by government-to-internet pathways. Given that Z-Wave is a relatively insecure technology, Radio Frequency Distinct Native Attribute (RF-DNA) Fingerprinting is considered here to augment security by exploiting statistical features from selected signal responses. Related RF-DNA efforts include the use of Multiple Discriminant Analysis (MDA) and Generalized Relevance Learning Vector Quantization-Improved (GRLVQI) classifiers, with GRLVQI outperforming MDA using empirically determined parameters. GRLVQI is optimized here for Z-Wave using a full factorial experiment with spreadsheet search and response surface methods. Two optimization measures are developed for assessing Z-Wave discrimination: 1) Relative Accuracy Percentage (RAP) for device classification, and 2) Mean Area Under the Curve (AUCM) for device identity (ID) verification. Primary benefits of the approach include: 1) generalizability to other wireless device technologies, and 2) improvement in GRLVQI device classification and device ID verification performance.
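The two measures named in the abstract can be sketched in a few lines; the exact definitions in the paper may differ, so both functions below are illustrative assumptions (accuracy rescaled against chance level for RAP, and the rank-based Mann-Whitney statistic for a per-device AUC, averaged over devices for AUCM).

```python
# Hedged sketch of the two discrimination measures; the names and
# formulas are assumptions for illustration, not the paper's exact code.
import numpy as np

def relative_accuracy_percentage(acc, n_classes):
    """Accuracy rescaled so that chance level maps to 0 and perfect to 100."""
    chance = 1.0 / n_classes
    return 100.0 * (acc - chance) / (1.0 - chance)

def auc(scores_pos, scores_neg):
    """Rank-based (Mann-Whitney) AUC for one device's ID verification.

    scores_pos: match scores for the claimed (true) device,
    scores_neg: match scores for impostor devices; assumes no ties.
    """
    s = np.concatenate([scores_pos, scores_neg])
    ranks = s.argsort().argsort() + 1          # 1-based ranks
    n_p, n_n = len(scores_pos), len(scores_neg)
    return (ranks[:n_p].sum() - n_p * (n_p + 1) / 2) / (n_p * n_n)
```

The mean AUC over all devices would then serve as the AUCM-style verification measure.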
Integration of Auxiliary Data Knowledge in Prototype Based Vector Quantization and Classification Models
This thesis deals with the integration of auxiliary data knowledge into machine learning methods, especially prototype based classification models. The problem of classification is diverse, and evaluating the result by accuracy alone is not adequate in many applications. Therefore, the classification tasks are analyzed more deeply. Possibilities to extend prototype based methods to integrate extra knowledge about the data or the classification goal are presented to obtain problem-adequate models. One of the proposed extensions is a Generalized Learning Vector Quantization for the direct optimization of statistical measures besides the classification accuracy. Modifying the metric adaptation of the Generalized Learning Vector Quantization for functional data, i.e. data with lateral dependencies in the features, is also considered.
Symbols and Abbreviations
1 Introduction
1.1 Motivation and Problem Description
1.2 Utilized Data Sets
2 Prototype Based Methods
2.1 Unsupervised Vector Quantization
2.1.1 C-means
2.1.2 Self-Organizing Map
2.1.3 Neural Gas
2.1.4 Common Generalizations
2.2 Supervised Vector Quantization
2.2.1 The Family of Learning Vector Quantizers - LVQ
2.2.2 Generalized Learning Vector Quantization
2.3 Semi-Supervised Vector Quantization
2.3.1 Learning Associations by Self-Organization
2.3.2 Fuzzy Labeled Self-Organizing Map
2.3.3 Fuzzy Labeled Neural Gas
2.4 Dissimilarity Measures
2.4.1 Differentiable Kernels in Generalized LVQ
2.4.2 Dissimilarity Adaptation for Performance Improvement
3 Deeper Insights into Classification Problems - From the Perspective of Generalized LVQ
3.1 Classification Models
3.2 The Classification Task
3.3 Evaluation of Classification Results
3.4 The Classification Task as an Ill-Posed Problem
4 Auxiliary Structure Information and Appropriate Dissimilarity Adaptation in Prototype Based Methods
4.1 Supervised Vector Quantization for Functional Data
4.1.1 Functional Relevance/Matrix LVQ
4.1.2 Enhancement Generalized Relevance/Matrix LVQ
4.2 Fuzzy Information About the Labels
4.2.1 Fuzzy Semi-Supervised Self-Organizing Maps
4.2.2 Fuzzy Semi-Supervised Neural Gas
5 Variants of Classification Costs and Class Sensitive Learning
5.1 Border Sensitive Learning in Generalized LVQ
5.1.1 Border Sensitivity by Additive Penalty Function
5.1.2 Border Sensitivity by Parameterized Transfer Function
5.2 Optimizing Different Validation Measures by the Generalized LVQ
5.2.1 Attention Based Learning Strategy
5.2.2 Optimizing Statistical Validation Measurements for Binary Class Problems in the GLVQ
5.3 Integration of Structural Knowledge about the Labeling in Fuzzy Supervised Neural Gas
6 Conclusion and Future Work
My Publications
A Appendix
A.1 Stochastic Gradient Descent (SGD)
A.2 Support Vector Machine
A.3 Fuzzy Supervised Neural Gas Algorithm Solved by SGD
Bibliography
Acknowledgements
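The Generalized LVQ scheme central to this thesis admits a compact sketch. The version below is a minimal stochastic update assuming squared-Euclidean dissimilarity; variable names are illustrative, and relevance or matrix adaptation (as in GRLVQ) would add a learned metric on top of this.

```python
# Minimal sketch of one stochastic GLVQ update; squared-Euclidean
# dissimilarity assumed, names illustrative.
import numpy as np

def glvq_step(x, y, prototypes, proto_labels, lr=0.05):
    """One GLVQ update for sample x with label y.

    The per-sample classifier cost is mu = (d+ - d-) / (d+ + d-), where
    d+ (d-) is the squared distance to the closest prototype with the
    same (a different) label. Gradient descent on mu pulls the correct
    prototype toward x and pushes the incorrect one away.
    """
    d = np.sum((prototypes - x) ** 2, axis=1)        # squared distances
    same = proto_labels == y
    jp = np.where(same)[0][np.argmin(d[same])]       # closest correct
    jm = np.where(~same)[0][np.argmin(d[~same])]     # closest incorrect
    dp, dm = d[jp], d[jm]
    denom = (dp + dm) ** 2
    # chain rule: d(mu)/d(d+) = 2 d- / denom, d(mu)/d(d-) = -2 d+ / denom
    prototypes[jp] += lr * (4 * dm / denom) * (x - prototypes[jp])
    prototypes[jm] -= lr * (4 * dp / denom) * (x - prototypes[jm])
    return (dp - dm) / (dp + dm)                     # mu in [-1, 1]
```

A negative returned mu means the sample is currently classified correctly; iterating this step over a labeled dataset trains the prototype set.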
Laplacian Mixture Modeling for Network Analysis and Unsupervised Learning on Graphs
Laplacian mixture models identify overlapping regions of influence in
unlabeled graph and network data in a scalable and computationally efficient
way, yielding useful low-dimensional representations. By combining Laplacian
eigenspace and finite mixture modeling methods, they provide probabilistic or
fuzzy dimensionality reductions or domain decompositions for a variety of input
data types, including mixture distributions, feature vectors, and graphs or
networks. Provable optimal recovery using the algorithm is analytically shown
for a nontrivial class of cluster graphs. Heuristic approximations for scalable
high-performance implementations are described and empirically tested.
Connections to PageRank and community detection in network analysis demonstrate
the wide applicability of this approach. The origins of fuzzy spectral methods,
beginning with generalized heat or diffusion equations in physics, are reviewed
and summarized. Comparisons to other dimensionality reduction and clustering
methods for challenging unsupervised machine learning problems are also
discussed.
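The core pipeline described above, embedding nodes via low Laplacian eigenvectors and then assigning soft memberships, can be sketched as follows. The soft assignment here is a simple illustrative stand-in for the finite mixture fit used in the paper, and all names are assumptions.

```python
# Hedged sketch: Laplacian eigenspace embedding plus fuzzy membership
# assignment; a stand-in for the paper's mixture-model fit.
import numpy as np

def laplacian_embedding(A, k=2):
    """Embed nodes of adjacency matrix A using the k eigenvectors of the
    combinatorial Laplacian with smallest eigenvalues."""
    L = np.diag(A.sum(axis=1)) - A
    w, v = np.linalg.eigh(L)
    return v[:, np.argsort(w)[:k]]

def soft_memberships(X, centers, beta=5.0):
    """Fuzzy memberships from squared distances to cluster centers
    (softmax over negative distances; beta controls fuzziness)."""
    d2 = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(-1)
    e = np.exp(-beta * d2)
    return e / e.sum(axis=1, keepdims=True)
```

On a graph of two dense groups joined by a bridge edge, the second (Fiedler) eigenvector separates the groups, and the memberships give each node a probabilistic degree of belonging to each region of influence.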
Breaking the waves: asymmetric random periodic features for low-bitrate kernel machines
Many signal processing and machine learning applications are built from
evaluating a kernel on pairs of signals, e.g. to assess the similarity of an
incoming query to a database of known signals. This nonlinear evaluation can be
simplified to a linear inner product of the random Fourier features of those
signals: random projections followed by a periodic map, the complex
exponential. It is known that a simple quantization of those features
(corresponding to replacing the complex exponential by a different periodic map
that takes binary values, which is appealing for their transmission and
storage), distorts the approximated kernel, which may be undesirable in
practice. Our take-home message is that when the features of only one of the
two signals are quantized, the original kernel is recovered without distortion;
its practical interest appears in several cases where the kernel evaluations
are asymmetric by nature, such as a client-server scheme. Concretely, we
introduce the general framework of asymmetric random periodic features, where
the two signals of interest are observed through random periodic features:
random projections followed by a general periodic map, which is allowed to be
different for both signals. We derive the influence of those periodic maps on
the approximated kernel, and prove uniform probabilistic error bounds holding
for all signal pairs from an infinite low-complexity set. Interestingly, our
results allow the periodic maps to be discontinuous, thanks to a new
mathematical tool, i.e. the mean Lipschitz smoothness. We then apply this
generic framework to semi-quantized kernel machines (where only one signal has
quantized features and the other has classical random Fourier features), for
which we show theoretically that the approximated kernel remains unchanged
(with the associated error bound), and confirm the power of the approach with
numerical simulations.
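The semi-quantized setting described above can be sketched numerically. The code below is a hedged illustration, not the paper's exact estimator: the server keeps classical random Fourier features (cosine map), the client sends only one-bit signs of the cosine (a square-wave periodic map), and a pi/4 rescaling, the inverse of the square wave's first Fourier coefficient 4/pi, is assumed to recover the Gaussian kernel.

```python
# Hedged sketch of asymmetric random periodic features for the Gaussian
# kernel; names and the pi/4 rescaling are illustrative assumptions.
import numpy as np

rng = np.random.default_rng(0)

def rff(x, W, b):
    """Classical random Fourier features: scaled cos(Wx + b)."""
    return np.sqrt(2.0 / W.shape[0]) * np.cos(W @ x + b)

def one_bit_rff(x, W, b):
    """Quantized periodic map: the sign of the cosine (a square wave)."""
    return np.sqrt(2.0 / W.shape[0]) * np.sign(np.cos(W @ x + b))

d, m = 3, 5000
W = rng.standard_normal((m, d))            # Gaussian random projections
b = rng.uniform(0.0, 2 * np.pi, size=m)    # random phases
x = rng.standard_normal(d)
y = x + 0.3 * rng.standard_normal(d)       # a nearby query signal

k_true = np.exp(-0.5 * np.sum((x - y) ** 2))   # Gaussian kernel value
k_full = rff(x, W, b) @ rff(y, W, b)           # both sides unquantized
# semi-quantized: only one side is one-bit; pi/4 undoes the square
# wave's first Fourier coefficient (4/pi), so no systematic distortion
k_semi = (np.pi / 4) * (rff(x, W, b) @ one_bit_rff(y, W, b))
```

Both `k_full` and the rescaled `k_semi` concentrate around `k_true` as the number of features m grows, which is the "no distortion" behaviour the paper establishes with uniform error bounds.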
Machine learning: statistical physics based theory and smart industry applications
The increasing computational power and the availability of data have made it possible to train ever-bigger artificial neural networks. These so-called deep neural networks have been used for impressive applications, like advanced driver assistance and support in medical diagnoses. However, various vulnerabilities have been revealed and there are many open questions concerning the workings of neural networks. Theoretical analyses are therefore essential for further progress. One current question is: why do networks with Rectified Linear Unit (ReLU) activation seemingly perform better than networks with sigmoidal activation? We contribute to the answer to this question by comparing ReLU networks with sigmoidal networks in diverse theoretical learning scenarios. Instead of analysing specific datasets, we use theoretical modelling based on methods from statistical physics, which yields the typical learning behaviour for chosen model scenarios. We analyse both the learning behaviour on a fixed dataset and on a data stream in the presence of a changing task. The emphasis is on the analysis of the network's transition to a state wherein specific concepts have been learnt. We find significant benefits of ReLU networks: they exhibit continuous increases of their performance and adapt more quickly to changing tasks. In the second part of the thesis we treat applications of machine learning: we design a quick quality-control method for material in a production line and study the relationship with product faults. Furthermore, we introduce a methodology for the interpretable classification of time series data.
A Survey on Metric Learning for Feature Vectors and Structured Data
The need for appropriate ways to measure the distance or similarity between
data is ubiquitous in machine learning, pattern recognition and data mining,
but handcrafting such good metrics for specific problems is generally
difficult. This has led to the emergence of metric learning, which aims at
automatically learning a metric from data and has attracted a lot of interest
in machine learning and related fields for the past ten years. This survey
paper proposes a systematic review of the metric learning literature,
highlighting the pros and cons of each approach. We pay particular attention to
Mahalanobis distance metric learning, a well-studied and successful framework,
but additionally present a wide range of methods that have recently emerged as
powerful alternatives, including nonlinear metric learning, similarity learning
and local metric learning. Recent trends and extensions, such as
semi-supervised metric learning, metric learning for histogram data and the
derivation of generalization guarantees, are also covered. Finally, this survey
addresses metric learning for structured data, in particular edit distance
learning, and attempts to give an overview of the remaining challenges in
metric learning for the years to come.
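The Mahalanobis framework highlighted in the survey can be sketched in a few lines. The loss below is an illustrative contrastive choice rather than any specific published algorithm: the metric matrix is parameterized as M = LᵀL (so it stays positive semi-definite), and gradient steps shrink distances within similar pairs while growing them across dissimilar pairs.

```python
# Hedged sketch of Mahalanobis metric learning with M = L^T L;
# the loss and names are illustrative assumptions.
import numpy as np

def mahalanobis(x, y, L):
    """Squared Mahalanobis distance (x-y)^T L^T L (x-y)."""
    z = L @ (x - y)
    return float(z @ z)

def metric_step(L, sim_pairs, dis_pairs, lr=0.05):
    """One gradient step on sum(d_sim) - sum(d_dis) with respect to L.

    d/dL of ||L d||^2 is 2 L d d^T, so similar pairs contribute a pull
    and dissimilar pairs a push on the learned transform L.
    """
    G = np.zeros_like(L)
    for x, y in sim_pairs:          # pull similar pairs together
        d = x - y
        G += 2 * (L @ np.outer(d, d))
    for x, y in dis_pairs:          # push dissimilar pairs apart
        d = x - y
        G -= 2 * (L @ np.outer(d, d))
    return L - lr * G
```

In practice one would add a margin or regularizer to keep the push term bounded; this bare version only illustrates the M = LᵀL parameterization the survey's Mahalanobis section revolves around.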