21 research outputs found

    A Bi-level Nonlinear Eigenvector Algorithm for Wasserstein Discriminant Analysis

    Much like classical Fisher linear discriminant analysis, Wasserstein discriminant analysis (WDA) is a supervised linear dimensionality reduction method that seeks a projection matrix that maximizes the dispersion between different data classes and minimizes the dispersion within each class. In contrast to Fisher's method, however, WDA can account for both global and local interconnections between data classes by using a regularized Wasserstein distance. WDA is formulated as a bi-level nonlinear trace ratio optimization. In this paper, we present a bi-level nonlinear eigenvector (NEPv) algorithm, called WDA-nepv. The inner kernel of WDA-nepv, which computes the optimal transport matrix of the regularized Wasserstein distance, is formulated as an NEPv, and the outer kernel for the trace ratio optimization is formulated as another NEPv. Consequently, both kernels can be computed efficiently via self-consistent-field iterations and modern solvers for linear eigenvalue problems. Compared with existing algorithms for WDA, WDA-nepv is derivative-free and surrogate-model-free. The computational efficiency and classification accuracy of WDA-nepv are demonstrated using synthetic and real-life datasets.
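
    The trace ratio optimization and self-consistent-field (SCF) iteration named in this abstract can be illustrated with a minimal sketch. It is not WDA-nepv itself: it assumes two fixed symmetric matrices `A` and `B` (with `B` positive definite) in place of the transport-dependent scatter matrices, and omits the inner optimal-transport problem entirely. The function name `trace_ratio_scf` and the test matrices are hypothetical.

```python
import numpy as np

def trace_ratio_scf(A, B, p, iters=50, tol=1e-10):
    """Maximize tr(P^T A P) / tr(P^T B P) over orthonormal P (n x p).

    Simplified illustration with FIXED A and B (assumption); in WDA the
    matrices would themselves depend on P through the transport plan.
    """
    n = A.shape[0]
    P = np.eye(n)[:, :p]                      # arbitrary orthonormal start
    rho = np.trace(P.T @ A @ P) / np.trace(P.T @ B @ P)
    history = [rho]
    for _ in range(iters):
        # SCF step: P <- eigenvectors of the p largest eigenvalues of A - rho*B
        _, V = np.linalg.eigh(A - rho * B)    # eigenvalues in ascending order
        P = V[:, -p:]
        new_rho = np.trace(P.T @ A @ P) / np.trace(P.T @ B @ P)
        history.append(new_rho)
        converged = abs(new_rho - rho) <= tol * max(1.0, abs(new_rho))
        rho = new_rho
        if converged:
            break
    return P, rho, history

# Hypothetical test matrices: A is symmetric PSD, B is symmetric positive definite.
rng = np.random.default_rng(0)
X = rng.standard_normal((6, 6))
Y = rng.standard_normal((6, 6))
A = X @ X.T
B = Y @ Y.T + 6 * np.eye(6)
P, rho, history = trace_ratio_scf(A, B, p=2)
```

    Each step only needs a standard symmetric eigensolver, which is the sense in which such iterations are derivative-free; with `B` positive definite the ratio in `history` cannot decrease from one step to the next.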

    Unsupervised Clustering Pipeline to Obtain Diversified Light Spectra for Subject Studies and Correlation Analyses

    Featured Application: Selection of the most diverse light spectra from a larger set of possible candidates, to be used in subject studies or in machine learning to find correlations between photometric and other parameters such as psychological, physiological, or preference-based outcome measures. Abstract: Current subject studies and data-driven approaches in lighting research often use manually selected light spectra, which usually exhibit a large bias due to the applied selection criteria. This paper therefore presents a novel approach to minimize this bias by using a data-driven framework for selecting the most diverse candidates from a given larger set of possible light spectra. The spectral information per wavelength is first reduced by applying a convolutional autoencoder. The relevant features are then selected based on Laplacian Scores and transformed to a two-dimensional embedded space for subsequent clustering. The low-dimensional embedding, from which the required diversity follows, is computed with respect to the locality of the features. In a second step, photometric parameters are considered and a second clustering is performed. As a result of this algorithmic pipeline, the most diverse selection of light spectra complying with a given set of relevant photometric parameters can be extracted and used for further experiments or applications.
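
    The Laplacian Score step mentioned in this abstract can be sketched as follows. This is a generic NumPy version of the score on a k-nearest-neighbour graph with heat-kernel weights, not the pipeline of the paper: the function name `laplacian_scores`, the parameters `k` and `t`, and the toy data are assumptions, and the autoencoder, embedding and clustering stages are left out.

```python
import numpy as np

def laplacian_scores(X, k=5, t=1.0):
    """Laplacian Score per feature (column) of X; lower = more locality-preserving.

    Assumes no duplicate samples, so each sample is its own nearest neighbour.
    """
    n = X.shape[0]
    # Pairwise squared Euclidean distances between samples.
    sq = np.sum((X[:, None, :] - X[None, :, :]) ** 2, axis=-1)
    # k nearest neighbours of each sample (column 0 of the argsort is the sample itself).
    nn = np.argsort(sq, axis=1)[:, 1:k + 1]
    mask = np.zeros((n, n), dtype=bool)
    mask[np.repeat(np.arange(n), k), nn.ravel()] = True
    mask |= mask.T                              # symmetrize the neighbour graph
    S = np.where(mask, np.exp(-sq / t), 0.0)    # heat-kernel edge weights
    d = S.sum(axis=1)                           # node degrees
    L = np.diag(d) - S                          # graph Laplacian
    Xc = X - (d @ X) / d.sum()                  # remove degree-weighted mean per feature
    num = np.einsum("ij,ik,kj->j", Xc, L, Xc)   # f^T L f for each feature
    den = np.einsum("ij,i,ij->j", Xc, d, Xc)    # f^T D f for each feature
    return num / den

# Toy data (hypothetical): feature 0 carries two well-separated groups, features 1-2 are noise.
rng = np.random.default_rng(0)
labels = np.repeat([0, 1], 20)
X = np.column_stack([
    10.0 * labels + 0.1 * rng.standard_normal(40),
    rng.standard_normal(40),
    rng.standard_normal(40),
])
scores = laplacian_scores(X, k=5, t=1.0)
```

    Features would then be ranked by ascending score; in this toy setup the group-carrying feature 0 receives the lowest score.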

    Symmetric Subspace Learning for Image Analysis


    Face recognition using multiple features in different color spaces

    Face recognition, as a particular problem of pattern recognition, has attracted substantial attention from researchers in computer vision, pattern recognition, and machine learning. The recent Face Recognition Grand Challenge (FRGC) program reveals that uncontrolled illumination conditions pose grand challenges to face recognition performance. Most existing face recognition methods use gray-scale face images, which have been shown to be insufficient to tackle these challenges. To overcome this problem, this dissertation applies multiple features derived from color images instead of intensity images only. First, this dissertation presents two face recognition methods, which operate in different color spaces, using frequency features by means of the Discrete Fourier Transform (DFT) and spatial features by means of Local Binary Patterns (LBP), respectively. The DFT frequency domain consists of the real part, the imaginary part, the magnitude, and the phase components, which provide different interpretations of the input face images. The advantage of LBP in face recognition is attributed to its robustness to monotonic intensity-level transformations, as well as its operation in image spaces of various scales. By fusing the frequency components or the multi-resolution LBP histograms, complementary feature sets can be generated to enhance the capability of facial texture description. This dissertation thus uses the fused DFT and LBP features in two hybrid color spaces, the RIQ and the VIQ color spaces, respectively, for improving face recognition performance. Second, a method that extracts multiple features in the CID color space is presented for face recognition.
As different color component images in the CID color space display different characteristics, three different image encoding methods, namely, the patch-based Gabor image representation, the multi-resolution LBP feature fusion, and the DCT-based multiple face encodings, are presented to effectively extract features from the component images for enhancing pattern recognition performance. To further improve classification performance, the similarity scores from the three color component images are fused for the final decision making. Finally, a novel image representation is also discussed in this dissertation. Unlike a traditional intensity image that is directly derived from a linear combination of the R, G, and B color components, the novel image representation adapted to class separability is generated through a PCA plus FLD learning framework from the hybrid color space instead of the RGB color space. Based upon the novel image representation, a multiple feature fusion method is proposed to address the problem of face recognition under severe illumination conditions. The aforementioned methods have been evaluated using two large-scale databases, namely, the Face Recognition Grand Challenge (FRGC) version 2 database and the FERET face database. Experimental results show that the proposed methods improve face recognition performance over traditional methods using intensity images by large margins and outperform some state-of-the-art methods.
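
    The robustness of LBP to monotonic intensity transformations, which this abstract cites as its main advantage, can be checked with a minimal sketch. It implements only the basic 3x3, 8-neighbour LBP operator in NumPy; the multi-resolution variants, color spaces, and fusion steps of the dissertation are not reproduced, and the function name `lbp_codes` and the random test image are assumptions.

```python
import numpy as np

def lbp_codes(img):
    """Basic 3x3 LBP: one 8-bit code per interior pixel.

    Bit b is set when the b-th neighbour is >= the centre pixel.
    """
    img = np.asarray(img, dtype=float)
    h, w = img.shape
    centre = img[1:-1, 1:-1]
    # Neighbour offsets (dy, dx), clockwise starting from the top-left pixel.
    offsets = [(-1, -1), (-1, 0), (-1, 1), (0, 1),
               (1, 1), (1, 0), (1, -1), (0, -1)]
    codes = np.zeros(centre.shape, dtype=np.uint8)
    for bit, (dy, dx) in enumerate(offsets):
        neighbour = img[1 + dy:h - 1 + dy, 1 + dx:w - 1 + dx]
        codes += (neighbour >= centre).astype(np.uint8) * np.uint8(1 << bit)
    return codes

# Hypothetical 16x16 gray-scale test image with integer intensities.
rng = np.random.default_rng(0)
img = rng.integers(0, 256, size=(16, 16))
codes = lbp_codes(img)
hist = np.bincount(codes.ravel(), minlength=256)   # 256-bin texture descriptor

# A strictly increasing intensity transform leaves every comparison unchanged.
codes_bright = lbp_codes(3 * img + 7)
```

    Because each code depends only on the ordering of a pixel and its neighbours, `codes` and `codes_bright` are identical, so the histogram descriptor is unaffected by this kind of global illumination change.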

    Data Reduction Algorithms in Machine Learning and Data Science

    Raw data are usually required to be pre-processed for better representation or discrimination of classes. This pre-processing can be done by data reduction, i.e., either reduction in dimensionality or numerosity (cardinality). Dimensionality reduction can be used for feature extraction or data visualization. Numerosity reduction is useful for ranking data points or finding the most and least important data points. This thesis proposes several algorithms for data reduction, known as dimensionality and numerosity reduction, in machine learning and data science. Dimensionality reduction tackles feature extraction and feature selection methods while numerosity reduction includes prototype selection and prototype generation approaches. This thesis focuses on feature extraction and prototype selection for data reduction. Dimensionality reduction methods can be divided into three categories, i.e., spectral, probabilistic, and neural network-based methods. The spectral methods have a geometrical point of view and are mostly reduced to the generalized eigenvalue problem. Probabilistic and network-based methods have stochastic and information theoretic foundations, respectively. Numerosity reduction methods can be divided into methods based on variance, geometry, and isolation. For dimensionality reduction, under the spectral category, I propose weighted Fisher discriminant analysis, Roweis discriminant analysis, and image quality aware embedding. I also propose quantile-quantile embedding as a probabilistic method where the distribution of embedding is chosen by the user. Backprojection, Fisher losses, and dynamic triplet sampling using Bayesian updating are other proposed methods in the neural network-based category. Backprojection is for training shallow networks with a projection-based perspective in manifold learning. Two Fisher losses are proposed for training Siamese triplet networks for increasing and decreasing the inter- and intra-class variances, respectively. 
Two dynamic triplet mining methods, which are based on Bayesian updating to draw triplet samples stochastically, are proposed. For numerosity reduction, principal sample analysis and instance ranking by matrix decomposition are the proposed variance-based methods; these methods rank instances using inter-/intra-class variances and matrix factorization, respectively. Curvature anomaly detection, in which the points are assumed to be the vertices of a polyhedron, and isolation Mondrian forest are the proposed methods based on geometry and isolation, respectively. To assess the proposed data reduction tools, I apply them to applications in medical image analysis, image processing, and computer vision. Data reduction, used as a pre-processing tool, has different applications because it provides various ways of feature extraction and prototype selection that can be applied to different types of data. Dimensionality reduction extracts informative features and prototype selection selects the most informative data instances. For example, for medical image analysis, I use Fisher losses and dynamic triplet sampling for embedding histopathology image patches and demonstrating how different the tumorous tissue types are from the normal ones. I also propose offline/online triplet mining using extreme distances for this embedding. In image processing and computer vision applications, I propose Roweisfaces and Roweisposes for face recognition and 3D action recognition, respectively, using my proposed Roweis discriminant analysis method. I also introduce the concepts of anomaly landscape and anomaly path using the proposed curvature anomaly detection and use them to denoise images and video frames. I report extensive experiments on different datasets to show the effectiveness of the proposed algorithms.
Through experiments, I demonstrate that the proposed methods are useful for extracting informative features and instances for better accuracy, representation, prediction, class separation, data reduction, and embedding. I show that the proposed dimensionality reduction methods can extract informative features for better separation of classes. An example is obtaining an embedding space that separates cancer histopathology patches from normal patches, which helps hospitals diagnose cancers more easily and automatically. I also show that the proposed numerosity reduction methods are useful for ranking data instances based on their importance and for reducing data volumes without a significant drop in the performance of machine learning and data science algorithms.
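
    The statement in this abstract that spectral methods mostly reduce to a generalized eigenvalue problem can be made concrete with classical Fisher discriminant analysis. The sketch below is the textbook method, not the weighted or Roweis variants proposed in the thesis; the function name `fisher_directions`, the ridge term, and the synthetic data are assumptions.

```python
import numpy as np

def fisher_directions(X, y, p):
    """Top-p Fisher directions from the generalized problem  Sb w = lambda Sw w."""
    d = X.shape[1]
    mu = X.mean(axis=0)
    Sw = np.zeros((d, d))                      # within-class scatter
    Sb = np.zeros((d, d))                      # between-class scatter
    for c in np.unique(y):
        Xc = X[y == c]
        mc = Xc.mean(axis=0)
        Sw += (Xc - mc).T @ (Xc - mc)
        Sb += len(Xc) * np.outer(mc - mu, mc - mu)
    Sw += 1e-8 * np.eye(d)                     # small ridge (assumption) keeps Sw positive definite
    # Reduce to a standard symmetric problem using the Cholesky factor Sw = C C^T.
    C = np.linalg.cholesky(Sw)
    Cinv = np.linalg.inv(C)
    M = Cinv @ Sb @ Cinv.T
    lams, V = np.linalg.eigh((M + M.T) / 2)    # eigenvalues in ascending order
    order = np.argsort(lams)[::-1][:p]         # keep the p largest
    W = Cinv.T @ V[:, order]                   # map back: w = C^{-T} v
    return W, lams[order]

# Synthetic two-class data (hypothetical): the classes differ only along axis 0.
rng = np.random.default_rng(0)
X0 = rng.standard_normal((100, 3))
X1 = rng.standard_normal((100, 3)) + np.array([5.0, 0.0, 0.0])
X = np.vstack([X0, X1])
y = np.repeat([0, 1], 100)
W, lams = fisher_directions(X, y, p=1)
embedding = X @ W                              # 1-D discriminative embedding
```

    In this toy setup the recovered direction is dominated by axis 0, the only axis along which the two classes differ.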

    Towards Video Transformers for Automatic Human Analysis

    With the aim of creating artificial systems capable of mirroring the nuanced understanding and interpretative powers inherent to human cognition, this thesis embarks on an exploration of the intersection between human analysis and Video Transformers. The objective is to harness the potential of Transformers, a promising architectural paradigm, to comprehend the intricacies of human interaction, thus paving the way for the development of empathetic and context-aware intelligent systems. In order to do so, we explore the whole Computer Vision pipeline, from data gathering, to deeply analyzing recent developments, through model design and experimentation. Central to this study is the creation of UDIVA, an expansive multi-modal, multi-view dataset capturing dyadic face-to-face human interactions. Comprising 147 participants across 188 sessions, UDIVA integrates audio-visual recordings, heart-rate measurements, personality assessments, socio-demographic metadata, and conversational transcripts, establishing itself as the largest dataset for dyadic human interaction analysis to date. This dataset provides a rich context for probing the capabilities of Transformers within complex environments. In order to validate its utility, as well as to elucidate Transformers' ability to assimilate diverse contextual cues, we focus on addressing the challenge of personality regression within interaction scenarios. We first adapt an existing Video Transformer to handle multiple contextual sources and conduct rigorous experimentation. We empirically observe a progressive enhancement in model performance as more context is added, reinforcing the potential of Transformers to decode intricate human dynamics. Building upon these findings, the Dyadformer emerges as a novel architecture, adept at long-range modeling of dyadic interactions.
By jointly modeling both participants in the interaction, as well as embedding multi-modal integration into the model itself, the Dyadformer surpasses the baseline and other concurrent approaches, underscoring Transformers' aptitude in deciphering multifaceted, noisy, and challenging tasks such as the analysis of human personality in interaction. Nonetheless, these experiments unveil the ubiquitous challenges when training Transformers, particularly in managing overfitting due to their demand for extensive datasets. Consequently, we conclude this thesis with a comprehensive investigation into Video Transformers, analyzing topics ranging from architectural designs and training strategies, to input embedding and tokenization, traversing through multi-modality and specific applications. Across these, we highlight trends which optimally harness spatio-temporal representations that handle video redundancy and high dimensionality. A culminating performance comparison is conducted in the realm of video action classification, spotlighting strategies that exhibit superior efficacy, even compared to traditional CNN-based methods.

    A Revision of Procedural Knowledge in the conML Framework

    Machine learning methods have been used very successfully for quite some time to recognize patterns, model correlations and generate hypotheses. However, the possibilities for weighing and evaluating the resulting models and hypotheses, and the search for alternatives and contradictions, are still predominantly reserved for humans. For this purpose, the novel concept of constructivist machine learning (conML) formalizes limitations of model validity and employs constructivist learning theory to enable doubting of new and existing models with the possibility of integrating, discarding, combining, and abstracting knowledge. The present work identifies issues that impede the system's capability to abstract knowledge from generated models for tasks that lie in the domain of procedural knowledge, and proposes and implements solutions. To this end, the conML framework has been reimplemented in the Julia programming language and subsequently extended. Using a synthetic dataset of impedance spectra of modeled epithelia that has previously been analyzed with an existing implementation of conML, the existing and new implementations are tested for consistency, and the proposed algorithmic changes are evaluated with respect to changes in model generation and abstraction ability when exploring unknown data. Recommendations for specific settings and suggestions for further research are derived from the results. In terms of performance, flexibility and extensibility, the new implementation of conML in Julia provides a good starting point for further research and application of the system.
    Contents:
    1. Introduction: Research Questions
    2. Related Work: Hybrid AI Systems; Constructivist Machine Learning (conML); Implemented Methods (Unsupervised Machine Learning, Supervised Machine Learning, Supervised Feature Selection, Unsupervised Feature Selection)
    3. Methods and Implementation: Notable Algorithmic Changes (Rescaling of Target Values, Extended Winner Selection); Package Structure; Interfaces and Implementation of Specific Methods; Datasets
    4. Results: Validation Against the conML Prototype; Change in Abstraction Capability (Influence of Target Scaling, Influence of the Parameter kappa_p, Influence of the Winner Selection Procedure)
    5. Discussion: Reproduction Results; Rescaling of Constructed Targets; kappa_p and the Selection of Winner Models
    6. Conclusions: Contributions of this Work; Future Work
    A. Julia Language Reference
    B. Additional Code Listings
    C. Available Parameters: Block Processing
    D. Configurations Reference: Unsupervised Methods; Supervised Methods; Feature Selection; Winner Selection; General Settings
    E. Supplemental Figures: Replacing MAPE with RMSE for Z-Transform Target Scaling; Combining Target Rescaling, Winner Selection and High kappa_p