64 research outputs found

    Deep Fisher discriminant analysis

    Fisher Discriminant Analysis’ linear nature and the usual eigen-analysis approach to its solution have limited the application of its underlying elegant idea. In this work we take advantage of some recent, partially equivalent formulations based on standard least-squares regression to develop a simple Deep Neural Network (DNN) extension of Fisher’s analysis that greatly improves its ability to cluster sample projections around their class means while keeping these means apart. This is shown by the much better accuracies and g scores that class-mean classifiers achieve on the features provided by simple DNN architectures than on Fisher’s linear ones. With partial support from Spain's grants TIN2013-42351-P, TIN2016-76406-P, TIN2015-70308-REDT and S2013/ICE-2845 CASI-CAM-CM. Work also supported by project FACIL (Ayudas Fundación BBVA a Equipos de Investigación Científica 2016), the UAM-ADIC Chair for Data Science and Machine Learning, and the Instituto de Ingeniería del Conocimiento. The third author is also supported by FPU-MEC grant AP-2012-5163. We gratefully acknowledge the use of the facilities of the Centro de Computación Científica (CCC) at UA
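As a rough illustration of the least-squares route to Fisher's analysis mentioned above, here is a minimal NumPy sketch: a two-class toy problem where the discriminant direction is obtained by regressing ±1 class indicators, and samples are then assigned to the nearest projected class mean. The data and all names are hypothetical, and the DNN extension is omitted for brevity.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy two-class data: well-separated Gaussian blobs (hypothetical stand-in).
X0 = rng.normal(loc=-2.0, size=(50, 5))
X1 = rng.normal(loc=+2.0, size=(50, 5))
X = np.vstack([X0, X1])
y = np.array([0] * 50 + [1] * 50)

# Least-squares route to Fisher's discriminant: regress class-indicator
# targets on the bias-augmented inputs; for two classes the resulting
# weights span the same projection direction as classical LDA.
T = np.where(y == 0, -1.0, 1.0)            # +/-1 indicator targets
Xa = np.hstack([X, np.ones((len(X), 1))])  # append bias column
w, *_ = np.linalg.lstsq(Xa, T, rcond=None)

# Project and classify by the nearest class mean in the projected space.
z = Xa @ w
m0, m1 = z[y == 0].mean(), z[y == 1].mean()
pred = np.where(np.abs(z - m0) < np.abs(z - m1), 0, 1)
accuracy = (pred == y).mean()
```

In the paper's deep extension, the linear map `Xa @ w` would be replaced by DNN features, with the same nearest-class-mean readout.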

    Improving ductal carcinoma in situ classification by convolutional neural network with exponential linear unit and rank-based weighted pooling

    Ductal carcinoma in situ (DCIS) is a pre-cancerous lesion in the ducts of the breast, and early diagnosis is crucial for optimal therapeutic intervention. Thermography is a non-invasive imaging tool that can be used to detect DCIS; although its accuracy is high (~88%), its sensitivity can still be improved. We therefore aimed to develop an automated artificial-intelligence-based system for improved detection of DCIS in thermographs. This study proposes a novel convolutional neural network (CNN) based system, termed CNN-BDER, built on a multisource dataset containing 240 DCIS images and 240 healthy-breast images. On top of the CNN, batch normalization, dropout, exponential linear units and rank-based weighted pooling were integrated, along with L-way data augmentation. Ten runs of tenfold cross-validation were used to report unbiased performance. Our proposed method achieved a sensitivity of 94.08±1.22%, a specificity of 93.58±1.49% and an accuracy of 93.83±0.96%. It outperforms eight state-of-the-art approaches and manual diagnosis. The trained model could serve as a visual question answering system and improve diagnostic accuracy. Funding: British Heart Foundation Accelerator Award, UK; Royal Society International Exchanges Cost Share Award, UK RP202G0230; Hope Foundation for Cancer Research, UK RM60G0680; Medical Research Council Confidence in Concept Award, UK MC_PC_17171; MINECO/FEDER, Spain/Europe RTI2018-098913-B100, A-TIC-080-UGR1
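The rank-based weighted pooling mentioned above can be sketched roughly as follows: within each pooling window, activations are sorted and combined with rank-dependent weights, so the pooled value lies between max pooling and average pooling. The weighting scheme below (linearly decreasing, normalised) is an assumption for illustration, not necessarily the paper's exact scheme.

```python
import numpy as np

def rank_weighted_pool(x, k=2):
    """Rank-based weighted pooling over non-overlapping k x k windows.

    Within each window the activations are sorted in descending order and
    combined with rank-dependent weights (largest activation gets the
    largest weight). The linear, normalised weighting is illustrative.
    """
    h, w = x.shape
    out = np.empty((h // k, w // k))
    ranks = np.arange(k * k, 0, -1, dtype=float)  # descending rank weights
    weights = ranks / ranks.sum()                 # normalise to sum to 1
    for i in range(h // k):
        for j in range(w // k):
            window = x[i * k:(i + 1) * k, j * k:(j + 1) * k].ravel()
            out[i, j] = np.sort(window)[::-1] @ weights
    return out

feat = np.arange(16, dtype=float).reshape(4, 4)
pooled = rank_weighted_pool(feat)
```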

    The new method of Extraction and Analysis of Non-linear Features for face recognition

    In this paper, we introduce a new method, Extraction and Analysis of Non-linear Features (EANF), for face recognition based on the extraction and analysis of non-linear features, i.e. Locality Preserving Analysis. EANF removes disadvantages of previous algorithms, such as the length of the search space and the differing sizes and qualities of images caused by varying imaging conditions, and removes the disadvantages of ELPDA methods (local neighborhood separator analysis) by using the scatter matrix in the form of a between-class scatter that introduces and displays the K nearest neighbors of the outer class by the samples. A further advantage of EANF is its high speed in face recognition, achieved by shrinking the feature matrix with NLPCA (Non-Linear Locality Preserving Analysis). Finally, the results of tests on the FERET dataset show the impact of the proposed method on face recognition. DOI: http://dx.doi.org/10.11591/ijece.v2i6.177
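The between-class scatter construction that discriminant methods like the one above build on can be computed as in this minimal sketch (toy data; the function name is ours):

```python
import numpy as np

def between_class_scatter(X, y):
    """Between-class scatter matrix S_b = sum_c n_c (m_c - m)(m_c - m)^T,
    where m_c is the mean of class c and m the overall mean."""
    overall_mean = X.mean(axis=0)
    S_b = np.zeros((X.shape[1], X.shape[1]))
    for c in np.unique(y):
        Xc = X[y == c]
        diff = (Xc.mean(axis=0) - overall_mean)[:, None]
        S_b += len(Xc) * (diff @ diff.T)
    return S_b

# Two classes separated along the first axis only.
X = np.array([[0.0, 0.0], [0.0, 2.0], [2.0, 0.0], [2.0, 2.0]])
y = np.array([0, 0, 1, 1])
S_b = between_class_scatter(X, y)
```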

    SSDA-YOLO: Semi-supervised Domain Adaptive YOLO for Cross-Domain Object Detection

    Domain adaptive object detection (DAOD) aims to alleviate the transfer performance degradation caused by cross-domain discrepancy. However, most existing DAOD methods are dominated by the outdated and computationally intensive two-stage Faster R-CNN, which is not the first choice for industrial applications. In this paper, we propose a novel semi-supervised domain adaptive YOLO (SSDA-YOLO) method that improves cross-domain detection performance by integrating the compact, stronger one-stage detector YOLOv5 with domain adaptation. Specifically, we adapt the knowledge distillation framework with the Mean Teacher model to assist the student model in obtaining instance-level features of the unlabeled target domain. We also utilize scene style transfer to cross-generate pseudo images in different domains to remedy image-level differences. In addition, an intuitive consistency loss is proposed to further align cross-domain predictions. We evaluate SSDA-YOLO on public benchmarks including PascalVOC, Clipart1k, Cityscapes, and Foggy Cityscapes. Moreover, to verify its generalization, we conduct experiments on yawning-detection datasets collected from various real classrooms. The results show considerable improvements of our method on these DAOD tasks, revealing both the effectiveness of the proposed adaptive modules and the urgency of applying more advanced detectors in DAOD. Our code is available at \url{https://github.com/hnuzhy/SSDA-YOLO}. Comment: submitted to CVI
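The Mean Teacher distillation and consistency loss described above can be sketched minimally: the teacher's weights track an exponential moving average (EMA) of the student's, and a consistency loss penalises disagreement between their predictions. The function names, the EMA rate, and the squared-error form of the loss are illustrative assumptions, not SSDA-YOLO's exact implementation.

```python
import numpy as np

def ema_update(teacher, student, alpha=0.99):
    """Mean Teacher update: teacher weights are an exponential moving
    average of the student weights (alpha is the EMA decay rate)."""
    return {k: alpha * teacher[k] + (1 - alpha) * student[k] for k in teacher}

def consistency_loss(p_student, p_teacher):
    """Mean squared difference between student and teacher predictions,
    used to align the two models on unlabeled target-domain inputs."""
    return float(np.mean((p_student - p_teacher) ** 2))

# One toy update step with hypothetical one-parameter models.
student = {"w": np.array([1.0, 2.0])}
teacher = {"w": np.array([0.0, 0.0])}
teacher = ema_update(teacher, student)

p_student = np.array([0.8, 0.2])
p_teacher_pred = np.array([0.6, 0.4])
loss = consistency_loss(p_student, p_teacher_pred)
```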

    Discriminative preprocessing of speech: towards improving biometric authentication

    In the context of the SecurePhone project, a multimodal user authentication system was developed for implementation on a PDA. Extending this system, we investigate biometric feature enhancement and multi-feature fusion with the aim of improving user authentication accuracy. In this dissertation, a general framework for feature enhancement is proposed which uses a multilayer perceptron (MLP) to achieve optimal speaker discrimination. First, to train this MLP a subset of speakers (speaker basis) is used to represent the underlying characteristics of the given acoustic feature space. Second, the size of the speaker basis is found to be among the crucial factors affecting the performance of a speaker recognition system. Third, it is found that the selection of the speaker basis can also influence system performance. Based on this observation, an automatic speaker selection approach is proposed on the basis of the maximal average between-class variance. Tests in a variety of conditions, including clean and noisy as well as telephone speech, show that this approach can improve the performance of speaker recognition systems. This approach, which is applied here to feature enhancement for speaker recognition, can be expected to also be effective with other biometric modalities besides speech. 
Further, an alternative feature representation is proposed in this dissertation, which is derived from what we call speaker voice signatures (SVS). These are trajectories in a Kohonen self-organising map (SOM) which has been trained to represent the acoustic space. This feature representation is found to be somewhat complementary to the baseline feature set, suggesting that the two can be fused to achieve improved performance in speaker recognition. Finally, this dissertation closes with a number of potential extensions of the proposed approaches. Keywords: feature enhancement, MLP, SOM, speaker basis selection, speaker recognition, biometric, authentication, verification
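The speaker-basis selection idea above, choosing a basis that maximises average between-class variance, might be sketched as follows. This greedy proxy, which ranks speakers by how far their mean feature vector lies from the global mean, is our own simplification for illustration, not the thesis's exact criterion.

```python
import numpy as np

def select_speaker_basis(features_by_speaker, basis_size):
    """Pick `basis_size` speakers whose mean feature vectors lie farthest
    from the global mean, a rough proxy for maximising the average
    between-class variance of the selected basis."""
    means = {s: f.mean(axis=0) for s, f in features_by_speaker.items()}
    global_mean = np.mean(list(means.values()), axis=0)
    ranked = sorted(means, key=lambda s: -np.linalg.norm(means[s] - global_mean))
    return ranked[:basis_size]

# Toy corpus: three hypothetical speakers, two 2-d feature frames each.
corpus = {
    "a": np.zeros((2, 2)),
    "b": np.full((2, 2), 5.0),
    "c": np.ones((2, 2)),
}
basis = select_speaker_basis(corpus, 2)
```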

    Speech Recognition

    Chapters in the first part of the book cover all the essential speech processing techniques for building robust automatic speech recognition systems: the representation of speech signals, methods for speech-feature extraction, acoustic and language modeling, efficient algorithms for searching the hypothesis space, and multimodal approaches to speech recognition. The last part of the book is devoted to other speech processing applications that can use the information from automatic speech recognition for speaker identification and tracking, for prosody modeling in emotion-detection systems, and in other speech processing applications able to operate in real-world environments, like mobile communication services and smart homes.

    Multimodal Adversarial Learning

    Deep Convolutional Neural Networks (DCNN) have proven to be an exceptional tool for object recognition, generative modeling, and multi-modal learning in various computer vision applications. However, recent findings have shown that such state-of-the-art models can be easily deceived by inserting slight, imperceptible perturbations at key pixels of the input. A good target detection system can accurately identify targets by localizing their coordinates on the input image of interest, ideally by labeling each pixel in an image as a background or a potential target pixel. Prior research confirms, however, that such state-of-the-art target-detection models remain susceptible to adversarial attacks. In the case of generative models, facial sketches drawn by artists, mostly used by law enforcement agencies, depend on the ability of the artist to clearly replicate all the key facial features that aid in capturing the true identity of a subject. Recent works have attempted to synthesize these sketches into plausible visual images to improve visual recognition and identification. Synthesizing photo-realistic images from sketches, however, proves to be an even more challenging task, especially for sensitive applications such as suspect identification. The incorporation of hybrid discriminators, which perform classification of multiple target attributes, and of a quality-guided encoder, which minimizes the perceptual dissimilarity between the latent-space embeddings of the synthesized and real images at different layers of the network, has been shown to be a powerful tool for better multi-modal learning. In general, our overall approach aimed at improving target detection systems and the visual appeal of synthesized images while incorporating multiple attribute assignments into the generator without compromising the identity of the synthesized image.
We synthesized sketches using the XDoG filter for the CelebA, Multi-Modal and CelebA-HQ datasets, and from an auxiliary generator trained on sketches from the CUHK, IIT-D and FERET datasets. Overall, our results across the different model applications are impressive compared to the current state of the art.
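The "slight imperceptible perturbations" discussed above are commonly illustrated with the fast gradient sign method (FGSM); the sketch below is a generic example of such an attack, not necessarily the one studied in this work.

```python
import numpy as np

def fgsm_perturb(x, grad, eps=0.1):
    """Fast gradient sign method: move each input component by eps in the
    direction that increases the loss (sign of the loss gradient), then
    clip back to the valid pixel range [0, 1]."""
    return np.clip(x + eps * np.sign(grad), 0.0, 1.0)

# Hypothetical pixel values and loss gradient w.r.t. the input.
x = np.array([0.5, 0.2, 0.9])
grad = np.array([1.0, -2.0, 0.5])
x_adv = fgsm_perturb(x, grad)
```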

    Discriminative connectionist approaches for automatic speech recognition in cars

    The first part of this thesis is devoted to the evaluation of approaches which exploit the inherent redundancy of the speech signal to improve noise robustness. On the basis of this evaluation on the AURORA 2000 database, we further study two of the evaluated approaches in detail. The first is a hybrid RBF/HMM approach, which attempts to combine the superior classification performance of radial basis functions (RBFs) with the ability of HMMs to model time variation. The second uses neural networks to non-linearly reduce the dimensionality of large feature vectors that include context frames; we propose different MLP topologies for this purpose. Experiments on the AURORA 2000 database reveal that the performance of the first approach is similar to that of systems based on SCHMMs. The second approach cannot outperform linear discriminant analysis (LDA) on a database recorded in real car environments, but it is on average significantly better than LDA on the AURORA 2000 database.
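The non-linear dimensionality reduction via an MLP described above can be sketched as a forward pass in which a high-dimensional context vector is squeezed through a small bottleneck layer whose activations serve as the reduced features. The sizes (9 context frames of 13 cepstral coefficients reduced to 24 dimensions) and the random weights are purely illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

def bottleneck_forward(x, W1, W2):
    """Forward pass of a bottleneck MLP: the tanh hidden activations are
    the reduced feature vector; the linear output head reconstructs the
    input and would drive training of W1 and W2."""
    hidden = np.tanh(x @ W1)   # non-linear dimensionality reduction
    recon = hidden @ W2        # reconstruction head used during training
    return hidden, recon

# Hypothetical sizes: 9 context frames x 13 cepstral coeffs -> 24 dims.
dim_in, dim_hidden = 9 * 13, 24
W1 = rng.normal(scale=0.1, size=(dim_in, dim_hidden))
W2 = rng.normal(scale=0.1, size=(dim_hidden, dim_in))
x = rng.normal(size=(dim_in,))
features, _ = bottleneck_forward(x, W1, W2)
```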

    Unsupervised representation learning with recognition-parametrised probabilistic models

    We introduce a new approach to probabilistic unsupervised learning based on the recognition-parametrised model (RPM): a normalised semi-parametric hypothesis class for joint distributions over observed and latent variables. Under the key assumption that observations are conditionally independent given the latents, the RPM combines parametric prior and observation-conditioned latent distributions with non-parametric observation marginals. This approach leads to a flexible learnt recognition model that captures latent dependence between observations, without the need for an explicit, parametric generative model. The RPM admits exact maximum-likelihood learning for discrete latents, even with powerful neural-network-based recognition. We develop effective approximations applicable to the continuous-latent case. Experiments demonstrate the effectiveness of the RPM on high-dimensional data: learning image classification from weak indirect supervision; direct image-level latent Dirichlet allocation; and recognition-parametrised Gaussian process factor analysis (RP-GPFA) applied to multi-factorial spatiotemporal datasets. The RPM provides a powerful framework for discovering meaningful latent structure underlying observational data, a function critical to both animal and artificial intelligence.
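For discrete latents, the RPM posterior over a latent z for one datum can be sketched directly: combine the prior with the recognition factors f_j(z|x_j), each divided by its empirical average F_j(z) over the training set, and normalise. The softmax recognition outputs below are random stand-ins for trained recognition networks; sizes and names are illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy setup: K discrete latent states, J observation factors, N data.
K, J, N = 3, 2, 5
prior = np.full(K, 1.0 / K)

# Hypothetical recognition outputs: softmax scores f_j(z | x_j^n).
logits = rng.normal(size=(J, N, K))
f = np.exp(logits) / np.exp(logits).sum(axis=-1, keepdims=True)

# Empirical recognition marginals F_j(z) = (1/N) sum_n f_j(z | x_j^n).
F = f.mean(axis=1)  # shape (J, K)

def rpm_posterior(n):
    """Posterior over z for datum n: prior times the product over factors
    of f_j(z|x_j^n) / F_j(z), normalised over z (the non-parametric
    observation marginals cancel in the posterior)."""
    unnorm = prior * np.prod(f[:, n, :] / F, axis=0)
    return unnorm / unnorm.sum()

post = rpm_posterior(0)
```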