37 research outputs found

    Learning models for semantic classification of insufficient plantar pressure images

    Get PDF
    Establishing a reliable and stable model to predict a target by using insufficient labeled samples is feasible and effective, particularly, for a sensor-generated data-set. This paper has been inspired with insufficient data-set learning algorithms, such as metric-based, prototype networks and meta-learning, and therefore we propose an insufficient data-set transfer model learning method. Firstly, two basic models for transfer learning are introduced. A classification system and calculation criteria are then subsequently introduced. Secondly, a dataset of plantar pressure for comfort shoe design is acquired and preprocessed through foot scan system; and by using a pre-trained convolution neural network employing AlexNet and convolution neural network (CNN)- based transfer modeling, the classification accuracy of the plantar pressure images is over 93.5%. Finally, the proposed method has been compared to the current classifiers VGG, ResNet, AlexNet and pre-trained CNN. Also, our work is compared with known-scaling and shifting (SS) and unknown-plain slot (PS) partition methods on the public test databases: SUN, CUB, AWA1, AWA2, and aPY with indices of precision (tr, ts, H) and time (training and evaluation). The proposed method for the plantar pressure classification task shows high performance in most indices when comparing with other methods. The transfer learning-based method can be applied to other insufficient data-sets of sensor imaging fields

    Learning Models for Semantic Classification of Insufficient Plantar Pressure Images

    Get PDF
    Establishing a reliable and stable model to predict a target by using insufficient labeled samples is feasible and effective, particularly, for a sensor-generated data-set. This paper has been inspired with insufficient data-set learning algorithms, such as metric-based, prototype networks and meta-learning, and therefore we propose an insufficient data-set transfer model learning method. Firstly, two basic models for transfer learning are introduced. A classification system and calculation criteria are then subsequently introduced. Secondly, a dataset of plantar pressure for comfort shoe design is acquired and preprocessed through foot scan system; and by using a pre-trained convolution neural network employing AlexNet and convolution neural network (CNN)- based transfer modeling, the classification accuracy of the plantar pressure images is over 93.5%. Finally, the proposed method has been compared to the current classifiers VGG, ResNet, AlexNet and pre-trained CNN. Also, our work is compared with known-scaling and shifting (SS) and unknown-plain slot (PS) partition methods on the public test databases: SUN, CUB, AWA1, AWA2, and aPY with indices of precision (tr, ts, H) and time (training and evaluation). The proposed method for the plantar pressure classification task shows high performance in most indices when comparing with other methods. The transfer learning-based method can be applied to other insufficient data-sets of sensor imaging fields

    UNCOVERING PATTERNS IN COMPLEX DATA WITH RESERVOIR COMPUTING AND NETWORK ANALYTICS: A DYNAMICAL SYSTEMS APPROACH

    Get PDF
    In this thesis, we explore methods of uncovering underlying patterns in complex data, and making predictions, through machine learning and network science. With the availability of more data, machine learning for data analysis has advanced rapidly. However, there is a general lack of approaches that might allow us to 'open the black box'. In the machine learning part of this thesis, we primarily use an architecture called Reservoir Computing for time-series prediction and image classification, while exploring how information is encoded in the reservoir dynamics. First, we investigate the ways in which a Reservoir Computer (RC) learns concepts such as 'similar' and 'different', and relationships such as 'blurring', 'rotation' etc. between image pairs, and generalizes these concepts to different classes unseen during training. We observe that the high dimensional reservoir dynamics display different patterns for different relationships. This clustering allows RCs to perform significantly better in generalization with limited training compared with state-of-the-art pair-based convolutional/deep Siamese Neural Networks. Second, we demonstrate the utility of an RC in the separation of superimposed chaotic signals. We assume no knowledge of the dynamical equations that produce the signals, and require only that the training data consist of finite time samples of the component signals. We find that our method significantly outperforms the optimal linear solution to the separation problem, the Wiener filter. To understand how representations of signals are encoded in an RC during learning, we study its dynamical properties when trained to predict chaotic Lorenz signals. We do so by using a novel, mathematical fixed-point-finding technique called directional fibers. We find that, after training, the high dimensional RC dynamics includes fixed points that map to the known Lorenz fixed points, but the RC also has spurious fixed points, which are relevant to how its predictions break down. While machine learning is a useful data processing tool, its success often relies on a useful representation of the system's information. In contrast, systems with a large numbers of interacting components may be better analyzed by modeling them as networks. While numerous advances in network science have helped us analyze such systems, tools that identify properties on networks modeling multi-variate time-evolving data (such as disease data) are limited. We close this gap by introducing a novel data-driven, network-based Trajectory Profile Clustering (TPC) algorithm for 1) identification of disease subtypes and 2) early prediction of subtype/disease progression patterns. TPC identifies subtypes by clustering patients with similar disease trajectory profiles derived from bipartite patient-variable networks. Applying TPC to a Parkinson’s dataset, we identify 3 distinct subtypes. Additionally, we show that TPC predicts disease subtype 4 years in advance with 74% accuracy

    Texture and Colour in Image Analysis

    Get PDF
    Research in colour and texture has experienced major changes in the last few years. This book presents some recent advances in the field, specifically in the theory and applications of colour texture analysis. This volume also features benchmarks, comparative evaluations and reviews

    On Improving Generalization of CNN-Based Image Classification with Delineation Maps Using the CORF Push-Pull Inhibition Operator

    Get PDF
    Deployed image classification pipelines are typically dependent on the images captured in real-world environments. This means that images might be affected by different sources of perturbations (e.g. sensor noise in low-light environments). The main challenge arises by the fact that image quality directly impacts the reliability and consistency of classification tasks. This challenge has, hence, attracted wide interest within the computer vision communities. We propose a transformation step that attempts to enhance the generalization ability of CNN models in the presence of unseen noise in the test set. Concretely, the delineation maps of given images are determined using the CORF push-pull inhibition operator. Such an operation transforms an input image into a space that is more robust to noise before being processed by a CNN. We evaluated our approach on the Fashion MNIST data set with an AlexNet model. It turned out that the proposed CORF-augmented pipeline achieved comparable results on noise-free images to those of a conventional AlexNet classification model without CORF delineation maps, but it consistently achieved significantly superior performance on test images perturbed with different levels of Gaussian and uniform noise

    Data-Driven Query by Vocal Percussion

    Get PDF
    The imitation of percussive sounds via the human voice is a natural and effective tool for communicating rhythmic ideas on the fly. Query by Vocal Percussion (QVP) is a subfield in Music Information Retrieval (MIR) that explores techniques to query percussive sounds using vocal imitations as input, usually plosive consonant sounds. In this way, fully automated QVP systems can help artists prototype drum patterns in a comfortable and quick way, smoothing the creative workflow as a result. This project explores the potential usefulness of recent data-driven neural network models in two of the most important tasks in QVP. Algorithms relative to Vocal Percussion Transcription (VPT) detect and classify vocal percussion sound events in a beatbox-like performance so to trigger individual drum samples. Algorithms relative to Drum Sample Retrieval by Vocalisation (DSRV) use input vocal imitations to pick appropriate drum samples from a sound library via timbral similarity. Our experiments with several kinds of data-driven deep neural networks suggest that these achieve better results in both VPT and DSRV compared to traditional data-informed approaches based on heuristic audio features. We also find that these networks, when paired with strong regularisation techniques, can still outperform data-informed approaches when data is scarce. Finally, we gather several insights relative to people’s approach to vocal percussion and how user-based algorithms are essential to better model individual differences in vocalisation styles

    Automatic Recognition of Non-Verbal Acoustic Communication Events With Neural Networks

    Get PDF
    Non-verbal acoustic communication is of high importance to humans and animals: Infants use the voice as a primary communication tool. Animals of all kinds employ acoustic communication, such as chimpanzees, which use pant-hoot vocalizations for long-distance communication. Many applications require the assessment of such communication for a variety of analysis goals. Computational systems can support these areas through automatization of the assessment process. This is of particular importance in monitoring scenarios over large spatial and time scales, which are infeasible to perform manually. Algorithms for sound recognition have traditionally been based on conventional machine learning approaches. In recent years, so-called representation learning approaches have gained increasing popularity. This particularly includes deep learning approaches that feed raw data to deep neural networks. However, there remain open challenges in applying these approaches to automatic recognition of non-verbal acoustic communication events, such as compensating for small data set sizes. The leading question of this thesis is: How can we apply deep learning more effectively to automatic recognition of non-verbal acoustic communication events? The target communication types were specifically (1) infant vocalizations and (2) chimpanzee long-distance calls. This thesis comprises four studies that investigated aspects of this question: Study (A) investigated the assessment of infant vocalizations by laypersons. The central goal was to derive an infant vocalization classification scheme based on the laypersons' perception. The study method was based on the Nijmegen Protocol, where participants rated vocalization recordings through various items, such as affective ratings and class labels. Results showed a strong association between valence ratings and class labels, which was used to derive a classification scheme. Study (B) was a comparative study on various neural network types for the automatic classification of infant vocalizations. The goal was to determine the best performing network type among the currently most prevailing ones, while considering the influence of their architectural configuration. Results showed that convolutional neural networks outperformed recurrent neural networks and that the choice of the frequency and time aggregation layer inside the network is the most important architectural choice. Study (C) was a detailed investigation on computer vision-like convolutional neural networks for infant vocalization classification. The goal was to determine the most important architectural properties for increasing classification performance. Results confirmed the importance of the aggregation layer and additionally identified the input size of the fully-connected layers and the accumulated receptive field to be of major importance. Study (D) was an investigation on compensating class imbalance for chimpanzee call detection in naturalistic long-term recordings. The goal was to determine which compensation method among a selected group improved performance the most for a deep learning system. Results showed that spectrogram denoising was most effective, while methods for compensating relative imbalance either retained or decreased performance.:1. Introduction 2. Foundations in Automatic Recognition of Acoustic Communication 3. State of Research 4. Study (A): Investigation of the Assessment of Infant Vocalizations by Laypersons 5. Study (B): Comparison of Neural Network Types for Automatic Classification of Infant Vocalizations 6. Study (C): Detailed Investigation of CNNs for Automatic Classification of Infant Vocalizations 7. Study (D): Compensating Class Imbalance for Acoustic Chimpanzee Detection With Convolutional Recurrent Neural Networks 8. Conclusion and Collected Discussion 9. AppendixNonverbale akustische Kommunikation ist für Menschen und Tiere von großer Bedeutung: Säuglinge nutzen die Stimme als primäres Kommunikationsmittel. Schimpanse verwenden sogenannte 'Pant-hoots' und Trommeln zur Kommunikation über weite Entfernungen. Viele Anwendungen erfordern die Beurteilung solcher Kommunikation für verschiedenste Analyseziele. Algorithmen können solche Bereiche durch die Automatisierung der Beurteilung unterstützen. Dies ist besonders wichtig beim Monitoring langer Zeitspannen oder großer Gebiete, welche manuell nicht durchführbar sind. Algorithmen zur Geräuscherkennung verwendeten bisher größtenteils konventionelle Ansätzen des maschinellen Lernens. In den letzten Jahren hat eine alternative Herangehensweise Popularität gewonnen, das sogenannte Representation Learning. Dazu gehört insbesondere Deep Learning, bei dem Rohdaten in tiefe neuronale Netze eingespeist werden. Jedoch gibt es bei der Anwendung dieser Ansätze auf die automatische Erkennung von nonverbaler akustischer Kommunikation ungelöste Herausforderungen, wie z.B. die Kompensation der relativ kleinen Datenmengen. Die Leitfrage dieser Arbeit ist: Wie können wir Deep Learning effektiver zur automatischen Erkennung nonverbaler akustischer Kommunikation verwenden? Diese Arbeit konzentriert sich speziell auf zwei Kommunikationsarten: (1) vokale Laute von Säuglingen (2) Langstreckenrufe von Schimpansen. Diese Arbeit umfasst vier Studien, welche Aspekte dieser Frage untersuchen: Studie (A) untersuchte die Beurteilung von Säuglingslauten durch Laien. Zentrales Ziel war die Ableitung eines Klassifikationsschemas für Säuglingslaute auf der Grundlage der Wahrnehmung von Laien. Die Untersuchungsmethode basierte auf dem sogenannten Nijmegen-Protokoll. Hier beurteilten die Teilnehmenden Lautaufnahmen von Säuglingen anhand verschiedener Variablen, wie z.B. affektive Bewertungen und Klassenbezeichnungen. Die Ergebnisse zeigten eine starke Assoziation zwischen Valenzbewertungen und Klassenbezeichnungen, die zur Ableitung eines Klassifikationsschemas verwendet wurde. Studie (B) war eine vergleichende Studie verschiedener Typen neuronaler Netzwerke für die automatische Klassifizierung von Säuglingslauten. Ziel war es, den leistungsfähigsten Netzwerktyp unter den momentan verbreitetsten Typen zu ermitteln. Hierbei wurde der Einfluss verschiedener architektonischer Konfigurationen innerhalb der Typen berücksichtigt. Die Ergebnisse zeigten, dass Convolutional Neural Networks eine höhere Performance als Recurrent Neural Networks erreichten. Außerdem wurde gezeigt, dass die Wahl der Frequenz- und Zeitaggregationsschicht die wichtigste architektonische Entscheidung ist. Studie (C) war eine detaillierte Untersuchung von Computer Vision-ähnlichen Convolutional Neural Networks für die Klassifizierung von Säuglingslauten. Ziel war es, die wichtigsten architektonischen Eigenschaften zur Steigerung der Erkennungsperformance zu bestimmen. Die Ergebnisse bestätigten die Bedeutung der Aggregationsschicht. Zusätzlich Eigenschaften, die als wichtig identifiziert wurden, waren die Eingangsgröße der vollständig verbundenen Schichten und das akkumulierte rezeptive Feld. Studie (D) war eine Untersuchung zur Kompensation der Klassenimbalance zur Erkennung von Schimpansenrufen in Langzeitaufnahmen. Ziel war es, herauszufinden, welche Kompensationsmethode aus einer Menge ausgewählter Methoden die Performance eines Deep Learning Systems am meisten verbessert. Die Ergebnisse zeigten, dass Spektrogrammentrauschen am effektivsten war, während Methoden zur Kompensation des relativen Ungleichgewichts die Performance entweder gleichhielten oder verringerten.:1. Introduction 2. Foundations in Automatic Recognition of Acoustic Communication 3. State of Research 4. Study (A): Investigation of the Assessment of Infant Vocalizations by Laypersons 5. Study (B): Comparison of Neural Network Types for Automatic Classification of Infant Vocalizations 6. Study (C): Detailed Investigation of CNNs for Automatic Classification of Infant Vocalizations 7. Study (D): Compensating Class Imbalance for Acoustic Chimpanzee Detection With Convolutional Recurrent Neural Networks 8. Conclusion and Collected Discussion 9. Appendi
    corecore