12,611 research outputs found

    Learning Audio Sequence Representations for Acoustic Event Classification

    Acoustic Event Classification (AEC) has become a significant task for machines to perceive the surrounding auditory scene. However, extracting effective representations that capture the underlying characteristics of acoustic events is still challenging. Previous methods mainly focused on designing audio features in a 'hand-crafted' manner. Interestingly, data-learnt features have recently been reported to perform better, but up to now these have only been considered at the frame level. In this paper, we propose an unsupervised learning framework to learn a vector representation of an audio sequence for AEC. The framework consists of a Recurrent Neural Network (RNN) encoder and an RNN decoder, which respectively transform the variable-length audio sequence into a fixed-length vector and reconstruct the input sequence from the generated vector. After training the encoder-decoder, we feed the audio sequences to the encoder and take the learnt vectors as the audio sequence representations. Compared with previous methods, the proposed method can not only handle audio streams of arbitrary length, but also learn the salient information of the sequence. Extensive evaluation on a large acoustic event database is performed, and the empirical results demonstrate that the learnt audio sequence representation yields a significant performance improvement over other state-of-the-art hand-crafted sequence features for AEC.
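The encoder-decoder idea in the abstract above can be illustrated with a minimal sketch. It assumes a plain Elman-style recurrence with random, untrained weights and pure-Python lists; the function names (`encode`, `decode`, `rnn_step`) and the sizes `FEATURES` and `HIDDEN` are hypothetical and not taken from the paper, which does not state its cell type or dimensions here. The point is only that clips of different lengths map to vectors of the same length, and that the decoder unrolls from that vector to rebuild a sequence.

```python
import math
import random

def make_matrix(rows, cols, rng):
    """Small random weight matrix (illustrative initialisation, no training)."""
    return [[rng.uniform(-0.5, 0.5) for _ in range(cols)] for _ in range(rows)]

def matvec(m, v):
    """Matrix-vector product on plain Python lists."""
    return [sum(w * x for w, x in zip(row, v)) for row in m]

def rnn_step(w_in, w_rec, x, h):
    """One Elman-style update: h' = tanh(W_in x + W_rec h)."""
    return [math.tanh(a + b) for a, b in zip(matvec(w_in, x), matvec(w_rec, h))]

def encode(frames, w_in, w_rec, hidden_size):
    """Fold a variable-length list of frames into one fixed-length vector."""
    h = [0.0] * hidden_size
    for x in frames:
        h = rnn_step(w_in, w_rec, x, h)
    return h

def decode(code, n_frames, w_rec, w_out):
    """Unroll from the code vector, emitting one reconstructed frame per step."""
    h = list(code)
    frames = []
    for _ in range(n_frames):
        h = [math.tanh(a) for a in matvec(w_rec, h)]
        frames.append(matvec(w_out, h))
    return frames

rng = random.Random(0)
FEATURES, HIDDEN = 4, 8  # hypothetical sizes chosen for the example
enc_in = make_matrix(HIDDEN, FEATURES, rng)
enc_rec = make_matrix(HIDDEN, HIDDEN, rng)
dec_rec = make_matrix(HIDDEN, HIDDEN, rng)
dec_out = make_matrix(FEATURES, HIDDEN, rng)

# Two synthetic "audio" clips of different lengths (5 and 23 frames).
short_clip = [[rng.random() for _ in range(FEATURES)] for _ in range(5)]
long_clip = [[rng.random() for _ in range(FEATURES)] for _ in range(23)]

code_short = encode(short_clip, enc_in, enc_rec, HIDDEN)
code_long = encode(long_clip, enc_in, enc_rec, HIDDEN)
recon = decode(code_long, len(long_clip), dec_rec, dec_out)

print(len(code_short), len(code_long), len(recon))
```

In a trained system the weights would be fitted to minimise the reconstruction error between `recon` and the input, and `code_short` or `code_long` would then serve as the sequence representation passed to a classifier.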

    An original framework for understanding human actions and body language by using deep neural networks

    The evolution of the fields of Computer Vision (CV) and Artificial Neural Networks (ANNs) has allowed the development of efficient automatic systems for the analysis of people's behaviour. By studying hand movements it is possible to recognize gestures, which people often use to communicate information in a non-verbal way. These gestures can also be used to control or interact with devices without physically touching them. In particular, sign language and semaphoric hand gestures are the two foremost areas of interest due to their importance in Human-Human Communication (HHC) and Human-Computer Interaction (HCI), respectively. The processing of body movements, in turn, plays a key role in the action recognition and affective computing fields. The former is essential to understand how people act in an environment, while the latter tries to interpret people's emotions based on their poses and movements; both are essential tasks in many computer vision applications, including event recognition and video surveillance. In this Ph.D. thesis, an original framework for understanding actions and body language is presented. The framework is composed of three main modules: in the first one, a method based on Long Short-Term Memory Recurrent Neural Networks (LSTM-RNNs) for the recognition of sign language and semaphoric hand gestures is proposed; the second module presents a solution based on 2D skeletons and two-branch stacked LSTM-RNNs for action recognition in video sequences; finally, in the last module, a solution for basic non-acted emotion recognition using 3D skeletons and Deep Neural Networks (DNNs) is provided. The performance of LSTM-RNNs is explored in depth, because their ability to model the long-term contextual information of temporal sequences makes them suitable for analysing body movements. All the modules were tested on challenging, well-known datasets from the state of the art, showing remarkable results compared to the methods in the current literature.

    Unexpected Event Prediction in Wire Electrical Discharge Machining Using Deep Learning Techniques

    Theoretical models of manufacturing processes provide valuable insight into physical phenomena, but their application to practical industrial situations is sometimes difficult. In the context of Industry 4.0, artificial intelligence techniques can provide efficient solutions to actual manufacturing problems when big data are available. Within the field of artificial intelligence, the use of deep learning is growing exponentially in solving many problems related to information and communication technologies (ICTs), but it remains rare in the field of manufacturing. In this work, deep learning is used to efficiently predict unexpected events in wire electrical discharge machining (WEDM), an advanced machining process widely used for aerospace components. The occurrence of an unexpected event, namely a change in the thickness of the machined part, can be effectively predicted by recognizing hidden patterns in process signals. Based on WEDM experiments, different deep learning architectures were tested. By using a combination of a convolutional layer with gated recurrent units, thickness variation in the machined component could be predicted in 97.4% of cases, at least 2 mm in advance, which is extremely fast and allows action to be taken before the process has degraded. New possibilities of deep learning for high-performance machine tools must be examined in the near future. The authors gratefully acknowledge the funding support received from the Spanish Ministry of Economy and Competitiveness and the FEDER operation program for the project "Scientific models and machine-tool advanced sensing techniques for efficient machining of precision components of Low Pressure Turbines" (DPI2017-82239-P) and UPV/EHU (UFI 11/29). The authors would also like to thank Euskampus and ONA-EDM for their support in this project.
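The combination of a convolutional layer with gated recurrent units mentioned in the abstract above can be sketched as follows. This is an illustration under stated assumptions, not the authors' architecture: the weights are random and untrained, the input is a synthetic single-channel signal, and all names and sizes (`conv1d_relu`, `gru_step`, `predict_event_probability`, `N_KERNELS`, `KERNEL_WIDTH`, `HIDDEN`) are hypothetical. The GRU update uses one common convention, h' = (1 - z) * h + z * h_candidate.

```python
import math
import random

def sigmoid(a):
    return 1.0 / (1.0 + math.exp(-a))

def rand_matrix(rows, cols, rng):
    """Small random weight matrix (illustrative initialisation, no training)."""
    return [[rng.uniform(-0.5, 0.5) for _ in range(cols)] for _ in range(rows)]

def matvec(m, v):
    return [sum(w * x for w, x in zip(row, v)) for row in m]

def conv1d_relu(signal, kernels):
    """Valid 1-D cross-correlation plus ReLU: one feature vector per window."""
    width = len(kernels[0])
    return [
        [max(0.0, sum(k * s for k, s in zip(kernel, signal[t:t + width])))
         for kernel in kernels]
        for t in range(len(signal) - width + 1)
    ]

def gru_step(p, x, h):
    """One GRU update with update gate z, reset gate r and candidate state."""
    z = [sigmoid(a + b) for a, b in zip(matvec(p["Wz"], x), matvec(p["Uz"], h))]
    r = [sigmoid(a + b) for a, b in zip(matvec(p["Wr"], x), matvec(p["Ur"], h))]
    gated = [ri * hi for ri, hi in zip(r, h)]
    cand = [math.tanh(a + b)
            for a, b in zip(matvec(p["Wh"], x), matvec(p["Uh"], gated))]
    return [(1.0 - zi) * hi + zi * ci for zi, hi, ci in zip(z, h, cand)]

def predict_event_probability(signal, kernels, p, w_out):
    """Convolve the signal, run the GRU over the windows, squash to (0, 1)."""
    h = [0.0] * len(w_out)
    for x in conv1d_relu(signal, kernels):
        h = gru_step(p, x, h)
    return sigmoid(sum(w * hi for w, hi in zip(w_out, h)))

rng = random.Random(0)
N_KERNELS, KERNEL_WIDTH, HIDDEN = 3, 5, 6  # hypothetical sizes
kernels = rand_matrix(N_KERNELS, KERNEL_WIDTH, rng)
# "W*" matrices act on the conv features, "U*" matrices on the hidden state.
params = {
    name: rand_matrix(HIDDEN, N_KERNELS if name.startswith("W") else HIDDEN, rng)
    for name in ("Wz", "Uz", "Wr", "Ur", "Wh", "Uh")
}
w_out = [rng.uniform(-0.5, 0.5) for _ in range(HIDDEN)]

# Synthetic stand-in for a process signal (40 samples).
signal = [math.sin(0.3 * t) + 0.1 * rng.random() for t in range(40)]
features = conv1d_relu(signal, kernels)
prob = predict_event_probability(signal, kernels, params, w_out)
print(len(features), prob)
```

With trained weights, the output would be read as the estimated probability that a thickness change is about to occur; here it only shows how the signal flows through the two stages.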