4 research outputs found


    This study explores speech data augmentation methods for noise-robust fundamental frequency (F0) estimation with neural networks. The explored strategies fall into two groups: additive-noise and channel-based augmentation, and vocoder-based augmentation. In vocoder-based augmentation, a glottal vocoder is used to improve the accuracy of the ground truth F0 used for training the neural network, and to expand the diversity of the training data in terms of F0 patterns and talker vocal tract lengths. Evaluations on the PTDB-TUG corpus indicate that noise and channel augmentation can greatly increase the noise robustness of trained models, and that vocoder-based ground truth enhancement further increases model performance. For smaller datasets, vocoder-based diversity augmentation can also increase performance. The best-performing proposed method greatly outperformed the compared F0 estimation methods in terms of noise robustness.
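
As an illustration of the additive-noise augmentation mentioned in the abstract above, the following is a minimal sketch that mixes a noise signal into clean speech at a chosen signal-to-noise ratio. The abstract does not state noise types, SNR ranges, or implementation details, so the function name `mix_at_snr`, the plain-list signal representation, and the example SNR are assumptions made for illustration only, not the authors' pipeline.

```python
import math
import random


def mix_at_snr(speech, noise, snr_db):
    """Return speech plus noise, with the noise scaled to a target SNR (dB).

    Hypothetical helper for illustration; signals are plain lists of floats.
    """
    if len(noise) < len(speech):
        raise ValueError("noise must be at least as long as speech")
    noise = noise[:len(speech)]
    p_speech = sum(x * x for x in speech) / len(speech)
    p_noise = sum(x * x for x in noise) / len(noise)
    if p_noise == 0.0:
        raise ValueError("noise has zero power")
    # Choose gain so that p_speech / (gain^2 * p_noise) == 10^(snr_db / 10).
    gain = math.sqrt(p_speech / (p_noise * 10.0 ** (snr_db / 10.0)))
    return [s + gain * n for s, n in zip(speech, noise)]


if __name__ == "__main__":
    rng = random.Random(0)
    # Synthetic stand-ins: a 100 Hz tone at 8 kHz and white Gaussian noise.
    speech = [math.sin(2 * math.pi * 100 * t / 8000) for t in range(8000)]
    noise = [rng.gauss(0.0, 1.0) for _ in range(8000)]
    noisy = mix_at_snr(speech, noise, snr_db=5.0)
    print(len(noisy))
```

In a training setup, one would typically draw a different noise segment and SNR for each example; how the study sampled these is not specified in the abstract.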

    Comparison of End-to-End Neural Network Architectures and Data Augmentation Methods for Automatic Infant Motility Assessment Using Wearable Sensors

    Infant motility assessment using intelligent wearables is a promising new approach for assessing infant neurophysiological development, in which efficient signal analysis plays a central role. This study investigates the use of different end-to-end neural network architectures for processing infant motility data from wearable sensors. We focus on the performance and computational burden of alternative sensor encoder and time series modeling modules and their combinations. In addition, we explore the benefits of data augmentation methods in ideal and nonideal recording conditions. The experiments are conducted using a dataset of multisensor movement recordings from 7-month-old infants, as captured by a recently proposed smart jumpsuit for infant motility assessment. Our results indicate that the choice of the encoder module has a major impact on classifier performance. For sensor encoders, the best performance was obtained with parallel two-dimensional convolutions for intrasensor channel fusion with shared weights for all sensors. The results also indicate that a relatively compact feature representation is obtainable for within-sensor feature extraction without a drastic loss in classifier performance. Comparison of time series models revealed that feedforward dilated convolutions with residual and skip connections outperformed all recurrent neural network (RNN)-based models in performance, training time, and training stability. The experiments also indicate that data augmentation improves model robustness in simulated packet loss or sensor dropout scenarios. In particular, signal- and sensor-dropout-based augmentation strategies provided considerable boosts to performance without negatively affecting the baseline performance. Overall, the results provide tangible suggestions on how to optimize end-to-end neural network training for multichannel movement sensor data.
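
The signal- and sensor-dropout augmentation named in the abstract above can be sketched roughly as follows. The abstract gives no implementation details, so everything here is an assumption made for illustration: the nested-list data layout (sensors, then channels, then samples), replacing dropped data with zeros, the dropout probabilities, and the hypothetical function names `sensor_dropout` and `sample_dropout`.

```python
import random


def sensor_dropout(window, p_drop, rng):
    """Zero all channels of each sensor independently with probability p_drop.

    window: list of sensors; each sensor is a list of channels;
    each channel is a list of float samples (assumed layout).
    """
    out = []
    for sensor in window:
        if rng.random() < p_drop:
            out.append([[0.0] * len(ch) for ch in sensor])
        else:
            out.append([list(ch) for ch in sensor])
    return out


def sample_dropout(window, p_drop, rng):
    """Zero individual time steps of a sensor (all its channels at once),
    loosely mimicking lost packets from that sensor."""
    out = []
    for sensor in window:
        n_samples = len(sensor[0])
        keep = [rng.random() >= p_drop for _ in range(n_samples)]
        out.append([[x if k else 0.0 for x, k in zip(ch, keep)]
                    for ch in sensor])
    return out


if __name__ == "__main__":
    rng = random.Random(0)
    # Two sensors, three channels each, five samples per channel.
    window = [[[1.0] * 5 for _ in range(3)] for _ in range(2)]
    augmented = sample_dropout(sensor_dropout(window, 0.3, rng), 0.1, rng)
    print(len(augmented), len(augmented[0]), len(augmented[0][0]))
```

Both functions return a new window with the same shape as the input, so they can be applied on the fly to training batches while leaving the stored recordings untouched.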

    Automatic Assessment of Oral Reading Fluency (Ääneen lukemisen sujuvuuden automaattinen arviointi)

    People with reading difficulties are identified by assessing their oral reading. The assessment takes time that could instead be spent on treating the reading difficulty. Automatic assessment would increase the resources available for treatment, and it could also be used to support practice. This work presents a system that assesses the fluency of children's oral reading. The system assesses fluency either overall or in terms of stress, smoothness, pace, and emotional expression. The work also analyzes how the features used in the system affect the assessment criteria.