5 research outputs found

    Unsupervised Representations Improve Supervised Learning in Speech Emotion Recognition

    Speech Emotion Recognition (SER) plays a pivotal role in enhancing human-computer interaction by enabling a deeper understanding of emotional states across a wide range of applications, contributing to more empathetic and effective communication. This study proposes an approach that integrates self-supervised feature extraction with supervised classification for emotion recognition from small audio segments. In the preprocessing step, to eliminate the need for hand-crafted audio features, we employed a self-supervised feature extractor, based on the Wav2Vec model, to capture acoustic features from audio data. The output feature maps of the preprocessing step are then fed to a custom-designed Convolutional Neural Network (CNN)-based model to perform emotion classification. Using the ShEMO dataset as our testing ground, the proposed method surpasses two baseline methods, i.e., a support vector machine classifier and transfer learning of a pretrained CNN. Comparing the proposed method to state-of-the-art methods in the SER task indicates the superiority of the proposed method. Our findings underscore the pivotal role of deep unsupervised feature learning in advancing SER, offering enhanced emotional comprehension in human-computer interaction.

    Efficiency of texture image enhancement by DCT-based filtering

    Textures or highly detailed structures, as well as image object shapes, contain information that is widely exploited in pattern recognition and image classification. Noise can deteriorate these features and has to be removed. In this paper, we consider the influence of textural properties on the efficiency of image enhancement by noise suppression for subsequent processing. Among possible variants of denoising, filters based on the discrete cosine transform, known to be effective in removing additive white Gaussian noise, are considered. It is shown that noise removal in texture images using the considered techniques can distort fine texture details. To detect such situations and to avoid texture degradation due to filtering, filtering efficiency predictors, including a neural-network-based predictor, applicable to a wide class of images are proposed. These predictors use simple statistical parameters to estimate the performance of the considered filters. Image enhancement is analysed in terms of both standard criteria and metrics of image visual quality for various scenarios of texture roughness and noise characteristics. The discrete cosine transform based filters are compared to several counterparts. Problems of noise removal in texture images are demonstrated for all of them. A special case of spatially correlated noise is considered as well. Potential efficiency of filtering is analysed for both studied noise models. It is shown that the studied filters are close to the potential limits.
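
    As a rough illustration of the kind of DCT-domain filtering this abstract refers to, the sketch below hard-thresholds the DCT coefficients of a single square image block. It is not the authors' filter: the function names, the single-block processing, and the threshold factor `beta = 2.7` are assumptions made for illustration, and the noise standard deviation `sigma` is assumed to be known.

```python
import math
import random


def dct_matrix(n):
    """Orthonormal DCT-II basis as an n x n list of lists."""
    rows = []
    for k in range(n):
        scale = math.sqrt(1.0 / n) if k == 0 else math.sqrt(2.0 / n)
        rows.append([scale * math.cos(math.pi * (2 * i + 1) * k / (2 * n))
                     for i in range(n)])
    return rows


def matmul(a, b):
    """Plain matrix product of two lists of lists."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]


def transpose(a):
    return [list(row) for row in zip(*a)]


def dct_denoise_block(block, sigma, beta=2.7):
    """Hard-threshold the 2D DCT of a square block (beta is an assumed setting)."""
    n = len(block)
    d = dct_matrix(n)
    dt = transpose(d)
    coeffs = matmul(matmul(d, block), dt)      # forward 2D DCT: D X D^T
    threshold = beta * sigma
    for u in range(n):
        for v in range(n):
            # Keep the DC term; zero small AC coefficients, assumed to be noise.
            if (u, v) != (0, 0) and abs(coeffs[u][v]) < threshold:
                coeffs[u][v] = 0.0
    return matmul(matmul(dt, coeffs), d)       # inverse 2D DCT: D^T C D


def mse(a, b):
    n = len(a) * len(a[0])
    return sum((x - y) ** 2 for ra, rb in zip(a, b) for x, y in zip(ra, rb)) / n


# Synthetic example: a flat 8x8 block with additive white Gaussian noise.
rng = random.Random(0)
clean = [[100.0] * 8 for _ in range(8)]
noisy = [[v + rng.gauss(0.0, 5.0) for v in row] for row in clean]
denoised = dct_denoise_block(noisy, sigma=5.0)
```

    On a flat block like this one, thresholding can only remove noise energy. On a textured block, the same rule also zeroes small coefficients that carry genuine detail, which is the degradation the abstract warns about.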

    Effects of Aerial LiDAR Data Density on the Accuracy of Building Reconstruction

    Previous work has identified a positive relationship between the density of aerial LiDAR input for building reconstruction and the accuracy of the resulting reconstructed models. We hypothesize a point of diminishing returns at which higher data density no longer contributes meaningfully to higher accuracy in the end product. We investigate this relationship by subsampling a high-density dataset from the City of Surrey, BC to different densities and passing each subsampled dataset to two different reconstruction methods. We then determine the accuracy of reconstruction against manually created reference data, in terms of both 2D footprint accuracy and 3D model accuracy. We find that there is no quantitative evidence for meaningfully improved output accuracy from densities higher than 4 p/m² for either method, although aesthetic improvements at higher point cloud densities are noted for one method.
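
    The subsampling step in this abstract can be pictured with a short sketch. The abstract does not say how the point cloud was thinned, so uniform random thinning is an assumption here, as are the function name `subsample_to_density`, the synthetic 10 m x 10 m tile, and the chosen density levels.

```python
import random


def subsample_to_density(points, area_m2, target_density, seed=0):
    """Randomly thin `points` so that len(result) / area_m2 is about target_density.

    Uniform random thinning is an assumed strategy, not the study's method.
    """
    target_count = int(round(target_density * area_m2))
    if target_count >= len(points):
        return list(points)          # already at or below the target density
    return random.Random(seed).sample(points, target_count)


# Synthetic tile: 10 m x 10 m with 25 points per square metre.
rng = random.Random(42)
side = 10.0
cloud = [(rng.uniform(0, side), rng.uniform(0, side), rng.uniform(0, 3))
         for _ in range(2500)]

# One thinned copy per density level (points per square metre).
thinned = {d: subsample_to_density(cloud, side * side, d) for d in (16, 8, 4, 2, 1)}
```

    Each entry of `thinned` would then be passed to a reconstruction method and scored against reference data, which is outside the scope of this sketch.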

    Behavioral pedestrian tracking using a camera and lidar sensors on a moving vehicle

    In this paper, we present a novel 2D–3D pedestrian tracker designed for applications in autonomous vehicles. The system operates on a tracking-by-detection principle and can track multiple pedestrians in complex urban traffic situations. By using a behavioral motion model and a non-parametric distribution as the state model, we are able to accurately track unpredictable pedestrian motion in the presence of heavy occlusion. Tracking is performed independently, on the image and ground plane, in global, motion-compensated coordinates. We employ camera and LiDAR data fusion to solve the association problem, where the optimal solution is found by matching 2D and 3D detections to tracks using a joint log-likelihood observation model. Each 2D–3D particle filter then updates its state from associated observations and a behavioral motion model. Each particle moves independently, following pedestrian motion parameters that we learned offline from an annotated training dataset. Temporal stability of the state variables is achieved by modeling each track as a Markov Decision Process with probabilistic state transition properties. A novel track management system then handles high-level actions such as track creation, deletion and interaction. Using a probabilistic track score, the track manager can cull false and ambiguous detections while updating tracks with detections from actual pedestrians. Our system is implemented on a GPU and exploits the massively parallelizable nature of particle filters. Due to the Markovian nature of our track representation, the system achieves real-time performance while operating with a minimal memory footprint. Exhaustive and independent evaluation of our tracker was performed by the KITTI benchmark server, where it was tested against a wide variety of unknown pedestrian tracking situations. On this realistic benchmark, we outperform all published pedestrian trackers in a multitude of tracking metrics.
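
    For readers unfamiliar with particle filters, the sketch below shows one generic predict / weight / resample cycle on the ground plane. It is a textbook bootstrap filter, not the tracker described above: the near-constant-velocity motion model stands in for the learned behavioral model, the Gaussian position likelihood stands in for the joint 2D–3D observation model, and every name and parameter value (`pf_step`, `accel_std`, `obs_std`, and so on) is an assumption.

```python
import math
import random


def pf_step(particles, observation, rng, dt=0.1, accel_std=0.5, obs_std=0.3):
    """One bootstrap particle filter cycle. Each particle is (x, y, vx, vy)."""
    # Predict: perturb each particle's velocity, then move it.
    moved = []
    for x, y, vx, vy in particles:
        vx += rng.gauss(0.0, accel_std) * dt
        vy += rng.gauss(0.0, accel_std) * dt
        moved.append((x + vx * dt, y + vy * dt, vx, vy))

    # Weight: Gaussian log-likelihood of the observed position.
    ox, oy = observation
    log_w = [-((x - ox) ** 2 + (y - oy) ** 2) / (2.0 * obs_std ** 2)
             for x, y, _, _ in moved]
    top = max(log_w)                       # subtract the max for numerical stability
    weights = [math.exp(lw - top) for lw in log_w]
    total = sum(weights)
    weights = [w / total for w in weights]

    # Resample: systematic resampling proportional to the weights.
    n = len(moved)
    start = rng.random() / n
    resampled, cumulative, i = [], weights[0], 0
    for k in range(n):
        u = start + k / n
        while u > cumulative and i < n - 1:
            i += 1
            cumulative += weights[i]
        resampled.append(moved[i])
    return resampled


def estimate(particles):
    """Mean position of the particle set."""
    n = len(particles)
    return (sum(p[0] for p in particles) / n, sum(p[1] for p in particles) / n)


# Synthetic example: a target walking at (1.0, 0.5) m/s, observed with small noise.
rng = random.Random(1)
particles = [(rng.gauss(0, 1), rng.gauss(0, 1), rng.gauss(0, 1), rng.gauss(0, 1))
             for _ in range(500)]
true_x, true_y = 0.0, 0.0
for _ in range(50):
    true_x += 1.0 * 0.1
    true_y += 0.5 * 0.1
    obs = (true_x + rng.gauss(0, 0.1), true_y + rng.gauss(0, 0.1))
    particles = pf_step(particles, obs, rng)
est_x, est_y = estimate(particles)
```

    Because each particle is updated independently in the predict and weight steps, those loops map naturally onto parallel hardware, which is the property the abstract exploits with its GPU implementation.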