359 research outputs found

    Cluster-Based Adaptation Using Density Forest for HMM Phone Recognition

    Publication in the conference proceedings of EUSIPCO, Lisbon, Portugal, 201

    Pertanika Journal of Science & Technology


    Pertanika Journal of Science & Technology


    Proceedings of the EAA Spatial Audio Signal Processing symposium: SASP 2019

    International audienc

    Safety Performance Prediction of Large-Truck Drivers in the Transportation Industry

    The trucking industry and truck drivers play a key role in the United States commercial transportation sector. Accidents involving large trucks can cause serious problems for the driver, the company, the customer and other road users, resulting in property damage and loss of life. The objective of this research is to concentrate on an individual transportation company and use its historical data to build models based on statistical and machine learning methods to predict accidents. The focus is on building models that have high accuracy and correctly predict an accident. Logistic regression and penalized logistic regression models were tested initially to obtain some interpretation of the relationship between the predictor variables and the response variable. Random forest, gradient boosting machine (GBM) and deep learning methods are explored to deal with highly non-linear and complex data. The cost of fatal and non-fatal accidents is also discussed to weigh the cost of training a driver against that of an accident. Since accidents are very rare events, the model accuracy should be balanced between predicting non-accidents (specificity) and predicting accidents (sensitivity). This framework can be a baseline for transportation companies to emphasize the benefits of prediction in achieving safer and more productive drivers.
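The point about balancing sensitivity and specificity for rare events can be illustrated with a minimal sketch. The data below are hypothetical (1000 driver records with 20 accidents) and are not taken from the study; the function names are illustrative only.

```python
def confusion_counts(y_true, y_pred):
    """Count true/false positives and negatives for binary labels (1 = accident)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    return tp, fp, tn, fn


def accuracy(y_true, y_pred):
    tp, fp, tn, fn = confusion_counts(y_true, y_pred)
    return (tp + tn) / (tp + fp + tn + fn)


def sensitivity_specificity(y_true, y_pred):
    """Sensitivity = share of accidents caught; specificity = share of non-accidents caught."""
    tp, fp, tn, fn = confusion_counts(y_true, y_pred)
    sensitivity = tp / (tp + fn) if (tp + fn) else float("nan")
    specificity = tn / (tn + fp) if (tn + fp) else float("nan")
    return sensitivity, specificity


# Hypothetical, made-up labels: 20 accidents among 1000 records.
y_true = [1] * 20 + [0] * 980
# A trivial "model" that never predicts an accident.
always_safe = [0] * 1000

acc = accuracy(y_true, always_safe)
sens, spec = sensitivity_specificity(y_true, always_safe)
print(f"accuracy={acc:.2f} sensitivity={sens:.2f} specificity={spec:.2f}")
```

With labels this imbalanced, a model that never predicts an accident still scores high accuracy while catching no accidents, which is why accuracy alone is a poor target for rare events.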

    Speech Recognition

    Chapters in the first part of the book cover all the essential speech processing techniques for building robust automatic speech recognition systems: the representation of speech signals and the methods for speech-feature extraction, acoustic and language modeling, efficient algorithms for searching the hypothesis space, and multimodal approaches to speech recognition. The last part of the book is devoted to other speech processing applications that can use the information from automatic speech recognition for speaker identification and tracking, for prosody modeling in emotion-detection systems, and in other applications that are able to operate in real-world environments, such as mobile communication services and smart homes.

    Time- and value-continuous explainable affect estimation in-the-wild

    Today, the relevance of Affective Computing, i.e., of making computers recognise and simulate human emotions, cannot be overstated. All technology giants (from manufacturers of laptops to mobile phones to smart speakers) are in fierce competition to make their devices understand not only what is being said, but also how it is being said, in order to recognise the user’s emotions. The goals have evolved from predicting the basic emotions (e.g., happy, sad) to predicting more nuanced affective states (e.g., relaxed, bored) in real time. The databases used in such research have evolved too, from featuring acted behaviours to featuring spontaneous behaviours. A more powerful recent shift is in-the-wild affect recognition, i.e., taking the research out of the laboratory and into the uncontrolled real world. This thesis discusses, for the very first time, affect recognition for two unique in-the-wild audiovisual databases, GRAS2 and SEWA. GRAS2 is the only database to date with time- and value-continuous affect annotations for Labov effect-free affective behaviours, i.e., behaviours recorded without the participant’s awareness of being recorded (awareness that is otherwise known to affect the naturalness of one’s affective behaviour). SEWA features participants from six different cultural backgrounds, conversing over a video-calling platform. Thus, SEWA features in-the-wild recordings further corrupted by unpredictable artifacts, such as network-induced delays, frame-freezing and echoes. The two databases present a unique opportunity to study time- and value-continuous affect estimation that is truly in-the-wild. A novel ‘Evaluator Weighted Estimation’ formulation is proposed to generate a gold standard sequence from several annotations.
An illustration is presented demonstrating that the moving bag-of-words (BoW) representation better preserves the temporal context of the features, while remaining more robust against outliers than other statistical summaries, e.g., the moving average. A novel, data-independent randomised codebook is proposed for the BoW representation; it is especially useful for cross-corpus model generalisation testing when the feature spaces of the databases differ drastically. Various deep learning models and support vector regressors are used to predict affect dimensions time- and value-continuously. The better generalisability of the models trained on GRAS2, despite the smaller training size, makes a strong case for the collection and use of Labov effect-free data. A further foundational contribution is the discovery of the missing many-to-many mapping between the mean square error (MSE) and the concordance correlation coefficient (CCC), i.e., between two of the most popular utility functions to date. The newly invented cost function |MSE_{XY}/σ_{XY}| has been evaluated in experiments aimed at demystifying the inner workings of a well-performing, simple, low-cost neural network that effectively utilises the BoW text features. Also proposed herein is the shallowest-possible convolutional neural network (CNN) that uses the facial action unit (FAU) features. The CNN exploits sequential context but, unlike RNNs, also inherently allows data- and process-parallelism. Interestingly, for the most part, these white-box AI models have been shown to utilise the provided features in a manner consistent with the human perception of emotion expression.
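The link between MSE and CCC mentioned above can be checked numerically with a short sketch. It relies on the standard definition CCC = 2σ_XY / (σ_X² + σ_Y² + (μ_X − μ_Y)²) and on the identity MSE = σ_X² + σ_Y² − 2σ_XY + (μ_X − μ_Y)², which together give CCC = 2σ_XY / (MSE + 2σ_XY). This is a textbook identity using population (1/n) moments, not necessarily the derivation used in the thesis; the function names and the synthetic data are assumptions for illustration.

```python
import random


def moments(x, y):
    """Population means, variances and covariance (all normalised by n)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    vx = sum((a - mx) ** 2 for a in x) / n
    vy = sum((b - my) ** 2 for b in y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y)) / n
    return mx, my, vx, vy, cov


def mse(x, y):
    """Mean square error between two equally long sequences."""
    return sum((a - b) ** 2 for a, b in zip(x, y)) / len(x)


def ccc(x, y):
    """Concordance correlation coefficient from its standard definition."""
    mx, my, vx, vy, cov = moments(x, y)
    return 2 * cov / (vx + vy + (mx - my) ** 2)


def ccc_from_mse(x, y):
    """Same quantity, rewritten in terms of MSE and the covariance only."""
    cov = moments(x, y)[4]
    return 2 * cov / (mse(x, y) + 2 * cov)


# Synthetic, made-up sequences standing in for a gold standard and a prediction.
random.seed(0)
gold = [random.gauss(0.0, 1.0) for _ in range(500)]
pred = [0.8 * g + 0.1 + random.gauss(0.0, 0.3) for g in gold]

direct = ccc(gold, pred)
via_mse = ccc_from_mse(gold, pred)
ratio = mse(gold, pred) / moments(gold, pred)[4]  # the MSE / covariance ratio
print(direct, via_mse, ratio)
```

Written this way, CCC = 1 / (1 + MSE_{XY} / (2σ_{XY})), so when the covariance is positive a smaller MSE-to-covariance ratio corresponds to a higher CCC. The same MSE can therefore yield different CCC values depending on the covariance.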

    New deep learning approaches to domain adaptation and their applications in 3D hand pose estimation

    This study investigates several methods for using artificial intelligence to give machines the ability to see. It introduces several methods for image recognition that are more accurate and efficient than existing approaches.