
    TASE: Task-Aware Speech Enhancement for Wake-Up Word Detection in Voice Assistants

    Wake-up word spotting in noisy environments is critical to a good user experience with voice assistants. Unwanted activations of the device are often caused by noise from background conversations, TVs, or other domestic appliances. In this work, we propose the use of a speech enhancement convolutional autoencoder, coupled with on-device keyword spotting, aimed at improving trigger word detection in noisy environments. The end-to-end system learns by optimizing a linear combination of losses: a reconstruction-based loss, computed both at the log-mel spectrogram and at the waveform level, and a task-specific loss that accounts for the cross-entropy error of the keyword spotting detector. We experiment with several neural network classifiers and report that deeply coupling the speech enhancement with the wake-up word detector, e.g., by jointly training them, significantly improves performance in the noisiest conditions. Additionally, we introduce a new publicly available speech database recorded for Telefónica's voice assistant, Aura. The OK Aura Wake-up Word Dataset incorporates rich metadata, such as speaker demographics and room conditions, and comprises hard negative examples that were carefully selected to present different levels of phonetic similarity with respect to the trigger words 'OK Aura'. Keywords: speech enhancement; wake-up word; keyword spotting; deep learning; convolutional neural network
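    The linear combination of losses described above can be sketched as follows. This is a minimal illustration, not the paper's code: the weights alpha/beta/gamma and the helper names are assumptions, and the paper's actual losses operate on network outputs rather than raw arrays.

    ```python
    import numpy as np

    def mse(a, b):
        # mean squared error, used here for both the log-mel and waveform
        # reconstruction terms (illustrative choice of reconstruction loss)
        return float(np.mean((a - b) ** 2))

    def cross_entropy(probs, label):
        # cross-entropy of the keyword-spotting classifier's predicted
        # probabilities against the true label
        return float(-np.log(probs[label] + 1e-12))

    def combined_loss(mel_hat, mel_ref, wav_hat, wav_ref, kws_probs, label,
                      alpha=1.0, beta=1.0, gamma=1.0):
        # linear combination: log-mel reconstruction + waveform
        # reconstruction + task (cross-entropy) loss; the weights are
        # hypothetical, not values reported in the paper
        return (alpha * mse(mel_hat, mel_ref)
                + beta * mse(wav_hat, wav_ref)
                + gamma * cross_entropy(kws_probs, label))
    ```

    Jointly training means gradients from the cross-entropy term flow back into the enhancement autoencoder, so enhancement is optimized for detection, not just for clean reconstruction.
    
    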

    A Novel Echo State Network Autoencoder for Anomaly Detection in Industrial IoT Systems

    Industrial Internet of Things (IIoT) technology has had a strong impact on the development of smart frameworks for detecting anomalous behaviors that could be potentially dangerous to a system. In this regard, most existing solutions involve Artificial Intelligence (AI) models running on edge devices, such as Intelligent Cyber-Physical Systems (ICPS) typically equipped with sensing and actuating capabilities. However, the hardware restrictions of these devices make implementing an effective anomaly detection algorithm quite challenging. In an industrial scenario, where signals in the form of multivariate time series must be analyzed to perform a diagnosis, Echo State Networks (ESNs) are a valid way to bring the power of neural networks into low-complexity models that meet the resource constraints. On the other hand, the technique has limitations when applied in unsupervised contexts. In this paper, we propose a novel model that combines ESNs and autoencoders (ESN-AE) for the detection of anomalies in industrial systems. Unlike the ESN-AE models presented in the literature, our approach decouples the encoding and decoding steps and allows both processes to be optimized while performing the dimensionality reduction. Experiments demonstrate that our solution outperforms other machine learning approaches and techniques found in the literature, while also achieving the best trade-off in terms of memory footprint and inference time.
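    To make the ESN-plus-autoencoder idea concrete, the sketch below uses a fixed random reservoir as the encoder and a ridge-regression readout as the decoder, scoring anomalies by reconstruction error. This is a generic ESN-AE baseline under assumed hyperparameters (reservoir size, spectral radius, leak rate), not the decoupled architecture the authors propose.

    ```python
    import numpy as np

    class ESNAE:
        """Toy echo-state-network autoencoder: fixed random reservoir as
        encoder, linear ridge readout as decoder, reconstruction error as
        the anomaly score. Hyperparameters are illustrative."""

        def __init__(self, d, n_res=50, spectral_radius=0.9, leak=0.3, seed=0):
            rng = np.random.default_rng(seed)
            self.W_in = rng.uniform(-0.5, 0.5, (n_res, d))
            W = rng.uniform(-0.5, 0.5, (n_res, n_res))
            # rescale so the largest eigenvalue magnitude equals spectral_radius
            self.W = W * (spectral_radius / max(abs(np.linalg.eigvals(W))))
            self.leak = leak
            self.W_out = None

        def _states(self, X):
            # leaky-integrator reservoir update over a (T, d) time series
            h = np.zeros(self.W.shape[0])
            H = np.empty((len(X), len(h)))
            for t, x in enumerate(X):
                h = (1 - self.leak) * h + self.leak * np.tanh(self.W_in @ x + self.W @ h)
                H[t] = h
            return H

        def fit(self, X, ridge=1e-2):
            # ridge-regression decoder mapping reservoir states back to inputs
            H = self._states(X)
            self.W_out = np.linalg.solve(
                H.T @ H + ridge * np.eye(H.shape[1]), H.T @ X)

        def score(self, X):
            # per-timestep reconstruction error; high values flag anomalies
            H = self._states(X)
            return np.mean((H @ self.W_out - X) ** 2, axis=1)
    ```

    Because the reservoir weights are never trained, only the linear readout is fitted, which is what keeps ESN models cheap enough for resource-constrained edge devices.
    
    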

    Exploring the Potential of Machine and Deep Learning Models for OpenStreetMap Data Quality Assessment and Improvement (Short Paper)

    The OpenStreetMap (OSM) project is a widely used crowdsourced geographic data platform that allows users to contribute, edit, and access geographic information. However, the quality of the data in OSM is often uncertain, and assessing it is crucial for ensuring its reliability and usability. Recently, machine and deep learning models have shown promise in assessing and improving the quality of OSM data. In this paper, we survey the current state-of-the-art machine learning models for OSM data quality assessment and improvement, and classify the underlying methods into categories depending on (1) the associated learning paradigm (supervised or unsupervised methods), (2) the use of extrinsic or intrinsic metrics (i.e., assessing OSM data by comparing it against authoritative external datasets or by computing internal quality indicators), and (3) the use of traditional or deep learning-based models for predicting and evaluating OSM features. We then identify the main trends and challenges in this field and provide recommendations for future research aimed at improving the quality of OSM data in terms of completeness, accuracy, and consistency.
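    The extrinsic/intrinsic distinction drawn above can be illustrated with two toy indicators. These are hypothetical examples of each metric family (the required tags and the count-ratio formulation are my own), not measures taken from the paper.

    ```python
    def intrinsic_completeness(features, required_tags=("name", "addr:street")):
        """Intrinsic indicator: share of OSM features carrying all required
        tags, computed from the OSM data alone (no external reference)."""
        if not features:
            return 0.0
        ok = sum(all(t in f.get("tags", {}) for t in required_tags)
                 for f in features)
        return ok / len(features)

    def extrinsic_count_ratio(osm_count, reference_count):
        """Extrinsic indicator: OSM feature count relative to an
        authoritative reference dataset covering the same area."""
        if reference_count == 0:
            return float("inf")
        return osm_count / reference_count
    ```

    Intrinsic indicators like the first scale to anywhere OSM has data; extrinsic ones like the second are only available where an authoritative dataset exists to compare against.
    
    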

    Deep generative modeling for single-cell transcriptomics.

    Single-cell transcriptome measurements can reveal unexplored biological diversity, but they suffer from technical noise and bias that must be modeled to account for the resulting uncertainty in downstream analyses. Here we introduce single-cell variational inference (scVI), a ready-to-use scalable framework for the probabilistic representation and analysis of gene expression in single cells (https://github.com/YosefLab/scVI). scVI uses stochastic optimization and deep neural networks to aggregate information across similar cells and genes and to approximate the distributions that underlie observed expression values, while accounting for batch effects and limited sensitivity. We used scVI for a range of fundamental analysis tasks including batch correction, visualization, clustering, and differential expression, and achieved high accuracy for each task.
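    At the core of models like scVI, observed counts are modeled with an overdispersed count distribution such as the negative binomial. The standalone function below is an illustrative implementation of the negative binomial log-probability in the mean/inverse-dispersion parameterization, not code from the scVI package.

    ```python
    from math import lgamma, log

    def nb_logpmf(x, mu, theta):
        """Log-probability of count x under a negative binomial with
        mean mu and inverse-dispersion theta. As theta grows, this
        approaches a Poisson(mu); small theta models the overdispersion
        typical of single-cell count data."""
        return (lgamma(x + theta) - lgamma(theta) - lgamma(x + 1)
                + theta * log(theta / (theta + mu))
                + x * log(mu / (theta + mu)))
    ```

    In a variational framework, per-cell latent variables are decoded into mu (and theta) and this log-likelihood becomes the reconstruction term of the evidence lower bound.
    
    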

    Deep Learning Approach for Building Detection Using LiDAR-Orthophoto Fusion

    © 2018 Faten Hamed Nahhas et al. This paper reports on a building detection approach based on deep learning (DL) using the fusion of Light Detection and Ranging (LiDAR) data and orthophotos. The proposed method uses object-based analysis to create objects, feature-level fusion, an autoencoder-based dimensionality reduction to transform low-level features into compressed features, and a convolutional neural network (CNN) to transform compressed features into high-level features, which are used to classify objects into buildings and background. The architecture's hyperparameters were tuned with a grid search, and its sensitivity to them was analyzed and discussed. The model was evaluated on two datasets selected from an urban area with different building types. Results show that autoencoder-based dimensionality reduction from 21 to 10 features improves detection accuracy from 86.06% to 86.19% in the working area and from 77.92% to 78.26% in the testing area. The sensitivity analysis also shows that the choice of hyperparameter values significantly affects detection accuracy. The best hyperparameters are 128 filters in the CNN, the Adamax optimizer, 10 units in the CNN's fully connected layer, a batch size of 8, and a dropout of 0.2; these hyperparameters are critical to the generalization capacity of the model. Furthermore, comparison experiments with support vector machines (SVMs) show that the proposed model, with or without dimensionality reduction, outperforms the SVM models in the working area, although the SVM achieves better accuracy in the testing area than the proposed model without dimensionality reduction. Overall, this study shows that using an autoencoder in DL models can improve the accuracy of building recognition in fused LiDAR-orthophoto data.
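    The 21-to-10 feature compression step can be illustrated with a closed-form linear autoencoder (equivalent to PCA on centered data). The paper's autoencoder is a trained neural network; this is a minimal stand-in for the idea, not the authors' implementation.

    ```python
    import numpy as np

    def linear_autoencoder(X, k=10):
        """Compress the d columns of X (n, d) to k latent features and
        reconstruct. The closed-form optimum of a linear autoencoder is
        given by the top-k right singular vectors of the centered data."""
        mean = X.mean(axis=0)
        Xc = X - mean
        _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
        W = Vt[:k].T               # (d, k) encoder weights; decoder is W.T
        Z = Xc @ W                 # compressed features, shape (n, k)
        X_hat = Z @ W.T + mean     # reconstruction, shape (n, d)
        return Z, X_hat
    ```

    With k=10 on 21 input features, Z plays the role of the compressed feature vector that the CNN classifier would consume downstream.
    
    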