
    DNN adaptation by automatic quality estimation of ASR hypotheses

    In this paper we propose to exploit the automatic Quality Estimation (QE) of ASR hypotheses to perform the unsupervised adaptation of a deep neural network modeling acoustic probabilities. Our hypothesis is that significant improvements can be achieved by: i) automatically transcribing the evaluation data we are currently trying to recognise, and ii) selecting from it a subset of "good quality" instances based on the word error rate (WER) scores predicted by a QE component. To validate this hypothesis, we run several experiments on the evaluation data sets released for the CHiME-3 challenge. First, we operate in oracle conditions in which manual transcriptions of the evaluation data are available, thus allowing us to compute the "true" sentence WER. In this scenario, we perform the adaptation with variable amounts of data, which are characterised by different levels of quality. Then, we move to realistic conditions in which the manual transcriptions of the evaluation data are not available. In this case, the adaptation is performed on data selected according to the WER scores "predicted" by a QE component. Our results indicate that: i) QE predictions allow us to closely approximate the adaptation results obtained in oracle conditions, and ii) the overall ASR performance based on the proposed QE-driven adaptation method is significantly better than the strong, most recent, CHiME-3 baseline.
    Comment: Computer Speech & Language December 201
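
    The selection step can be sketched in a few lines of Python. The snippet below is only an illustration of the idea (the function name select_adaptation_data and the 0.3 threshold are assumptions, not taken from the paper): utterances whose QE-predicted sentence WER is below a threshold are kept, and their automatic transcriptions serve as pseudo-labels for the unsupervised adaptation of the acoustic DNN.

        def select_adaptation_data(hypotheses, predicted_wer, wer_threshold=0.3):
            """Keep automatically transcribed utterances whose sentence-level WER,
            as predicted by the quality-estimation component, is below a threshold.

            hypotheses    : list of (utterance_id, transcript) pairs from the ASR system
            predicted_wer : dict mapping utterance_id -> predicted sentence WER in [0, 1]
            """
            selected = []
            for utt_id, transcript in hypotheses:
                if predicted_wer.get(utt_id, 1.0) <= wer_threshold:
                    # The hypothesis becomes a pseudo-label for adapting the acoustic DNN.
                    selected.append((utt_id, transcript))
            return selected

        if __name__ == "__main__":
            hyps = [("utt1", "turn on the light"), ("utt2", "turn of the lie")]
            qe_scores = {"utt1": 0.05, "utt2": 0.62}
            print(select_adaptation_data(hyps, qe_scores))   # only utt1 survives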

    End-to-end representation learning for Correlation Filter based tracking

    The Correlation Filter is an algorithm that trains a linear template to discriminate between images and their translations. It is well suited to object tracking because its formulation in the Fourier domain provides a fast solution, enabling the detector to be re-trained once per frame. Previous works that use the Correlation Filter, however, have adopted features that were either manually designed or trained for a different task. This work is the first to overcome this limitation by interpreting the Correlation Filter learner, which has a closed-form solution, as a differentiable layer in a deep neural network. This enables learning deep features that are tightly coupled to the Correlation Filter. Experiments illustrate that our method has the important practical benefit of allowing lightweight architectures to achieve state-of-the-art performance at high frame rates.
    Comment: To appear at CVPR 201
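
    To make the closed-form solution concrete, here is a minimal single-channel NumPy sketch (the function names and the windowing-free setup are simplifications, not the paper's multi-channel, end-to-end differentiable formulation): the ridge regression is solved element-wise in the Fourier domain, which is what makes per-frame re-training cheap, and because the computation consists only of FFTs and element-wise operations it can also be back-propagated through when embedded as a network layer.

        import numpy as np

        def train_cf(x, y, lam=1e-2):
            """Closed-form ridge-regression filter in the Fourier domain.
            x: 2-D feature channel of the template image
            y: 2-D desired response, e.g. a Gaussian peaked at the target centre"""
            X, Y = np.fft.fft2(x), np.fft.fft2(y)
            return np.conj(X) * Y / (np.conj(X) * X + lam)   # filter, Fourier domain

        def respond(W, z):
            """Apply the filter to a search image z; the peak of the output locates the target."""
            return np.real(np.fft.ifft2(W * np.fft.fft2(z)))

        x = np.random.randn(64, 64)
        y = np.zeros((64, 64)); y[0, 0] = 1.0            # impulse response target
        W = train_cf(x, y)
        print(respond(W, x)[0, 0])                       # close to 1 for small lam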

    Transfer Learning for Speech and Language Processing

    Transfer learning is a vital technique that generalizes models trained for one setting or task to other settings or tasks. For example, in speech recognition an acoustic model trained for one language can be used to recognize speech in another language, with little or no re-training data. Transfer learning is closely related to multi-task learning (cross-lingual vs. multilingual), and has traditionally been studied under the name of `model adaptation'. Recent advances in deep learning show that transfer learning becomes much easier and more effective with high-level abstract features learned by deep models, and that the `transfer' can be conducted not only between data distributions and data types, but also between model structures (e.g., shallow nets and deep nets) or even model types (e.g., Bayesian models and neural models). This review paper summarizes some recent prominent research in this direction, particularly for speech and language processing. We also report some results from our group and highlight the potential of this very interesting research field.
    Comment: 13 pages, APSIPA 201
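
    The cross-lingual example in the abstract corresponds to a common fine-tuning pattern, sketched below in PyTorch with purely illustrative layer sizes and class names (the paper surveys many transfer settings and does not prescribe this particular recipe): the hidden layers of an acoustic model trained on a source language are copied and optionally frozen, and only a new output layer for the target language is trained on its limited data.

        import torch
        import torch.nn as nn

        class AcousticModel(nn.Module):
            def __init__(self, n_inputs=440, n_hidden=1024, n_targets=3000):
                super().__init__()
                self.hidden = nn.Sequential(
                    nn.Linear(n_inputs, n_hidden), nn.ReLU(),
                    nn.Linear(n_hidden, n_hidden), nn.ReLU(),
                )
                self.output = nn.Linear(n_hidden, n_targets)

            def forward(self, x):
                return self.output(self.hidden(x))

        source_model = AcousticModel(n_targets=3000)   # assumed trained on a resource-rich language
        target_model = AcousticModel(n_targets=1500)   # new language with its own target inventory
        target_model.hidden.load_state_dict(source_model.hidden.state_dict())  # transfer the features
        for p in target_model.hidden.parameters():     # optionally freeze the shared layers
            p.requires_grad = False
        # Only target_model.output is then trained on the (small) target-language data set.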

    Improvements to deep convolutional neural networks for LVCSR

    Deep Convolutional Neural Networks (CNNs) are more powerful than Deep Neural Networks (DNNs), as they are able to better reduce spectral variation in the input signal. This has also been confirmed experimentally, with CNNs showing improvements in word error rate (WER) of 4-12% relative compared to DNNs across a variety of LVCSR tasks. In this paper, we describe different methods to further improve CNN performance. First, we conduct a deep analysis comparing limited weight sharing and full weight sharing with state-of-the-art features. Second, we apply various pooling strategies that have shown improvements in computer vision to an LVCSR speech task. Third, we introduce a method to effectively incorporate speaker adaptation, namely fMLLR, into log-mel features. Fourth, we introduce an effective strategy to use dropout during Hessian-free sequence training. We find that with these improvements, particularly with fMLLR and dropout, we are able to achieve an additional 2-3% relative improvement in WER on a 50-hour Broadcast News task over our previous best CNN baseline. On a larger 400-hour BN task, we find an additional 4-5% relative improvement over our previous best CNN baseline.
    Comment: 6 pages, 1 figure
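
    For readers less familiar with CNN acoustic models, the PyTorch snippet below shows the general shape of such a network, with pooling along frequency and dropout in the fully connected layers, two of the ingredients discussed above. It is not the paper's architecture; all filter counts, kernel sizes and the 6000-way output are assumptions made only for illustration.

        import torch
        import torch.nn as nn

        cnn = nn.Sequential(
            nn.Conv2d(in_channels=3, out_channels=128, kernel_size=(9, 9)),  # static + delta + delta-delta maps
            nn.ReLU(),
            nn.MaxPool2d(kernel_size=(3, 1)),           # pool along frequency only
            nn.Conv2d(128, 256, kernel_size=(4, 3)),
            nn.ReLU(),
            nn.Flatten(),
            nn.LazyLinear(1024), nn.ReLU(), nn.Dropout(p=0.5),
            nn.LazyLinear(1024), nn.ReLU(), nn.Dropout(p=0.5),
            nn.LazyLinear(6000),                        # context-dependent HMM state targets
        )

        x = torch.randn(8, 3, 40, 11)                   # a batch of 11-frame, 40-band log-mel windows
        print(cnn(x).shape)                             # torch.Size([8, 6000])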

    REI: An integrated measure for software reusability

    To capitalize upon the benefits of software reuse, an efficient selection among candidate reusable assets should be performed in terms of functional fitness and adaptability. The reusability of assets is usually measured through reusability indices. However, these do not capture all facets of reusability, such as structural characteristics, external quality attributes, and documentation. In this paper, we propose a reusability index (REI) as a synthesis of various software metrics and evaluate its ability to quantify reuse, based on the IEEE Standard on Software Metrics Validity. The proposed index is compared with existing ones through a case study on 80 reusable open-source assets. To illustrate the applicability of the proposed index, we performed a pilot study in which real-world reuse decisions were compared with decisions imposed by the use of metrics (including REI). The results of the study suggest that the proposed index presents the highest predictive and discriminative power; it is the most consistent in ranking reusable assets and the most strongly correlated to their levels of reuse. The findings of the paper are discussed to understand the most important aspects in reusability assessment (interpretation of results), and interesting implications for research and practice are provided.
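
    The abstract does not reproduce REI's actual synthesis formula, so the Python snippet below only illustrates the general idea of an index built as a weighted combination of normalised metric values covering structure, external quality and documentation; every name, weight and value here is a hypothetical placeholder.

        def reusability_index(metrics, weights):
            """metrics : dict of metric name -> value normalised to [0, 1]
               weights : dict of metric name -> relative importance (summing to 1)"""
            return sum(weights[name] * metrics.get(name, 0.0) for name in weights)

        asset = {
            "structural_quality": 0.8,   # e.g. low coupling, high cohesion
            "external_quality":   0.6,   # e.g. reported defect density, reliability
            "documentation":      0.9,   # e.g. comment and API-doc coverage
        }
        weights = {"structural_quality": 0.4, "external_quality": 0.3, "documentation": 0.3}
        print(round(reusability_index(asset, weights), 2))   # 0.77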

    Precision medicine and artificial intelligence: a pilot study on deep learning for hypoglycemic events detection based on ECG

    Tracking the fluctuations in blood glucose levels is important for healthy subjects and crucial for diabetic patients. Tight glucose monitoring reduces the risk of hypoglycemia, which can result in a series of complications, especially in diabetic patients, such as confusion, irritability, and seizures, and can even be fatal under specific conditions. Hypoglycemia affects the electrophysiology of the heart. However, due to strong inter-subject heterogeneity, previous studies based on a cohort of subjects failed to deploy electrocardiogram (ECG)-based hypoglycemic detection systems reliably. The current study used a personalised medicine approach and Artificial Intelligence (AI) to automatically detect nocturnal hypoglycemia using a few heartbeats of raw ECG signal recorded with non-invasive, wearable devices in healthy individuals monitored 24 hours a day for 14 consecutive days. Additionally, we present a visualisation method enabling clinicians to visualise which part of the ECG signal (e.g., T-wave, ST-interval) is significantly associated with the hypoglycemic event in each subject, overcoming the intelligibility problem of deep-learning methods. These results advance the feasibility of a real-time, non-invasive hypoglycemia alarming system using short excerpts of ECG signal.
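
    As a rough illustration of the kind of per-subject classifier described above, the PyTorch sketch below maps a short raw-ECG excerpt (a few heartbeats) to a hypoglycemia probability with a small 1-D CNN. The sampling rate, window length and layer sizes are assumptions for the example only; the study's actual architecture and visualisation method are not reproduced here.

        import torch
        import torch.nn as nn

        class ECGHypoNet(nn.Module):
            def __init__(self):
                super().__init__()
                self.features = nn.Sequential(
                    nn.Conv1d(1, 16, kernel_size=7, stride=2), nn.ReLU(),
                    nn.Conv1d(16, 32, kernel_size=5, stride=2), nn.ReLU(),
                    nn.AdaptiveAvgPool1d(1),
                )
                self.classifier = nn.Linear(32, 1)

            def forward(self, ecg):                      # ecg: (batch, 1, samples)
                h = self.features(ecg).squeeze(-1)
                return torch.sigmoid(self.classifier(h)) # P(hypoglycemia) per excerpt

        model = ECGHypoNet()                             # trained separately for each subject
        window = torch.randn(4, 1, 1500)                 # e.g. ~6 s of ECG at 250 Hz
        print(model(window).shape)                       # torch.Size([4, 1])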
