76 research outputs found

    Long-Running Speech Recognizer: An End-to-End Multi-Task Learning Framework for Online ASR and VAD

    When an end-to-end automatic speech recognition (E2E-ASR) system is used in real-world applications, a voice activity detection (VAD) system is usually needed to improve performance and to reduce computational cost by discarding the non-speech parts of the audio. This paper presents a novel end-to-end (E2E), multi-task learning (MTL) framework that integrates ASR and VAD into one model. The proposed system, which we refer to as the Long-Running Speech Recognizer (LR-SR), learns ASR and VAD jointly from two separate task-specific datasets in the training stage. With the assistance of VAD, ASR performance improves because its connectionist temporal classification (CTC) loss function can leverage the VAD alignment information. In the inference stage, the LR-SR system removes non-speech parts at low computational cost and recognizes speech parts with high robustness. Experimental results on segmented speech data show that the proposed MTL framework outperforms the baseline single-task learning (STL) framework on the ASR task. On unsegmented speech data, we find that the LR-SR system outperforms baseline ASR systems that rely on an additional GMM-based or DNN-based voice activity detector. Comment: 5 pages, 2 figures
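The inference-stage idea in the abstract above (drop non-speech frames, pass only speech to the recognizer) can be illustrated with a minimal sketch. This is not the LR-SR model: the function names `speech_segments` and `gate_frames`, the 0.5 threshold, and the minimum segment length are all hypothetical choices, and the per-frame speech probabilities are assumed to come from some VAD output.

```python
from typing import List, Sequence, Tuple


def speech_segments(probs: Sequence[float], threshold: float = 0.5,
                    min_frames: int = 3) -> List[Tuple[int, int]]:
    """Return [start, end) frame ranges whose speech probability stays
    at or above `threshold` for at least `min_frames` consecutive frames."""
    segments: List[Tuple[int, int]] = []
    start = None  # index where the current candidate segment began
    for i, p in enumerate(probs):
        if p >= threshold:
            if start is None:
                start = i
        elif start is not None:
            # The run of speech frames just ended; keep it if long enough.
            if i - start >= min_frames:
                segments.append((start, i))
            start = None
    # Close a segment that runs to the end of the input.
    if start is not None and len(probs) - start >= min_frames:
        segments.append((start, len(probs)))
    return segments


def gate_frames(frames: Sequence, probs: Sequence[float],
                threshold: float = 0.5, min_frames: int = 3) -> List[Sequence]:
    """Keep only the frame runs judged to be speech (hypothetical helper);
    a recognizer would then be applied to each returned run."""
    return [frames[s:e] for s, e in speech_segments(probs, threshold, min_frames)]
```

Short bursts above the threshold are discarded by the `min_frames` check, which is one simple way to avoid sending isolated noisy frames to the recognizer.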

    Voice activity detection algorithm based on short-term characteristics under high-noise conditions

    A voice activity detection algorithm is an important component of audio information processing systems. The effectiveness of most such algorithms drops considerably in the presence of noise. An ideal detector is characterized by high reliability, robustness to noise interference, and simplicity of implementation. This paper proposes an algorithm based on three short-term characteristics that reduces the effect of noise on the quality of voiced-segment detection while keeping the implementation simple.
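A detector built on short-term characteristics can be sketched as follows. The abstract does not name its three characteristics, so the choice here (short-time energy, zero-crossing rate, and normalized lag-1 autocorrelation) is an assumption, as are the function names, the frame length, and every threshold value.

```python
import math
from typing import List, Sequence, Tuple


def frame_features(frame: Sequence[float]) -> Tuple[float, float, float]:
    """Compute three short-term features for one frame (assumed set):
    mean energy, zero-crossing rate, normalized lag-1 autocorrelation."""
    n = len(frame)
    total = sum(x * x for x in frame)
    energy = total / n
    # Fraction of adjacent sample pairs whose signs differ.
    zcr = sum(1 for a, b in zip(frame, frame[1:]) if (a >= 0) != (b >= 0)) / (n - 1)
    # Near 1 for slowly varying (voiced-like) frames, low or negative for noise.
    acf1 = sum(a * b for a, b in zip(frame, frame[1:])) / total if total > 0 else 0.0
    return energy, zcr, acf1


def detect_voiced(signal: Sequence[float], frame_len: int = 160,
                  energy_min: float = 1e-3, zcr_max: float = 0.3,
                  acf_min: float = 0.5) -> List[bool]:
    """Flag each non-overlapping frame as voiced when all three features
    pass their (hypothetical) thresholds."""
    flags = []
    for start in range(0, len(signal) - frame_len + 1, frame_len):
        e, z, r = frame_features(signal[start:start + frame_len])
        flags.append(e >= energy_min and z <= zcr_max and r >= acf_min)
    return flags


def demo_signal() -> List[float]:
    """Three 160-sample frames: a 200 Hz tone at 8 kHz, silence, and a
    sign-alternating sequence standing in for high-frequency noise."""
    tone = [0.5 * math.sin(2 * math.pi * 200 * n / 8000) for n in range(160)]
    silence = [0.0] * 160
    noise = [0.5 if n % 2 == 0 else -0.5 for n in range(160)]
    return tone + silence + noise
```

Calling `detect_voiced(demo_signal())` marks only the tone frame as voiced: the silent frame fails the energy test, and the alternating frame fails the zero-crossing test even though its energy is high, which is the kind of noise rejection a single energy threshold cannot provide.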

    Speech Recognition

    Chapters in the first part of the book cover the essential speech processing techniques for building robust automatic speech recognition systems: the representation of speech signals and methods for speech feature extraction, acoustic and language modeling, efficient algorithms for searching the hypothesis space, and multimodal approaches to speech recognition. The last part of the book is devoted to other speech processing applications that can use information from automatic speech recognition: speaker identification and tracking, prosody modeling in emotion-detection systems, and other applications able to operate in real-world environments, such as mobile communication services and smart homes.

    Interactive Learning of Probabilistic Decision Making by Service Robots with Multiple Skill Domains

    This thesis contributes to autonomous service robots in two respects. The first is modeling decision making under incomplete information on top of the diverse basic skills of a service robot. The second, building on such a model, is investigating how to transfer complex decision-making knowledge into the system. Interactive learning, both from demonstrations by human teachers and through interaction with objects, yields decision-making models that the robot can apply.

    IberSPEECH 2020: XI Jornadas en Tecnología del Habla and VII Iberian SLTech

    IberSPEECH2020 is a two-day event that brings together researchers and practitioners in speech and language technologies for Iberian languages to promote interaction and discussion. The organizing committee has planned a wide variety of scientific and social activities, including technical paper presentations, keynote lectures, project presentations, laboratory activities, recent PhD theses, discussion panels, a round table, and awards for the best thesis and papers. The program of IberSPEECH2020 includes a total of 32 contributions, distributed among 5 oral sessions, a PhD session, and a projects session. To ensure the quality of the contributions, each submitted paper was reviewed by three members of the scientific review committee. All the papers in the conference will be accessible through the International Speech Communication Association (ISCA) Online Archive. Paper selection was based on the scores and comments provided by the scientific review committee, which includes 73 researchers from different institutions (mainly from Spain and Portugal, but also from France, Germany, Brazil, Iran, Greece, Hungary, Czech Republic, Ukraine, and Slovenia). Furthermore, extended versions of selected papers are confirmed for publication as a special issue of the Journal of Applied Sciences, “IberSPEECH 2020: Speech and Language Technologies for Iberian Languages”, published by MDPI with fully open access. In addition to regular paper sessions, the IberSPEECH2020 scientific program features the following activities: the ALBAYZIN evaluation challenge session. Red Española de Tecnologías del Habla. Universidad de Valladolid

    Adaptive Cognitive Interaction Systems

    Adaptive cognitive interaction systems observe and model the state of their user and adapt the system's behavior accordingly. Such a system consists of three components: the empirical cognitive model, the computational cognitive model, and the adaptive interaction manager. This thesis makes numerous contributions to the development of these components and to their combination. The results are validated in numerous user studies.

    A Study of a Real-Time ADL Recognition Method Based on Everyday Life Sounds

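    Services that adapt their control to the situation based on human behavior and emotions are attracting attention, but for such services to be useful, the low-level information obtained from sensor data is essential for deriving high-level information. In this study, with the aim of capturing ADL (activities of daily living) and emotions, we developed a system that recognizes everyday life sounds and non-verbal sounds in real time while distinguishing them from speech and noise. We first examined whether the recognition method used in prior work on real-time recognition of many kinds of non-verbal and everyday sounds could also be applied to the sounds targeted here. This revealed two issues, namely that speech and non-verbal sounds were not assumed to coexist and that no countermeasures were taken against false detections caused by noise input, and led to the hypothesis that the method is not well suited to recognizing speech and non-verbal sounds. We therefore surveyed existing research on methods for distinguishing target sounds from speech and noise and on state-definition methods suited to non-verbal sound recognition, and proposed a recognition method that also takes the requirements of real-time recognition into account. We then implemented real-time recognition using a speech recognition engine suited to the proposed method. To verify the recognition accuracy of the proposed method, we carried out three kinds of evaluation using audio from various speakers and environments. Classification among non-verbal sounds using pseudo-phoneme sequence definitions gave reasonable results, but once the processing needed for real-time recognition from continuous audio was included, issues remained with the detection rate of non-verbal sounds and with false detections of non-verbal sounds caused by noise input. On the other hand, false detections of everyday and non-verbal sounds caused by speech were suppressed, and everyday sounds were recognized relatively accurately with the exception of one type. In addition, laughter occurring during an utterance was detected at a rate of about 65% with the same settings as for real-time recognition, so we consider the method effective for real-time laughter detection from continuous speech. The University of Electro-Communications (電気通信大学), 201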

    Models and analysis of vocal emissions for biomedical applications: 5th International Workshop: December 13-15, 2007, Firenze, Italy

    The proceedings of the MAVEBA Workshop, which is held on a biannual basis, collect the scientific papers presented as oral and poster contributions during the conference. The main subjects are the development of theoretical and mechanical models as an aid to the study of the main phonatory dysfunctions, and biomedical engineering methods for the analysis of voice signals and images as a support to clinical diagnosis and classification of vocal pathologies. The Workshop is sponsored by Ente Cassa Risparmio di Firenze, COST Action 2103, Biomedical Signal Processing and Control Journal (Elsevier Eds.), and IEEE Biomedical Engineering Soc. Special issues of international journals have been, and will be, published, collecting selected papers from the conference.