123 research outputs found

    Efficient Implementation of the Room Simulator for Training Deep Neural Network Acoustic Models

    Full text link
    In this paper, we describe how to efficiently implement an acoustic room simulator to generate large-scale simulated data for training deep neural networks. Even though the Google Room Simulator in [1] was shown to be quite effective in reducing Word Error Rates (WERs) for far-field applications by generating simulated far-field training sets, it requires a very large number of Fast Fourier Transforms (FFTs) of large size. The Room Simulator in [1] used approximately 80 percent of the Central Processing Unit (CPU) usage in our CPU + Graphics Processing Unit (GPU) training architecture [2]. In this work, we implement efficient OverLap Addition (OLA) based filtering using the open-source FFTW3 library. Further, we investigate the effects of Room Impulse Response (RIR) lengths. Experimentally, we conclude that we can cut the tail portions of RIRs whose power is less than 20 dB below the maximum power without sacrificing speech recognition accuracy. However, we observe that cutting the RIR tail beyond this threshold harms speech recognition accuracy on re-recorded test sets. Using these approaches, we were able to reduce the CPU usage of the room simulator portion to 9.69 percent in the CPU/GPU training architecture. Profiling shows that we obtain a 22.4-times speed-up on a single machine and a 37.3-times speed-up on Google's distributed training infrastructure.

    Comment: Published at INTERSPEECH 2018. (https://www.isca-speech.org/archive/Interspeech_2018/abstracts/2566.html)
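
    The two ideas above translate directly into code. Below is a minimal sketch, not the paper's implementation: it uses numpy's FFT in place of FFTW3, and the block size, function names and the exact form of the -20 dB truncation rule are assumptions for illustration.

```python
import numpy as np

def truncate_rir(rir, threshold_db=20.0):
    """Drop the RIR tail whose sample power falls more than `threshold_db`
    below the peak power (one possible reading of the truncation rule above)."""
    power = rir ** 2
    keep = np.nonzero(power >= power.max() * 10.0 ** (-threshold_db / 10.0))[0]
    return rir[: keep[-1] + 1]

def ola_filter(signal, rir, block_size=4096):
    """Overlap-add FIR filtering: convolve `signal` with `rir` using many
    small FFTs instead of one large FFT over the whole utterance."""
    m = len(rir)
    n_fft = 1
    while n_fft < block_size + m - 1:   # FFT must hold block_size + m - 1 samples
        n_fft *= 2
    rir_fft = np.fft.rfft(rir, n_fft)
    out = np.zeros(len(signal) + m - 1)
    for start in range(0, len(signal), block_size):
        block = signal[start:start + block_size]
        seg = np.fft.irfft(np.fft.rfft(block, n_fft) * rir_fft, n_fft)
        out[start:start + n_fft] += seg[: len(out) - start]   # overlap-add
    return out

# usage: reverberate a random "utterance" with a truncated random "RIR"
sig, rir = np.random.randn(16000), np.random.randn(8000) * np.exp(-np.arange(8000) / 800)
reverberant = ola_filter(sig, truncate_rir(rir))
```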

    Using biomarkers and early prophylactic treatment to prevent cardiotoxicity in cancer patients on chemotherapy

    Get PDF
    Cardiac toxicity induced by anticancer therapy is of considerable concern because, once it develops, it may compromise the clinical effectiveness of treatment independently of the oncologic prognosis. The main strategy to minimize cardiotoxicity is to detect high-risk patients and begin prophylactic treatment as early as possible. Under the current standard for monitoring cardiac function, cardiotoxicity is usually detected only once a functional impairment has already occurred, thus precluding any chance of prevention. The measurement of cardio-specific biomarkers can be a valid diagnostic tool for the early identification, assessment and monitoring of cardiotoxicity. The role of Troponin I in identifying patients with subclinical cardiotoxicity, and their subsequent treatment with angiotensin-converting enzyme inhibitors to prevent left ventricular ejection fraction (LVEF) reduction and cardiac events, is emerging as an effective strategy against these complications. When this approach is not feasible, a complete LVEF recovery and a reduction in cardiac events may still be achieved if left ventricular dysfunction (LVD) is detected early and the patient is promptly treated with angiotensin-converting enzyme inhibitors, possibly in combination with beta-blocking agents.

    Intelligent Roundabout Insertion using Deep Reinforcement Learning

    Full text link
    An important topic in autonomous driving research is the development of maneuver planning systems. Vehicles have to interact and negotiate with each other so that optimal choices, in terms of time and safety, are taken. For this purpose, we present a maneuver planning module able to negotiate entry into busy roundabouts. The proposed module is based on a neural network trained to predict when and how to enter the roundabout throughout the whole duration of the maneuver. Our model is trained with a novel implementation of A3C, which we call Delayed A3C (D-A3C), in a synthetic environment where vehicles move in a realistic manner with interaction capabilities. In addition, the system is trained such that agents feature a unique tunable behavior, emulating real-world scenarios where drivers have their own driving styles. Similarly, the maneuver can be performed using different aggressiveness levels, which is particularly useful for managing busy scenarios where conservative rule-based policies would result in indefinite waits.
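
    As a rough illustration of how a tunable aggressiveness level can condition the maneuver policy, here is a minimal sketch (not the D-A3C training code): a toy actor head whose action distribution depends on the state plus a scalar aggressiveness input. The network shape, the two-action {wait, enter} space and the random weights are assumptions for illustration only.

```python
import numpy as np

rng = np.random.default_rng(0)

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

class TinyPolicy:
    """Toy actor head: maps (state features, aggressiveness) to a distribution
    over {wait, enter}. Stand-in for a trained actor; weights are random."""
    def __init__(self, state_dim, hidden=32, n_actions=2):
        self.w1 = rng.normal(0.0, 0.1, (state_dim + 1, hidden))  # +1 for the aggressiveness scalar
        self.w2 = rng.normal(0.0, 0.1, (hidden, n_actions))

    def act(self, state, aggressiveness):
        x = np.concatenate([state, [aggressiveness]])  # condition on driving style
        h = np.tanh(x @ self.w1)
        probs = softmax(h @ self.w2)
        return rng.choice(len(probs), p=probs), probs

policy = TinyPolicy(state_dim=8)
action, probs = policy.act(state=np.zeros(8), aggressiveness=0.8)  # 0 = cautious, 1 = aggressive
```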

    From Simulation to Real World Maneuver Execution using Deep Reinforcement Learning

    Full text link
    Deep Reinforcement Learning has proved able to solve many control tasks in different fields, but the behavior of these systems is not always as expected when deployed in real-world scenarios. This is mainly due to the lack of domain adaptation between simulated and real-world data, together with the absence of a distinction between training and test datasets. In this work, we investigate these problems in the autonomous driving field, especially for a maneuver planning module for roundabout insertions. In particular, we present a system based on multiple environments in which agents are trained simultaneously, evaluating the behavior of the model in different scenarios. Finally, we analyze techniques aimed at reducing the gap between simulated and real-world data, showing that they increase the generalization capabilities of the system on both unseen and real-world scenarios.

    Comment: Intelligent Vehicle Symposium 2020 (IV2020)
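
    A minimal sketch of the multi-environment training idea described above, assuming a hypothetical `RoundaboutEnv` interface: rollouts are interleaved across several scenario instances so that each policy update sees a mixture of scenarios. The environment dynamics, scenario names and policy are placeholders.

```python
import random

class RoundaboutEnv:
    """Hypothetical environment wrapper; each instance is built from a different
    scenario (geometry, traffic density) so agents never train on one fixed layout."""
    def __init__(self, scenario):
        self.scenario = scenario

    def reset(self):
        return {"scenario": self.scenario, "step": 0}

    def step(self, action):
        # placeholder dynamics: zero reward, episode ends immediately
        return {"scenario": self.scenario}, 0.0, True, {}

def collect_rollouts(envs, policy, episodes_per_env=4):
    """Interleave episodes from all environments so that each update batch mixes
    scenarios (held-out test scenarios would be kept out of `envs`)."""
    batch = []
    for _ in range(episodes_per_env):
        for env in random.sample(envs, len(envs)):   # shuffle environment order
            obs, done = env.reset(), False
            while not done:
                action = policy(obs)
                obs, reward, done, _ = env.step(action)
                batch.append((obs, action, reward))
    return batch

train_envs = [RoundaboutEnv(s) for s in ["A", "B", "C"]]
batch = collect_rollouts(train_envs, policy=lambda obs: 0)
```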

    Restoring Punctuation and Capitalization in Transcribed Speech

    Get PDF
    Adding punctuation and capitalization greatly improves the readability of automatic speech transcripts. We discuss an approach for performing both tasks in a single pass using a purely text-based n-gram language model. We study the effect on performance of varying the n-gram order (from n = 3 to n = 6) and the amount of training data (from 58 million to 55 billion tokens). Our results show that using larger training data sets consistently improves performance, while increasing the n-gram order does not help nearly as much.
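
    The single-pass idea can be sketched as a simple beam search: each ASR word is expanded into cased variants with optional trailing punctuation, and hypotheses are scored with the n-gram LM. The `lm_logprob` function below is a toy stand-in for a real trained model, and the beam width, punctuation set and tokenization are assumptions.

```python
import heapq

def lm_logprob(context, token):
    """Toy stand-in for the n-gram LM score log p(token | context); a real
    system would query a large trained model with backoff."""
    return -len(token) * 0.1

PUNCT = ["", ",", ".", "?"]   # punctuation treated as separate LM tokens

def restore(words, beam_width=8, order=4):
    """Single left-to-right pass: expand each word into cased variants with
    optional trailing punctuation, score with the LM, keep the best beam."""
    beams = [(0.0, [])]                              # (log score, output tokens)
    for w in words:
        candidates = []
        for score, out in beams:
            for cased in {w, w.capitalize()}:
                for p in PUNCT:
                    toks = [cased] + ([p] if p else [])
                    s, hist = score, out[:]
                    for t in toks:
                        s += lm_logprob(tuple(hist[-(order - 1):]), t)
                        hist.append(t)
                    candidates.append((s, hist))
        beams = heapq.nlargest(beam_width, candidates, key=lambda c: c[0])
    return " ".join(max(beams, key=lambda c: c[0])[1])

print(restore("hello how are you".split()))
```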

    A Session Subtyping Tool

    Get PDF
    Session types are becoming popular and have been integrated in several mainstream programming languages. Nevertheless, while many programming languages consider asynchronous FIFO channel communication, the notion of subtyping used in session type implementations is the one defined by Gay and Hole for synchronous communication. This might be because there are several notions of asynchronous session subtyping, these notions are usually undecidable, and only recently have sound (but not complete) algorithmic characterizations of these subtypings been proposed. But the fact that the definition of asynchronous session subtyping and the theory behind the related algorithms are not easily accessible to non-experts may also prevent further integration. The aim of this paper, and of the tool presented therein, is to make the growing body of knowledge about asynchronous session subtyping more accessible, thus promoting its integration in practical applications of session types.

    Improved Name-Recognition with Meta-data Dependent Name Networks

    Get PDF
    A transcription system that requires accurate general name transcription is faced with the problem of covering the large number of names it may encounter. Without any prior knowledge, this requires a large increase in the size and complexity of the system due to the expansion of the lexicon. Furthermore, this increase will adversely affect system performance due to the increased confusability. Here we propose a method that uses meta-data, available at runtime, to ensure better name coverage without significantly increasing system complexity. We tested this approach on a voicemail transcription task and assumed meta-data to be available in the form of a caller ID string (as it would show up on a caller-ID-enabled phone) and the name of the mailbox owner. Networks representing possible spoken realizations of those names are generated at runtime and included in the network of the decoder. The decoder network is built at training time using a class-dependent language model, with caller and mailbox name instances modeled as class tokens. The class tokens are replaced at test time with the name networks built from the meta-data. The proposed algorithm showed a 22.1% reduction in the error rate of name tokens.
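
    A minimal sketch of the runtime class-token replacement described above. The caller-ID format, the `@CALLER_NAME@` / `@MAILBOX_NAME@` token names and the toy list-of-words "decoder network" are hypothetical; a real system would splice name networks (e.g. FSTs) into the decoding graph.

```python
import itertools
import re

def name_variants(caller_id):
    """Possible spoken realizations of a caller-ID string such as
    'SMITH, JOHN' (hypothetical format): single names and ordered pairs."""
    parts = [p.capitalize() for p in re.split(r"[,\s]+", caller_id.strip()) if len(p) > 1]
    variants = set(parts)                       # single-letter initials are dropped above
    for a, b in itertools.permutations(parts, 2):
        variants.add(f"{a} {b}")
    return sorted(variants)

def expand_class_tokens(decoder_tokens, caller_id, mailbox_owner):
    """Replace class tokens in a toy token-list 'decoder network' with the
    name networks built at runtime from the available meta-data."""
    tables = {"@CALLER_NAME@": name_variants(caller_id),
              "@MAILBOX_NAME@": name_variants(mailbox_owner)}
    expanded = []
    for tok in decoder_tokens:
        expanded.extend(tables.get(tok, [tok]))
    return expanded

tokens = ["this", "is", "@CALLER_NAME@", "calling", "for", "@MAILBOX_NAME@"]
print(expand_class_tokens(tokens, "SMITH, JOHN", "JANE DOE"))
```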

    Multi-Dialect Speech Recognition With A Single Sequence-To-Sequence Model

    Full text link
    Sequence-to-sequence models provide a simple and elegant solution for building speech recognition systems by folding the separate components of a typical system, namely the acoustic (AM), pronunciation (PM) and language (LM) models, into a single neural network. In this work, we look at one such sequence-to-sequence model, namely Listen, Attend and Spell (LAS), and explore the possibility of training a single model to serve different English dialects, which simplifies the process of training multi-dialect systems without the need for separate AM, PM and LMs for each dialect. We show that simply pooling the data from all dialects into one LAS model falls behind the performance of a model fine-tuned on each dialect. We then look at incorporating dialect-specific information into the model, both by modifying the training targets, inserting the dialect symbol at the end of the original grapheme sequence, and by feeding a 1-hot representation of the dialect information into all layers of the model. Experimental results on seven English dialects show that our proposed system is effective in modeling dialect variations within a single LAS model, outperforming a LAS model trained individually on each of the seven dialects by 3.1 to 16.5% relative.

    Comment: submitted to ICASSP 2018
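
    The two ways of injecting dialect information map to very small code changes, sketched below with numpy. The dialect codes, symbol format and tensor shapes are illustrative assumptions, not the paper's exact setup.

```python
import numpy as np

DIALECTS = ["us", "gb", "au", "in", "za", "nz", "ie"]   # seven English dialects (illustrative codes)

def add_dialect_target(graphemes, dialect):
    """Variant 1: append the dialect symbol to the end of the grapheme targets."""
    return graphemes + [f"<{dialect}>"]

def condition_layer_input(features, dialect):
    """Variant 2: concatenate a 1-hot dialect vector to every frame of a layer's
    input (shapes are illustrative)."""
    one_hot = np.zeros(len(DIALECTS), dtype=features.dtype)
    one_hot[DIALECTS.index(dialect)] = 1.0
    return np.concatenate([features, np.tile(one_hot, (features.shape[0], 1))], axis=1)

targets = add_dialect_target(list("hello"), "gb")                 # ['h','e','l','l','o','<gb>']
layer_in = condition_layer_input(np.zeros((100, 256), dtype=np.float32), "gb")  # shape (100, 263)
```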