
30 research outputs found

Low latency modeling of temporal contexts for speech recognition

Author: Peddinti Vijayaditya
Publication venue: 'The Busan Gyeongnam Mathematical Society'
Publication date: 22/05/2018

This thesis focuses on the development of neural network acoustic models for large vocabulary continuous speech recognition (LVCSR) that satisfy the design goals of low latency and low computational complexity. Low latency enables online speech recognition, and low computational complexity reduces the computational cost during both training and inference. Long-span sequential dependencies and sequential distortions in the input vector sequence are a major challenge in acoustic modeling. Recurrent neural networks have been shown to model these dependencies effectively. Specifically, bidirectional long short-term memory (BLSTM) networks provide state-of-the-art performance across several LVCSR tasks. However, deploying bidirectional models for online LVCSR is non-trivial because of their large latency, so unidirectional LSTM models are typically preferred. In this thesis we explore the use of hierarchical temporal convolution to model long-span temporal dependencies. We propose a sub-sampled variant of these temporal convolution neural networks, termed time-delay neural networks (TDNNs). These sub-sampled TDNNs reduce the computational complexity by ~5x, compared to TDNNs, during frame-randomized pre-training. These models are shown to be effective in modeling long-span temporal contexts; however, there is a performance gap compared to (B)LSTMs. As recent advancements in acoustic model training have eliminated the need for frame-randomized pre-training, we modify the TDNN architecture to use higher sampling rates, as the increased computation can be amortized over the sequence. These variants of sub-sampled TDNNs provide performance superior to unidirectional LSTM networks, while also affording a lower real-time factor (RTF) during inference. However, we show that the BLSTM models outperform both the TDNN and LSTM models. We propose a hybrid architecture interleaving temporal convolution and LSTM layers, which is shown to outperform the BLSTM models. Further, we improve these BLSTM models by using higher frame rates at lower layers and show that the proposed TDNN-LSTM model performs similarly to these superior BLSTM models, while reducing the overall latency to 200 ms. Finally, we describe an online system for reverberation-robust ASR, using the above-described models in conjunction with data augmentation techniques such as reverberation simulation, which simulates far-field environments, and volume perturbation, which helps tackle volume variation even without gain normalization.
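
The effect of sub-sampling on a stack of temporal convolutions can be sketched as follows. The per-layer splice offsets below are illustrative assumptions, not values taken from the abstract, and the function names are hypothetical. The sketch only counts how many times each layer must be evaluated to produce one output frame; it does not model multiply-adds and is not meant to reproduce the ~5x figure quoted above.

```python
# Splice offsets per layer, listed bottom (input side) to top.
# Both configurations are assumed for illustration only.
DENSE = [
    tuple(range(-2, 3)),   # frames t-2 .. t+2
    tuple(range(-1, 3)),   # frames t-1 .. t+2
    tuple(range(-3, 4)),   # frames t-3 .. t+3
    tuple(range(-7, 3)),   # frames t-7 .. t+2
]
SUBSAMPLED = [
    (-2, -1, 0, 1, 2),     # the input layer stays dense
    (-1, 2),               # higher layers keep only the two end points
    (-3, 3),
    (-7, 2),
]


def input_context(layer_offsets):
    """Return (left, right) input context in frames for one output frame."""
    left = sum(-min(offsets) for offsets in layer_offsets)
    right = sum(max(offsets) for offsets in layer_offsets)
    return left, right


def evaluations_per_layer(layer_offsets):
    """Count the time steps at which each layer is evaluated for one output.

    Walks from the top layer down, tracking the set of time indices that
    the layer above requests from the layer below.
    """
    needed = {0}           # the single output frame at time t
    counts = []
    for offsets in reversed(layer_offsets):
        counts.append(len(needed))
        needed = {t + o for t in needed for o in offsets}
    return list(reversed(counts))


if __name__ == "__main__":
    for name, net in (("dense", DENSE), ("sub-sampled", SUBSAMPLED)):
        print(name, input_context(net), evaluations_per_layer(net))
```

Under these assumed offsets both networks see the same input context, while the sub-sampled one evaluates far fewer hidden activations per output frame.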

JScholarship

Total coloring of 1-toroidal graphs of maximum degree at least 11 and no adjacent triangles

Author: AV Kostochka
DP Sanders
G Ringel
HP Yap
M Rosenfeld
N Vijayaditya
OV Borodin
Tao Wang
TR Jensen
VG Vizing
X Zhang
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 17/11/2018

A total coloring of a graph $G$ is an assignment of colors to the vertices and the edges of $G$ such that every pair of adjacent/incident elements receive distinct colors. The total chromatic number of a graph $G$, denoted by $\chi''(G)$, is the minimum number of colors in a total coloring of $G$. The well-known Total Coloring Conjecture (TCC) says that every graph with maximum degree $\Delta$ admits a total coloring with at most $\Delta + 2$ colors. A graph is 1-toroidal if it can be drawn on the torus such that every edge crosses at most one other edge. In this paper, we investigate the total coloring of 1-toroidal graphs, and prove that the TCC holds for the 1-toroidal graphs with maximum degree at least 11 and some restrictions on the triangles. Consequently, if $G$ is a 1-toroidal graph with maximum degree $\Delta$ at least 11 and without adjacent triangles, then $G$ admits a total coloring with at most $\Delta + 2$ colors.

Comment: 10 pages
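
The definition of a total coloring can be checked mechanically. The following is a minimal sketch, assuming a simple graph given as a list of vertex pairs; the function names and the triangle example are hypothetical and not taken from the paper.

```python
from itertools import combinations


def max_degree(edges):
    """Return the maximum vertex degree of a simple graph given by its edges."""
    degree = {}
    for u, v in edges:
        degree[u] = degree.get(u, 0) + 1
        degree[v] = degree.get(v, 0) + 1
    return max(degree.values())


def is_total_coloring(edges, color):
    """Check that `color` is a total coloring.

    `color` maps every vertex and every edge (as a frozenset of its two
    end points) to a color. Adjacent vertices, adjacent edges, and each
    edge with its own end points must all receive distinct colors.
    """
    edge_sets = [frozenset(e) for e in edges]
    for e in edge_sets:
        u, v = tuple(e)
        if color[u] == color[v]:
            return False          # adjacent vertices clash
        if color[e] in (color[u], color[v]):
            return False          # edge clashes with an end point
    for e, f in combinations(edge_sets, 2):
        if e & f and color[e] == color[f]:
            return False          # two edges sharing a vertex clash
    return True


# Example: a triangle, which has maximum degree 2.
TRIANGLE = [("a", "b"), ("b", "c"), ("a", "c")]
COLORING = {
    "a": 0, "b": 1, "c": 2,
    frozenset(("a", "b")): 2,
    frozenset(("b", "c")): 0,
    frozenset(("a", "c")): 1,
}

if __name__ == "__main__":
    used = len(set(COLORING.values()))
    print(is_total_coloring(TRIANGLE, COLORING), used, max_degree(TRIANGLE) + 2)
```

In this example the triangle is totally colored with three colors, which is within the $\Delta + 2 = 4$ bound stated by the conjecture.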

arXiv.org e-Print Archive

Crossref

NICNET - a Hierarchic distributed computer-communication network for decision support in the Indian Government

Author: Bobde D. P.
Kutty K. K. K.
Moni M.
Seshagiri N.
Sharma Y. K.
Vijayaditya N.
Publication venue: Proceedings of the International Conference on Computer-Communication (CCDC-'87), New Delhi
Publication date: 01/01/1987

A decision support information system for the Indian Government is being evolved, based on the design of a predominantly query-based computer network with hierarchic distributed databases and random access communication. The four-level hierarchy spans 439 districts at the lowest level, the Central Government headquarters in New Delhi, the set of 32 State Capitals and Union Territories, and the set of four Regional Centres. With interference tolerance and random access as the two guiding principles behind the choice, Spread Spectrum transmission and a Code Division Multiple Access system of satellite communication were adopted. Each node of the network is a 32-bit computer capable of local bulk storage of up to three units of 300 megabytes each for query-accessible distributed databases. The design and implementation of such a distributed database has endowed the network with the capability to distribute the data related to such databases over various nodes in the network, so that a query can be accepted from any of the nodes.
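
The figures in this abstract can be combined in a short back-of-the-envelope calculation. This is a sketch only: it assumes exactly one node per listed site and that every node is fitted with the maximum storage, neither of which the abstract states. All names are hypothetical.

```python
# Sites per hierarchy level, as listed in the abstract.
LEVELS = {
    "central headquarters (New Delhi)": 1,
    "regional centres": 4,
    "state capitals and union territories": 32,
    "districts": 439,
}

UNITS_PER_NODE = 3     # "up to three units" of bulk storage per node
MB_PER_UNIT = 300      # each unit holds 300 megabytes


def total_sites(levels):
    """Sum the sites over all levels (assumes one node per site)."""
    return sum(levels.values())


def max_storage_mb(nodes):
    """Upper bound on aggregate bulk storage if every node is fully fitted."""
    return nodes * UNITS_PER_NODE * MB_PER_UNIT


if __name__ == "__main__":
    nodes = total_sites(LEVELS)
    print(nodes, UNITS_PER_NODE * MB_PER_UNIT, max_storage_mb(nodes))
```

Under these assumptions the network has 476 nodes, each with at most 900 MB of local storage.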

DR-NTU (Digital Repository of NTU)

On topological relaxations of chromatic conjectures

Author: Ambrus Zsbán
Behzad
Björner
Catlin
Csorba
Dochtermann
El-Zahar
Erdős
Fan
Geelen
Gyárfás
Gábor Simonyi
Hell
Imrich
Jensen
Kahn
Kawarabayashi
Kilakos
Kneser
Kostochka
Kozlov
Lovász
Matoušek
Reed
Robertson
Rosenfeld
Scheinerman
Schrijver
Simonyi
Talbot
Tardif
Thomassen
Toft
Vijayaditya
Vizing
Publication venue: 'Elsevier BV'
Publication date: 24/02/2010

There are several famous unsolved conjectures about the chromatic number that were relaxed and already proven to hold for the fractional chromatic number. We discuss similar relaxations for the topological lower bound(s) of the chromatic number. In particular, we prove that such a relaxed version is true for the Behzad-Vizing conjecture and also discuss the conjectures of Hedetniemi and of Hadwiger from this point of view. For the latter, a similar statement was already proven in an earlier paper of the first author with G. Tardos; our main concern here is that the so-called odd Hadwiger conjecture looks much more difficult in this respect. We prove that the statement of the odd Hadwiger conjecture holds for large enough Kneser graphs and Schrijver graphs of any fixed chromatic number.
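
For readers unfamiliar with Kneser graphs, the sketch below builds $KG(n, k)$, whose vertices are the $k$-element subsets of $\{1, \dots, n\}$ with disjoint subsets adjacent, and applies the standard proper coloring with $n - 2k + 2$ colors. It illustrates the object only and is not a construction from the paper; the function names are hypothetical.

```python
from itertools import combinations


def kneser_graph(n, k):
    """Return (vertices, edges) of the Kneser graph KG(n, k)."""
    vertices = [frozenset(c) for c in combinations(range(1, n + 1), k)]
    edges = [(a, b) for a, b in combinations(vertices, 2) if not a & b]
    return vertices, edges


def standard_coloring(vertices, n, k):
    """Color each subset by its smallest element, capped at n - 2k + 2.

    Subsets sharing a color below the cap share that element, and subsets
    at the cap all lie in a set of 2k - 1 elements, so any two of them
    intersect. Either way, same-colored subsets are never adjacent.
    """
    cap = n - 2 * k + 2
    return {v: min(min(v), cap) for v in vertices}


def is_proper(edges, color):
    """Check that no edge joins two vertices of the same color."""
    return all(color[a] != color[b] for a, b in edges)


if __name__ == "__main__":
    vertices, edges = kneser_graph(5, 2)      # the Petersen graph
    color = standard_coloring(vertices, 5, 2)
    print(len(vertices), len(edges), len(set(color.values())), is_proper(edges, color))
```

With $n = 5$ and $k = 2$ this yields the Petersen graph, which the coloring above covers with three colors.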

arXiv.org e-Print Archive

Elsevier - Publisher Connector

Crossref

Low latency modeling of temporal contexts for speech recognition

Author: Peddinti Vijayaditya
Publication venue: 'The Busan Gyeongnam Mathematical Society'
Publication date: 22/05/2018

Johns Hopkins University

National Informatics Centre

Author: Vijayaditya N
Publication venue: NISCAIR-CSIR, India
Publication date: 01/12/1975

175-17

NOPR

On Total Chromatic Number of a Graph

Author: N. Vijayaditya
Publication venue: 'Wiley'

Crossref

3-D CNN MODELS FOR FAR-FIELD MULTI-CHANNEL SPEECH RECOGNITION

Author: Ganapathy Sriram
Peddinti Vijayaditya
Publication venue: IEEE

Automatic speech recognition (ASR) in far-field reverberant environments, especially when involving natural conversational multiparty speech conditions, is challenging even with state-of-the-art recognition methodologies. The two main issues are artifacts in the signal due to reverberation and the presence of multiple speakers. In this paper, we propose a three-dimensional (3-D) convolutional neural network (CNN) architecture for multi-channel far-field ASR. This architecture processes the time, frequency, and channel dimensions of the input spectrogram to learn representations using convolutional layers. Experiments are performed on the REVERB challenge LVCSR task and the augmented multi-party (AMI) LVCSR task using the array microphone recordings. The proposed method shows improvements over the baseline system, which uses beamforming of the multi-channel audio along with a conventional 2-D CNN framework (an absolute improvement of 1.1% over the beamformed baseline system on the AMI dataset).
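
To make the three processed dimensions concrete, the sketch below computes the output shape of a single 3-D convolution over (channel, time, frequency). The input and kernel sizes are assumptions chosen for illustration; the abstract does not give the layer configuration, and the function names are hypothetical.

```python
def conv_output_size(size, kernel, stride=1, padding=0):
    """Output length of a 1-D convolution along one axis."""
    return (size + 2 * padding - kernel) // stride + 1


def conv3d_output_shape(input_shape, kernel_shape,
                        stride=(1, 1, 1), padding=(0, 0, 0)):
    """Apply the 1-D size formula independently along each of three axes."""
    return tuple(
        conv_output_size(n, k, s, p)
        for n, k, s, p in zip(input_shape, kernel_shape, stride, padding)
    )


# Assumed example: 8 microphone channels, 21 time frames, 40 frequency bins,
# convolved with a 3 x 5 x 5 kernel, stride 1, no padding.
INPUT_SHAPE = (8, 21, 40)     # (channel, time, frequency)
KERNEL_SHAPE = (3, 5, 5)

if __name__ == "__main__":
    print(conv3d_output_shape(INPUT_SHAPE, KERNEL_SHAPE))
```

Unlike a 2-D convolution applied to a beamformed single-channel spectrogram, the kernel here also slides across the microphone axis, so the output retains a channel dimension.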

Crossref

Open Access Repository of IISc Research Publications

Low Latency Acoustic Modeling Using Temporal Convolution and LSTMs

Author: Daniel Povey
Sanjeev Khudanpur
Vijayaditya Peddinti
Yiming Wang
Publication venue: 'Institute of Electrical and Electronics Engineers (IEEE)'

Crossref