
109 research outputs found

Efficient audio signal processing for embedded systems

Author: Chiu Leung Kin
Publication venue: Georgia Institute of Technology
Publication date: 21/05/2012

We investigated two design strategies for efficiently processing audio signals on embedded systems such as mobile phones and portable electronics. In the first strategy, we exploit properties of the human auditory system to process audio signals. We designed a sound enhancement algorithm to make piezoelectric loudspeakers sound "richer" and "fuller," using a combination of bass extension and dynamic range compression. We also developed an audio energy reduction algorithm for loudspeaker power management that suppresses signal energy below the masking threshold. In the second strategy, we use low-power analog circuits to process the signal before digitizing it. We designed an analog front-end for sound detection and implemented it on a field programmable analog array (FPAA). The sound classifier front-end can be used in a wide range of applications because programmable floating-gate transistors are employed to store classifier weights. Moreover, we incorporated a feature selection algorithm to simplify the analog front-end. The machine learning algorithm AdaBoost is used to select the most relevant features for a particular sound detection application. We also designed the circuits to implement the AdaBoost-based analog classifier.

PhD
Committee Chair: Anderson, David; Committee Member: Hasler, Jennifer; Committee Member: Hunt, William; Committee Member: Lanterman, Aaron; Committee Member: Minch, Bradle
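The AdaBoost-based feature selection mentioned in the abstract above can be illustrated with a minimal sketch. It is not the thesis's implementation: it assumes discrete AdaBoost with single-feature threshold classifiers (decision stumps) and labels in {-1, +1}, and the function names `best_stump` and `adaboost_select` are hypothetical. The feature used by the stump chosen in each round is treated as a "selected" feature.

```python
import math

def best_stump(X, y, w):
    """Find the single-feature threshold rule with the lowest weighted error."""
    best = None
    for j in range(len(X[0])):
        for thr in sorted({row[j] for row in X}):
            for pol in (1, -1):
                # Predict `pol` at or above the threshold, `-pol` below it.
                preds = [pol if row[j] >= thr else -pol for row in X]
                err = sum(wi for wi, p, yi in zip(w, preds, y) if p != yi)
                if best is None or err < best[0]:
                    best = (err, j, preds)
    return best

def adaboost_select(X, y, rounds=3):
    """Run discrete AdaBoost and return the feature index chosen in each round."""
    n = len(X)
    w = [1.0 / n] * n                          # start from uniform sample weights
    chosen = []
    for _ in range(rounds):
        err, j, preds = best_stump(X, y, w)
        err = min(max(err, 1e-10), 1 - 1e-10)  # clamp so the log stays finite
        alpha = 0.5 * math.log((1 - err) / err)
        chosen.append(j)
        # Increase the weight of misclassified samples, then renormalize.
        w = [wi * math.exp(-alpha * yi * p) for wi, yi, p in zip(w, y, preds)]
        total = sum(w)
        w = [wi / total for wi in w]
    return chosen

# Toy data (an assumption for illustration): feature 1 separates the classes.
X = [[0.3, 0.1], [0.9, 0.2], [0.1, 0.8], [0.7, 0.9]]
y = [-1, -1, 1, 1]
selected = adaboost_select(X, y, rounds=3)
```

Features that are never chosen by any round could then be dropped from the front-end, which is the sense in which boosting acts as a feature selector here.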

Scholarly Materials And Research @ Georgia Tech

Deep learning for audio-visual speech recognition

Author: Κουμπαρούλης Αλέξανδρος Κ.
Publication date: 01/01/2017

University of Thessaly Institutional Repository

Phonetic Event-based Whole-Word Modeling Approaches for Speech Recognition

Author: Kintzley Keith Russell
Publication venue: Johns Hopkins University

Speech is composed of basic speech sounds called phonemes, and these subword units are the foundation of most speech recognition systems. While detailed acoustic models of phones (and phone sequences) are common, most recognizers model words themselves as a simple concatenation of phonemes and do not closely model the temporal relationships between phonemes within words. Human speech production is constrained by the movement of speech articulators, and there is abundant evidence to indicate that human speech recognition is inextricably linked to the temporal patterns of speech sounds. Structures such as the hidden Markov model (HMM) have proved extremely useful and effective because they offer a convenient framework for combining acoustic modeling of phones with powerful probabilistic language models. However, this convenience masks deficiencies in temporal modeling. Additionally, robust recognition requires complex automatic speech recognition (ASR) systems and entails non-trivial computational costs. As an alternative, we extend previous work on the point process model (PPM) for keyword spotting, an approach to speech recognition expressly based on whole-word modeling of the temporal relations of phonetic events. In our research, we have investigated and advanced a number of major components of this system. First, we have considered alternate methods of determining phonetic events from phone posteriorgrams. We have introduced several parametric approaches to modeling intra-word phonetic timing distributions which allow us to cope with data sparsity issues. We have substantially improved algorithms used to compute keyword detections, capitalizing on the sparse nature of the phonetic input which permits the system to be scaled to large data sets. We have considered enhanced CART-based modeling of phonetic timing distributions based on related text-to-speech synthesis work. 
Lastly, we have developed a point process based spoken term detection system and applied it to the conversational telephone speech task of the 2006 NIST Spoken Term Detection evaluation. We demonstrate the PPM system to be competitive with state-of-the-art phonetic search systems while requiring significantly fewer computational resources.

JScholarship

Speaker identification using multimodal neural networks and wavelet analysis

Author: Abdalla M.I.
Amrouche A.
Arora S.
Benesty J.
Chetty G.
Chi T.S.
Deshpande M.S.
Gelbart D.
Gomez P.
Hall D.L.
Holmes W.
Kekre H.B.
Mallat S.
Morris A.
Rabiner L.
Revada L.K.V.
Revathi A.
Ross A.
Shukla A.
Suvarna Kumar G.
Vetterli M.
Ye J.
Publication venue: 'Institution of Engineering and Technology (IET)'
Publication date: 01/03/2015

© 2014 The Authors. Published by IET. This is an open access article available under a Creative Commons licence. The published version can be accessed at the following link on the publisher’s website: https://doi.org/10.1049/iet-bmt.2014.0011

The rapid pace of technological progress in recent years has led to a tremendous rise in the use of biometric authentication systems. The objective of this research is to investigate the problem of identifying a speaker from their voice regardless of the content. In this study, the authors designed and implemented a novel text-independent multimodal speaker identification system based on wavelet analysis and neural networks. Wavelet analysis comprises discrete wavelet transform, wavelet packet transform, wavelet sub-band coding and Mel-frequency cepstral coefficients (MFCCs). The learning module comprises general regressive, probabilistic and radial basis function neural networks, forming decisions through a majority voting scheme. The system was found to be competitive and it improved the identification rate by 15% as compared with the classical MFCC. In addition, it reduced the identification time by 40% as compared with the back-propagation neural network, Gaussian mixture model and principal component analysis. Performance tests conducted using the GRID database corpora have shown that this approach has faster identification time and greater accuracy compared with traditional approaches, and it is applicable to real-time, text-independent speaker identification systems.
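The majority voting scheme mentioned in the abstract above can be sketched as follows. This is a minimal illustration, not the paper's code: the function name `majority_vote`, the tie-breaking rule and the speaker labels are all assumptions.

```python
from collections import Counter

def majority_vote(predictions):
    """Return the label predicted by the most classifiers.

    Ties are broken in favour of the label that appears first in
    `predictions` (an assumption; the abstract does not specify a rule).
    """
    if not predictions:
        raise ValueError("need at least one prediction")
    counts = Counter(predictions)
    best = max(counts.values())
    for label in predictions:
        if counts[label] == best:
            return label

# Hypothetical outputs of three neural-network classifiers for one utterance.
votes = ["speaker_07", "speaker_03", "speaker_07"]
winner = majority_vote(votes)
```

With three classifiers, two agreeing votes are enough to decide the identity, so a single misfiring network does not change the result.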

Crossref

Wolverhampton Intellectual Repository and E-theses

Speech-driven facial animation with realistic dynamics

Author: A. Bojorquez
A. Esposito
I. Rudomin
J.L. Castillo
O.N. Garcia
P.K. Kakumanu
R. Gutierrez-Osuna
Publication venue: 'Institute of Electrical and Electronics Engineers (IEEE)'

Crossref

Speech Recognition

Publication venue: 'IntechOpen'
Publication date: 20/04/2021

Chapters in the first part of the book cover all the essential speech processing techniques for building robust, automatic speech recognition systems: the representation of speech signals and the methods for speech-feature extraction, acoustic and language modeling, efficient algorithms for searching the hypothesis space, and multimodal approaches to speech recognition. The last part of the book is devoted to other speech processing applications that can use the information from automatic speech recognition for speaker identification and tracking, for prosody modeling in emotion-detection systems, and in other speech processing applications that are able to operate in real-world environments, like mobile communication services and smart homes.

Directory of Open Access Books (DOAB)

Data-Driven Representation Learning in Multimodal Feature Fusion

Publication date: 01/01/2018

Modern machine learning systems leverage data and features from multiple modalities to gain more predictive power. In most scenarios, the modalities are vastly different and the acquired data are heterogeneous in nature. Consequently, building highly effective fusion algorithms is central to achieving improved model robustness and inference performance. This dissertation focuses on representation learning approaches as the fusion strategy. Specifically, the objective is to learn a shared latent representation that jointly exploits the structural information encoded in all modalities, such that a straightforward learning model can be adopted to obtain the prediction. We first consider sensor fusion, a typical multimodal fusion problem critical to building a pervasive computing platform. A systematic fusion technique is described to support both multiple sensors and descriptors for activity recognition. Designed to learn the optimal combination of kernels, Multiple Kernel Learning (MKL) algorithms have been successfully applied to numerous fusion problems in computer vision and other areas. Utilizing the MKL formulation, we next describe an auto-context algorithm for learning image context via fusion with low-level descriptors. Furthermore, a principled fusion algorithm using deep learning to optimize kernel machines is developed. By bridging deep architectures with kernel optimization, this approach leverages the benefits of both paradigms and is applied to a wide variety of fusion problems. In many real-world applications, the modalities exhibit highly specific data structures, such as time sequences and graphs, and consequently, a special design of the learning architecture is needed. In order to improve temporal modeling for multivariate sequences, we developed two architectures centered around attention models. A novel clinical time series analysis model is proposed for several critical problems in healthcare.
Another model, coupled with a triplet ranking loss as the metric learning framework, is described to better solve speaker diarization. Compared to state-of-the-art recurrent networks, these attention-based multivariate analysis tools achieve improved performance while having a lower computational complexity. Finally, in order to perform community detection on multilayer graphs, a fusion algorithm is described to derive node embeddings from word embedding techniques and also exploit the complementary relational information contained in each layer of the graph.

Dissertation/Thesis
Doctoral Dissertation Electrical Engineering 201
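The triplet ranking loss named in the abstract above can be illustrated with a short sketch. The abstract does not give the exact formulation, so this assumes the common hinge form max(0, d(a, p) - d(a, n) + margin) with Euclidean distance; the function names, the margin value and the example vectors are hypothetical.

```python
import math

def euclidean(a, b):
    """Euclidean distance between two equal-length embedding vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def triplet_loss(anchor, positive, negative, margin=1.0):
    """Hinge-style triplet loss: max(0, d(a, p) - d(a, n) + margin).

    The loss is zero once the negative is at least `margin` farther from
    the anchor than the positive is.
    """
    gap = euclidean(anchor, positive) - euclidean(anchor, negative)
    return max(0.0, gap + margin)

# Hypothetical 2-D embeddings: anchor and positive share a speaker.
well_separated = triplet_loss([0.0, 0.0], [0.0, 1.0], [3.0, 4.0])
violating = triplet_loss([0.0, 0.0], [3.0, 4.0], [0.0, 1.0])
```

In a diarization setting, the anchor and positive would be embeddings of segments from the same speaker and the negative a segment from a different speaker; some formulations use squared distances instead.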

ASU Digital Repository

Metaheuristic design of feedforward neural networks: a review of two decades of research

Author: Abbass
Abraham
Ackley
Ajith Abraham
Akhand
Alba
Ali Ahmadi
Almeida
Alvarez
Amari
Andersen
Angeline
Arifovic
Augusteijn
Azimi-Sadjadi
Bakker
Baranyi
Battiti
Bertsekas
Bishop
Bland
Bousquet
Boussaid
Breiman
Brownlee
Carvalho
Chandra
Charalambous
Chen
Chen
Chen
Chen
Cho
Chrisley
Coello
Cortes
Costa
Cruz-Ramírez
Cybenko
Da
da Silva
Dai
Das
Das
Davis
de Albuquerque Teixeira
Deneubourg
Dhahri
Diebold
Ding
Ditzler
Dominey
Donate
Dorigo
Dumont
Engel
Fahlman
Feo
Fernández-Caballero
Fister
Fletcher
Fogel
Fogel
Fontanari
Formato
Frean
Fukumizu
Fullér
Furtuna
Garcia-Pedrajas
García-Pedrajas
García-Pedrajas
Gaspar-Cunha
Geem
Geman
Gershenfeld
Ghalambaz
Girosi
Giustolisi
Glover
Goh
Goldberg
Gori
Gorin
Green
Grossberg
Hagan
Hansen
Haykin
Haykin
Hernández
Hestenes
Hinton
Hinton
Hinton
Hirose
Ho
Holland
Hopfield
Hornik
Hornik
Huang
Huang
Huang
Huang
Huang
Igel
Ilonen
Irani
Irani
Islam
Jacobs
Jain
Jain
Jin
Juang
Kaelbling
Karaboga
Karpat
Kennedy
Khan
Khan
Kim
Kim
Kim
Kim
Kiranyaz
Kirkpatrick
Kitano
Kitano
Kohonen
Kolmogorov
Kordík
Kouda
Koza
Kulluk
Kůrková
Lam
Larrañaga
LeCun
Lera
Leshno
Leung
Leung
Lewenstein
Li
Lin
Lin
Ling
Lippmann
Liu
Liu
Lowe
Ludermir
Mahdavi
Maniezzo
March
Marquardt
Martínez-Muñoz
Mazurowski
McCulloch
Menczer
Merrill
Metropolis
Minku
Minsky
Mirjalili
Mirjalili
Mitra
Mjolsness
Mladenović
Moriarty
Murray
Nakama
Nandy
Narayanan
Natschläger
Nedjah
Niranjan
Niu
Nolfi
Oh
Ojha
Osman
Pan
Passino
Pearce
Pencina
Peng
Pettersson
Pipino
Polikar
Prechelt
Prisecaru
Puig
Rashedi
Reed
Ritchie
Rosenblatt
Rumelhart
Rumelhart
Saad
Salajegheh
Sarkar
Schaffer
Schapire
Schmidhuber
Schwefel
Sejnowski
Selmic
Sexton
Sexton
Sexton
Shang
Sharma
Sietsma
Simovici
Sivagaminathan
Slowik
Socha
Socha
Sokolova
Sporea
Stanley
Storn
Sum
Sörensen
Tang
Tayefeh Mahmoudi
Toh
Tong
Trelea
Trentin
Tsai
Tsai
Tsoulos
Twomey
Ulagammai
Van den Bergh
van der Voet
Varun Kumar Ojha
Venkadesh
Ventura
Vieira
Václav Snášel
Wand
Wang
Wessels
Weyland
Whitley
Widrow
Wiegand
Wilson
Wolpert
Wolpert
Xi-Zhao
Yaghini
Yang
Yang
Yao
Yao
Yao
Yao
Yao
Yao
Yao
Ye
Yin
Yusiong
Zhang
Zhang
Zhang
Zhang
Zhang
Zhao
Zhou
Zhou
Zikopoulos
Zăvoianu
Černý
Publication venue: 'Elsevier BV'
Publication date: 01/01/2017

Over the past two decades, feedforward neural network (FNN) optimization has been a key interest among researchers and practitioners in multiple disciplines. FNN optimization is often viewed from various perspectives: the optimization of weights, network architecture, activation nodes, learning parameters, learning environment, etc. Researchers adopted such different viewpoints mainly to improve the FNN's generalization ability. Gradient-descent algorithms such as backpropagation have been widely applied to optimize FNNs. Their success is evident from the application of FNNs to numerous real-world problems. However, due to the limitations of gradient-based optimization methods, metaheuristic algorithms, including evolutionary algorithms, swarm intelligence, etc., are still being widely explored by researchers aiming to obtain a generalized FNN for a given problem. This article attempts to summarize a broad spectrum of FNN optimization methodologies, including conventional and metaheuristic approaches. It also tries to connect various research directions that emerged from FNN optimization practices, such as evolving neural networks (NN), cooperative coevolution NN, complex-valued NN, deep learning, extreme learning machines, quantum NN, etc. Additionally, it presents interesting research challenges for future work to cope with the present information processing era.

arXiv.org e-Print Archive

Central Archive at the University of Reading

Repository for Publications and Research Data

Crossref

DSpace at VSB Technical University of Ostrava