127 research outputs found
Speech Recognition
Chapters in the first part of the book cover all the essential speech processing techniques for building robust automatic speech recognition systems: the representation of speech signals and methods for speech feature extraction, acoustic and language modeling, efficient algorithms for searching the hypothesis space, and multimodal approaches to speech recognition. The last part of the book is devoted to other speech processing applications that can use the information from automatic speech recognition: speaker identification and tracking, prosody modeling in emotion-detection systems, and other applications that are able to operate in real-world environments, such as mobile communication services and smart homes.
Privacy-Protecting Techniques for Behavioral Data: A Survey
Our behavior (the way we talk, walk, or think) is unique and can be used as a biometric trait. It also correlates with sensitive attributes like emotions. Hence, techniques to protect individuals' privacy against unwanted inferences are required. To consolidate knowledge in this area, we systematically reviewed applicable anonymization techniques. We taxonomize and compare existing solutions regarding privacy goals, conceptual operation, advantages, and limitations. Our analysis shows that some behavioral traits (e.g., voice) have received much attention, while others (e.g., eye-gaze, brainwaves) are mostly neglected. We also find that the evaluation methodology of behavioral anonymization techniques can be further improved.
Multimedia Context Awareness for Smart Mobile Environments
The development of the IoT framework and the resulting huge number of smart connected devices open the door to exploiting multiple smart nodes to accomplish a variety of tasks. Multimedia context awareness, together with the concept of ambient intelligence, is tightly related to the IoT framework and can be applied to a large number of smart scenarios. This thesis studies and analyzes the role of context awareness in different applications related to smart mobile environments, such as future smart spaces and connected cities. The research focuses on different aspects of ambient intelligence, such as audio-awareness and wireless-awareness. In particular, the thesis tackles two main research topics: the first, related to audio-awareness, concerns a multiple-observations approach to smart speaker recognition in mobile environments; the second, tied to wireless-awareness, concerns Unmanned Aerial Vehicle (UAV) detection based on WiFi statistical fingerprint analysis.
XXXI CICLO - SC. E TECN. ING. ELETTR. E DELLE TEL. - Ambienti cognitivi interattivi
Garibotto, Chiar
Biometric walk recognizer. Research and results on wearable sensor-based gait recognition
Gait is a biometric trait that can allow user authentication, although it is classified as a "soft" trait because of its limited permanence and its sensitivity to specific conditions. The earliest research relied on computer vision-based approaches, applied especially in video surveillance. More recently, the spread of wearable sensors, especially those embedded in mobile devices, which can capture the dynamics of the walking pattern through simpler 1D signals, has spurred a different line of research. This capture modality avoids some problems of computer vision-based techniques but suffers from limitations of its own. Related research is still less advanced than for other biometric traits. However, the promising results achieved so far, the increasing accuracy of sensors, the ubiquitous presence of mobile devices, and the low cost of the related techniques make this biometric trait attractive and suggest continuing investigations in this field. The first chapters of this thesis introduce biometrics and, more specifically, the gait trait. A comprehensive review of the technologies, approaches, and strategies exploited by state-of-the-art gait recognition proposals is also provided. After this introduction, the contributions of this work are presented in detail. In summary, it improves on earlier results obtained during the Biometrics course of my Master's degree in Computer Science and extended in my subsequent Master's thesis. The research deals with different strategies, including preprocessing and recognition techniques, applied to gait biometrics in order to allow automatic recognition and to improve system accuracy.
Code-Switched Urdu ASR for Noisy Telephonic Environment using Data Centric Approach with Hybrid HMM and CNN-TDNN
Call centers hold huge amounts of audio data that can yield valuable business
insights, but transcribing phone calls manually is tedious. An effective
Automatic Speech Recognition (ASR) system can accurately transcribe these
calls, making call history easy to search for specific context and content and
enabling automatic call monitoring and improved QoS through keyword search and
sentiment analysis. ASR for call centers requires extra robustness because
telephonic environments are generally noisy. Moreover, many low-resourced
languages are on the verge of extinction and could be preserved with the help
of ASR technology. Urdu is one of the most widely spoken languages in the
world, with 231,295,440 speakers worldwide, yet it remains a
resource-constrained language in ASR. Regional call-center conversations take
place in the local language mixed with English numbers and technical terms,
which causes a "code-switching" problem. Hence, this paper describes an
implementation framework for a resource-efficient ASR/speech-to-text system
for code-switched Urdu in a noisy call-center environment, using a chain
hybrid HMM and CNN-TDNN. The hybrid HMM-DNN approach allowed us to exploit the
advantages of neural networks with less labelled data. Adding a CNN to the
TDNN has been shown to work better in noisy environments because the CNN's
additional frequency dimension captures extra information from noisy speech,
thus improving accuracy. We collected data from various open sources and
labelled some of the unlabelled data after analysing its general context and
content, covering Urdu as well as commonly used words from other languages,
primarily English. We achieved a WER of 5.2% in noisy as well as clean
environments, on isolated words or numbers as well as on continuous
spontaneous speech.
Comment: 32 pages, 19 figures, 2 tables, preprin
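For readers unfamiliar with the WER (word error rate) figure quoted in the abstract above, the following is a minimal sketch of how the metric is commonly computed: the word-level edit distance (substitutions, deletions, insertions) between a reference transcript and a recognizer hypothesis, divided by the number of reference words. The function name `word_error_rate` and the example sentences are illustrative assumptions; this is not the authors' evaluation code, and their scoring setup may differ (for example in text normalization).

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level Levenshtein distance divided by the reference length.

    Hypothetical helper for illustration only; whitespace tokenization is
    an assumption and ignores any text normalization a real scorer applies.
    """
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words seen
    # so far (excluding the current one) and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1       # reference word missing from hypothesis
            insertion = curr[j - 1] + 1  # extra word in hypothesis
            curr[j] = min(substitution, deletion, insertion)
        prev = curr
    return prev[-1] / len(ref)


if __name__ == "__main__":
    # Made-up example: one deletion ("the") and one substitution
    # ("five" -> "nine") against a four-word reference.
    print(word_error_rate("call the number five", "call number nine"))
```

Because the denominator is the reference length, WER can exceed 100% when the hypothesis contains many inserted words.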