554 research outputs found

    Normalization and Transformation Techniques for Robust Speaker Recognition

    Get PDF

    Speech Recognition

    Get PDF
    Chapters in the first part of the book cover all the essential speech processing techniques for building robust, automatic speech recognition systems: the representation for speech signals and the methods for speech-features extraction, acoustic and language modeling, efficient algorithms for searching the hypothesis space, and multimodal approaches to speech recognition. The last part of the book is devoted to other speech processing applications that can use the information from automatic speech recognition for speaker identification and tracking, for prosody modeling in emotion-detection systems and in other speech processing applications that are able to operate in real-world environments, like mobile communication services and smart homes

    Hierachical methods for large population speaker identification using telephone speech

    Get PDF
    This study focuses on speaker identificat ion. Several problems such as acoustic noise, channel noise, speaker variability, large population of known group of speakers wi thin the system and many others limit good SiD performance. The SiD system extracts speaker specific features from digitised speech signa] for accurate identification. These feature sets are clustered to form the speaker template known as a speaker model. As the number of speakers enrolling into the system gets larger, more models accumulate and the interspeaker confusion results. This study proposes the hierarchical methods which aim to split the large population of enrolled speakers into smaller groups of model databases for minimising interspeaker confusion

    Speech assessment and characterization for law enforcement applications

    No full text
    Speech signals acquired, transmitted or stored in non-ideal conditions are often degraded by one or more effects including, for example, additive noise. These degradations alter the signal properties in a manner that deteriorates the intelligibility or quality of the speech signal. In the law enforcement context such degradations are commonplace due to the limitations in the audio collection methodology, which is often required to be covert. In severe degradation conditions, the acquired signal may become unintelligible, losing its value in an investigation and in less severe conditions, a loss in signal quality may be encountered, which can lead to higher transcription time and cost. This thesis proposes a non-intrusive speech assessment framework from which algorithms for speech quality and intelligibility assessment are derived, to guide the collection and transcription of law enforcement audio. These methods are trained on a large database labelled using intrusive techniques (whose performance is verified with subjective scores) and shown to perform favorably when compared with existing non-intrusive techniques. Additionally, a non-intrusive CODEC identification and verification algorithm is developed which can identify a CODEC with an accuracy of 96.8 % and detect the presence of a CODEC with an accuracy higher than 97 % in the presence of additive noise. Finally, the speech description taxonomy framework is developed, with the aim of characterizing various aspects of a degraded speech signal, including the mechanism that results in a signal with particular characteristics, the vocabulary that can be used to describe those degradations and the measurable signal properties that can characterize the degradations. The taxonomy is implemented as a relational database that facilitates the modeling of the relationships between various attributes of a signal and promises to be a useful tool for training and guiding audio analysts

    Quality of experience in telemeetings and videoconferencing: a comprehensive survey

    Get PDF
    Telemeetings such as audiovisual conferences or virtual meetings play an increasingly important role in our professional and private lives. For that reason, system developers and service providers will strive for an optimal experience for the user, while at the same time optimizing technical and financial resources. This leads to the discipline of Quality of Experience (QoE), an active field originating from the telecommunication and multimedia engineering domains, that strives for understanding, measuring, and designing the quality experience with multimedia technology. This paper provides the reader with an entry point to the large and still growing field of QoE of telemeetings, by taking a holistic perspective, considering both technical and non-technical aspects, and by focusing on current and near-future services. Addressing both researchers and practitioners, the paper first provides a comprehensive survey of factors and processes that contribute to the QoE of telemeetings, followed by an overview of relevant state-of-the-art methods for QoE assessment. To embed this knowledge into recent technology developments, the paper continues with an overview of current trends, focusing on the field of eXtended Reality (XR) applications for communication purposes. Given the complexity of telemeeting QoE and the current trends, new challenges for a QoE assessment of telemeetings are identified. To overcome these challenges, the paper presents a novel Profile Template for characterizing telemeetings from the holistic perspective endorsed in this paper

    In Car Audio

    Get PDF
    This chapter presents implementations of advanced in Car Audio Applications. The system is composed by three main different applications regarding the In Car listening and communication experience. Starting from a high level description of the algorithms, several implementations on different levels of hardware abstraction are presented, along with empirical results on both the design process undergone and the performance results achieved
    • …
    corecore