    An Information Theoretic Approach to Speaker Diarization of Meeting Recordings

    In this thesis we investigate a non-parametric approach to speaker diarization for meeting recordings based on an information theoretic framework. The problem is formulated using the Information Bottleneck (IB) principle. Unlike other approaches, where the distance between speaker segments is introduced arbitrarily, the IB method seeks the partition that maximizes the mutual information between observations and variables relevant to the problem while minimizing the distortion between observations. The distance between speech segments is the Jensen-Shannon divergence, since it arises from the optimization of the IB objective function. In the first part of the thesis, we explore IB-based diarization with Mel frequency cepstral coefficients (MFCC) as input features. We study issues related to IB-based speaker diarization, such as optimizing the IB objective function and criteria for inferring the number of speakers. Furthermore, we benchmark the proposed system against a state-of-the-art system on the NIST RT06 (Rich Transcription) meeting data for speaker diarization. The IB-based system achieves a speaker error rate (16.8%) similar to that of a baseline HMM/GMM system (17.0%). Because the approach relies on non-parametric clustering, it performs diarization six times faster than real time, whereas the baseline runs slower than real time. The second part of the thesis proposes a novel feature combination system in the context of IB diarization, addressing both the speaker clustering and the speaker realignment steps. Unlike current systems, the proposed method does not combine features by averaging log-likelihood scores. Two feature sets were considered: (a) MFCC features combined with time delay of arrival (TDOA) features, and (b) a four-stream combination of MFCC, TDOA, modulation spectrum, and frequency domain linear prediction features.
Experiments show that the proposed system achieves a 5% absolute improvement over the baseline with the two-stream combination and 7% with the four-stream combination. Adding feature streams increases the algorithmic complexity of the IB system only minimally. The system with four input feature streams runs in real time, ten times faster than the GMM-based system.
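The abstract does not spell out how the streams are merged inside the IB framework. The following is a minimal sketch under two assumptions: each feature stream (MFCC, TDOA, and so on) yields its own relevance distribution p(y|x) per segment, and the streams are combined by a weighted average of these distributions with hand-tuned weights. The function name `combine_stream_posteriors` and any weight values are hypothetical, not taken from the thesis.

```python
import numpy as np


def combine_stream_posteriors(posteriors, weights):
    """Weighted average of per-stream relevance distributions p(y|x).

    Assumed combination rule (hypothetical, for illustration only).

    posteriors: list of arrays, one per feature stream, each of shape
        (n_segments, n_relevance) with rows summing to 1.
    weights: one non-negative weight per stream, summing to 1
        (assumed to be tuned on development data).
    """
    weights = np.asarray(weights, dtype=float)
    if len(weights) != len(posteriors):
        raise ValueError("need exactly one weight per stream")
    if np.any(weights < 0) or not np.isclose(weights.sum(), 1.0):
        raise ValueError("weights must be non-negative and sum to 1")
    # Shape (n_streams, n_segments, n_relevance)
    stacked = np.stack([np.asarray(p, dtype=float) for p in posteriors])
    # Contract the stream axis: sum_k w_k * p_k(y|x), shape (n_segments, n_relevance)
    return np.tensordot(weights, stacked, axes=1)
```

Because the result is a convex combination, every row is still a valid probability distribution, so the clustering step that follows does not need to change. Only the stream weights have to be chosen, for example on held-out development data.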

    An Information Theoretic Combination of MFCC and TDOA Features for Speaker Diarization

    A prospective study on impact of patient counselling on quality of life of patients with atopic dermatitis

    Background: Atopic dermatitis is an acute, sub-acute or chronic relapsing skin disorder characterized by intense itching, pruritus and oozing. It adversely affects patients' routine activities, so effective treatment should be provided along with proper counselling. The aim of the current study was to evaluate the impact of patient counselling on quality of life (QoL). Methods: A prospective study was conducted over a period of 6 months in 108 patients recruited from the Department of Dermatology. Written informed consent was obtained. Of the 108 patients, 54 received tacrolimus and the remaining 54 received corticosteroids. Data were collected using a suitably designed proforma, then analysed and presented. The Dermatology Life Quality Index (DLQI) was used to assess QoL. Patients were counselled about the disease, the drugs and lifestyle modifications using a patient information leaflet (PIL). Results: Patient counselling was effective in both groups (p < 0.05). The effect of the disease on quality of life improved from severe to mild in both groups (the mean QoL score shifted from 2.93±0.61 before counselling to 1.18±0.71 after counselling). In the tacrolimus group, the mean QoL score shifted from 2.81±0.61 to 0.98±0.71 after counselling. In the corticosteroids group, the mean QoL score shifted from 3.05±0.59 to 1.38±0.65 after counselling. Conclusions: Effective counselling had a profound impact on improving patients' quality of life, with a transition from severe to milder effects of the disease on quality of life.

    Integration of TDOA Features in Information Bottleneck Framework for Fast Speaker Diarization

    In this paper we address the combination of multiple feature streams in a fast speaker diarization system for meeting recordings. Whenever Multiple Distant Microphones (MDM) are used, it is possible to estimate the Time Delay of Arrival (TDOA) between channels. In \cite{xavi_comb}, it is shown that TDOA can be used as an additional feature together with conventional spectral features to improve speaker diarization. Here we investigate the combination of TDOA and spectral features in a fast diarization system based on the Information Bottleneck principle. We evaluate the algorithm on the NIST RT06 diarization task. Adding TDOA features to spectral features reduces the speaker error by 3% absolute. Results are comparable to those of conventional HMM/GMM-based systems, with a substantial reduction in computational complexity.
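TDOA features are delays between each channel and a reference channel. The abstract does not name the estimator; a common choice is the generalized cross-correlation with phase transform (GCC-PHAT), sketched below under that assumption. The function `gcc_phat_tdoa` and its `max_tau` parameter are illustrative names, and a real front end would apply such an estimate frame by frame to every non-reference channel.

```python
import numpy as np


def gcc_phat_tdoa(sig, ref, fs, max_tau=None):
    """Estimate the delay (seconds) of `sig` relative to `ref` with GCC-PHAT.

    A positive result means `sig` arrives later than `ref`.
    `max_tau` optionally limits the search to |delay| <= max_tau seconds.
    """
    # Zero-pad to the full linear-correlation length to avoid circular wrap-around
    n = len(sig) + len(ref)
    SIG = np.fft.rfft(sig, n=n)
    REF = np.fft.rfft(ref, n=n)
    cross = SIG * np.conj(REF)
    # PHAT weighting: discard magnitude, keep only phase information
    cross /= np.abs(cross) + 1e-12
    cc = np.fft.irfft(cross, n=n)

    max_shift = n // 2
    if max_tau is not None:
        max_shift = min(int(fs * max_tau), max_shift)
    # Reorder so that index k corresponds to lag k - max_shift
    cc = np.concatenate((cc[n - max_shift:], cc[:max_shift + 1]))
    shift = int(np.argmax(np.abs(cc))) - max_shift
    return shift / fs
```

The PHAT weighting flattens the cross-spectrum so that the correlation peak stays sharp in reverberant rooms, which is why it is a popular default for microphone-array delay estimation.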

    Mutual Information based Channel Selection for Speaker Diarization of Meetings Data

    In meeting scenarios, audio is often recorded non-intrusively using Multiple Distant Microphones (MDM). Typically, beamforming is performed to obtain a single enhanced signal from the multiple channels. This paper investigates the use of mutual information for selecting the channel subset that produces the lowest error in a diarization system. Conventional systems select channels on the basis of signal properties such as SNR and cross-correlation. In this paper, we propose a mutual information measure that is directly related to the objective function of the diarization system. The proposed algorithms are evaluated on the NIST RT06 evaluation data set. Channel selection reduces the speaker error by 1.1% absolute (6.5% relative) with respect to using all channels.
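A minimal sketch of mutual-information-driven channel selection, assuming the score for a channel subset is I(X;Y), computed from a joint distribution p(x, y) over segments X and relevance variables Y obtained from the signal built from that subset. `joint_for_subset` is a hypothetical callback standing in for beamforming plus feature extraction, and the exhaustive search over subsets is an assumption that is only practical for a handful of channels; the abstract does not describe the actual search strategy.

```python
from itertools import combinations

import numpy as np


def mutual_information(joint):
    """I(X;Y) in bits for a joint distribution given as a 2-D array."""
    joint = np.asarray(joint, dtype=float)
    joint = joint / joint.sum()
    px = joint.sum(axis=1, keepdims=True)  # marginal p(x), shape (n, 1)
    py = joint.sum(axis=0, keepdims=True)  # marginal p(y), shape (1, m)
    nz = joint > 0  # 0 * log 0 is taken as 0
    return float(np.sum(joint[nz] * np.log2(joint[nz] / (px @ py)[nz])))


def select_channel_subset(channels, joint_for_subset, min_size=1):
    """Return the channel subset (and its score) with the highest I(X;Y).

    joint_for_subset: hypothetical callback mapping a tuple of channel
    names to the joint distribution p(x, y) for that subset.
    """
    best_subset, best_mi = None, -np.inf
    for k in range(min_size, len(channels) + 1):
        for subset in combinations(channels, k):
            mi = mutual_information(joint_for_subset(subset))
            if mi > best_mi:
                best_subset, best_mi = subset, mi
    return best_subset, best_mi
```

Scoring subsets with an information measure rather than SNR ties the selection to the quantity the IB clustering later tries to preserve, which is the motivation the abstract gives for the approach.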

    Agglomerative Information Bottleneck for Speaker Diarization of Meetings Data

    In this paper, we investigate the use of agglomerative Information Bottleneck (aIB) clustering for the speaker diarization task on meetings data. Unlike state-of-the-art diarization systems, which model individual speakers with Gaussian Mixture Models, the proposed algorithm is completely non-parametric. Both the clustering and the model selection issues of non-parametric models are addressed in this work. The proposed algorithm is evaluated on the RT06 meeting evaluation data set. The system achieves Diarization Error Rates comparable to those of state-of-the-art systems at a much lower computational complexity.
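Agglomerative IB clustering can be sketched as a greedy loop: start with one cluster per segment and repeatedly merge the pair whose merge loses the least relevant information, measured by the prior-weighted Jensen-Shannon divergence between their p(y|c) distributions. This sketch assumes the β → ∞ limit (the compression term of the objective is dropped) and takes the number of clusters as an input, whereas the paper handles model selection separately; `js_divergence` and `aib_cluster` are hypothetical names.

```python
import numpy as np


def js_divergence(p, q, w_p, w_q):
    """Weighted Jensen-Shannon divergence in bits; assumes w_p + w_q = 1, both > 0."""
    m = w_p * p + w_q * q

    def kl(a, b):
        nz = a > 0  # terms with a = 0 contribute nothing
        return float(np.sum(a[nz] * np.log2(a[nz] / b[nz])))

    return w_p * kl(p, m) + w_q * kl(q, m)


def aib_cluster(p_y_given_x, p_x, n_clusters):
    """Greedy agglomerative IB in the beta -> infinity limit (sketch).

    p_y_given_x: (n, m) array, row i is p(y|x_i).
    p_x: length-n prior over the initial segments (all entries > 0).
    Returns a list of clusters, each a list of segment indices.
    """
    clusters = [[i] for i in range(len(p_x))]
    priors = [float(v) for v in p_x]
    dists = [np.asarray(row, dtype=float) for row in p_y_given_x]
    while len(clusters) > n_clusters:
        best = None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                pc = priors[i] + priors[j]
                # Loss in I(Y;C) caused by merging clusters i and j
                cost = pc * js_divergence(dists[i], dists[j], priors[i] / pc, priors[j] / pc)
                if best is None or cost < best[0]:
                    best = (cost, i, j)
        _, i, j = best
        pc = priors[i] + priors[j]
        # p(y|c_merged) is the prior-weighted mixture of the two distributions
        dists[i] = (priors[i] * dists[i] + priors[j] * dists[j]) / pc
        priors[i] = pc
        clusters[i] = clusters[i] + clusters[j]
        del clusters[j], priors[j], dists[j]  # j > i, so index i is unaffected
    return clusters
```

Each merge scans all pairs, so the loop is quadratic in the number of clusters per step; caching pairwise costs and recomputing only those involving the newly merged cluster is the usual way to speed it up.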

    An Information Theoretic Approach to Speaker Diarization of Meeting Data

    A speaker diarization system based on an information theoretic framework is described. The problem is formulated according to the Information Bottleneck (IB) principle. Unlike other approaches, where the distance between speaker segments is introduced arbitrarily, the IB method seeks the partition that maximizes the mutual information between observations and variables relevant to the problem while minimizing the distortion between observations. This solves the problem of choosing the distance between speech segments, which becomes the Jensen-Shannon divergence as it arises from the optimization of the IB objective function. We discuss issues related to speaker diarization within this information theoretic framework, such as the criteria for inferring the number of speakers, the trade-off between quality and compression achieved by the diarization system, and the algorithms for optimizing the objective function. Furthermore, we benchmark the proposed system against a state-of-the-art system on the NIST RT06 (Rich Transcription) data set for speaker diarization of meetings. The IB-based system achieves a Diarization Error Rate of 23.2%, compared with 23.6% for the baseline system. Because this approach is mainly based on non-parametric clustering, it runs significantly faster than the baseline HMM/GMM-based system, achieving faster-than-real-time diarization.
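The objective and the distance described above can be written out as follows. The notation is the standard agglomerative IB notation and is assumed here rather than taken from the abstract: X denotes the speech segments, Y the relevance variables, C the clusters, and β the trade-off parameter. The second line is the decrease in the objective caused by merging clusters c_i and c_j, which is where the Jensen-Shannon divergence appears; a greedy agglomerative optimizer merges the pair with the smallest decrease at each step.

```latex
\begin{aligned}
\mathcal{F} &= I(Y;C) - \frac{1}{\beta}\, I(C;X) \\
\Delta\mathcal{F}(c_i, c_j) &= \bigl(p(c_i) + p(c_j)\bigr)\Bigl[\mathrm{JS}_{\Pi}\bigl(p(y \mid c_i),\, p(y \mid c_j)\bigr) - \frac{1}{\beta}\,\mathrm{JS}_{\Pi}\bigl(p(x \mid c_i),\, p(x \mid c_j)\bigr)\Bigr] \\
\mathrm{JS}_{\Pi}(p, q) &= \pi_i\, D_{\mathrm{KL}}(p \,\|\, m) + \pi_j\, D_{\mathrm{KL}}(q \,\|\, m), \qquad m = \pi_i\, p + \pi_j\, q \\
\Pi &= \{\pi_i, \pi_j\} = \left\{ \frac{p(c_i)}{p(c_i) + p(c_j)},\ \frac{p(c_j)}{p(c_i) + p(c_j)} \right\}
\end{aligned}
```

As β grows, the second term vanishes and the merge cost reduces to the prior-weighted Jensen-Shannon divergence between the clusters' relevance distributions.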