A Novel Method For Speech Segmentation Based On Speakers' Characteristics
Speech segmentation is the process of change-point detection for partitioning an
input audio stream into regions, each of which corresponds to a single audio
source or speaker. One application of this task is in speaker diarization
systems. There are several methods for speaker segmentation; however, most
speaker diarization systems use BIC-based segmentation methods. The main goal
of this paper is to propose a new method for speaker segmentation that is
faster than current methods, e.g., BIC, while retaining acceptable accuracy.
Our proposed method is based on the pitch frequency of the speech. Its accuracy
is similar to that of common speaker segmentation methods, but its computational
cost is much lower. We show that our method is about 2.4 times faster than the
BIC-based method, while the average accuracy of the pitch-based method is
slightly higher than that of the BIC-based method.
Comment: 14 pages, 8 figures
Likelihood-Maximizing-Based Multiband Spectral Subtraction for Robust Speech Recognition
Automatic speech recognition performance degrades significantly when speech is affected by environmental noise. Nowadays, the major challenge is to achieve good robustness in adverse noisy conditions so that automatic speech recognizers can be used in real situations. Spectral subtraction (SS) is a well-known and effective approach; it was originally designed to improve the quality of the speech signal as judged by human listeners. SS techniques usually improve the quality and intelligibility of the speech signal, while speech recognition systems need compensation techniques to reduce the mismatch between noisy speech features and clean-trained acoustic models. Nevertheless, a correlation can be expected between speech quality improvement and an increase in recognition accuracy. This paper proposes a novel approach that treats SS and the speech recognizer not as two independent entities cascaded together, but rather as two interconnected components of a single system sharing the common goal of improved speech recognition accuracy. This architecture incorporates information from the statistical models of the recognition engine as feedback for tuning the SS parameters. Using this architecture, we overcome the drawbacks of previously proposed methods and achieve better recognition accuracy. Experimental evaluations show that the proposed method achieves significant improvements in recognition rates across a wide range of signal-to-noise ratios.
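The multiband spectral subtraction baseline that such a system tunes can be sketched as below; the per-band over-subtraction factors `alphas` are exactly the kind of parameters the paper proposes to adjust from recognizer feedback, but the band edges and values here are illustrative:

```python
import numpy as np

def multiband_spectral_subtraction(noisy_power, noise_power, band_edges,
                                   alphas, beta=0.01):
    """Subtract an estimated noise power spectrum band by band.

    noisy_power, noise_power -- power spectra of one frame (same length)
    band_edges -- list of (lo, hi) bin ranges partitioning the spectrum
    alphas     -- per-band over-subtraction factors (in the paper's scheme,
                  these would be tuned via recognizer-likelihood feedback)
    beta       -- spectral floor factor to avoid negative power estimates
    """
    clean = np.empty_like(noisy_power)
    for (lo, hi), alpha in zip(band_edges, alphas):
        sub = noisy_power[lo:hi] - alpha * noise_power[lo:hi]
        clean[lo:hi] = np.maximum(sub, beta * noisy_power[lo:hi])
    return clean
```

The flooring step is what makes over-subtraction safe: bins dominated by noise collapse to a small fraction of the noisy power instead of going negative.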
Persian Keyphrase Generation Using Sequence-to-Sequence Models
Keyphrases are a very short summary of an input text and provide the main
subjects discussed in the text. Keyphrase extraction is a useful upstream task
and can be used in various natural language processing problems, for example,
text summarization and information retrieval, to name a few. However, not all
the keyphrases are explicitly mentioned in the body of the text. In real-world
examples, there are always some topics that are discussed only implicitly. Extracting
such keyphrases requires a generative approach, which is adopted here. In this
paper, we try to tackle the problem of keyphrase generation and extraction from
news articles using deep sequence-to-sequence models. These models
significantly outperform conventional methods such as TopicRank, KP-Miner,
and KEA in the task of keyphrase extraction.
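At inference time, such a sequence-to-sequence generator produces a keyphrase by greedy decoding, which can be sketched as follows; the `encoder`/`decoder` interfaces are hypothetical placeholders for trained networks, not the paper's actual architecture:

```python
def generate_keyphrase(encoder, decoder, tokens, max_len=8,
                       bos="<bos>", eos="<eos>"):
    """Greedy decoding: encode the article once, then emit one keyphrase
    token at a time until <eos> (interfaces are illustrative)."""
    state = encoder(tokens)                 # encode the whole article
    phrase, prev = [], bos
    for _ in range(max_len):
        prev, state = decoder(prev, state)  # next token + updated state
        if prev == eos:
            break
        phrase.append(prev)
    return phrase
```

With a trained model, this loop would typically be replaced by beam search so that several ranked keyphrases can be generated per article.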
Ghmerti at SemEval-2019 Task 6: A Deep Word- and Character-based Approach to Offensive Language Identification
This paper presents the models submitted by Ghmerti team for subtasks A and B
of the OffensEval shared task at SemEval 2019. OffensEval addresses the problem
of identifying and categorizing offensive language in social media in three
subtasks: whether or not the content is offensive (subtask A), whether the
offense is targeted (subtask B), and whether its target is an individual, a
group, or another entity (subtask C). The proposed approach includes a
character-level Convolutional Neural Network, a word-level Recurrent Neural
Network, and some preprocessing. The proposed model achieves a macro-averaged
F1-score of 77.93% for subtask A.
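The character branch of such a model amounts to a one-dimensional convolution over character embeddings followed by global max-pooling. A minimal sketch, with random weights standing in for learned parameters:

```python
import numpy as np

def char_cnn_maxpool(text, emb_dim=8, n_filters=4, width=3, seed=0):
    """Character-level CNN sketch: embed characters, slide `n_filters`
    convolution filters of span `width` over the sequence, apply ReLU,
    then global max-pool into one fixed-size feature vector."""
    rng = np.random.default_rng(seed)
    emb_table = rng.normal(size=(128, emb_dim))      # ASCII embedding table
    emb = emb_table[[ord(c) % 128 for c in text]]    # (T, emb_dim)
    filters = rng.normal(size=(n_filters, width, emb_dim))
    steps = len(text) - width + 1
    conv = np.array([[float(np.sum(emb[t:t + width] * f))
                      for t in range(steps)]
                     for f in filters])              # (n_filters, steps)
    return np.maximum(conv, 0.0).max(axis=1)         # ReLU + max-pool
```

In a full model of this kind, the pooled character features would be combined with the word-level RNN representation before the final classification layer; max-pooling makes the character branch robust to the misspellings and obfuscations common in offensive social-media text.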
A Change of Heart: Improving Speech Emotion Recognition through Speech-to-Text Modality Conversion
Speech Emotion Recognition (SER) is a challenging task. In this paper, we
introduce a modality conversion concept aimed at enhancing emotion recognition
performance on the MELD dataset. We assess our approach through two
experiments: first, a method named Modality-Conversion that employs automatic
speech recognition (ASR) systems followed by a text classifier; second, a
method called Modality-Conversion++, in which we assume perfect ASR output and
investigate the impact of modality conversion on SER. Our findings indicate
that the first method yields promising results, while the second method
outperforms state-of-the-art (SOTA) speech-based approaches in terms of SER
weighted-F1 (WF1) score on the MELD dataset. This research highlights the
potential of modality conversion for tasks that can be conducted in alternative
modalities.
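The Modality-Conversion pipeline reduces to a two-stage composition; the `asr` and `text_classifier` callables below are hypothetical placeholders for the actual models:

```python
def modality_conversion_ser(audio, asr, text_classifier):
    """Modality-Conversion: transcribe the speech, then run emotion
    recognition on the transcript instead of on the raw audio."""
    transcript = asr(audio)
    return text_classifier(transcript)
```

Modality-Conversion++ corresponds to bypassing `asr` and feeding the gold transcript to the same text classifier, which isolates how much of the remaining error is due to ASR mistakes.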