42 research outputs found
Analysis of a Modern Voice Morphing Approach using Gaussian Mixture Models for Laryngectomees
This paper proposes a voice morphing system for people suffering from
Laryngectomy, which is the surgical removal of all or part of the larynx or the
voice box, particularly performed in cases of laryngeal cancer. A primitive
method of achieving voice morphing is by extracting the source's vocal
coefficients and then converting them into the target speaker's vocal
parameters. In this paper, we deploy Gaussian Mixture Models (GMM) for mapping
the coefficients from source to destination. However, the use of the
traditional/conventional GMM-based mapping approach results in the problem of
over-smoothening of the converted voice. Thus, we hereby propose a unique
method to perform efficient voice morphing and conversion based on GMM,which
overcomes the traditional-method effects of over-smoothening. It uses a
technique of glottal waveform separation and prediction of excitations and
hence the result shows that not only over-smoothening is eliminated but also
the transformed vocal tract parameters match with the target. Moreover, the
synthesized speech thus obtained is found to be of a sufficiently high quality.
Thus, voice morphing based on a unique GMM approach has been proposed and also
critically evaluated based on various subjective and objective evaluation
parameters. Further, an application of voice morphing for Laryngectomees which
deploys this unique approach has been recommended by this paper.Comment: 6 pages, 4 figures, 4 tables; International Journal of Computer
Applications Volume 49, Number 21, July 201
Facial Emotion Recognition
We present a facial emotion recognition framework, built upon Swin vision
Transformers jointly with squeeze and excitation block (SE). A transformer
model based on an attention mechanism has been presented recently to address
vision tasks. Our method uses a vision transformer with a Squeeze excitation
block (SE) and sharpness-aware minimizer (SAM). We have used a hybrid dataset,
to train our model and the AffectNet dataset to evaluate the result of our
modelComment: arXiv admin note: text overlap with arXiv:2103.14030 by other author
Comparative Study and Optimization of Feature-Extraction Techniques for Content based Image Retrieval
The aim of a Content-Based Image Retrieval (CBIR) system, also known as Query
by Image Content (QBIC), is to help users to retrieve relevant images based on
their contents. CBIR technologies provide a method to find images in large
databases by using unique descriptors from a trained image. The image
descriptors include texture, color, intensity and shape of the object inside an
image. Several feature-extraction techniques viz., Average RGB, Color Moments,
Co-occurrence, Local Color Histogram, Global Color Histogram and Geometric
Moment have been critically compared in this paper. However, individually these
techniques result in poor performance. So, combinations of these techniques
have also been evaluated and results for the most efficient combination of
techniques have been presented and optimized for each class of image query. We
also propose an improvement in image retrieval performance by introducing the
idea of Query modification through image cropping. It enables the user to
identify a region of interest and modify the initial query to refine and
personalize the image retrieval results.Comment: 8 pages, 16 figures, 11 table
Dynamic Corrective Self-Distillation for Better Fine-Tuning of Pretrained Models
We tackle the challenging issue of aggressive fine-tuning encountered during
the process of transfer learning of pre-trained language models (PLMs) with
limited labeled downstream data. This problem primarily results in a decline in
performance on the subsequent task. Inspired by the adaptive boosting method in
traditional machine learning, we present an effective dynamic corrective
self-distillation (DCS) approach to improve the fine-tuning of the PLMs. Our
technique involves performing a self-distillation mechanism where, at each
iteration, the student model actively adapts and corrects itself by dynamically
adjusting the weights assigned to individual data points. This iterative
self-correcting process significantly enhances the overall fine-tuning
capability of PLMs, leading to improved performance and robustness. We
conducted comprehensive evaluations using the GLUE benchmark demonstrating the
efficacy of our method in enhancing the fine-tuning process for various PLMs
across diverse downstream tasks
Breaking Language Barriers: A Question Answering Dataset for Hindi and Marathi
The recent advances in deep-learning have led to the development of highly
sophisticated systems with an unquenchable appetite for data. On the other
hand, building good deep-learning models for low-resource languages remains a
challenging task. This paper focuses on developing a Question Answering dataset
for two such languages- Hindi and Marathi. Despite Hindi being the 3rd most
spoken language worldwide, with 345 million speakers, and Marathi being the
11th most spoken language globally, with 83.2 million speakers, both languages
face limited resources for building efficient Question Answering systems. To
tackle the challenge of data scarcity, we have developed a novel approach for
translating the SQuAD 2.0 dataset into Hindi and Marathi. We release the
largest Question-Answering dataset available for these languages, with each
dataset containing 28,000 samples. We evaluate the dataset on various
architectures and release the best-performing models for both Hindi and
Marathi, which will facilitate further research in these languages. Leveraging
similarity tools, our method holds the potential to create datasets in diverse
languages, thereby enhancing the understanding of natural language across
varied linguistic contexts. Our fine-tuned models, code, and dataset will be
made publicly available