47 research outputs found
Supervised Contrastive Learning with Nearest Neighbor Search for Speech Emotion Recognition
Speech Emotion Recognition (SER) is a challenging task due to limited data
and blurred boundaries of certain emotions. In this paper, we present a
comprehensive approach to improve the SER performance throughout the model
lifecycle, including pre-training, fine-tuning, and inference stages. To
address the data scarcity issue, we utilize a pre-trained model, wav2vec2.0.
During fine-tuning, we propose a novel loss function that combines
cross-entropy loss with supervised contrastive learning loss to improve the
model's discriminative ability. This approach increases the inter-class
distances and decreases the intra-class distances, mitigating the issue of
blurred boundaries. Finally, to leverage the improved distances, we propose an
interpolation method at the inference stage that combines the model prediction
with the output from a k-nearest neighbors model. Our experiments on IEMOCAP
demonstrate that our proposed methods outperform current state-of-the-art
results.
Comment: Accepted by Interspeech 2023, poste
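The two ideas in the abstract above, a supervised contrastive term added to cross-entropy during fine-tuning and a k-nearest-neighbor interpolation at inference, can be sketched in plain Python. This is an illustration under stated assumptions, not the authors' implementation: the temperature `tau`, the interpolation weight `lam`, the Euclidean metric, and the use of neighbor label frequencies as the kNN output are all hypothetical choices that the abstract does not specify.

```python
import math


def supcon_loss(embeddings, labels, tau=0.1):
    """Supervised contrastive loss averaged over anchors that have a positive.

    Embeddings are L2-normalised first; `tau` is an assumed temperature.
    """
    z = [[x / math.sqrt(sum(v * v for v in e)) for x in e] for e in embeddings]
    n = len(z)
    total, anchors = 0.0, 0
    for i in range(n):
        positives = [p for p in range(n) if p != i and labels[p] == labels[i]]
        if not positives:
            continue  # an anchor without a same-class partner contributes nothing
        sims = {a: sum(x * y for x, y in zip(z[i], z[a])) / tau
                for a in range(n) if a != i}
        log_denom = math.log(sum(math.exp(s) for s in sims.values()))
        total += -sum(sims[p] - log_denom for p in positives) / len(positives)
        anchors += 1
    return total / anchors if anchors else 0.0


def knn_distribution(query, bank, bank_labels, num_classes, k):
    """Label frequencies among the k stored embeddings closest to `query`."""
    order = sorted(range(len(bank)), key=lambda i: math.dist(query, bank[i]))
    probs = [0.0] * num_classes
    for i in order[:k]:
        probs[bank_labels[i]] += 1.0 / k
    return probs


def interpolate(model_probs, knn_probs, lam):
    """Convex combination of the model prediction and the kNN output."""
    return [lam * q + (1.0 - lam) * p for p, q in zip(model_probs, knn_probs)]
```

During fine-tuning, a combined objective would take the form `cross_entropy + alpha * supcon_loss(...)`, where the weight `alpha` is again an assumption. At inference, `interpolate` moves the model's class probabilities toward the labels of nearby training embeddings, which helps only if the contrastive term has already pulled same-class embeddings together.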
Zero-shot stance detection based on cross-domain feature enhancement by contrastive learning
Zero-shot stance detection is challenging because it requires detecting the
stance of previously unseen targets in the inference phase. The ability to
learn transferable target-invariant features is critical for zero-shot stance
detection. In this work, we propose a stance detection approach that can
efficiently adapt to unseen targets, the core of which is to capture
target-invariant syntactic expression patterns as transferable knowledge.
Specifically, we first augment the data by masking the topic words of
sentences, and then feed the augmented data to an unsupervised contrastive
learning module to capture transferable features. Then, to fit a specific
target, we encode the raw texts as target-specific features. Finally, we adopt
an attention mechanism, which combines syntactic expression patterns with
target-specific features to obtain enhanced features for predicting previously
unseen targets. Experiments demonstrate that our model outperforms competitive
baselines on four benchmark datasets
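The augmentation step described above, masking the topic words of a sentence before contrastive learning, might look like the following minimal sketch. The whitespace tokenisation, the `[MASK]` token, and the caller-supplied topic word list are assumptions made for illustration; the abstract does not say how topic words are identified.

```python
import string


def mask_topic_words(sentence, topic_words, mask_token="[MASK]"):
    """Replace every token that matches a topic word with `mask_token`.

    Matching ignores case and surrounding punctuation. A masked token loses
    any punctuation that was attached to it.
    """
    topics = {w.lower() for w in topic_words}
    masked = []
    for token in sentence.split():
        core = token.strip(string.punctuation).lower()
        masked.append(mask_token if core in topics else token)
    return " ".join(masked)
```

For example, `mask_topic_words("Climate change is a hoax.", ["climate", "change"])` returns `"[MASK] [MASK] is a hoax."`. The original sentence and its masked version can then serve as a positive pair, so the encoder is pushed to rely on the target-independent wording rather than on the topic itself.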
In vitro expression and analysis of the 826 human G protein-coupled receptors
G protein-coupled receptors (GPCRs) are involved in all human physiological systems, where they are responsible for transducing extracellular signals into cells. GPCRs signal in response to a diverse array of stimuli, including light, hormones, and lipids, and these signals affect downstream cascades that influence both health and disease states. Yet, despite their importance as therapeutic targets, detailed molecular structures of only 30 GPCRs have been determined to date. A key challenge to their structure determination is adequate protein expression. Here we report the quantification of protein expression in an insect cell expression system for all 826 human GPCRs using two different fusion constructs. Expression characteristics are analyzed in aggregate and within each of the five distinct subfamilies. These data can be used to identify trends in GPCR expression between different fusion constructs and between different GPCR families, and to prioritize lead candidates when assessing the feasibility of future structure determination
Computational inference and analysis of genetic regulatory networks via a supervised combinatorial-optimization pattern
Zero-Delay Joint Source Channel Coding for a Bivariate Gaussian Source over the Broadcast Channel with One-Bit ADC Front Ends
In this work, we consider the zero-delay transmission of bivariate Gaussian sources over a Gaussian broadcast channel with one-bit analog-to-digital converter (ADC) front ends. An outer bound on the conditional distortion region is derived. Focusing on the minimization of the average distortion, two methods are proposed to design nonparametric mappings. The first is based on the joint optimization of the encoder and decoder using an iterative algorithm. In the second, we derive the necessary conditions for the optimal encoder and use them to design an algorithm based on gradient descent search that finds the encoder numerically. Subsequently, the characteristics of the optimized encoding mapping structure are discussed, and several parametric mappings inspired by this structure are proposed. Numerical results show that the proposed parametric mappings outperform the uncoded scheme and previous parametric mappings designed for broadcast channels with infinite resolution ADC front ends. The nonparametric mappings in turn outperform the parametric mappings. The causes of the performance differences between the two nonparametric mappings are analyzed. The average distortions of the parametric and nonparametric mappings proposed here are close to the bound for the cases with one-bit ADC front ends in low channel signal-to-noise ratio regions
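To give a feel for what a one-bit ADC front end costs in distortion, the sketch below treats a much simpler case than the one in the abstract: a single unit-variance Gaussian sample sent uncoded (linearly scaled) over one Gaussian channel and observed only through its sign. It is a scalar, single-receiver simplification chosen for illustration, not the bivariate broadcast scheme of the paper; the function names and parameters are hypothetical. With X ~ N(0, 1) and Y = sign(gain * X + N), N ~ N(0, noise_std^2), the MMSE estimate is E[X | Y] = c * Y with c = sqrt(2/pi) * gain / sqrt(gain^2 + noise_std^2), and the distortion is 1 - c^2.

```python
import math
import random


def one_bit_mmse_distortion(gain, noise_std):
    """Closed-form MMSE for a unit-variance Gaussian seen through a 1-bit ADC.

    Assumes gain > 0. Scalar single-receiver simplification.
    """
    rho_sq = gain ** 2 / (gain ** 2 + noise_std ** 2)
    return 1.0 - (2.0 / math.pi) * rho_sq


def simulate_distortion(gain, noise_std, n=100_000, seed=0):
    """Monte Carlo estimate of the same distortion, with a fixed seed."""
    rng = random.Random(seed)
    coeff = math.sqrt(2.0 / math.pi) * gain / math.sqrt(gain ** 2 + noise_std ** 2)
    err = 0.0
    for _ in range(n):
        x = rng.gauss(0.0, 1.0)
        received = gain * x + rng.gauss(0.0, noise_std)
        y = 1.0 if received >= 0.0 else -1.0  # one-bit quantisation
        err += (x - coeff * y) ** 2           # squared error of the MMSE estimate
    return err / n
```

Even without channel noise the distortion cannot fall below 1 - 2/pi (about 0.36) in this scalar setting, which illustrates why mappings designed for infinite-resolution front ends need not carry over to the one-bit case.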
Research and implementation of license plate recognition based on Android platform
This paper studies and optimizes the plate location and character recognition stages of license plate recognition. A license plate recognition system for the Android platform is designed and implemented, with OpenCV and Tesseract OCR integrated in the Android Studio environment. The license plate is located by combining the Laplace algorithm with the HSV color model. Building on the principles of Tesseract OCR recognition, a large number of training images are generated with a license plate number simulation generator, and a license plate character library is built with the jTessBoxEditor tool, which enables offline recognition of license plate numbers
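The two building blocks named for plate location, a Laplace edge response and an HSV color test, can be illustrated without any imaging library. The sketch below is a pure-Python stand-in for what the system would do through OpenCV on Android; the 4-neighbor kernel, the hue range, and the saturation and value thresholds are assumptions, and the abstract does not say how the two cues are combined.

```python
import colorsys

# 4-neighbour discrete Laplace kernel (an assumed choice of kernel).
LAPLACE_KERNEL = ((0, 1, 0), (1, -4, 1), (0, 1, 0))


def laplacian(gray):
    """Laplace response of a 2D grayscale image; border pixels are left at 0."""
    h, w = len(gray), len(gray[0])
    out = [[0.0] * w for _ in range(h)]
    for y in range(1, h - 1):
        for x in range(1, w - 1):
            out[y][x] = sum(
                LAPLACE_KERNEL[j][i] * gray[y + j - 1][x + i - 1]
                for j in range(3)
                for i in range(3)
            )
    return out


def hue_mask(rgb, hue_lo, hue_hi, min_sat=0.4, min_val=0.2):
    """Boolean mask of pixels whose HSV hue lies in [hue_lo, hue_hi].

    `rgb` is a 2D list of (r, g, b) tuples in 0..255. Hue is in 0..1, as
    returned by colorsys. Thresholds are illustrative defaults.
    """
    mask = []
    for row in rgb:
        mask_row = []
        for r, g, b in row:
            hue, sat, val = colorsys.rgb_to_hsv(r / 255, g / 255, b / 255)
            mask_row.append(hue_lo <= hue <= hue_hi and sat >= min_sat and val >= min_val)
        mask.append(mask_row)
    return mask
```

One plausible way to combine them is to keep regions where the color mask is true and the absolute Laplace response is high, since plate characters create dense edges on a uniformly colored background.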
Incorporating Mental State into Contrastive Learning for Fine-grained Implicit Hate Speech Classification
Hate speech on social media has harmed many people. Most research has focused on coarse-grained explicit hate speech detection and neglected fine-grained implicit hate speech classification, which is crucial for combating hate speech more effectively. Although the language used in implicit hate speech may vary greatly, the mental states involved are usually the same. However, the similarities and differences between the mental states present in implicit hate speech have rarely been examined. To close this gap, we create a module that infers mental states from implicit hate speech. Mental states primarily refer to the speaker's intent and the reader's reaction. We then use them as positive samples in contrastive learning. This strategy pulls implicit hate speech with similar mental states toward similar representations and pushes apart speech with different ones. Comprehensive experimental results demonstrate the superior classification performance and generalization of the proposed method
Foreground Enhanced Network for Weakly Supervised Temporal Language Grounding
Temporal language grounding (TLG) aims to localize query-related events in videos, which requires relating video content to language descriptions. According to the selective visual attention mechanism in cognitive science, people’s cognition and understanding of what happens often rely on dynamic foreground information in the video. Nonetheless, background usually dominates the scenes, so query-related visual features are confused with irrelevant ones. Thus, we propose a Foreground Enhanced Network (FEN) that diminishes the background effect in two ways.
First, in the spatial dimension, FEN explicitly models the evolving foreground in video features by removing relatively unchanged background content. Second, we propose a progressive contrastive sample generation module that gradually learns the differences between the predicted proposal and elongated proposals that contain it, thereby distinguishing similar neighboring frames. Experiments on two commonly used datasets show the efficacy of our model