49 research outputs found
A Fuzzy-Based Multimedia Content Retrieval Method Using Mood Tags and Their Synonyms in Social Networks
The preferences of Web information purchasers are rapidly evolving. Cost-effectiveness is now becoming less regarded than cost-satisfaction, which emphasizes the purchaser's psychological satisfaction. One method to improve a user's cost-satisfaction in multimedia content retrieval is to utilize the mood inherent in multimedia items. An example of applications using this method is SNS (Social Network Services), which is based on folksonomy, but its applications encounter problems due to synonyms. In order to solve the problem of synonyms in our previous study, the mood of multimedia content is represented with arousal and valence (AV) in Thayer's two-dimensional model as its internal tag. Although some problems of synonyms could now be solved, the retrieval performance of the previous study was less than that of a keyword-based method. In this paper, a new method that can solve the synonym problem is proposed, while simultaneously maintaining the same performance as the keyword-based approach. In the proposed method, a mood of multimedia content is represented with a fuzzy set of 12 moods of the Thayer model. For the analysis, the proposed method is compared with two methods, one based on AV value and the other based on keyword. The analysis results demonstrate that the proposed method is superior to the two methods.
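As a rough illustration of retrieval over fuzzy mood sets, the sketch below stores each item's mood as a mapping from mood label to a membership degree in [0, 1] and ranks items by a fuzzy Jaccard similarity to the query. The abstract does not state which similarity measure the authors use, so `fuzzy_jaccard`, `retrieve`, the mood labels, and the membership values are all hypothetical.

```python
def fuzzy_jaccard(a, b):
    """Fuzzy Jaccard similarity of two fuzzy sets given as {label: membership}.

    Assumption: this measure is chosen for illustration only; the abstract
    does not say how fuzzy mood sets are compared.
    """
    labels = set(a) | set(b)
    intersection = sum(min(a.get(m, 0.0), b.get(m, 0.0)) for m in labels)
    union = sum(max(a.get(m, 0.0), b.get(m, 0.0)) for m in labels)
    return intersection / union if union else 0.0


def retrieve(query, catalog, top_k=3):
    """Return the names of the top_k catalog items most similar to the query."""
    ranked = sorted(
        catalog.items(),
        key=lambda item: fuzzy_jaccard(query, item[1]),
        reverse=True,
    )
    return [name for name, _ in ranked[:top_k]]


# Hypothetical catalog: item names, mood labels, and degrees are made up.
catalog = {
    "song_a": {"happy": 0.9, "excited": 0.6},
    "song_b": {"sad": 0.8, "calm": 0.4},
    "song_c": {"happy": 0.5, "calm": 0.5},
}
query = {"happy": 0.8, "excited": 0.4}
print(retrieve(query, catalog, top_k=2))
```

Because matching is done on membership degrees rather than on exact tag strings, two synonymous tags can map to the same mood labels and still retrieve the same items.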
Joint unsupervised and supervised learning for context-aware language identification
Language identification (LID) recognizes the language of a spoken utterance
automatically. According to recent studies, LID models trained with an
automatic speech recognition (ASR) task perform better than those trained with
a LID task only. However, we need additional text labels to train the model to
recognize speech, and acquiring the text labels is costly. In order to
overcome this problem, we propose context-aware language identification using a
combination of unsupervised and supervised learning without any text labels.
The proposed method learns the context of speech through masked language
modeling (MLM) loss and simultaneously trains to determine the language of the
utterance with supervised learning loss. The proposed joint learning was found
to reduce the error rate by 15.6% compared to the same structure model trained
by supervised-only learning on a subset of the VoxLingua107 dataset consisting
of sub-three-second utterances in 11 languages. Comment: Accepted by ICASSP 202
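The joint objective described above can be sketched as a supervised language-classification loss plus a masked-prediction loss computed on the same utterance. This is a minimal sketch under stated assumptions: the abstract does not say how the two losses are weighted or what the MLM targets are, so `mlm_weight`, the use of discrete unit indices as masked-frame targets, and the function names are hypothetical.

```python
import math


def cross_entropy(logits, target):
    """Cross-entropy of one example from raw logits (stable log-sum-exp)."""
    peak = max(logits)
    log_z = peak + math.log(sum(math.exp(x - peak) for x in logits))
    return log_z - logits[target]


def joint_loss(lid_logits, lid_label, mlm_logits, mlm_targets, mlm_weight=1.0):
    """Supervised LID loss plus a weighted masked-prediction loss.

    Assumptions: mlm_targets are indices of discrete units for the masked
    frames, and the two losses are combined by a simple weighted sum.
    """
    lid = cross_entropy(lid_logits, lid_label)
    if mlm_targets:
        mlm = sum(
            cross_entropy(frame_logits, target)
            for frame_logits, target in zip(mlm_logits, mlm_targets)
        ) / len(mlm_targets)
    else:
        mlm = 0.0
    return lid + mlm_weight * mlm


# Toy values: 2 languages, one masked frame with 2 candidate units.
loss = joint_loss([0.0, 0.0], 0, [[0.0, 0.0]], [1], mlm_weight=0.5)
print(round(loss, 4))
```

Setting `mlm_weight=0.0` reduces the sketch to supervised-only training, which is the baseline the abstract compares against.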
Neural Speech Enhancement with Very Low Algorithmic Latency and Complexity via Integrated Full- and Sub-Band Modeling
We propose FSB-LSTM, a novel long short-term memory (LSTM) based architecture
that integrates full- and sub-band (FSB) modeling, for single- and
multi-channel speech enhancement in the short-time Fourier transform (STFT)
domain. The model maintains an information highway to flow an over-complete
input representation through multiple FSB-LSTM modules. Each FSB-LSTM module
consists of a full-band block to model spectro-temporal patterns at all
frequencies and a sub-band block to model patterns within each sub-band, where
each of the two blocks takes a down-sampled representation as input and returns
an up-sampled discriminative representation to be added to the block input via
a residual connection. The model is designed to have a low algorithmic
complexity, a small run-time buffer and a very low algorithmic latency, at the
same time producing a strong enhancement performance on a noisy-reverberant
speech enhancement task even if the hop size is as low as ms. Comment: in ICASSP 202
TF-GridNet: Integrating Full- and Sub-Band Modeling for Speech Separation
We propose TF-GridNet for speech separation. The model is a novel multi-path
deep neural network (DNN) integrating full- and sub-band modeling in the
time-frequency (T-F) domain. It stacks several multi-path blocks, each
consisting of an intra-frame full-band module, a sub-band temporal module, and
a cross-frame self-attention module. It is trained to perform complex spectral
mapping, where the real and imaginary (RI) components of input signals are
stacked as features to predict target RI components. We first evaluate it on
monaural anechoic speaker separation. Without using data augmentation and
dynamic mixing, it obtains a state-of-the-art 23.5 dB improvement in
scale-invariant signal-to-distortion ratio (SI-SDR) on WSJ0-2mix, a standard
dataset for two-speaker separation. To show its robustness to noise and
reverberation, we evaluate it on monaural reverberant speaker separation using
the SMS-WSJ dataset and on noisy-reverberant speaker separation using WHAMR!,
and obtain state-of-the-art performance on both datasets. We then extend
TF-GridNet to multi-microphone conditions through multi-microphone complex
spectral mapping, and integrate it into a two-DNN system with a beamformer in
between (named as MISO-BF-MISO in earlier studies), where the beamformer
proposed in this paper is a novel multi-frame Wiener filter computed based on
the outputs of the first DNN. State-of-the-art performance is obtained on the
multi-channel tasks of SMS-WSJ and WHAMR!. Besides speaker separation, we apply
the proposed algorithms to speech dereverberation and noisy-reverberant speech
enhancement. State-of-the-art performance is obtained on a dereverberation
dataset and on the dataset of the recent L3DAS22 multi-channel speech
enhancement challenge. Comment: In submission. A sound demo is available at
https://zqwang7.github.io/demos/TF-GridNet-demo/index.htm
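For readers unfamiliar with the metric quoted in the TF-GridNet abstract, the sketch below computes SI-SDR and its improvement over the unprocessed mixture using the commonly used definition: project the estimate onto the reference, then compare the energy of the projection with the energy of the residual. It is a plain-Python illustration on lists of samples, not the authors' evaluation code. The function names are hypothetical, and mean removal, which some implementations apply first, is omitted here.

```python
import math


def si_sdr(estimate, reference):
    """Scale-invariant signal-to-distortion ratio in dB."""
    dot = sum(e * r for e, r in zip(estimate, reference))
    ref_energy = sum(r * r for r in reference)
    if ref_energy == 0.0:
        raise ValueError("reference signal must not be all zeros")
    scale = dot / ref_energy  # optimal scaling of the reference
    target = [scale * r for r in reference]
    residual = [e - t for e, t in zip(estimate, target)]
    target_energy = sum(t * t for t in target)
    residual_energy = sum(n * n for n in residual)
    if residual_energy == 0.0:
        return math.inf
    if target_energy == 0.0:
        return -math.inf
    return 10.0 * math.log10(target_energy / residual_energy)


def si_sdr_improvement(estimate, mixture, reference):
    """SI-SDR of the estimate minus SI-SDR of the unprocessed mixture."""
    return si_sdr(estimate, reference) - si_sdr(mixture, reference)


# Toy signals (made up): the estimate is closer to the reference than the mixture.
reference = [1.0, -1.0, 1.0, -1.0]
mixture = [1.5, -0.5, 1.5, -0.5]
estimate = [1.1, -0.9, 1.1, -0.9]
print(round(si_sdr_improvement(estimate, mixture, reference), 2))
```

Rescaling the estimate by any non-zero constant leaves `si_sdr` unchanged, which is the property that gives the metric its name.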
Effect of Wavelength and Intensity of Light on a-InGaZnO TFTs under Negative Bias Illumination Stress
We investigated the degradation mechanism of a-IGZO TFTs under negative bias illumination stress (NBIS) with different wavelengths λ and intensities I_L of light. A negative gate bias was applied for 4000 s while the drain and source were grounded, and illumination with λ = 450, 530, or 700 nm was applied. Illumination with photon energy exceeding ~2.3 eV (530 nm) induced a noticeable threshold voltage shift ΔV_th, which can be interpreted in terms of ionization of oxygen vacancies V_O. In addition, the I_L of blue illumination (450 nm) was varied from 6 to 200 lux, and saturation in ΔV_th was observed once I_L exceeded a certain value. We suggest that the saturation occurs because the V_O ionization rate is saturated by outward relaxation of metal atoms in the a-IGZO film. © The Author(s) 2016. Published by ECS.
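The correspondence between the wavelengths and the roughly 2.3 eV threshold quoted above follows from E = hc/λ. The short sketch below converts the three wavelengths to photon energies using the exact SI values of the Planck constant, the speed of light, and the elementary charge. The function name is hypothetical.

```python
PLANCK_J_S = 6.62607015e-34         # Planck constant, J*s
LIGHT_SPEED_M_S = 299792458.0       # speed of light in vacuum, m/s
ELECTRON_CHARGE_C = 1.602176634e-19  # elementary charge, C (J per eV)


def photon_energy_ev(wavelength_nm):
    """Photon energy in eV for a wavelength given in nanometres."""
    wavelength_m = wavelength_nm * 1e-9
    energy_joule = PLANCK_J_S * LIGHT_SPEED_M_S / wavelength_m
    return energy_joule / ELECTRON_CHARGE_C


for wavelength in (450, 530, 700):
    print(wavelength, "nm ->", round(photon_energy_ev(wavelength), 2), "eV")
```

This gives about 2.76 eV at 450 nm, 2.34 eV at 530 nm, and 1.77 eV at 700 nm, so only the 700 nm illumination falls below the stated threshold.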
That's What I Said: Fully-Controllable Talking Face Generation
The goal of this paper is to synthesise talking faces with controllable
facial motions. To achieve this goal, we propose two key ideas. The first is to
establish a canonical space where every face has the same motion patterns but
different identities. The second is to navigate a multimodal motion space that
only represents motion-related features while eliminating identity information.
To disentangle identity and motion, we introduce an orthogonality constraint
between the two different latent spaces. From this, our method can generate
natural-looking talking faces with fully controllable facial attributes and
accurate lip synchronisation. Extensive experiments demonstrate that our method
achieves state-of-the-art results in terms of both visual quality and lip-sync
score. To the best of our knowledge, we are the first to develop a talking face
generation framework that can accurately manifest full target facial motions
including lip, head pose, and eye movements in the generated video without any
additional supervision beyond RGB video with audio.
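One common way to express an orthogonality constraint between two latent codes is to penalize the squared cosine of the angle between them, which is zero when they are orthogonal. The abstract does not give the exact form the authors use, so the sketch below is only an assumed illustration; `orthogonality_penalty` and the toy vectors are hypothetical.

```python
def orthogonality_penalty(identity_vec, motion_vec):
    """Squared cosine similarity between two latent vectors.

    Returns 0.0 for orthogonal vectors and 1.0 for parallel ones.
    Assumption: this specific penalty is illustrative, not the paper's loss.
    """
    dot = sum(a * b for a, b in zip(identity_vec, motion_vec))
    norm_sq = sum(a * a for a in identity_vec) * sum(b * b for b in motion_vec)
    if norm_sq == 0.0:
        return 0.0
    return dot * dot / norm_sq


print(orthogonality_penalty([1.0, 0.0], [0.0, 2.0]))  # orthogonal pair
print(orthogonality_penalty([1.0, 2.0], [2.0, 4.0]))  # parallel pair
```

Minimizing such a term during training pushes the identity and motion codes toward carrying non-overlapping information.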
Efficacy of PEEK Cages and Plate Augmentation in Three-Level Anterior Cervical Fusion of Elderly Patients
Epstein-Barr Virus-Positivity in Tumor has no Correlation with the Clinical Outcomes of Patients with Angioimmunoblastic T-cell Lymphoma
Inspection System for Vehicle Headlight Defects Based on Convolutional Neural Network
This paper proposes a method to detect the defects in the region of interest (ROI) based on a convolutional neural network (CNN) after alignment (position and rotation calibration) of a manufacturer's headlights to determine whether the vehicle headlights are defective. The results were compared with an existing method for distinguishing defects among the previously proposed methods. One hundred original headlight images were acquired for each of the two vehicle types for the purpose of this experiment, and 20,000 high quality images and 20,000 defective images were obtained by applying the position and rotation transformation to the original images. It was found that the method proposed in this paper demonstrated a performance improvement of more than 0.1569 (15.69% on average) as compared to the existing method.
Mechanism of carrier controllability with metal capping layer on amorphous oxide SiZnSnO semiconductor
The change in electrical performance of amorphous SiZnSnO thin film transistors (a-SZTO TFTs) has been investigated for various metal capping layers on the channel layer, which produce different contact properties. It was confirmed that the electrical characteristics depended sensitively on the capping layer material placed on the same channel layer between the source/drain electrodes. This sensitivity is mainly due to the different work functions of the metal capping layers. The work function of each capping layer material was derived using Kelvin probe force microscopy and compared with the energy bandgap of the SZTO layer. When the work function of the capping layer is larger than that of the channel layer, electrons are depleted from the channel layer to the capping layer. Conversely, when a material with a work function smaller than that of the channel layer is used, the electrical characteristics improved because electrons were injected into the channel layer. Based on the depletion and injection mechanisms caused by the different contact barriers between the metal capping layer and the channel layer, NOT, NAND, and NOR logic circuits have been implemented simply by changing the metal capping layer on the channel layer.