Information-Theoretic Text Hallucination Reduction for Video-grounded Dialogue
Video-grounded Dialogue (VGD) aims to decode an answer sentence to a question
regarding a given video and dialogue context. Despite the recent success of
multi-modal reasoning in generating answer sentences, existing dialogue systems
still suffer from a text hallucination problem, which denotes indiscriminate
copying of text from the input without an understanding of the question. This
stems from spurious correlations learned from the fact that answer sentences in
the dataset usually include words from the input texts; the VGD system thus
relies excessively on copying words from the input in the hope that they
overlap with the ground-truth text. Hence, we design the Text Hallucination
Mitigating (THAM) framework, which incorporates a Text Hallucination
Regularization (THR) loss derived from the proposed information-theoretic text
hallucination measurement approach. Applying THAM to current dialogue systems
validates its effectiveness on VGD benchmarks (i.e., AVSD@DSTC7 and AVSD@DSTC8)
and shows enhanced interpretability.
Comment: 12 pages, Accepted in EMNLP 202
HEAR: Hearing Enhanced Audio Response for Video-grounded Dialogue
Video-grounded Dialogue (VGD) aims to answer questions regarding a given
multi-modal input comprising video, audio, and dialogue history. Although there
have been numerous efforts to develop VGD systems that improve the quality of
their responses, existing systems are competent only at incorporating
information from the video and text, and they tend to struggle to extract the
necessary information from the audio when generating appropriate responses to
the question. Such VGD systems appear to be deaf, so we coin the term "deaf
response" for this symptom of current systems ignoring audio data. To overcome
the deaf response problem, the Hearing Enhanced Audio Response (HEAR) framework
is proposed to perform sensible listening by selectively attending to audio
whenever the question requires it. The HEAR framework enhances the accuracy and
audibility of VGD systems in a model-agnostic manner. HEAR is validated on VGD
datasets (i.e., AVSD@DSTC7 and AVSD@DSTC8) and shows effectiveness with various
VGD systems.
Comment: EMNLP 2023, 14 pages, 13 figures
ESD: Expected Squared Difference as a Tuning-Free Trainable Calibration Measure
Studies have shown that modern neural networks tend to be poorly calibrated
due to over-confident predictions. Traditionally, post-processing methods have
been used to calibrate the model after training. In recent years, various
trainable calibration measures have been proposed to incorporate them directly
into the training process. However, these methods all incorporate internal
hyperparameters, and the performance of these calibration objectives relies on
tuning these hyperparameters, incurring greater computational cost as neural
networks and datasets grow larger. As such, we present Expected
Squared Difference (ESD), a tuning-free (i.e., hyperparameter-free) trainable
calibration objective loss, where we view the calibration error from the
perspective of the squared difference between the two expectations. With
extensive experiments on several architectures (CNNs, Transformers) and
datasets, we demonstrate that (1) incorporating ESD into training improves
model calibration in various batch size settings without the need for internal
hyperparameter tuning, (2) ESD yields the best-calibrated results compared with
previous approaches, and (3) ESD drastically reduces the computational cost
required for calibration during training due to the absence of internal
hyperparameters. The code is publicly accessible at
https://github.com/hee-suk-yoon/ESD.
Comment: ICLR 202
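The abstract does not state which two expectations are compared. This minimal sketch assumes one possible reading: the gap between correctness and confidence is accumulated up to each confidence threshold s′, and the squares are averaged over thresholds. This is the plug-in batch estimate of E_{S′}[(E[(1{correct} − S)·1{S ≤ S′}])²]. It illustrates why such a measure needs no bins or other internal hyperparameter. The function name `squared_difference_calibration` and this exact formulation are assumptions for illustration. They are not the authors' released code, and their estimator may differ (for example, in how it debiases the batch estimate).

```python
from typing import Sequence


def squared_difference_calibration(
    confidences: Sequence[float], correct: Sequence[bool]
) -> float:
    """Bin-free squared-difference calibration estimate over one batch.

    Assumed formulation (illustrative, not the paper's code):
        mean over j of ( mean over i of g_i * 1[s_i <= s_j] ) ** 2,
    where g_i = 1[prediction i is correct] - s_i.
    """
    n = len(confidences)
    if n == 0 or n != len(correct):
        raise ValueError("inputs must be non-empty and of equal length")
    # Per-sample gap between correctness (0 or 1) and predicted confidence.
    gaps = [(1.0 if ok else 0.0) - s for s, ok in zip(confidences, correct)]
    total = 0.0
    for s_j in confidences:
        # Inner expectation: average gap over samples at or below threshold s_j.
        inner = sum(g for g, s_i in zip(gaps, confidences) if s_i <= s_j) / n
        total += inner ** 2
    # Outer expectation: average over the thresholds supplied by the batch itself.
    return total / n
```

The batch's own confidences serve as the thresholds, so the only input is the batch. By contrast, a binned estimator such as ECE requires choosing a bin count. An overconfident batch such as confidences `[0.9, 0.9]` with both predictions wrong yields 0.81. A batch with confidences `[0.2, 0.8]` that is wrong only at 0.8 still yields a nonzero value, even though its mean confidence equals its accuracy.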
Method and an apparatus for processing a signal
A method of processing a signal is disclosed. The present invention includes receiving a maximum number of bands and a code value of at least one section length, calculating the number of bits corresponding to the code value of the at least one section length using the maximum number of bands, and obtaining the section length information by decoding the code value of the section length based on that number of bits.
A method of processing a signal is disclosed. The present invention includes receiving factor information of a current frame, receiving flag information indicating whether the coding mode of the factor information is an absolute value mode or a relative value mode, and obtaining factor data of the current frame using factor data of a previous frame and the received factor information based on the flag information.
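The two decoding steps can be illustrated with a short sketch. The abstract does not give the exact bit-count rule, so this sketch makes two assumptions. First, the section length is coded with just enough bits to represent any value from 0 up to the maximum number of bands. Second, relative value mode means each received value is a delta added to the previous frame's factor. `BitReader`, `section_length_bits`, `decode_section_length`, and `decode_factors` are hypothetical names and do not come from any codec specification.

```python
from typing import List, Sequence


class BitReader:
    """Hypothetical helper: reads MSB-first unsigned integers from a '0'/'1' string."""

    def __init__(self, bits: str) -> None:
        self.bits = bits
        self.pos = 0

    def read(self, n: int) -> int:
        if n < 0 or self.pos + n > len(self.bits):
            raise ValueError("cannot read that many bits")
        chunk = self.bits[self.pos:self.pos + n]
        self.pos += n
        return int(chunk, 2) if chunk else 0


def section_length_bits(max_bands: int) -> int:
    # Assumed rule: enough bits to code any section length in 0..max_bands.
    return max(1, max_bands.bit_length())


def decode_section_length(reader: BitReader, max_bands: int) -> int:
    # Derive the code width from max_bands, then read that many bits.
    return reader.read(section_length_bits(max_bands))


def decode_factors(
    previous: Sequence[int], received: Sequence[int], relative_mode: bool
) -> List[int]:
    # The flag selects how the received factor information is interpreted.
    if not relative_mode:
        # Absolute value mode: received values are the factors themselves.
        return list(received)
    if len(previous) != len(received):
        raise ValueError("previous and received factors must align")
    # Relative value mode (assumed): received values are deltas from the previous frame.
    return [p + d for p, d in zip(previous, received)]
```

For example, with 49 bands the sketch reads 6 bits, so the bit string `101101` decodes to a section length of 45. In this sketch, the encoder and decoder derive the code width from the same band maximum, so the width itself never has to be transmitted.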
Methods and apparatuses for encoding and decoding object-based audio signals
Provided are an audio encoding method and apparatus and an audio decoding method and apparatus with which audio signals can be encoded or decoded so that sound images can be localized at any desired position for each object audio signal. The audio decoding method includes generating a third downmix signal by combining a first downmix signal extracted from a first audio signal and a second downmix signal extracted from a second audio signal; generating third object-based side information by combining first object-based side information extracted from the first audio signal and second object-based side information extracted from the second audio signal; converting the third object-based side information into channel-based side information; and generating a multi-channel audio signal using the third downmix signal and the channel-based side information.
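The decoder steps can be sketched under several stated assumptions:

- Downmixes are mono sample lists of equal length, combined by sample-wise addition.
- The object-based side information is only a per-object energy list, combined by concatenation.
- Objects are mutually uncorrelated.
- Conversion to channel-based side information uses a user-supplied rendering matrix, so each output channel's energy is the sum of squared rendering gains times object energies.

`ObjectSideInfo`, `combine_downmix`, `combine_side_info`, and `to_channel_energies` are hypothetical names. The abstract does not specify the actual side-information parameters.

```python
from dataclasses import dataclass
from typing import List, Sequence


@dataclass
class ObjectSideInfo:
    # Hypothetical side information: one energy value per audio object.
    object_energies: List[float]


def combine_downmix(first: Sequence[float], second: Sequence[float]) -> List[float]:
    # Third downmix = sample-wise sum of the two (assumed mono, equal length).
    if len(first) != len(second):
        raise ValueError("downmix signals must have equal length")
    return [a + b for a, b in zip(first, second)]


def combine_side_info(first: ObjectSideInfo, second: ObjectSideInfo) -> ObjectSideInfo:
    # Third side information lists the objects of both input signals.
    return ObjectSideInfo(first.object_energies + second.object_energies)


def to_channel_energies(
    side_info: ObjectSideInfo, rendering: Sequence[Sequence[float]]
) -> List[float]:
    # rendering[c][k] is the gain of object k in output channel c.
    # For uncorrelated objects, channel energy = sum_k gain**2 * object energy.
    energies = side_info.object_energies
    out = []
    for row in rendering:
        if len(row) != len(energies):
            raise ValueError("rendering row length must equal object count")
        out.append(sum(g * g * e for g, e in zip(row, energies)))
    return out
```

Under these assumptions, object energies `[1.0, 4.0]` rendered with rows `[1.0, 0.0]` and `[0.5, 0.5]` give channel energies of 1.0 and 1.25. The final step, synthesizing the multi-channel signal from the combined downmix, is not shown.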
2-Bromo-p-terphenyl
In the title compound, C18H13Br, the dihedral angles between the mean plane of the central benzene ring and the mean planes of the outer phenyl and bromophenyl rings are 33.47 (8) and 66.35 (8)°, respectively. In the crystal, weak C—H⋯π and intermolecular Br⋯Br [3.5503 (15) Å] interactions contribute to the stabilization of the crystal packing.
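A dihedral angle between two rings is the angle between the normals of their mean planes. The sketch below illustrates that geometry and is not the procedure used in the study. It estimates each ring's normal with Newell's method, which is assumed here as a simple stand-in for the least-squares mean plane that crystallographic software fits, and reports the acute angle between the normals. The function names are hypothetical. Coordinates are taken to be Cartesian (e.g., in Å) and listed in ring order.

```python
import math
from typing import Sequence, Tuple

Vec3 = Tuple[float, float, float]


def newell_normal(ring: Sequence[Vec3]) -> Vec3:
    """Approximate normal of a (near-)planar ring of atoms via Newell's method."""
    nx = ny = nz = 0.0
    n = len(ring)
    for i in range(n):
        x1, y1, z1 = ring[i]
        x2, y2, z2 = ring[(i + 1) % n]  # next atom, wrapping around the ring
        nx += (y1 - y2) * (z1 + z2)
        ny += (z1 - z2) * (x1 + x2)
        nz += (x1 - x2) * (y1 + y2)
    return (nx, ny, nz)


def dihedral_between_rings(ring_a: Sequence[Vec3], ring_b: Sequence[Vec3]) -> float:
    """Acute angle in degrees between the planes of two rings."""
    n1, n2 = newell_normal(ring_a), newell_normal(ring_b)
    dot = sum(p * q for p, q in zip(n1, n2))
    norms = math.sqrt(sum(p * p for p in n1)) * math.sqrt(sum(q * q for q in n2))
    if norms == 0.0:
        raise ValueError("degenerate ring: normal has zero length")
    # abs() folds the result into 0..90 degrees; min() guards against rounding above 1.
    return math.degrees(math.acos(min(1.0, abs(dot) / norms)))
```

As a check, consider a regular hexagon and a copy tilted by 30° about an in-plane axis. The function returns 30°, which is the same kind of quantity as the 33.47° and 66.35° reported above.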
Effect of Fe/N-doped carbon nanotube (CNT) wall thickness on CO2 conversion: A DFT study
CO2 adsorption on carbon nanotubes (CNTs) has been actively studied, but experimental and theoretical studies of CO2 conversion are still needed. In particular, the effect of CNT wall thickness on CO2 conversion has not yet been clearly established. This study employed two iron- and nitrogen-doped CNT catalysts with different numbers of walls: a single-walled CNT (Fe-N-SWCNT) and a double-walled CNT (Fe-N-DWCNT). The structural and electrical properties of these CNTs and their influence on CO2 conversion were characterized and compared using density functional theory (DFT) calculations. Fe-N-DWCNT was shown to improve catalyst stability, with a higher formation energy and a higher CO2 adsorption energy than Fe-N-SWCNT. In addition, the CO2 molecules were found to be highly delocalized and strongly hybridized with Fe-N-DWCNT, leading to more active charge transfer in the catalyst. These findings demonstrate the potential for selective CO2 conversion: differences in wall thickness lead to different electrical properties of the CNTs, and the thicker the wall, the lower the energy barrier required for CO2 conversion. Specifically, Fe-N-DWCNT converts CO2 to HCOOH more readily than Fe-N-SWCNT, at a lower overpotential (0.15 V), as obtained from limiting potentials and free energies calculated along the possible reaction pathways of the proton-electron transfer process. Therefore, these results support the hypothesis that CNT wall thickness influences CO2 conversion, showing that the double-walled heterogeneous CNT (Fe-N-DWCNT) is a potential catalyst for selectively producing HCOOH from CO2.
Qatar National Research Fund (QNRF) - grant #NPRP 10-1210-160019
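The link between step free energies, the limiting potential, and the overpotential can be made concrete with a short sketch. It assumes the common computational-hydrogen-electrode convention. Under that convention, the free energy of each proton-electron reduction step changes by +eU at applied potential U, so the limiting potential is U_L = −max_i ΔG_i / e and the overpotential is η = U_eq − U_L. The function names, the free energies, and the equilibrium potential in the usage note are hypothetical placeholders, not values from the study.

```python
from typing import Sequence


def limiting_potential(step_free_energies_ev: Sequence[float]) -> float:
    """Limiting potential in V from per-step free energies (eV) at U = 0 V.

    Each one-electron reduction step becomes downhill once U <= -dG_i,
    so every step is downhill at U_L = -max(dG_i).
    """
    if not step_free_energies_ev:
        raise ValueError("need at least one reaction step")
    return -max(step_free_energies_ev)


def overpotential(equilibrium_potential_v: float, limiting_potential_v: float) -> float:
    # Extra driving force beyond equilibrium needed to make every step downhill.
    return equilibrium_potential_v - limiting_potential_v
```

With hypothetical step free energies of 0.10 and 0.40 eV, `limiting_potential` returns −0.40 V. With a hypothetical U_eq of −0.17 V, `overpotential(-0.17, -0.40)` is about 0.23 V. In this convention, the step with the largest ΔG sets the limiting potential, so a lower overpotential means a less uphill potential-determining step.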
- …