
    Information-Theoretic Text Hallucination Reduction for Video-grounded Dialogue

    Video-grounded Dialogue (VGD) aims to decode an answer sentence to a question regarding a given video and dialogue context. Despite the recent success of multi-modal reasoning in generating answer sentences, existing dialogue systems still suffer from a text hallucination problem, i.e., indiscriminate copying of input text without an understanding of the question. This stems from spurious correlations learned because answer sentences in the dataset usually include words from the input texts, so the VGD system relies excessively on copying words from the input texts in the hope that they overlap with the ground-truth answer. Hence, we design the Text Hallucination Mitigating (THAM) framework, which incorporates a Text Hallucination Regularization (THR) loss derived from the proposed information-theoretic text hallucination measurement approach. Applying THAM to current dialogue systems validates its effectiveness on VGD benchmarks (i.e., AVSD@DSTC7 and AVSD@DSTC8) and shows enhanced interpretability. Comment: 12 pages, Accepted at EMNLP 2022
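
    The abstract does not specify the form of the THR loss; as a loose, hypothetical illustration of the general idea of penalizing indiscriminate text-copying (not the paper's information-theoretic measurement), one could regularize the probability mass the decoder assigns to tokens that appear in the input text. The function name and tensor layout below are assumptions.

```python
# Hypothetical copy-penalty regularizer (NOT the paper's THR loss): discourage
# the decoder from placing probability mass on tokens copied from the input text.
import torch
import torch.nn.functional as F

def copy_penalty(logits: torch.Tensor, input_text_ids: torch.Tensor) -> torch.Tensor:
    """logits: (B, L, V) decoder outputs; input_text_ids: (B, S) token ids of the
    input dialogue text (padding ids should be excluded in practice)."""
    vocab_size = logits.size(-1)
    probs = F.softmax(logits, dim=-1)                              # (B, L, V)
    copy_mask = torch.zeros(logits.size(0), vocab_size, device=logits.device)
    copy_mask.scatter_(1, input_text_ids, 1.0)                     # 1 for ids present in the input text
    mass_on_copies = (probs * copy_mask.unsqueeze(1)).sum(dim=-1)  # (B, L)
    return mass_on_copies.mean()
```

    Such a term would be added to the generation loss with a small weight; the paper's actual THR loss is derived differently, from an information-theoretic hallucination measure.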

    HEAR: Hearing Enhanced Audio Response for Video-grounded Dialogue

    Video-grounded Dialogue (VGD) aims to answer questions regarding a given multi-modal input comprising video, audio, and dialogue history. Although there have been numerous efforts to improve the quality of VGD systems' responses, existing systems are competent only at incorporating information from the video and text, and they tend to struggle to extract the necessary information from the audio when generating appropriate responses to the question. The VGD system seems to be deaf, and thus we coin this symptom of current systems ignoring audio data a deaf response. To overcome the deaf response problem, the Hearing Enhanced Audio Response (HEAR) framework is proposed to perform sensible listening by selectively attending to audio whenever the question requires it. The HEAR framework enhances the accuracy and audibility of VGD systems in a model-agnostic manner. HEAR is validated on VGD datasets (i.e., AVSD@DSTC7 and AVSD@DSTC8) and shows effectiveness with various VGD systems. Comment: EMNLP 2023, 14 pages, 13 figures
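
    The abstract does not describe the HEAR architecture itself; below is a minimal, hypothetical sketch of question-conditioned audio gating, which is one way to "selectively attend to audio whenever the question requires it" in a model-agnostic fashion. The module name, dimensions, and pooling choice are assumptions, not the paper's design.

```python
# Hypothetical question-conditioned audio gate (not the actual HEAR module):
# a scalar gate in [0, 1] scales the audio features based on the question.
import torch
import torch.nn as nn

class QuestionConditionedAudioGate(nn.Module):
    def __init__(self, q_dim: int, a_dim: int, hidden: int = 256):
        super().__init__()
        self.gate = nn.Sequential(
            nn.Linear(q_dim + a_dim, hidden),
            nn.ReLU(),
            nn.Linear(hidden, 1),
            nn.Sigmoid(),  # gate value in [0, 1]
        )

    def forward(self, question_feat: torch.Tensor, audio_feat: torch.Tensor) -> torch.Tensor:
        # question_feat: (B, q_dim); audio_feat: (B, T, a_dim)
        pooled_audio = audio_feat.mean(dim=1)                            # (B, a_dim)
        g = self.gate(torch.cat([question_feat, pooled_audio], dim=-1))  # (B, 1)
        return audio_feat * g.unsqueeze(1)  # audio is suppressed when the question does not need it
```

    Because such a gate only rescales existing audio features, it could in principle be dropped into any VGD backbone, which is the sense in which a gating approach is model-agnostic.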

    ESD: Expected Squared Difference as a Tuning-Free Trainable Calibration Measure

    Studies have shown that modern neural networks tend to be poorly calibrated due to over-confident predictions. Traditionally, post-processing methods have been used to calibrate the model after training. In recent years, various trainable calibration measures have been proposed to incorporate calibration directly into the training process. However, these methods all involve internal hyperparameters, and the performance of these calibration objectives relies on tuning them, incurring more computational cost as neural networks and datasets grow larger. As such, we present Expected Squared Difference (ESD), a tuning-free (i.e., hyperparameter-free) trainable calibration objective loss, in which we view the calibration error from the perspective of the squared difference between two expectations. With extensive experiments on several architectures (CNNs, Transformers) and datasets, we demonstrate that (1) incorporating ESD into training improves model calibration across various batch-size settings without the need for internal hyperparameter tuning, (2) ESD yields the best-calibrated results compared with previous approaches, and (3) ESD drastically reduces the computational cost required for calibration during training owing to the absence of internal hyperparameters. The code is publicly accessible at https://github.com/hee-suk-yoon/ESD. Comment: ICLR 2023
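
    The abstract does not state which two expectations are compared or how they are estimated; below is a minimal sketch of one plausible reading, the squared gap between the batch-average confidence and the batch-average accuracy. This is an assumption for illustration, not the ESD estimator from the paper (see the linked repository for the actual implementation), and the loss weight in the usage comment is likewise an assumed external choice.

```python
# Minimal sketch of a squared-difference-of-expectations calibration penalty
# (illustrative only, not the paper's ESD estimator).
import torch
import torch.nn.functional as F

def squared_expectation_gap(logits: torch.Tensor, labels: torch.Tensor) -> torch.Tensor:
    """(E[confidence] - E[correctness])^2 estimated over a batch.

    logits: (N, C) raw model outputs; labels: (N,) integer class labels.
    Gradients flow through the confidence term (the correctness term uses argmax).
    """
    probs = F.softmax(logits, dim=-1)
    confidence, prediction = probs.max(dim=-1)    # per-sample max probability
    correctness = (prediction == labels).float()  # 1 if the prediction is right
    return (confidence.mean() - correctness.mean()) ** 2

# Usage sketch: add the penalty to the task loss with some weight.
# loss = F.cross_entropy(logits, labels) + weight * squared_expectation_gap(logits, labels)
```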

    Method and an apparatus for processing a signal

    A method of processing a signal is disclosed. The present invention includes receiving a maximum number of bands and a code value of at least one section length, calculating a bit number corresponding to the code value of the at least one section length using the maximum number of bands, and obtaining section length information by decoding the code value of the section length based on the bit number. A second method of processing a signal is also disclosed. It includes receiving factor information of a current frame, receiving flag information indicating whether the coding mode of the factor information is an absolute-value mode or a relative-value mode, and obtaining factor data of the current frame from the factor data of a previous frame and the received factor information, based on the flag information.
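
    A hypothetical sketch of the two decoding steps described above follows; the bit-allocation rule (deriving the bit count from the maximum number of bands) and the absolute/relative reconstruction are assumptions about how such a scheme typically works, not the patent's exact bitstream syntax.

```python
# Hypothetical sketch of the two decoding operations (assumed behaviour, not
# the patent's exact syntax).
import math

def section_length_bits(max_bands: int) -> int:
    # Assume a section length can never exceed the number of bands, so
    # ceil(log2(max_bands)) bits suffice to code it.
    return max(1, math.ceil(math.log2(max_bands)))

def decode_factor(prev_factor: float, received: float, absolute_mode: bool) -> float:
    # Absolute-value mode: the received value is the factor itself.
    # Relative-value mode: the received value is a delta from the previous frame.
    return received if absolute_mode else prev_factor + received
```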

    Methods and apparatuses for encoding and decoding object-based audio signals

    Provided are an audio encoding method and apparatus and an audio decoding method and apparatus in which audio signals can be encoded or decoded so that sound images can be localized at any desired position for each object audio signal. The audio decoding method includes generating a third downmix signal by combining a first downmix signal extracted from a first audio signal with a second downmix signal extracted from a second audio signal; generating third object-based side information by combining first object-based side information extracted from the first audio signal with second object-based side information extracted from the second audio signal; converting the third object-based side information into channel-based side information; and generating a multi-channel audio signal using the third downmix signal and the channel-based side information.
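
    A heavily simplified, hypothetical sketch of that decoding flow is given below: the combination of downmix signals is reduced to addition, the side information to per-object levels, and the object-to-channel conversion to a user-chosen panning matrix. All names and representations here are assumptions for illustration only.

```python
# Crude illustration of the described decoding flow (not the patent's method).
import numpy as np

def decode_object_audio(first_downmix: np.ndarray, second_downmix: np.ndarray,
                        first_side_levels: np.ndarray, second_side_levels: np.ndarray,
                        panning: np.ndarray) -> np.ndarray:
    """first/second_downmix: (T,) mono downmixes; *_side_levels: (n_i,) per-object
    levels; panning: (n1 + n2, C) desired object positions expressed as channel gains."""
    # 1) Combine the two downmix signals into a third downmix.
    third_downmix = first_downmix + second_downmix
    # 2) Combine the object-based side information of both streams.
    third_side = np.concatenate([first_side_levels, second_side_levels])  # (n1 + n2,)
    # 3) Convert object-based side info into channel-based side info (per-channel gains).
    channel_gains = (third_side[:, None] * panning).sum(axis=0)           # (C,)
    # 4) Render the multi-channel signal from the combined downmix and channel gains.
    return third_downmix[:, None] * channel_gains[None, :]                # (T, C)
```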

    2-Bromo-p-terphenyl

    In the title compound, C18H13Br, the dihedral angles between the mean plane of the central benzene ring and the mean planes of the outer phenyl and bromophenyl rings are 33.47 (8) and 66.35 (8)°, respectively. In the crystal, weak C—H⋯π and intermolecular Br⋯Br [3.5503 (15) Å] interactions contribute to the stabilization of the packing.

    Effect of Fe/N-doped carbon nanotube (CNT) wall thickness on CO2 conversion: A DFT study

    CO2 adsorption using carbon nanotubes (CNTs) has been actively studied, but experimental and theoretical studies on CO2 conversion are still in demand. In particular, the effect of CNT wall thickness on CO2 conversion is not yet clearly established. This study employed two CNT catalysts of different wall thickness doped with iron and nitrogen: a single-walled CNT (Fe-N-SWCNT) and a double-walled CNT (Fe-N-DWCNT). The structural and electrical properties of these CNTs and their influence on CO2 conversion were characterized and compared using density functional theory (DFT) calculations. Fe-N-DWCNT was shown to improve catalyst stability, with a higher formation energy and a higher adsorption energy for CO2 than Fe-N-SWCNT. The adsorbed CO2 molecules were also found to be highly delocalized and strongly hybridized with Fe-N-DWCNT, leading to more active charge transfer in the catalyst. These findings demonstrate the potential for selective CO2 conversion: differences in wall thickness give the CNTs different electrical properties, and the larger the wall thickness, the lower the energy barrier required for CO2 conversion. Specifically, Fe-N-DWCNT converts CO2 to HCOOH more readily than Fe-N-SWCNT, with a lower overpotential (0.15 V) obtained from the limiting potentials and free energies calculated for the possible reaction pathways of the proton-electron transfer process. These results therefore support the hypothesis that CNT wall thickness influences CO2 conversion, with the double-walled heterogeneous CNT (Fe-N-DWCNT) being a potential catalyst for selectively producing HCOOH from CO2. Qatar National Research Fund (QNRF) grant #NPRP 10-1210-160019.
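
    The overpotential quoted above comes from limiting potentials derived from step free energies; as a generic illustration of that bookkeeping in the computational hydrogen electrode approach (placeholder numbers, not the study's DFT values):

```python
# Generic computational-hydrogen-electrode bookkeeping with placeholder numbers
# (illustrative only, not the study's values).
# Free-energy changes (eV) of each proton-electron transfer step at U = 0 V,
# e.g. CO2* -> OCHO* -> HCOOH*.
delta_G = {
    "CO2* + (H+ + e-) -> OCHO*": 0.45,
    "OCHO* + (H+ + e-) -> HCOOH*": 0.30,
}

# Each (H+ + e-) transfer adds +eU to the step free energy, so a step with
# dG(0) becomes downhill for U <= -dG(0)/e; the limiting potential is set by
# the least favourable step.
U_limiting = -max(delta_G.values())   # in volts (eV per transferred electron)

# Overpotential relative to the equilibrium potential of CO2 -> HCOOH
# (roughly -0.2 V vs RHE; placeholder value).
U_equilibrium = -0.20
overpotential = U_equilibrium - U_limiting
print(f"U_L = {U_limiting:.2f} V, overpotential = {overpotential:.2f} V")
```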