Structured Co-reference Graph Attention for Video-grounded Dialogue
A video-grounded dialogue system referred to as the Structured Co-reference
Graph Attention (SCGA) is presented for decoding the answer sequence to a
question regarding a given video while keeping track of the dialogue context.
Although recent efforts have made great strides in improving the quality of the
response, performance is still far from satisfactory. The two main challenging
issues are as follows: (1) how to deduce co-reference among multiple modalities
and (2) how to reason on the rich underlying semantic structure of video with
complex spatial and temporal dynamics. To this end, SCGA is based on (1)
Structured Co-reference Resolver that performs dereferencing via building a
structured graph over multiple modalities, (2) Spatio-temporal Video Reasoner
that captures local-to-global dynamics of video via gradually neighboring graph
attention. SCGA makes use of a pointer network to dynamically replicate parts of
the question when decoding the answer sequence. The validity of the proposed
SCGA is demonstrated on the AVSD@DSTC7 and AVSD@DSTC8 datasets, two challenging
video-grounded dialogue benchmarks, and on the TVQA dataset, a large-scale
videoQA benchmark. Our empirical results show that SCGA outperforms other
state-of-the-art dialogue systems on both types of benchmark, while an
extensive ablation study and qualitative analysis demonstrate the performance
gains and improved interpretability.
Comment: Accepted to AAAI202
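The pointer mechanism mentioned in the abstract (copying parts of the question into the answer) can be sketched in a heavily simplified form. The function name, the dot-product scoring, and the greedy argmax below are illustrative assumptions, not SCGA's actual architecture:

```python
def pointer_copy(decoder_state, question_token_embs, question_tokens):
    """Hypothetical pointer step: score each question token against the
    decoder state and copy the highest-scoring token into the answer."""
    # Dot-product attention scores over the question tokens (an assumption;
    # the actual scoring function in SCGA may differ).
    scores = [sum(d * e for d, e in zip(decoder_state, emb))
              for emb in question_token_embs]
    best = max(range(len(scores)), key=scores.__getitem__)
    return question_tokens[best]
```

In a full decoder, such a copy distribution would typically be mixed with a vocabulary distribution rather than used greedily as here.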
HEAR: Hearing Enhanced Audio Response for Video-grounded Dialogue
Video-grounded Dialogue (VGD) aims to answer questions regarding a given
multi-modal input comprising video, audio, and dialogue history. Although there
have been numerous efforts in developing VGD systems to improve the quality of
their responses, existing systems are competent only at incorporating
information from the video and text, and tend to struggle to extract the
necessary information from the audio when generating appropriate responses to
the question. The VGD system seems to be deaf, so we coin this symptom of
current systems ignoring audio data a deaf response. To overcome the
deaf-response problem, the Hearing Enhanced Audio Response (HEAR) framework is
proposed to perform sensible listening: it selectively attends to the audio
whenever the question requires it. The HEAR framework enhances the accuracy and
audibility of VGD systems in a model-agnostic manner. HEAR is validated on VGD
datasets (i.e., AVSD@DSTC7 and AVSD@DSTC8) and shows effectiveness with various
VGD systems.
Comment: EMNLP 2023, 14 pages, 13 figures
Seek or Provide: Comparative Effects of Online Information Sharing on Seniors' Quality of Life
Seniors' social activities are critical in assuring their quality of life (QoL), and seniors' QoL declines with the deterioration of their social activity. Social support from online social relationships has been considered an important determinant of QoL, and is an important goal of the design of online health communities that support patient-centered e-health initiatives. In this study, we find that, rather than attempting to improve seniors' quality of life through interventions and online community platforms designed directly to increase social interactions and focus on social relationship formation, it is more effective for such online health communities to be designed to facilitate information sharing. Information sharing may be an easy way for seniors to become familiar with the online environment and pave the way for subsequent online social relationships. This study investigated seniors' online information sharing behaviors and their impacts on quality of life. Survey data from 130 seniors was used to test our research model. Seniors' online information seeking and provision indirectly affect their quality of life, and the relative importance of information seeking and information provision varies depending on the seniors' perceived subjective age, i.e., cognitive age.
Information-Theoretic Text Hallucination Reduction for Video-grounded Dialogue
Video-grounded Dialogue (VGD) aims to decode an answer sentence to a question
regarding a given video and dialogue context. Despite the recent success of
multi-modal reasoning to generate answer sentences, existing dialogue systems
still suffer from a text hallucination problem, i.e., indiscriminate copying of
input text without an understanding of the question. This arises from spurious
correlations learned because answer sentences in the dataset usually include
words from the input texts; the VGD system therefore relies excessively on
copying input words, hoping they will overlap with the ground-truth text.
Hence, we design the Text Hallucination Mitigating
(THAM) framework, which incorporates Text Hallucination Regularization (THR)
loss derived from the proposed information-theoretic text hallucination
measurement approach. Applying THAM with current dialogue systems validates the
effectiveness on VGD benchmarks (i.e., AVSD@DSTC7 and AVSD@DSTC8) and shows
enhanced interpretability.
Comment: 12 pages, Accepted in EMNLP 202
VLANet: Video-Language Alignment Network for Weakly-Supervised Video Moment Retrieval
Video Moment Retrieval (VMR) is the task of localizing the temporal moment in
an untrimmed video specified by a natural language query. For VMR, several methods
that require full supervision for training have been proposed. Unfortunately,
acquiring a large number of training videos with labeled temporal boundaries
for each query is a labor-intensive process. This paper explores methods for
performing VMR in a weakly-supervised manner (wVMR): training is performed
without temporal moment labels but only with the text query that describes a
segment of the video. Existing methods on wVMR generate multi-scale proposals
and apply query-guided attention mechanisms to highlight the most relevant
proposal. To leverage the weak supervision, contrastive learning is used, which
predicts higher scores for correct video-query pairs than for incorrect
pairs. It has been observed that a large number of candidate proposals, coarse
query representation, and one-way attention mechanism lead to blurry attention
maps which limit the localization performance. To handle this issue,
Video-Language Alignment Network (VLANet) is proposed that learns sharper
attention by pruning out spurious candidate proposals and applying a
multi-directional attention mechanism with fine-grained query representation.
The Surrogate Proposal Selection module selects a proposal based on the
proximity to the query in the joint embedding space, and thus substantially
reduces candidate proposals which leads to lower computation load and sharper
attention. Next, the Cascaded Cross-modal Attention module considers dense
feature interactions and multi-directional attention flow to learn the
multi-modal alignment. VLANet is trained end-to-end using a contrastive loss
that pulls semantically similar videos and queries closer together in the
embedding space. The
experiments show that the method achieves state-of-the-art performance on
Charades-STA and DiDeMo datasets.
Comment: 16 pages, 6 figures, European Conference on Computer Vision, 202
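The contrastive objective described above (scoring correct video-query pairs higher than incorrect ones) can be sketched with a simple margin-based hinge loss. The cosine-similarity scoring, the margin value, and the function names are illustrative assumptions rather than VLANet's exact formulation:

```python
import math

def pairwise_scores(video_embs, query_embs):
    """Cosine similarity between every video and query embedding."""
    def cos(a, b):
        dot = sum(x * y for x, y in zip(a, b))
        na = math.sqrt(sum(x * x for x in a))
        nb = math.sqrt(sum(y * y for y in b))
        return dot / (na * nb)
    return [[cos(v, q) for q in query_embs] for v in video_embs]

def contrastive_loss(scores, margin=0.5):
    """Hinge loss: the matched pair (i, i) should outscore every
    mismatched pair in its row and column by at least `margin`."""
    n = len(scores)
    loss = 0.0
    for i in range(n):
        for j in range(n):
            if i == j:
                continue
            # Penalize wrong queries for video i and wrong videos for query i.
            loss += max(0.0, margin - scores[i][i] + scores[i][j])
            loss += max(0.0, margin - scores[i][i] + scores[j][i])
    return loss / (2 * n * (n - 1))
```

With perfectly aligned pairs the loss is zero; swapping the queries produces a positive loss, which is the signal the weakly supervised training relies on.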
ESD: Expected Squared Difference as a Tuning-Free Trainable Calibration Measure
Studies have shown that modern neural networks tend to be poorly calibrated
due to over-confident predictions. Traditionally, post-processing methods have
been used to calibrate the model after training. In recent years, various
trainable calibration measures have been proposed to incorporate them directly
into the training process. However, these methods all incorporate internal
hyperparameters, and the performance of these calibration objectives relies on
tuning them, incurring greater computational cost as neural networks and
datasets grow larger. As such, we present Expected
Squared Difference (ESD), a tuning-free (i.e., hyperparameter-free) trainable
calibration objective loss, where we view the calibration error from the
perspective of the squared difference between the two expectations. With
extensive experiments on several architectures (CNNs, Transformers) and
datasets, we demonstrate that (1) incorporating ESD into the training improves
model calibration in various batch size settings without the need for internal
hyperparameter tuning, (2) ESD yields the best-calibrated results compared with
previous approaches, and (3) ESD drastically reduces the computational cost of
calibration during training owing to the absence of internal
hyperparameters. The code is publicly accessible at
https://github.com/hee-suk-yoon/ESD.
Comment: ICLR 202
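One plausible reading of "the squared difference between the two expectations" is the squared gap between a batch's average confidence and its average accuracy. The sketch below illustrates only that reading; it is a simplified assumption, not the paper's actual estimator (see the linked repository for that):

```python
def expected_squared_difference(confidences, correct):
    """Illustrative sketch: squared difference between the batch's expected
    confidence and expected accuracy. Note that no internal hyperparameters
    (e.g., bin counts or temperatures) appear anywhere in the computation."""
    n = len(confidences)
    mean_conf = sum(confidences) / n   # estimate of E[confidence]
    mean_acc = sum(correct) / n        # estimate of E[correctness]
    return (mean_acc - mean_conf) ** 2
```

A perfectly calibrated batch (average confidence equal to average accuracy) yields zero, while over- or under-confident batches are penalized quadratically.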
Lowest threshold lasing modes localized on marginally unstable periodic orbits in a semiconductor microcavity laser
The lowest-threshold lasing mode in a rounded D-shape microcavity is theoretically analyzed and experimentally demonstrated. To identify the lowest-threshold lasing mode, we investigate the threshold conditions of different periodic orbits by considering the linear gain condition due to the effective pumping region and the total loss, consisting of internal and scattering losses, in ray dynamics. We compare the ray-dynamical result with a resonance-mode analysis including gain and loss. We find that the resonance modes localized on the pentagonal marginally unstable periodic orbit have the lowest threshold in our fabrication configuration. Our findings are verified by obtaining the path lengths and far-field patterns of the lasing modes. © 2020 Optical Society of America under the terms of the OSA Open Access Publishing Agreement.
Publisher Correction: MEMOTE for standardized genome-scale metabolic model testing
An amendment to this paper has been published and can be accessed via a link at the top of the paper.
MEMOTE for standardized genome-scale metabolic model testing
Supplementary information is available for this paper at https://doi.org/10.1038/s41587-020-0446-y
Reconstructing metabolic reaction networks enables the development of testable hypotheses of an organism's metabolism under different conditions1. State-of-the-art genome-scale metabolic models (GEMs) can include thousands of metabolites and reactions that are assigned to subcellular locations. Gene-protein-reaction (GPR) rules and annotations using database information can add meta-information to GEMs. GEMs with metadata can be built using standard reconstruction protocols2, and guidelines have been put in place for tracking provenance and enabling interoperability, but a standardized means of quality control for GEMs is lacking3. Here we report a community effort to develop a test suite named MEMOTE (for metabolic model tests) to assess GEM quality.
We acknowledge D. Dannaher and A. Lopez for their supporting work on the Angular parts of MEMOTE; resources and support from the DTU Computing Center; J. Cardoso, S. Gudmundsson, K. Jensen and D. Lappa for their feedback on conceptual details; and P. D. Karp and I. Thiele for critically reviewing the manuscript. We thank J. Daniel, T. Kristjánsdóttir, J. Saez-Saez, S. Sulheim, and P. Tubergen for being early adopters of MEMOTE and for providing written testimonials. J.O.V. received the Research Council of Norway grants 244164 (GenoSysFat), 248792 (DigiSal) and 248810 (Digital Life Norway); M.Z. received the Research Council of Norway grant 244164 (GenoSysFat); C.L. received funding from the Innovation Fund Denmark (project "Environmentally Friendly Protein Production (EFPro2)"); C.L., A.K., N.S., M.B., M.A., D.M., P.M., B.J.S., P.V., K.R.P. and M.H. received funding from the European Union's Horizon 2020 research and innovation programme under grant agreement 686070 (DD-DeCaF); B.G.O., F.T.B. and A.D. acknowledge funding from the US National Institutes of Health (NIH, grant number 2R01GM070923-13); A.D. was supported by infrastructural funding from the Deutsche Forschungsgemeinschaft (DFG, German Research Foundation), Cluster of Excellence EXC 2124 Controlling Microbes to Fight Infections; N.E.L. received funding from NIGMS R35 GM119850, Novo Nordisk Foundation NNF10CC1016517 and the Keck Foundation; A.R. received a Lilly Innovation Fellowship Award; B.G.-J. and J. Nogales received funding from the European Union's Horizon 2020 research and innovation programme under grant agreement no. 686585 for the project LIAR, and the Spanish Ministry of Economy and Competitivity through the RobDcode grant (BIO2014-59528-JIN); L.M.B. has received funding from the European Union's Horizon 2020 research and innovation programme under grant agreement 633962 for project P4SB; R.F. received funding from the US Department of Energy, Offices of Advanced Scientific Computing Research and the Biological and Environmental Research as part of the Scientific Discovery Through Advanced Computing program, grant DE-SC0010429; A.M., C.Z., S.L. and J. Nielsen received funding from The Knut and Alice Wallenberg Foundation, Advanced Computing program, grant #DE-SC0010429; S.K.'s work was in part supported by the German Federal Ministry of Education and Research (de.NBI partner project "ModSim" (FKZ: 031L104B)); E.K. and J.A.H.W. were supported by the German Federal Ministry of Education and Research (project "SysToxChip", FKZ 031A303A); M.K. is supported by the Federal Ministry of Education and Research (BMBF, Germany) within the research network Systems Medicine of the Liver (LiSyM, grant number 031L0054); J.A.P. and G.L.M. acknowledge funding from the US National Institutes of Health (T32-LM012416, R01-AT010253, R01-GM108501) and the Wagner Foundation; G.L.M. acknowledges funding from a Grand Challenges Exploration Phase I grant (OPP1211869) from the Bill & Melinda Gates Foundation; H.H. and R.S.M.S. received funding from the Biotechnology and Biological Sciences Research Council MultiMod (BB/N019482/1); H.U.K. and S.Y.L. received funding from the Technology Development Program to Solve Climate Changes on Systems Metabolic Engineering for Biorefineries (grants NRF-2012M1A2A2026556 and NRF-2012M1A2A2026557) from the Ministry of Science and ICT through the National Research Foundation (NRF) of Korea; H.U.K. received funding from the Bio & Medical Technology Development Program of the NRF, the Ministry of Science and ICT (NRF-2018M3A9H3020459); P.B., B.J.S., Z.K., B.O.P., C.L., M.B., N.S., M.H. and A.F. received funding through the Novo Nordisk Foundation through the Center for Biosustainability at the Technical University of Denmark (NNF10CC1016517); D.-Y.L. received funding from the Next-Generation BioGreen 21 Program (SSAC, PJ01334605), Rural Development Administration, Republic of Korea; G.F. was supported by the RobustYeast within ERA net project via SystemsX.ch; V.H. received funding from the ETH Domain and Swiss National Science Foundation; M.P. acknowledges Oxford Brookes University; J.C.X. received support via European Research Council (666053) to W.F. Martin; B.E.E. acknowledges funding through the CSIRO-UQ Synthetic Biology Alliance; C.D. is supported by a Washington Research Foundation Distinguished Investigator Award. I.N. received funding from National Institutes of Health (NIH)/National Institute of General Medical Sciences (NIGMS) (grant P20GM125503).