Transferring Cross-domain Knowledge for Video Sign Language Recognition
Word-level sign language recognition (WSLR) is a fundamental task in sign
language interpretation. It requires models to recognize isolated sign words
from videos. However, annotating WSLR data needs expert knowledge, thus
limiting WSLR dataset acquisition. In contrast, there are abundant
subtitled sign news videos on the internet. Since these videos have no
word-level annotation and exhibit a large domain gap from isolated signs, they
cannot be directly used for training WSLR models. We observe that despite the
existence of a large domain gap, isolated and news signs share the same visual
concepts, such as hand gestures and body movements. Motivated by this
observation, we propose a novel method that learns domain-invariant visual
concepts and enriches WSLR models by transferring knowledge of subtitled news
signs to them. To this end, we extract news signs using a base WSLR model, and
then design a classifier jointly trained on news and isolated signs to coarsely
align these two domain features. In order to learn domain-invariant features
within each class and suppress domain-specific features, our method further
resorts to an external memory to store the class centroids of the aligned news
signs. We then design a temporal attention based on the learnt descriptor to
improve recognition performance. Experimental results on standard WSLR datasets
show that our method outperforms previous state-of-the-art methods
significantly. We also demonstrate the effectiveness of our method on
automatically localizing signs from sign news, achieving 28.1 for AP@0.5. Comment: CVPR2020 (oral) preprint
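The memory and attention components described above can be pictured with a small sketch. The code below is our hypothetical illustration, not the paper's implementation; names such as `update_centroids` and `temporal_attention` are ours. It maintains per-class centroids of aligned news-sign features in an external memory and uses a learnt descriptor to weight frames over time:

```python
import numpy as np

def update_centroids(memory, feats, labels, momentum=0.9):
    """Exponential-moving-average update of per-class centroids.

    memory : (C, D) array of class centroids (the external memory)
    feats  : (N, D) aligned news-sign features
    labels : (N,) pseudo-labels from the base WSLR model
    """
    for c in np.unique(labels):
        mean_c = feats[labels == c].mean(axis=0)
        memory[c] = momentum * memory[c] + (1.0 - momentum) * mean_c
    return memory

def temporal_attention(frame_feats, descriptor):
    """Weight frames by similarity to a descriptor (softmax over time).

    frame_feats : (T, D) per-frame features of a video
    descriptor  : (D,) descriptor derived from the centroid memory
    """
    scores = frame_feats @ descriptor            # (T,) similarity per frame
    weights = np.exp(scores - scores.max())
    weights /= weights.sum()
    return weights @ frame_feats                 # (D,) attended video feature
```

In this reading, the memory suppresses domain-specific variation because only the class-wise averages of aligned features survive the update.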
Large-scale fine-grained semantic indexing of biomedical literature based on weakly-supervised deep learning
Semantic indexing of biomedical literature is usually done at the level of
MeSH descriptors, representing topics of interest for the biomedical community.
Several related but distinct biomedical concepts are often grouped together in
a single coarse-grained descriptor and are treated as a single topic for
semantic indexing. This study proposes a new method for the automated
refinement of subject annotations at the level of concepts, investigating deep
learning approaches. Lacking labelled data for this task, our method relies on
weak supervision based on concept occurrence in the abstract of an article. The
proposed approach is evaluated on an extended large-scale retrospective
scenario, taking advantage of concepts that eventually become MeSH descriptors,
for which annotations become available in MEDLINE/PubMed. The results suggest
that concept occurrence is a strong heuristic for automated subject annotation
refinement and can be further enhanced when combined with dictionary-based
heuristics. In addition, such heuristics can be useful as weak supervision for
developing deep learning models that can achieve further improvement in some
cases. Comment: 48 pages, 5 figures, 9 tables, 1 algorithm
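The core weak-supervision heuristic, labelling an article with a concept when the concept occurs in its abstract, can be sketched in a few lines. This is our simplified illustration (the function name and the dictionary format are assumptions, not the study's code):

```python
import re

def weak_labels(abstract, concepts):
    """Weak supervision heuristic: a concept is a positive label for an
    article if one of its surface forms occurs in the abstract.

    concepts : dict mapping concept ID -> list of surface forms (synonyms)
    """
    text = abstract.lower()
    labels = set()
    for cid, names in concepts.items():
        for name in names:
            # whole-word match so e.g. "flu" does not fire inside "influence"
            if re.search(r"\b" + re.escape(name.lower()) + r"\b", text):
                labels.add(cid)
                break
    return labels
```

Dictionary-based heuristics, as mentioned above, would amount to enlarging the synonym lists fed into such a matcher.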
SPA: A Graph Spectral Alignment Perspective for Domain Adaptation
Unsupervised domain adaptation (UDA) is a pivotal problem in machine learning:
extending an in-domain model to distinctive target domains whose data
distributions differ. Most prior works focus on capturing inter-domain
transferability but largely overlook rich intra-domain structures, which
empirically results in even worse discriminability. In this work, we introduce
a novel graph SPectral Alignment (SPA) framework to tackle this tradeoff. The
core of our method is condensed as follows: (i) by casting the DA problem into
graph primitives, SPA composes a coarse graph alignment mechanism with a novel
spectral regularizer that aligns the domain graphs in eigenspaces; (ii) we
further develop a fine-grained message propagation module, built upon a novel
neighbor-aware self-training mechanism, to enhance discriminability in the
target domain. On standardized benchmarks, extensive experiments demonstrate
that SPA surpasses existing state-of-the-art DA methods. Coupled with dense
model analysis, we conclude that our approach possesses superior efficacy,
robustness, discriminability, and transferability. Code and data are available
at: https://github.com/CrownX/SPA. Comment: NeurIPS 2023 camera ready
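One way to read the spectral regularizer in (i) is as a penalty on the gap between the eigenvalues of the two domains' graph Laplacians. The sketch below is our illustration under that assumption, not the authors' implementation (names like `spectral_alignment_loss` are ours):

```python
import numpy as np

def knn_laplacian(feats, k=3):
    """Symmetric normalized Laplacian of a cosine-similarity kNN graph."""
    X = feats / np.linalg.norm(feats, axis=1, keepdims=True)
    S = np.clip(X @ X.T, 0.0, None)      # keep only non-negative similarities
    np.fill_diagonal(S, 0.0)
    W = np.zeros_like(S)                 # keep the k strongest edges per node
    idx = np.argsort(-S, axis=1)[:, :k]
    rows = np.arange(S.shape[0])[:, None]
    W[rows, idx] = S[rows, idx]
    W = np.maximum(W, W.T)               # symmetrize the adjacency
    d = W.sum(axis=1)
    D_inv_sqrt = np.diag(1.0 / np.sqrt(np.maximum(d, 1e-12)))
    return np.eye(len(W)) - D_inv_sqrt @ W @ D_inv_sqrt

def spectral_alignment_loss(src_feats, tgt_feats, k_eigs=5):
    """Mean squared gap between the leading Laplacian eigenvalues of the
    source and target feature graphs -- one reading of 'aligning the
    domain graphs in eigenspaces'."""
    ev_s = np.linalg.eigvalsh(knn_laplacian(src_feats))[:k_eigs]
    ev_t = np.linalg.eigvalsh(knn_laplacian(tgt_feats))[:k_eigs]
    return float(np.mean((ev_s - ev_t) ** 2))
```

Identical feature sets give zero loss, and the penalty grows as the two graphs' spectra drift apart, which is the qualitative behavior a spectral alignment term needs.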
Building robust and modular question answering systems
Over the past few years, significant progress has been made in QA systems due to the availability of annotated datasets on a large scale and the impressive advancements in large-scale pre-trained language models. Despite these successes, the black-box nature of end-to-end trained QA systems makes them hard to interpret and control. When these systems encounter inputs that deviate from their training data distribution or are subjected to adversarial perturbations, their performance tends to deteriorate by a large margin. Furthermore, they may occasionally produce unanticipated results, potentially leading to confusion among users. Additionally, this deficiency in robustness and interpretability poses challenges when deploying such models in real-world scenarios.
In this dissertation, we aim to build robust QA systems by explicitly decomposing various QA tasks into distinct sub-modules, each responsible for a particular aspect of the overall QA process. Through this decomposition, we seek improved performance in both the system's ability to handle diverse and challenging inputs (robustness) and its capacity to provide transparent and explainable reasoning (interpretability).
We argue that utilizing these sub-modules can substantially improve the robustness and interpretability of different QA systems. In the first half of this dissertation, we introduce three sub-modules to mitigate the dataset artifacts that models learn from datasets. These sub-modules also enable us to examine and exert explicit control over the intermediate outputs. In the first work, to address question answering that requires multi-hop reasoning, we propose a chain extractor, which extracts the reasoning chains necessary for models to derive the final answer. The reasoning chains not only prevent the model from exploiting reasoning shortcuts but also provide an explanation of how the answer is derived. In the second work, we incorporate an alignment layer between the question and the context before generating the answer. This alignment layer helps us interpret the model's behavior and improves robustness in adversarial settings. In the third work, we add an answer verifier after QA models generate the answer. By utilizing external NLI datasets and models, this verifier boosts QA models' prediction confidence across several different domains and helps us spot cases where QA models predict the right answer for the wrong reason.
In the second half of this dissertation, we tackle the problem of complex fact-checking in the real world by treating it as a modularized QA task. We first decompose a complex claim into several yes-no sub-questions whose answers directly contribute to the veracity of the claim. Each sub-question is then fed into a commercial search engine to retrieve relevant documents. We extract the relevant snippets from the retrieved documents and use a GPT-3-based summarizer to generate the core evidence for checking the claim. We show that the decompositions play an important role in both the evidence retrieval and the veracity composition of an explainable fact-checking system. We also show that the GPT-3-based evidence summarizer generates faithful summaries of documents most of the time, indicating that it can be used as an effective part of the pipeline. Moreover, we annotate a dataset, ClaimDecomp, containing 1,200 complex claims and their decompositions. We believe that this dataset can further promote building explainable fact-checking systems and analyzing complex claims in the real world.
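The decompose-retrieve-answer-aggregate pipeline can be sketched as below. This is our simplified illustration with injected stand-ins for the decomposition model, the search engine, and the QA module; the all-yes/all-no aggregation is a toy stand-in for the learned veracity composition:

```python
def check_claim(claim, decompose, answer, search):
    """End-to-end sketch of a modularized fact-checking pipeline.

    decompose : claim -> list of yes-no sub-questions
    search    : sub-question -> list of evidence documents
    answer    : (sub-question, documents) -> "yes" or "no"
    Sub-questions are assumed phrased so that "yes" supports the claim.
    """
    verdicts = []
    for subq in decompose(claim):
        docs = search(subq)                  # retrieve evidence per sub-question
        verdicts.append(answer(subq, docs))  # yes/no answer per sub-question
    if all(v == "yes" for v in verdicts):
        return "supported", verdicts
    if all(v == "no" for v in verdicts):
        return "refuted", verdicts
    return "partially supported", verdicts
```

Keeping each stage behind an injected callable is what makes the pipeline modular: any stage can be inspected, swapped, or evaluated in isolation.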
Towards Learning Discrete Representations via Self-Supervision for Wearables-Based Human Activity Recognition
Human activity recognition (HAR) in wearable computing is typically based on
direct processing of sensor data. Sensor readings are translated into
representations, either derived through dedicated preprocessing, or integrated
into end-to-end learning. Independent of their origin, for the vast majority of
contemporary HAR, those representations are typically continuous in nature.
That has not always been the case. In the early days of HAR, discretization
approaches were explored, primarily motivated by the desire to minimize
computational requirements, but also with a view towards applications beyond
mere recognition, such as activity discovery, fingerprinting, or large-scale
search. Those traditional discretization approaches, however, suffer from a
substantial loss of precision and resolution in the resulting representations,
with detrimental effects on downstream tasks. Times have changed, and in this
paper we propose a return to discretized representations. We adopt and apply
recent advancements in Vector Quantization (VQ) to wearables applications,
which enables us to directly learn a mapping between short spans of sensor data
and a codebook of vectors, resulting in recognition performance that is
generally on par with their contemporary continuous counterparts, sometimes
surpassing them. This work thus presents a proof of concept demonstrating how
effective discrete representations can be derived, not only enabling
applications beyond mere activity classification but also opening up the field
to advanced tools for the analysis of symbolic sequences, as known, for
example, from natural language processing. Based on an
extensive experimental evaluation on a suite of wearables-based benchmark HAR
tasks, we demonstrate the potential of our learned discretization scheme and
discuss how discretized sensor data analysis can lead to substantial changes in
HAR.
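The core VQ step, mapping a short span of sensor data to its nearest codebook vector, can be sketched as follows. This is our minimal illustration of nearest-codeword assignment, not the paper's model (which learns the codebook end to end; the function name `quantize` is ours):

```python
import numpy as np

def quantize(windows, codebook):
    """Map each encoded sensor window to its nearest codebook vector.

    windows  : (N, D) encoded short spans of sensor data
    codebook : (K, D) codebook of vectors
    Returns the discrete codes (symbols) and the quantized vectors.
    """
    # squared Euclidean distance between every window and every codeword
    d2 = ((windows[:, None, :] - codebook[None, :, :]) ** 2).sum(axis=-1)
    codes = d2.argmin(axis=1)          # (N,) one discrete symbol per window
    return codes, codebook[codes]      # symbols and their reconstructions
```

The resulting code sequence is exactly the kind of symbolic representation that opens the door to sequence-analysis tools from domains such as natural language processing.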