AnoOnly: Semi-Supervised Anomaly Detection without Loss on Normal Data
Semi-supervised anomaly detection (SSAD) methods have demonstrated their
effectiveness in enhancing unsupervised anomaly detection (UAD) by leveraging
few-shot but instructive abnormal instances. However, the dominance of
homogeneous normal data over anomalies biases the SSAD models against
effectively perceiving anomalies. To address this issue and achieve balanced
supervision between heavily imbalanced normal and abnormal data, we develop a
novel framework called AnoOnly (Anomaly Only). Unlike existing SSAD methods
that resort to strict loss supervision, AnoOnly suspends it and introduces a
form of weak supervision for normal data. This weak supervision is instantiated
through the utilization of batch normalization, which implicitly performs
cluster learning on normal data. When integrated into existing SSAD methods,
the proposed AnoOnly demonstrates remarkable performance enhancements across
various models and datasets, achieving new state-of-the-art performance.
Additionally, AnoOnly is natively robust to label noise arising from
data contamination. Our code is publicly available at
https://github.com/cool-xuan/AnoOnly.
Comment: Under review for NeurIPS202
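The core idea, suspending the loss on normal data while batch normalization still computes its statistics over the whole (mostly normal) batch, can be sketched in a few lines. This is a minimal numpy illustration under assumed names (`batch_norm`, `anomaly_only_loss`, the toy linear scorer), not the paper's implementation:

```python
import numpy as np

def batch_norm(x, eps=1e-5):
    # Batch statistics are computed over ALL samples, so the majority
    # normal data still shapes the feature distribution (the implicit
    # weak supervision) even though it contributes no loss term.
    mu = x.mean(axis=0)
    var = x.var(axis=0)
    return (x - mu) / np.sqrt(var + eps)

def anomaly_only_loss(scores, labels):
    # Strict loss supervision on normal data is suspended: only the
    # few labelled anomalies (label == 1) contribute to the loss.
    anom = scores[labels == 1]
    # Toy objective: push anomaly scores towards 1.
    return np.mean((anom - 1.0) ** 2)

rng = np.random.default_rng(0)
features = rng.normal(size=(64, 8))      # 64 samples, 8 features
labels = np.zeros(64, dtype=int)
labels[:4] = 1                            # few-shot labelled anomalies
normed = batch_norm(features)
scores = normed @ rng.normal(size=8) / 8  # toy linear anomaly scorer
loss = anomaly_only_loss(scores, labels)
```

The point of the sketch is the asymmetry: `batch_norm` sees every sample, while `anomaly_only_loss` ignores the normal majority entirely.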
Unifying Two-Stream Encoders with Transformers for Cross-Modal Retrieval
Most existing cross-modal retrieval methods employ two-stream encoders with
different architectures for images and texts, \textit{e.g.}, CNN for images and
RNN/Transformer for texts. Such discrepancy in architectures may induce
different semantic distribution spaces and limit the interactions between
images and texts, and further result in inferior alignment between images and
texts. To fill this research gap, inspired by recent advances of Transformers
in vision tasks, we propose to unify the encoder architectures with
Transformers for both modalities. Specifically, we design a cross-modal
retrieval framework purely based on two-stream Transformers, dubbed
\textbf{Hierarchical Alignment Transformers (HAT)}, which consists of an image
Transformer, a text Transformer, and a hierarchical alignment module. With such
identical architectures, the encoders could produce representations with more
similar characteristics for images and texts, and make the interactions and
alignments between them much easier. Besides, to leverage the rich semantics,
we devise a hierarchical alignment scheme to explore multi-level
correspondences of different layers between images and texts. To evaluate the
effectiveness of the proposed HAT, we conduct extensive experiments on two
benchmark datasets, MSCOCO and Flickr30K. Experimental results demonstrate that
HAT outperforms SOTA baselines by a large margin. Specifically, on two key
tasks, \textit{i.e.}, image-to-text and text-to-image retrieval, HAT achieves
7.6\% and 16.7\% relative score improvement of Recall@1 on MSCOCO, and 4.4\%
and 11.6\% on Flickr30k respectively. The code is available at
\url{https://github.com/LuminosityX/HAT}.
Comment: Accepted at ACM Multimedia 202
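The hierarchical alignment idea, scoring image-text pairs by aggregating similarities across corresponding encoder layers, can be sketched as follows. This is an illustrative numpy version with assumed names (`cosine_sim`, `hierarchical_alignment`, uniform layer weights), not HAT's actual module:

```python
import numpy as np

def cosine_sim(a, b):
    # Row-wise cosine similarity between two feature matrices.
    a = a / np.linalg.norm(a, axis=-1, keepdims=True)
    b = b / np.linalg.norm(b, axis=-1, keepdims=True)
    return a @ b.T

def hierarchical_alignment(img_layers, txt_layers, weights=None):
    """Aggregate image-text similarity over corresponding Transformer
    layers. img_layers / txt_layers: lists of (batch, dim) feature
    matrices, one per layer chosen for alignment."""
    n = len(img_layers)
    weights = weights or [1.0 / n] * n
    sim = sum(w * cosine_sim(i, t)
              for w, i, t in zip(weights, img_layers, txt_layers))
    return sim  # (num_images, num_texts) score matrix for retrieval

rng = np.random.default_rng(0)
img_layers = [rng.normal(size=(3, 16)) for _ in range(4)]
txt_layers = [rng.normal(size=(5, 16)) for _ in range(4)]
scores = hierarchical_alignment(img_layers, txt_layers)  # shape (3, 5)
```

Because both encoders are Transformers, layer-wise features have comparable shapes, which is what makes this multi-level correspondence straightforward to compute.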
T-SciQ: Teaching Multimodal Chain-of-Thought Reasoning via Large Language Model Signals for Science Question Answering
Large Language Models (LLMs) have recently demonstrated exceptional
performance in various Natural Language Processing (NLP) tasks. They have also
shown the ability to perform chain-of-thought (CoT) reasoning to solve complex
problems. Recent studies have explored CoT reasoning in complex multimodal
scenarios, such as the science question answering task, by fine-tuning
multimodal models with high-quality human-annotated CoT rationales. However,
collecting high-quality CoT rationales is usually time-consuming and costly.
Moreover, the annotated rationales are often inaccurate because essential
external information is missing. To address these issues, we propose a novel
method termed \emph{T-SciQ} that aims at teaching science question answering
with LLM signals. The T-SciQ approach generates high-quality CoT rationales as
teaching signals, which are then used to train much smaller models to perform CoT
reasoning in complex modalities. Additionally, we introduce a novel data mixing
strategy that produces more effective teaching data samples via a selection policy
for simple and complex science question answering problems. Extensive experimental results
show that our T-SciQ method achieves a new state-of-the-art performance on the
ScienceQA benchmark, with an accuracy of 96.18\%. Moreover, our approach
outperforms the most powerful fine-tuned baseline by 4.5\%.
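A data-mixing policy of this kind can be sketched as a simple per-sample selection rule. The field names, difficulty score, and threshold policy below are illustrative assumptions, not the paper's exact procedure:

```python
def mix_teaching_data(samples, difficulty_threshold=0.5):
    """Toy data-mixing policy: pick a teaching-signal type per sample
    based on an assumed difficulty score. Simple problems keep a plain
    QA answer; complex ones keep the full CoT rationale."""
    mixed = []
    for s in samples:
        if s["difficulty"] < difficulty_threshold:
            signal = s["qa_signal"]       # simple: plain answer suffices
        else:
            signal = s["cot_signal"]      # complex: full CoT rationale
        mixed.append({"question": s["question"], "teaching_signal": signal})
    return mixed

samples = [
    {"question": "Q1", "difficulty": 0.2, "qa_signal": "A", "cot_signal": "step... A"},
    {"question": "Q2", "difficulty": 0.9, "qa_signal": "B", "cot_signal": "step... B"},
]
mixed = mix_teaching_data(samples)
```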
ICL-D3IE: In-Context Learning with Diverse Demonstrations Updating for Document Information Extraction
Large language models (LLMs), such as GPT-3 and ChatGPT, have demonstrated
remarkable results in various natural language processing (NLP) tasks with
in-context learning, which involves inference based on a few demonstration
examples. Despite their successes in NLP tasks, no investigation has been
conducted to assess the ability of LLMs to perform document information
extraction (DIE) using in-context learning. Applying LLMs to DIE poses two
challenges: the modality gap and the task gap. To this end, we propose a simple but
effective in-context learning framework called ICL-D3IE, which enables LLMs to
perform DIE with different types of demonstration examples. Specifically, we
extract the most difficult and distinct segments from hard training documents
as hard demonstrations to benefit all test instances. We design
demonstrations describing relationships that enable LLMs to understand
positional relationships. We introduce formatting demonstrations for easy
answer extraction. Additionally, the framework improves diverse demonstrations
by updating them iteratively. Our experiments on three widely used benchmark
datasets demonstrate that the ICL-D3IE framework enables GPT-3/ChatGPT to
achieve superior performance when compared to previous pre-trained methods
fine-tuned with full training data, in both the in-distribution (ID) and
out-of-distribution (OOD) settings.
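Assembling the three demonstration types into a single in-context prompt can be sketched as plain string construction. The section headers, demonstration contents, and function name here are illustrative, not the paper's actual prompt format:

```python
def build_d3ie_prompt(hard_demos, layout_demos, format_demos, test_doc):
    """Assemble an in-context DIE prompt from the three demonstration
    types named in the abstract: hard, positional-relationship, and
    formatting demonstrations, followed by the test document."""
    parts = ["## Hard demonstrations"]
    parts.extend(hard_demos)
    parts.append("## Positional-relationship demonstrations")
    parts.extend(layout_demos)
    parts.append("## Formatting demonstrations")
    parts.extend(format_demos)
    parts.append("## Document to extract")
    parts.append(test_doc)
    return "\n".join(parts)

prompt = build_d3ie_prompt(
    hard_demos=['[x0,y0] INVOICE NO: 42 -> {"invoice_no": "42"}'],
    layout_demos=["Tokens on the same line usually share a field."],
    format_demos=["Answer with a JSON object mapping field -> value."],
    test_doc="[x1,y1] TOTAL: $10.00",
)
```

The iterative-updating step in the abstract would then replace the demonstration lists between rounds while keeping this assembly fixed.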
Hydrogen Therapy may be a Novel and Effective Treatment for COPD
The protective effect of hydrogen (H2) on ROS-induced diseases has been demonstrated by many studies, which showed that by eliminating •OH and •ONOO–, H2 can effectively attenuate lipid and DNA peroxidation, improve cellular antioxidant capacity, and thereby protect cells against oxidative damage. Most free radicals in the human body are ROS, including O2•–, •OH, H2O2, NO•, •ONOO–, and so on. Under normal circumstances, cells are able to maintain an adequate homeostasis between the formation and removal of ROS through particular enzymatic pathways or antioxidants. But under some pathological conditions, the balance is disturbed, leading to oxidative stress and various diseases, such as chronic obstructive pulmonary disease (COPD). Studies have shown that ROS play a pivotal role in the development of COPD and that some antioxidants are effective in protecting against the damaging effects of oxidative stress. Therefore, we hypothesize that, owing to its ability to eliminate toxic ROS, hydrogen therapy may be a novel and effective treatment for COPD.
Non-Autoregressive Math Word Problem Solver with Unified Tree Structure
Existing MWP solvers employ a sequence or a binary tree to represent the
solution expression and decode it from the given problem description. However,
such structures fail to handle variants that can be derived via mathematical
manipulation, e.g., two mathematically equivalent expressions can both be
valid solutions for the same problem but be formulated as different
expression sequences or trees. The multiple solution variants depicting
different possible solving procedures for the same input problem would raise
two issues: 1) making it hard for the model to learn the mapping function
between the input and output spaces effectively, and 2) wrongly marking a valid
expression variant as \textit{wrong} during evaluation. To address these
issues, we introduce a unified tree structure to represent a solution expression,
where the elements are permutable and identical for all the expression
variants. We propose a novel non-autoregressive solver, named \textit{MWP-NAS},
to parse the problem and deduce the solution expression based on the unified
tree. For evaluating the possible expression variants, we design a path-based
metric to evaluate the partial accuracy of expressions of a unified tree. The
results from extensive experiments conducted on Math23K and MAWPS demonstrate
the effectiveness of our proposed MWP-NAS. The codes and checkpoints are
available at: \url{https://github.com/mengqunhan/MWP-NAS}.
Comment: Accepted at EMNLP202
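A path-based partial-credit score over a tree can be sketched by comparing sets of root-to-leaf paths, which are invariant under permutation of operands. This is a toy illustration in the spirit of the abstract's metric (the paper's exact definition may differ), with trees encoded as nested tuples:

```python
def paths(tree, prefix=()):
    """Enumerate root-to-leaf paths of a nested-tuple expression tree,
    e.g. ('+', ('*', '2', '3'), '4')."""
    op, *children = tree if isinstance(tree, tuple) else (tree,)
    if not children:
        return [prefix + (op,)]
    out = []
    for child in children:
        out.extend(paths(child, prefix + (op,)))
    return out

def path_accuracy(pred, gold):
    """Fraction of gold root-to-leaf paths recovered by the prediction:
    an order-insensitive partial-credit score, so permuted but
    equivalent variants are not penalized."""
    p, g = set(paths(pred)), set(paths(gold))
    return len(p & g) / len(g)

gold = ('+', ('*', '2', '3'), '4')
pred = ('+', '4', ('*', '3', '2'))   # equivalent variant, operands permuted
score = path_accuracy(pred, gold)     # full credit despite reordering
```

Because paths are collected into sets, the two trees above score identically even though their sequence or binary-tree encodings would differ.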
Experimental Generation of Spin-Photon Entanglement in Silicon Carbide
A solid-state approach for quantum networks is advantageous, as it allows the
integration of nanophotonics to enhance photon emission and the utilization
of weakly coupled nuclear spins for long-lived storage. Silicon carbide,
specifically point defects within it, shows great promise in this regard due to
its ready availability and well-established nanofabrication techniques.
Despite the remarkable progress made, achieving spin-photon entanglement
remains a crucial milestone to be realized. In this paper, we experimentally
generate entanglement between a silicon vacancy defect in silicon carbide and a
scattered single photon in the zero-phonon line. The spin state is measured by
detecting photons scattered in the phonon sideband. The photonic qubit is
encoded in the time-bin degree-of-freedom and measured using an unbalanced
Mach-Zehnder interferometer. Photonic correlations not only reveal the quality
of the entanglement but also verify the deterministic nature of the
entanglement creation process. By harnessing two pairs of such spin-photon
entanglement, it becomes straightforward to entangle remote quantum nodes over
long distances.
Comment: 8 pages in total, 4 figures in the main text, 1 figure in the
supplemental material