Alleviating the Inequality of Attention Heads for Neural Machine Translation
Recent studies show that the attention heads in Transformer are not equal. We
relate this phenomenon to the imbalanced training of multi-head attention and
the model's dependence on specific heads. To tackle this problem, we propose a
simple masking method, HeadMask, in two specific variants. Experiments show that
translation improvements are achieved on multiple language pairs. Subsequent
empirical analyses also support our assumption and confirm the effectiveness of
the method.
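The abstract does not describe the two HeadMask variants, so the following is only a minimal sketch of one plausible reading: silencing a few randomly chosen heads during a training step so that the model cannot rely on any single head. The function name `mask_heads` and the toy head outputs are hypothetical and do not come from the paper.

```python
import random


def mask_heads(head_outputs, num_masked, rng):
    """Zero out `num_masked` randomly chosen heads.

    Assumption: random selection is an illustrative choice, not
    necessarily either of the two variants the paper proposes.
    Returns the masked outputs and the sorted indices that were masked.
    """
    num_heads = len(head_outputs)
    masked = set(rng.sample(range(num_heads), num_masked))
    result = [
        [0.0] * len(out) if i in masked else list(out)
        for i, out in enumerate(head_outputs)
    ]
    return result, sorted(masked)


# Toy per-head output vectors for a single token (hypothetical values).
rng = random.Random(0)
heads = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0], [7.0, 8.0]]
masked_heads, masked_idx = mask_heads(heads, num_masked=2, rng=rng)
```

In a real Transformer the mask would be applied to each head's attention output before the output projection, and only during training.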
Controlling Styles in Neural Machine Translation with Activation Prompt
Controlling styles in neural machine translation (NMT) has attracted wide
attention, as it is crucial for enhancing user experience. Earlier studies on
this topic typically concentrate on regulating the level of formality and
achieve some progress in this area. However, they still encounter two major
challenges. The first is the difficulty of style evaluation. Style
comprises various aspects, such as lexis and syntax, that provide
abundant information. Nevertheless, only formality has been thoroughly
investigated. The second challenge involves excessive dependence on incremental
adjustments, particularly when new styles are necessary. To address both
challenges, this paper presents a new benchmark and approach. A multiway
stylized machine translation (MSMT) benchmark is introduced, incorporating
diverse categories of styles across four linguistic domains. Then, we propose a
method named style activation prompt (StyleAP) by retrieving prompts from
stylized monolingual corpus, which does not require extra fine-tuning.
Experiments show that StyleAP could effectively control the style of
translation and achieve remarkable performance. Comment: Accepted by Findings of ACL 2023; The code is available at
https://github.com/IvanWang0730/StyleA
Only 5% Attention Is All You Need: Efficient Long-range Document-level Neural Machine Translation
Document-level Neural Machine Translation (DocNMT) has been proven crucial
for handling discourse phenomena by introducing document-level context
information. One of the most important directions is to input the whole
document directly to the standard Transformer model. In this case, efficiency
becomes a critical concern due to the quadratic complexity of the attention
module. Existing studies either focus on the encoder part, which cannot be
deployed on sequence-to-sequence generation tasks, e.g., Machine Translation
(MT), or suffer from a significant performance drop. In this work, we keep the
translation performance while gaining a 20% speed-up by introducing an extra
selection layer based on lightweight attention that selects a small portion of
tokens to be attended. It takes advantage of the original attention to ensure
performance and dimension reduction to accelerate inference. Experimental
results show that our method achieves up to approximately 95% sparsity (only 5% of
tokens attended) and saves 93% of the computation cost of the attention module
compared with the original Transformer, while maintaining performance. Comment: Accepted by AACL 202
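The select-then-attend idea in this abstract can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the cheap scoring step simply truncates vectors to their first `low_dim` coordinates as a stand-in for a learned low-dimensional projection, and the names `select_then_attend`, `keep_ratio`, and `low_dim` are hypothetical.

```python
import math


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def softmax(scores):
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def select_then_attend(query, keys, values, keep_ratio, low_dim):
    """Score all keys cheaply, keep the top fraction, attend only to those."""
    # Cheap scoring in a reduced dimension (assumption: truncation stands in
    # for a learned lightweight projection).
    cheap = [dot(query[:low_dim], k[:low_dim]) for k in keys]
    num_keep = max(1, round(keep_ratio * len(keys)))
    kept = sorted(range(len(keys)), key=lambda i: cheap[i], reverse=True)[:num_keep]
    kept.sort()  # restore positional order of the selected tokens
    # Full-dimension scaled dot-product attention over the selected tokens only.
    scale = math.sqrt(len(query))
    weights = softmax([dot(query, keys[i]) / scale for i in kept])
    dim = len(values[0])
    out = [sum(w * values[i][d] for w, i in zip(weights, kept)) for d in range(dim)]
    return out, kept


# 20 tokens, keep 5% (one token). Only key 7 matches the query.
query = [1.0, 0.0, 0.0, 0.0]
keys = [[1.0 if i == 7 else 0.0, 0.0, 0.0, 0.0] for i in range(20)]
values = [[float(i)] for i in range(20)]
out, kept = select_then_attend(query, keys, values, keep_ratio=0.05, low_dim=2)
```

The saving comes from running the full-dimension attention over `num_keep` tokens instead of all of them; the cheap scoring pass still touches every token, but in far fewer dimensions.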
Accurate and lightweight dehazing via multi-receptive-field non-local network and novel contrastive regularization
Recently, deep learning-based methods have dominated the image dehazing domain.
Although very competitive dehazing performance has been achieved with
sophisticated models, effective solutions for extracting useful features are
still under-explored. In addition, the non-local network, which has made a
breakthrough in many vision tasks, has not been appropriately applied to image
dehazing. Thus, a multi-receptive-field non-local network (MRFNLN) consisting
of the multi-stream feature attention block (MSFAB) and cross non-local block
(CNLB) is presented in this paper. We start with extracting richer features for
dehazing. Specifically, we design a multi-stream feature extraction (MSFE)
sub-block, which contains three parallel convolutions with different receptive
fields for extracting multi-scale
features. Following MSFE, we employ an attention sub-block to make the model
adaptively focus on important channels/regions. The MSFE and attention
sub-blocks constitute our MSFAB. Then, we design a cross non-local block
(CNLB), which can capture long-range dependencies beyond the query. Rather than
sharing the input source of the query branch, the key and value branches are enhanced
by fusing more preceding features. CNLB is computation-friendly: it leverages a
spatial pyramid down-sampling (SPDS) strategy to reduce computation and
memory consumption without sacrificing performance. Last but not least, a
novel detail-focused contrastive regularization (DFCR) is presented by
emphasizing the low-level details and ignoring the high-level semantic
information in the representation space. Comprehensive experimental results
demonstrate that the proposed MRFNLN model outperforms recent state-of-the-art
dehazing methods with fewer than 1.5 million parameters. Comment: submitted to IEEE TCYB for possible publicatio
Prompt-based test-time real image dehazing: a novel pipeline
Existing methods attempt to improve models' generalization ability on
real-world hazy images by exploring well-designed training schemes (e.g.,
CycleGAN, prior loss). However, most of them need very complicated training
procedures to achieve satisfactory results. In this work, we present a
novel testing pipeline called Prompt-based Test-Time Dehazing (PTTD) to help
generate visually pleasing results of real-captured hazy images during the
inference phase. We experimentally find that given a dehazing model trained on
synthetic data, by fine-tuning the statistics (i.e., mean and standard
deviation) of encoding features, PTTD is able to narrow the domain gap,
boosting the performance of real image dehazing. Accordingly, we first apply a
prompt generation module (PGM) to generate a visual prompt, which is the source
of appropriate statistical perturbations for mean and standard deviation. We
then insert a feature adaptation module (FAM) into existing dehazing
models to adjust the original statistics under the guidance of the generated
prompt. Note that PTTD is model-agnostic and can be combined with various
state-of-the-art dehazing models trained on synthetic hazy-clean pairs.
Extensive experimental results demonstrate that PTTD is flexible while
achieving superior performance over state-of-the-art dehazing methods in
real-world scenarios. The source code of PTTD will be made available at
https://github.com/cecret3350/PTTD-Dehazing.
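The statistic adjustment described in this abstract can be illustrated with a small sketch: normalize a feature channel, then rescale it with a perturbed mean and standard deviation. This is an assumption-laden toy, not the PTTD code: the perturbations `delta_mean` and `delta_std` are passed in by hand here, whereas the abstract says they are derived from a generated visual prompt, and the function name `adapt_statistics` is hypothetical.

```python
import statistics


def adapt_statistics(features, delta_mean, delta_std, eps=1e-5):
    """Shift the mean and standard deviation of one feature channel.

    Assumption: additive perturbations applied to channel statistics;
    the abstract does not specify the exact form of the adjustment.
    """
    mean = statistics.fmean(features)
    std = statistics.pstdev(features)
    # Remove the original statistics ...
    normalized = [(x - mean) / (std + eps) for x in features]
    # ... and re-apply the perturbed ones.
    new_mean = mean + delta_mean
    new_std = std + delta_std
    return [n * new_std + new_mean for n in normalized]


# Toy channel of encoder features (hypothetical values).
features = [1.0, 2.0, 3.0, 4.0]
adapted = adapt_statistics(features, delta_mean=1.0, delta_std=0.5)
```

Because only statistics are changed, the underlying dehazing model's weights stay untouched, which is consistent with the model-agnostic claim in the abstract.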
Zero-shot Domain Adaptation for Neural Machine Translation with Retrieved Phrase-level Prompts
Domain adaptation is an important challenge for neural machine translation.
However, the traditional fine-tuning solution requires multiple extra training
runs and incurs a high cost. In this paper, we propose a non-tuning paradigm,
resolving domain adaptation with a prompt-based method. Specifically, we
construct a bilingual phrase-level database and retrieve relevant pairs from it
as a prompt for the input sentences. By utilizing Retrieved Phrase-level
Prompts (RePP), we effectively boost the translation quality. Experiments show
that our method improves domain-specific machine translation by 6.2 BLEU
points and improves translation-constraint accuracy by 11.5% without
additional training.
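The retrieve-and-prompt step described in this abstract can be sketched as below. It is a minimal illustration under stated assumptions: retrieval is plain substring matching, and the prompt layout (`src = tgt` pairs, a ` ||| ` separator) is invented for this example; the abstract specifies neither. The phrase database and the function names are likewise hypothetical.

```python
def retrieve_phrase_pairs(source, phrase_db, max_pairs=3):
    """Return bilingual phrase pairs whose source side occurs in the sentence.

    Assumption: case-insensitive substring matching, longest phrases first.
    """
    lowered = source.lower()
    hits = [(src, tgt) for src, tgt in phrase_db.items() if src.lower() in lowered]
    hits.sort(key=lambda pair: len(pair[0]), reverse=True)
    return hits[:max_pairs]


def build_prompted_input(source, pairs, sep=" ||| "):
    """Prepend retrieved pairs to the input sentence (hypothetical layout)."""
    if not pairs:
        return source
    prompt = " ; ".join(f"{src} = {tgt}" for src, tgt in pairs)
    return f"{prompt}{sep}{source}"


# Toy English-German medical phrase database (illustrative entries only).
phrase_db = {
    "blood pressure": "Blutdruck",
    "side effects": "Nebenwirkungen",
    "dosage": "Dosierung",
}
sentence = "Check the blood pressure before adjusting the dosage."
pairs = retrieve_phrase_pairs(sentence, phrase_db)
prompted = build_prompted_input(sentence, pairs)
```

The prompted string would then be fed to the translation model unchanged, which is what lets the approach avoid any additional training.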
“Greening” Worcester: Municipal Best Practices for Sustainability
In response to the urgent threat posed by climate change, more and more cities, including Worcester, are attempting to become more environmentally responsible and sustainable, both to strengthen their communities and to protect the planet. The Green Worcester Working Group (GWWG) tasked the Clark Capstone Team with researching best practices for municipal sustainability. The GWWG has set the following priorities: climate change mitigation, resilience, open spaces, sustainable resource management, and education and awareness. Taking these into account, the Clark Capstone Team researched the sustainability practices of cities in New England, across the U.S., and around the world, gathering and synthesizing the information found. Through careful data evaluation, the team selected six cities to recommend: Portsmouth, NH; Cambridge, MA; Bridgeport, CT; Somerville, MA; Seattle, WA; and New York, NY.
Beyond Triplet: Leveraging the Most Data for Multimodal Machine Translation
Multimodal machine translation (MMT) aims to improve translation quality by
incorporating information from other modalities, such as vision. Previous MMT
systems mainly focus on better access and use of visual information and tend to
validate their methods on image-related datasets. These studies face two
challenges. First, they can only utilize triple data (bilingual texts with
images), which is scarce; second, current benchmarks are relatively restricted
and do not correspond to realistic scenarios. Therefore, this paper
correspondingly establishes new methods and new datasets for MMT. First, we
propose a framework 2/3-Triplet with two new approaches to enhance MMT by
utilizing large-scale non-triple data: monolingual image-text data and parallel
text-only data. Second, we construct an English-Chinese e-commercial
multimodal translation dataset (including training and testing), named
EMMT, whose test set is carefully selected to contain ambiguous words that
would be mistranslated without the help of images. Experiments show
that our method is more suitable for real-world scenarios and can significantly
improve translation performance by using more non-triple data. In addition, our
model also rivals various SOTA models in conventional multimodal translation
benchmarks. Comment: 8 pages, ACL 2023 Findin