Request-and-Reverify: Hierarchical Hypothesis Testing for Concept Drift Detection with Expensive Labels
One important assumption underlying common classification models is the
stationarity of the data. However, in real-world streaming applications, the
data concept indicated by the joint distribution of feature and label is not
stationary but drifting over time. Concept drift detection aims to detect such
drifts and adapt the model so as to mitigate any deterioration in the model's
predictive performance. Unfortunately, most existing concept drift detection
methods rely on a strong and over-optimistic condition that the true labels are
available immediately for all already classified instances. In this paper, a
novel Hierarchical Hypothesis Testing framework with Request-and-Reverify
strategy is developed to detect concept drifts by requesting labels only when
necessary. Two methods, namely Hierarchical Hypothesis Testing with
Classification Uncertainty (HHT-CU) and Hierarchical Hypothesis Testing with
Attribute-wise "Goodness-of-fit" (HHT-AG), are proposed respectively under the
novel framework. In experiments with benchmark datasets, our methods
demonstrate overwhelming advantages over state-of-the-art unsupervised drift
detectors. More importantly, our methods even outperform DDM (the widely used
supervised drift detector) while using significantly fewer labels.
Comment: Published as a conference paper at IJCAI 201
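The request-and-reverify idea described above can be sketched in a few lines. This is only an illustration under stated assumptions, not the paper's procedure: the first layer here is a simple mean-shift check on a label-free uncertainty score, the second layer is a permutation test on labelled errors, and the names `request_and_reverify`, `request_errors`, `window`, `shift` and `alpha` are all hypothetical.

```python
import random


def mean(xs):
    return sum(xs) / len(xs)


def permutation_p_value(before, after, n_perm=500, seed=0):
    """One-sided permutation test: did the error rate increase from before to after?"""
    rng = random.Random(seed)
    observed = mean(after) - mean(before)
    pooled = list(before) + list(after)
    split = len(before)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(pooled)
        if mean(pooled[split:]) - mean(pooled[:split]) >= observed:
            hits += 1
    return (hits + 1) / (n_perm + 1)


def request_and_reverify(uncertainty, request_errors, window=50, shift=0.15, alpha=0.05):
    """Two-layer drift check (illustrative stand-in tests, not the paper's).

    Layer I watches a label-free uncertainty score and only raises a suspicion.
    Layer II requests labels for the two windows involved and re-verifies.
    request_errors(a, b) is a hypothetical callback returning 0/1 errors for [a, b).
    """
    confirmed, labels_used = [], 0
    for t in range(2 * window, len(uncertainty) + 1, window):
        reference = uncertainty[t - 2 * window:t - window]
        recent = uncertainty[t - window:t]
        if mean(recent) - mean(reference) <= shift:
            continue  # Layer I sees nothing suspicious: no labels requested
        before = request_errors(t - 2 * window, t - window)
        after = request_errors(t - window, t)
        labels_used += len(before) + len(after)
        if permutation_p_value(before, after) < alpha:
            confirmed.append(t)  # Layer II confirms the drift
    return confirmed, labels_used
```

The point of the two layers is that labels are paid for only when the unsupervised layer raises a suspicion, so a stream with few suspected drifts needs labels for only a small fraction of its instances.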
BLEURT Has Universal Translations: An Analysis of Automatic Metrics by Minimum Risk Training
Automatic metrics play a crucial role in machine translation. Despite the
widespread use of n-gram-based metrics, there has been a recent surge in the
development of pre-trained model-based metrics that focus on measuring sentence
semantics. However, these neural metrics, while achieving higher correlations
with human evaluations, are often considered to be black boxes with potential
biases that are difficult to detect. In this study, we systematically analyze
and compare various mainstream and cutting-edge automatic metrics from the
perspective of their guidance for training machine translation systems. Through
Minimum Risk Training (MRT), we find that certain metrics exhibit robustness
defects, such as the presence of universal adversarial translations in BLEURT
and BARTScore. In-depth analysis suggests two main causes of these robustness
deficits: distribution biases in the training datasets, and the tendency of the
metric paradigm. By incorporating token-level constraints, we enhance the
robustness of evaluation metrics, which in turn leads to an improvement in the
performance of machine translation systems. Codes are available at
https://github.com/powerpuffpomelo/fairseq_mrt.
Comment: Accepted to ACL 2023 main conference
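To make the role of the metric in Minimum Risk Training concrete, the sketch below computes the standard MRT expected risk over a set of sampled candidates. It is a minimal illustration, assuming the usual formulation in which candidate probabilities are sharpened by an exponent `alpha` and renormalised, and the cost is one minus the metric score; the function name and the default `alpha` are illustrative, not taken from the linked repository.

```python
import math


def mrt_expected_risk(log_probs, metric_scores, alpha=0.005):
    """Expected risk over sampled candidates.

    log_probs:      model log-probabilities of each candidate translation
    metric_scores:  metric score of each candidate in [0, 1] (higher is better)
    alpha:          sharpness of the renormalised candidate distribution
    """
    scaled = [alpha * lp for lp in log_probs]
    top = max(scaled)  # subtract the maximum for numerical stability
    weights = [math.exp(s - top) for s in scaled]
    total = sum(weights)
    q = [w / total for w in weights]
    # Risk is the expected cost, with cost = 1 - metric score.
    return sum(qi * (1.0 - score) for qi, score in zip(q, metric_scores))
```

Because training minimises this quantity, any candidate that the metric scores highly regardless of the source sentence lowers the risk, which is how a universal adversarial translation can be rewarded when the metric is not robust.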
Only 5% Attention Is All You Need: Efficient Long-range Document-level Neural Machine Translation
Document-level Neural Machine Translation (DocNMT) has been proven crucial
for handling discourse phenomena by introducing document-level context
information. One of the most important directions is to input the whole
document directly to the standard Transformer model. In this case, efficiency
becomes a critical concern due to the quadratic complexity of the attention
module. Existing studies either focus on the encoder part, which cannot be
deployed on sequence-to-sequence generation tasks, e.g., Machine Translation
(MT), or suffer from a significant performance drop. In this work, we maintain
translation performance while gaining a 20% speedup by introducing an extra
selection layer, based on lightweight attention, that selects a small portion of
tokens to be attended. It takes advantage of the original attention to ensure
performance and of dimension reduction to accelerate inference. Experimental
results show that our method achieves up to approximately 95% sparsity (only 5%
of tokens attended) and saves 93% of the computation cost of the attention module
compared with the original Transformer, while maintaining performance.
Comment: Accepted by AACL 202
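The select-then-attend idea can be illustrated with a small pure-Python sketch. It is not the paper's architecture: the cheap scoring step below simply truncates vectors to their first `low_dim` dimensions as a stand-in for a learned lightweight attention, and `select_then_attend`, `keep_ratio` and `low_dim` are hypothetical names.

```python
import math


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def softmax(xs):
    top = max(xs)
    exps = [math.exp(x - top) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def select_then_attend(query, keys, values, keep_ratio=0.05, low_dim=2):
    """Attend to only a small, cheaply selected subset of tokens."""
    n = len(keys)
    k = max(1, round(keep_ratio * n))
    # Cheap scoring in a reduced dimension (assumption: truncation stands in
    # for a learned low-dimensional projection).
    cheap = [dot(query[:low_dim], key[:low_dim]) for key in keys]
    selected = sorted(range(n), key=lambda i: cheap[i], reverse=True)[:k]
    selected.sort()
    # Full-dimensional scaled dot-product attention over the selected tokens only.
    scale = math.sqrt(len(query))
    weights = softmax([dot(query, keys[i]) / scale for i in selected])
    dim = len(values[0])
    output = [sum(w * values[i][d] for w, i in zip(weights, selected))
              for d in range(dim)]
    return output, selected
```

With 100 tokens and `keep_ratio=0.05`, the full-dimensional attention touches 5 tokens, which corresponds to the 95% sparsity figure quoted above; the cheap selection pass still visits every token, so the overall saving is somewhat smaller than the sparsity.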
Zero-shot Domain Adaptation for Neural Machine Translation with Retrieved Phrase-level Prompts
Domain adaptation is an important challenge for neural machine translation.
However, the traditional fine-tuning solution requires multiple rounds of extra
training and incurs a high cost. In this paper, we propose a non-tuning paradigm,
resolving domain adaptation with a prompt-based method. Specifically, we
construct a bilingual phrase-level database and retrieve relevant pairs from it
as a prompt for the input sentences. By utilizing Retrieved Phrase-level
Prompts (RePP), we effectively boost the translation quality. Experiments show
that our method improves domain-specific machine translation by 6.2 BLEU
points and improves the accuracy of translation constraints by 11.5% without
additional training.
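A toy version of retrieving phrase pairs and prepending them as a prompt might look as follows. This is a sketch under assumptions: retrieval here is exact n-gram lookup in an in-memory dictionary, and the prompt layout (the `=`, `;` and `|||` separators) is invented for illustration; the abstract does not specify either.

```python
def retrieve_phrase_pairs(sentence, phrase_table, max_len=3):
    """Return (source phrase, target phrase) pairs found in the sentence.

    Longer phrases are looked up first. phrase_table is a hypothetical
    dict mapping lower-cased source phrases to target phrases.
    """
    tokens = sentence.lower().split()
    pairs, seen = [], set()
    for n in range(max_len, 0, -1):
        for i in range(len(tokens) - n + 1):
            phrase = " ".join(tokens[i:i + n])
            if phrase in phrase_table and phrase not in seen:
                seen.add(phrase)
                pairs.append((phrase, phrase_table[phrase]))
    return pairs


def build_prompted_input(sentence, phrase_table, sep=" ||| "):
    """Prepend retrieved phrase pairs to the input sentence as a prompt."""
    pairs = retrieve_phrase_pairs(sentence, phrase_table)
    if not pairs:
        return sentence
    prompt = " ; ".join(f"{src} = {tgt}" for src, tgt in pairs)
    return f"{prompt}{sep}{sentence}"
```

For example, with a table containing `"myocardial infarction" -> "Herzinfarkt"`, the sentence is passed to the translation model together with that pair, so domain terminology is supplied at inference time rather than learned through fine-tuning.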
Feasibility of accelerated T2 mapping for the preoperative assessment of endometrial carcinoma
Objective: The application value of T2 mapping in evaluating endometrial carcinoma (EMC) features remains unclear. The aim of the study was to determine the quantitative T2 values in EMC using a novel accelerated T2 mapping, and evaluate them for detection, classification, and grading of EMC.
Materials and methods: Fifty-six patients with pathologically confirmed EMC and 17 healthy volunteers were prospectively enrolled in this study. All participants underwent pelvic magnetic resonance imaging, including DWI and accelerated T2 mapping, before treatment. The T2 and apparent diffusion coefficient (ADC) values of different pathologic EMC features were extracted and compared. Receiver operating characteristic (ROC) curve analysis was performed to analyze the diagnostic efficacy of the T2 and ADC values in distinguishing different pathological features of EMC.
Results: The T2 values and ADC values were significantly lower in EMC than in normal endometrium (both p < 0.05). The T2 and ADC values were significantly different between endometrioid adenocarcinoma (EA) and non-EA (both p < 0.05) and among EMC tumor grades (all p < 0.05) but not for EMC clinical types (both p > 0.05) or depth of myometrial invasion (both p > 0.05). The area under the ROC curve (AUC) was higher for T2 values than for ADC values in predicting grade 3 EA (0.939 vs. 0.764, p = 0.048). When T2 and ADC values were combined, the AUC for predicting grade 3 EA increased significantly to 0.947 (p = 0.03) compared with that of ADC values. The T2 and ADC values were negatively correlated with the tumor grades (r = -0.706 and r = -0.537, respectively).
Conclusion: Quantitative T2 values demonstrate potential suitability in discriminating between EMC and normal endometrium, EA and non-EA, and grade 3 EA and grade 1/2 EA. Combining T2 and ADC values performs better in predicting the histological grades of EA in comparison with ADC values alone.