115 research outputs found

    Request-and-Reverify: Hierarchical Hypothesis Testing for Concept Drift Detection with Expensive Labels

    One important assumption underlying common classification models is the stationarity of the data. However, in real-world streaming applications, the data concept indicated by the joint distribution of feature and label is not stationary but drifting over time. Concept drift detection aims to detect such drifts and adapt the model so as to mitigate any deterioration in the model's predictive performance. Unfortunately, most existing concept drift detection methods rely on a strong and over-optimistic condition that the true labels are available immediately for all already classified instances. In this paper, a novel Hierarchical Hypothesis Testing framework with Request-and-Reverify strategy is developed to detect concept drifts by requesting labels only when necessary. Two methods, namely Hierarchical Hypothesis Testing with Classification Uncertainty (HHT-CU) and Hierarchical Hypothesis Testing with Attribute-wise "Goodness-of-fit" (HHT-AG), are proposed respectively under the novel framework. In experiments with benchmark datasets, our methods demonstrate overwhelming advantages over state-of-the-art unsupervised drift detectors. More importantly, our methods even outperform DDM (the widely used supervised drift detector) when we use significantly fewer labels. Comment: Published as a conference paper at IJCAI 201
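The request-and-reverify idea in the abstract above can be illustrated with a minimal two-layer sketch. This is not the HHT-CU or HHT-AG procedure: the fixed thresholds stand in for the hypothesis tests the paper uses, and the function name, parameters, and `request_labels` callback are all hypothetical.

```python
from statistics import mean


def request_and_reverify(ref_uncertainty, new_uncertainty, request_labels,
                         ref_error, threshold=0.1, error_margin=0.1):
    """Return (drift_confirmed, number_of_labels_requested).

    Layer 1 looks only at unlabeled data (classification uncertainty).
    Layer 2 requests labels, and only when Layer 1 raises a suspicion.
    The thresholds are illustrative stand-ins for real hypothesis tests.
    """
    # Layer 1: has the mean uncertainty shifted noticeably?
    suspected = abs(mean(new_uncertainty) - mean(ref_uncertainty)) > threshold
    if not suspected:
        return False, 0  # no labels requested

    # Layer 2: request labels and check whether the error rate really rose.
    correct = request_labels()  # list of bools: was each prediction correct?
    new_error = 1.0 - mean(1.0 if c else 0.0 for c in correct)
    return (new_error - ref_error) > error_margin, len(correct)


# No suspicion: the label oracle is never called.
stable = request_and_reverify([0.10, 0.12, 0.11], [0.10, 0.11, 0.12],
                              lambda: [True] * 4, ref_error=0.1)
# Suspicion confirmed by the requested labels.
drifted = request_and_reverify([0.10, 0.12, 0.11], [0.40, 0.45, 0.50],
                               lambda: [False, False, True, False],
                               ref_error=0.1)
```

The point of the layering is that labels are paid for only in the second call, where the unlabeled signal already suggests a change.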

    BLEURT Has Universal Translations: An Analysis of Automatic Metrics by Minimum Risk Training

    Automatic metrics play a crucial role in machine translation. Despite the widespread use of n-gram-based metrics, there has been a recent surge in the development of pre-trained model-based metrics that focus on measuring sentence semantics. However, these neural metrics, while achieving higher correlations with human evaluations, are often considered to be black boxes with potential biases that are difficult to detect. In this study, we systematically analyze and compare various mainstream and cutting-edge automatic metrics from the perspective of their guidance for training machine translation systems. Through Minimum Risk Training (MRT), we find that certain metrics exhibit robustness defects, such as the presence of universal adversarial translations in BLEURT and BARTScore. In-depth analysis suggests two main causes of these robustness deficits: distribution biases in the training datasets, and the tendency of the metric paradigm. By incorporating token-level constraints, we enhance the robustness of evaluation metrics, which in turn leads to an improvement in the performance of machine translation systems. Code is available at https://github.com/powerpuffpomelo/fairseq_mrt. Comment: Accepted to ACL 2023 main conference

    Only 5% Attention Is All You Need: Efficient Long-range Document-level Neural Machine Translation

    Document-level Neural Machine Translation (DocNMT) has been proven crucial for handling discourse phenomena by introducing document-level context information. One of the most important directions is to input the whole document directly to the standard Transformer model. In this case, efficiency becomes a critical concern due to the quadratic complexity of the attention module. Existing studies either focus on the encoder part, which cannot be deployed on sequence-to-sequence generation tasks, e.g., Machine Translation (MT), or suffer from a significant performance drop. In this work, we maintain translation performance while gaining a 20% speed-up by introducing an extra selection layer, based on lightweight attention, that selects a small portion of tokens to be attended. It takes advantage of the original attention to ensure performance and of dimension reduction to accelerate inference. Experimental results show that our method can achieve up to approximately 95% sparsity (only 5% of tokens attended) and save 93% of the computation cost of the attention module compared with the original Transformer, while maintaining performance. Comment: Accepted by AACL 202
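As a rough illustration of what "only 5% of tokens attended" means, the sketch below keeps the highest-scoring 5% of token positions for one query. It is not the paper's selection layer: the scores are assumed to come from some cheap scoring step, and `select_tokens` and `keep_percent` are hypothetical names.

```python
def select_tokens(scores, keep_percent=5):
    """Return the sorted positions of the top `keep_percent`% of scores.

    `scores` is one relevance score per context token; how the scores are
    produced (e.g. by a low-dimensional attention) is outside this sketch.
    """
    # Integer arithmetic avoids floating-point rounding; keep at least one.
    k = max(1, len(scores) * keep_percent // 100)
    ranked = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    return sorted(ranked[:k])


# 100 context tokens with increasing scores: 5 survive, 95 are skipped.
toy_scores = [0.01 * i for i in range(100)]
kept = select_tokens(toy_scores)
```

Full attention would then be computed only over the `kept` positions, which is where the saving on the attention module comes from.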

    Zero-shot Domain Adaptation for Neural Machine Translation with Retrieved Phrase-level Prompts

    Domain adaptation is an important challenge for neural machine translation. However, the traditional fine-tuning solution requires multiple extra training runs and incurs a high cost. In this paper, we propose a non-tuning paradigm, resolving domain adaptation with a prompt-based method. Specifically, we construct a bilingual phrase-level database and retrieve relevant pairs from it as a prompt for the input sentences. By utilizing Retrieved Phrase-level Prompts (RePP), we effectively boost the translation quality. Experiments show that our method improves domain-specific machine translation by 6.2 BLEU points and improves the accuracy of translation constraints by 11.5% without additional training.
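The retrieve-then-prompt step described above might look roughly like the following. This is a toy sketch, not RePP itself: the substring matching, the `src = tgt` prompt layout, the `|||` separator, and the sample phrase table are all assumptions made for illustration.

```python
def build_prompt(sentence, phrase_table, max_pairs=3):
    """Prefix `sentence` with bilingual phrase pairs found in it.

    `phrase_table` maps source phrases to target phrases. Matching is a
    plain case-insensitive substring test, with longer phrases preferred.
    """
    lowered = sentence.lower()
    hits = [(src, tgt) for src, tgt in phrase_table.items()
            if src.lower() in lowered]
    hits.sort(key=lambda pair: len(pair[0]), reverse=True)
    pairs = hits[:max_pairs]
    if not pairs:
        return sentence  # nothing retrieved: pass the input through unchanged
    prefix = " ; ".join(f"{src} = {tgt}" for src, tgt in pairs)
    return f"{prefix} ||| {sentence}"


# Hypothetical English-German medical phrase table.
table = {"myocardial infarction": "Myokardinfarkt", "dose": "Dosis"}
prompted = build_prompt("The patient had a myocardial infarction.", table)
```

Because the domain knowledge lives in the phrase table rather than in model weights, switching domains only means switching tables.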

    Feasibility of accelerated T2 mapping for the preoperative assessment of endometrial carcinoma

    Objective: The application value of T2 mapping in evaluating endometrial carcinoma (EMC) features remains unclear. The aim of the study was to determine the quantitative T2 values in EMC using a novel accelerated T2 mapping, and evaluate them for detection, classification, and grading of EMC. Materials and methods: Fifty-six patients with pathologically confirmed EMC and 17 healthy volunteers were prospectively enrolled in this study. All participants underwent pelvic magnetic resonance imaging, including DWI and accelerated T2 mapping, before treatment. The T2 and apparent diffusion coefficient (ADC) values of different pathologic EMC features were extracted and compared. Receiver operating characteristic (ROC) curve analysis was performed to analyze the diagnostic efficacy of the T2 and ADC values in distinguishing different pathological features of EMC. Results: The T2 values and ADC values were significantly lower in EMC than in normal endometrium (both p < 0.05). The T2 and ADC values were significantly different between endometrioid adenocarcinoma (EA) and non-EA (both p < 0.05) and EMC tumor grades (all p < 0.05) but not for EMC clinical types (both p > 0.05) and depth of myometrial invasion (both p > 0.05). The area under the ROC curve (AUC) was higher for T2 values than for ADC values in predicting grade 3 EA (0.939 vs. 0.764, p = 0.048). When T2 and ADC values were combined, the AUC for predicting grade 3 EA showed a significant increase to 0.947 (p = 0.03) compared with that of ADC values alone. The T2 and ADC values were negatively correlated with the tumor grades (r = -0.706 and r = -0.537, respectively). Conclusion: Quantitative T2 values demonstrate potential suitability in discriminating between EMC and normal endometrium, EA and non-EA, grade 3 EA and grade 1/2 EA. Combining T2 and ADC values performs better in predicting the histological grades of EA in comparison with ADC values alone.