85 research outputs found

    Data-driven sentence simplification: Survey and benchmark

    Sentence Simplification (SS) aims to modify a sentence in order to make it easier to read and understand. To do so, several rewriting transformations can be performed, such as replacement, reordering, and splitting. Executing these transformations while keeping sentences grammatical, preserving their main idea, and generating simpler output is a challenging and still far from solved problem. In this article, we survey research on SS, focusing on approaches that attempt to learn how to simplify using corpora of aligned original-simplified sentence pairs in English, which is the dominant paradigm nowadays. We also include a benchmark of different approaches on common datasets so as to compare them and highlight their strengths and limitations. We expect that this survey will serve as a starting point for researchers interested in the task and help spark new ideas for future developments.
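
The rewriting transformations the abstract names can be illustrated with a deliberately minimal sketch. This is not the surveyed data-driven approach (which learns transformations from aligned corpora); the synonym table and splitting rule below are toy stand-ins, assumed purely for illustration.

```python
import re

# Toy synonym table standing in for a learned lexical-substitution model.
SYNONYMS = {"commence": "start", "purchase": "buy", "utilise": "use"}

def replace_words(sentence: str) -> str:
    """Replacement: swap complex words for simpler synonyms."""
    return " ".join(SYNONYMS.get(w.lower(), w) for w in sentence.split())

def split_sentence(sentence: str) -> list[str]:
    """Splitting: break a clause joined by ', and' into two sentences."""
    parts = re.split(r",\s+and\s+", sentence, maxsplit=1)
    if len(parts) == 2:
        return [parts[0].rstrip(".") + ".", parts[1][0].upper() + parts[1][1:]]
    return [sentence]

original = "We will commence the project, and we will purchase new equipment."
simplified = [replace_words(s) for s in split_sentence(original)]
```

A learned system would apply the same kinds of operations, but would induce them from original-simplified sentence pairs rather than from hand-written rules.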

    Multistage BiCross encoder for multilingual access to COVID-19 health information

    The Coronavirus (COVID-19) pandemic has led to a rapidly growing ‘infodemic’ of health information online. This has motivated the need for accurate semantic search and retrieval of reliable COVID-19 information across millions of documents, in multiple languages. To address this challenge, this paper proposes a novel high-precision, high-recall neural Multistage BiCross encoder approach. It is a sequential three-stage ranking pipeline that uses the Okapi BM25 retrieval algorithm together with transformer-based bi-encoder and cross-encoder models to effectively rank the documents with respect to the given query. We present experimental results from our participation in the Multilingual Information Access (MLIA) shared task on COVID-19 multilingual semantic search. The independently evaluated MLIA results validate our approach and demonstrate that it outperforms other state-of-the-art approaches according to nearly all evaluation metrics in both monolingual and bilingual runs.
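
The three-stage funnel described above can be sketched as follows. The BM25 stage is a minimal but faithful implementation of the Okapi formula; the bi-encoder and cross-encoder stages are stood in for by a simple token-overlap scorer, since the real pipeline uses transformer models that are out of scope for a short sketch. All document and query strings are hypothetical.

```python
import math
from collections import Counter

def bm25_scores(query, docs, k1=1.5, b=0.75):
    """Stage 1: classic Okapi BM25 lexical scoring over the whole corpus."""
    tokenized = [d.lower().split() for d in docs]
    N = len(docs)
    avgdl = sum(len(d) for d in tokenized) / N
    df = Counter(t for d in tokenized for t in set(d))
    scores = []
    for d in tokenized:
        tf = Counter(d)
        s = 0.0
        for q in query.lower().split():
            if q not in tf:
                continue
            idf = math.log(1 + (N - df[q] + 0.5) / (df[q] + 0.5))
            s += idf * tf[q] * (k1 + 1) / (tf[q] + k1 * (1 - b + b * len(d) / avgdl))
        scores.append(s)
    return scores

def overlap_score(query, doc):
    """Stand-in for the neural bi-/cross-encoder stages: token overlap.
    In the real pipeline these are transformer models, not this heuristic."""
    q, d = set(query.lower().split()), set(doc.lower().split())
    return len(q & d) / max(len(q), 1)

def multistage_rank(query, docs, k_lexical=3, k_rerank=2):
    # Stage 1: BM25 narrows the corpus to the top-k lexical candidates.
    s1 = bm25_scores(query, docs)
    stage1 = sorted(range(len(docs)), key=lambda i: -s1[i])[:k_lexical]
    # Stage 2: "bi-encoder" re-scores each candidate independently.
    stage2 = sorted(stage1, key=lambda i: -overlap_score(query, docs[i]))[:k_rerank]
    # Stage 3: "cross-encoder" jointly scores query-document pairs for final order.
    return sorted(stage2, key=lambda i: -overlap_score(query, docs[i]))

docs = ["covid symptoms include fever and cough",
        "football league results announced",
        "new vaccine trial for covid begins"]
ranked = multistage_rank("covid symptoms", docs)
```

The design point is the funnel itself: each stage is more expensive but more accurate than the last, so it only ever sees the candidates the previous stage let through.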

    MTCue: learning zero-shot control of extra-textual attributes by leveraging unstructured context in neural machine translation

    Efficient utilisation of both intra- and extra-textual context remains one of the critical gaps between machine and human translation. Existing research has primarily focused on providing individual, well-defined types of context in translation, such as the surrounding text or discrete external variables like the speaker's gender. This work introduces MTCUE, a novel neural machine translation (NMT) framework that interprets all context (including discrete variables) as text. MTCUE learns an abstract representation of context, enabling transferability across different data settings and leveraging similar attributes in low-resource scenarios. With a focus on a dialogue domain with access to document and metadata context, we extensively evaluate MTCUE on four language pairs in both translation directions. Our framework demonstrates significant improvements in translation quality over a parameter-matched non-contextual baseline, as measured by BLEU (+0.88) and COMET (+1.58). Moreover, MTCUE significantly outperforms a “tagging” baseline at translating English text. Analysis reveals that the context encoder of MTCUE learns a representation space that organises context based on specific attributes, such as formality, enabling effective zero-shot control. Pretraining on context embeddings also improves MTCUE's few-shot performance compared to the “tagging” baseline. Finally, an ablation study conducted on model components and contextual variables further supports the robustness of MTCUE for context-based NMT.

    Measuring what counts : the case of rumour stance classification

    Stance classification can be a powerful tool for understanding whether and which users believe in online rumours. The task aims to automatically predict the stance of replies towards a given rumour, namely support, deny, question, or comment. Numerous methods have been proposed and their performance compared in the RumourEval shared tasks in 2017 and 2019. Results demonstrated that this is a challenging problem since naturally occurring rumour stance data is highly imbalanced. This paper specifically questions the evaluation metrics used in these shared tasks. We re-evaluate the systems submitted to the two RumourEval tasks and show that the two widely adopted metrics – accuracy and macro-F1 – are not robust for the four-class imbalanced task of rumour stance classification, as they wrongly favour systems whose accuracy is highly skewed towards the majority class. To overcome this problem, we propose new evaluation metrics for rumour stance detection. These are not only robust to imbalanced data but also assign higher scores to systems that can recognise the two most informative minority classes (support and deny).
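
The failure mode the abstract describes is easy to reproduce numerically. The sketch below uses a made-up class distribution (not the actual RumourEval figures) to show how a trivial system that always predicts the majority "comment" class earns a high accuracy despite learning nothing about the informative minority classes.

```python
def f1(y_true, y_pred, label):
    """Per-class F1 from scratch, to keep the demo dependency-free."""
    tp = sum(t == p == label for t, p in zip(y_true, y_pred))
    fp = sum(p == label and t != label for t, p in zip(y_true, y_pred))
    fn = sum(t == label and p != label for t, p in zip(y_true, y_pred))
    if tp == 0:
        return 0.0
    prec, rec = tp / (tp + fp), tp / (tp + fn)
    return 2 * prec * rec / (prec + rec)

def macro_f1(y_true, y_pred, labels):
    return sum(f1(y_true, y_pred, l) for l in labels) / len(labels)

# Hypothetical imbalanced stance distribution: 'comment' dominates.
labels = ["support", "deny", "query", "comment"]
y_true = ["support"] * 10 + ["deny"] * 5 + ["query"] * 10 + ["comment"] * 75
majority = ["comment"] * 100  # a trivial majority-class "system"

acc = sum(t == p for t, p in zip(y_true, majority)) / len(y_true)
mf1 = macro_f1(y_true, majority, labels)
```

Here the majority-class baseline reaches 75% accuracy while scoring zero F1 on support, deny, and query, which is exactly why the paper argues for metrics that reward recognising the minority classes.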

    UTDRM: unsupervised method for training debunked-narrative retrieval models

    A key task in the fact-checking workflow is to establish whether the claim under investigation has already been debunked or fact-checked before. This is essentially a retrieval task where a misinformation claim is used as a query to retrieve from a corpus of debunks. Prior debunk retrieval methods have typically been trained on annotated pairs of misinformation claims and debunks. The novelty of this paper is an Unsupervised Method for Training Debunked-Narrative Retrieval Models (UTDRM) in a zero-shot setting, eliminating the need for human-annotated pairs. This approach leverages fact-checking articles for the generation of synthetic claims and employs a neural retrieval model for training. Our experiments show that UTDRM tends to match or exceed the performance of state-of-the-art methods on seven datasets, which demonstrates its effectiveness and broad applicability. The paper also analyses the impact of various factors on UTDRM’s performance, such as the quantity of fact-checking articles used, the number of synthetically generated claims employed, the proposed entity inoculation method, and the use of large language models for retrieval.
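
One plausible reading of the training-data construction can be sketched as follows. This is only an assumed outline of the high-level idea (synthetic claims generated from fact-checking articles, paired with their source debunks): the template-based `synthetic_claims` function stands in for the large language model the paper actually uses, and all article strings are hypothetical.

```python
def synthetic_claims(fact_check_title: str) -> list[str]:
    """Generate claim-like variants from a fact-checking article title.
    Simple templates here; the paper uses LLM-generated claims instead."""
    core = fact_check_title.removeprefix("Fact check: ").rstrip(".")
    return [core + ".", "Is it true that " + core.lower() + "?"]

def training_pairs(articles: list[str]) -> list[tuple[str, str]]:
    """Pair each synthetic claim with its source debunk as a positive;
    other debunks in a training batch would serve as negatives."""
    return [(claim, article) for article in articles
            for claim in synthetic_claims(article)]

articles = ["Fact check: Garlic cures COVID-19"]
pairs = training_pairs(articles)
```

A neural retriever trained on such (claim, debunk) pairs never needs human-annotated claim-debunk alignments, which is the point of the unsupervised setup.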

    Testing the performance of an innovative markerless technique for quantitative and qualitative gait analysis

    Gait abnormalities such as high stride and step frequency/cadence (SF, strides/second; CAD, steps/second), stride variability (SV) and low harmony may increase the risk of injuries and be a sentinel of medical conditions. This research aims to present a new markerless video-based technology for quantitative and qualitative gait analysis. 86 healthy individuals (mean age 32 years) performed a 90 s test on a treadmill at self-selected walking speed. We measured SF and CAD with a photoelectric sensor system; then, we calculated the average ± standard deviation (SD) and within-subject coefficient of variation (CV) of SF as an index of SV. We also recorded a 60 fps video of the patient. With custom-designed web-based video analysis software, we performed a spectral analysis of the brightness over time for each pixel of the image, which recovered the frequency content of the video. The two main frequency contents (F1 and F2) from this analysis should reflect the forcing/dominant variables, i.e., SF and CAD. A harmony index (HI) was then calculated, which should reflect the proportion of pixels of the image that move consistently with F1 or its supraharmonics. The higher the HI value, the less variable the gait. The correspondence SF-F1 and CAD-F2 was evaluated with both a paired t-test and correlation analysis, and the relationship between SV and HI with correlation analysis. SF and CAD were not significantly different from, and highly correlated with, F1 (0.893 ± 0.080 Hz vs. 0.895 ± 0.084 Hz, p < 0.001, r2 = 0.99) and F2 (1.787 ± 0.163 Hz vs. 1.791 ± 0.165 Hz, p < 0.001, r2 = 0.97). The SV was 1.84% ± 0.66% and it was significantly and moderately correlated with HI (0.082 ± 0.028, p < 0.001, r2 = 0.13). The innovative video-based technique of global, markerless gait analysis proposed in our study accurately identifies the main frequency contents and the variability of gait in healthy individuals, thus providing a time-efficient, low-cost means of quantitatively and qualitatively studying human locomotion.
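
The core signal-processing step, extracting a dominant frequency from a pixel's brightness over time, can be sketched with a synthetic signal. The 0.9 Hz oscillation and noise level below are made up for the demonstration; only the recording parameters (90 s at 60 fps) come from the abstract.

```python
import numpy as np

# Synthetic stand-in for one pixel's brightness across a 90 s, 60 fps video:
# a gait-like oscillation at 0.9 Hz (the stride frequency) plus noise.
fps, duration, f_stride = 60, 90, 0.9
t = np.arange(fps * duration) / fps
rng = np.random.default_rng(0)
brightness = np.sin(2 * np.pi * f_stride * t) + 0.2 * rng.standard_normal(t.size)

# Spectral analysis: the dominant peak of the brightness spectrum should
# recover the forcing frequency, as the article computes per pixel.
spectrum = np.abs(np.fft.rfft(brightness - brightness.mean()))
freqs = np.fft.rfftfreq(brightness.size, d=1 / fps)
f_dominant = freqs[spectrum[1:].argmax() + 1]  # skip the DC bin
```

With a 90 s recording the frequency resolution is 1/90 Hz, comfortably fine enough to separate a ~0.9 Hz stride frequency (F1) from its ~1.8 Hz step-frequency counterpart (F2).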

    The (un)suitability of automatic evaluation metrics for text simplification

    In order to simplify sentences, several rewriting operations can be performed, such as replacing complex words with simpler synonyms, deleting unnecessary information, and splitting long sentences. Despite this multi-operation nature, evaluation of automatic simplification systems relies on metrics that only moderately correlate with human judgements of the simplicity achieved by executing specific operations (e.g. simplicity gain based on lexical replacements). In this article, we investigate how well existing metrics can assess sentence-level simplifications where multiple operations may have been applied and which, therefore, require more general simplicity judgements. For that, we first collect a new and more reliable dataset for evaluating the correlation of metrics and human judgements of overall simplicity. Second, we conduct the first meta-evaluation of automatic metrics in Text Simplification, using our new dataset (and other existing data) to analyse the variation of the correlation between metrics’ scores and human judgements across three dimensions: the perceived simplicity level, the system type and the set of references used for computation. We show that these three aspects affect the correlations and, in particular, highlight the limitations of commonly-used operation-specific metrics. Finally, based on our findings, we propose a set of recommendations for automatic evaluation of multi-operation simplifications, suggesting which metrics to compute and how to interpret their scores.
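
The meta-evaluation described above boils down to correlating metric scores with human ratings. A minimal, dependency-free Pearson correlation makes the measurement concrete; the scores and ratings below are invented for the illustration and are not from the paper's data.

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation between metric scores and human ratings."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)

# Hypothetical data: one automatic metric's scores for five system outputs,
# against human overall-simplicity ratings for the same outputs.
metric_scores = [0.2, 0.5, 0.4, 0.9, 0.7]
human_ratings = [1.0, 3.0, 2.0, 4.0, 5.0]
r = pearson_r(metric_scores, human_ratings)
```

The meta-evaluation then slices such correlations by simplicity level, system type, and reference set to see where a metric's agreement with humans breaks down.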

    ASSET : a dataset for tuning and evaluation of sentence simplification models with multiple rewriting transformations

    In order to simplify a sentence, human editors perform multiple rewriting transformations: they split it into several shorter sentences, paraphrase words (i.e. replace complex words or phrases with simpler synonyms), reorder components, and/or delete information deemed unnecessary. Despite this varied range of possible text alterations, current models for automatic sentence simplification are evaluated using datasets that are focused on a single transformation, such as lexical paraphrasing or splitting. This makes it impossible to understand the ability of simplification models in more realistic settings. To alleviate this limitation, this paper introduces ASSET, a new dataset for assessing sentence simplification in English. ASSET is a crowdsourced multi-reference corpus where each simplification was produced by executing several rewriting transformations. Through quantitative and qualitative experiments, we show that simplifications in ASSET are better at capturing characteristics of simplicity when compared to other standard evaluation datasets for the task. Furthermore, we motivate the need for developing better methods for automatic evaluation using ASSET, since we show that current popular metrics may not be suitable when multiple simplification transformations are performed.
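
Multi-reference evaluation, the setting ASSET enables, typically scores a system output against the closest of several valid references. The sketch below uses a simple token-level F1 as a stand-in for real simplification metrics (which are more elaborate); the sentences are hypothetical examples.

```python
def token_f1(hyp: str, ref: str) -> float:
    """Token-overlap F1 between a hypothesis and a single reference."""
    h, r = hyp.lower().split(), ref.lower().split()
    common = sum(min(h.count(t), r.count(t)) for t in set(h))
    if common == 0:
        return 0.0
    prec, rec = common / len(h), common / len(r)
    return 2 * prec * rec / (prec + rec)

def multi_ref_score(hyp: str, refs: list[str]) -> float:
    """Score against the closest of several references, so that any of the
    valid rewriting transformations can be rewarded."""
    return max(token_f1(hyp, r) for r in refs)

# Two references that simplified the same source in different valid ways.
refs = ["The cat sat on the mat.", "A cat was sitting on the mat."]
score = multi_ref_score("The cat sat on the mat.", refs)
```

With a single reference, an output that applied a different (but equally valid) transformation would be unfairly penalised; taking the maximum over multiple references is the standard mitigation.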

    Toxic language detection in social media for Brazilian Portuguese : new dataset and multilingual analysis

    Hate speech and toxic comments are a common concern of social media platform users. Although these comments are, fortunately, the minority on these platforms, they are still capable of causing harm. Therefore, identifying these comments is an important task for studying and preventing the proliferation of toxicity in social media. Previous work in automatically detecting toxic comments focuses mainly on English, with very little work on languages such as Brazilian Portuguese. In this paper, we propose a new large-scale dataset for Brazilian Portuguese with tweets annotated as either toxic or non-toxic, with toxic tweets further labelled by type of toxicity. We present our dataset collection and annotation process, where we aimed to select candidates covering multiple demographic groups. State-of-the-art BERT models were able to achieve a 76% macro-F1 score using monolingual data in the binary case. We also show that large-scale monolingual data is still needed to create more accurate models, despite recent advances in multilingual approaches. An error analysis and experiments with multi-label classification show the difficulty of classifying certain types of toxic comments that appear less frequently in our data and highlight the need to develop models that are aware of different categories of toxicity.
