    A self-training approach for short text clustering

    Short text clustering is a challenging problem when adopting traditional bag-of-words or TF-IDF representations, since these lead to sparse vectors for short texts. Low-dimensional continuous representations (embeddings) can counter that sparseness problem: their high representational power is exploited in deep clustering algorithms. While deep clustering has been studied extensively in computer vision, relatively little work has focused on NLP. The method we propose learns discriminative features from both an autoencoder and a sentence embedding, then uses assignments from a clustering algorithm as supervision to update the weights of the encoder network. Experiments on three short text datasets empirically validate the effectiveness of our method.
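
    As a rough illustration of the self-training idea sketched above, the following minimal example (assuming a DEC-style soft-assignment scheme; layer sizes and hyperparameters are illustrative, not the authors' exact architecture) shows how cluster assignments can act as supervision for updating the encoder weights:

    import torch
    import torch.nn as nn
    import torch.nn.functional as F

    class Encoder(nn.Module):
        # Maps a precomputed sentence embedding to a low-dimensional latent feature.
        def __init__(self, in_dim=768, hid=256, latent=64):
            super().__init__()
            self.net = nn.Sequential(nn.Linear(in_dim, hid), nn.ReLU(), nn.Linear(hid, latent))
        def forward(self, x):
            return self.net(x)

    def soft_assign(z, centroids, alpha=1.0):
        # Student's t kernel between latent points and cluster centroids.
        dist2 = torch.cdist(z, centroids) ** 2
        q = (1.0 + dist2 / alpha) ** (-(alpha + 1) / 2)
        return q / q.sum(dim=1, keepdim=True)

    def target_distribution(q):
        # Sharpen the assignments: high-confidence points dominate the self-training target.
        w = q ** 2 / q.sum(dim=0)
        return w / w.sum(dim=1, keepdim=True)

    x = torch.randn(500, 768)                      # placeholder sentence embeddings
    encoder = Encoder()
    centroids = nn.Parameter(torch.randn(10, 64))  # in practice initialised with k-means
    opt = torch.optim.Adam(list(encoder.parameters()) + [centroids], lr=1e-3)

    for step in range(100):
        z = encoder(x)
        q = soft_assign(z, centroids)
        p = target_distribution(q).detach()        # clustering assignments used as supervision
        loss = F.kl_div(q.log(), p, reduction="batchmean")
        opt.zero_grad(); loss.backward(); opt.step()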

    A million tweets are worth a few points: tuning transformers for customer service tasks

    In online domain-specific customer service applications, many companies struggle to deploy advanced NLP models successfully, due to the limited availability of, and noise in, their datasets. While prior research has demonstrated the potential of adapting large open-domain pretrained models to domain-specific tasks, the appropriate (pre)training strategies have not yet been rigorously evaluated in such social media customer service settings, especially under multilingual conditions. We address this gap by collecting a multilingual social media corpus of customer service conversations (865k tweets), comparing various pipelines of pretraining and finetuning approaches, and applying them to five different end tasks. We show that pretraining a generic multilingual transformer model on our in-domain dataset, before finetuning on specific end tasks, consistently boosts performance, especially in non-English settings.
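
    A hedged sketch of such a pretraining-then-finetuning pipeline, using the Hugging Face transformers library; the model name, file path, task and hyperparameters below are illustrative assumptions, not the paper's exact setup:

    from transformers import (AutoTokenizer, AutoModelForMaskedLM,
                              AutoModelForSequenceClassification,
                              DataCollatorForLanguageModeling, Trainer, TrainingArguments)
    from datasets import load_dataset

    tok = AutoTokenizer.from_pretrained("xlm-roberta-base")

    # Step 1: domain-adaptive masked-language-model pretraining on raw in-domain tweets.
    tweets = load_dataset("text", data_files={"train": "tweets.txt"})["train"]  # hypothetical file
    tweets = tweets.map(lambda b: tok(b["text"], truncation=True, max_length=128), batched=True)
    mlm_model = AutoModelForMaskedLM.from_pretrained("xlm-roberta-base")
    Trainer(model=mlm_model,
            args=TrainingArguments("mlm-out", per_device_train_batch_size=32, num_train_epochs=1),
            train_dataset=tweets,
            data_collator=DataCollatorForLanguageModeling(tok, mlm_probability=0.15)).train()
    mlm_model.save_pretrained("xlmr-tweets")
    tok.save_pretrained("xlmr-tweets")

    # Step 2: finetune the domain-adapted checkpoint on a specific end task
    # (e.g. a 5-class classification task), reusing the same Trainer API with labelled data.
    clf = AutoModelForSequenceClassification.from_pretrained("xlmr-tweets", num_labels=5)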

    Learning to Reuse Distractors to Support Multiple Choice Question Generation in Education

    Multiple choice questions (MCQs) are widely used in digital learning systems, as they allow for automating the assessment process. However, due to the increased digital literacy of students and the advent of social media platforms, MCQ tests are widely shared online, and teachers are continuously challenged to create new questions, which is an expensive and time-consuming task. A particularly sensitive aspect of MCQ creation is devising relevant distractors, i.e., wrong answers that are not easily identifiable as being wrong. This paper studies how a large existing set of manually created answers and distractors for questions over a variety of domains, subjects, and languages can be leveraged to help teachers create new MCQs, by the smart reuse of existing distractors. We built several data-driven models based on context-aware question and distractor representations, and compared them with static feature-based models. The proposed models are evaluated with automated metrics and in a realistic user test with teachers. Both automatic and human evaluations indicate that context-aware models consistently outperform the static feature-based approach. For our best-performing context-aware model, on average 3 of the 10 distractors shown to teachers were rated as high-quality distractors. We create a performance benchmark and make it public, to enable comparison between different approaches and to introduce a more standardized evaluation of the task. The benchmark contains a test set of 298 educational questions covering multiple subjects and languages, and a 77k multilingual pool of distractor vocabulary for future research. (24 pages, 4 figures. Accepted for publication in IEEE Transactions on Learning Technologies.)
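
    A minimal sketch of the distractor-reuse setting (the encoder, similarity scoring and examples below are illustrative assumptions, not the paper's actual models): encode the new question together with its correct answer, encode a pool of existing distractors, and rank the pool by similarity so a teacher can pick plausible candidates.

    from sentence_transformers import SentenceTransformer, util

    encoder = SentenceTransformer("paraphrase-multilingual-MiniLM-L12-v2")

    question = "Which layer of the OSI model handles routing?"
    answer = "the network layer"
    distractor_pool = ["the transport layer", "the data link layer",
                       "the presentation layer", "photosynthesis"]

    # Context-aware query: the question and its correct answer are embedded together.
    query_vec = encoder.encode(f"{question} [SEP] {answer}", convert_to_tensor=True)
    pool_vecs = encoder.encode(distractor_pool, convert_to_tensor=True)

    scores = util.cos_sim(query_vec, pool_vecs)[0]
    for cand, score in sorted(zip(distractor_pool, scores.tolist()), key=lambda t: -t[1]):
        print(f"{score:.3f}  {cand}")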

    Towards efficient NLP models: strategies for improving data selection, training, and inference

    In contrast to their smaller counterparts, large DNN models have achieved remarkably good results on various NLP tasks in recent years. As a consequence, a race has emerged within the NLP field to obtain better results by building ever larger models. This trend towards ever greater size and complexity is evolving so quickly that state-of-the-art models are now superseded every few months. The current pace is unsustainable, however, because of cost, hardware availability, and engineering difficulties. For these reasons, the efficiency of such models, in terms of architecture, training, and inference, has become vitally important from several perspectives. In this dissertation we present several strategies to improve the efficiency of NLP models. Concretely, we walk through the stages commonly associated with the development of NLP-based systems, including data (pre)processing, pretraining, finetuning, and inference, and present various techniques to achieve certain goals or results with fewer resources (e.g., compute).

    Exploration of block-wise dynamic sparseness

    Neural networks have achieved state-of-the-art performance across a wide variety of machine learning tasks, often with large and computation-heavy models. Inducing sparseness as a way to reduce the memory and computation footprint of these models has seen significant research attention in recent years. In this paper, we present a new method for dynamic sparseness, whereby part of the computations is omitted dynamically, based on the input. For efficiency, we combine the idea of dynamic sparseness with block-wise matrix-vector multiplications. In contrast to static sparseness, which permanently zeroes out selected positions in weight matrices, our method preserves the full network capabilities by potentially accessing any trained weight. Yet, matrix-vector multiplications are accelerated by omitting a predefined fraction of weight blocks from the matrix, based on the input. Experimental results on the task of language modeling, using recurrent and quasi-recurrent models, show that the proposed method can outperform a magnitude-based static sparseness baseline. In addition, our method achieves language modeling perplexities similar to the dense baseline, at half the computational cost at inference time.
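
    A minimal numpy sketch of block-wise dynamic sparseness (the block scoring below is a stand-in heuristic; the paper learns the gating from the input): the weight matrix is split into column blocks, an input-dependent score selects which blocks to keep for this particular input, and the matrix-vector product skips the omitted blocks entirely.

    import numpy as np

    def blockwise_dynamic_matvec(W, x, block=64, keep_frac=0.5):
        out_dim, in_dim = W.shape
        n_blocks = in_dim // block
        # Cheap input-dependent score per input block (here its L2 norm, as an illustration).
        scores = np.array([np.linalg.norm(x[j * block:(j + 1) * block]) for j in range(n_blocks)])
        keep = np.argsort(scores)[-max(1, int(keep_frac * n_blocks)):]
        y = np.zeros(out_dim)
        for j in keep:                              # only the selected blocks are multiplied
            cols = slice(j * block, (j + 1) * block)
            y += W[:, cols] @ x[cols]
        return y

    W, x = np.random.randn(512, 512), np.random.randn(512)
    y_dynamic = blockwise_dynamic_matvec(W, x)      # roughly half the multiply-adds of W @ x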

    An Emotional Journey: Detecting Emotion Trajectories in Dutch Customer Service Dialogues

    The ability to track fine-grained emotions in customer service dialogues has many real-world applications, but has not been studied extensively. This paper measures the potential of prediction models on that task, based on a real-world dataset of Dutch Twitter conversations in the domain of customer service. We find that modeling emotion trajectories has a small but measurable benefit compared to predictions based on isolated turns. The models used in our study are shown to generalize well to different companies and economic sectors.
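
    A minimal sketch contrasting isolated-turn prediction with trajectory modeling (dimensions, label count and layers below are illustrative, not the paper's models): the trajectory model lets earlier turns in the conversation influence the emotion predicted for later turns.

    import torch
    import torch.nn as nn

    n_emotions, turn_dim = 6, 768            # illustrative label set size and turn-encoder dimension
    turn_embs = torch.randn(1, 5, turn_dim)  # one dialogue of 5 turns, already encoded

    # Isolated-turn baseline: each turn is classified independently.
    isolated_logits = nn.Linear(turn_dim, n_emotions)(turn_embs)

    # Trajectory model: a recurrent layer carries context from preceding turns.
    gru = nn.GRU(turn_dim, 256, batch_first=True)
    hidden, _ = gru(turn_embs)
    trajectory_logits = nn.Linear(256, n_emotions)(hidden)   # shape (1, 5, n_emotions)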

    UGent-T2K at the 2nd DialDoc shared task: a retrieval-focused dialog system grounded in multiple documents

    This work presents the contribution from the Text-to-Knowledge team of Ghent University (UGent-T2K) to the MultiDoc2Dial shared task on modeling dialogs grounded in multiple documents. We propose a pipeline system comprising (1) document retrieval, (2) passage retrieval, and (3) response generation. For (1) and (2), we combine multiple ranking models and add a final LambdaMART reranker; for (3), we adopt a Fusion-in-Decoder (FiD) model. We thus significantly boost the baseline system’s performance (over +10 points for both F1 and SacreBLEU). Further, error analysis reveals two major failure cases, to be addressed in future work: (i) in case of a topic shift within the dialog, retrieval often fails to select the correct grounding document(s), and (ii) generation sometimes fails to use the correctly retrieved grounding passage. Our code is publicly released.
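
    An illustrative three-stage pipeline mirroring the structure described above (document retrieval, passage retrieval, response generation); the TF-IDF retrievers, toy documents and the stubbed generation step are simplifications, not the system's actual rankers, reranker or FiD generator.

    from sklearn.feature_extraction.text import TfidfVectorizer
    from sklearn.metrics.pairwise import cosine_similarity

    documents = {
        "faq_billing": ["You can pay invoices online.", "Refunds take 5 business days."],
        "faq_shipping": ["Orders ship within 24 hours.", "Tracking numbers are emailed."],
    }

    def retrieve(query, texts, top_k=1):
        # Rank candidate texts against the query by TF-IDF cosine similarity.
        vec = TfidfVectorizer().fit(texts + [query])
        sims = cosine_similarity(vec.transform([query]), vec.transform(texts))[0]
        return [texts[i] for i in sims.argsort()[::-1][:top_k]]

    dialog_query = "When will I get my money back?"

    # (1) Document retrieval: pick the most relevant document for the current dialog turn.
    doc_texts = {name: " ".join(passages) for name, passages in documents.items()}
    best_doc = retrieve(dialog_query, list(doc_texts.values()))[0]
    best_doc_name = next(n for n, t in doc_texts.items() if t == best_doc)

    # (2) Passage retrieval: rank the passages within the selected document.
    grounding = retrieve(dialog_query, documents[best_doc_name])[0]

    # (3) Response generation: a seq2seq model (FiD in the actual system) would condition on the
    # dialog history plus the retrieved grounding passage; here we only print the grounding.
    print(f"Grounding passage from {best_doc_name}: {grounding}")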

    EduQG: a multi-format multiple-choice dataset for the educational domain

    Natural language processing technology has made significant progress in recent years, fuelled by increasingly powerful general language models. This has also inspired a sizeable body of work targeted specifically at the educational domain, where the creation of questions (both for assessment and practice) is a laborious and expensive effort. Thus, automatic Question Generation (QG) solutions have been proposed and studied. Yet, according to a recent survey of the educational QG community's progress, a common baseline dataset unifying multiple domains and question forms (e.g., multiple choice vs. fill-the-gap), including readily available baseline models to compare against, is largely missing. This is the gap we aim to fill with this paper. In particular, we introduce a high-quality dataset in the educational domain, containing over 3,000 entries, comprising (i) multiple-choice questions, (ii) the corresponding answers (including distractors), and (iii) associated passages from the course material used as sources for the questions. Each question is phrased in two forms, normal and cloze (i.e., fill-the-gap), and correct answers are linked to source documents with sentence-level annotations. Thus, our versatile dataset can be used for both question and distractor generation, as well as to explore new challenges such as question format conversion. Furthermore, 903 questions are accompanied by their cognitive complexity level as per Bloom's taxonomy. All questions have been generated by educational experts rather than crowd workers, to ensure they maintain educational and learning standards. Our analysis and experiments suggest distinguishable differences between our dataset and commonly used ones for question generation for educational purposes. We believe this new dataset can serve as a valuable resource for research and evaluation in the educational domain. The dataset and baselines are made available to support further research in question generation for education (https://github.com/hadifar/question-generation).
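
    A hypothetical sketch of how one record in such a dataset could be represented and how a multiple-choice item is rendered from it; the field names and values below are illustrative assumptions, not the actual EduQG schema (see the linked repository for the real format).

    example = {
        "passage": "Mitochondria convert nutrients into ATP, the cell's main energy currency.",
        "question_normal": "Which organelle produces most of the cell's ATP?",
        "question_cloze": "Most of the cell's ATP is produced by the ____.",
        "answer": "mitochondria",
        "distractors": ["ribosomes", "the nucleus", "the Golgi apparatus"],
        "answer_sentence_idx": 0,        # sentence-level link from the answer to the source passage
        "bloom_level": "Remember",       # cognitive complexity, available for a subset of questions
    }

    def to_multiple_choice(record):
        # Render one record as a multiple-choice item (options kept in a fixed order here).
        options = [record["answer"]] + record["distractors"]
        lines = [record["question_normal"]]
        lines += [f"  {chr(65 + i)}. {opt}" for i, opt in enumerate(options)]
        return "\n".join(lines)

    print(to_multiple_choice(example))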