
    Textual Data Augmentation for Efficient Active Learning on Tiny Datasets

    In this paper we propose a novel data augmentation approach in which guided outputs of a language generation model, e.g. GPT-2, once labeled, can improve the performance of text classifiers through an active learning process. We transform the data generation task into an optimization problem that maximizes the usefulness of the generated output, using Monte Carlo Tree Search (MCTS) as the optimization strategy and incorporating entropy as one of the optimization criteria. We test our approach against a Non-Guided Data Generation (NGDG) process that does not optimize for a reward function. Starting with a small set of data, our results show performance increases with MCTS of 26% on the TREC-6 Questions dataset and 10% on the Stanford Sentiment Treebank (SST-2) dataset. Compared with NGDG, we achieve further increases of 3% and 5% on TREC-6 and SST-2, respectively.
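The entropy criterion mentioned in the abstract can be sketched in isolation. The snippet below is a minimal illustration, not the paper's implementation: `classify` is a hypothetical stand-in for the trained classifier, and generated candidates are ranked by predictive entropy so that the most uncertain (and hence most informative to label) ones come first.

```python
import math

def predictive_entropy(probs):
    """Shannon entropy of a classifier's predicted class distribution."""
    return -sum(p * math.log(p) for p in probs if p > 0.0)

def rank_candidates(candidates, classify):
    """Order generated texts by predictive entropy (descending):
    high-entropy samples are the ones the classifier is least sure
    about, so labeling them should be most informative.
    `classify` maps a text to a list of class probabilities."""
    scored = [(predictive_entropy(classify(c)), c) for c in candidates]
    scored.sort(key=lambda t: t[0], reverse=True)
    return [c for _, c in scored]
```

In the paper's setting this entropy would be one component of the MCTS reward; here it is shown alone as a ranking criterion.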

    Variation in the timing of Covid-19 communication across universities in the UK.

    During the Covid-19 pandemic, universities in the UK used social media to raise awareness and provide guidance and advice about the disease to students and staff. We explain why some universities used social media to communicate with stakeholders sooner than others. To do so, we identified the date of the first Covid-19-related tweet posted by each university in the country and used survival models to estimate the effect of university-specific characteristics on the timing of these messages. To confirm our results, we supplemented our analysis with a study of the introduction of coronavirus-related university webpages. We find that universities with large numbers of students are more likely to use social media and the web to speak about the pandemic sooner than institutions with fewer students. Universities with large financial resources are also more likely to tweet sooner, but they do not introduce Covid-19 webpages faster than other universities. We also find evidence of a strong process of emulation, whereby universities are more likely to post a coronavirus-related tweet or webpage if other universities have already done so.
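The abstract does not specify which survival model was fitted; as a minimal illustration of the survival-analysis idea, the sketch below implements a plain Kaplan-Meier estimator over hypothetical "days until first Covid-19 tweet" data, with censoring for institutions that had not tweeted by the end of observation. This is a generic sketch, not the study's model.

```python
def kaplan_meier(times, events):
    """Kaplan-Meier survival curve.
    times  : observed durations (e.g. days until first Covid-19 tweet)
    events : 1 if the event was observed, 0 if censored
    Returns a list of (time, survival probability) at each event time."""
    data = sorted(zip(times, events))
    surv, curve = 1.0, []
    i = 0
    while i < len(data):
        t = data[i][0]
        # events and number still at risk at this time point
        d = sum(1 for tt, e in data if tt == t and e == 1)
        n = sum(1 for tt, _ in data if tt >= t)
        if d > 0:
            surv *= 1.0 - d / n
            curve.append((t, surv))
        # advance past all observations sharing this time
        while i < len(data) and data[i][0] == t:
            i += 1
    return curve
```

Estimating covariate effects on timing (as the paper does) would require a regression survival model on top of this basic machinery.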

    Text generation for small data regimes

    In Natural Language Processing (NLP), models trained on downstream text classification tasks usually require enormous amounts of data to perform well. Neural Network (NN) models are among those whose results keep improving with more data, and a major factor in improving results is the ability to scale over large datasets. Given that deep NNs are known to be data-hungry, having more training samples is always beneficial: for a classification model to perform well, it can require thousands or even millions of textual training examples. Transfer learning enables us to leverage knowledge gained from general data collections to perform well on target tasks. In NLP, training language models on large data collections has been shown to achieve great results when tuned to different task-specific datasets (Wang et al., 2019, 2018a). However, even with transfer learning, adequate training data remains a precondition for training machine learning models. Nonetheless, we show that small textual datasets can be augmented to a degree sufficient to achieve improved classification performance. In this thesis, we make multiple contributions to data augmentation. Firstly, we transform the data generation task into an optimization problem that maximizes the usefulness of the generated output, using Monte Carlo Tree Search (MCTS) as the optimization strategy and incorporating entropy as one of the optimization criteria. Secondly, we propose a language generation approach for targeted data generation with the participation of the training classifier. With a user in the loop, we find that manual annotation of a small proportion of the generated data is enough to boost classification performance. Thirdly, under a self-learning scheme, we replace the user with an automated approach in which the classifier is trained on its own pseudo-labels. Finally, we extend the data generation approach to the knowledge distillation domain by generating samples that a teacher model can confidently label, but its student cannot.
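The self-learning scheme in the third contribution can be sketched as a confidence-filtered pseudo-labeling step. This is an illustrative reconstruction, not the thesis code: `classifier_predict` and the 0.9 threshold are assumptions.

```python
def self_train(classifier_predict, unlabeled, threshold=0.9):
    """One round of self-learning: keep generated samples whose top
    predicted class probability exceeds a confidence threshold, paired
    with that class as a pseudo-label for the next training round.
    `classifier_predict` maps a text to a list of class probabilities."""
    pseudo_labeled = []
    for text in unlabeled:
        probs = classifier_predict(text)
        conf = max(probs)
        if conf >= threshold:
            pseudo_labeled.append((text, probs.index(conf)))
    return pseudo_labeled
```

In practice this step would alternate with retraining the classifier on the accepted pseudo-labels, replacing the human annotator of the second contribution.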

    Enhancing Task-Specific Distillation in Small Data Regimes through Language Generation

    Large-scale pretrained language models have led to significant improvements in Natural Language Processing. Unfortunately, they come at the cost of high computational and storage requirements that complicate their deployment on low-resource devices. This issue can be addressed by distilling knowledge from larger models to smaller ones through pseudo-labels on task-specific datasets. However, this can be difficult for tasks with very limited data. To overcome this challenge, we present a novel approach in which knowledge is distilled from a teacher model to a student model through the generation of synthetic data. To do so, we first fine-tune the teacher and student models, as well as a Natural Language Generation (NLG) model, on the target task dataset. We then let student and teacher work together to condition the NLG model to generate examples that can enhance the performance of the student. We test our approach with two data generation methods: a) targeted generation using the Monte Carlo Tree Search (MCTS) algorithm, and b) a Non-Targeted Text Generation (NTTG) method. We evaluate the effectiveness of our approaches against a baseline that uses the BERT model for data augmentation through random word replacement. Testing on the SST-2, MRPC, YELP-2, DBpedia, and TREC-6 datasets, we consistently observe considerable improvements over the word-replacement baseline.
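One way to read "examples that can enhance the performance of the student" is to reward generations on which the teacher is confident but the student is not, i.e. the teacher-student confidence gap on the teacher's predicted class. The abstract does not give the exact reward, so the function below is a hedged sketch of that reading, not the paper's formula.

```python
def distillation_reward(teacher_probs, student_probs):
    """Confidence-gap reward for a generated sample: how much more
    the teacher believes its own predicted class than the student
    does. High values mark samples the teacher can label confidently
    but the student has not yet learned. Assumed form only."""
    top_class = max(range(len(teacher_probs)), key=lambda i: teacher_probs[i])
    return teacher_probs[top_class] - student_probs[top_class]
```

Under this reading, MCTS (or NTTG with filtering) would steer the NLG model toward samples with high reward, which are then pseudo-labeled by the teacher.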

    Prediction of early weight gain during psychotropic treatment using a combinatorial model with clinical and genetic markers.

    Psychotropic drugs can induce significant (>5%) weight gain (WG) as early as 1 month into treatment, which is a good predictor of major WG at 3 and 12 months. The large interindividual variability of drug-induced WG can be explained in part by genetic and clinical factors. The aim of this study was to determine whether extensive analysis of genes, in addition to clinical factors, can improve prediction of patients at risk for more than 5% WG at 1 month of treatment. Data were obtained from a 1-year naturalistic longitudinal study, with weight monitoring during weight-inducing psychotropic treatment. A total of 248 Caucasian psychiatric patients, with at least baseline and 1-month weight measures and with compliance ascertained, were included. Results were tested for replication in a second cohort of 32 patients. Age and baseline BMI were significantly associated with strong WG. The area under the curve (AUC) of the final model, including genetic (18 genes) and clinical variables, was significantly greater than that of the model including clinical variables only (AUC final: 0.92, AUC clinical: 0.75, P<0.0001). Prediction accuracy increased by 17% with genetic markers (final accuracy: 87%), indicating that six patients must be genotyped to avoid one misclassified patient. The validity of the final model was confirmed in a replication cohort. Patients predicted before treatment as having more than 5% WG after 1 month of treatment had 4.4% more WG over 1 year than patients predicted to have up to 5% WG (P≤0.0001). These results may help to implement genetic testing before starting psychotropic drug treatment to identify patients at risk of important WG.
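The AUC values compared above can be computed directly from raw risk scores via the Mann-Whitney formulation, without plotting a ROC curve. The sketch below is generic and does not use the study's data.

```python
def auc(scores_pos, scores_neg):
    """Area under the ROC curve via the Mann-Whitney formulation:
    the probability that a randomly chosen positive case (here, a
    patient with >5% weight gain) scores higher than a randomly
    chosen negative case, with ties counting half."""
    wins = 0.0
    for p in scores_pos:
        for n in scores_neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(scores_pos) * len(scores_neg))
```

Comparing this statistic for the clinical-only and clinical-plus-genetic models is what the reported AUC difference (0.75 vs 0.92) summarizes.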

    Association of genetic risk scores with body mass index in Swiss psychiatric cohorts.

    OBJECTIVE: Weight gain is associated with psychiatric disorders and/or with psychotropic drug treatments. In three psychiatric cohorts under psychotropic treatment, we analyzed the association of weighted genetic risk scores (w-GRSs) with BMI by integrating BMI-related polymorphisms from candidate-gene approaches and Genome-Wide Association Studies (GWAS). MATERIALS AND METHODS: w-GRSs of 32 polymorphisms previously associated with BMI in general-population GWAS and of 20 polymorphisms associated with antipsychotic-induced weight gain were investigated in three independent psychiatric samples. RESULTS: The w-GRS of 32 polymorphisms was significantly associated with BMI in psychiatric sample 1 (n=425), and the association was replicated in another sample (n=177). Individuals at the 95th percentile (p95) of the score had 2.26 and 2.99 kg/m² higher predicted BMI compared with individuals at the 5th percentile (p5) in sample 1 and sample 3 (P=0.009 and 0.04, respectively). When combining all samples (n=750), a significant difference of 1.89 kg/m² in predicted BMI was found between p95 and p5 individuals at 12 months of treatment. Stronger associations were found among men (difference: 2.91 kg/m² of predicted BMI between p95 and p5, P=0.0002), whereas no association was found among women. The w-GRS of 20 polymorphisms was not associated with BMI. The w-GRS of 52 polymorphisms and the clinical variables (age, sex, treatment) explained 1.99% and 3.15%, respectively, of BMI variability. CONCLUSION: The present study replicated in psychiatric cohorts previously identified BMI risk variants obtained in GWAS analyses of population-based samples. Sex-specific analysis should be considered in further analyses.
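A weighted genetic risk score of the kind used here is, at its core, a weighted sum of risk-allele counts. The sketch below shows that computation; the counts and weights are illustrative, not the study's actual effect sizes.

```python
def weighted_grs(allele_counts, weights):
    """Weighted genetic risk score: sum over polymorphisms of
    (number of risk alleles carried, 0/1/2) x (per-allele effect
    size, e.g. a GWAS beta). Values here are illustrative only."""
    assert len(allele_counts) == len(weights), "one weight per SNP"
    return sum(c * w for c, w in zip(allele_counts, weights))
```

Ranking individuals by this score is what makes the p95-vs-p5 BMI comparisons in the abstract possible.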

    Impact of HSD11B1 polymorphisms on BMI and components of the metabolic syndrome in patients receiving psychotropic treatments.

    BACKGROUND: Metabolic syndrome (MetS) associated with psychiatric disorders and psychotropic treatments represents a major health issue. 11β-Hydroxysteroid dehydrogenase type 1 (11β-HSD1) is an enzyme that catalyzes tissue regeneration of active cortisol from cortisone. Elevated enzymatic activity of 11β-HSD1 may lead to the development of MetS. METHODS: We investigated the association between seven HSD11B1 gene (encoding 11β-HSD1) polymorphisms and BMI and MetS components in a psychiatric sample treated with potentially weight gain-inducing psychotropic drugs (n=478). The polymorphisms that survived Bonferroni correction were analyzed in two independent psychiatric samples (nR1=168, nR2=188) and in several large population-based samples (n1=5338; n2=123 865; n3>100 000). RESULTS: HSD11B1 rs846910-A, rs375319-A, and rs4844488-G allele carriers were found to have lower BMI, waist circumference, and diastolic blood pressure compared with the reference genotype (Pcorrected<0.05). These associations were detected exclusively in women (n=257), with more than 3.1 kg/m², 7.5 cm, and 4.2 mmHg lower BMI, waist circumference, and diastolic blood pressure, respectively, in rs846910-A, rs375319-A, and rs4844488-G allele carriers compared with noncarriers (Pcorrected<0.05). Conversely, carriers of the rs846906-T allele had significantly higher waist circumference and triglycerides and lower high-density lipoprotein cholesterol, exclusively in men (Pcorrected=0.028). The rs846906-T allele was also associated with a higher risk of MetS at 3 months of follow-up (odds ratio: 3.31, 95% confidence interval: 1.53-7.17, Pcorrected=0.014). No association was observed between HSD11B1 polymorphisms and BMI or MetS components in the population-based samples. CONCLUSIONS: Our results indicate that HSD11B1 polymorphisms may contribute to the development of MetS in psychiatric patients treated with potentially weight gain-inducing psychotropic drugs, but do not play a significant role in the general population.
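An odds ratio with its 95% confidence interval, like the one reported for the rs846906-T allele, can be computed from a 2x2 contingency table with the standard Wald method. The sketch below is generic; the table entries in the test are hypothetical, not the study's counts.

```python
import math

def odds_ratio_ci(a, b, c, d, z=1.96):
    """Odds ratio and Wald 95% CI from a 2x2 table:
    a = exposed (e.g. allele carriers) with the outcome (MetS)
    b = exposed without the outcome
    c = unexposed with the outcome
    d = unexposed without the outcome
    Assumes all cells are nonzero."""
    odds_ratio = (a * d) / (b * c)
    se_log_or = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    lower = math.exp(math.log(odds_ratio) - z * se_log_or)
    upper = math.exp(math.log(odds_ratio) + z * se_log_or)
    return odds_ratio, lower, upper
```

A confidence interval that excludes 1 (as 1.53-7.17 does) is what marks the association as significant before multiple-testing correction.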

    Association of CRTC1 polymorphisms with obesity markers in subjects from the general population with lifetime depression.

    Psychiatric disorders have been hypothesized to share common etiological pathways with obesity, suggesting related neurobiological bases. We aimed to examine whether CRTC1 polymorphisms were associated with major depressive disorder (MDD) and to test the association of these polymorphisms with obesity markers in several large case-control samples with MDD. The association between CRTC1 polymorphisms and MDD was investigated in three case-control samples with MDD (PsyCoLaus n1=3,362, Radiant n2=3,148 and NESDA/NTR n3=4,663). The effect of CRTC1 polymorphisms on obesity markers was then explored. CRTC1 polymorphisms were not associated with MDD in the three samples. CRTC1 rs6510997C>T was significantly associated with fat mass in the PsyCoLaus study. In fact, a protective effect of this polymorphism was found in MDD cases (n=1,434, β=-1.32%, 95% CI -2.07 to -0.57, p<0.001), but not in controls. In the Radiant study, CRTC1 polymorphisms were associated with BMI exclusively in individuals with MDD (n=2,138, β=-0.75 kg/m², 95% CI -1.30 to -0.21, p=0.007), while no association with BMI was found in the NESDA/NTR study. Fat mass estimated by bioimpedance, which captures adiposity more accurately, was only available in the PsyCoLaus sample. CRTC1 polymorphisms thus appear to be associated with obesity markers in individuals with MDD rather than in non-depressed individuals. Therefore, the weak association previously reported in population-based samples was driven by cases diagnosed with lifetime MDD. However, CRTC1 does not seem to be directly implicated in the development of psychiatric diseases.

    A deep CNN architecture with novel pooling layer applied to two Sudanese Arabic sentiment data sets

    Arabic sentiment analysis has become an important research field in recent years. Initially, work focused on Modern Standard Arabic (MSA), which is the most widely used form. Since then, work has been carried out on several different dialects, including Egyptian, Levantine and Moroccan. Moreover, a number of data sets have been created to support such work. However, up until now, no work has been carried out on Sudanese Arabic, a dialect with 32 million speakers. In this article, two new public data sets are introduced, the two-class Sudanese Sentiment Data set (SudSenti2) and the three-class Sudanese Sentiment Data set (SudSenti3). In the preparation phase, we establish a Sudanese stopword list. Furthermore, a convolutional neural network (CNN) architecture, Sentiment Convolutional MMA (SCM), is proposed, comprising five CNN layers together with a novel Mean Max Average (MMA) pooling layer, to extract the best features. This SCM model is applied to SudSenti2 and SudSenti3 and shown to be superior to the baseline models, with accuracies of 92.25% and 85.23% (Experiments 1 and 2). The performance of MMA is compared with Max, Avg and Min pooling and shown to be better on SudSenti2, the Saudi Sentiment Data set and the MSA Hotel Arabic Review Data set by 1.00%, 0.83% and 0.74%, respectively (Experiment 3). Next, we conduct an ablation study to determine the contribution to performance of text normalisation and the Sudanese stopword list (Experiment 4). Normalisation makes a difference of 0.43% on the two-class data set and 0.45% on the three-class data set; for the custom stoplist, the differences are 0.82% and 0.72%, respectively. Finally, the model is compared with other deep learning classifiers, including transformer-based language models for Arabic, and shown to be comparable for SudSenti2 (Experiment 5).
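The abstract does not define MMA pooling precisely; one plausible reading of "Mean Max Average" is to take the mean of the max-pooled and average-pooled values over each window. The sketch below implements that reading in plain Python as an illustration only (a real model would use framework pooling layers, and the paper's exact formulation may differ).

```python
def mma_pool(values):
    """Mean of the max and the average of one pooling window --
    one plausible reading of 'Mean Max Average' pooling."""
    mx = max(values)
    avg = sum(values) / len(values)
    return (mx + avg) / 2.0

def pool_1d(seq, size, stride, pool=mma_pool):
    """Apply a pooling function over 1-D windows of a feature
    sequence, as a CNN pooling layer would along one channel."""
    return [pool(seq[i:i + size]) for i in range(0, len(seq) - size + 1, stride)]
```

Compared with pure max pooling, this variant keeps some information about the whole window rather than only its peak, which is one motivation for mixing max and average responses.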