
    Comparison between parameter-efficient techniques and full fine-tuning: A case study on multilingual news article classification

    Adapters and Low-Rank Adaptation (LoRA) are parameter-efficient fine-tuning techniques designed to make the training of language models more efficient. Previous results have shown that these methods can even improve performance on some classification tasks. This paper complements the existing research by investigating how these techniques affect classification performance and computation costs compared with full fine-tuning when applied to multilingual text classification tasks (genre, framing, and persuasion technique detection, which differ in input length, number of predicted classes, and classification difficulty), some of which have limited training data. In addition, we conduct in-depth analyses of their efficacy across different training scenarios (training on the original multilingual data, on translations into English, and on a subset of English-only data) and across different languages. Our findings provide valuable insights into the applicability of parameter-efficient fine-tuning techniques, particularly to complex multilingual and multilabel classification tasks.
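The abstract names LoRA without describing how it saves parameters. As a hedged illustration, the sketch below follows the standard LoRA formulation: a frozen weight matrix W plus a trainable low-rank update B A scaled by alpha/rank. The layer size (768 x 768), rank 8 and alpha 16 are hypothetical values chosen for illustration, not settings taken from the paper.

```python
import numpy as np


def lora_param_counts(d_in: int, d_out: int, rank: int) -> tuple:
    """Return (full fine-tuning params, LoRA trainable params) for one weight matrix."""
    full = d_in * d_out  # every entry of W is updated (bias ignored)
    lora = rank * (d_in + d_out)  # A is (rank x d_in), B is (d_out x rank)
    return full, lora


def lora_forward(x, W, A, B, alpha: float, rank: int):
    """Compute W x + (alpha / rank) * B (A x); W stays frozen, only A and B are trained."""
    return W @ x + (alpha / rank) * (B @ (A @ x))


# Hypothetical layer size and rank, chosen only for illustration.
d_in, d_out, rank = 768, 768, 8
full, lora = lora_param_counts(d_in, d_out, rank)
print(full, lora, f"{lora / full:.2%}")  # 589824 12288 2.08%

rng = np.random.default_rng(0)
W = rng.standard_normal((d_out, d_in))
A = rng.standard_normal((rank, d_in)) * 0.01
B = np.zeros((d_out, rank))  # zero-initialised B: the adapted layer starts identical to W
x = rng.standard_normal(d_in)
y = lora_forward(x, W, A, B, alpha=16.0, rank=rank)
```

With B initialised to zero, the adapted layer initially reproduces the frozen layer exactly, and training only ever touches the roughly 2% of parameters held in A and B for this hypothetical layer.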

    An untrained deep learning method for reconstructing dynamic magnetic resonance images from accelerated model-based data

    The purpose of this work is to implement physics-based regularization as a stopping condition when tuning an untrained deep neural network to reconstruct MR images from accelerated data. The ConvDecoder neural network was trained with a physics-based regularization term incorporating the spoiled gradient echo equation that describes variable-flip-angle (VFA) data. Fully sampled VFA k-space data were retrospectively accelerated by factors of R = {8, 12, 18, 36} and reconstructed with ConvDecoder (CD), ConvDecoder with the proposed regularization (CD+r), locally low-rank (LR) reconstruction, and compressed sensing with L1-wavelet regularization (L1). Final images from CD+r training were evaluated at the argmin of the regularization loss, whereas the CD, LR, and L1 reconstructions were chosen optimally based on ground-truth data. The performance measures used were the normalized root-mean-square error (NRMSE), the concordance correlation coefficient (CCC), and the structural similarity index (SSIM). The CD+r reconstructions, chosen using the stopping condition, yielded SSIMs that were similar to the CD (p=0.47) and LR SSIMs (p=0.95) across R and significantly higher than the L1 SSIMs (p=0.04). The CCC values for the CD+r T1 maps across all R and subjects were greater than those for the L1 (p=0.15) and LR (p=0.13) T1 maps. For R > 12 (<4.2 minutes scan time), the L1 and LR T1 maps exhibited a loss of fine spatial detail compared with CD+r. We conclude that using an untrained neural network together with a physics-based regularization loss shows promise for determining the optimal stopping point in training without relying on fully sampled ground-truth data. Comment: 45 pages, 7 figures, 2 tables, supplementary material included (10 figures, 4 tables).
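The abstract refers to the spoiled gradient echo equation, to NRMSE and CCC, and to an argmin stopping rule without writing them out. The sketch below uses the standard textbook forms: the SPGR signal S = M0 sin(a)(1 - E1)/(1 - E1 cos(a)) with E1 = exp(-TR/T1), an RMS-normalised NRMSE (one of several conventions) and Lin's CCC. The exact CD+r regularization term is not given in the abstract, so the loss values passed to the stopping rule are hypothetical placeholders, as are the flip angles, TR and T1.

```python
import numpy as np


def spgr_signal(m0, t1, flip_deg, tr):
    """Spoiled gradient echo signal S = M0 sin(a) (1 - E1) / (1 - E1 cos(a)), E1 = exp(-TR/T1)."""
    alpha = np.deg2rad(flip_deg)
    e1 = np.exp(-tr / t1)
    return m0 * np.sin(alpha) * (1 - e1) / (1 - e1 * np.cos(alpha))


def nrmse(recon, ref):
    """RMSE normalised by the RMS of the reference (one of several common conventions)."""
    recon, ref = np.asarray(recon, float), np.asarray(ref, float)
    return np.sqrt(np.mean((recon - ref) ** 2)) / np.sqrt(np.mean(ref ** 2))


def ccc(x, y):
    """Lin's concordance correlation coefficient between two maps (flattened)."""
    x, y = np.ravel(x).astype(float), np.ravel(y).astype(float)
    cov = np.mean((x - x.mean()) * (y - y.mean()))
    return 2 * cov / (x.var() + y.var() + (x.mean() - y.mean()) ** 2)


def stopping_iteration(reg_losses):
    """Iteration index at the argmin of the regularization loss; needs no ground truth."""
    return int(np.argmin(reg_losses))


# Hypothetical acquisition: flip angles in degrees, TR and T1 in ms.
flips = np.array([2.0, 5.0, 10.0, 15.0])
signal = spgr_signal(m0=1.0, t1=1000.0, flip_deg=flips, tr=5.0)
print(stopping_iteration([0.9, 0.5, 0.3, 0.4, 0.6]))  # 2
```

The key design point mirrored here is that `stopping_iteration` only inspects the physics-based loss, whereas NRMSE and CCC both need a reference, which is why the CD, LR and L1 comparisons in the study relied on ground truth.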

    Generalisation in named entity recognition: A quantitative analysis

    Named Entity Recognition (NER) is a key NLP task, which is all the more challenging on Web and user-generated content with its diverse and continuously changing language. This paper aims to quantify how this diversity affects state-of-the-art NER methods by measuring named entity (NE) and context variability, feature sparsity, and their effects on precision and recall. In particular, our findings indicate that NER approaches struggle to generalise in diverse genres with limited training data. Unseen NEs play a particularly important role: they occur more often in diverse genres such as social media than in more regular genres such as newswire. Coupled with a generally higher incidence of unseen features and the lack of large training corpora, this leads to significantly lower F1 scores for diverse genres than for more regular ones. We also find that leading systems rely heavily on surface forms found in training data and have problems generalising beyond these, and we offer explanations for this observation.
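The two quantities at the centre of this analysis, the proportion of unseen NEs and the precision, recall and F1 they affect, can be made concrete with a small sketch. It assumes exact surface-form matching to decide whether a test NE is "unseen" and exact span-and-type matching for scoring (the CoNLL-style convention); the paper's precise definitions may differ, and the toy data are invented.

```python
def unseen_ratio(train_mentions, test_mentions):
    """Fraction of test NE mentions whose exact surface form never occurs in training."""
    seen = {surface for surface, _type in train_mentions}
    if not test_mentions:
        return 0.0
    unseen = sum(1 for surface, _type in test_mentions if surface not in seen)
    return unseen / len(test_mentions)


def precision_recall_f1(gold, predicted):
    """Exact-match scores over sets of (sentence_id, start, end, type) entity spans."""
    tp = len(gold & predicted)
    precision = tp / len(predicted) if predicted else 0.0
    recall = tp / len(gold) if gold else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1


# Hypothetical toy data for illustration only.
train = [("Obama", "PER"), ("London", "LOC")]
test = [("Obama", "PER"), ("Glasgow", "LOC"), ("Twitter", "ORG"), ("London", "LOC")]
print(unseen_ratio(train, test))  # 0.5
gold = {(0, 0, 1, "PER"), (0, 3, 4, "LOC"), (1, 2, 3, "ORG")}
pred = {(0, 0, 1, "PER"), (0, 3, 4, "ORG"), (1, 2, 3, "ORG")}
print(precision_recall_f1(gold, pred))
```

Under exact span-and-type matching, a correct span with the wrong type (here `LOC` predicted as `ORG`) counts as both a false positive and a false negative, lowering precision and recall together.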

    Phenotypic alterations in type II alveolar epithelial cells in CD4+ T cell mediated lung inflammation

    Background: Although the contribution of alveolar type II epithelial cell (AEC II) activities to various aspects of respiratory immune regulation has become increasingly appreciated, the contribution of the AEC II transcriptosome to immunopathologic lung injury remains poorly understood. We have previously established a mouse model of chronic T cell-mediated pulmonary inflammation in which influenza hemagglutinin (HA) is expressed as a transgene in AEC II, in mice expressing a transgenic T cell receptor specific for a class II-restricted epitope of HA. Pulmonary inflammation in these mice results from CD4+ T cell recognition of alveolar antigen. This model was used to assess the profile of inflammatory mediators expressed by alveolar epithelial target cells triggered by antigen-specific recognition in CD4+ T cell-mediated lung inflammation. Methods: We established a method that allows the flow cytometric negative selection and isolation of primary AEC II of high viability and purity. Genome-wide transcriptional profiling was performed on mRNA from AEC II isolated from healthy mice and from mice with acute and chronic CD4+ T cell-mediated pulmonary inflammation. Results: T cell-mediated inflammation was associated with expression of a broad array of cytokine and chemokine genes by AEC II, indicating a potential contribution of epithelial-derived chemoattractants to parenchymal infiltration by inflammatory cells. Morphologically, activated epithelial cells increased in size; at the molecular level, comparative transcriptome analyses of AEC II from inflamed versus normal lungs provide a detailed characterization of the specific inflammatory genes induced in AEC II in the context of CD4+ T cell-mediated pneumonitis. Conclusion: The panoply of inflammatory genes expressed by this cell population suggests an important contribution of AEC II gene expression to the orchestration and regulation of interstitial pneumonitis, and this may provide insight into the molecular pathogenesis of pulmonary inflammatory states. CD4+ T cell recognition of antigen presented by AEC II appears to be a potent trigger for activation of the alveolar cell inflammatory transcriptosome.

    Fatty Acid Biomarkers of Dairy Fat Consumption and Incidence of Type 2 Diabetes: A Pooled Analysis of Prospective Cohort Studies

    Background We aimed to investigate the prospective associations of circulating or adipose tissue odd-chain fatty acids 15:0 and 17:0 and trans-palmitoleic acid (t16:1n-7), as potential biomarkers of dairy fat intake, with incident type 2 diabetes (T2D). Methods and findings Sixteen prospective cohorts from 12 countries (7 from the United States, 7 from Europe, 1 from Australia, 1 from Taiwan) performed new, harmonised, individual-level analyses of the prospective associations according to a standardised plan. In total, 63,682 participants with a broad range of baseline ages and BMIs and 15,180 incident cases of T2D over an average of 9 years of follow-up were evaluated. Study-specific results were pooled using inverse-variance-weighted meta-analysis. Prespecified interactions by age, sex, BMI, and race/ethnicity were explored in each cohort and meta-analysed. Potential heterogeneity by cohort-specific characteristics (regions, lipid compartments used for fatty acid assays) was assessed with metaregression. After adjustment for potential confounders, including measures of adiposity (BMI, waist circumference) and lipogenesis (levels of palmitate, triglycerides), higher levels of 15:0, 17:0, and t16:1n-7 were associated with lower incidence of T2D. In the most adjusted model, the hazard ratio (95% CI) for incident T2D per cohort-specific 10th to 90th percentile range was 0.80 (0.73-0.87) for 15:0, 0.65 (0.59-0.72) for 17:0, 0.82 (0.70-0.96) for t16:1n-7, and 0.71 (0.63-0.79) for their sum. In exploratory analyses, similar associations for 15:0, 17:0, and the sum of all three fatty acids were present in both sexes but were stronger in women than in men (p for interaction < 0.001). Although studying associations with biomarkers has several advantages, it also has limitations: the biomarkers do not distinguish between different food sources of dairy fat (e.g., cheese, yogurt, milk), and residual confounding by unmeasured or imprecisely measured confounders may exist.
Conclusions In a large meta-analysis that pooled the findings from 16 prospective cohort studies, higher levels of 15:0, 17:0, and t16:1n-7 were associated with a lower risk of T2D.
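The pooling step, inverse-variance-weighted meta-analysis, can be sketched as below. This is a fixed-effect version that works on log hazard ratios and recovers each study's standard error from its 95% CI. The abstract does not state whether fixed- or random-effects weights were used, and the three study-level estimates are invented for illustration, not taken from the paper.

```python
import math

Z95 = 1.959964  # two-sided 95% normal quantile


def pool_hazard_ratios(studies):
    """Fixed-effect inverse-variance pooling of (HR, lower 95% CI, upper 95% CI) tuples."""
    num = den = 0.0
    for hr, lo, hi in studies:
        se = (math.log(hi) - math.log(lo)) / (2 * Z95)  # SE of log HR recovered from the CI
        w = 1.0 / se ** 2  # inverse-variance weight: more precise studies count more
        num += w * math.log(hr)
        den += w
    log_hr = num / den
    se_pooled = math.sqrt(1.0 / den)
    return (math.exp(log_hr),
            math.exp(log_hr - Z95 * se_pooled),
            math.exp(log_hr + Z95 * se_pooled))


# Hypothetical study-level estimates, not taken from the paper.
studies = [(0.85, 0.70, 1.03), (0.78, 0.65, 0.94), (0.90, 0.72, 1.13)]
hr, lo, hi = pool_hazard_ratios(studies)
print(f"pooled HR {hr:.2f} (95% CI {lo:.2f}-{hi:.2f})")
```

Pooling on the log scale is the usual choice because ratio measures are approximately normally distributed there, so a symmetric CI on log HR maps back to the asymmetric CIs quoted in the abstract.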

    Omega-6 Fatty Acid Biomarkers and Incident Type 2 Diabetes: Pooled Analysis of Individual-Level Data for 39 740 Adults from 20 Prospective Cohort Studies

    Background: The metabolic effects of omega-6 polyunsaturated fatty acids (PUFAs) remain contentious, and little evidence is available regarding their potential role in primary prevention of type 2 diabetes. We aimed to assess the associations of linoleic acid and arachidonic acid biomarkers with incident type 2 diabetes. Methods: We did a pooled analysis of new, harmonised, individual-level analyses for the biomarkers linoleic acid and its metabolite arachidonic acid and incident type 2 diabetes. We analysed data from 20 prospective cohort studies from ten countries (Iceland, the Netherlands, the USA, Taiwan, the UK, Germany, Finland, Australia, Sweden, and France), with biomarkers sampled between 1970 and 2010. Participants included in the analyses were aged 18 years or older and had data available for linoleic acid and arachidonic acid biomarkers at baseline. We excluded participants with type 2 diabetes at baseline. The main outcome was the association between omega-6 PUFA biomarkers and incident type 2 diabetes. We assessed the relative risk of type 2 diabetes prospectively for each cohort and lipid compartment separately using a prespecified analytic plan for exposures, covariates, effect modifiers, and analysis, and the findings were then pooled using inverse-variance weighted meta-analysis. Findings: Participants were 39 740 adults, aged (range of cohort means) 49-76 years with a BMI (range of cohort means) of 23·3-28·4 kg/m², who did not have type 2 diabetes at baseline. During a follow-up of 366 073 person-years, we identified 4347 cases of incident type 2 diabetes. In multivariable-adjusted pooled analyses, higher proportions of linoleic acid biomarkers as percentages of total fatty acid were associated with a lower risk of type 2 diabetes overall (risk ratio [RR] per interquintile range 0·65, 95% CI 0·60-0·72,
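As a quick sanity check on the reported figures, the crude event rate and the implied mean follow-up can be computed directly from the numbers above. This is a crude, pooled calculation that ignores between-cohort differences; it is not a figure reported by the authors.

```python
def crude_incidence_per_1000(cases: int, person_years: float) -> float:
    """Crude incidence rate per 1000 person-years (ignores cohort differences)."""
    return 1000.0 * cases / person_years


def mean_follow_up_years(person_years: float, participants: int) -> float:
    """Average follow-up per participant implied by total person-years."""
    return person_years / participants


# Figures reported in the abstract.
rate = crude_incidence_per_1000(4347, 366_073)
follow_up = mean_follow_up_years(366_073, 39_740)
print(f"{rate:.2f} cases per 1000 person-years; {follow_up:.2f} years mean follow-up")
# 11.87 cases per 1000 person-years; 9.21 years mean follow-up
```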

    The genome of the emerging barley pathogen Ramularia collo-cygni

    Background Ramularia collo-cygni is a newly important foliar fungal pathogen of barley that causes the disease Ramularia leaf spot. The fungus exhibits a prolonged endophytic growth stage before switching life habit to become an aggressive, necrotrophic pathogen that causes significant losses to green leaf area and hence grain yield and quality. Results The R. collo-cygni genome was sequenced using a combination of Illumina and Roche 454 technologies. The draft assembly of 30.3 Mb contained 11,617 predicted gene models. Our phylogenomic analysis confirmed the classification of this ascomycete fungus within the family Mycosphaerellaceae, order Capnodiales of the class Dothideomycetes. A predicted secretome comprising 1053 proteins included redox-related enzymes, carbohydrate-modifying enzymes, and proteases. The relative paucity of plant cell wall-degrading enzyme genes may be associated with the stealth pathogenesis characteristic of plant pathogens from the Mycosphaerellaceae. A large number of genes associated with secondary metabolite production, including homologs of toxin biosynthesis genes found in other Dothideomycete plant pathogens, were identified. Conclusions The genome sequence of R. collo-cygni provides a framework for understanding the genetic basis of pathogenesis in this important emerging pathogen. The reduced complement of carbohydrate-degrading enzyme genes is likely to reflect a strategy to avoid detection by host defences during its prolonged asymptomatic growth. Of particular interest will be the analysis of R. collo-cygni gene expression during interactions with the host barley, to understand what triggers this fungus to switch from being a benign endophyte to an aggressive necrotroph.

    COVIDiSTRESS Global Survey dataset on psychological and behavioural consequences of the COVID-19 outbreak

    This N = 173,426 social science dataset was collected between 30 March and 30 May 2020 through the COVIDiSTRESS Global Survey, a collaborative open science effort to improve understanding of human experiences of the COVID-19 pandemic. The dataset allows a cross-cultural study of psychological and behavioural responses to the Coronavirus pandemic and to associated government measures, such as the cancellation of public functions and the stay-at-home orders implemented in many countries. The dataset contains demographic background variables as well as measures of the Asian Disease Problem, perceived stress (PSS-10), availability of social provisions (SPS-10), trust in various authorities, trust in governmental measures to contain the virus (OECD trust), personality traits (BFF-15), information behaviours, agreement with the level of government intervention, and compliance with preventive measures, along with a rich pool of exploratory variables and written experiences. A global consortium from 39 countries and regions worked together to build and translate a survey with variables of shared interest, and recruited participants in 47 languages and dialects. Raw and cleaned data and dynamic visualizations are available. Measurement(s): psychological measurement · anxiety-related behavior trait · Stress · response to · Isolation · loneliness measurement · Emotional Distress. Technology type(s): Survey. Factor type(s): geographic location · language · age of participant · responses to the Coronavirus pandemic. Sample characteristic - organism: Homo sapiens. Sample characteristic - location: global. Peer reviewed.