6 research outputs found

    Finding BERT’s Idiomatic Key

    Sentence embeddings encode information relating to the usage of idioms in a sentence. This paper reports a set of experiments that combine a probing methodology with input masking to analyse where in a sentence this idiomatic information is taken from, and what form it takes. Our results indicate that BERT’s idiomatic key is primarily found within an idiomatic expression, but also draws on information from the surrounding context. BERT can also distinguish between the disruption in a sentence caused by missing words and the incongruity caused by idiomatic usage.

    Shapley Idioms: Analysing BERT Sentence Embeddings for General Idiom Token Identification

    This article examines the basis of Natural Language Understanding of transformer-based language models, such as BERT. It does this through a case study on idiom token classification. We use idiom token identification as a basis for our analysis because of the variety of information types that have previously been explored in the literature for this task, including topic, lexical, and syntactic features. This variety of relevant information types means that the task of idiom token identification enables us to explore the forms of linguistic information that a BERT language model captures and encodes in its representations. The core of this article presents three experiments. The first experiment analyzes the effectiveness of BERT sentence embeddings for creating a general idiom token identification model, and the results indicate that the BERT sentence embeddings outperform Skip-Thought. In the second and third experiments we use the game theory concept of Shapley Values to rank the usefulness of individual idiomatic expressions for model training, and use this ranking to analyse the type of information that the model finds useful. We find that a combination of idiom-intrinsic and topic-based properties contributes to an expression’s usefulness in idiom token identification. Overall, our results indicate that BERT efficiently encodes a variety of information, ranging from topic to lexical and syntactic information. Based on these results we argue that, notwithstanding recent criticisms of language model based semantics, the ability of BERT to efficiently encode a variety of linguistic information types does represent a significant step forward in natural language understanding.
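
The Shapley-value ranking mentioned in this abstract can be illustrated with a minimal sketch. It computes exact Shapley values by averaging each player's marginal contribution over every ordering, which is only feasible for a handful of players. The expression names and the `toy_value` function are hypothetical stand-ins for "model performance when trained on this subset of expressions"; they are not taken from the article, and the authors' actual procedure may differ (for example, it may use an approximation).

```python
from itertools import permutations
from math import factorial


def shapley_values(players, value):
    """Exact Shapley values: average marginal contribution over all orderings."""
    totals = {p: 0.0 for p in players}
    for order in permutations(players):
        coalition = frozenset()
        for p in order:
            with_p = coalition | {p}
            # Marginal gain from adding p to the players that came before it.
            totals[p] += value(with_p) - value(coalition)
            coalition = with_p
    n_orders = factorial(len(players))
    return {p: total / n_orders for p, total in totals.items()}


# Hypothetical expressions and scores, for illustration only.
BASE_SCORE = {"expr_a": 0.3, "expr_b": 0.1, "expr_c": 0.0}


def toy_value(subset):
    """Assumed value function: additive scores plus a bonus when a and b co-occur."""
    score = sum(BASE_SCORE[p] for p in subset)
    if {"expr_a", "expr_b"} <= subset:
        score += 0.2
    return score


values = shapley_values(list(BASE_SCORE), toy_value)
# Rank expressions from most to least useful under the toy value function.
ranking = sorted(values, key=values.get, reverse=True)
```

In this toy setup the interaction bonus is split evenly between the two expressions that produce it, and the values sum to the score of the full set, which is the efficiency property that makes Shapley values usable as a ranking of contributions.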

    Predicting Solar Irradiance in Singapore

    Solar irradiance is the primary input for all solar energy generation systems. The amount of available solar radiation over time under the local weather conditions helps to decide the optimal location, technology and size of a solar energy project. We study the behaviour of incident solar irradiance on the earth's surface using weather sensors. In this paper, we propose a time-series based technique to forecast solar irradiance values for short lead times of up to 15 minutes. Our experiments are conducted in the tropical region of Singapore, which receives a large amount of solar irradiance throughout the year. We benchmark our method against two common forecasting techniques, namely the persistence model and the average model, and we obtain good prediction performance. We report a root mean square error of 147 W/m^2 for a lead time of 15 minutes. Comment: Published in Proc. Progress In Electromagnetics Research Symposium (PIERS), 201
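
The two benchmark techniques named in this abstract can be sketched as follows. This is a minimal illustration under stated assumptions, not the paper's implementation: the persistence model is taken to repeat the last observed value, the average model is assumed to predict the mean of the most recent `window` observations, and the function names, the `window` parameter and the sample readings are all hypothetical.

```python
import math


def persistence_pairs(series, lead):
    """Persistence baseline: forecast for t + lead is the value observed at t."""
    return [(series[t], series[t + lead]) for t in range(len(series) - lead)]


def average_pairs(series, lead, window):
    """Average baseline (assumed form): forecast is the mean of the last
    `window` observations up to and including t."""
    pairs = []
    for t in range(window - 1, len(series) - lead):
        recent = series[t - window + 1 : t + 1]
        pairs.append((sum(recent) / window, series[t + lead]))
    return pairs


def rmse(pairs):
    """Root mean square error over (forecast, actual) pairs."""
    return math.sqrt(sum((f - a) ** 2 for f, a in pairs) / len(pairs))


# Made-up irradiance readings in W/m^2 at a fixed sampling interval.
readings = [100.0, 200.0, 300.0, 400.0, 500.0]
persistence_error = rmse(persistence_pairs(readings, lead=1))
average_error = rmse(average_pairs(readings, lead=1, window=2))
```

On a steadily rising series like this one, the average baseline lags further behind than persistence, so its error is larger; a forecasting method is typically judged by how much it improves on both.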

    The Impact of Body Mass Index on Functional Rehabilitation Outcomes of Working-age Inpatients with Stroke

    BACKGROUND: Stroke is the most relevant cause of acquired persistent disability in adulthood. The relationship between a patient’s weight during rehabilitation and functional outcome after stroke is controversial: previous research has reported positive, negative and no effects, and few studies specifically address working-age patients. AIM: To evaluate the association between Body Mass Index (BMI) and the functional progress of adult (<65 years) patients with stroke admitted to a rehabilitation hospital. DESIGN: Retrospective observational cohort study. SETTING: Inpatient rehabilitation center. POPULATION: 178 stroke patients (ischemic or hemorrhagic). METHODS: Point-biserial and Spearman’s correlations, multivariate linear regressions and analysis of covariance were used to describe differences in functional outcomes after adjusting for age, sex, severity, dysphagia, depression and BMI category. Functional Independence Measure (FIM), FIM gain, efficiency and effectiveness were assessed. RESULTS: Participants were separated into 3 BMI categories: normal weight (47%), overweight (33%) and obese (20%). There were no significant differences between BMI categories in any functional outcome (total FIM (TFIM), cognitive (CFIM), motor (MFIM)) at discharge, admission, gain, efficiency or effectiveness. In regression models, BMI (as a continuous variable) was not a significant predictor of TFIM at discharge after adjusting for age, sex, severity, dysphagia, depression and ataxia (R² = 0.4813); the significant predictors were TFIM at admission (β = 0.528) and NIHSS (β = -0.208). MFIM efficiency did not significantly differ by BMI subgroup, and neither did CFIM efficiency. Length of stay (LOS) and TFIM effectiveness were associated for normal weight (r = 0.33) and overweight (r = 0.43), but not for obese. LOS and TFIM efficiency were strongly negatively associated only for obese (r = -0.50). CONCLUSIONS: FIM outcomes were not associated with BMI; nevertheless, each BMI category, when considered individually (normal weight, overweight or obese), was characterized by different associations involving FIM outcomes and clinical factors. CLINICAL REHABILITATION IMPACT: In sub-acute post-stroke working-age patients undergoing rehabilitation, BMI was not associated with FIM outcomes (no obesity paradox was reported in this sample). Distinctive significant associations emerged within each BMI category (supporting their characterization): length of stay and TFIM effectiveness were associated for normal weight and overweight, but not for obese, and length of stay and TFIM efficiency were strongly negatively associated only for obese.

    Using MT for multilingual covid-19 case load prediction from social media texts

    In the context of an epidemiological study involving multilingual social media, this paper reports on the ability of machine translation systems to preserve content relevant for a document classification task designed to determine whether a social media text is related to covid-19. The results indicate that machine translation does provide a feasible basis for scaling epidemiological social media surveillance to multiple languages. Moreover, a qualitative error analysis revealed that the majority of classification errors are not caused by MT errors.