Polish Anglicisms in the fields of leisure, fashion and entertainment against historical background
Domain-specific loans mirror historical and cultural changes in specific areas of life and make it possible to identify the language that recipient-language users consider most prestigious at a given point in time. Foreign loans are always evidence of cultural contacts between the recipient-language speech community and other communities, and they reflect the sharing of social, economic, political and cultural phenomena among various nations. The article offers a historical overview of Polish domain-specific Anglicisms and presents and analyses over a thousand English loans of four different types in the areas of leisure, fashion and entertainment, which have been enriching Polish since the 17th century.
Cross-Lingual and Low-Resource Sentiment Analysis
Identifying sentiment in a low-resource language is essential for understanding opinions internationally and for responding to the urgent needs of locals affected by disaster incidents in different world regions. While tools and resources for recognizing sentiment in high-resource languages are plentiful, determining the most effective methods for achieving this task in a low-resource language which lacks annotated data is still an open research question. Most existing approaches for cross-lingual sentiment analysis to date have relied on high-resource machine translation systems, large amounts of parallel data, or resources only available for Indo-European languages.
This work presents methods, resources, and strategies for identifying sentiment cross-lingually in a low-resource language. We introduce a cross-lingual sentiment model which can be trained on a high-resource language and applied directly to a low-resource language. The model offers the feature of lexicalizing the training data using a bilingual dictionary, but can perform well without any translation into the target language.
Through an extensive experimental analysis, evaluated on 17 target languages, we show that the model performs well with bilingual word vectors pre-trained on an appropriate translation corpus. We compare in-genre and in-domain parallel corpora, out-of-domain parallel corpora, in-domain comparable corpora, and monolingual corpora, and show that a relatively small, in-domain parallel corpus works best as a transfer medium if it is available. We describe the conditions under which other resources and embedding generation methods are successful, and these include our strategies for leveraging in-domain comparable corpora for cross-lingual sentiment analysis.
To enhance the ability of the cross-lingual model to identify sentiment in the target language, we present new feature representations for sentiment analysis that are incorporated in the cross-lingual model: bilingual sentiment embeddings that are used to create bilingual sentiment scores, and a method for updating the sentiment embeddings during training by lexicalization of the target language. This feature configuration works best for the largest number of target languages in both untargeted and targeted cross-lingual sentiment experiments.
The cross-lingual model is studied further by evaluating the role of the source language, which has traditionally been assumed to be English. We build cross-lingual models using 15 source languages, including two non-European and non-Indo-European source languages: Arabic and Chinese. We show that language families play an important role in the performance of the model, as does the morphological complexity of the source language.
In the last part of the work, we focus on sentiment analysis towards targets. We study Arabic as a representative morphologically complex language and develop models and morphological representation features for identifying entity targets and sentiment expressed towards them in Arabic open-domain text. Finally, we adapt our cross-lingual sentiment models for the detection of sentiment towards targets. Through cross-lingual experiments on Arabic and English, we demonstrate that our findings regarding resources, features, and language also hold true for the transfer of targeted sentiment.
Attempt to understand public-health relevant social dimensions of COVID-19 outbreak in Poland
Recently, the whole of Europe, including Poland, has been significantly affected by COVID-19 and its social and economic consequences, which are already causing monthly losses of tens of billions of euros in Poland alone. Social behaviour has a fundamental impact on the dynamics of the spread of infectious diseases such as SARS-CoV-2, challenging the existing health infrastructure and social organization. Modelling and understanding the mechanisms of social behaviour (e.g. panic and social distancing), and placing them in the Polish context, can contribute to a better response to the outbreak at the national and local level. In the presented study we aim to investigate the impact of COVID-19 on society by: (i) measuring the relevant activity in internet news and social media; (ii) analysing attitudes and demographic patterns in Poland. Ultimately, we will apply a computational social science and digital epidemiology research approach to provide urgently needed information on social dynamics during the outbreak. This study is an ad hoc reaction only; our goal is to signal the main areas of possible future research and to cover issues directly or indirectly related to public health.
Can human association norm evaluate latent semantic analysis?
This paper compares a word association norm created in a psycholinguistic experiment with association lists generated by algorithms operating on text corpora. We compare lists generated by the Church and Hanks algorithm with lists generated by the LSA algorithm. We discuss the extent to which these automatically generated lists reflect real semantic relations.
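As an illustration of the corpus-based side of this comparison, the following minimal Python sketch computes a Church and Hanks style association ratio, log2(P(x, y) / (P(x) P(y))), from co-occurrence counts in a fixed window. It is not the authors' implementation: the function names, the window size, the toy corpus and the simple normalization by corpus length are all assumptions made for the example.

```python
import math
from collections import Counter


def association_ratios(tokens, window=5):
    """Score ordered word pairs (x, y) where y follows x within `window` tokens.

    Probabilities are estimated by dividing raw counts by the corpus length,
    which is a simplifying assumption of this sketch.
    """
    n = len(tokens)
    word_counts = Counter(tokens)
    pair_counts = Counter()
    for i, x in enumerate(tokens):
        # Count every word that appears up to `window` positions after x.
        for y in tokens[i + 1 : i + 1 + window]:
            pair_counts[(x, y)] += 1

    scores = {}
    for (x, y), f_xy in pair_counts.items():
        p_xy = f_xy / n
        p_x = word_counts[x] / n
        p_y = word_counts[y] / n
        scores[(x, y)] = math.log2(p_xy / (p_x * p_y))
    return scores


def association_list(scores, cue, top_k=3):
    """Return the `top_k` highest-scoring associates of `cue` as (word, score) pairs."""
    ranked = sorted(
        ((y, s) for (x, y), s in scores.items() if x == cue),
        key=lambda pair: pair[1],
        reverse=True,
    )
    return ranked[:top_k]


# Hypothetical toy corpus, used only to show the call pattern.
corpus = "doctor nurse doctor nurse bread butter".split()
scores = association_ratios(corpus, window=1)
top_for_doctor = association_list(scores, "doctor", top_k=1)
```

A list produced this way for a cue word could then be set against the responses that human participants give for the same cue, which is the kind of comparison the abstract describes.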
Monitoring Users’ Behavior: Anti-Immigration Speech Detection on Twitter
The proliferation of social media platforms has changed the way people interact online. However, engagement with social media comes at a price: the users' privacy. Breaches of users' privacy, such as the Cambridge Analytica scandal, reveal how users' data can be weaponized in political campaigns, which often trigger hate speech and anti-immigration views. Hate speech detection is a challenging task because the different sources of hate affect the language used and because relevant annotated data is scarce. To tackle this, we collected and manually annotated an immigration-related dataset of publicly available Tweets in UK, US, and Canadian English. In an empirical study, we explored anti-immigration speech detection using various language features (word n-grams, character n-grams) and measured their impact on a number of trained classifiers. Our work demonstrates that word n-grams yield higher precision, recall, and F-score than character n-grams. Finally, we discuss the implications of these results for future work on hate-speech detection and social media data analysis in general.
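The two feature types contrasted in this abstract can be illustrated with a short Python sketch. It only shows how word and character n-grams differ as features; the function names, lowercasing and whitespace tokenization are assumptions of the example and not the preprocessing used in the study.

```python
from collections import Counter


def word_ngrams(text, n):
    """Return contiguous sequences of n whitespace-separated tokens."""
    tokens = text.lower().split()
    return [" ".join(tokens[i : i + n]) for i in range(len(tokens) - n + 1)]


def char_ngrams(text, n):
    """Return contiguous sequences of n characters, spaces included."""
    s = text.lower()
    return [s[i : i + n] for i in range(len(s) - n + 1)]


def bag_of_ngrams(text, n, level="word"):
    """Count n-grams so that the result can serve as a sparse feature vector."""
    grams = word_ngrams(text, n) if level == "word" else char_ngrams(text, n)
    return Counter(grams)


# Hypothetical input text, used only to show the call pattern.
example = "Borders are open"
word_bigrams = word_ngrams(example, 2)
char_trigrams = char_ngrams("open", 3)
word_features = bag_of_ngrams(example, 1, level="word")
```

Counts of this kind would typically be passed to a classifier as features. Word n-grams keep whole lexical items together, while character n-grams also capture sub-word fragments and spelling variants.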
Adapting a Constituency Parser to User-Generated Content in Polish Opinion Mining
The paper focuses on adapting NLP tools for Polish, e.g., morphological analyzers and parsers, to user-generated content (UGC). The authors discuss two rule-based techniques applied to improve the tools' efficiency: pre-processing (text normalization) and parser adaptation (modified segmentation and parsing rules). A new solution for handling out-of-vocabulary words (OOVs), based on inflectional translation, is also offered.
Design of a Controlled Language for Critical Infrastructures Protection
We describe a project for the construction of a controlled language for critical infrastructures protection (CIP). This project originates from the need to coordinate and categorize the communications on CIP at the European level. These communications can be physically represented by official documents, reports on incidents, informal communications and plain e-mail. We explore the application of traditional library science tools for the construction of controlled languages in order to achieve our goal. Our starting point is an analogous work done during the sixties in the field of nuclear science known as the Euratom Thesaurus. JRC.G.6-Security technology assessmen
Current issues of the Russian language teaching XIV
The collection of papers “Current issues of the Russian language teaching XIV” is devoted to the methodology of teaching Russian as a foreign language and to issues in linguistics and literary studies, and it includes papers on the use of online tools and resources in teaching Russian. The collection is a result of the international scientific conference “Current issues of the Russian language teaching XIV”, which was scheduled for 8–10 May 2020 but, due to the COVID-19 pandemic, took place remotely.