306 research outputs found

    A plea for more interactions between psycholinguistics and natural language processing research

    A new development in psycholinguistics is the use of regression analyses on tens of thousands of words, known as the megastudy approach. This development has led to the collection of processing times and subjective ratings (of age of acquisition, concreteness, valence, and arousal) for most of the existing words in English and Dutch. In addition, a crowdsourcing study in Dutch has yielded information about how well 52,000 lemmas are known. This information is likely to be of interest to NLP researchers and computational linguists. At the same time, large-scale measures of word characteristics developed in the latter traditions are likely to be pivotal in bringing the megastudy approach to the next level.
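The megastudy analysis described above amounts to regressing per-word processing times on word-level predictors. The sketch below illustrates the idea on synthetic data; the variable names, effect sizes, and predictors (log frequency, age of acquisition) are illustrative assumptions, not values from any actual megastudy.

```python
import numpy as np

# Illustrative megastudy-style regression on synthetic data: predict
# per-word processing times from log frequency and age of acquisition (AoA).
rng = np.random.default_rng(0)
n_words = 10_000
log_freq = rng.normal(3.0, 1.0, n_words)   # log word frequency
aoa = rng.normal(6.0, 2.0, n_words)        # rated age of acquisition (years)

# Simulated latencies: faster for frequent words, slower for late-acquired ones.
rt = 700 - 40 * log_freq + 15 * aoa + rng.normal(0, 50, n_words)

# Ordinary least squares with an intercept column.
X = np.column_stack([np.ones(n_words), log_freq, aoa])
coef, *_ = np.linalg.lstsq(X, rt, rcond=None)
intercept, b_freq, b_aoa = coef
print(f"frequency effect: {b_freq:.1f} ms/log unit, AoA effect: {b_aoa:.1f} ms/year")
```

With tens of thousands of items, even small per-word effects are estimated precisely, which is the statistical payoff of the megastudy approach.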

    Towards the Global SentiWordNet


    Building sentiment lexicons based on recommending services for the Polish language

    Sentiment analysis has become a prominent area of research in computer science. It has numerous practical applications, e.g., evaluating customer satisfaction or identifying product promoters. Many methods employed in this task require language resources such as sentiment lexicons, which are unavailable for the Polish language. Such lexicons contain words annotated with their emotional polarization, but creating them manually is very tedious. This paper therefore addresses the issue and describes a new method of building sentiment lexicons automatically from recommending services. The resulting lexicons were then used in the task of sentiment classification.
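One simple way to induce a lexicon from a recommending service, sketched below, is to score each word by the average centred star rating of the reviews it appears in. The toy reviews and the centring-at-3 heuristic are illustrative assumptions, not the paper's actual method or data.

```python
from collections import defaultdict

# Toy rating-based lexicon induction: a word inherits the average
# (centred) star rating of the reviews containing it.
reviews = [
    ("great phone works perfectly", 5),
    ("terrible battery broke quickly", 1),
    ("great value works well", 4),
    ("terrible screen very disappointing", 2),
]

totals = defaultdict(float)
counts = defaultdict(int)
for text, stars in reviews:
    for word in text.split():
        totals[word] += stars - 3   # centre at the neutral rating
        counts[word] += 1

lexicon = {w: totals[w] / counts[w] for w in totals}
print(lexicon["great"], lexicon["terrible"])
```

Words that co-occur with high ratings end up positive and vice versa; at scale this yields a polarity lexicon without manual annotation.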

    Creating emoji lexica from unsupervised sentiment analysis of their descriptions

    Online media, such as blogs and social networking sites, generate massive volumes of unstructured data of great interest for analyzing the opinions and sentiments of individuals and organizations. Novel approaches beyond Natural Language Processing are necessary to quantify these opinions with polarity metrics. So far, the sentiment expressed by emojis has received little attention. The use of these symbols, however, has boomed in the past four years. About twenty billion are typed on Twitter nowadays, and new emojis keep appearing in each new Unicode version, making them increasingly relevant to sentiment analysis tasks. This has motivated us to propose a novel approach to predict the sentiments expressed by emojis in online textual messages, such as tweets, that does not require human effort to manually annotate data and saves valuable time for other analysis tasks. For this purpose, we automatically constructed a novel emoji sentiment lexicon using an unsupervised sentiment analysis system based on the definitions given by emoji creators in Emojipedia. Additionally, we automatically created lexicon variants by also considering the sentiment distribution of the informal texts accompanying emojis. All these lexica are evaluated and compared regarding the improvement obtained by including them in sentiment analysis of the annotated datasets provided by Kralj Novak, Smailovic, Sluban and Mozetic (2015). The results confirm the competitiveness of our approach.
    Agencia Estatal de Investigación | Ref. TEC2016-76465-C2-2-R; Xunta de Galicia | Ref. GRC2014/046; Xunta de Galicia | Ref. ED341D R2016/01
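The core idea of scoring an emoji from its textual definition can be sketched as follows. The two descriptions and the tiny word-polarity table are illustrative stand-ins for Emojipedia definitions and a full unsupervised sentiment system, not the paper's actual resources.

```python
# Toy emoji-lexicon construction: an emoji's polarity is the average
# polarity of the sentiment-bearing words in its description.
WORD_POLARITY = {"happy": 1.0, "joy": 1.0, "love": 1.0,
                 "sad": -1.0, "crying": -1.0, "angry": -1.0}

descriptions = {
    "😀": "a yellow face with a broad happy smile conveying joy",
    "😢": "a sad face with a single tear crying",
}

def description_sentiment(text: str) -> float:
    """Average polarity of the lexicon words found in the description."""
    hits = [WORD_POLARITY[w] for w in text.lower().split() if w in WORD_POLARITY]
    return sum(hits) / len(hits) if hits else 0.0

emoji_lexicon = {e: description_sentiment(d) for e, d in descriptions.items()}
```

Because the scores come from the definitions themselves, no manually annotated emoji data is needed, which is the labour-saving point the abstract emphasizes.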

    From Big to Small Without Losing It All: Text Augmentation with ChatGPT for Efficient Sentiment Analysis

    In the era of artificial intelligence, data is gold but costly to annotate. This paper demonstrates a solution to this dilemma using ChatGPT for text augmentation in sentiment analysis. We leverage ChatGPT's generative capabilities to create synthetic training data that significantly improves the performance of smaller models, making them competitive with, or even outperforming, their larger counterparts. This enables models to be both efficient and effective, reducing computational cost, inference time, and memory usage without compromising on quality. Our work marks a key advancement in the cost-effective development and deployment of robust sentiment analysis models.
    Comment: 10 pages, 9 figures, presented at ICDM Workshop: SENTIRE 202
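The augmentation pipeline the abstract describes has a simple shape: for each labelled example, generate label-preserving rewrites and add them to the training set. In the sketch below the LLM call is stubbed out with a synonym-swapping placeholder so it runs offline; the `paraphrase` function, synonym table, and toy dataset are assumptions, and in practice `paraphrase` would prompt ChatGPT (e.g., "Rewrite this review, preserving its sentiment").

```python
import random

SYNONYMS = {"good": ["great", "excellent"], "bad": ["awful", "poor"]}

def paraphrase(text: str, rng: random.Random) -> str:
    """Stand-in for an LLM call: swap words for label-preserving synonyms."""
    words = [rng.choice(SYNONYMS[w]) if w in SYNONYMS else w for w in text.split()]
    return " ".join(words)

def augment(dataset, n_copies=2, seed=0):
    """Return the original labelled examples plus n_copies paraphrases each."""
    rng = random.Random(seed)
    out = list(dataset)
    for text, label in dataset:
        out.extend((paraphrase(text, rng), label) for _ in range(n_copies))
    return out

train = [("good movie", 1), ("bad plot", 0)]
augmented = augment(train)
```

The enlarged synthetic set is then used to train the smaller model; only the generation step changes when a real LLM replaces the stub.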

    Mapping wordnets from the perspective of inter-lingual equivalence

    This paper explores inter-lingual equivalence from the perspective of linking two large lexico-semantic databases, namely the Princeton WordNet of English and the plWordNet (pl. Słowosieć) of Polish. Wordnets are built as networks of lexico-semantic relations between words and their meanings, and constitute a type of monolingual dictionary cum thesaurus. The development of wordnets for different languages has given rise to many wordnet linking projects (e.g. EuroWordNet; Vossen, 2002). Regardless of the linking method used, these projects require defining rules for establishing equivalence links between wordnet building blocks, known as synsets (sets of synonymous lexical units, i.e., lemma-sense pairs). This paper analyses the set of inter-wordnet relations used in the mapping of plWordNet onto the Princeton WordNet and attempts to relate them to the equivalence taxonomies described in the specialist literature on bilingual lexicography and translation.
    The article analyses the phenomenon of inter-lingual equivalence from the perspective of linking two large wordnets: the Polish Słowosieć and the Princeton WordNet of English. Wordnets are relational lexico-semantic databases describing the network of lexico-semantic relations between words and their meanings, and thus constitute a kind of monolingual dictionary combined with a thesaurus. The development of wordnets for many of the world's languages subsequently led to their mutual linking, which required defining a methodology for establishing equivalence between their basic elements, i.e. synsets, which are sets of synonymous lexical units, i.e. lemma-sense-number pairs. We analyse the set of inter-wordnet relations used in the mapping between Słowosieć and the Princeton WordNet, relating them to the equivalence taxonomies postulated in the lexicographic and translation literature.
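The notion of an inter-wordnet equivalence link can be illustrated with a minimal sketch: propose a relation between a Polish and an English synset when their lemma sets overlap under a bilingual dictionary. The synset contents, the dictionary, and the two relation names below are toy assumptions, not the actual plWordNet-Princeton WordNet mapping rules.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass(frozen=True)
class Synset:
    sid: str
    lemmas: frozenset  # lemma-sense pairs collapsed to bare lemmas for brevity

# Toy Polish-to-English bilingual dictionary.
PL_EN = {"pies": {"dog"}, "suka": {"bitch"}, "kot": {"cat"}}

def link(pl: Synset, en: Synset) -> Optional[str]:
    """Propose an inter-wordnet relation from translated-lemma overlap."""
    translations = set().union(*(PL_EN.get(l, set()) for l in pl.lemmas))
    shared = translations & en.lemmas
    if not shared:
        return None
    # Full overlap suggests strict equivalence; partial overlap a weaker link.
    return "synonymy" if translations == en.lemmas else "partial_synonymy"

pl_dog = Synset("pl-1", frozenset({"pies"}))
en_dog = Synset("en-1", frozenset({"dog"}))
```

Real mapping projects use a richer taxonomy of relations (hyponymy-based links, near-equivalence, etc.), which is exactly the set of relations the paper analyses.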