    Construction of a Model for Extracting Opinion Vocabulary in Various Domains

    In this paper, we consider a new approach to domain-specific opinion word extraction in Russian. We propose a set of statistical features and a combination of algorithms that can extract opinion words in a particular domain. The extraction model was trained in the movie domain and then applied to four other domains. The quality of the resulting sentiment lexicons was evaluated intrinsically against expert annotation and remained high when the model was transferred to other domains. Finally, the method was adapted to the movie domain in English, where it also demonstrated good results.
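
    The abstract does not list the statistical features it uses. As a hedged illustration only, one common way to score how strongly a word is tied to a domain is to compare its relative frequency in domain texts (here, movie reviews) with its frequency in a contrast corpus (here, news). The sketch below computes an add-alpha smoothed log frequency ratio; the function name, the smoothing parameter `alpha`, and the toy corpora are assumptions, not the authors' feature set.

```python
import math
from collections import Counter


def log_frequency_ratio(word: str, domain: Counter, contrast: Counter, alpha: float = 1.0) -> float:
    """Add-alpha smoothed log ratio of a word's relative frequency in a
    domain corpus versus a contrast corpus. Positive values mean the word is
    over-represented in the domain texts (hypothetical feature)."""
    vocab_size = len(set(domain) | set(contrast))
    domain_total = sum(domain.values())
    contrast_total = sum(contrast.values())
    p_domain = (domain[word] + alpha) / (domain_total + alpha * vocab_size)
    p_contrast = (contrast[word] + alpha) / (contrast_total + alpha * vocab_size)
    return math.log(p_domain / p_contrast)


# Toy corpora (hypothetical): movie reviews versus news texts.
reviews = Counter("отличный фильм скучный сюжет отличный актёр".split())
news = Counter("фильм вышел в прокат сюжет новости".split())

# Rank review words by how strongly they are tied to the review domain.
ranked = sorted(reviews, key=lambda w: log_frequency_ratio(w, reviews, news), reverse=True)
print(ranked)
```

    Words that occur equally often in both corpora (such as "фильм") score zero, so a score like this would be one candidate-ranking signal among several rather than an opinion detector on its own.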

    SentiRuEval: Testing Object-Oriented Sentiment Analysis Systems in Russian

    The paper describes the data, rules, and results of SentiRuEval, an evaluation of Russian object-oriented sentiment analysis systems. Two tasks were offered to participants. The first task was aspect-oriented analysis of restaurant and automobile reviews: the primary goal was to find words and expressions denoting important characteristics of an entity (aspect terms) and then classify them by polarity and aspect category. The second task was reputation-oriented analysis of tweets about banks and telecommunications companies. Its goal was to classify tweets according to their influence on the reputation of the mentioned company. Such tweets could express the user's opinion or report a positive or negative fact about the organization.

    NEREL-BIO: A dataset of biomedical abstracts annotated with nested named entities

    Motivation: This article describes NEREL-BIO, an annotation scheme and a corpus of PubMed abstracts in Russian, together with a smaller number of abstracts in English. NEREL-BIO extends the general-domain dataset NEREL by introducing domain-specific entity types. The NEREL-BIO annotation scheme covers both general and biomedical domains, which makes it suitable for domain transfer experiments. NEREL-BIO provides annotation for nested named entities as an extension of the scheme employed for NEREL. Nested named entities may cross entity boundaries to connect to shorter entities nested within longer entities, which makes them harder to detect. Results: NEREL-BIO contains annotations for 700+ Russian and 100+ English abstracts. All English PubMed annotations have corresponding Russian counterparts. NEREL-BIO therefore has the following specific features: it annotates nested named entities, and it can be used as a benchmark for cross-domain (NEREL → NEREL-BIO) and cross-language (English → Russian) transfer. We experiment with both transformer-based sequence models and machine reading comprehension models and report their results. © 2023 The Author(s). Published by Oxford University Press. This work was supported by the Russian Science Foundation [20-11-20166].
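
    To make "nested" concrete: if entities are stored as character spans, one span can lie entirely inside another, as a city name inside an organization name. A flat BIO tagging assigns one tag per token, so it cannot mark "Moscow" as both the start of an ORGANIZATION and a CITY at once, which is one reason nested entities are harder to detect. The sketch below is an assumption-level illustration, not the NEREL file format: the `Entity` class, the `nested_pairs` helper, and the example labels are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Entity:
    start: int  # character offset of the first character (inclusive)
    end: int    # character offset after the last character (exclusive)
    label: str


def nested_pairs(entities):
    """Return (outer, inner) pairs in which inner's span lies inside outer's
    span and the two spans are not identical."""
    pairs = []
    for outer in entities:
        for inner in entities:
            contained = outer.start <= inner.start and inner.end <= outer.end
            same_span = (outer.start, outer.end) == (inner.start, inner.end)
            if contained and not same_span:
                pairs.append((outer, inner))
    return pairs


text = "Moscow State University"
entities = [Entity(0, 23, "ORGANIZATION"), Entity(0, 6, "CITY")]

for outer, inner in nested_pairs(entities):
    print(f"{text[inner.start:inner.end]!r} ({inner.label}) is nested in "
          f"{text[outer.start:outer.end]!r} ({outer.label})")
```

    Span-based storage keeps every entity independent, so a model or an annotation tool can represent any depth of nesting without changing the tag scheme.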

    NEREL: A Russian Dataset with Nested Named Entities, Relations and Events

    In this paper, we present NEREL, a Russian dataset for named entity recognition and relation extraction. NEREL is significantly larger than existing Russian datasets: to date, it contains 56K annotated named entities and 39K annotated relations. An important difference from previous datasets is the annotation of nested named entities, as well as of relations within nested entities and at the discourse level. NEREL can facilitate the development of novel models that extract relations between nested named entities, as well as relations at both the sentence and document levels. NEREL also contains annotation of events involving named entities and of the entities' roles in those events. The NEREL collection is available via https://github.com/nerel-ds/NEREL. © 2021 Incoma Ltd. All rights reserved. The project is supported by the Russian Science Foundation, grant # 20-11-20166. The experiments were partially carried out on computational resources of HPC facilities at HSE University. We are grateful to Alexey Yandutov and Igor Rozhkov for providing results of their experiments in named entity recognition and relation extraction.

    Semantic Similarity of Words in RuWordNet Thesaurus and in Psychosemantic Experiment

    In this paper, we compare the structure of RuWordNet, a thesaurus of the Russian language, with data from a psychosemantic experiment on identifying semantically close words. The aim of the study is to find out to what extent the structure of RuWordNet corresponds to native speakers' intuitions about the semantic similarity of words. The respondents were asked to list synonyms for a given word. Words from the mental sphere were chosen for the experiment. The experiment showed that the respondents mostly named not only synonyms but also other words in paradigmatic relations with the stimuli. In 95% of cases, the words characterized in the experiment as semantically close were also close according to the thesaurus. For the remaining cases, additions to the thesaurus were proposed.
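
    The abstract does not say how "close according to the thesaurus" is measured. One plausible reading, stated here purely as an assumption, is that a stimulus and a response count as close when they are linked by at most a few thesaurus relations; the 95% figure is then the share of experiment pairs that pass this test. The sketch below uses breadth-first search over a toy relation graph; the graph, the word pairs, the helper names, and the threshold `max_depth=2` are all hypothetical, not RuWordNet data or the authors' procedure.

```python
from collections import deque


def build_graph(edges):
    """Undirected adjacency sets from (word, word) relation pairs."""
    graph = {}
    for a, b in edges:
        graph.setdefault(a, set()).add(b)
        graph.setdefault(b, set()).add(a)
    return graph


def within_distance(graph, source, target, max_depth):
    """Breadth-first search: True if target is reachable from source in at
    most max_depth relation links."""
    if source == target:
        return True
    seen = {source}
    queue = deque([(source, 0)])
    while queue:
        node, depth = queue.popleft()
        if depth == max_depth:
            continue
        for neighbour in graph.get(node, ()):
            if neighbour == target:
                return True
            if neighbour not in seen:
                seen.add(neighbour)
                queue.append((neighbour, depth + 1))
    return False


def agreement_share(graph, pairs, max_depth=2):
    """Share of (stimulus, response) pairs that are close in the thesaurus."""
    close = sum(within_distance(graph, s, r, max_depth) for s, r in pairs)
    return close / len(pairs)


# Toy relation graph and experiment responses (hypothetical, not RuWordNet data).
thesaurus = build_graph([("ум", "разум"), ("мысль", "идея"),
                         ("мысль", "мышление"), ("мышление", "разум")])
responses = [("ум", "разум"), ("идея", "мышление"), ("ум", "идея")]
print(agreement_share(thesaurus, responses))
```

    In this toy graph, "ум" and "идея" are four links apart, so that pair falls outside the threshold; in the study's terms, such pairs are the ones that would prompt proposed additions to the thesaurus.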

    Comparing similarity of words based on psychosemantic experiment and RuWordNet

    In this paper, we compare the structure of RuWordNet, a thesaurus of the Russian language, with data from a psychosemantic experiment on identifying semantically close words. The aim of the study is to find out to what extent the structure of RuWordNet corresponds to native speakers' intuitions about the semantic proximity of words. The respondents were asked to list synonyms for a given word. Words from the mental sphere were chosen for the experiment. The experiment showed that the respondents mostly named not only synonyms but also other words in paradigmatic relations with the stimuli. In 95% of cases, the words characterized in the experiment as semantically close were also close according to the thesaurus. For the remaining cases, additions to the thesaurus were proposed.

    Construction of a Model for the Cross-Domain Opinion Word Extraction

    In this paper, we consider a new approach to domain-specific opinion word extraction in Russian. We propose a set of statistical features and a combination of algorithms that can extract opinion words in a particular domain. The extraction model was trained in the movie domain and then applied to four other domains. The quality of the resulting sentiment lexicons was evaluated intrinsically against expert annotation and remained high when the model was transferred to other domains. Finally, the method was adapted to the movie domain in English, where it also demonstrated good results.

    Creating Russian Sentiment Lexicon

    This article describes RuSentiLex, a new lexicon of Russian opinion words and expressions. The lexicon was compiled from several sources: opinion words from the RuThes thesaurus of the Russian language, slang words from Twitter, and words with positive or negative associations (connotations) from a news corpus. For polysemous words whose sentiment orientation differs between senses, the senses are linked to the corresponding concepts of the RuThes thesaurus, which can make it easier to select the appropriate sense of a word in a particular domain or context. The paper describes RuSentiLex, a new Russian sentiment lexicon. The lexicon was gathered from several sources: opinionated words from domain-oriented Russian sentiment vocabularies, slang and curse words extracted from Twitter, and objective words with positive or negative connotations from a news collection. Lexicon words whose sentiment orientation differs across senses are linked to the appropriate concepts of RuThes, a thesaurus of the Russian language. All lexicon entries are classified into four sentiment categories and three sources of sentiment (opinion, emotion, and fact). The lexicon can serve as an initial version for constructing domain-specific sentiment lexicons and can be used for feature generation in machine-learning approaches.
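
    The entry structure described above (a sentiment category, a source of sentiment, and an optional link from a word sense to a thesaurus concept) can be sketched as a small data model. This is a hedged illustration, not the RuSentiLex file format: the abstract does not name the four categories, so the `Sentiment` labels are an assumption, and the concept names, example entries, and the `entries_for` helper are hypothetical.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class Sentiment(Enum):
    # Assumed label set: the abstract says only that there are four categories.
    POSITIVE = "positive"
    NEGATIVE = "negative"
    NEUTRAL = "neutral"
    POSITIVE_NEGATIVE = "positive/negative"


class Source(Enum):
    # The three sources of sentiment named in the abstract.
    OPINION = "opinion"
    EMOTION = "emotion"
    FACT = "fact"


@dataclass(frozen=True)
class LexiconEntry:
    word: str
    sentiment: Sentiment
    source: Source
    concept: Optional[str] = None  # linked thesaurus concept for sense-specific entries


def entries_for(lexicon, word, domain_concepts):
    """Entries for `word` usable in a domain: sense-specific entries are kept
    only if their concept is relevant; entries without a concept always apply."""
    return [e for e in lexicon
            if e.word == word and (e.concept is None or e.concept in domain_concepts)]


# Hypothetical entries and concept names, for illustration only.
lexicon = [
    LexiconEntry("отличный", Sentiment.POSITIVE, Source.OPINION),
    LexiconEntry("простой", Sentiment.POSITIVE, Source.OPINION, concept="НЕСЛОЖНЫЙ"),
    LexiconEntry("простой", Sentiment.NEGATIVE, Source.OPINION, concept="ПРОСТОВАТЫЙ"),
]
software_reviews = {"НЕСЛОЖНЫЙ"}
print(entries_for(lexicon, "простой", software_reviews))
```

    Linking senses to concepts lets a domain-specific lexicon be derived by selecting the concepts relevant to that domain, so an ambiguous word keeps only the sentiment of the sense that applies there.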
