    Multiword expressions in Russian thesauri RuThes and RuWordnet

    © 2016 FRUCT. We present the types of multiword expressions included in RuThes, a thesaurus of the Russian language. Many of these expressions may look like compositional expressions, but they have specific relations that can be useful in applications. The relation system of the RuThes thesaurus allows a natural description of relations between an expression and its components where necessary. When transforming the RuThes knowledge into the Princeton WordNet structure to create a Russian wordnet (RuWordNet), we also transfer all the described expressions into the new resource. We propose to automatically introduce additional relations to represent them better.
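The following minimal Python sketch shows how a multiword expression could be linked to the concepts of its components. It makes two assumptions: the expression is already lemmatized, and each concept stores its text entries as a set of lemmas. The `Concept` class, the `component_links` function and the toy concepts are hypothetical illustrations. They are not the RuThes data model and not the relations proposed in the paper.

```python
from dataclasses import dataclass, field


@dataclass
class Concept:
    """Hypothetical thesaurus concept: a name plus its text entries (lemmas)."""
    name: str
    entries: set[str] = field(default_factory=set)


def component_links(expression: str, lemmas: list[str],
                    concepts: list[Concept]) -> list[tuple[str, str]]:
    """Pair a multiword expression with every concept that has one of the
    expression's component lemmas as a text entry."""
    # Index text entries so that each lemma lookup is a single dict access.
    index: dict[str, list[str]] = {}
    for concept in concepts:
        for entry in concept.entries:
            index.setdefault(entry, []).append(concept.name)
    links = []
    for lemma in lemmas:
        for name in index.get(lemma, []):
            links.append((expression, name))
    return links


# Hypothetical toy concepts, not taken from RuThes.
concepts = [
    Concept("ЖЕЛЕЗНАЯ ДОРОГА", {"железная дорога"}),
    Concept("ЖЕЛЕЗО", {"железо", "железный"}),
    Concept("ДОРОГА", {"дорога"}),
]
print(component_links("железная дорога", ["железный", "дорога"], concepts))
```

The expression's own concept is not returned. Its multiword entry never equals a single component lemma.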

    RuThes cloud: Towards a multilevel linguistic linked open data resource for Russian

    © 2017, Springer International Publishing AG. In this paper we present a new multi-level Linguistic Linked Open Data (LLOD) resource for Russian. It covers four linguistic levels: semantic, lexical, morphological and syntactic. The resource has been constructed on the basis of the well-known RuThes thesaurus and the original, hitherto unpublished Extended Zaliznyak grammatical dictionary. It is represented in terms of the SKOS, Lemon and LexInfo ontologies and a new custom ontology. In building the resource, we automated the following tasks: merging the source resources on common lexical entries, decomposing complex lexical entries, and publishing the constructed resource as an LLOD-compatible dataset. We demonstrate a use case in which the resource is exploited in an information retrieval (IR) task. We hope that our work can serve as a crystallization point for the Russian LLOD cloud.
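The abstract names SKOS and Lemon as target vocabularies. The sketch below uses only the Python standard library to show how one thesaurus concept and one lexical entry could be serialized as N-Triples. It uses skos:Concept and skos:prefLabel together with the Lemon terms LexicalEntry, canonicalForm, writtenRep, sense and reference. The base namespace, the identifier scheme and the function names are assumptions for illustration. The actual RuThes Cloud schema, including its LexInfo and custom-ontology parts, is not reproduced here.

```python
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
SKOS = "http://www.w3.org/2004/02/skos/core#"
LEMON = "http://lemon-model.net/lemon#"
BASE = "http://example.org/ruthes-cloud/"  # hypothetical namespace


def iri(value: str) -> str:
    return f"<{value}>"


def literal(text: str, lang: str = "ru") -> str:
    # Escape backslashes first, then quotes and line breaks, as N-Triples requires.
    escaped = (text.replace("\\", "\\\\").replace('"', '\\"')
               .replace("\n", "\\n").replace("\r", "\\r"))
    return f'"{escaped}"@{lang}'


def entry_triples(concept_id: str, label: str, entry_id: str, lemma: str) -> list[str]:
    """Return N-Triples lines linking one lexical entry to one concept
    (hypothetical identifier scheme)."""
    concept = iri(BASE + "concept/" + concept_id)
    entry = iri(BASE + "entry/" + entry_id)
    form = iri(BASE + "form/" + entry_id)
    sense = iri(BASE + "sense/" + entry_id + "_" + concept_id)
    triples = [
        (concept, iri(RDF_TYPE), iri(SKOS + "Concept")),
        (concept, iri(SKOS + "prefLabel"), literal(label)),
        (entry, iri(RDF_TYPE), iri(LEMON + "LexicalEntry")),
        (entry, iri(LEMON + "canonicalForm"), form),
        (form, iri(LEMON + "writtenRep"), literal(lemma)),
        (entry, iri(LEMON + "sense"), sense),
        (sense, iri(LEMON + "reference"), concept),
    ]
    return [f"{s} {p} {o} ." for s, p, o in triples]


for line in entry_triples("c123", "ЖЕЛЕЗНАЯ ДОРОГА", "e456", "железная дорога"):
    print(line)
```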

    Methods of automated design of application ontology

    Controlling the completeness and information integrity of design specifications is an important problem in designing complex engineering systems. Computer-aided design of textual technical documentation (technical documentation in natural language) is a complex problem. A solution becomes feasible if natural (or prescribed) restrictions are imposed on the structure of the analyzed texts and an elaborated model of the application domain has been developed. Using AviaOntology as an example, this paper discusses the technological aspects of the automatic design of application ontologies. These ontologies describe the domain in which a complex technical system functions across its various operating modes. The paper also considers the use of the developed ontologies for checking the information integrity of natural-language documents.

    Toward Domain-Specific Russian-Tatar Thesaurus Construction

    © 2017 Association for Computing Machinery. The paper discusses the main principles and practical aspects of implementing a new bilingual lexical resource: the Russian-Tatar thesaurus on socio-political and IT issues. The thesaurus is developed in the format of the Russian RuThes thesaurus. That format is built as a hierarchy of concepts viewed as units of thought, with each concept linked to a set of language expressions that refer to it in texts (text entries). The paper discusses the general methodology for translating concept names and their text entries, as well as ways of reflecting the specificity of the Tatar lexical-semantic system.
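Every RuThes concept carries a set of text entries, so a bilingual version can keep those entries per language. The minimal Python sketch below works under that assumption. The class, the helper functions and the toy entries are illustrative only and do not reflect the project's actual format. The toy entries are Russian книга and Tatar китап, both meaning "book". The second helper shows one way to list concepts whose Tatar entries still need to be translated.

```python
from dataclasses import dataclass, field


@dataclass
class Concept:
    """Hypothetical bilingual concept: text entries are kept per language code."""
    name: str
    entries: dict[str, set[str]] = field(default_factory=dict)
    broader: list[str] = field(default_factory=list)  # names of parent concepts


def find_concepts(thesaurus: list[Concept], text: str, lang: str) -> list[str]:
    """Names of concepts that list `text` as a text entry in language `lang`."""
    return sorted(c.name for c in thesaurus if text in c.entries.get(lang, set()))


def missing_language(thesaurus: list[Concept], lang: str) -> list[str]:
    """Names of concepts that still have no text entries in language `lang`."""
    return sorted(c.name for c in thesaurus if not c.entries.get(lang))


# Hypothetical toy thesaurus with Russian ("ru") and Tatar ("tt") entries.
thesaurus = [
    Concept("ИЗДАНИЕ", {"ru": {"издание"}}),
    Concept("КНИГА", {"ru": {"книга"}, "tt": {"китап"}}, broader=["ИЗДАНИЕ"]),
]
print(find_concepts(thesaurus, "китап", "tt"))  # ['КНИГА']
print(missing_language(thesaurus, "tt"))        # ['ИЗДАНИЕ']
```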

    Building a model for opinion word extraction in various domains

    In this paper we consider a new approach to domain-specific opinion word extraction in Russian. We propose a set of statistical features and a combination of algorithms that can extract opinion words in a particular domain. The extraction model was trained in the movie domain and then applied to four other domains. The quality of the obtained sentiment lexicons was evaluated intrinsically against expert markup and remained high when the model was transferred to other domains. Finally, our method was adapted to the movie domain in English, where it also demonstrated good results.
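The abstract does not list the statistical features. One kind of feature often used for domain-specific lexicon extraction compares word frequencies across corpora. The sketch below ranks words by the ratio of their relative frequency in a domain corpus (e.g. movie reviews) to their add-one-smoothed relative frequency in a contrast corpus (e.g. news). This feature, the smoothing and the toy corpora are assumptions. They are not the paper's feature set or algorithm combination.

```python
from collections import Counter


def frequency_ratio(domain: list[str], contrast: list[str]) -> list[tuple[str, float]]:
    """Rank domain words by relative frequency in `domain` divided by their
    add-one-smoothed relative frequency in `contrast` (highest first).
    Illustrative feature only, not the paper's method."""
    domain_counts = Counter(domain)
    contrast_counts = Counter(contrast)
    vocabulary_size = len(set(domain_counts) | set(contrast_counts))
    scores = {}
    for word, count in domain_counts.items():
        p_domain = count / len(domain)
        # Add-one smoothing keeps words unseen in the contrast corpus finite.
        p_contrast = (contrast_counts[word] + 1) / (len(contrast) + vocabulary_size)
        scores[word] = p_domain / p_contrast
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


# Hypothetical toy corpora.
reviews = "отличный фильм отличный актер скучный сюжет".split()
news = "фильм вышел актер сказал сюжет новости".split()
for word, score in frequency_ratio(reviews, news)[:2]:
    print(word, round(score, 2))
```

On this toy data the two top-ranked words are the evaluative ones, отличный (4.67) and скучный (2.33). Words shared with the news corpus score lower.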

    SentiRuEval: Testing Object-oriented sentiment analysis systems in Russian

    The paper describes the data, rules and results of SentiRuEval, an evaluation of Russian object-oriented sentiment analysis systems. Two tasks were proposed to participants. The first task was aspect-oriented analysis of restaurant and automobile reviews. Its primary goal was to find words and expressions indicating important characteristics of an entity (aspect terms) and then classify them into polarity classes and aspect categories. The second task was reputation-oriented analysis of tweets concerning banks and telecommunications companies. The goal of this analysis was to classify tweets according to their influence on the reputation of the mentioned company. Such tweets could express the user's opinion or report a positive or negative fact about the organization.
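A common way to score aspect-term annotations is precision, recall and F1 over exactly matching annotations. The sketch below assumes each annotation is a hypothetical (start, end, polarity) tuple and that only exact matches count. SentiRuEval's official measures may differ, for example by using partial-match variants. Treat this as an illustration of the metric, not as the evaluation script.

```python
Annotation = tuple[int, int, str]  # hypothetical: (start offset, end offset, polarity)


def exact_match_prf(gold: set[Annotation],
                    predicted: set[Annotation]) -> tuple[float, float, float]:
    """Precision, recall and F1 where a prediction is correct only if its
    span and polarity both match a gold annotation exactly."""
    true_positives = len(gold & predicted)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(gold) if gold else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    return precision, recall, 2 * precision * recall / (precision + recall)


# Hypothetical toy annotations: one span has the wrong polarity, one is spurious.
gold = {(0, 5, "positive"), (10, 16, "negative"), (20, 25, "neutral")}
predicted = {(0, 5, "positive"), (10, 16, "negative"),
             (20, 25, "positive"), (30, 35, "neutral")}
print(exact_match_prf(gold, predicted))
```

Here two of four predictions match, so precision is 0.5, recall is 2/3 and F1 is 4/7. The span (20, 25) is found but counts as an error because its polarity differs.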

    NEREL: A Russian Dataset with Nested Named Entities, Relations and Events

    In this paper, we present NEREL, a Russian dataset for named entity recognition and relation extraction. NEREL is significantly larger than existing Russian datasets: to date it contains 56K annotated named entities and 39K annotated relations. Its key difference from previous datasets is the annotation of nested named entities, as well as of relations within nested entities and at the discourse level. NEREL can facilitate the development of novel models that extract relations between nested named entities, as well as relations at both the sentence and document levels. NEREL also annotates events involving named entities and the entities' roles in those events. The NEREL collection is available at https://github.com/nerel-ds/NEREL. © 2021 Incoma Ltd. All rights reserved. The project is supported by the Russian Science Foundation, grant # 20-11-20166. The experiments were partially carried out on computational resources of HPC facilities at HSE University. We are grateful to Alexey Yandutov and Igor Rozhkov for providing results of their experiments in named entity recognition and relation extraction.
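Once entity spans are loaded, nested entities are straightforward to detect. The sketch below assumes annotations in brat standoff style. Each entity takes one `T` line: ID, tab, type with start and end offsets, tab, surface text. The sketch ignores discontinuous spans and all non-entity lines. This format is an assumption and should be checked against the files in the NEREL repository. The entity types in the toy example are illustrative.

```python
def parse_entities(ann_text: str) -> dict[str, tuple[str, int, int, str]]:
    """Map entity IDs to (type, start, end, text) from brat-style `T` lines
    (assumed format)."""
    entities = {}
    for line in ann_text.splitlines():
        if not line.startswith("T"):
            continue  # relation (R), event (E) and note (#) lines are skipped
        entity_id, info, text = line.split("\t", 2)
        if ";" in info:
            continue  # discontinuous spans are not handled in this sketch
        label, start, end = info.split()
        entities[entity_id] = (label, int(start), int(end), text)
    return entities


def nested_pairs(entities: dict[str, tuple[str, int, int, str]]) -> list[tuple[str, str]]:
    """(outer, inner) ID pairs where the inner span lies strictly inside
    or at the edge of the outer span, excluding identical spans."""
    pairs = []
    for outer_id, (_, o_start, o_end, _) in entities.items():
        for inner_id, (_, i_start, i_end, _) in entities.items():
            if outer_id == inner_id:
                continue
            if o_start <= i_start and i_end <= o_end and (o_start, o_end) != (i_start, i_end):
                pairs.append((outer_id, inner_id))
    return sorted(pairs)


# Hypothetical annotation of the text "президент России Владимир Путин".
ann = (
    "T1\tPROFESSION 0 16\tпрезидент России\n"
    "T2\tCOUNTRY 10 16\tРоссии\n"
    "T3\tPERSON 17 31\tВладимир Путин\n"
)
print(nested_pairs(parse_entities(ann)))  # [('T1', 'T2')]
```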

    Erratum: All-words Word Sense Disambiguation for Russian Using Automatically Generated Text Collection (Cybernetics and Information Technologies 20:4 (90-107) DOI: 10.2478/cait-2020-0049)

    This note concerns the order of the authors' names and the correction of a printing error. 1. The authors' names were written as Bolshina Angelina, Natalia Loukachevitch. For the reader's convenience, the corrected line is: Angelina Bolshina, Natalia Loukachevitch. 2. Equation (1) on page 94 contains a printing error. For the reader's convenience, the corrected line is: (Formula Presented)

    Ontological resources for representing security domain in information-analytical system

    The paper presents an approach to describing the broad domain of national security as a thesaurus for automatic document processing. The resulting Security thesaurus follows the representation model of the RuThes thesaurus. It includes terminology related to social, national and religious conflicts, extremism and terrorism, and information security. The thesaurus is used in a specialized information-analytical system and for automatic text categorization according to several categorization schemes. The information-retrieval system provides several search instruments, including word, phrase and concept search as well as category and facet search. It also supports the creation of analytical reports.
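Concept-based categorization can be sketched as follows. First, text entries found in a document are mapped to thesaurus concepts. Next, each concept is followed up its broader-concept links. Finally, every category concept reached gets one vote. The single-word matching, the voting scheme and the toy concepts are assumptions. A real system would also match multiword text entries and use the thesaurus's own relation types.

```python
from collections import Counter


def categorize(tokens: list[str], entry_to_concept: dict[str, str],
               broader: dict[str, list[str]], categories: set[str]) -> list[tuple[str, int]]:
    """Count, for each category concept, how many matched tokens lead to it
    through chains of broader-concept links (hypothetical voting scheme)."""
    scores = Counter()
    for token in tokens:
        concept = entry_to_concept.get(token)
        if concept is None:
            continue
        # Depth-first walk up the hierarchy; `seen` stops a category from being
        # counted twice for one token when several paths reach it.
        seen, stack = set(), [concept]
        while stack:
            current = stack.pop()
            if current in seen:
                continue
            seen.add(current)
            if current in categories:
                scores[current] += 1
            stack.extend(broader.get(current, []))
    return scores.most_common()


# Hypothetical toy thesaurus fragment.
entry_to_concept = {"теракт": "ТЕРАКТ", "террорист": "ТЕРРОРИСТ", "кибератака": "КИБЕРАТАКА"}
broader = {"ТЕРАКТ": ["ТЕРРОРИЗМ"], "ТЕРРОРИСТ": ["ТЕРРОРИЗМ"],
           "КИБЕРАТАКА": ["ИНФОРМАЦИОННАЯ БЕЗОПАСНОСТЬ"]}
categories = {"ТЕРРОРИЗМ", "ИНФОРМАЦИОННАЯ БЕЗОПАСНОСТЬ"}
print(categorize(["теракт", "террорист", "кибератака", "город"],
                 entry_to_concept, broader, categories))
```

For these toy tokens the result is `[('ТЕРРОРИЗМ', 2), ('ИНФОРМАЦИОННАЯ БЕЗОПАСНОСТЬ', 1)]`. The token город matches no concept and is ignored.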

    Semantic Similarity of Words in RuWordNet Thesaurus and in Psychosemantic Experiment

    In the paper we compare the structure of the Russian-language thesaurus RuWordNet with the data of a psychosemantic experiment aimed at identifying semantically close words. The aim of the study is to find out to what extent the structure of RuWordNet corresponds to native speakers' intuitions about the semantic similarity of words. The respondents were asked to list synonyms for a given word, and words of the mental sphere were chosen as stimuli. We found that, rather than only synonyms, the respondents mainly mentioned words in paradigmatic relations with the stimuli. In 95% of cases, the words characterized in the experiment as semantically close were also close according to the thesaurus. For the remaining cases, additions to the thesaurus were proposed.
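One simple way to operationalize "close according to the thesaurus" is the length of the shortest path between two words' synsets in the relation graph. The sketch below treats all relations as undirected edges. It calls two words close when that distance is at most a threshold. The measure, the threshold of 2 and the toy synsets are assumptions for illustration, not the criterion used in the study.

```python
from collections import deque
from itertools import product
from typing import Optional


def build_graph(edges: list[tuple[str, str]]) -> dict[str, set[str]]:
    """Undirected adjacency sets from (synset, synset) relation pairs."""
    graph: dict[str, set[str]] = {}
    for a, b in edges:
        graph.setdefault(a, set()).add(b)
        graph.setdefault(b, set()).add(a)
    return graph


def shortest_path(graph: dict[str, set[str]], start: str, goal: str) -> Optional[int]:
    """Breadth-first search: number of edges on a shortest path, or None."""
    if start == goal:
        return 0
    seen, queue = {start}, deque([(start, 0)])
    while queue:
        node, dist = queue.popleft()
        for neighbour in graph.get(node, ()):
            if neighbour == goal:
                return dist + 1
            if neighbour not in seen:
                seen.add(neighbour)
                queue.append((neighbour, dist + 1))
    return None


def word_distance(graph: dict[str, set[str]], word_synsets: dict[str, list[str]],
                  w1: str, w2: str) -> Optional[int]:
    """Minimum synset-to-synset distance over all senses of the two words."""
    found = [d for s1, s2 in product(word_synsets.get(w1, []), word_synsets.get(w2, []))
             if (d := shortest_path(graph, s1, s2)) is not None]
    return min(found) if found else None


# Hypothetical toy graph and sense inventory, not taken from RuWordNet.
graph = build_graph([("РАДОСТЬ", "ЭМОЦИЯ"), ("ГРУСТЬ", "ЭМОЦИЯ"), ("ВЕСЕЛЬЕ", "РАДОСТЬ")])
word_synsets = {"радость": ["РАДОСТЬ"], "грусть": ["ГРУСТЬ"], "веселье": ["ВЕСЕЛЬЕ"]}
MAX_DISTANCE = 2  # hypothetical threshold for "semantically close"
for w1, w2 in [("радость", "грусть"), ("веселье", "грусть")]:
    d = word_distance(graph, word_synsets, w1, w2)
    print(w1, w2, d, d is not None and d <= MAX_DISTANCE)
```

The loop prints `радость грусть 2 True` and `веселье грусть 3 False`. The walrus operator in `word_distance` requires Python 3.8 or later, and the built-in generic annotations require 3.9.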