    TatWordNet: A Linguistic Linked Open Data-Integrated WordNet Resource for Tatar

    We present the first release of TatWordNet (http://wordnet.tatar), a wordnet resource for Tatar. TatWordNet has been constructed by combining the expand and merge approaches. Its synsets have been compiled by: (i) automatic conversion of the concepts of TatThes, a socio-political thesaurus of Tatar; (ii) semi-automatic translation of synsets of RuWordNet, a wordnet resource for Russian, followed by manual verification and correction; (iii) manual translation of base RuWordNet synsets; and (iv) manual translation of all hypernyms of the previously translated RuWordNet synsets. The current version of TatWordNet contains 18,583 synsets, 36,540 lexical entries, and 49,525 senses. The resource has been published to the Linguistic Linked Open Data cloud and interlinked with the Global WordNet Grid.
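
    Step (iv) amounts to taking the transitive hypernym closure of the synsets translated so far. A minimal sketch of collecting that work list with a breadth-first traversal follows; the synset IDs and the `hypernyms` mapping are hypothetical toy data, not TatWordNet or RuWordNet identifiers, and the functions are illustrative rather than the authors' tooling.

```python
from collections import deque


def hypernym_closure(seeds, hypernyms):
    """Return every synset reachable from `seeds` via hypernym links.

    seeds: iterable of (hypothetical) synset IDs already translated.
    hypernyms: dict mapping a synset ID to a list of its direct hypernym IDs.
    """
    seen = set()
    queue = deque(seeds)
    while queue:
        synset = queue.popleft()
        for parent in hypernyms.get(synset, []):
            if parent not in seen:  # shared ancestors are queued only once
                seen.add(parent)
                queue.append(parent)
    return seen


def hypernyms_to_translate(translated, hypernyms):
    # Ancestors that are not yet translated form the work list for step (iv).
    return hypernym_closure(translated, hypernyms) - set(translated)
```

    With `hypernyms = {"dog": ["canine"], "canine": ["carnivore"], "carnivore": ["animal"], "cat": ["feline"], "feline": ["carnivore"]}` and `translated = {"dog", "cat"}`, the work list is `{"canine", "feline", "carnivore", "animal"}`; the `seen` set keeps a shared hypernym such as `carnivore` from being processed twice.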

    Human Associations Help to Detect Conventionalized Multiword Expressions

    In this paper we show that, to obtain human evidence about the conventionalization of phrases, native speakers should be asked about the associations they have to a given phrase and to its component words. We show that if the component words of a phrase have each other as frequent associations, the phrase can be considered conventionalized. Another type of conventionalized phrase can be revealed using two factors: low entropy of the phrase associations and low intersection between the component-word associations and the phrase associations. The association experiments were performed for the Russian language.
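
    The criteria above can be expressed as simple statistics over association responses. This is a minimal sketch under assumptions: each word or phrase maps to a list of raw responses, "frequent" means among the top `k` responses (the cutoff `k` is hypothetical), entropy is Shannon entropy in bits, and intersection is measured as the share of distinct phrase responses that were also given for a component word. The abstract does not state the exact measures or thresholds, so none are hard-coded.

```python
import math
from collections import Counter


def entropy(responses):
    """Shannon entropy (bits) of the distribution of association responses."""
    counts = Counter(responses)
    total = sum(counts.values())
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


def frequent(responses, k=5):
    """The k most frequent responses; k is a hypothetical cutoff."""
    return {word for word, _ in Counter(responses).most_common(k)}


def mutually_associated(w1, w2, assoc, k=5):
    # True if each component word is among the other's frequent associations.
    return w2 in frequent(assoc[w1], k) and w1 in frequent(assoc[w2], k)


def overlap(phrase_responses, word_responses):
    """Share of distinct phrase associations also produced for a component word."""
    p = set(phrase_responses)
    w = set(word_responses)
    return len(p & w) / len(p) if p else 0.0
```

    A phrase would then be flagged either when `mutually_associated` holds for its components, or when both `entropy` of its own responses and its `overlap` with component-word responses fall below thresholds chosen for the data at hand.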

    Comparing two thesaurus representations for Russian

    © 2018 Global WordNet Association. All Rights Reserved. In this paper we present a new Russian wordnet, RuWordNet, which was semi-automatically obtained by transforming the existing Russian thesaurus RuThes. In the first stage, the basic wordnet structure was reproduced: a hierarchy of synsets for each part of speech and the basic set of relations between synsets (hyponym-hypernym, part-whole, antonymy). In the second stage, we added causation, entailment, and domain relations between synsets. Derivational relations were also established for single words, as was the component structure for phrases included in RuWordNet. The described transformation procedure highlights the specific features of each type of thesaurus representation.
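
    One part of the first stage, building a separate synset hierarchy for each part of speech, can be illustrated on a toy data model. This sketch assumes that each thesaurus concept lists its lexical entries with POS tags and that concept-level hypernym links are projected onto synsets of the same POS; the data structures and function names are hypothetical, and the actual RuThes-to-RuWordNet transformation involves more than this.

```python
from collections import defaultdict


def split_by_pos(concepts):
    """Turn each concept into one synset per part of speech.

    concepts: dict concept_id -> list of (lemma, pos) pairs (hypothetical model).
    Returns dict (concept_id, pos) -> list of lemmas.
    """
    synsets = defaultdict(list)
    for cid, entries in concepts.items():
        for lemma, pos in entries:
            synsets[(cid, pos)].append(lemma)
    return dict(synsets)


def project_hypernyms(concept_parents, synsets):
    """Keep a concept-level hypernym link only between synsets of the same POS."""
    links = []
    for cid, pos in synsets:
        for parent in concept_parents.get(cid, []):
            if (parent, pos) in synsets:
                links.append(((cid, pos), (parent, pos)))
    return links
```

    Under this model a noun synset can only have noun hypernyms, which is what keeps the per-POS hierarchies separate.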

    Multiword expressions in Russian thesauri RuThes and RuWordnet

    © 2016 FRUCT. We present the types of multiword expressions included in RuThes, a thesaurus of the Russian language. Many of these expressions may look like compositional expressions but have specific relations that can be useful in applications. The relation system of the RuThes thesaurus allows a natural description of the relations between an expression and its components where necessary. When transforming the RuThes knowledge into the Princeton WordNet structure to create a Russian wordnet (RuWordNet), we also transfer all the described expressions into the new resource and propose automatically introducing additional relations to represent them better.

    Russian Lexicographic Landscape: a Tale of 12 Dictionaries

    The paper reports on a quantitative analysis of 12 Russian dictionaries at three levels: 1) headwords: the size and overlap of word lists, coverage of large corpora, and presence of neologisms; 2) synonyms: overlap of synsets across dictionaries; 3) definitions: distribution of definition lengths and numbers of senses, as well as textual similarity of same-headword definitions in different dictionaries. In total, the study covers 805,900 dictionary entries, 892,900 definitions, and 84,500 synsets. The study reveals multiple connections and mutual influences between dictionaries, uncovers differences between modern electronic and traditional printed resources, and suggests directions for developing new lexical semantic resources and improving existing ones.
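
    Overlap of word lists and corpus coverage can be quantified in several ways, and the abstract does not say which measures were used. A minimal sketch, assuming Jaccard similarity over headword sets for pairwise overlap and type-level (distinct-lemma) coverage of an already lemmatized corpus; the function names are hypothetical.

```python
from itertools import combinations


def jaccard(a, b):
    """Jaccard similarity of two headword collections (0.0 if both are empty)."""
    a, b = set(a), set(b)
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def pairwise_overlap(dictionaries):
    """dictionaries: dict name -> iterable of headwords; one score per unordered pair."""
    return {
        (x, y): jaccard(dictionaries[x], dictionaries[y])
        for x, y in combinations(sorted(dictionaries), 2)
    }


def corpus_coverage(headwords, corpus_lemmas):
    # Share of distinct corpus lemmas that appear as headwords (type-level, not token-weighted).
    lemmas = set(corpus_lemmas)
    return len(lemmas & set(headwords)) / len(lemmas) if lemmas else 0.0
```

    A token-weighted coverage, which counts frequent lemmas more heavily, would be an equally reasonable alternative; the choice changes how strongly rare words affect the score.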

    Combining Thesaurus Knowledge and Probabilistic Topic Models

    In this paper we present an approach to introducing thesaurus knowledge into probabilistic topic models. The approach rests on the assumption that the frequencies of semantically related words and phrases occurring in the same text should be increased, so that they contribute more to the topics found in that text. We conducted experiments with several thesauri and found that domain-specific knowledge is useful for improving topic models. If a general thesaurus such as WordNet is used, the thesaurus-based improvement of topic models can be achieved by excluding hyponymy relations in the combined topic models. Comment: Accepted to the AIST-2017 conference (http://aistconf.ru/). The final publication will be available at link.springer.co
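
    The core idea, raising the frequencies of thesaurus-related words that co-occur in a text, can be sketched as a preprocessing step on per-document counts. This is an illustration under assumptions, not the paper's formula: the multiplicative boost `alpha`, the relation-type labels, and the `excluded` parameter (mirroring the finding that hyponymy links are best left out for a general thesaurus such as WordNet) are all hypothetical, and only single tokens are handled, not phrases.

```python
from collections import Counter


def boost_counts(doc_tokens, relations, alpha=1.0, excluded=frozenset()):
    """Return term counts where words with a related word in the same document are boosted.

    relations: dict word -> list of (related_word, relation_type) pairs.
    alpha: hypothetical boost factor; the abstract gives no exact formula.
    excluded: relation types to ignore, e.g. {"hyponym"}.
    """
    counts = Counter(doc_tokens)
    boosted = {}
    for word, n in counts.items():
        # Does any non-excluded thesaurus neighbour of `word` occur in this document?
        has_related = any(
            other in counts and rel not in excluded
            for other, rel in relations.get(word, [])
        )
        boosted[word] = n * (1 + alpha) if has_related else float(n)
    return boosted
```

    The boosted counts could then stand in for raw term counts in the document-term matrix that the topic model is trained on.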

    Problem of transitivity of Wikipedia category system

    This paper analyses a violation of the transitivity principle of the Wikipedia category system. The causes of the violation are analysed on the basis of ontological modelling methodologies such as OntoClean. A new approach to eliminating the violation is proposed.

    Ontological analysis of the Wikipedia category system

    Copyright © 2018 by SCITEPRESS – Science and Technology Publications, Lda. All rights reserved. We analyse violations of the transitivity principle of the Wikipedia category system, i.e. situations where articles from a subcategory do not logically belong to its parent category. The causes of the violations have been analysed on the basis of ontological modelling methodologies such as OntoClean. We propose a new approach to eliminating the violations automatically. This approach is based on analysing the relation of ontological dependence between categories. As a theoretical foundation for this analysis, we propose a new deflationistic interpretation of the essential account of ontological dependence. The proof of concept has been evaluated on the category C:Mathematics. We plan to apply the proposed approach to derive a new large-scale domain hierarchy from the Wikipedia category system and use it to provide BabelNet and DBpedia with fine-grained domain annotations.
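
    The transitivity principle says that an article in a subcategory should also belong to every ancestor category. The sketch below assumes a toy category graph and a hypothetical `belongs` judgement (for example, backed by manual annotation); it enumerates the memberships implied by transitivity and reports those the judgement rejects. It only detects violations; the paper's elimination method based on ontological dependence is not reproduced.

```python
def ancestors(category, parents):
    """All categories reachable upward from `category`; safe on cyclic graphs."""
    seen, stack = set(), list(parents.get(category, []))
    while stack:
        c = stack.pop()
        if c not in seen:
            seen.add(c)
            stack.extend(parents.get(c, []))
    return seen


def transitivity_violations(articles, parents, belongs):
    """List (article, ancestor) pairs implied by transitivity but rejected by `belongs`.

    articles: dict article -> set of directly assigned categories.
    parents: dict category -> list of parent categories.
    belongs: hypothetical judgement (article, category) -> bool; not the paper's method.
    """
    violations = []
    for article, cats in articles.items():
        implied = set()
        for c in cats:
            implied |= ancestors(c, parents)
        for anc in sorted(implied - set(cats)):  # only memberships added by transitivity
            if not belongs(article, anc):
                violations.append((article, anc))
    return violations
```

    For example, with `parents = {"Mathematicians": ["Mathematics"], "Mathematics": ["Science"]}` and an article `Euclid` filed under `Mathematicians`, a judgement that rejects `Mathematics` yields `[("Euclid", "Mathematics")]`.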