
    Russian Lexicographic Landscape: a Tale of 12 Dictionaries

    The paper reports on a quantitative analysis of 12 Russian dictionaries at three levels: 1) headwords: the size and overlap of word lists, coverage of large corpora, and presence of neologisms; 2) synonyms: overlap of synsets in different dictionaries; 3) definitions: distribution of definition lengths and numbers of senses, as well as textual similarity of same-headword definitions in different dictionaries. The total amount of data in the study is 805,900 dictionary entries, 892,900 definitions, and 84,500 synsets. The study reveals multiple connections and mutual influences between dictionaries, uncovers differences between modern electronic and traditional printed resources, and suggests directions for developing new lexical semantic resources and improving existing ones.
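To make the headword-level comparison concrete, here is a minimal sketch of measuring word-list overlap between dictionaries. The abstract does not say which overlap measure the authors used; Jaccard similarity is an assumption, and the dictionary names and toy word lists are hypothetical.

```python
from itertools import combinations


def jaccard(a, b):
    """Jaccard similarity of two sets: |a & b| / |a | b| (0.0 if both are empty)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def pairwise_overlap(word_lists):
    """Map each unordered pair of dictionary names to the Jaccard overlap of their headwords."""
    return {
        (x, y): jaccard(word_lists[x], word_lists[y])
        for x, y in combinations(sorted(word_lists), 2)
    }


# Hypothetical toy headword lists, not data from the study.
toy = {
    "dict_a": {"дом", "кот", "лес"},
    "dict_b": {"дом", "кот", "море"},
    "dict_c": {"небо"},
}
overlaps = pairwise_overlap(toy)
for pair, score in overlaps.items():
    print(pair, round(score, 2))
```

The same function could be applied to synsets or sense inventories by changing what each set contains.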

    Men Are from Mars, Women Are from Venus: Evaluation and Modelling of Verbal Associations

    We present a quantitative analysis of human word association pairs and study the types of relations present in the associations. We focus on the correlation between response types and respondent characteristics such as occupation and gender by contrasting syntagmatic and paradigmatic associations. Finally, we propose a personalised distributed word association model and show the importance of incorporating demographic factors into the models commonly used in natural language processing. Comment: AIST 2017 camera-ready

    A Spinning Wheel for YARN: User Interface for a Crowdsourced Thesaurus

    The YARN (Yet Another RussNet) project, started in 2013, aims to create a large open thesaurus for Russian using crowdsourcing. This paper describes the synset assembly interface developed within the project: the motivation behind it, its design, usage scenarios, implementation details, and first experimental results.

    Controlled Multilingual Thesauri for Kazakh Industry-Specific Terms

    This article discusses the practical issues of compiling controlled multilingual thesauri for the purposes of industry-specific translation (IST). In multilingual, transnational and globally connected Kazakhstan, IST is a much-needed translation service. IST is an interdisciplinary field spanning terminology, computational linguistics, and translation theory and practice. Most professional guides, dictionaries and glossaries are organized alphabetically and contain multiple variants for the terms searched. Therefore, there is an urgent need to create a systematized controlled multilingual thesaurus of industry-specific Kazakh, English and Russian terms in order to provide multilingual users with an interoperable and relevant term base. Controlled multilingual thesauri for industry-specific terms are the most effective tools for describing individual subject areas. They are designed to promote communication and interaction among professionals, translators and all Automated Information System users of specific fields irrespective of their location and health conditions. Unlike traditional dictionaries, controlled thesauri allow users to identify the meaning with the help of definitions and translations, relations of terms with other concepts, and broader and narrower terms. The purpose of this research is to unify and systematize industry-specific terms in Kazakh, to provide Russian and English equivalents, and to classify the terms into essential rubrics and subjects. The major rubrics have been formulated on the basis of the Zthes data scheme, and about 10,000 Kazakh mining and metal terms approved by the Terminological Committee of Kazakhstan have been structured.

    Enlightened Romanticism: Mary Gartside’s colour theory in the age of Moses Harris, Goethe and George Field

    The aim of this paper is to evaluate the work of Mary Gartside, a British female colour theorist, active in London between 1781 and 1808. She published three books between 1805 and 1808. In chronological and intellectual terms Gartside can cautiously be regarded as an exemplary link between Moses Harris, who published a short but important theory of colour in the second half of the eighteenth century, and J.W. von Goethe’s highly influential Zur Farbenlehre, published in Germany in 1810. Gartside’s colour theory was published privately under the disguise of a traditional water colouring manual, illustrated with stunning abstract colour blots (see example above). Until well into the twentieth century, she remained the only woman known to have published a theory of colour. In contrast to Goethe and other colour theorists of the late 18th and early 19th centuries, Gartside was less inclined to follow the anti-Newtonian attitudes of the Romantic movement.

    Fighting with the Sparsity of Synonymy Dictionaries

    Graph-based synset induction methods, such as MaxMax and Watset, induce synsets by performing a global clustering of a synonymy graph. However, such methods are sensitive to the structure of the input synonymy graph: sparseness of the input dictionary can substantially reduce the quality of the extracted synsets. In this paper, we propose two different approaches designed to alleviate the incompleteness of the input dictionaries. The first one pre-processes the graph by adding missing edges, while the second one post-processes the output by merging similar synset clusters. We evaluate these approaches on two datasets for the Russian language and discuss their impact on the performance of synset induction methods. Finally, we perform an extensive error analysis of each approach and discuss prominent alternative methods for coping with the sparsity of synonymy dictionaries. Comment: In Proceedings of the 6th Conference on Analysis of Images, Social Networks, and Texts (AIST'2017): Springer Lecture Notes in Computer Science (LNCS)
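As an illustration of the first idea (densifying a sparse synonymy graph before clustering), the sketch below proposes an edge between two words that are not yet linked but share several neighbours. The abstract does not specify how the authors choose missing edges, so the common-neighbour rule, the `min_common` threshold, and the function names are all assumptions made for this example.

```python
from collections import defaultdict
from itertools import combinations


def build_graph(pairs):
    """Build an undirected synonymy graph (word -> set of neighbours) from synonym pairs."""
    graph = defaultdict(set)
    for u, v in pairs:
        graph[u].add(v)
        graph[v].add(u)
    return graph


def add_missing_edges(graph, min_common=2):
    """Return candidate edges between unlinked words sharing at least min_common neighbours.

    This heuristic is an assumption for illustration, not the method from the paper.
    """
    new_edges = set()
    for u, v in combinations(sorted(graph), 2):
        if v in graph[u]:
            continue  # already synonyms in the input dictionary
        if len(graph[u] & graph[v]) >= min_common:
            new_edges.add((u, v))
    return new_edges


# Hypothetical toy dictionary: "car" and "vehicle" are never listed together.
pairs = [
    ("car", "auto"),
    ("car", "automobile"),
    ("vehicle", "auto"),
    ("vehicle", "automobile"),
]
graph = build_graph(pairs)
added = add_missing_edges(graph)
print(sorted(added))
```

The candidate edges would then be added to the graph before running a clustering method; checking every pair of words is quadratic, so a real dictionary would need a more selective candidate search.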