21 research outputs found

    Corpora in Text-Based Russian Studies

    This chapter focuses on textual data that are collected for a specific purpose, which are usually referred to as corpora. Scholars use corpora to examine existing instances of a certain phenomenon or to conduct systematic quantitative analyses of occurrences, which in turn reflect habits, attitudes, opinions, or trends. For these contexts, it is extremely useful to combine different approaches. For example, a linguist might analyze the frequency of a certain buzzword, whereas a scholar in the political, cultural, or sociological sciences might attempt to explain the change in language usage from the data in question. Peer reviewed.

    YARN begins

    YARN (Yet Another RussNet) is a work in progress on developing a large, open, WordNet-like thesaurus for Russian. The project's main feature is a wiki-style approach to populating and editing the resource. The paper reports on the linguistic design, development and organizational principles, and interchange format of YARN, as well as the practical steps planned for the near future. The research is supported by the Russian Foundation for the Humanities (RGNF; project No. 13-04-12020, "A New Open Electronic Thesaurus of the Russian Language"). We thank the members of the yarn_org group for their activity, comments, and suggestions. Andrey Krizhanovsky's work was partially supported by the Russian Foundation for Basic Research (RFBR; projects No. 11-01-00251, No. 12-01-00481, No. 12-07-00070) and RGNF (project No. 12-04-12062). The work of Olga Lyashevskaya and Anastasia Bonch-Osmolovskaya reflects the results of research supported by the Basic Research Program of the National Research University Higher School of Economics (2013), project "Corpus Technologies in Linguistic and Interdisciplinary Research". Pavel Braslavski thanks the GermaNet development team led by Prof. Erhard Hinrichs at the University of Tübingen for their hospitality, fruitful discussion of the project, and exchange of experience, and thanks the MUMIA Network for funding the visit to Tübingen under the Short Term Scientific Missions (STSM) program.

    Predicting Russian aspect by frequency across genres

    We ask whether the aspect of individual verbs can be predicted from the statistical distribution of their inflectional forms, and how this is influenced by genre. To address these questions, we present an analysis of the “grammatical profiles” (relative frequency distributions of inflectional forms) of three samples of verbs extracted from the Russian National Corpus, representing three genres: Journalistic prose, Fiction, and Scientific-Technical prose. We find that the aspect of a given verb can be correctly predicted from the distribution of its forms alone with an average accuracy of 92.7%. Remarkably, this accuracy is statistically indistinguishable from the accuracy of predicting aspect from morphological marking. We maintain that first language learners could use distributional tendencies, in addition to morphological and other cues (for example, semantic and syntactic cues), in acquiring the verbal category of aspect in Russian.
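
The abstract defines a grammatical profile as the relative frequency distribution of a verb's inflectional forms. A minimal sketch of that computation follows. The function name, the tag labels, and the sample data are hypothetical and are not taken from the paper or from the Russian National Corpus; the paper's actual tag set and prediction model are not described here.

```python
from collections import Counter


def grammatical_profile(form_tags):
    """Return the relative frequency of each inflectional form tag.

    form_tags: iterable of tags, one per corpus occurrence of a single verb.
    """
    counts = Counter(form_tags)
    total = sum(counts.values())
    if total == 0:
        return {}
    return {tag: n / total for tag, n in counts.items()}


# Hypothetical tagged occurrences of one verb (illustrative labels only).
observed = [
    "past", "past", "infinitive", "nonpast",
    "past", "imperative", "infinitive", "past",
]
profile = grammatical_profile(observed)
```

Profiles built this way for many verbs could then serve as feature vectors for any standard classifier; which classifier the authors used is not stated in the abstract.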

    Evaluation tracks on plagiarism detection algorithms for the Russian language

    The paper presents a methodology and preliminary results for evaluating plagiarism detection algorithms for the Russian language. We describe the goals and tasks of the PlagEvalRus workshop, dataset creation, evaluation setup, metrics, and results.
