    Neural Network Recognition of Russian Noun and Adjective Cases in the Google Books Ngram Corpus

    The article addresses the automatic recognition of the cases of Russian nouns and adjectives in the Google Books Ngram corpus. Recognition relies on word co-occurrence statistics extracted from the corpus: explicit word vectors, composed of the frequencies of the ordinary and syntactic bigrams that contain a given word, were fed into the recognizer. Several types of vector representation, as well as preliminary data normalization, were tested comparatively. The trained model was a multilayer perceptron with a softmax output layer. To train and test it, we selected the 50,000 adjectives and 50,000 nouns most frequently used in the Russian subcorpus of Google Books Ngram between 1920 and 2009. Parts of speech and cases were determined using the OpenCorpora electronic morphological dictionary. The trained neural network recognized cases with an accuracy of 96.45% for nouns and 99.63% for adjectives.
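The pipeline described above (bigram counts turned into an explicit word vector, normalized, then passed through a perceptron with a softmax output) can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: it uses only ordinary left/right bigrams (no syntactic bigrams), a toy hand-made count table, a log1p-plus-L2 normalization chosen as one plausible scheme, a single ReLU hidden layer, and untrained random weights. The function names, the hidden size, and the restriction to six OpenCorpora-style case tags are all hypothetical.

```python
import math
import random

# Hypothetical label set: six main cases as OpenCorpora-style tags.
# OpenCorpora also defines extra cases (e.g. voct, gen2, loc2), omitted here.
CASES = ["nomn", "gent", "datv", "accs", "ablt", "loct"]


def explicit_word_vector(word, bigram_counts, context_vocab):
    """Explicit vector for `word`: one slot per (side, context word).

    Left slots count bigrams "c word", right slots count "word c".
    Only ordinary bigrams are used in this sketch.
    """
    index = {}
    for i, c in enumerate(context_vocab):
        index[("L", c)] = i
        index[("R", c)] = len(context_vocab) + i
    vec = [0.0] * (2 * len(context_vocab))
    for (w1, w2), freq in bigram_counts.items():
        if w2 == word and ("L", w1) in index:
            vec[index[("L", w1)]] += freq
        if w1 == word and ("R", w2) in index:
            vec[index[("R", w2)]] += freq
    return vec


def normalize(vec):
    """Assumed scheme: log-scale counts, then divide by the Euclidean norm."""
    logged = [math.log1p(x) for x in vec]
    norm = math.sqrt(sum(x * x for x in logged))
    return [x / norm for x in logged] if norm > 0 else logged


def softmax(z):
    """Numerically stable softmax over a list of logits."""
    m = max(z)
    exps = [math.exp(v - m) for v in z]
    s = sum(exps)
    return [e / s for e in exps]


def mlp_forward(x, w1, b1, w2, b2):
    """One ReLU hidden layer, softmax output giving one probability per case."""
    hidden = [max(0.0, sum(w * xi for w, xi in zip(row, x)) + b)
              for row, b in zip(w1, b1)]
    logits = [sum(w * h for w, h in zip(row, hidden)) + b
              for row, b in zip(w2, b2)]
    return softmax(logits)


def random_matrix(rows, cols, rng):
    """Small random weights standing in for a trained model."""
    return [[rng.uniform(-0.1, 0.1) for _ in range(cols)] for _ in range(rows)]


if __name__ == "__main__":
    # Toy counts; real counts would be aggregated from Ngram 2-gram files.
    bigrams = {("в", "доме"): 120, ("о", "доме"): 80,
               ("доме", "было"): 30, ("дом", "был"): 50}
    context = ["в", "о", "было"]
    raw = explicit_word_vector("доме", bigrams, context)
    print(raw)  # [120.0, 80.0, 0.0, 0.0, 0.0, 30.0]
    x = normalize(raw)
    rng = random.Random(0)
    hidden_size = 4  # hypothetical
    probs = mlp_forward(x,
                        random_matrix(hidden_size, len(x), rng), [0.0] * hidden_size,
                        random_matrix(len(CASES), hidden_size, rng), [0.0] * len(CASES))
    print(dict(zip(CASES, (round(p, 3) for p in probs))))
```

Because the weights are random, the printed case probabilities only demonstrate shapes and the softmax normalization; a real recognizer would learn the weights from the labeled 50,000-noun and 50,000-adjective samples. Left context is kept separate from right context since, in Russian, a preceding preposition such as "в" or "о" is a strong cue for case.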

    Recognition of Named Entities in the Russian Subcorpus Google Books Ngram

    © 2020, Springer Nature Switzerland AG. This paper describes how to build a recognizer that identifies named entities occurring in the Google Books Ngram corpus. In previous studies, the recognizer usually received running text as input when solving the named entity recognition task. In this paper, the decision is instead based on an analysis of word co-occurrence statistics. The recognizer is a neural network whose input is a vector of the frequencies of the bigrams or syntactic bigrams that include the word under study. The task is to recognize single-word named entities, although the proposed method can be extended to two-word and multi-word named entities. On a test sample of 10,000 words free from homonymy, the recognition error probability was 2.71% (F1-score 0.963). Solving the word classification problem in Google Books Ngram will make it possible to create large dictionaries of named entities, which will improve the quality of named entity recognition in texts by existing algorithms.
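The two figures reported above, an error probability and an F1-score, are computed from the same word-level predictions. The sketch below shows how, assuming a binary task in which "is a named entity" is the positive class; the abstract does not state exactly how F1 was computed, so that choice and the function name are assumptions. The toy labels are invented and do not reproduce the paper's 2.71% and 0.963.

```python
def error_rate_and_f1(gold, predicted, positive=True):
    """Word-level error rate and F1 for a binary named-entity classifier.

    Assumes `positive` marks the named-entity class (an assumption, not
    a detail confirmed by the source).
    """
    if len(gold) != len(predicted) or not gold:
        raise ValueError("gold and predicted must be non-empty and equal length")
    pairs = list(zip(gold, predicted))
    tp = sum(1 for g, p in pairs if g == positive and p == positive)
    fp = sum(1 for g, p in pairs if g != positive and p == positive)
    fn = sum(1 for g, p in pairs if g == positive and p != positive)
    errors = sum(1 for g, p in pairs if g != p)
    error_rate = errors / len(pairs)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return error_rate, f1


if __name__ == "__main__":
    gold = [True, True, False, False, True]       # toy reference labels
    predicted = [True, False, False, True, True]  # toy classifier output
    print(error_rate_and_f1(gold, predicted))
```

Here two of five words are misclassified (error rate 0.4), and precision and recall are both 2/3, so F1 is also 2/3. Error rate counts mistakes on both classes, while F1 ignores correctly rejected non-entities, which is why the two numbers are reported separately.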