14 research outputs found

    Development of an Algorithm for Keyword Search in a Kazakh-Language Text Corpus

    Get PDF
    The issue of semantic text analysis occupies a special place in computational linguistics. Researchers in this field are increasingly interested in developing an algorithm that improves the quality of text corpus processing and the probabilistic determination of text content. A study of the methods, approaches, and algorithms applied to semantic text analysis in international and Kazakhstani computational linguistics led to the development of an algorithm for keyword search in Kazakh text. The first step of the algorithm was to compile a reference dictionary of keywords for the Kazakh-language text corpus. This was done by applying the Porter stemming algorithm to the corpus. The stemmer made it possible to extract unique word stems and obtain a reference dictionary, which was subsequently indexed. The next step is to collect training data from the text corpus. To calculate the degree of semantic proximity between words, each word is assigned a vector of its corresponding word forms in the reference dictionary, which yields a pair consisting of a keyword and a vector. The last step of the algorithm is neural network training. Training uses error backpropagation, which allows a semantic analysis of the text corpus and yields a probabilistic number of words close to the expected number of keywords. This process automates the processing of text material by creating digital learning models of keywords. The algorithm is used to develop a neurocomputer system that will automatically check the written work of online learners. The uniqueness of the keyword search algorithm lies in its use of neural network training for texts in the Kazakh language.
In Kazakhstan, researchers in computational linguistics have conducted a number of studies based on morphological analysis, lemmatization, and other approaches, and have implemented linguistic tools (mainly translation dictionaries). The use of neural network training for parsing the Kazakh language remains an open issue in Kazakhstani science. The developed algorithm addresses one of the problems of effective semantic analysis of text in the Kazakh language.

    Development of a Thematic and Neural Network Model for Data Learning

    No full text
    Research in the field of semantic text analysis begins with the study of the structure of natural language. The Kazakh language is distinctive in that it is agglutinative, and it requires careful study. The object of this study is text in the Kazakh language. Existing approaches to the semantic analysis of Kazakh text do not consider text analysis using topic modeling and neural network training. The purpose of this study is to determine the quality of a topic model based on the LDA (Latent Dirichlet Allocation) method with Gibbs sampling, through neural network training. The LDA model can determine the semantic probability of the keywords of a single document and give them a rating score. The neural network was built on LSTM, a widely used architecture that has proven itself in NLP (Natural Language Processing) tasks. The training results show how well the model learned from the text and how the semantic analysis of the Kazakh text performed. The system, developed on the basis of the LDA model and neural network training, combines the detected keywords into separate topics. Overall, the experimental results showed that deep neural networks give the expected results for the quality of the LDA model in processing the Kazakh language. The developed neural network model helps assess the accuracy of the semantics of the Kazakh-language text used. The results obtained can be applied in systems for processing text data, for example, when checking that the topic and content of submitted texts (abstracts, term papers, theses, and other works) correspond.
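
For readers unfamiliar with LDA fitted by Gibbs sampling, a minimal collapsed Gibbs sampler is sketched here. It is not the authors' implementation: the function name `lda_gibbs`, the hyperparameters `alpha` and `beta`, the iteration count, and the toy documents are all assumptions, and the LSTM-based quality assessment from the abstract is not covered. Documents are lists of integer word ids.

```python
import random


def lda_gibbs(docs, num_topics, vocab_size, iterations=50,
              alpha=0.1, beta=0.01, seed=0):
    """Collapsed Gibbs sampling for LDA; returns the two count matrices."""
    rng = random.Random(seed)
    doc_topic = [[0] * num_topics for _ in docs]              # tokens per (doc, topic)
    topic_word = [[0] * vocab_size for _ in range(num_topics)]  # tokens per (topic, word)
    topic_total = [0] * num_topics                            # tokens per topic
    assignments = []

    # Random initial topic for every token.
    for d, doc in enumerate(docs):
        z_doc = []
        for w in doc:
            z = rng.randrange(num_topics)
            z_doc.append(z)
            doc_topic[d][z] += 1
            topic_word[z][w] += 1
            topic_total[z] += 1
        assignments.append(z_doc)

    for _ in range(iterations):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                # Remove the token's current assignment from the counts.
                z = assignments[d][i]
                doc_topic[d][z] -= 1
                topic_word[z][w] -= 1
                topic_total[z] -= 1
                # Unnormalized conditional probability of each topic.
                weights = [
                    (doc_topic[d][k] + alpha)
                    * (topic_word[k][w] + beta)
                    / (topic_total[k] + vocab_size * beta)
                    for k in range(num_topics)
                ]
                # Resample the topic and restore the counts.
                z = rng.choices(range(num_topics), weights=weights)[0]
                assignments[d][i] = z
                doc_topic[d][z] += 1
                topic_word[z][w] += 1
                topic_total[z] += 1
    return doc_topic, topic_word


# Hypothetical corpus of three short documents over a five-word vocabulary.
docs = [[0, 1, 0, 2], [3, 4, 3], [0, 3]]
doc_topic, topic_word = lda_gibbs(docs, num_topics=2, vocab_size=5, iterations=20)
```

The rows of `topic_word` give, per topic, how often each word id was assigned to it; the highest counts in a row are the candidate keywords for that topic.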

    Development of the Algorithm of Keyword Search in the Kazakh Language Text Corpus

    Full text link
    The issue of semantic text analysis occupies a special place in computational linguistics. Researchers in this field are increasingly interested in developing an algorithm that improves the quality of text corpus processing and the probabilistic determination of text content. A study of the methods, approaches, and algorithms applied to semantic text analysis in international and Kazakhstani computational linguistics led to the development of an algorithm for keyword search in Kazakh text. The first step of the algorithm was to compile a reference dictionary of keywords for the Kazakh-language text corpus. This was done by applying the Porter stemming algorithm to the corpus. The stemmer made it possible to extract unique word stems and obtain a reference dictionary, which was subsequently indexed. The next step is to collect training data from the text corpus. To calculate the degree of semantic proximity between words, each word is assigned a vector of its corresponding word forms in the reference dictionary, which yields a pair consisting of a keyword and a vector. The last step of the algorithm is neural network training. Training uses error backpropagation, which allows a semantic analysis of the text corpus and yields a probabilistic number of words close to the expected number of keywords. This process automates the processing of text material by creating digital learning models of keywords. The algorithm is used to develop a neurocomputer system that will automatically check the written work of online learners. The uniqueness of the keyword search algorithm lies in its use of neural network training for texts in the Kazakh language.
In Kazakhstan, researchers in computational linguistics have conducted a number of studies based on morphological analysis, lemmatization, and other approaches, and have implemented linguistic tools (mainly translation dictionaries). The use of neural network training for parsing the Kazakh language remains an open issue in Kazakhstani science. The developed algorithm addresses one of the problems of effective semantic analysis of text in the Kazakh language.

    Distribution of influenza virus types by age using case-based global surveillance data from twenty-nine countries, 1999-2014

    No full text
    Background: Influenza disease burden varies by age, and this has important public health implications. We compared the proportional distribution of different influenza virus types within age strata using surveillance data from twenty-nine countries during 1999-2014 (N=358,796 influenza cases).

    Temporal Patterns of Influenza A and B in Tropical and Temperate Countries: What Are the Lessons for Influenza Vaccination?

    Get PDF
    Introduction: Determining the optimal time to vaccinate is important for influenza vaccination programmes. Here, we assessed the temporal characteristics of influenza epidemics in the Northern and Southern hemispheres and in the tropics, and discuss their implications for vaccination programmes. Methods: This was a retrospective analysis of surveillance data between 2000 and 2014 from the Global Influenza B Study database. The seasonal peak of influenza was defined as the week with the most reported cases (overall, A, and B) in the season. The duration of seasonal activity was assessed using the maximum proportion of influenza cases during three consecutive months and the minimum number of months with ≥80% of cases in the season. We also assessed whether co-circulation of A and B virus types affected the duration of influenza epidemics. Results: 212 influenza seasons and 571,907 cases were included from 30 countries. In tropical countries, the seasonal influenza activity lasted longer and the peaks of influenza A and B coincided less frequently than in temperate countries. Temporal characteristics of influenza epidemics were heterogeneous in the tropics, with distinct seasonal epidemics observed only in some countries. Seasons with co-circulation of influenza A and B were longer than influenza A seasons, especially in the tropics. Discussion: Our findings show that influenza seasonality is less well defined in the tropics than in temperate regions. This has important implications for vaccination programmes in these countries. High-quality influenza surveillance systems are needed in the tropics to enable decisions about when to vaccinate.
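
The peak and duration measures defined in the Methods can be sketched as follows. This is an illustration under stated assumptions, not the study's code: the function names are hypothetical, the series is monthly where the abstract defines the peak by week, the three-month window does not wrap around the end of the season, and the "minimum number of months" is taken over any months (largest first), since the abstract does not say whether they must be consecutive. The case counts are made up.

```python
def peak_index(cases):
    """Position of the period with the most reported cases (first one on ties)."""
    return max(range(len(cases)), key=cases.__getitem__)


def max_three_month_share(monthly_cases):
    """Largest share of the season's cases falling in three consecutive months."""
    total = sum(monthly_cases)
    if total == 0:
        raise ValueError("season has no cases")
    best = max(sum(monthly_cases[i:i + 3]) for i in range(len(monthly_cases) - 2))
    return best / total


def min_months_for_share(monthly_cases, percent=80):
    """Fewest months whose cases together reach the given percentage of the season."""
    total = sum(monthly_cases)
    if total == 0:
        raise ValueError("season has no cases")
    running = 0
    for count, cases in enumerate(sorted(monthly_cases, reverse=True), start=1):
        running += cases
        # Integer comparison avoids floating-point rounding at the threshold.
        if running * 100 >= percent * total:
            return count
    return len(monthly_cases)


# Made-up season of 100 cases concentrated in months 2 to 5.
season = [5, 10, 40, 30, 10, 5, 0, 0, 0, 0, 0, 0]
peak = peak_index(season)
three_month_share = max_three_month_share(season)
months_needed = min_months_for_share(season)
```

A short, sharply peaked season gives a high three-month share and a small month count; a long or diffuse season, as the abstract reports for the tropics, gives the opposite.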

    Distribution of influenza virus types by age using case-based global surveillance data from twenty-nine countries, 1999-2014

    Get PDF
    BACKGROUND: Influenza disease burden varies by age, and this has important public health implications. We compared the proportional distribution of different influenza virus types within age strata using surveillance data from twenty-nine countries during 1999-2014 (N=358,796 influenza cases). METHODS: For each virus, we calculated a Relative Illness Ratio (defined as the ratio of the percentage of cases in an age group to the percentage of the country population in the same age group) for young children (0-4 years), older children (5-17 years), young adults (18-39 years), older adults (40-64 years), and the elderly (65+ years). We used random-effects meta-analysis models to obtain summary relative illness ratios (sRIRs), and conducted meta-regression and sub-group analyses to explore causes of between-estimates heterogeneity. RESULTS: The influenza virus with the highest sRIR was A(H1N1) for young children, B for older children, A(H1N1)pdm2009 for adults, and A(H3N2) for the elderly. As expected, considering the diverse nature of the national surveillance datasets included in our analysis, between-estimates heterogeneity was high (I² > 90%) for most sRIRs. The variations in countries’ geographic, demographic and economic characteristics and the proportion of outpatients among reported influenza cases explained only part of the heterogeneity, suggesting that multiple factors were at play. CONCLUSIONS: These results highlight the importance of presenting burden of disease estimates by age group and virus (sub)type.
Table S1. Number of influenza cases caused by the different influenza viruses that were included in the analysis. The Global Influenza B Study, 1999-2014. Figure S1. Forest plot of the Relative Illness Ratio for patients aged 0-4 years infected with A(H1N1) influenza virus. The Global Influenza B Study, 1999-2014. Figure S2. Forest plot of the Relative Illness Ratio for patients aged 5-17 years infected with A(H1N1) influenza virus. The Global Influenza B Study, 1999-2014. Figure S3. Forest plot of the Relative Illness Ratio for patients aged 18-39 years infected with A(H1N1) influenza virus. The Global Influenza B Study, 1999-2014. Figure S4. Forest plot of the Relative Illness Ratio for patients aged 40-64 years infected with A(H1N1) influenza virus. The Global Influenza B Study, 1999-2014. Figure S5. Forest plot of the Relative Illness Ratio for patients aged 65+ years infected with A(H1N1) influenza virus. The Global Influenza B Study, 1999-2014. Table S2. Summary Relative Illness Ratio (sRIR), 95% confidence intervals (95% CI) across age groups and influenza viruses by categories of country ageing index. The Global Influenza B Study, 1999-2014. Table S3. Summary Relative Illness Ratio (sRIR), 95% confidence intervals (95% CI) across age groups and influenza viruses by percentage of outpatients among cases reported to the influenza surveillance system. The Global Influenza B Study, 1999-2014. Table S4. Summary Relative Illness Ratio (sRIR), 95% confidence intervals (95% CI) across age groups and influenza viruses by country latitude. The Global Influenza B Study, 1999-2014. Table S5. Summary Relative Illness Ratio (sRIR), 95% confidence intervals (95% CI) across age groups and influenza viruses by percentage of influenza cases caused by that influenza virus in the same season. The Global Influenza B Study, 1999-2014. Table S6. Summary Relative Illness Ratio (sRIR), 95% confidence intervals (95% CI) across age groups and influenza viruses by percentage of influenza cases caused by that influenza virus in the previous season. The Global Influenza B Study, 1999-2014. Table S7. Summary Relative Illness Ratio (sRIR), 95% confidence intervals (95% CI) across age groups and influenza viruses by categories of country gross domestic product (GDP) per capita. The Global Influenza B Study, 1999-2014.
The Global Influenza B Study is funded by an unrestricted research grant from Sanofi Pasteur. https://bmcinfectdis.biomedcentral.com
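
The Relative Illness Ratio defined in the Methods can be computed as in the sketch below. The function name and all counts are hypothetical and chosen only to make the arithmetic easy to follow; the random-effects pooling into sRIRs is not shown.

```python
def relative_illness_ratio(cases_by_age, population_by_age):
    """RIR per age group: share of cases divided by share of population.

    Shares are used instead of percentages; the factor of 100 cancels in the ratio.
    """
    total_cases = sum(cases_by_age.values())
    total_population = sum(population_by_age.values())
    ratios = {}
    for group, cases in cases_by_age.items():
        case_share = cases / total_cases
        population_share = population_by_age[group] / total_population
        ratios[group] = case_share / population_share
    return ratios


# Made-up counts for one virus in one country, using the abstract's age groups.
cases = {"0-4": 30, "5-17": 20, "18-39": 20, "40-64": 20, "65+": 10}
population = {"0-4": 10, "5-17": 20, "18-39": 30, "40-64": 30, "65+": 10}
rir = relative_illness_ratio(cases, population)
```

In this made-up example the 0-4 group holds 30% of cases but 10% of the population, so its ratio is about 3; a ratio above 1 means the group is over-represented among cases relative to its population share, and a ratio below 1 means it is under-represented.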

    Influenza cases reported to the national influenza surveillance system by each participating country (from southern- to northern-most) and percentages of cases due to influenza type B virus.

    No full text
    Influenza cases reported to the national influenza surveillance system by each participating country (from southern- to northern-most) and percentages of cases due to influenza type B virus.

    Mean percentage of influenza cases by month (black diamonds) and number of times the peak of the influenza season took place in each month (pink squares) for countries in the inter-tropical belt.

    No full text
    Mean percentage of influenza cases by month (black diamonds) and number of times the peak of the influenza season took place in each month (pink squares) for countries in the inter-tropical belt.

    Mean percentage of influenza cases by month (black diamonds) and number of times the peak of the influenza season took place in each month (pink squares) for countries in the Southern hemisphere.

    No full text
    Mean percentage of influenza cases by month (black diamonds) and number of times the peak of the influenza season took place in each month (pink squares) for countries in the Southern hemisphere.