14 research outputs found
Development of the Algorithm of Keyword Search in the Kazakh Language Text Corpus
The issue of semantic text analysis occupies a special place in computational linguistics. Researchers in this field have a strong interest in developing an algorithm that improves the quality of text corpus processing and the probabilistic determination of text content. A review of the methods, approaches, and algorithms for semantic text analysis in international and Kazakhstani computational linguistics led to the development of a keyword search algorithm for Kazakh-language text. The first step of the algorithm is to compile a reference dictionary of keywords for the Kazakh text corpus; this is achieved by applying a Porter-style stemming algorithm to the corpus. The stemmer extracts unique word stems and produces a reference dictionary, which is then indexed. The next step is to collect training data from the text corpus: to calculate the degree of semantic proximity between words, each word is assigned a vector of the corresponding word forms from the reference dictionary, yielding keyword-vector pairs. The last step of the algorithm is neural network training. Training uses error backpropagation, which enables semantic analysis of the text corpus and produces a probabilistic set of words close to the expected set of keywords. This process automates the processing of text material by creating digital trained models of keywords. The algorithm is used to develop a neurocomputer system that will automatically check the written work of online learners. The algorithm is unique in applying neural network training to texts in the Kazakh language.
In Kazakhstan, scientists in the field of computational linguistics have conducted a number of studies based on morphological analysis, lemmatization, and other approaches, and have implemented linguistic tools (mainly translation dictionaries). The application of neural network training to parsing the Kazakh language remains an open issue in Kazakhstani science. The developed algorithm addresses one of the problems of effective semantic analysis of Kazakh-language text.
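The pipeline described in the abstract above (stemming the corpus into an indexed reference dictionary, building word-form vectors, then training with backpropagation) can be sketched roughly as follows. This is an illustrative toy, not the authors' implementation: the suffix list, corpus, and single-unit network are assumptions made for the example.

```python
import math

# Toy suffix stripper standing in for the Porter-style stemmer the authors
# adapted to Kazakh; this suffix list is a made-up illustration.
SUFFIXES = ("lar", "ler", "dar", "der", "tar", "ter", "da", "de")

def stem(word):
    for s in sorted(SUFFIXES, key=len, reverse=True):
        if word.endswith(s) and len(word) - len(s) >= 3:
            return word[: -len(s)]
    return word

def build_reference_dictionary(tokens):
    # Unique stems, indexed -- the "reference dictionary" step.
    return {s: i for i, s in enumerate(sorted({stem(t) for t in tokens}))}

def vectorize(tokens, index):
    # Each text becomes a vector of word-form counts over the dictionary.
    v = [0.0] * len(index)
    for t in tokens:
        s = stem(t)
        if s in index:
            v[index[s]] += 1.0
    return v

def train_keyword_scorer(pairs, epochs=200, lr=0.5):
    # A single sigmoid unit trained by error backpropagation;
    # pairs = [(vector, 1.0 if the text matches the keyword else 0.0), ...].
    n = len(pairs[0][0])
    w, b = [0.0] * n, 0.0
    for _ in range(epochs):
        for x, y in pairs:
            z = sum(wi * xi for wi, xi in zip(w, x)) + b
            p = 1.0 / (1.0 + math.exp(-z))
            g = p - y  # dLoss/dz for cross-entropy loss
            w = [wi - lr * g * xi for wi, xi in zip(w, x)]
            b -= lr * g
    return w, b

def score(w, b, x):
    z = sum(wi * xi for wi, xi in zip(w, x)) + b
    return 1.0 / (1.0 + math.exp(-z))
```

The real system would replace the toy stemmer with the full Kazakh suffix rules and the single unit with a multi-layer network, but the dictionary-index-vector-train flow is the same.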
Development of a Topic Model and a Neural Network Model for Data Training
Research in the field of semantic text analysis begins with the study of the structure of natural language. The Kazakh language is unique in that it belongs to the agglutinative languages and requires careful study. The object of this study is text in the Kazakh language. Existing approaches to the semantic analysis of Kazakh-language text do not consider analysis using topic modeling methods combined with neural network training. The purpose of this study is to assess, through neural network training, the quality of a topic model based on the LDA (Latent Dirichlet Allocation) method with Gibbs sampling. The LDA model can determine the semantic probability of the keywords of a single document and assign them rating scores. To build the neural network, the widely used LSTM architecture was chosen, as it has proven itself well in NLP (Natural Language Processing) tasks. The training results show how well the model fit the text and how the semantic analysis of the Kazakh-language text proceeded. The system, developed on the basis of the LDA model and neural network training, groups the detected keywords into separate topics. Overall, the experimental results showed that deep neural networks yield the expected quality for the LDA model when processing the Kazakh language. The developed neural network model helps assess the semantic accuracy of the analyzed Kazakh-language text. The results can be applied in text data processing systems, for example, when checking whether the topic and content of submitted texts (abstracts, term papers, theses, and other works) match.
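The abstract above combines LDA with Gibbs sampling and an LSTM. The LSTM part is not shown here, but the core of LDA with collapsed Gibbs sampling can be sketched in a few dozen lines; the corpus, hyperparameters, and iteration count below are illustrative assumptions, not the paper's actual configuration.

```python
import random

def lda_gibbs(docs, n_topics, n_iter=200, alpha=0.1, beta=0.01, seed=0):
    """Collapsed Gibbs sampling for LDA over tokenized documents."""
    rng = random.Random(seed)
    vocab = sorted({w for d in docs for w in d})
    w2i = {w: i for i, w in enumerate(vocab)}
    V = len(vocab)
    # Count matrices: topic-word, document-topic, and per-topic totals.
    nkw = [[0] * V for _ in range(n_topics)]
    ndk = [[0] * n_topics for _ in docs]
    nk = [0] * n_topics
    z = []  # topic assignment per token
    for d, doc in enumerate(docs):
        zs = []
        for w in doc:
            k = rng.randrange(n_topics)
            zs.append(k)
            nkw[k][w2i[w]] += 1
            ndk[d][k] += 1
            nk[k] += 1
        z.append(zs)
    for _ in range(n_iter):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                k, wi = z[d][i], w2i[w]
                nkw[k][wi] -= 1; ndk[d][k] -= 1; nk[k] -= 1
                # Full conditional p(z = t | all other assignments)
                weights = [
                    (ndk[d][t] + alpha) * (nkw[t][wi] + beta) / (nk[t] + V * beta)
                    for t in range(n_topics)
                ]
                r = rng.random() * sum(weights)
                acc = 0.0
                for t, wt in enumerate(weights):
                    acc += wt
                    if r <= acc:
                        k = t
                        break
                z[d][i] = k
                nkw[k][wi] += 1; ndk[d][k] += 1; nk[k] += 1

    def top_words(t, n=3):
        order = sorted(range(V), key=lambda i: -nkw[t][i])
        return [vocab[i] for i in order[:n]]

    return ndk, top_words
```

The document-topic counts `ndk` give each document's topic mixture, and `top_words` returns a topic's highest-scoring keywords, which is the grouping of detected keywords into topics that the system performs.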
Development of the Algorithm of Keyword Search in the Kazakh Language Text Corpus
The issue of semantic text analysis occupies a special place in computational linguistics. Researchers in this field have an increased interest in developing an algorithm that will improve the quality of text corpus processing and probabilistic determination of text content. The results of the study on the application of methods, approaches, algorithms for semantic text analysis in computational linguistics in International and Kazakhstan science led to the development of an algorithm of keyword search in a Kazakh text. The first step of the algorithm was to compile a reference dictionary of keywords for the Kazakh language text corpus. The solution to this problem was to apply the Porter (stemmer) algorithm for the Kazakh language text corpus. The implementation of the stemmer allowed highlighting unique word stems and getting a reference dictionary, which was subsequently indexed. The next step is to collect learning data from the text corpus. To calculate the degree of semantic proximity between words, each word is assigned a vector of the corresponding word forms of the reference dictionary, which results in a pair of a keyword and a vector. And the last step of the algorithm is neural network learning. During learning, the error backpropagation method is used, which allows a semantic analysis of the text corpus and obtaining a probabilistic number of words close to the expected number of keywords. This process automates the processing of text material by creating digital learning models of keywords. The algorithm is used to develop a neurocomputer system that will automatically check the text works of online learners. The uniqueness of the keyword search algorithm is the use of neural network learning for texts in the Kazakh language. 
In Kazakhstan, scientists in the field of computational linguistics conducted a number of studies based on morphological analysis, lemmatization and other approaches and implemented linguistic tools (mainly translation dictionaries). The scope of neural network learning for parsing of the Kazakh language remains an open issue in the Kazakhstan science.The developed algorithm involves solving one of the problems of effective semantic analysis of the text in the Kazakh languag
Temporal Patterns of Influenza A and B in Tropical and Temperate Countries: What Are the Lessons for Influenza Vaccination?
Introduction: Determining the optimal time to vaccinate is important for influenza vaccination programmes. Here, we assessed the temporal characteristics of influenza epidemics in the Northern and Southern hemispheres and in the tropics, and discuss their implications for vaccination programmes.
Methods: This was a retrospective analysis of surveillance data between 2000 and 2014 from the Global Influenza B Study database. The seasonal peak of influenza was defined as the week with the most reported cases (overall, A, and B) in the season. The duration of seasonal activity was assessed using the maximum proportion of influenza cases during three consecutive months and the minimum number of months with ≥80% of cases in the season. We also assessed whether co-circulation of A and B virus types affected the duration of influenza epidemics.
Results: 212 influenza seasons and 571,907 cases were included from 30 countries. In tropical countries, the seasonal influenza activity lasted longer and the peaks of influenza A and B coincided less frequently than in temperate countries. Temporal characteristics of influenza epidemics were heterogeneous in the tropics, with distinct seasonal epidemics observed only in some countries. Seasons with co-circulation of influenza A and B were longer than influenza A seasons, especially in the tropics.
Discussion: Our findings show that influenza seasonality is less well defined in the tropics than in temperate regions. This has important implications for vaccination programmes in these countries. High-quality influenza surveillance systems are needed in the tropics to enable decisions about when to vaccinate.
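The two timing measures defined in the Methods (the peak week, and the duration of seasonal activity) are simple to compute. A minimal sketch, with made-up counts and the ≥80% reading interpreted as the fewest months that together hold that share of cases:

```python
def peak_week(cases_by_week):
    """Week with the most reported cases (the paper's peak definition)."""
    return max(cases_by_week, key=cases_by_week.get)

def max_three_month_share(monthly_cases):
    """Maximum proportion of the season's cases falling in any three
    consecutive months; monthly_cases has 12 counts (Jan..Dec) and the
    window wraps around the year end for Southern-hemisphere seasons."""
    total = sum(monthly_cases)
    return max(
        sum(monthly_cases[(i + j) % 12] for j in range(3)) / total
        for i in range(12)
    )

def min_months_for_share(monthly_cases, share=0.8):
    """Minimum number of months that together contain >=80% of the
    season's cases, taking the busiest months first."""
    total = sum(monthly_cases)
    need = share * total
    acc, n = 0, 0
    for c in sorted(monthly_cases, reverse=True):
        acc += c
        n += 1
        if acc >= need:
            return n
    return n
```

A sharply peaked season yields a high three-month share and few months needed for 80% of cases; the flatter tropical seasons described in the Results would score lower on both.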
Distribution of influenza virus types by age using case-based global surveillance data from twenty-nine countries, 1999-2014
Background: Influenza disease burden varies by age and this has important public health implications. We compared the proportional distribution of different influenza virus types within age strata using surveillance data from twenty-nine countries during 1999-2014 (N=358,796 influenza cases).
Methods: For each virus, we calculated a Relative Illness Ratio (defined as the ratio of the percentage of cases in an age group to the percentage of the country population in the same age group) for young children (0-4 years), older children (5-17 years), young adults (18-39 years), older adults (40-64 years), and the elderly (65+ years). We used random-effects meta-analysis models to obtain summary relative illness ratios (sRIRs), and conducted meta-regression and sub-group analyses to explore causes of between-estimates heterogeneity.
Results: The influenza virus with the highest sRIR was A(H1N1) for young children, B for older children, A(H1N1)pdm2009 for adults, and A(H3N2) for the elderly. As expected, considering the diverse nature of the national surveillance datasets included in our analysis, between-estimates heterogeneity was high (I² > 90%) for most sRIRs. The variations of countries' geographic, demographic and economic characteristics and the proportion of outpatients among reported influenza cases explained only part of the heterogeneity, suggesting that multiple factors were at play.
Conclusions: These results highlight the importance of presenting burden of disease estimates by age group and virus (sub)type.
Supplementary material (The Global Influenza B Study, 1999-2014):
- Table S1. Number of influenza cases caused by the different influenza viruses that were included in the analysis.
- Figures S1-S5. Forest plots of the Relative Illness Ratio for patients aged 0-4, 5-17, 18-39, 40-64, and 65+ years infected with A(H1N1) influenza virus.
- Table S2. Summary Relative Illness Ratio (sRIR) and 95% confidence intervals (95% CI) across age groups and influenza viruses by categories of country ageing index.
- Table S3. sRIR and 95% CI across age groups and influenza viruses by percentage of outpatients among cases reported to the influenza surveillance system.
- Table S4. sRIR and 95% CI across age groups and influenza viruses by country latitude.
- Table S5. sRIR and 95% CI across age groups and influenza viruses by percentage of influenza cases caused by that influenza virus in the same season.
- Table S6. sRIR and 95% CI across age groups and influenza viruses by percentage of influenza cases caused by that influenza virus in the previous season.
- Table S7. sRIR and 95% CI across age groups and influenza viruses by categories of country gross domestic product (GDP) per capita.
The Global Influenza B Study is funded by an unrestricted research grant from Sanofi Pasteur.
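The Relative Illness Ratio defined in the Methods is a one-line calculation; a minimal sketch with made-up case and population counts:

```python
def relative_illness_ratio(cases_in_group, total_cases, pop_in_group, total_pop):
    """RIR = (% of cases in an age group) / (% of population in that group).
    RIR > 1 means the age group is over-represented among cases relative
    to its share of the population."""
    pct_cases = 100.0 * cases_in_group / total_cases
    pct_pop = 100.0 * pop_in_group / total_pop
    return pct_cases / pct_pop
```

For example, if an age group contributes 30% of cases but only 10% of the population, its RIR is 3.0; pooling such country-level RIRs with a random-effects model then gives the sRIRs reported above.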
Influenza cases reported to the national influenza surveillance system by each participating country (from southern- to northern-most) and percentages of cases due to influenza type B virus.
Mean percentage of influenza cases by month (black diamonds) and number of times the peak of the influenza season took place in each month (pink squares) for countries in the inter-tropical belt.
Mean percentage of influenza cases by month (black diamonds) and number of times the peak of the influenza season took place in each month (pink squares) for countries in the Southern hemisphere.