8 research outputs found

    Clusterization by the K-means method when K is unknown

    Many methods for clustering objects are used across different areas of machine learning, and K-means is among the most popular. The method has both advantages and disadvantages. Its main advantage is the relatively high speed at which it clusters objects. Its main disadvantage is that the number of clusters must be known before the experiment. This paper describes a new clustering method based on K-means. The proposed method is also fast, but it does not require the user to know the exact number of clusters in advance: the user only has to define the range within which the number of clusters lies. In addition, the proposed method makes it possible to limit the radius of clusters, which allows finding the objects that express the criteria of a cluster most distinctly and accurately, and to limit the number of objects in each cluster to a given range.
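    The abstract does not say how the number of clusters is picked inside the user-supplied range, so the following is only a minimal sketch of the general idea, not the authors' algorithm. It assumes one possible rule: run plain K-means for each K in the range and keep the smallest K whose clusters all fit within a radius cap. The names `choose_k`, `k_min`, `k_max` and `r_max`, the farthest-point initialization and the sample points are all illustrative assumptions.

```python
import math


def assign(points, centroids):
    """Group every point with its nearest centroid (Euclidean distance)."""
    clusters = [[] for _ in centroids]
    for p in points:
        nearest = min(range(len(centroids)), key=lambda i: math.dist(p, centroids[i]))
        clusters[nearest].append(p)
    return clusters


def kmeans(points, k, iters=50):
    """Plain Lloyd's K-means with a deterministic farthest-point initialization."""
    centroids = [points[0]]
    while len(centroids) < k:
        # Next seed: the point farthest from its nearest existing seed.
        centroids.append(max(points, key=lambda p: min(math.dist(p, c) for c in centroids)))
    for _ in range(iters):
        clusters = assign(points, centroids)
        updated = [
            tuple(sum(axis) / len(members) for axis in zip(*members)) if members else centroids[i]
            for i, members in enumerate(clusters)
        ]
        if updated == centroids:  # converged
            break
        centroids = updated
    return centroids, assign(points, centroids)


def max_radius(centroids, clusters):
    """Largest distance between any point and the centroid of its cluster."""
    return max(math.dist(p, c) for c, members in zip(centroids, clusters) for p in members)


def choose_k(points, k_min, k_max, r_max):
    """Assumed rule: smallest K in [k_min, k_max] whose clusters all have radius <= r_max.

    Falls back to the k_max result if no K in the range satisfies the cap.
    """
    if not 1 <= k_min <= k_max <= len(points):
        raise ValueError("need 1 <= k_min <= k_max <= number of points")
    for k in range(k_min, k_max + 1):
        centroids, clusters = kmeans(points, k)
        if max_radius(centroids, clusters) <= r_max:
            break
    return k, centroids, clusters


# Made-up 2-D data: three tight groups of three points each.
points = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10), (20, 0), (20, 1), (21, 0)]
k, centroids, clusters = choose_k(points, k_min=2, k_max=5, r_max=2.0)
print(k, [len(c) for c in clusters])
```

    With K = 2 two of the groups are merged and the radius cap is violated, so the search moves on to K = 3. The limit on the number of objects per cluster mentioned in the abstract is not modeled here.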

    Voice Identification Using Classification Algorithms

    This article discusses classification algorithms for identifying a person by voice using machine learning methods. The MFCC algorithm was used for speech preprocessing. To solve the problem, a comparative analysis of five classification algorithms was carried out. In the first experiment, the support vector machine (0.90) and the multilayer perceptron (0.83) showed the best results. In the second experiment, a multilayer perceptron combined with the Robust scaler method reached an accuracy of 0.93 for person identification. A multilayer perceptron can therefore be used to solve this problem, taking into account the specifics of the speech signal.
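    The abstract names the Robust scaler method without describing it. As a rough illustration of what robust scaling does to a single feature column (for example one MFCC coefficient), the sketch below subtracts the median and divides by the interquartile range, so a single outlier does not compress the other values. The helper names and the sample values are made up for illustration and do not come from the paper.

```python
from statistics import median


def percentile(sorted_vals, q):
    """Linearly interpolated percentile (q in [0, 100]) of an already sorted list."""
    pos = (len(sorted_vals) - 1) * q / 100.0
    lo = int(pos)
    hi = min(lo + 1, len(sorted_vals) - 1)
    return sorted_vals[lo] + (sorted_vals[hi] - sorted_vals[lo]) * (pos - lo)


def robust_scale(values):
    """Scale one feature column: subtract the median, divide by the interquartile range."""
    s = sorted(values)
    med = median(s)
    iqr = percentile(s, 75) - percentile(s, 25)
    if iqr == 0:
        iqr = 1.0  # constant column: avoid division by zero
    return [(v - med) / iqr for v in values]


# Hypothetical feature column with one outlier (100).
print(robust_scale([1, 2, 3, 4, 100]))
```

    Because the median and quartiles ignore the extreme value, the first four entries stay evenly spaced after scaling, which a mean and standard deviation scaling would not preserve as well.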

    Continuous Speech Recognition of Kazakh Language

    This article describes methods for building a continuous speech recognition system for the Kazakh language. Compared with other languages, research on Kazakh speech recognition began relatively recently, after the country gained independence, and Kazakh is a low-resource language. A large amount of data is required to build a reliable system and to evaluate it accurately. A database has been created for the Kazakh language, consisting of speech signals and the corresponding transcriptions. The continuous speech corpus was recorded from 200 speakers of different genders and ages and is accompanied by a pronunciation vocabulary of the language. Traditional models and deep neural networks were used to train the system. As a result, a word error rate (WER) of 30.01% was obtained.
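    For readers unfamiliar with the metric, WER is conventionally the number of word substitutions, deletions and insertions needed to turn the recognizer output into the reference transcription, divided by the number of reference words. The sketch below computes it with a word-level edit distance; the function name and the placeholder sentences are illustrative and are not taken from the article's corpus.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """WER = (substitutions + deletions + insertions) / number of reference words."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")
    # prev[j]: edit distance between the reference words seen so far and hyp[:j]
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr[j] = min(
                prev[j] + 1,         # deletion of a reference word
                curr[j - 1] + 1,     # insertion of an extra hypothesis word
                prev[j - 1] + cost,  # match or substitution
            )
        prev = curr
    return prev[-1] / len(ref)


# Placeholder word sequences: one substitution and one deletion over four reference words.
print(word_error_rate("a b c d", "a x c"))
```

    Note that WER can exceed 1.0 when the hypothesis contains many inserted words, since the denominator counts only reference words.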

    Persian sentences to phoneme sequences conversion based on recurrent neural networks

    Grapheme-to-phoneme conversion is one of the main subsystems of text-to-speech (TTS) systems. Converting a sequence of written words into the corresponding phoneme sequence is more challenging for Persian than for other languages, because the standard orthography omits short vowels and the pronunciation of a word depends on its position in the sentence. Common approaches used in commercial Persian TTS systems rely on several modules and complicated models for natural language processing and homograph disambiguation, which make implementation harder and reduce the overall precision of the system. In this paper we define grapheme-to-phoneme conversion as a sequence labeling problem and use modified recurrent neural networks (RNNs) to create a smart, integrated model for this purpose. The recurrent networks are made bidirectional and equipped with long short-term memory (LSTM) blocks so that they can use most of the past and future contextual information when making decisions. The experiments show that, in addition to having a unified structure, the bidirectional RNN-LSTM performs well at recognizing the pronunciation of Persian sentences, with a precision of more than 98 percent.

    ВизначСння Π³Ρ€Π°ΠΌΠ°Ρ‚ΠΈΡ‡Π½ΠΈΡ… ΠΊΠ°Ρ‚Π΅Π³ΠΎΡ€Ρ–ΠΉ Ρ‚ΡƒΡ€Π΅Ρ†ΡŒΠΊΠΎΡ— Ρ‚Π° ΠΊΠ°Π·Π°Ρ…ΡΡŒΠΊΠΎΡ— ΠΌΠΎΠ² Π· використанням Π°Π»Π³ΠΎΡ€ΠΈΡ‚ΠΌΡ–Π² машинного навчання Ρ‚Π° складання словників синтаксичного Π°Π½Π°Π»Ρ–Π·Π°Ρ‚ΠΎΡ€Π° Π½Π° основі Π³Ρ€Π°ΠΌΠ°Ρ‚ΠΈΠΊΠΈ Π·Π²'язків

    This research is aimed at identifying the parts of speech for the Kazakh and Turkish languages in an information retrieval system. The proposed algorithms are based on machine learning techniques. In this paper, we consider the binary classification of words according to parts of speech, and we study and compare the most popular, well-known machine learning algorithms. We defined 7 dictionaries and tagged 135 million words in Kazakh, and 9 dictionaries and 50 million words in Turkish. The main problem considered in the paper is to create algorithms for filling in the dictionaries of the so-called Link Grammar Parser (LGP) system, in particular for the Kazakh and Turkish languages, using machine learning techniques. The research focuses on the review and comparison of machine learning algorithms and methods that have achieved results on various natural language processing tasks, such as determining grammatical categories. For the LGP system to operate, a dictionary is created in which a connector is specified for each word: the type of connection that can be created using this word. The authors considered methods of filling in LGP dictionaries using machine learning. The complexities of natural language processing do not exclude the possibility of identifying narrower tasks that can already be solved algorithmically, for example determining parts of speech or splitting texts into logical groups. However, some features of natural languages significantly reduce the effectiveness of these solutions. In particular, taking into account all word forms of each word in the Kazakh and Turkish languages increases the complexity of text processing by an order of magnitude.

    Grammatical Categories Determination for Turkish and Kazakh Languages Based on Machine Learning Algorithms and Fulfilling Dictionaries of Link Grammar Parser

    This research is aimed at identifying the parts of speech for the Kazakh and Turkish languages in an information retrieval system. The proposed algorithms are based on machine learning techniques. In this paper, we consider the binary classification of words according to parts of speech, and we study and compare the most popular, well-known machine learning algorithms. We defined 7 dictionaries and tagged 135 million words in Kazakh, and 9 dictionaries and 50 million words in Turkish. The main problem considered in the paper is to create algorithms for filling in the dictionaries of the so-called Link Grammar Parser (LGP) system, in particular for the Kazakh and Turkish languages, using machine learning techniques. The research focuses on the review and comparison of machine learning algorithms and methods that have achieved results on various natural language processing tasks, such as determining grammatical categories. For the LGP system to operate, a dictionary is created in which a connector is specified for each word: the type of connection that can be created using this word. The authors considered methods of filling in LGP dictionaries using machine learning. The complexities of natural language processing do not exclude the possibility of identifying narrower tasks that can already be solved algorithmically, for example determining parts of speech or splitting texts into logical groups. However, some features of natural languages significantly reduce the effectiveness of these solutions. In particular, taking into account all word forms of each word in the Kazakh and Turkish languages increases the complexity of text processing by an order of magnitude.
