
8 research outputs found

Clusterization by the K-means method when K is unknown

Author: Litvinenko Natalya
Mamyrbayev Orken
Shayakhmetova Assem
Turdalyuly Mussa
Publication venue: 'EDP Sciences'
Publication date: 01/01/2019

There are various methods of object clusterization used in different areas of machine learning. Among the many clusterization methods, the K-means method is one of the most popular. This method has both advantages and disadvantages. Its main advantage is the rather high speed of clusterization; its main disadvantage is the need to know the number of clusters before the experiment. This paper describes a new clusterization method based on the K-means method. The proposed method is also quite fast, but it does not require the user to know the exact number of clusters in advance: the user only has to define the range within which the number of clusters lies. In addition, the proposed method makes it possible to limit the radius of clusters, which allows finding the objects that satisfy the criteria of a cluster most distinctly and accurately, and also allows limiting the number of objects in each cluster to a certain range.
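The abstract does not spell out how the number of clusters is chosen inside the user-given range. The Python sketch below shows one plausible reading, under stated assumptions: it runs plain K-means (Lloyd's algorithm) for each K in the range and returns the smallest K whose largest cluster radius does not exceed a user limit. The selection rule, the deterministic farthest-point initialization, and the names `choose_k` and `r_max` are illustrative assumptions, not the authors' method.

```python
import math


def farthest_point_init(points, k):
    """Deterministic seeding: start at points[0], then repeatedly add the point
    farthest from its nearest already-chosen centroid."""
    centroids = [points[0]]
    while len(centroids) < k:
        nxt = max(points, key=lambda p: min(math.dist(p, c) for c in centroids))
        centroids.append(nxt)
    return centroids


def kmeans(points, k, max_iter=100):
    """Plain Lloyd's K-means; returns (centroids, labels)."""
    centroids = farthest_point_init(points, k)
    labels = None
    for _ in range(max_iter):
        # Assignment step: each point goes to its nearest centroid.
        new_labels = [min(range(k), key=lambda c: math.dist(p, centroids[c]))
                      for p in points]
        if new_labels == labels:
            break  # assignments are stable, so the algorithm has converged
        labels = new_labels
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # an empty cluster keeps its previous centroid
                centroids[c] = tuple(sum(xs) / len(members) for xs in zip(*members))
    return centroids, labels


def max_radius(points, centroids, labels):
    """Largest distance from any point to the centroid of its own cluster."""
    return max(math.dist(p, centroids[lab]) for p, lab in zip(points, labels))


def choose_k(points, k_min, k_max, r_max):
    """Hypothetical rule: smallest K in [k_min, k_max] whose clusters all have
    radius <= r_max. Returns (k, centroids, labels), or None if no K qualifies."""
    for k in range(k_min, k_max + 1):
        centroids, labels = kmeans(points, k)
        if max_radius(points, centroids, labels) <= r_max:
            return k, centroids, labels
    return None
```

With three well-separated groups of points and `r_max=2.0`, this rule rejects K=2 (two groups get merged into one wide cluster) and accepts K=3. Picking the smallest feasible K is only one design choice; a quality criterion such as the silhouette score could be used instead.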

Directory of Open Access Journals

Voice Identification Using Classification Algorithms

Author: Mamyrbayev Orken
Medeni Tolga Ihsan
Mekebayev Nurbapa
Oshanova Nurzhamal
Turdalyuly Mussa
Yessentay Aigerim
Publication venue: 'IntechOpen'
Publication date: 21/08/2019

This article discusses classification algorithms for identifying a person by voice using machine learning methods. We used the MFCC algorithm for speech preprocessing. To solve the problem, a comparative analysis of five classification algorithms was carried out. In the first experiment, the support vector machine (0.90) and the multilayer perceptron (0.83) showed the best results. In the second experiment, a multilayer perceptron combined with the Robust scaler method was proposed for personal identification, reaching an accuracy of 0.93. Therefore, a multilayer perceptron can be used to solve this problem, taking into account the specifics of the speech signal.
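The Robust scaler step mentioned above is commonly understood as median/IQR scaling, which is how scikit-learn's `RobustScaler` behaves by default (centering on the median and scaling by the 25th to 75th percentile range). The dependency-free Python sketch below applies that scaling per feature column, for example to MFCC feature vectors. The function names are hypothetical, and the paper's exact preprocessing settings are not given in the abstract.

```python
import statistics


def robust_scale_fit(rows):
    """Return (median, IQR) for each feature column of a list-of-rows matrix."""
    params = []
    for column in zip(*rows):
        # method="inclusive" interpolates linearly between order statistics.
        q1, _, q3 = statistics.quantiles(column, n=4, method="inclusive")
        iqr = q3 - q1
        # A constant feature has IQR 0; use 1.0 so it is only centered, not divided by 0.
        params.append((statistics.median(column), iqr if iqr != 0 else 1.0))
    return params


def robust_scale_transform(rows, params):
    """Center each feature on its median and divide by its IQR."""
    return [[(x - med) / iqr for x, (med, iqr) in zip(row, params)] for row in rows]
```

For the rows `[[1, 10], [2, 20], [3, 30], [4, 40], [5, 1000]]`, the outlier 1000 affects neither the median (30) nor the IQR (20) of the second column, which is the usual motivation for median/IQR scaling over mean/standard-deviation scaling.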

IntechOpen

Crossref

Continuous Speech Recognition of Kazakh Language

Author: Akhmetov Bekturgan
BabaAli Bagher
Duisenbayeva Aigerim
Keylan Alimukhan
Mamyrbayev Orken
Mekebayev Nurbapa
Mukhsina Kuralay
Nabieva Gulnaz
Turdalyuly Mussa
Publication venue: 'EDP Sciences'
Publication date: 01/01/2019

This article describes methods for creating a continuous speech recognition system for the Kazakh language. Compared with other languages, research on Kazakh speech recognition began relatively recently, after the country gained independence, and Kazakh belongs to the low-resource languages. A large amount of data is required to build a reliable system and evaluate it accurately. A database has been created for the Kazakh language, consisting of speech signals and the corresponding transcriptions. The continuous speech data were recorded from 200 speakers of different genders and ages, together with a pronunciation vocabulary of the language. Traditional models and deep neural networks were used to train the system. As a result, a word error rate (WER) of 30.01% was obtained.
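The word error rate reported above is conventionally defined as the number of word substitutions, deletions, and insertions needed to turn the recognizer's output into the reference transcription, divided by the number of reference words. The Python sketch below computes it as a word-level Levenshtein distance; it illustrates the standard definition only, since the abstract does not describe the scoring tool or the text normalization that was used.

```python
def word_error_rate(reference, hypothesis):
    """WER = (substitutions + deletions + insertions) / number of reference words,
    computed as a word-level Levenshtein distance."""
    ref, hyp = reference.split(), hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")
    # d[i][j] = minimum edits turning the first i reference words
    # into the first j hypothesis words.
    d = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        d[i][0] = i  # delete all i reference words
    for j in range(len(hyp) + 1):
        d[0][j] = j  # insert all j hypothesis words
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            sub = 0 if ref[i - 1] == hyp[j - 1] else 1
            d[i][j] = min(d[i - 1][j] + 1,        # deletion
                          d[i][j - 1] + 1,        # insertion
                          d[i - 1][j - 1] + sub)  # match or substitution
    return d[len(ref)][len(hyp)] / len(ref)
```

For example, `word_error_rate("the cat sat on the mat", "the cat sit on mat")` counts one substitution and one deletion, giving 2/6 ≈ 0.333. A WER of 30.01% therefore corresponds to roughly 30 word errors per 100 reference words.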

Directory of Open Access Journals

Persian sentences to phoneme sequences conversion based on recurrent neural networks

Author: Babaali Bagher
Behbahani Yasser Mohseni
Turdalyuly Mussa
Publication venue: 'Walter de Gruyter GmbH'
Publication date: 01/12/2016

Grapheme-to-phoneme conversion is one of the main subsystems of Text-to-Speech (TTS) systems. Converting a sequence of written words into the corresponding phoneme sequence is more challenging for Persian than for other languages, because the standard orthography of Persian omits short vowels and the pronunciation of words depends on their position in the sentence. Common approaches used in Persian commercial TTS systems have several modules and complicated models for natural language processing and homograph disambiguation, which make implementation harder and reduce the overall precision of the system. In this paper, we define grapheme-to-phoneme conversion as a sequence labeling problem and use modified Recurrent Neural Networks (RNN) to create a smart, integrated model for this purpose. The recurrent networks are modified to be bidirectional and are equipped with Long Short-Term Memory (LSTM) blocks to capture as much past and future contextual information as possible for decision making. The experiments conducted in this paper show that, in addition to having a unified structure, the bidirectional RNN-LSTM performs well in recognizing the pronunciation of Persian sentences, with a precision of more than 98 percent.
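The bidirectional RNN-LSTM described above can be summarized with the standard LSTM equations. The block below is a textbook formulation under assumed notation, not the paper's exact modified network: x_t is the encoding of the t-th input grapheme, y_t its phoneme label, T the sentence length, σ the logistic sigmoid, and ⊙ element-wise multiplication. Each direction runs its own LSTM, and the two hidden states are concatenated before a per-position softmax; producing one label per input position is what makes the task sequence labeling.

```latex
\begin{aligned}
i_t &= \sigma(W_i x_t + U_i h_{t-1} + b_i) \\
f_t &= \sigma(W_f x_t + U_f h_{t-1} + b_f) \\
o_t &= \sigma(W_o x_t + U_o h_{t-1} + b_o) \\
\tilde{c}_t &= \tanh(W_c x_t + U_c h_{t-1} + b_c) \\
c_t &= f_t \odot c_{t-1} + i_t \odot \tilde{c}_t \\
h_t &= o_t \odot \tanh(c_t) \\[4pt]
\overrightarrow{h}_t &= \mathrm{LSTM}_{\rightarrow}\big(x_t, \overrightarrow{h}_{t-1}\big), \qquad
\overleftarrow{h}_t = \mathrm{LSTM}_{\leftarrow}\big(x_t, \overleftarrow{h}_{t+1}\big), \qquad t = 1, \dots, T \\
P(y_t \mid x_{1:T}) &= \mathrm{softmax}\big(W_y \,[\overrightarrow{h}_t ; \overleftarrow{h}_t] + b_y\big)
\end{aligned}
```

The backward state at position t depends on later graphemes, which is how the model can use right-hand context to decide pronunciations that depend on a word's position in the sentence.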

Directory of Open Access Journals

Clusterization by the K-means method when K is unknown

Author: Assem Shayakhmetova
Mussa Turdalyuly
Natalya Litvinenko
Orken Mamyrbayev
Publication venue: EDP Sciences
Publication date: 01/02/2019

EDP Sciences OAI-PMH repository (1.2.0)

Determination of Grammatical Categories of the Turkish and Kazakh Languages Using Machine Learning Algorithms and Compilation of Dictionaries of a Link Grammar Parser

Author: Sakenov Bakzhan
Sambetbayeva Madina
Turdalyuly Mussa
Tussupova Madina
Yerimbetova Aigerim
Publication venue: 'Private Company Technology Center'
Publication date: 31/10/2021

This research aims to identify the parts of speech of Kazakh and Turkish words in an information retrieval system. The proposed algorithms are based on machine learning techniques. In this paper, we consider the binary classification of words by part of speech and study several approaches based on the most popular machine learning algorithms. We defined 7 dictionaries and tagged 135 million words in Kazakh, and 9 dictionaries and 50 million words in Turkish. The main problem considered in the paper is to create algorithms for compiling the dictionaries of the so-called Link Grammar Parser (LGP) system, in particular for the Kazakh and Turkish languages, using machine learning techniques. The research focuses on reviewing and comparing machine learning algorithms and methods that have produced results on various natural language processing tasks, such as determining grammatical categories. For the LGP system to operate, a dictionary is created that specifies a connector for each word, that is, the type of link that can be formed with this word. The authors considered methods of filling LGP dictionaries using machine learning. The complexity of natural language processing does not preclude identifying narrower tasks that can already be solved algorithmically, for example, determining parts of speech or splitting texts into logical groups. However, some features of natural languages significantly reduce the effectiveness of these solutions. In particular, taking into account all word forms of each word in Kazakh and Turkish increases the complexity of text processing by an order of magnitude.
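The LGP dictionary described above pairs each word with a connector expression. As a rough illustration only, the Python sketch below assumes part-of-speech tags have already been predicted (for example, by the binary classifiers the abstract mentions) and groups words into entries in the Link Grammar dictionary style `word1 word2: expression;`, where `+` and `-` connectors link to the right and to the left, `&` joins connectors, and `{...}` marks an optional connector. The tag set, the `CONNECTORS` table, and the Turkish example words are hypothetical; the authors' actual connector rules are not given here.

```python
from collections import defaultdict

# Hypothetical mapping from a predicted part-of-speech tag to a Link Grammar
# connector expression (loosely SOV-oriented); real Kazakh/Turkish dictionaries
# would need far richer rules.
CONNECTORS = {
    "NOUN": "{A-} & (S+ or O+)",
    "VERB": "S- & {O-}",
    "ADJ": "A+",
}


def build_lgp_dictionary(tagged_words):
    """Group words by tag and emit one 'word1 word2: expression;' entry per tag.
    Words whose tag has no connector rule are skipped."""
    groups = defaultdict(set)
    for word, tag in tagged_words:
        if tag in CONNECTORS:
            groups[tag].add(word)
    lines = []
    for tag in sorted(groups):
        words = " ".join(sorted(groups[tag]))
        lines.append(f"{words}: {CONNECTORS[tag]};")
    return "\n".join(lines)
```

For instance, tagging "ev" and "kitap" as NOUN, "okudu" as VERB, and "büyük" as ADJ yields three entries, one per tag; a word with an unmapped tag is left out rather than given a guessed connector.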

ZENODO

Eastern-European Journal of Enterprise Technologies

Grammatical Categories Determination for Turkish and Kazakh Languages Based on Machine Learning Algorithms and Fulfilling Dictionaries of Link Grammar Parser

Author: Sakenov B. (Bakzhan)
Sambetbayeva M. (Madina)
Turdalyuly M. (Mussa)
Tussupova M. (Madina)
Yerimbetova A. (Aigerim)
Publication venue: PC TECHNOLOGY CENTER
Publication date: 01/01/2021

Neliti

Continuous Speech Recognition of Kazakh Language

Author: Aigerim Duisenbayeva
Alimukhan Keylan
Bagher BabaAli
Bekturgan Akhmetov
Gulnaz Nabieva
Kuralay Mukhsina
Mussa Turdalyuly
Nurbapa Mekebayev
Orken Mamyrbayev
Publication venue: 'EDP Sciences'
Publication date: 01/02/2019

EDP Sciences OAI-PMH repository (1.2.0)