    Classification of Infant Cry Sounds Based on Prosodic Features Using the Moments of Distribution Method and K-Nearest Neighbours

    For some people, the sound of a crying baby is very disturbing, especially when the crying is prolonged. The meaning of a baby's cry is difficult to understand. In the era of information technology, infant cries can be recognized automatically by computer, which can help parents identify the baby's needs so that the baby can be calmed quickly. Infant cries can be identified using a classification algorithm from the field of Machine Learning, one of which is K-Nearest Neighbour. The first step in classifying infant cries is feature extraction, which converts the audio data of the cry into numerical data and yields Prosodic Features. After feature extraction, pattern identification is carried out with the Moments of Distribution method to capture the differences in pattern between one infant cry recording and another. The cries are then recognized by applying K-Nearest Neighbour classification. The best accuracy with Percentage Rate data sampling was 76%, obtained with K = 9, whereas the best accuracy with Leave One Out data sampling was 42%, obtained with K = 5.
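
The pipeline described above (prosodic contour, then moments of distribution, then K-Nearest Neighbour) can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: the choice of four moments (mean, variance, skewness, kurtosis), the pitch values, and the class labels "hungry" and "pain" are all assumptions made for the example.

```python
import math
from collections import Counter

def moments(values):
    """Return (mean, variance, skewness, kurtosis) of a sequence of numbers."""
    n = len(values)
    mean = sum(values) / n
    var = sum((v - mean) ** 2 for v in values) / n
    std = math.sqrt(var)
    if std == 0:
        # A constant contour has no spread, so higher moments are set to 0.
        return (mean, 0.0, 0.0, 0.0)
    skew = sum(((v - mean) / std) ** 3 for v in values) / n
    kurt = sum(((v - mean) / std) ** 4 for v in values) / n
    return (mean, var, skew, kurt)

def knn_predict(train, query, k):
    """Majority vote among the k training vectors nearest to query.

    train is a list of (feature_vector, label) pairs; distance is Euclidean.
    """
    nearest = sorted(train, key=lambda item: math.dist(item[0], query))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]

# Hypothetical pitch contours in Hz; values and labels are illustrative only.
train_contours = [
    ([300, 310, 305, 320, 315], "hungry"),
    ([295, 305, 300, 310, 312], "hungry"),
    ([450, 520, 480, 600, 550], "pain"),
    ([460, 510, 495, 590, 560], "pain"),
]
train = [(moments(contour), label) for contour, label in train_contours]
query = moments([455, 515, 490, 595, 545])
print(knn_predict(train, query, k=3))
```

Because the raw moments live on very different scales (variance dominates the distance here), a practical system would probably normalise each feature before computing distances.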


    Research on speech-based emotion recognition has been carried out widely in industry, including the telecommunications industry. Fluctuation in the number of customers, known as Customer Churn, is one of the problems faced by the telecommunications industry. This study discusses a way to predict customer churn that makes use of speech emotion recognition applied to the customer call data of a telecommunications company over a certain period. Churn prediction is based on the recognition of four emotions: happy, angry, sad, and afraid. Customer churn is predicted with a Bayesian Belief Network (BBN) in stages comprising data preparation, data classification, construction of a cause-and-effect diagram, and construction of the Bayesian Belief Network. All of the data are first grouped, and the churn-influencing features and the emotional speech features are then analysed separately. The results show that using the classification of human emotional speech as a variable in churn prediction yields a BBN prediction of 60% churn and 40% no churn. Keywords: Customer Churn Prediction, Human Emotional Speech, Bayesian Belief Network

    Corpora compilation for prosody-informed speech processing

    Research on speech technologies requires spoken data, which is usually obtained by recording read speech and adapted specifically to the research needs. When the aim is to deal with the prosody involved in speech, the available data must reflect natural and conversational speech, which is usually costly and difficult to obtain. This paper presents a machine learning-oriented toolkit for collecting, handling, and visualizing speech data using prosodic heuristics. We present two corpora resulting from these methodologies: the PANTED corpus, containing 250 h of English speech from TED Talks, and the Heroes corpus, containing 8 h of parallel English and Spanish movie speech. We demonstrate their use in two deep learning-based applications: punctuation restoration and machine translation. The presented corpora are freely available to the research community.

    Multimodality in Group Communication Research

    Team interactions are often multisensory, requiring members to pick up on verbal, visual, spatial and body language cues. Multimodal research, that is, research that captures multiple modes of communication such as audio and visual signals, is therefore integral to understanding these multisensory group communication processes. This type of research has gained traction in biomedical engineering and neuroscience, but the extent to which communication and management researchers conduct multimodal research is unclear. Our study finds that, despite its utility, multimodal research is underutilized in the communication and management literature. This paper then provides introductory guidelines for designing new multimodal research, covering sensors, data integration and ethical considerations. Comment: 27 pages, 3 figures

    The Prosody of Uncertainty for Spoken Dialogue Intelligent Tutoring Systems

    The speech medium is more than an audio conveyance of word strings. It contains meta-information about the content of the speech. The prosody of speech, such as pauses and intonation, adds an extra dimension of diagnostic information about the quality of a speaker's answers, suggesting an important avenue of research for spoken dialogue tutoring systems. Tutoring systems that are sensitive to such cues may employ different tutoring strategies based on detected student uncertainty, and they may be able to assess the area of student difficulty more precisely. However, properly identifying the cues can be challenging, typically requiring thousands of hand-labeled utterances for machine learning training. This study proposes and explores ways of exploiting alternative, automatically generated information, namely utterance correctness and the amount of practice a student has had, as indicators of student uncertainty. It finds correlations between various prosodic features and these automatic indicators, compares the results with a small set of annotated utterances, and finally demonstrates a Bayesian classifier that uses correctness scores as class labels.

    Extracting and using prosodic information for Turkish spoken language processing

    The overall aim of this project is to extract the prosodic and lexical features of spoken Turkish and to use them in the automatic processing of spoken language. More specifically, the project addresses sentence segmentation of the output of an Automatic Speech Recognition (ASR) system. The text produced by ASR systems lacks punctuation, capitalization, and speech-related parameters such as stress, intonation, pitch, and pauses, and this loss can change the meaning. Enriching this output, in other words restoring these features, will make the text easier for humans to read and understand correctly and easier for machines to process. The aim of this project is to perform this enrichment and restoration using the prosodic features of the spoken language. In this proposal, we would like to examine the extraction and use of prosodic information, in addition to lexical features, for spoken language processing of Turkish. Specifically, we would like to research the use of prosodic features for sentence segmentation of Turkish speech.
Another outcome of the project is a database of prosodic features at the word and morpheme level, which can be used for other purposes such as morphological disambiguation or word sense disambiguation. Turkish is an agglutinative language, so the text must be analyzed morphologically to determine the root forms and suffixes of the words before further analysis. Within the framework of this project, we would also like to examine the interaction of prosodic features with morphological information. The role of sentence segmentation is to detect sentence boundaries in the stream of words provided by the ASR module for further downstream processing. This is helpful for various language processing tasks, such as parsing, machine translation and question answering. We formulate sentence segmentation as a binary classification task: for each position between two consecutive words, the system must decide whether the position marks a boundary between two sentences or whether the two neighboring words belong to the same sentence. Sentence segmentation is performed by combining Hidden Event Language Models (HELMs) with discriminative classification methods. The HELM takes into account the sequence of words and the output of discriminative classification methods, such as a decision tree based on prosodic features like pause durations. The new approach combines HELMs, which exploit lexical information, with maximum entropy and boosting classifiers that tightly integrate lexical, prosodic, speaker-change and syntactic features. The boosting-based classifier alone performs better than all the other classification schemes. When it is combined with a hidden event language model, the improvement is even more pronounced.
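
The binary formulation above, one boundary decision per inter-word position, can be illustrated with a short sketch. It is deliberately not the HELM or boosting system described in the abstract: a fixed pause-duration threshold stands in for the classifier, and the function name `segment`, the 0.5 s threshold, and the example words and pauses are assumptions made for illustration.

```python
def segment(words, pauses, threshold=0.5):
    """Split a word stream into sentences.

    pauses[i] is the silence in seconds between words[i] and words[i + 1].
    Each inter-word position receives a binary decision: a pause at or above
    the threshold is treated as a sentence boundary (a stand-in classifier).
    """
    if not words:
        return []
    if len(pauses) != len(words) - 1:
        raise ValueError("need exactly one pause per inter-word position")
    sentences, current = [], []
    for i, word in enumerate(words):
        current.append(word)
        is_last = i == len(words) - 1
        # The end of the stream always closes the current sentence.
        if is_last or pauses[i] >= threshold:
            sentences.append(current)
            current = []
    return sentences

# Hypothetical ASR output with pause durations between consecutive words.
words = ["hello", "everyone", "today", "we", "begin"]
pauses = [0.1, 0.8, 0.05, 0.1]
print(segment(words, pauses))
```

In a fuller system, the threshold test would be replaced by a trained classifier that also sees lexical context, which is where the language model and the discriminative classifiers mentioned above come in.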

    Automatic sentence segmentation of spoken Turkish with co-training using prosodic, lexical and morphological information

    Co-training is a very effective machine learning technique that has been used successfully in several classification tasks such as web page classification, word sense disambiguation, and named entity recognition. It is a semi-supervised learning method that aims to improve the performance of a supervised learning algorithm by incorporating large amounts of unlabeled data into the training set. Co-training algorithms generate two or more classifiers trained on different views of the labeled input data, and these classifiers are then used to label the unlabeled data separately. The most confidently labeled examples among the automatically labeled data are then added to the manually labeled set, and the process may be repeated for several iterations. In this project, we describe the application of co-training to sentence segmentation, using prosodic, lexical and morphological information as the views of the data. Sentence segmentation of speech is part of a process that aims to enrich the unstructured stream of words output by standard speech recognizers. Its role is to find the sentence units in this stream of words. Sentence segmentation is a preliminary step toward speech understanding. It is particularly important for speech-related applications, because most further processing steps, such as parsing, machine translation and information extraction, assume the presence of sentence boundaries. Once sentence boundaries have been determined, more advanced syntactic and/or semantic analysis can be performed on the sentences.
In this project, we treat the speech features (prosodic, lexical and morphological) as disjoint, natural feature sets, or views, and we try to improve on the performance of the baseline system by using these feature sets with the co-training algorithm. We also investigate different learning strategies for co-training, namely agreement and disagreement. In addition, we propose a new approach, which we call self-combined, that combines the self-training and co-training approaches.
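
The co-training loop described above can be sketched in a few lines. This is a simplified variant under stated assumptions, not the project's system: each view is a single number, each per-view classifier is a nearest-centroid model, every round adds one example per view, and at least one labeled example per class is assumed to exist. The feature values and the labels "boundary" and "no_boundary" are hypothetical.

```python
def train_centroids(labeled, view):
    """Mean feature value per class for one view (1-D nearest-centroid model)."""
    sums, counts = {}, {}
    for features, label in labeled:
        sums[label] = sums.get(label, 0.0) + features[view]
        counts[label] = counts.get(label, 0) + 1
    return {label: sums[label] / counts[label] for label in sums}

def predict(centroids, value):
    """Return (label, confidence); confidence is the margin between the
    distances to the nearest and second-nearest class centroids."""
    ranked = sorted(centroids, key=lambda label: abs(centroids[label] - value))
    best, runner_up = ranked[0], ranked[1]
    margin = abs(centroids[runner_up] - value) - abs(centroids[best] - value)
    return best, margin

def co_train(labeled, unlabeled, views=(0, 1), rounds=3):
    """Grow the labeled set: in each round, the classifier of each view labels
    the pool example it is most confident about and adds it to the shared set."""
    labeled = list(labeled)
    pool = list(unlabeled)
    for _ in range(rounds):
        for view in views:
            if not pool:
                return labeled
            model = train_centroids(labeled, view)
            scored = [(predict(model, x[view]), x) for x in pool]
            (label, _), chosen = max(scored, key=lambda s: s[0][1])
            labeled.append((chosen, label))
            pool.remove(chosen)
    return labeled

# Hypothetical data: view 0 is a pause duration, view 1 a lexical score.
labeled = [((0.9, 0.8), "boundary"), ((0.1, 0.2), "no_boundary")]
unlabeled = [(0.8, 0.7), (0.15, 0.1), (0.7, 0.9), (0.2, 0.3)]
for features, label in co_train(labeled, unlabeled):
    print(features, label)
```

The agreement and disagreement strategies mentioned in the abstract would change which pool examples are selected in each round; the sketch uses only per-view confidence.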

    Extraction and comparison of various prosodic feature sets on sentence segmentation task for Turkish broadcast news data

    In this work, prosodic features of the Turkish Broadcast News (BN) data are extracted using an open-source prosodic feature extraction tool based on Praat. The profiles and effectiveness of these features are also investigated for the sentence segmentation task on the Turkish BN data. We not only used combinations of the feature sets but also gathered some of them into a single prosodic feature model in order to achieve one of the best performances. The results of the experiments show that some combinations of the prosodic feature sets are very useful for the automatic sentence segmentation task on the Turkish BN data.