13 research outputs found

    Rancang Bangun Aplikasi MusicMoo dengan Metode MIR (Music Information Retrieval) pada Modul Mood, Genre Recognition, dan Tempo Estimation

    Currently, methods for retrieving information about a piece of music, commonly called Music Information Retrieval (MIR), are widely applied, for example in the applications Shazam and SoundHound. However, these two applications only identify the title of a song when it is played. The aim of this research is therefore a more specific extension of MIR: retrieving a song's related information together with its details, namely the song's mood, genre, and tempo. This research uses MPEG-7-based feature extraction performed by a Java library named MPEG7AudioEnc. The result of this feature extraction is metadata in the form of digital numbers representing the characteristics of a signal for each feature. Once the features are obtained, the next step is to select the features relevant to each module using XQuery, implemented by a Java library named BaseX. The selected features are processed with a Discrete Wavelet Transform (DWT) at the best decomposition level using a Python library named pywt. After processing, the features are concatenated into a single list and padded to equal length for classification. The final step is classification using a Support Vector Machine (SVM), which consists of two stages: training and prediction. The accuracy achieved in this research is 75% for the mood module, 87.5% for genre, and 80% for tempo.
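    The wavelet-features-into-SVM pipeline the abstract describes can be sketched as follows. This is a minimal illustration, not the thesis's implementation: it uses a hand-rolled Haar DWT in place of the pywt library named above, scikit-learn's SVC for the SVM, and synthetic signals standing in for MPEG-7 descriptors.

    ```python
    import numpy as np
    from sklearn.svm import SVC

    def haar_dwt_features(signal, level=3):
        """Multi-level Haar DWT; return mean/std of each coefficient band."""
        a = np.asarray(signal, dtype=float)
        bands = []
        for _ in range(level):
            d = (a[0::2] - a[1::2]) / np.sqrt(2.0)  # detail coefficients
            a = (a[0::2] + a[1::2]) / np.sqrt(2.0)  # approximation coefficients
            bands.append(d)
        bands.append(a)
        # Summary statistics per band give every signal a fixed-length vector,
        # mirroring the "equal feature length" step the abstract mentions.
        return np.array([f(b) for b in bands for f in (np.mean, np.std)])

    # Toy data: two "classes" distinguished by dominant frequency.
    rng = np.random.default_rng(0)
    t = np.linspace(0.0, 1.0, 1024, endpoint=False)
    X, y = [], []
    for label, freq in ((0, 5.0), (1, 40.0)):
        for _ in range(20):
            sig = np.sin(2 * np.pi * freq * t) + 0.1 * rng.standard_normal(t.size)
            X.append(haar_dwt_features(sig))
            y.append(label)

    clf = SVC(kernel="rbf").fit(X, y)  # training stage
    ```

    The prediction stage is then `clf.predict(...)` on the feature vector of an unseen signal.
    
    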

    Evaluation of MPEG-7-based audio descriptors for animal voice recognition over wireless acoustic sensor networks

    This article belongs to the Special Issue State-of-the-Art Sensors Technology in Spain 2015. Environmental audio monitoring is a huge area of interest for biologists all over the world. This is why several audio monitoring systems have been proposed in the literature, which can be classified into two different approaches: acquisition and compression of all audio patterns in order to send them as raw data to a main server; or specific recognition systems based on audio patterns. The first approach has the drawback of the large amount of information that must be stored on a main server; moreover, this information requires considerable effort to analyse. The second approach has the drawback of a lack of scalability when new patterns need to be detected. To overcome these limitations, this paper proposes an environmental Wireless Acoustic Sensor Network architecture focused on the use of generic descriptors based on the MPEG-7 standard. These descriptors are shown to be suitable for recognising different patterns, allowing high scalability. The proposed parameters have been tested in recognising different behaviours of two anuran species that live in Spanish natural parks, the toads Epidalea calamita and Alytes obstetricans, and demonstrate high classification performance. Consejería de Innovación, Ciencia y Empresa, Junta de Andalucía, Spain TIC-570

    Rancang Bangun Aplikasi MusicMoo Dengan Metode MIR (Music Information Retrieval) Pada Modul Mood, Genre Recognition, dan Tempo Estimation

    Currently, methods for retrieving information about a piece of music, commonly called Music Information Retrieval (MIR), are widely applied, for example in the applications Shazam and Soundhound. However, these two applications only identify which song is related when it is played. The aim of this research is therefore a more specific extension of MIR: retrieving a song's related information together with its details, namely the song's mood, genre, and tempo. This research uses MPEG-7-based feature extraction performed by a Java library named MPEG7AudioEnc. The result of this feature extraction is metadata containing features as digital numbers that represent the characteristics of a signal. The features relevant to each module are then selected using XQuery, implemented by a Java library named BaseX. The selected features are processed with a Discrete Wavelet Transform (DWT) at the best decomposition level using a Python library named pywt. After the DWT, the features are concatenated into a single list and padded to equal length for classification. The final step is classification using a Support Vector Machine (SVM), which consists of two stages: training and prediction. The accuracy achieved in this research is 75% for the mood module, 87.5% for genre, and 80% for tempo.

    Multi-Sensory Emotion Recognition with Speech and Facial Expression

    Emotion plays an important role in human beings’ daily lives. Understanding emotions and recognizing how to react to others’ feelings are fundamental to engaging in successful social interactions. Currently, emotion recognition is not only significant in human beings’ daily lives, but also a hot topic in academic research, as new techniques such as emotion recognition from speech context offer insight into how emotions relate to the content we are uttering. The demand for and importance of emotion recognition have greatly increased in many applications in recent years, such as video games, human computer interaction, cognitive computing, and affective computing. Emotion recognition can be done from many sources, including text, speech, hand and body gesture, as well as facial expression. Presently, most emotion recognition methods use only one of these sources. Human emotion changes from moment to moment, and relying on a single source may not reflect it correctly. This research is motivated by the desire to understand and evaluate human emotion through multiple modalities, such as speech and facial expressions. In this dissertation, multi-sensory emotion recognition has been exploited. The proposed framework can recognize emotion from speech, from facial expression, or from both. There are three important parts in the design of the system: the facial emotion recognizer, the speech emotion recognizer, and the information fusion. The information fusion part uses the results from the speech emotion recognition and facial emotion recognition. Then, a novel weighted method is used to integrate the results, and a final decision on the emotion is given after the fusion. The experiments show that with the weighted fusion methods, the accuracy improves by an average of 3.66% compared with fusion without weighting. The improvement in the recognition rate reaches 18.27% and 5.66% compared with speech emotion recognition and facial expression recognition alone, respectively. By improving the emotion recognition accuracy, the proposed multi-sensory emotion recognition system can help to improve the naturalness of human computer interaction.
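    A weighted late-fusion step of the kind described can be sketched as follows. The label set, probability vectors, and weights here are hypothetical stand-ins for illustration, not the dissertation's actual values or method.

    ```python
    import numpy as np

    EMOTIONS = ["happy", "sad", "angry"]  # hypothetical label set

    def fuse(p_speech, p_face, w_speech=0.4, w_face=0.6):
        """Weighted late fusion of per-class scores from two recognizers.

        Each input is a probability vector over EMOTIONS; the weights encode
        how much each modality is trusted and should sum to 1.
        """
        p = w_speech * np.asarray(p_speech) + w_face * np.asarray(p_face)
        return EMOTIONS[int(np.argmax(p))]

    # Speech is unsure, face leans "sad": the weighted sum follows the face.
    decision = fuse([0.4, 0.3, 0.3], [0.1, 0.8, 0.1])
    ```

    Tuning the two weights per modality (e.g. from validation accuracy) is what distinguishes weighted fusion from a plain average.
    
    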

    Rancang Bangun Aplikasi Musicmoo Dengan Metode MIR (Music Information Retrieval) Pada Modul Mood, Genre Recognition, Dan Tempo Estimation

    Currently, methods for retrieving information about a piece of music, commonly called Music Information Retrieval (MIR), are widely applied, for example in the applications Shazam and Soundhound. These two applications can only detect the title of a song when music is played. This Final Project is therefore a more specific extension of MIR: retrieving a song's related information together with its details, namely the song's mood, genre, and tempo. The aims of this Final Project are to increase the information known about a song, and also to support detection of plagiarism of someone's musical work. The first step is MPEG-7-based audio feature extraction with a Java library named MPEG7AudioEnc. The result of this extraction is XML metadata containing features as digital numbers that represent the characteristics of a signal. Second, the features used by each module are selected and retrieved using XQuery for processing. The selected features are processed with a Discrete Wavelet Transform (DWT) at the best decomposition level using the pywt library. After the DWT, the features are concatenated into a single list and padded to equal length for classification. The final step is classification using a Support Vector Machine (SVM), which consists of two stages: training and prediction. In the training stage the system learns the signal characteristics associated with each label, while in the prediction stage new, unseen data is assigned a label according to what was learned during training. The result is detailed mood, genre, and tempo information for a song based on its signal characteristics.

    Analysis and automatic identification of spontaneous emotions in speech from human-human and human-machine communication

    383 p. This research mainly focuses on improving our understanding of human-human and human-machine interactions by analysing participants’ emotional status. For this purpose, we have developed and enhanced Speech Emotion Recognition (SER) systems for both kinds of interaction in real-life scenarios, with an explicit emphasis on the Spanish language. In this framework, we have conducted an in-depth analysis of how humans express emotions using speech when communicating with other persons or with machines in actual situations. Thus, we have analysed and studied the way in which emotional information is expressed in a variety of true-to-life environments, which is a crucial aspect for the development of SER systems. This study aimed to build a comprehensive understanding of the challenge we wanted to address: identifying emotional information in speech using machine learning technologies. Neural networks have been demonstrated to be adequate tools for identifying events in speech and language. Most of them aimed to make local comparisons between some specific aspects; thus, the experimental conditions were tailored to each particular analysis. The experiments across the different articles (from P1 to P19) are hardly comparable, owing to our continuous learning in dealing with the difficult task of identifying emotions in speech. In order to make a fair comparison, additional unpublished results are presented in the Appendix. These experiments were carried out under identical and rigorous conditions. This general comparison offers an overview of the advantages and disadvantages of the different methodologies for the automatic recognition of emotions in speech.

    A survey of the application of soft computing to investment and financial trading


    Machine Learning for Auditory Hierarchy

    Coleman, W. (2021). Machine Learning for Auditory Hierarchy. This dissertation is submitted for the degree of Doctor of Philosophy, Technological University Dublin. Audio content is predominantly delivered in a stereo audio file of a static, pre-formed mix. The content creator makes volume, position, and effects decisions, generally for presentation on stereo speakers, but ultimately has no control over how the content will be consumed. This leads to a poor listener experience when, for example, a feature film is mixed such that the dialogue sits at a low level relative to the sound effects: consumers complain that they must turn the volume up to hear the words, but then back down again because the effects are too loud. Addressing this problem requires a television mix optimised for the stereo speakers used in the vast majority of homes, which is not always available.