17 research outputs found

    RELEVANCE OF THE TYPES AND THE STATISTICAL PROPERTIES OF FEATURES IN THE RECOGNITION OF BASIC EMOTIONS IN SPEECH

    Due to the advance of speech technologies and their increasing usage in various applications, automatic recognition of emotions in speech represents one of the emerging fields in human-computer interaction. This paper deals with several topics related to automatic emotional speech recognition, most notably the improvement of recognition accuracy through lowering the dimensionality of the feature space, and the evaluation of the relevance of particular feature types. The research focuses on the classification of emotional speech into five basic emotional classes (anger, joy, fear, sadness and neutral speech) using a recorded corpus of emotional speech in Serbian.
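The dimensionality-reduction idea above can be illustrated with a minimal sketch (not the paper's actual method): rank features by a Fisher-style discriminant score over the five emotion classes and keep only the top-scoring ones. All data and dimensions below are synthetic.

```python
import numpy as np

def fisher_scores(X, y):
    """Rank features by Fisher score: between-class variance of the
    per-class means over within-class variance, computed per feature."""
    classes = np.unique(y)
    overall_mean = X.mean(axis=0)
    between = np.zeros(X.shape[1])
    within = np.zeros(X.shape[1])
    for c in classes:
        Xc = X[y == c]
        between += len(Xc) * (Xc.mean(axis=0) - overall_mean) ** 2
        within += ((Xc - Xc.mean(axis=0)) ** 2).sum(axis=0)
    return between / np.maximum(within, 1e-12)

def select_top_k(X, y, k):
    """Keep only the k features with the highest Fisher scores."""
    idx = np.argsort(fisher_scores(X, y))[::-1][:k]
    return X[:, idx], idx

# Toy data: 5 emotion classes, 5 features, only 2 of them informative
rng = np.random.default_rng(0)
y = np.repeat(np.arange(5), 20)
X = rng.normal(size=(100, 5))
X[:, 0] += y        # informative feature
X[:, 3] += 2 * y    # strongly informative feature
X_red, kept = select_top_k(X, y, 2)
```

In a real system the classifier would then be trained on `X_red` only, trading a small amount of information for a lower-dimensional, less noisy feature space.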

    Word recognition in speech audiometry

    It was noticed that the standard set of words used for speech audiometry contained some archaic words, as well as words which are much more difficult to understand out of context. The first aim of this paper is to determine the words which are significantly easier or more difficult to recognize than the rest in speech audiometry at the ENT Clinic in Novi Sad (we have dedicated more attention to the incorrectly recognized words), as well as their distribution across the sets of 10 words which are used during one measurement. The second aim of the paper is to account for the errors from the point of view of linguistics and medicine. The results that we have analyzed belong to different intensity levels (5-80 dB and 25-40 dB). The research participants were 66 patients suffering from multiple sclerosis. The study has shown that there are 14 words (out of 160) whose recognition accuracy is significantly worse than that of the other words in their 10-word group. Most of the poorly recognized words constitute minimal pairs with some other words, and most of these words contain plosives. Even though consonants cause a higher number of errors, hearing-impaired patients sometimes misunderstand, and therefore mispronounce, vowel segments as well, e.g. the vowel /i/ is replaced with the vowel /u/. Another important factor which influences perception is the part of speech – nouns, adjectives and adverbs are identified more easily than other parts of speech.
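One way to flag words whose recognition is "significantly worse than the rest of their group", as the study does, is an exact binomial test of each word's correct-recognition count against the group-average rate. The sketch below is a hypothetical reconstruction of such a test (the actual statistical procedure used in the paper is not specified here); the word counts are invented, while the 66 patients and 10-word groups match the study design.

```python
from math import comb

def binom_tail_le(k, n, p):
    """P(X <= k) for X ~ Binomial(n, p): exact lower-tail probability."""
    return sum(comb(n, i) * p**i * (1 - p)**(n - i) for i in range(k + 1))

def flag_hard_words(counts, n_patients, alpha=0.05):
    """counts: {word: number of correct recognitions}. A word is flagged
    if its correct count is significantly below the group-average rate."""
    group_rate = sum(counts.values()) / (len(counts) * n_patients)
    return [w for w, k in counts.items()
            if binom_tail_le(k, n_patients, group_rate) < alpha]

# Hypothetical 10-word group, 66 patients as in the study
counts = {f"word{i}": 60 for i in range(9)}
counts["word9"] = 35   # one clearly harder word
hard = flag_hard_words(counts, 66)
```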

    Cross-Lingual Neural Network Speech Synthesis Based on Multiple Embeddings

    The paper presents a novel architecture and method for speech synthesis in multiple languages, in the voices of multiple speakers and in multiple speaking styles, even in cases when speech from a particular speaker in the target language was not present in the training data. The method is based on the application of neural network embeddings to combinations of speaker and style IDs, as well as to phones in particular phonetic contexts, without any prior linguistic knowledge of their phonetic properties. This enables the network not only to efficiently capture similarities and differences between speakers and speaking styles, but also to establish appropriate relationships between phones belonging to different languages, and ultimately to produce synthetic speech in the voice of a certain speaker in a language that he/she has never spoken. The validity of the proposed approach has been confirmed through experiments with models trained on speech corpora of American English and Mexican Spanish. It has also been shown that the proposed approach supports the use of neural vocoders, i.e. that they are able to produce synthesized speech of good quality even in languages that they were not trained on.
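The embedding combination described above can be sketched as follows. This is an illustrative toy, not the paper's network: lookup tables stand in for learned embeddings, the inventory sizes and dimensions are invented, and a shared phone table is what lets a speaker embedding condition synthesis in a language that speaker never recorded.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical inventory sizes and embedding dimension
N_SPEAKERS, N_STYLES, N_PHONES, DIM = 4, 3, 40, 16

# Learned lookup tables (randomly initialized here for illustration);
# the phone table is shared across languages
speaker_emb = rng.normal(size=(N_SPEAKERS, DIM))
style_emb = rng.normal(size=(N_STYLES, DIM))
phone_emb = rng.normal(size=(N_PHONES, DIM))

def encoder_input(phone_ids, speaker_id, style_id):
    """Concatenate per-phone embeddings with the utterance-level speaker
    and style embeddings, broadcast over the phone sequence."""
    phones = phone_emb[phone_ids]                       # (T, DIM)
    cond = np.concatenate([speaker_emb[speaker_id],
                           style_emb[style_id]])        # (2*DIM,)
    cond = np.broadcast_to(cond, (len(phone_ids), 2 * DIM))
    return np.concatenate([phones, cond], axis=1)       # (T, 3*DIM)

# Speaker 2 "speaking" an arbitrary phone sequence, in style 1
x = encoder_input(phone_ids=[5, 12, 7], speaker_id=2, style_id=1)
```

Because speaker, style and phone identity enter as independent embeddings, any speaker/style/phone combination can be formed at synthesis time, including combinations unseen in training.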

    USER-AWARENESS AND ADAPTATION IN CONVERSATIONAL AGENTS

    This paper considers the research question of developing user-aware and adaptive conversational agents. A conversational agent is user-aware to the extent that it recognizes the user's identity and those of his/her emotional states that are relevant in a given interaction domain. It is user-adaptive to the extent that it dynamically adapts its dialogue behavior according to the user and his/her emotional state. The paper summarizes some aspects of our previous work and presents work in progress in the field of speech-based human-machine interaction. It focuses particularly on the development of speech recognition modules in cooperation with modules for emotion recognition and speaker recognition, as well as the dialogue management module. Finally, it proposes an architecture for a conversational agent that integrates these modules and improves each of them based on the synergies among them.
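The module integration described above can be sketched as a pipeline in which the recognizers update a shared user state that the dialogue manager then conditions on. This is a hypothetical skeleton, not the proposed architecture itself; the class names, fields and the stub recognizers are all invented for illustration.

```python
from dataclasses import dataclass, field

@dataclass
class UserState:
    """What the agent currently knows about the user (hypothetical fields)."""
    speaker_id: str = "unknown"
    emotion: str = "neutral"

@dataclass
class Agent:
    """User-aware agent: recognizer modules update the shared user state,
    and the dialogue manager picks a strategy conditioned on that state."""
    state: UserState = field(default_factory=UserState)

    def process(self, utterance):
        # Stand-ins for the real ASR / speaker-ID / emotion modules
        text = self.recognize_speech(utterance)
        self.state.speaker_id = self.recognize_speaker(utterance)
        self.state.emotion = self.recognize_emotion(utterance)
        return self.dialogue_policy(text)

    def recognize_speech(self, u):  return u.get("text", "")
    def recognize_speaker(self, u): return u.get("speaker", "unknown")
    def recognize_emotion(self, u): return u.get("emotion", "neutral")

    def dialogue_policy(self, text):
        # User-adaptation: a frustrated user triggers a different strategy
        if self.state.emotion == "anger":
            return "Stay calm, I will transfer you to an operator."
        return f"Understood: {text}"

agent = Agent()
reply = agent.process({"text": "cancel my order", "emotion": "anger"})
```

The synergy mentioned in the abstract shows up naturally in such a design: the same front-end audio features can feed all three recognizers, and each recognizer's output constrains the others' search spaces.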

    Influence of genotype, year and locations on yield, oil and protein content of soybean - Glycine max (L.) Merr.

    During 2017 and 2018, field studies were carried out on the impact of genotype, year, location and the genotype x location and genotype x year interactions on the yield and the oil and protein content of soybean seed. The experiment included the twenty most common soybean genotypes of different maturity groups, which account for 75% of the sowing area. The experiment was set up at the Osijek and Kutjevo locations, in two replications, in a randomized block design. In 2018, higher average seed yield, oil content and protein content were achieved, primarily due to a favourable distribution of rainfall. The Osijek location had higher average seed yield, oil content and protein content in both years of the research. According to the analysis of variance, statistically highly significant differences (P<0.01) in seed yield were obtained for genotype and for the genotype x location and genotype x year interactions. For genotype and the genotype x year interaction, statistically significant differences (P<0.05) were obtained for the oil and protein content. The results of the research will contribute to the proper selection of genotypes depending on the purpose of production, in order to exploit the genetic potential of the genotype most suitable for a particular location.
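The significance tests reported above come from an analysis of variance. As a minimal sketch of the idea (a one-way ANOVA on the genotype factor only, with entirely synthetic yield values, not the study's data or its full factorial model):

```python
import numpy as np
from scipy.stats import f_oneway

rng = np.random.default_rng(1)

# Hypothetical seed-yield samples (t/ha) for three genotypes,
# four plots each (e.g. two locations x two replications)
genotype_a = rng.normal(3.2, 0.1, size=4)
genotype_b = rng.normal(3.8, 0.1, size=4)
genotype_c = rng.normal(4.4, 0.1, size=4)

# One-way ANOVA: does mean yield differ across genotypes?
f_stat, p_value = f_oneway(genotype_a, genotype_b, genotype_c)
highly_significant = p_value < 0.01   # the P<0.01 threshold used in the paper
```

The study itself additionally partitions variance into location, year and interaction terms, which requires a multi-factor ANOVA rather than this one-way version.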

    Humanoid robot Marko - an assistant in therapy for children

    This paper reports on work in progress towards the development of a robot to be used as assistive technology in the treatment of children with developmental disorders (cerebral palsy). This work integrates two activities. The first is the mechanical design of a humanoid robot with capabilities sufficient for demonstrating therapeutic exercises for the habilitation of gross and fine motor functions and for acquiring spatial relationships. The second is the design of appropriate communication capabilities for the robot. The basic therapeutic role of the robot is to motivate children to practice their therapy harder and longer. To achieve this, the robot must fulfil two requirements: it must have an appropriate appearance, so that the child can form an affective attachment to it, and it must be able to communicate with children both verbally (speech recognition and synthesis) and non-verbally (facial expressions, gestures, etc.). Conversational abilities are thus unavoidable and among its most important capabilities. In short, the robot should be able to manage a three-party natural language conversation – between the child, the therapist and the robot – in clinical settings.
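Three-party dialogue requires the robot to track who holds the conversational floor. The toy turn manager below is a hypothetical illustration of that one sub-problem (it is not the paper's dialogue manager): the floor passes to an explicitly addressed participant, otherwise it rotates.

```python
# Hypothetical three-party turn management for the child/therapist/robot
# setting: explicit addressing overrides a default round-robin order.
TURN_ORDER = ["therapist", "robot", "child"]

class TurnManager:
    def __init__(self):
        self.floor = "therapist"   # assume the therapist opens the session

    def next_speaker(self, addressee=None):
        """Hand the floor to the addressee if one was named,
        otherwise rotate through the fixed turn order."""
        if addressee in TURN_ORDER:
            self.floor = addressee
        else:
            i = TURN_ORDER.index(self.floor)
            self.floor = TURN_ORDER[(i + 1) % len(TURN_ORDER)]
        return self.floor

tm = TurnManager()
first = tm.next_speaker()            # no addressee: therapist -> robot
second = tm.next_speaker("child")    # robot explicitly addresses the child
```

A clinical system would infer the addressee from gaze, speech direction and wording rather than receive it as an argument, but the floor-tracking state is the same.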

    Automatic Emotion Recognition in Speech: Possibilities and Significance

    Automatic speech recognition and spoken language understanding are crucial steps towards natural human-machine interaction. The main task of the speech communication process is the recognition of the word sequence, but the recognition of prosody, emotion and stress tags may be of particular importance as well. This paper discusses the possibilities of recognizing emotion from the speech signal in order to improve ASR, and also provides an analysis of the acoustic features that can be used for the detection of a speaker's emotion and stress. The paper also provides a short overview of emotion and stress classification techniques. The importance and place of emotional speech recognition are shown in the domain of human-computer interactive systems and the transaction communication model. Directions for future work are given at the end of the paper.
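Two of the simplest acoustic cues examined in emotion and stress detection are short-time energy and zero-crossing rate. The sketch below computes both on a synthetic signal; it illustrates the kind of frame-level features meant above, not the specific feature set the paper analyzes.

```python
import numpy as np

def frame_features(signal, frame_len=256, hop=128):
    """Per-frame short-time energy and zero-crossing rate,
    two simple acoustic cues for emotion/stress detection."""
    feats = []
    for start in range(0, len(signal) - frame_len + 1, hop):
        frame = signal[start:start + frame_len]
        energy = float(np.mean(frame ** 2))
        # each sign change contributes |diff| = 2, hence the division
        zcr = float(np.mean(np.abs(np.diff(np.sign(frame)))) / 2)
        feats.append((energy, zcr))
    return np.array(feats)

# Toy signal: a quiet low-frequency part followed by a loud noisy part
t = np.linspace(0, 1, 4096, endpoint=False)
quiet = 0.1 * np.sin(2 * np.pi * 5 * t[:2048])
loud = np.random.default_rng(0).normal(0.0, 1.0, 2048)
feats = frame_features(np.concatenate([quiet, loud]))
```

Aroused or stressed speech typically shows raised energy and pitch; a classifier would consume statistics of such frame-level features over an utterance.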

    QoS Testing in a Live Private IP MPLS Network with CoS Implemented (UDC 621.391:004.4, DOI: 10.2298/CSIS090710007B)

    This paper describes testing conducted on a private IP/MPLS network of a Telecom operator during service introduction. We applied DiffServ and E-LSP policies for bandwidth allocation to predefined classes of service (voice, video, data and VPN). We used a traffic generator to create the worst possible situations during the testing, and measured QoS for the individual services. UML considerations about the NGN structure and packet-network traffic testing are also presented, using deployment, class and state diagrams. The testing results are given in tabular and graphical form, and the conclusions derived will subsequently be used as a basis for defining a stochastic traffic generator/simulator.
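Per-class bandwidth allocation of the kind tested above can be sketched as a simple admission cap: each class of service receives a configured share of the link, and traffic beyond that share is queued or dropped. The link speed and share values below are invented for illustration; real DiffServ/E-LSP policing also involves queueing, marking and scheduling, which this sketch omits.

```python
# Hypothetical per-class shares on a 100 Mb/s link for the four
# tested classes of service (values are illustrative only).
LINK_MBPS = 100
SHARES = {"voice": 0.20, "video": 0.30, "data": 0.30, "vpn": 0.20}

def admitted(demand_mbps):
    """Cap each class at its configured share of the link; the excess
    of an over-subscribed class is queued or dropped."""
    return {cls: min(demand, SHARES[cls] * LINK_MBPS)
            for cls, demand in demand_mbps.items()}

# Worst-case style load: every class over-subscribed at once
load = {"voice": 30, "video": 50, "data": 40, "vpn": 25}
out = admitted(load)
```

A test like the paper's would then measure per-class delay, jitter and loss while a traffic generator holds the network in such an over-subscribed state.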

    A comparison of multi-style DNN-based TTS approaches using small datasets

    Studies have shown that people already perceive interaction with computers, robots and media in the same way as they perceive social communication with other people. For that reason, it is critical for a high-quality text-to-speech (TTS) system to sound as human-like as possible. However, a major obstacle in creating expressive TTS voices is that the amount of style-specific speech needed for training such a system is often insufficient. This paper presents a comparison between different approaches to multi-style TTS, with a focus on cases when only a small dataset per style is available. The described approaches were originally proposed for efficient modelling of multiple speakers with a limited amount of data per speaker. Among them, the approach based on style codes has emerged as the best, regardless of the target speech style.
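The style-code idea can be illustrated in a few lines: a one-hot (or learned) style code is appended to every input frame, so a single shared model is trained on the pooled data of all styles instead of one fragile model per small per-style dataset. The dimensions below are invented for illustration.

```python
import numpy as np

# Hypothetical setup: 3 speech styles, linguistic feature dimension 10
N_STYLES, FEAT_DIM = 3, 10

def with_style_code(features, style_id):
    """Append a one-hot style code to every input frame, so one shared
    model can be trained on the pooled data of all styles."""
    code = np.eye(N_STYLES)[style_id]
    return np.hstack([features, np.tile(code, (len(features), 1))])

# Five frames of (zeroed) linguistic features, conditioned on style 2
frames = np.zeros((5, FEAT_DIM))
x = with_style_code(frames, style_id=2)
```

The benefit for small datasets is that everything style-independent (phonetics, duration, basic prosody) is learned from all the data at once, while the code carries only the style-specific residual.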