
    A Review on MAS-Based Sentiment and Stress Analysis User-Guiding and Risk-Prevention Systems in Social Network Analysis

    In today's world we are immersed in online applications, with Social Network Sites (SNSs) being among the most prominent, and different issues arise from this interaction. There is therefore a need for research that addresses the potential issues arising from increasing user interaction online. In this survey we explore work on the prevention of risks that can arise from social interaction in online environments, focusing on approaches that use Multi-Agent System (MAS) technologies. To assess which techniques are available for prevention, we review work on detecting the sentiment polarity and stress levels of users in SNSs, paying special attention to MAS-based approaches for user recommendation and guiding. Through the analysis of previous approaches to detecting user state and preventing risk in SNSs, we outline potential future lines of work that might lead to applications where users can navigate and interact with each other more safely. This work was funded by project TIN2017-89156-R of the Spanish government.
    Aguado-Sarrió, G.; Julian Inglada, V. J.; García-Fornes, A.; Espinosa Minguet, A. R. (2020). A Review on MAS-Based Sentiment and Stress Analysis User-Guiding and Risk-Prevention Systems in Social Network Analysis. Applied Sciences, 10(19), 1-29. https://doi.org/10.3390/app10196746

    The listening talker: A review of human and algorithmic context-induced modifications of speech

    Speech output technology is finding widespread application, including in scenarios where intelligibility might be compromised - at least for some listeners - by adverse conditions. Unlike most current algorithms, human talkers continually adapt their speech patterns in response to the immediate context of spoken communication, where the type of interlocutor and the environment are the dominant situational factors influencing speech production. Observations of talker behaviour can motivate the design of more robust speech output algorithms. Starting with a listener-oriented categorisation of possible goals for speech modification, this review article summarises the extensive set of behavioural findings related to human speech modification, identifies which factors appear to be beneficial, and goes on to examine previous computational attempts to improve intelligibility in noise. The review concludes by tabulating 46 speech modifications, many of which have yet to be perceptually or algorithmically evaluated. Consequently, the review provides a roadmap for future work on improving the robustness of speech output.
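    One commonly studied family of computational modifications in this area boosts the higher frequencies of the speech signal (reducing spectral tilt) while keeping the overall level unchanged. The sketch below is only an illustration of that general idea under assumed parameters (a first-order pre-emphasis filter with a hypothetical coefficient and an equal-RMS constraint), not any specific algorithm from the review; it assumes NumPy and SciPy are available.

```python
import numpy as np
from scipy.signal import lfilter

def reallocate_energy_to_high_freqs(x, alpha=0.95):
    """Boost high frequencies with a first-order pre-emphasis filter,
    then rescale so the overall RMS energy is unchanged.

    x     : 1-D NumPy array containing the speech waveform.
    alpha : pre-emphasis coefficient (hypothetical default).
    """
    y = lfilter([1.0, -alpha], [1.0], x)       # y[n] = x[n] - alpha * x[n-1]
    rms_in = np.sqrt(np.mean(x ** 2)) + 1e-12
    rms_out = np.sqrt(np.mean(y ** 2)) + 1e-12
    return y * (rms_in / rms_out)              # equal-energy constraint

if __name__ == "__main__":
    # One second of a placeholder signal at 16 kHz standing in for speech.
    fs = 16000
    t = np.arange(fs) / fs
    x = np.sin(2 * np.pi * 150 * t)
    y = reallocate_energy_to_high_freqs(x)
    # Overall energy is preserved even though the spectrum has changed.
    print(abs(np.mean(x ** 2) - np.mean(y ** 2)) < 1e-6)
```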

    Improving the Generalizability of Speech Emotion Recognition: Methods for Handling Data and Label Variability

    Emotion is an essential component in our interaction with others. It transmits information that helps us interpret the content of what others say. Therefore, detecting emotion from speech is an important step towards enabling machine understanding of human behaviors and intentions. Researchers have demonstrated the potential of emotion recognition in areas such as interactive systems in smart homes and mobile devices, computer games, and computational medical assistants. However, emotion communication is variable: individuals may express emotion in a manner that is uniquely their own; different speech content and environments may shape how emotion is expressed and recorded; individuals may perceive emotional messages differently. Practically, this variability is reflected in both the audio-visual data and the labels used to create speech emotion recognition (SER) systems. SER systems must be robust and generalizable to handle the variability effectively. The focus of this dissertation is on the development of speech emotion recognition systems that handle variability in emotion communications. We break the dissertation into three parts, according to the type of variability we address: (I) in the data, (II) in the labels, and (III) in both the data and the labels.

    Part I: The first part of this dissertation focuses on handling variability present in data. We approximate variations in environmental properties and expression styles by corpus and gender of the speakers. We find that training on multiple corpora and controlling for the variability in gender and corpus using multi-task learning result in more generalizable models, compared to the traditional single-task models that do not take corpus and gender variability into account. Another source of variability present in the recordings used in SER is the phonetic modulation of acoustics. On the other hand, phonemes also provide information about the emotion expressed in speech content. We discover that we can make more accurate predictions of emotion by explicitly considering both roles of phonemes.

    Part II: The second part of this dissertation addresses variability present in emotion labels, including the differences between emotion expression and perception, and the variations in emotion perception. We discover that it is beneficial to jointly model both the perception of others and how one perceives one's own expression, compared to focusing on either one. Further, we show that the variability in emotion perception is a modelable signal and can be captured using probability distributions that describe how groups of evaluators perceive emotional messages.

    Part III: The last part of this dissertation presents methods that handle variability in both data and labels. We reduce the data variability due to non-emotional factors using deep metric learning and model the variability in emotion perception using soft labels. We propose a family of loss functions and show that by pairing examples that potentially vary in expression styles and lexical content and preserving the real-valued emotional similarity between them, we develop systems that generalize better across datasets and are more robust to over-training. These works demonstrate the importance of considering data and label variability in the creation of robust and generalizable emotion recognition systems. We conclude this dissertation with the following future directions: (1) the development of real-time SER systems; (2) the personalization of general SER systems.

    PhD, Computer Science & Engineering, University of Michigan, Horace H. Rackham School of Graduate Studies. https://deepblue.lib.umich.edu/bitstream/2027.42/147639/1/didizbq_1.pd
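    To make the soft-label idea from Part II concrete, the sketch below (a minimal illustration, not the dissertation's implementation; the label set, votes and model prediction are hypothetical) turns the categorical votes of several evaluators into a probability distribution over emotion classes and scores a prediction against it with a soft cross-entropy.

```python
import numpy as np

EMOTIONS = ["angry", "happy", "neutral", "sad"]    # hypothetical label set

def soft_label(evaluator_votes, classes=EMOTIONS, smoothing=0.0):
    """Turn the categorical votes of several evaluators into a probability
    distribution over emotion classes (a 'soft' label)."""
    counts = np.array([evaluator_votes.count(c) for c in classes], dtype=float)
    counts += smoothing                            # optional additive smoothing
    return counts / counts.sum()

def soft_cross_entropy(predicted_probs, target_distribution, eps=1e-12):
    """Cross-entropy between the predicted distribution and the soft label;
    it reduces to the usual loss when one class holds all the mass."""
    return -np.sum(target_distribution * np.log(predicted_probs + eps))

# One utterance rated by five evaluators who disagree:
votes = ["happy", "happy", "neutral", "happy", "neutral"]
target = soft_label(votes)                         # -> [0. , 0.6, 0.4, 0. ]
model_output = np.array([0.05, 0.55, 0.35, 0.05])  # hypothetical prediction
print(target, soft_cross_entropy(model_output, target))
```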

    Voice source characterization for prosodic and spectral manipulation

    The objective of this dissertation is to study and develop techniques to decompose the speech signal into its two main components: voice source and vocal tract. Our main efforts are on glottal pulse analysis and characterization. We want to explore the utility of this model in different areas of speech processing: speech synthesis, voice conversion or emotion detection, among others. Thus, we study different techniques for prosodic and spectral manipulation. One of our requirements is that the methods should be robust enough to work with the large databases typical of speech synthesis. We use a speech production model in which the glottal flow produced by the vibrating vocal folds goes through the vocal (and nasal) tract cavities and is radiated by the lips. Removing the effect of the vocal tract from the speech signal to obtain the glottal pulse is known as inverse filtering. We use a parametric model of the glottal pulse directly in the source-filter decomposition phase. In order to validate the accuracy of the parametrization algorithm, we designed a synthetic corpus using LF glottal parameters reported in the literature, complemented with our own results from the vowel database. The results show that our method gives satisfactory results in a wide range of glottal configurations and at different levels of SNR. Our method using the whitened residual compared favorably to the reference method, achieving high quality ratings (Good-Excellent). Our fully parametrized system scored lower than the other two, ranking in third place, but still above the acceptance threshold (Fair-Good). Next, we proposed two methods for prosody modification, one for each of the residual representations explained above. The first method used our full parametrization system and frame interpolation to perform the desired changes in pitch and duration. The second method used resampling of the residual waveform and a frame selection technique to generate a new sequence of frames to be synthesized. The results showed that both methods are rated similarly (Fair-Good) and that more work is needed to achieve quality levels similar to those of the reference methods. As part of this dissertation, we have studied the application of our models in three different areas: voice conversion, voice quality analysis and emotion recognition. We included our speech production model in a reference voice conversion system to evaluate the impact of our parametrization on this task. The results showed that the evaluators preferred our method over the original one, rating it with a higher score on the MOS scale. To study voice quality, we recorded a small database consisting of isolated, sustained Spanish vowels in four different phonations (modal, rough, creaky and falsetto). Comparing the results with those reported in the literature, we found them to generally agree with previous findings. Some differences existed, but they could be attributed to the difficulties of comparing voice qualities produced by different speakers. At the same time, we conducted experiments in the field of voice quality identification, with very good results. We have also evaluated the performance of an automatic emotion classifier based on GMMs using glottal measures. For each emotion, we trained a specific model using different features, comparing our parametrization to a baseline system using spectral and prosodic characteristics. The results of the test were very satisfactory, showing a relative error reduction of more than 20% with respect to the baseline system. The detection accuracy for the different emotions was also high, improving on the results of previously reported works using the same database. Overall, we can conclude that the glottal source parameters extracted with our algorithm have a positive impact in the field of automatic emotion classification.
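    As an illustration of the per-emotion GMM classification scheme described above, the following sketch (assuming scikit-learn and NumPy; the feature dimensionality, number of mixture components and toy data are hypothetical, and the glottal measures themselves are not computed here) fits one Gaussian mixture per emotion and assigns a new sample to the emotion whose model gives the highest log-likelihood.

```python
import numpy as np
from sklearn.mixture import GaussianMixture

def train_per_emotion_gmms(features, labels, n_components=4, seed=0):
    """Fit one GaussianMixture per emotion on that emotion's feature vectors.
    features : (n_samples, n_features) array of glottal/prosodic measures.
    labels   : list of emotion names, one per sample.
    """
    models = {}
    for emotion in sorted(set(labels)):
        X = features[np.array(labels) == emotion]
        models[emotion] = GaussianMixture(
            n_components=n_components, covariance_type="diag",
            random_state=seed).fit(X)
    return models

def classify(models, x):
    """Assign the emotion whose GMM gives the highest log-likelihood."""
    scores = {e: m.score_samples(x.reshape(1, -1))[0] for e, m in models.items()}
    return max(scores, key=scores.get)

# Toy example with random stand-in "glottal" features for two emotions.
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0, 1, (50, 6)), rng.normal(3, 1, (50, 6))])
y = ["neutral"] * 50 + ["angry"] * 50
gmms = train_per_emotion_gmms(X, y)
print(classify(gmms, rng.normal(3, 1, 6)))  # expected: "angry"
```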

    A Study of Accomodation of Prosodic and Temporal Features in Spoken Dialogues in View of Speech Technology Applications

    Inter-speaker accommodation is a well-known property of human speech and human interaction in general. Broadly, it refers to the behavioural patterns of two (or more) interactants and the effect of the (verbal and non-verbal) behaviour of each on that of the other(s). Implementation of this behaviour in spoken dialogue systems is desirable as an improvement on the naturalness of human-machine interaction. However, traditional qualitative descriptions of accommodation phenomena do not provide sufficient information for such an implementation. Therefore, a quantitative description of inter-speaker accommodation is required. This thesis proposes a methodology for monitoring accommodation during a human or human-computer dialogue, which utilizes a moving average filter over sequential frames for each speaker. These frames are time-aligned across the speakers, hence the name Time Aligned Moving Average (TAMA). Analysis of spontaneous human dialogue recordings by means of the TAMA methodology reveals ubiquitous accommodation of prosodic features (pitch, intensity and speech rate) across interlocutors, and allows for statistical (time series) modelling of the behaviour, in a way which is meaningful for implementation in spoken dialogue system (SDS) environments. In addition, a novel dialogue representation is proposed that provides an additional point of view to that of TAMA in monitoring accommodation of temporal features (inter-speaker pause length and overlap frequency). This representation is a percentage turn distribution of individual speaker contributions in a dialogue frame, which circumvents strict attribution of speaker turns by considering both interlocutors as synchronously active. Both the TAMA and turn distribution metrics indicate that correlation of average pause length and overlap frequency between speakers can be attributed to accommodation (a debated issue), and point to possible improvements in SDS "turn-taking" behaviour. Although the findings of the prosodic and temporal analyses can directly inform SDS implementations, further work is required in order to describe inter-speaker accommodation sufficiently, as well as to develop an adequate testing platform for evaluating the magnitude of perceived improvement in human-machine interaction. Therefore, this thesis constitutes a first step towards a convincingly useful implementation of accommodation in spoken dialogue systems.
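    A minimal sketch of the moving-average idea behind TAMA is given below, assuming NumPy; the per-frame pitch values are hypothetical, and the frames are taken as already time-aligned across the two speakers, so the frame construction used in the thesis is not reproduced. Correlating the two smoothed series gives one crude indicator of prosodic convergence.

```python
import numpy as np

def tama_series(frame_values, window=5):
    """Smooth a per-frame prosodic feature (e.g., mean pitch per frame)
    with a moving average, in the spirit of the Time Aligned Moving Average:
    frames are assumed to be time-aligned across the two speakers."""
    kernel = np.ones(window) / window
    return np.convolve(np.asarray(frame_values, dtype=float), kernel, mode="valid")

def accommodation_score(speaker_a, speaker_b, window=5):
    """Pearson correlation between the smoothed series of the two speakers,
    used here as a crude indicator of convergence/divergence."""
    a = tama_series(speaker_a, window)
    b = tama_series(speaker_b, window)
    return np.corrcoef(a, b)[0, 1]

# Hypothetical per-frame mean pitch values (Hz) for two interlocutors.
pitch_a = [210, 208, 205, 200, 198, 196, 195, 193, 192, 190]
pitch_b = [140, 139, 137, 135, 134, 132, 131, 129, 128, 127]
# Expected to be close to +1: both speakers' pitch drifts downward together.
print(accommodation_score(pitch_a, pitch_b))
```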

    Human roars communicate upper-body strength more effectively than do screams or aggressive and distressed speech

    Despite widespread evidence that nonverbal components of human speech (e.g., voice pitch) communicate information about physical attributes of vocalizers, and that listeners can judge traits such as strength and body size from speech, few studies have examined the communicative functions of human nonverbal vocalizations (such as roars, screams, grunts and laughs). Critically, no previous study has examined the acoustic correlates of strength in nonverbal vocalizations, including roars, nor identified reliable vocal cues to strength in human speech. In addition to being less acoustically constrained than articulated speech, agonistic nonverbal vocalizations function primarily to express motivation and emotion, such as threat, and may therefore communicate strength and body size more effectively than speech. Here, we investigated acoustic cues to strength and size in roars compared to screams and speech sentences produced in both aggressive and distress contexts. Using playback experiments, we then tested whether listeners can reliably infer a vocalizer’s actual strength and height from roars, screams and valenced speech equivalents, and which acoustic features predicted listeners’ judgments. While there were no consistent acoustic cues to strength in any of the vocal stimuli, listeners accurately judged inter-individual differences in strength, and did so most effectively from aggressive voice stimuli (roars and aggressive speech). In addition, listeners judged strength more accurately from roars than from aggressive speech. In contrast, listeners’ judgments of height were most accurate for speech stimuli. These results support the prediction that vocalizers maximize impressions of physical strength in aggressive compared to distress contexts, and that inter-individual variation in strength may only be honestly communicated in vocalizations that function to communicate threat, particularly roars. Thus, in continuity with nonhuman mammals, the acoustic structure of human aggressive roars may have been selected to communicate, and to some extent exaggerate, functional cues to physical formidability.

    USING DEEP LEARNING-BASED FRAMEWORK FOR CHILD SPEECH EMOTION RECOGNITION

    The body offers many biological signals through which human emotion can be detected, including heart rate, facial expressions, movement of the eyelids and dilation of the eyes, body posture, skin conductance, and even the speech we produce. Speech emotion recognition research started some three decades ago, and the popular Interspeech Emotion Challenge has helped to propagate this research area. However, most speech emotion recognition research focuses on adults, and there is very little research on child speech. This dissertation describes the development and evaluation of a child speech emotion recognition framework. The higher-level components of the framework are designed to sort and separate speech based on the speaker’s age, ensuring that focus is placed only on speech produced by children. The framework uses Baddeley’s Theory of Working Memory to model a Working Memory Recurrent Network that can process and recognize emotions from speech. Baddeley’s Theory of Working Memory offers one of the best explanations of how the human brain holds and manipulates temporary information, which is crucial in the development of neural networks that learn effectively. Experiments were designed and performed to answer the research questions, evaluate the proposed framework, and benchmark its performance against other methods. Satisfactory results were obtained from the experiments and, in many cases, our framework was able to outperform other popular approaches. This study has implications for various applications of child speech emotion recognition, such as child abuse detection and child learning robots.
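    The sketch below is a generic recurrent baseline for speech emotion recognition in PyTorch, shown only to make the recurrent-classifier setting concrete; it is not the Working Memory Recurrent Network developed in the dissertation, and the feature dimensionality, hidden size and label count are hypothetical.

```python
import torch
import torch.nn as nn

class RecurrentSERBaseline(nn.Module):
    """A plain LSTM classifier over frame-level acoustic features.
    This is only a generic recurrent baseline, not the dissertation's
    Working Memory Recurrent Network; all dimensions are hypothetical."""
    def __init__(self, n_features=40, hidden=64, n_emotions=4):
        super().__init__()
        self.lstm = nn.LSTM(n_features, hidden, batch_first=True)
        self.classifier = nn.Linear(hidden, n_emotions)

    def forward(self, x):                 # x: (batch, frames, n_features)
        _, (h_n, _) = self.lstm(x)        # h_n: (1, batch, hidden)
        return self.classifier(h_n[-1])   # logits: (batch, n_emotions)

# One training step on random data shaped like a batch of 8 utterances,
# each with 100 frames of 40 features (e.g., log-mel energies).
model = RecurrentSERBaseline()
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
features = torch.randn(8, 100, 40)
labels = torch.randint(0, 4, (8,))
optimizer.zero_grad()
loss = nn.CrossEntropyLoss()(model(features), labels)
loss.backward()
optimizer.step()
print(float(loss))
```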

    Development of Markerless Systems for Automatic Analysis of Movements and Facial Expressions: Applications in Neurophysiology

    This project is focused on the development of markerless methods for studying facial expressions and movements in neurology, with a focus on Parkinson’s disease (PD) and disorders of consciousness (DOC). PD is a neurodegenerative illness that affects around 2% of the population over 65 years old. Impairments of voice and speech are among the main signs of PD. This set of impairments is called hypokinetic dysarthria because of the reduced range of the movements involved in speech. This reduction can also be visible in other facial muscles, leading to hypomimia. Despite the high percentage of patients who suffer from dysarthria and hypomimia, only a few of them undergo speech therapy aimed at improving the dynamics of articulatory and facial movements. The main reason is the lack of low-cost methodologies that could be implemented at home. The DOC that follow coma are the Vegetative State (VS), characterized by the absence of self-awareness and awareness of the environment, and the Minimally Conscious State (MCS), in which certain behaviors are sufficiently reproducible to be distinguished from reflex responses. The differential diagnosis between VS and MCS can be hard and is prone to a high rate of misdiagnosis (~40%). This differential diagnosis is mainly based on neuro-behavioral scales. The first diagnosis after coma plays a key role in planning rehabilitation for DOC patients; in fact, MCS patients are more likely to recover consciousness than VS patients. Concerning PD, the aim is the development of contactless systems that could be used to study symptoms related to speech and facial movements/expressions. The methods proposed here, based on acoustic analysis and video processing techniques, could support patients during speech therapy, also at home. Concerning DOC patients, the project is focused on the assessment of reflex and cognitive responses to standardized stimuli. This would make it possible to objectify the perceptual analysis currently performed by clinicians.
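    One low-cost, markerless way to quantify movement from video is dense optical flow between consecutive frames. The sketch below (assuming OpenCV and NumPy; the file name is hypothetical, and no face detection or cropping is performed) computes the mean flow magnitude per frame as a crude overall-motion profile; it is an illustration of the general approach, not the methods developed in this project.

```python
import cv2
import numpy as np

def mean_motion_per_frame(video_path):
    """Markerless, per-frame estimate of overall motion: the mean magnitude
    of dense optical flow between consecutive frames. On a face crop, a
    persistently low profile could serve as one crude proxy for reduced
    facial movement (hypomimia); this is only an illustrative measure."""
    cap = cv2.VideoCapture(video_path)
    ok, prev = cap.read()
    if not ok:
        raise IOError(f"cannot read {video_path}")
    prev_gray = cv2.cvtColor(prev, cv2.COLOR_BGR2GRAY)
    motion = []
    while True:
        ok, frame = cap.read()
        if not ok:
            break
        gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
        flow = cv2.calcOpticalFlowFarneback(prev_gray, gray, None,
                                            0.5, 3, 15, 3, 5, 1.2, 0)
        motion.append(float(np.linalg.norm(flow, axis=2).mean()))
        prev_gray = gray
    cap.release()
    return motion

# Usage (hypothetical file name):
# profile = mean_motion_per_frame("patient_speech_task.mp4")
```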