43 research outputs found

    Music emotion recognition: a multimodal machine learning approach

    Get PDF
    Music emotion recognition (MER) is an emerging domain of the Music Information Retrieval (MIR) scientific community, and besides, music searches through emotions are one of the major selection preferred by web users. As the world goes to digital, the musical contents in online databases, such as Last.fm have expanded exponentially, which require substantial manual efforts for managing them and also keeping them updated. Therefore, the demand for innovative and adaptable search mechanisms, which can be personalized according to users’ emotional state, has gained increasing consideration in recent years. This thesis concentrates on addressing music emotion recognition problem by presenting several classification models, which were fed by textual features, as well as audio attributes extracted from the music. In this study, we build both supervised and semisupervised classification designs under four research experiments, that addresses the emotional role of audio features, such as tempo, acousticness, and energy, and also the impact of textual features extracted by two different approaches, which are TF-IDF and Word2Vec. Furthermore, we proposed a multi-modal approach by using a combined feature-set consisting of the features from the audio content, as well as from context-aware data. For this purpose, we generated a ground truth dataset containing over 1500 labeled song lyrics and also unlabeled big data, which stands for more than 2.5 million Turkish documents, for achieving to generate an accurate automatic emotion classification system. The analytical models were conducted by adopting several algorithms on the crossvalidated data by using Python. As a conclusion of the experiments, the best-attained performance was 44.2% when employing only audio features, whereas, with the usage of textual features, better performances were observed with 46.3% and 51.3% accuracy scores considering supervised and semi-supervised learning paradigms, respectively. As of last, even though we created a comprehensive feature set with the combination of audio and textual features, this approach did not display any significant improvement for classification performanc

    Affective Brain-Computer Interfaces

    Get PDF

    Avatar augmented online conversation

    Get PDF
    Thesis (Ph. D.)--Massachusetts Institute of Technology, School of Architecture and Planning, Program in Media Arts and Sciences, 2003.Includes bibliographical references (p. 167-175).One of the most important roles played by technology is connecting people and mediating their communication with one another. Building technology that mediates conversation presents a number of challenging research and design questions. Apart from the fundamental issue of what exactly gets mediated, two of the more crucial questions are how the person being mediated interacts with the mediating layer and how the receiving person experiences the mediation. This thesis is concerned with both of these questions and proposes a theoretical framework of mediated conversation by means of automated avatars. This new approach relies on a model of face-to-face conversation, and derives an architecture for implementing these features through automation. First the thesis describes the process of face-to-face conversation and what nonverbal behaviors contribute to its success. It then presents a theoretical framework that explains how a text message can be automatically analyzed in terms of its communicative function based on discourse context, and how behaviors, shown to support those same functions in face-to-face conversation, can then be automatically performed by a graphical avatar in synchrony with the message delivery. An architecture, Spark, built on this framework demonstrates the approach in an actual system design that introduces the concept of a message transformation pipeline, abstracting function from behavior, and the concept of an avatar agent, responsible for coordinated delivery and continuous maintenance of the communication channel. A derived application, MapChat, is an online collaboration system where users represented by avatars in a shared virtual environment can chat and manipulate an interactive map while their avatars generate face-to-face behaviors.(cont.) A study evaluating the strength of the approach compares groups collaborating on a route-planning task using MapChat with and without the animated avatars. The results show that while task outcome was equally good for both groups, the group using these avatars felt that the task was significantly less difficult, and the feeling of efficiency and consensus were significantly stronger. An analysis of the conversation transcripts shows a significant improvement of the overall conversational process and significantly fewer messages spent on channel maintenance in the avatar groups. The avatars also significantly improved the users' perception of each others' effort. Finally, MapChat with avatars was found to be significantly more personal, enjoyable, and easier to use. The ramifications of these findings with respect to mediating conversation are discussed.by Hannes Högni. Vilhjálmsson.Ph.D

    Shaping the auditory peripersonal space with motor planning in immersive virtual reality

    Get PDF
    Immersive audio technologies require personalized binaural synthesis through headphones to provide perceptually plausible virtual and augmented reality (VR/AR) simulations. We introduce and apply for the first time in VR contexts the quantitative measure called premotor reaction time (pmRT) for characterizing sonic interactions between humans and the technology through motor planning. In the proposed basic virtual acoustic scenario, listeners are asked to react to a virtual sound approaching from different directions and stopping at different distances within their peripersonal space (PPS). PPS is highly sensitive to embodied and environmentally situated interactions, anticipating the motor system activation for a prompt preparation for action. Since immersive VR applications benefit from spatial interactions, modeling the PPS around the listeners is crucial to reveal individual behaviors and performances. Our methodology centered around the pmRT is able to provide a compact description and approximation of the spatiotemporal PPS processing and boundaries around the head by replicating several well-known neurophysiological phenomena related to PPS, such as auditory asymmetry, front/back calibration and confusion, and ellipsoidal action fields

    Social interaction in collaborative engineering environments

    Get PDF
    Thesis (M.Eng.)--Massachusetts Institute of Technology, Dept. of Civil and Environmental Engineering, 1999.Includes bibliographical references (leaves 128-131).by Joon Suk Hor.M.Eng

    Safe and Sound: Proceedings of the 27th Annual International Conference on Auditory Display

    Get PDF
    Complete proceedings of the 27th International Conference on Auditory Display (ICAD2022), June 24-27. Online virtual conference

    Understanding misinformation on Twitter in the context of controversial issues

    Get PDF
    Social media is slowly supplementing, or even replacing, traditional media outlets such as television, newspapers, and radio. However, social media presents some drawbacks when it comes to circulating information. These drawbacks include spreading false information, rumors, and fake news. At least three main factors create these drawbacks: The filter bubble effect, misinformation, and information overload. These factors make gathering accurate and credible information online very challenging, which in turn may affect public trust in online information. These issues are even more challenging when the issue under discussion is a controversial topic. In this thesis, four main controversial topics are studied, each of which comes from a different domain. This variation of domains can give a broad view of how misinformation is manifested in social media, and how it is manifested differently in different domains. This thesis aims to understand misinformation in the context of controversial issue discussions. This can be done through understanding how misinformation is manifested in social media as well as by understanding people’s opinions towards these controversial issues. In this thesis, three different aspects of a tweet are studied. These aspects are 1) the user sharing the information, 2) the information source shared, and 3) whether specific linguistic cues can help in assessing the credibility of information on social media. Finally, the web application tool TweetChecker is used to allow online users to have a more in-depth understanding of the discussions about five different controversial health issues. The results and recommendations of this study can be used to build solutions for the problem of trustworthiness of user-generated content on different social media platforms, especially for controversial issues

    A multimodal approach to persuasion in oral presentations : the case of conference presentations, research dissemination talks and product pitches

    Get PDF
    Esta tesis presenta un estudio multimodal y etnográfico del uso de estrategias persuasivas en tres géneros orales: presentaciones en conferencias, charlas de divulgación científica, y presentaciones de productos. Estos géneros comparten un importante componente persuasivo: los tres se dirigen a una audiencia tratando de convencerles del valor de un producto, servicio, o investigación. Sin embargo, se usan en dos contextos profesionales diferentes: el académico y el económico, por lo que cabe esperar que consigan su propósito comunicativo de forma diferente. Por otra parte, recientes estudios muestran como distintos discursos, tienden a adoptar cada vez más rasgos promocionales (promocionalización del discurso). En vista de ello, es factible establecer como hipótesis que los tres géneros están relacionados interdiscursivamente, y un estudio multimodal y etnográfico del uso de la persuasión en dichos géneros puede ayudar a clarificar las relaciones existentes entre ellos, así como sus diferencias.This thesis is a multimodal and ethnographic study of the use of persuasive strategies in three oral genres conference presentations, research dissemination talks and product pitches. These presentations share a strong persuasive component in their communicative purpose: the three of them address an audience to convince them of the value of a product, a service or a piece of research. However, they are used in business and academia by different discourse communities in different contexts, and therefore they can be expected to achieve their communicative goals in different ways. In addition, research suggests that there is a trend towards promotionalization of different discourses, among which academic discourse is included. In view of this, I hypothesize that these three genres are intertextually and interdiscursively related, and that a multimodal and ethnographic study of the use of persuasion in them can help to shed some light on these relationships and differences
    corecore