
    Exquisitor: Interactive Learning for Multimedia


    Non-acted multi-view audio-visual dyadic interactions. Project non-verbal emotion recognition in dyadic scenarios and speaker segmentation

    Final projects of the Master in Fundamentals of Data Science, Faculty of Mathematics, Universitat de Barcelona, Year: 2019, Supervisors: Sergio Escalera Guerrero and Cristina Palmero. This Master's thesis focuses on the development of a baseline emotion recognition system in a dyadic environment, using raw and handcrafted audio features and faces cropped from the videos. The system is analyzed at frame and utterance level without temporal information. In addition, a baseline speaker segmentation system has been developed to facilitate the annotation task. To this end, an exhaustive study of the state of the art in emotion recognition and speaker segmentation techniques has been conducted, paying particular attention to deep learning techniques for emotion recognition and to clustering for speaker segmentation. While studying the state of the art from a theoretical point of view, a dataset consisting of videos of dyadic interaction sessions between individuals in different scenarios has been recorded. Different attributes were captured and labelled from these videos: body pose, hand pose, emotion, age, gender, etc. Once the architectures for emotion recognition have been trained on another dataset, a proof of concept is carried out with this new database in order to draw conclusions. In addition, this database can help future systems achieve better results. A large number of experiments with audio and video are performed to build the emotion recognition system. The IEMOCAP database is used for the training and evaluation experiments of the emotion recognition system. Once the audio and video models are trained separately with two different architectures, a fusion of both methods is performed. In this work, the importance of preprocessing the data (face detection, analysis window length, handcrafted features, etc.) and of choosing the correct parameters for the architectures (network depth, fusion, etc.) has been demonstrated and studied. On the other hand, the experiments for the speaker segmentation system are performed with a portion of audio from the IEMOCAP database, and the preprocessing steps, the problems of an unsupervised system such as clustering, and the feature representation are studied and discussed. Finally, the conclusions drawn throughout this work are presented, as well as possible lines of future work, including new systems for emotion recognition and experiments with the database recorded in this work.
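    As a rough, hypothetical illustration of the audio-video fusion step described above (the label set, weighting and softmax averaging are assumptions for this sketch, not the thesis's actual method):

```python
# Minimal late-fusion sketch: combine separately trained audio and video models
# by averaging their per-class probabilities (illustrative assumption only).
import numpy as np

EMOTIONS = ["angry", "happy", "neutral", "sad"]  # hypothetical label set

def softmax(logits):
    e = np.exp(logits - np.max(logits))
    return e / e.sum()

def fuse_predictions(audio_logits, video_logits, audio_weight=0.5):
    """Weighted average of per-modality class probabilities for one utterance."""
    p_audio = softmax(np.asarray(audio_logits, dtype=float))
    p_video = softmax(np.asarray(video_logits, dtype=float))
    p = audio_weight * p_audio + (1.0 - audio_weight) * p_video
    return EMOTIONS[int(p.argmax())], p

label, probs = fuse_predictions([0.2, 1.5, 0.1, -0.3], [1.1, 0.4, 0.2, 0.0])
print(label, probs)
```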

    Non-acted multi-view audio-visual dyadic interactions. Project master thesis: multi-modal local and recurrent non-verbal emotion recognition in dyadic scenarios

    Final projects of the Master in Fundamentals of Data Science, Faculty of Mathematics, Universitat de Barcelona, Year: 2019, Supervisors: Sergio Escalera Guerrero and Cristina Palmero. This master's thesis focuses on the development of a baseline emotion recognition system in a dyadic environment, using raw and handcrafted audio features and faces cropped from the videos. The system is analyzed at frame and utterance level, with and without temporal information. To this end, an exhaustive study of the state of the art in emotion recognition techniques has been conducted, paying particular attention to deep learning techniques for emotion recognition. While studying the state of the art from a theoretical point of view, a dataset consisting of videos of dyadic interaction sessions between individuals in different scenarios has been recorded. Different attributes were captured and labelled from these videos: body pose, hand pose, emotion, age, gender, etc. Once the architectures for emotion recognition have been trained on another dataset, a proof of concept is carried out with this new database in order to draw conclusions. In addition, this database can help future systems achieve better results. A large number of experiments with audio and video are performed to build the emotion recognition system. The IEMOCAP database is used for the training and evaluation experiments of the emotion recognition system. Once the audio and video models are trained separately with two different architectures, a fusion of both methods is performed. In this work, the importance of preprocessing the data (e.g., face detection, analysis window length, handcrafted features) and of choosing the correct parameters for the architectures (e.g., network depth, fusion) has been demonstrated and studied, and experiments using recurrent models for spatiotemporal utterance-level emotion recognition are performed to study the influence of temporal information. Finally, the conclusions drawn throughout this work are presented, as well as possible lines of future work, including new systems for emotion recognition and experiments with the database recorded in this work.
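    The following is a minimal sketch of an utterance-level recurrent classifier over per-frame features, in the spirit of the recurrent experiments mentioned above; the feature dimension, hidden size and number of classes are assumptions, and the thesis's actual architecture is not reproduced here.

```python
# Hedged sketch: an LSTM that consumes a sequence of per-frame feature vectors
# and predicts one emotion label per utterance (sizes are illustrative).
import torch
import torch.nn as nn

class UtteranceEmotionLSTM(nn.Module):
    def __init__(self, feat_dim=512, hidden_dim=128, num_classes=4):
        super().__init__()
        self.lstm = nn.LSTM(feat_dim, hidden_dim, batch_first=True)
        self.classifier = nn.Linear(hidden_dim, num_classes)

    def forward(self, frame_features):           # (batch, frames, feat_dim)
        _, (h_n, _) = self.lstm(frame_features)  # last hidden state
        return self.classifier(h_n[-1])          # (batch, num_classes)

model = UtteranceEmotionLSTM()
dummy_utterance = torch.randn(2, 30, 512)        # 2 utterances, 30 frames each
print(model(dummy_utterance).shape)              # torch.Size([2, 4])
```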

    Creation, Enrichment and Application of Knowledge Graphs

    The world is in constant change, and so is the knowledge about it. Knowledge-based systems - for example, online encyclopedias, search engines and virtual assistants - are thus faced with the constant challenge of collecting this knowledge and, beyond that, of understanding it and making it accessible to their users. Only if a knowledge-based system is capable of this understanding - that is, capable of more than just reading a collection of words and numbers without grasping their semantics - can it recognise relevant information and make it understandable to its users. The dynamics of the world play a unique role in this context: events of various kinds which are relevant to different communities are shaping the world, with examples ranging from the coronavirus pandemic to the matches of a local football team. Vital questions arise when dealing with such events: How to decide which events are relevant, and for whom? How to model these events so that they are understood by knowledge-based systems? How is the acquired knowledge returned to the users of these systems? A well-established concept for making knowledge understandable to knowledge-based systems is the knowledge graph, which contains facts about entities (persons, objects, locations, ...) in the form of a graph, represents relationships between these entities and makes the facts understandable by means of ontologies. This thesis considers knowledge graphs from three different perspectives: (i) Creation of knowledge graphs: even though the Web offers a multitude of sources that provide knowledge about the events in the world, the creation of an event-centric knowledge graph requires recognising such knowledge, integrating it across sources and representing it. (ii) Knowledge graph enrichment: knowledge of the world seems to be infinite, and it seems impossible to grasp it entirely at any time; therefore, methods that autonomously infer new knowledge and enrich the knowledge graphs are of particular interest. (iii) Knowledge graph interaction: even having all knowledge of the world available does not have any value in itself; in fact, there is a need to make it accessible to humans. Based on knowledge graphs, systems can share their knowledge with their users without demanding any conceptual understanding of knowledge graphs from them. For this to succeed, means for interaction with the knowledge are required, hiding the knowledge graph below the surface. In concrete terms, I present EventKG - a knowledge graph that represents the happenings in the world in 15 languages - as well as Tab2KG - a method for understanding tabular data and transforming it into a knowledge graph. For the enrichment of knowledge graphs without any background knowledge, I propose HapPenIng, which infers missing events from the descriptions of related events. I demonstrate means for interaction with knowledge graphs with two web-based systems (EventKG+TL and EventKG+BT) that enable users to explore the happenings in the world as well as the most relevant events in the lives of well-known personalities.
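    To make the idea of an event-centric knowledge graph concrete, the following minimal sketch builds a few RDF triples for a single event with rdflib; the namespace and property names are placeholders and do not reflect the actual EventKG or Tab2KG schemas.

```python
# Minimal sketch of event-centric facts as RDF triples (illustrative schema only).
from rdflib import Graph, Literal, Namespace
from rdflib.namespace import RDF, RDFS, XSD

EX = Namespace("http://example.org/event/")  # placeholder namespace

g = Graph()
event = EX["2014_FIFA_World_Cup_Final"]
g.add((event, RDF.type, EX.Event))
g.add((event, RDFS.label, Literal("2014 FIFA World Cup Final", lang="en")))
g.add((event, EX.hasBeginTimeStamp, Literal("2014-07-13", datatype=XSD.date)))
g.add((event, EX.hasPlace, EX["Maracana_Stadium"]))

# Serialise the small graph as Turtle to inspect the triples.
print(g.serialize(format="turtle"))
```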

    Feature based dynamic intra-video indexing

    A thesis submitted in partial fulfillment for the degree of Doctor of Philosophy. With the advent of digital imagery and its widespread application in all vistas of life, it has become an important component in the world of communication. Video content ranging from broadcast news, sports, personal videos, surveillance, movies, entertainment and similar domains is increasing exponentially in quantity, and it is becoming a challenge to retrieve content of interest from the corpora. This has led to an increased interest amongst researchers in investigating concepts of video structure analysis, feature extraction, content annotation, tagging, video indexing, querying and retrieval to fulfil these requirements. However, most of the previous work is confined to specific domains and constrained by quality, processing and storage capabilities. This thesis presents a novel framework agglomerating the established approaches, from feature extraction to browsing, in one content-based video retrieval system. The proposed framework significantly fills the identified gap while satisfying the imposed constraints on processing, storage, quality and retrieval times. The output entails a framework, methodology and prototype application that allow the user to efficiently and effectively retrieve content of interest, such as age, gender and activity, by specifying the relevant query. Experiments have shown plausible results, with an average precision and recall of 0.91 and 0.92 respectively for face detection using a Haar wavelet based approach. Precision for age ranges from 0.82 to 0.91 and recall from 0.78 to 0.84. Gender recognition gives better precision for males (0.89) than for females, while recall is higher for females (0.92). The activity of the subject has been detected using the Hough transform and classified using a Hidden Markov Model. A comprehensive dataset to support similar studies has also been developed as part of the research process. A friendly and intuitive Graphical User Interface (GUI) has been integrated into the developed system to facilitate the retrieval process. The intraclass correlation coefficient (ICC) comparison shows that the performance of the system closely resembles that of a human annotator. The performance has been optimised for time and error rate.
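    A Haar-cascade face detector, as available in OpenCV, is one plausible way to realise the face-detection step mentioned above; the parameters and cascade file below are illustrative assumptions, not the exact configuration used in the thesis.

```python
# Sketch: detect faces in a single video frame with OpenCV's bundled Haar cascade.
import cv2

cascade_path = cv2.data.haarcascades + "haarcascade_frontalface_default.xml"
face_cascade = cv2.CascadeClassifier(cascade_path)

def detect_faces(frame_bgr):
    """Return bounding boxes (x, y, w, h) of detected faces in one BGR frame."""
    gray = cv2.cvtColor(frame_bgr, cv2.COLOR_BGR2GRAY)
    return face_cascade.detectMultiScale(gray, scaleFactor=1.1, minNeighbors=5)

# Example usage on the first frame of a (hypothetical) video file:
# cap = cv2.VideoCapture("video.mp4"); ok, frame = cap.read()
# if ok: print(detect_faces(frame))
```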

    ASCCbot: An Open Mobile Robot Platform

    ASCCbot, an open mobile platform built in the ASCC lab, is presented in this thesis. The hardware and software design of the ASCCbot makes it a robust, extendable and duplicable robot platform suitable for most mobile robotics research, including navigation, mapping and localization. ROS is adopted as the major software framework, which not only makes the ASCCbot an open-source project but also extends its network functions so that multi-robot network applications can be easily implemented based on multiple ASCCbots. Collaborative localization is designed to test the network features of the ASCCbot. A telepresence robot is built based on the ASCCbot, and a Kinect-based human gesture recognition method is implemented on it for intuitive human-robot interaction. For the telepresence robot, a GUI is also created in which basic control commands, video streaming and 2D metric map rendering are presented. Last but not least, human activity recognition is proposed as a novel approach to semantic mapping. For the activity recognition part, a power-aware wireless motion sensor is designed and evaluated. The overall semantic mapping system is explained and tested in a mock apartment. The experimental results show that the activity recognition results are reliable and that the semantic map updating process is able to create an accurate semantic map that matches the real furniture layout. To sum up, the ASCCbot is a versatile mobile robot platform with both basic and feature functions implemented; complex high-level functions can be built upon its existing functions. With its duplicability, extendability and open-source nature, the ASCCbot will be very useful for mobile robotics research. School of Electrical & Computer Engineering
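    As a minimal sketch of the kind of ROS velocity-command node such a platform might expose (the topic name, rates and node name below are assumptions, not the ASCCbot's actual node layout):

```python
# Hedged ROS 1 (rospy) sketch: publish velocity commands to drive a mobile base
# forward briefly, then stop. Topic and timing values are illustrative.
import rospy
from geometry_msgs.msg import Twist

def drive_forward(speed=0.2, duration_s=2.0):
    pub = rospy.Publisher("/cmd_vel", Twist, queue_size=10)
    rospy.init_node("asccbot_demo_driver", anonymous=True)
    rate = rospy.Rate(10)  # publish commands at 10 Hz
    cmd = Twist()
    cmd.linear.x = speed
    end_time = rospy.Time.now() + rospy.Duration(duration_s)
    while not rospy.is_shutdown() and rospy.Time.now() < end_time:
        pub.publish(cmd)
        rate.sleep()
    pub.publish(Twist())  # zero command stops the base

if __name__ == "__main__":
    drive_forward()
```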

    Trust-based algorithms for fusing crowdsourced estimates of continuous quantities

    Crowdsourcing has provided a viable way of gathering information at unprecedented volumes and speed by engaging individuals to perform simple micro-tasks. In particular, the crowdsourcing paradigm has been successfully applied to participatory sensing, in which users perform sensing tasks and provide data using their mobile devices. In this way, people can help solve complex environmental sensing tasks, such as weather monitoring, nuclear radiation monitoring and cell tower mapping, in a highly decentralised and parallelised fashion. Traditionally, crowdsourcing technologies were primarily used for gathering data for classification and image labelling tasks. In contrast, such crowd-based participatory sensing poses new challenges that relate to (i) dealing with human-reported sensor data that are available in the form of continuous estimates of an observed quantity such as a location, a temperature or a sound reading, (ii) dealing with possible spatial and temporal correlations within the data and (iii) issues of data trustworthiness due to the unknown capabilities and incentives of the participants and their devices. Solutions to these challenges need to be able to combine the data provided by multiple users to ensure the accuracy and validity of the aggregated results. With this in mind, our goal is to provide methods to better aid the aggregation of crowd-reported sensor estimates of continuous quantities when data are provided by individuals of varying trustworthiness. To achieve this, we develop a trust-based information fusion framework that incorporates latent trustworthiness traits of the users within the data fusion process. Through this framework, we develop a set of four novel algorithms (MaxTrust, BACE, TrustGP and TrustLGCP) to compute reliable aggregations of the users' reports, both in the setting of observing a stationary quantity (MaxTrust and BACE) and in that of a spatially distributed phenomenon (TrustGP and TrustLGCP). The key feature of all these algorithms is the ability to (i) learn the trustworthiness of each individual who provides data and (ii) exploit this latent trustworthiness information to compute a more accurate fused estimate. In particular, this is achieved by using a probabilistic framework that allows our methods to simultaneously learn the fused estimate and the users' trustworthiness from the crowd reports. We validate our algorithms in four key application areas (cell tower mapping, WiFi network mapping, nuclear radiation monitoring and disaster response) that demonstrate the practical impact of our framework in achieving substantially more accurate and informative predictions compared to existing fusion methods. We expect that the results of this thesis will make it possible to build more reliable data fusion algorithms for the broad class of human-centred information systems (e.g., recommendation systems, peer reviewing systems, student grading tools) that are based on making decisions upon subjective opinions provided by their users.
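    As a simplified, illustrative stand-in for the probabilistic trust-based fusion idea (not the actual MaxTrust or BACE algorithms), the sketch below fuses scalar reports with an iteratively reweighted mean in which each user's weight is the inverse of their residual variance:

```python
# Illustrative trust-weighted fusion of crowd reports of one stationary quantity.
import numpy as np

def trust_weighted_fusion(reports, n_iters=20, eps=1e-6):
    """reports: dict user_id -> list of scalar observations of the same quantity."""
    users = list(reports)
    trust = {u: 1.0 for u in users}               # start with uniform trust
    estimate = 0.0
    for _ in range(n_iters):
        num = sum(trust[u] * np.sum(reports[u]) for u in users)
        den = sum(trust[u] * len(reports[u]) for u in users)
        estimate = num / den                      # trust-weighted mean of all reports
        for u in users:                           # trust ~ inverse residual variance
            residual_var = np.mean((np.asarray(reports[u]) - estimate) ** 2)
            trust[u] = 1.0 / (residual_var + eps)
    return estimate, trust

est, trust = trust_weighted_fusion({"a": [10.1, 9.9], "b": [10.2], "c": [14.0]})
print(round(est, 2), {u: round(t, 3) for u, t in trust.items()})
```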