
    Acoustic detection of violent events

    Violent events are situations we hear and see every day, above all in the films we are accustomed to watching. These films are age-rated according to the amount of cruel or violent scenes they contain. This is perhaps the most common application of the topic addressed in this project, but it is not the only one. What if we could predict violent situations and alert the emergency services as early as possible? In this study we aim to predict certain violent events in audio files. Using the data provided by the chosen database, we extract features from the audio under study and reshape the data to suit the task, so that it can be handled more easily. To this end, we rely on machine learning tools to obtain good prediction results. Within machine learning, we look for the algorithms and methods that best fit the needs of the project, evaluating above all those related to classification.
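The pipeline this abstract outlines, extracting features from audio frames and training a classifier on them, can be sketched as below. The feature set (short-time energy and zero-crossing rate), the synthetic clips, and all names are illustrative assumptions, not the project's actual code:

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier

def frame_features(signal, frame_len=512, hop=256):
    """Per-frame short-time energy and zero-crossing rate (hypothetical feature set)."""
    feats = []
    for start in range(0, len(signal) - frame_len + 1, hop):
        frame = signal[start:start + frame_len]
        energy = float(np.mean(frame ** 2))
        zcr = float(np.mean(np.abs(np.diff(np.sign(frame))) > 0))
        feats.append([energy, zcr])
    return np.array(feats)

rng = np.random.default_rng(0)
# Synthetic stand-ins: "violent" clips are simply louder than "calm" ones.
calm = [rng.normal(0, 0.1, 4096) for _ in range(20)]
loud = [rng.normal(0, 1.0, 4096) for _ in range(20)]
X = np.array([frame_features(s).mean(axis=0) for s in calm + loud])
y = np.array([0] * 20 + [1] * 20)

# Classification stage: any standard classifier fits here.
clf = RandomForestClassifier(n_estimators=50, random_state=0).fit(X, y)
```

In a real system the hand-picked features would be replaced by richer descriptors (e.g. spectral ones) extracted from the annotated database the abstract mentions.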

    An audio-visual approach to web video categorization

    In this paper we address the issue of automatic video genre categorization of web media using an audio-visual approach. To this end, we propose content descriptors which exploit audio, temporal structure, and color information. The potential of our descriptors is validated experimentally, both from the perspective of a classification system and as an information retrieval approach. Validation is carried out on a real scenario, namely more than 288 hours of video footage and 26 video genres specific to the blip.tv media platform. Additionally, to reduce the semantic gap, we propose a new relevance feedback technique based on hierarchical clustering. Experimental tests prove that retrieval performance can be significantly increased in this case, becoming comparable to that obtained with high-level semantic textual descriptors.
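The clustering-based relevance feedback idea can be sketched as: cluster the retrieved items' descriptors hierarchically, then promote items that share a cluster with results the user marked relevant. The descriptors, cluster count, and re-ranking rule below are illustrative assumptions, not the paper's method:

```python
import numpy as np
from sklearn.cluster import AgglomerativeClustering

rng = np.random.default_rng(1)
# Hypothetical audio-visual descriptors for 30 retrieved videos: two loose groups.
docs = np.vstack([rng.normal(0, 0.3, (15, 4)), rng.normal(3, 0.3, (15, 4))])

# Hierarchical (agglomerative) clustering of the retrieved set.
labels = AgglomerativeClustering(n_clusters=2).fit_predict(docs)

relevant = {0, 1, 2}  # indices the user marked as relevant (feedback)
good = {labels[i] for i in relevant}
# Re-rank: items sharing a cluster with relevant feedback come first.
order = sorted(range(len(docs)), key=lambda i: labels[i] not in good)
```

The intuition is that items visually/acoustically close to confirmed-relevant ones are likely relevant too, even if their initial retrieval scores were low.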

    Detection of auditory saliency in audio recordings

    Human perception is the process by which our brain receives information about the world around us through the senses. During this process, however, some stimuli are considered more important than others, i.e., they are prioritized. Auditory saliency is the mechanism our brain uses to prioritize certain stimuli, in this case sounds. In recent years, technological advances and the adaptation of visual saliency models have marked the real beginning of research into the detection of salient auditory events. Furthermore, training neural networks for use in these models yields a closer approximation to the biological structure that generates the prioritization process. Different types of neural networks are implemented depending on the goal of the model: in some cases the aim is event classification, in others detection. In this project, regression is used to obtain numerical values that allow the network weights to be adjusted against target values obtained from physiological measurements, which form the ground truth, i.e., a reliable reference. More complex models are already emerging that address auditory and visual saliency jointly, since in contexts such as cinema, or even in our daily life, it is more natural to use both senses, sight and hearing, in combination.
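The regression setup described above, fitting network weights against physiologically measured target values, might look like this in outline; the features, targets, and network size are synthetic stand-ins, not the thesis's model:

```python
import numpy as np
from sklearn.neural_network import MLPRegressor

rng = np.random.default_rng(0)
# Hypothetical frame-level audio features and a per-frame "saliency" target
# standing in for the physiological ground truth (both synthetic).
X = rng.normal(0, 1, (300, 8))
y = X @ rng.normal(0, 1, 8)  # saliency as some unknown function of the features

# Regression network: weights are adjusted to minimize error against the targets.
net = MLPRegressor(hidden_layer_sizes=(32,), max_iter=3000, random_state=0).fit(X, y)
```

The point of regression (rather than classification) is that the output is a continuous saliency score per frame, directly comparable to the measured reference.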

    Emotional State Recognition Based on Physiological Signals

    Emotional state recognition is a crucial task for achieving a new level of Human-Computer Interaction (HCI), and machine learning applications are penetrating ever more spheres of everyday life. Recent studies show promising results in analyzing physiological signals (EEG, ECG, GSR) with machine learning to assess emotional state. Commonly, a specific emotion is invoked by playing affective videos or sounds; however, there is no canonical way to interpret emotional state. In this study, we classified affective physiological signals, with labels obtained from two emotional state estimation approaches, using machine learning algorithms and heuristic formulas. A comparison of the methods showed that the highest accuracy was achieved with a Random Forest classifier on spectral features from the EEG records; a combination of features from the peripheral physiological signals also showed relatively high classification performance. However, the heuristic formulas and a novel approach to ECG signal classification using a recurrent neural network ultimately failed. The data were taken from the MAHNOB-HCI dataset, a multimodal database collected from 27 subjects, each of whom was shown 20 emotional movie fragments. We obtained the unexpected result that describing emotional states with Ekman's discrete paradigm provides better classification results than the contemporary dimensional model, which represents emotions by mapping them onto a Cartesian plane with valence and arousal axes. Our study shows the importance of label selection in the emotion recognition task; moreover, the dataset has to be suitable for machine learning algorithms. The acquired results may help in selecting appropriate physiological signals and emotion labels for further dataset creation and post-processing.
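The best-performing configuration reported above, a Random Forest on EEG spectral features, can be sketched roughly as follows. The sampling rate, band definitions, and synthetic epochs are assumptions for illustration, not the study's actual setup:

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier

FS = 128  # sampling rate in Hz (assumed)
BANDS = {"delta": (1, 4), "theta": (4, 8), "alpha": (8, 13), "beta": (13, 30)}

def band_powers(epoch):
    """Mean spectral power in the classic EEG bands for one single-channel epoch."""
    freqs = np.fft.rfftfreq(len(epoch), 1 / FS)
    psd = np.abs(np.fft.rfft(epoch)) ** 2
    return [psd[(freqs >= lo) & (freqs < hi)].mean() for lo, hi in BANDS.values()]

rng = np.random.default_rng(0)
t = np.arange(2 * FS) / FS  # two-second epochs
# Synthetic epochs: one class dominated by 10 Hz alpha, the other by 20 Hz beta.
alpha_cls = [np.sin(2 * np.pi * 10 * t) + rng.normal(0, 0.5, t.size) for _ in range(20)]
beta_cls = [np.sin(2 * np.pi * 20 * t) + rng.normal(0, 0.5, t.size) for _ in range(20)]
X = np.array([band_powers(e) for e in alpha_cls + beta_cls])
y = np.array([0] * 20 + [1] * 20)

clf = RandomForestClassifier(n_estimators=50, random_state=0).fit(X, y)
```

Real EEG would add many channels and more robust spectral estimation (e.g. Welch's method), but the band-power-to-forest pipeline is the same shape.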

    CHORUS Deliverable 2.1: State of the Art on Multimedia Search Engines

    Based on the information provided by European projects and national initiatives related to multimedia search, as well as by domain experts who participated in the CHORUS think-tanks and workshops, this document reports on the state of the art in multimedia content search from a technical and socio-economic perspective. The technical perspective includes an up-to-date view on content-based indexing and retrieval technologies, multimedia search in the context of mobile devices and peer-to-peer networks, and an overview of current evaluation and benchmark initiatives to measure the performance of multimedia search engines. From a socio-economic perspective, we inventory the impact and legal consequences of these technical advances and point out future directions of research.

    On indexing events in films (application to violence detection)

    In this thesis, we focus on the detection of semantic concepts in "Hollywood" movies using audio and video concepts, with violence detection as the target application. We present experiments in two main areas: the detection of violent audio concepts such as gunshots and explosions, and the detection of violence itself, initially based only on audio and then on both audio and video. In the context of audio concept detection, we first highlight a generalization problem arising between movies, and show that it is probably due to a statistical divergence between the audio features extracted from them. To solve this, we propose to use the concept of audio words, so as to reduce the variability by grouping samples by similarity, combined with contextual Bayesian networks. The results are very encouraging, and a comparison with the state of the art on the same data shows that our results are equivalent. The resulting system can either be made robust to the applied threshold by using early fusion of the features, or provide a wide variety of operating points. Finally, we propose an adaptation of the factor analysis scheme developed in the context of speaker recognition, and show that its integration into our system improves the results. In the context of violence detection, we present the MediaEval Affect Task 2012 evaluation campaign, which aims to bring together teams working on the topic. We then propose three systems for detecting violence. The first two are based only on audio: the first uses a TF-IDF description, and the second integrates the audio concept detection system into the violence detection task. The last is a multimodal system based on Bayesian networks that allows us to explore structure-learning algorithms for graphs. The performance obtained with the different systems, and a comparison with the systems developed within MediaEval, show that we are at the level of the state of the art, and reveal the complexity of such systems.

    CGAMES'2009


    Virtual Reality Games for Motor Rehabilitation

    This paper presents a fuzzy-logic-based method to track user satisfaction without the need for devices that monitor users' physiological conditions. User satisfaction is the key to any product's acceptance; computer applications and video games provide a unique opportunity to tailor the environment to each user to better suit their needs. We have implemented a non-adaptive fuzzy logic model of emotion, based on the emotional component of the Fuzzy Logic Adaptive Model of Emotion (FLAME) proposed by El-Nasr, to estimate player emotion in Unreal Tournament 2004. In this paper we describe the implementation of this system and present the results of one of several play tests. Our research contradicts the current literature, which suggests that physiological measurements are needed: we show that it is possible to estimate user emotion with a software-only method.
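A fuzzy-logic emotion estimate in the spirit described above, though not El-Nasr's actual FLAME rule base, can be sketched with triangular membership functions and a Mamdani-style AND; the input variables and the rule are hypothetical:

```python
def tri(x, a, b, c):
    """Triangular membership function rising from a to b, falling from b to c."""
    if x <= a or x >= c:
        return 0.0
    return (x - a) / (b - a) if x < b else (c - x) / (c - b)

def frustration(deaths_per_min, progress):
    """Toy rule (illustrative only): frustration is high when deaths are
    frequent AND level progress is low."""
    many_deaths = tri(deaths_per_min, 1, 4, 8)
    low_progress = tri(progress, -0.5, 0.0, 0.5)
    return min(many_deaths, low_progress)  # Mamdani min as fuzzy AND
```

Because the inputs are ordinary gameplay statistics, such an estimator needs no physiological sensors, which is the paper's central claim.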

    Challenges and perspectives of hate speech research

    This book is the result of a conference that could not take place. It is a collection of 26 texts that address and discuss the latest developments in international hate speech research from a wide range of disciplinary perspectives. This includes case studies from Brazil, Lebanon, Poland, Nigeria, and India, theoretical introductions to the concepts of hate speech, dangerous speech, incivility, toxicity, extreme speech, and dark participation, as well as reflections on methodological challenges such as scraping, annotation, datafication, implicitness, explainability, and machine learning. As such, it provides a much-needed forum for cross-national and cross-disciplinary conversations in what is currently a very vibrant field of research.

    Proceedings of the Seventh Italian Conference on Computational Linguistics CLiC-it 2020

    On behalf of the Program Committee, a very warm welcome to the Seventh Italian Conference on Computational Linguistics (CLiC-it 2020). This edition of the conference is held in Bologna and organised by the University of Bologna. The CLiC-it conference series is an initiative of the Italian Association for Computational Linguistics (AILC) which, after six years of activity, has clearly established itself as the premier national forum for research and development in the fields of Computational Linguistics and Natural Language Processing, where leading researchers and practitioners from academia and industry meet to share their research results, experiences, and challenges.