    CHORUS Deliverable 2.1: State of the Art on Multimedia Search Engines

    Drawing on information provided by European projects and national initiatives related to multimedia search, as well as by domain experts who participated in the CHORUS Think-Tanks and workshops, this document reports on the state of the art in multimedia content search from a technical and a socio-economic perspective. The technical perspective includes an up-to-date view of content-based indexing and retrieval technologies, multimedia search in the context of mobile devices and peer-to-peer networks, and an overview of current evaluation and benchmark initiatives that measure the performance of multimedia search engines. From a socio-economic perspective, we inventory the impact and legal consequences of these technical advances and point out future directions of research.

    Electrodance as a "being-together": New forms of mediatization in the communication of youth styles

    This thesis studies the youth style known as ElectroDance, a dance and sound style that emerged in Parisian clubs and affluent suburbs in 2006 and spread quickly across the globe in subsequent years. The aim is to explore how youth cultures and styles are culturally produced under the conditions of the current global network communication environment. The thesis analyses, theoretically and empirically, the everyday practices of electrodancers that create a sense of togetherness among the youth, paying special attention to those based on an intensive use of new media. Concepts such as mediatization, interface, and broadcast versus network communication build an interpretative framework for observing how traditional notions (public and private, production and consumption, global and local, technologically mediated and face-to-face interaction) are articulated differently nowadays and express a global mass-communication environment in constant transformation.

    Affect recognition & generation in-the-wild

    Affect recognition based on a subject's facial expressions has been a major research topic in the effort to build machines that can understand the way subjects feel, act and react. In the past, because large amounts of data captured in real-life situations were unavailable, research focused mainly on controlled environments. Recently, however, social media and platforms have come into wide use, and deep learning has emerged as a means of solving visual analysis and recognition problems. This Ph.D. thesis exploits these advances and makes significant contributions to affect analysis and recognition in-the-wild. We tackle affect analysis and recognition as a dual knowledge generation problem: i) we create new, large and rich in-the-wild databases, and ii) we design and train novel deep neural architectures that are able to analyse affect over these databases and to generalise successfully to other datasets. First, we present the creation of the Aff-Wild database, annotated in terms of valence-arousal, and an end-to-end CNN-RNN architecture, AffWildNet. We then use AffWildNet as a robust prior for dimensional and categorical affect recognition and extend it by extracting low-, mid- and high-level latent information and analysing it via multiple RNNs. Additionally, we propose a novel loss function for DNN-based categorical affect recognition. Next, we generate Aff-Wild2, the first database containing annotations for all main behavior tasks: valence-arousal estimation, basic expression classification, and action unit detection. We develop multi-task and multi-modal extensions of AffWildNet by fusing these tasks, and we propose a novel holistic approach that utilises all existing databases with non-overlapping annotations and couples them through co-annotation and distribution matching. Finally, we present an approach to facial affect synthesis in terms of either valence-arousal or basic expressions. We generate an image with a given affect, or a sequence of images with evolving affect, by annotating a 4-D database and utilising a 3-D morphable model.

    Immersed in Pop! Excursions into Compositional Design

    Recent changes in consumer audio and music technology and distribution (for example, the addition of 3D audio formats such as Dolby Atmos to music streaming services, the recent release of “Spatial Audio” on Apple and Beats products, and the proliferation of musical content in virtual reality and 360° videos) have reignited a public discourse on concepts of immersion and interactivity in popular music and media. This raises questions and necessitates a deepening of popular musicological discourse in these areas. This thesis thus asks: what is the relationship between so-called immersive media and immersive experience? How are immersive and interactive experiences of audiovisual popular music compositionally designed? And to what degree do interpretations of immersion and interactivity in popular music imply agency on the part of the listener/viewer? To address these questions, Bresler has authored or co-authored four articles and book chapters on music in immersive and interactive media, with a focus on compositional design and immersion in pop music. In the framing chapter, these articles are contextualized through the coining of the term immersive staging, a framework for understanding how the perceived relationship between the performer and listener is mediated through technology, performativity, audiovisual compositional design, and aesthetics. Additionally, the chapter makes a case for the hermeneutic methodologies employed throughout.

    CCTV as a Smart Sensor Network

    With the emergence of so-called 'smart CCTV' able to recognise the precursors of disorder and civil disobedience, we present a preliminary study into using available CCTV networks augmented with big social media datasets. We examine the existing CCTV infrastructure in the UK and use an agent-based simulation to model interactions between people based on friendship networks and features derived from their social media usage, proposing a novel algorithm for the detection of psychopathy. Finally, we explore the frequency of crimes occurring within CCTV viewsheds using available UK police crime datasets, to illustrate the current limitations of the CCTV infrastructure as well as the potential ramifications of the stealthy emergence of CCTV networks as the fifth utility in smart cities.

    Facial detection on digital videos

    Bachelor's thesis (Trabajo de Fin de Grado) for the degrees in Computer Engineering and in Software Engineering, Facultad de Informática UCM, Departamento de Ingeniería del Software e Inteligencia Artificial, academic year 2019/2020. Today, neural networks are a fundamental pillar in the development of intelligent applications. However, the field is still expanding, and advances are made every day. Artificial intelligence is responsible for creating autonomous systems and for automating tasks that demand a great investment of time and effort when carried out by human beings, such as the identification of objects in images and videos. Digital forensic analysis, for its part, has advanced considerably in recent years thanks to continuous technological evolution. Bringing artificial intelligence and forensic analysis together can improve forensic work considerably and open the door to future advances and to functionalities that are still unknown. This work focuses on creating systems specialized in finding faces in videos recorded under very unfavorable conditions, in order to find criminal behavior quickly and efficiently, thereby facilitating forensic analysis and removing the need to analyze the videos manually. To this end, extensive research has been carried out both on the concepts that make up the field of face detection and on previous work related to this topic, in order to understand in depth which current techniques could serve this work. After extensive experimentation with six different deep learning models, two models are proposed: one specialized in detection on local (offline) videos, which infers with an accuracy of 74.83%, and another adapted for real-time operation, which achieves an accuracy of 74.59%.