
    Towards an automatic semantic annotation of car aesthetics

    The design of a new car is guided by a set of directives indicating the target market and specific engineering and aesthetic constraints, which may also include the preservation of the company's brand identity or the restyling of products already on the market. When creating a new product, designers commonly evaluate existing products to take inspiration from them or to reuse successful solutions. From the perspective of an optimised styling workflow, the ability to easily retrieve related documentation and existing digital models, from both internal and external repositories, would be of great benefit. In fact, the rapid growth of web content and the widespread adoption of computer-assisted design tools have made a huge amount of digital data available, whose exploitation could be improved by more selective retrieval methods. In particular, the retrieval of aesthetic elements may help designers create digital models conforming to specific styling properties more efficiently. The aim of the research described in this document is the definition of a framework able to support a (semi-)automatic extraction of semantic data from 3D models and other multimedia data, allowing car designers to reuse knowledge and design solutions within the styling department. The first objective is to capture and structure the explicit and implicit elements that contribute to car aesthetics and can realistically be tackled through computational models and methods. The second step is the definition of a system architecture able to transfer such semantics through an automatic annotation of car models.
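
    As a purely illustrative sketch of what the output of such a framework might look like, the following Python fragment models an annotation record linking a region of a 3D car model to an aesthetic property, plus a simple retrieval query over it. All names and fields are hypothetical, not part of the framework described above.

        from dataclasses import dataclass, field

        @dataclass
        class StylingAnnotation:
            """One semantic annotation attached to a region of a car model."""
            model_id: str        # identifier of the 3D model in a repository
            region: str          # e.g. "front grille", "beltline", "wheel arch"
            prop: str            # aesthetic property, e.g. "aggressiveness"
            value: float         # normalized score in [0, 1]
            source: str = "auto" # "auto", "semi-auto", or "manual"

        @dataclass
        class AnnotatedModel:
            model_id: str
            brand: str
            annotations: list[StylingAnnotation] = field(default_factory=list)

            def regions_with(self, prop: str, threshold: float):
                """Retrieve regions whose score for a property exceeds a threshold."""
                return [a for a in self.annotations
                        if a.prop == prop and a.value >= threshold]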

    Scene extraction in motion pictures

    This paper addresses the challenge of bridging the semantic gap between the rich meaning users desire when they query to locate and browse media and the shallowness of the media descriptions that can be computed in today's content management systems. To facilitate high-level semantics-based content annotation and interpretation, we tackle the problem of automatically decomposing motion pictures into meaningful story units, namely scenes. Since a scene is a complicated and subjective concept, we first propose guidelines from film production to determine when a scene change occurs. We then investigate different rules and conventions followed as part of Film Grammar that would guide and shape an algorithmic solution for determining a scene. Two different techniques using intershot analysis are proposed as solutions in this paper. In addition, we present different refinement mechanisms, such as film-punctuation detection founded on Film Grammar, to further improve the results. These refinement techniques demonstrate significant improvements in overall performance. Furthermore, we analyze errors in the context of film-production techniques, an analysis that offers useful insights into the limitations of our method.
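
    The paper's two intershot-analysis techniques are not spelled out in the abstract, so the following Python sketch shows only the general idea of grouping shots into scenes by visual similarity within a sliding window; the histogram features, window size and threshold are assumptions for illustration, not the authors' algorithm.

        import numpy as np

        def shot_histogram(frames, bins=16):
            """Average colour histogram of a shot (frames: N x H x W x 3, uint8)."""
            hist = np.zeros(3 * bins)
            for c in range(3):
                h, _ = np.histogram(frames[..., c], bins=bins, range=(0, 256))
                hist[c * bins:(c + 1) * bins] = h
            return hist / hist.sum()

        def group_shots_into_scenes(shot_hists, window=4, threshold=0.6):
            """Greedy grouping: a shot joins the current scene if it is similar
            enough to any of the last `window` shots already in that scene."""
            def sim(a, b):                  # histogram intersection, in [0, 1]
                return np.minimum(a, b).sum()
            scenes, current = [], [0]
            for i in range(1, len(shot_hists)):
                recent = current[-window:]
                if max(sim(shot_hists[i], shot_hists[j]) for j in recent) >= threshold:
                    current.append(i)
                else:
                    scenes.append(current)
                    current = [i]
            scenes.append(current)
            return scenes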

    A Data-Driven Approach for Tag Refinement and Localization in Web Videos

    Tagging of visual content is becoming more and more widespread as web-based services and social networks have popularized tagging functionalities among their users. These user-generated tags are used to ease browsing and exploration of media collections, e.g. using tag clouds, or to retrieve multimedia content. However, not all media are equally tagged by users. With current systems it is easy to tag a single photo, and even tagging a part of a photo, such as a face, has become common on sites like Flickr and Facebook. On the other hand, tagging a video sequence is more complicated and time consuming, so users tend to tag only the overall content of a video. In this paper we present a method for automatic video annotation that increases the number of tags originally provided by users and localizes them temporally, associating tags with keyframes. Our approach exploits the collective knowledge embedded in user-generated tags and web sources, and the visual similarity of keyframes to images uploaded to social sites like YouTube and Flickr, as well as web sources like Google and Bing. Given a keyframe, our method selects on the fly, from these visual sources, the training exemplars that should be most relevant for this test sample, and proceeds to transfer labels across similar images. Compared to existing video tagging approaches that require training a classifier for each tag, our system has few parameters, is easy to implement and can deal with an open-vocabulary scenario. We demonstrate the approach on tag refinement and localization on DUT-WEBV, a large dataset of web videos, and show state-of-the-art results.
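
    A minimal sketch of the label-transfer step described above, assuming keyframes and web images are represented by fixed-length visual descriptors; the cosine-similarity ranking, the value of k and the vote threshold are illustrative choices, not the parameters used in the paper.

        import numpy as np
        from collections import Counter

        def transfer_tags(keyframe_feat, pool_feats, pool_tags, k=10, min_votes=3):
            """Rank web images by similarity to the keyframe and keep the tags
            that enough of the top-k neighbours agree on.

            keyframe_feat : (d,) descriptor of the keyframe
            pool_feats    : (n, d) descriptors of images from web sources
            pool_tags     : list of n tag lists, one per pool image
            """
            q = keyframe_feat / np.linalg.norm(keyframe_feat)
            p = pool_feats / np.linalg.norm(pool_feats, axis=1, keepdims=True)
            neighbours = np.argsort(-(p @ q))[:k]        # top-k most similar
            votes = Counter(t for i in neighbours for t in set(pool_tags[i]))
            return [t for t, n in votes.most_common() if n >= min_votes]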

    Aesthetics assessment of videos through visual descriptors and automatic polarity annotation

    In a world where new technologies are increasingly tied to multimedia information, developing tools that make this kind of data easy to handle has become an essential task, one that has attracted scientific interest in recent years. Among the research lines that have begun to develop recently, the study of subjective characteristics of audiovisual material from objective data is of special interest, since it can be applied to classification and recommendation systems. This document presents research focused on models that automatically predict the satisfaction or interest that a video, specifically a car commercial, arouses in the YouTube users who watch it, based on the video's low-level descriptors. A novel aspect of this work is a solution to this kind of problem based on a procedure to label the videos automatically through unsupervised learning techniques. To this end, a set of car commercials was collected together with the user-provided metadata associated with each video, which offers information about the satisfaction users perceive when watching them on YouTube. These metadata made it possible to design three cluster-analysis strategies to annotate the videos automatically, each using a different set of metadata according to the way it is provided by the users. In addition, a set of visual descriptors was extracted from each video using image and video processing techniques and then used to train a machine learning system, which allowed studying the relevance and usefulness of this descriptor set for predicting the aesthetic value of the videos as perceived by users.
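
    The pipeline described above (unsupervised labelling from engagement metadata, then supervised learning on visual descriptors) could be sketched in Python as follows; the choice of k-means with two clusters and a random forest is an assumption for illustration, not the three clustering strategies actually designed in the work.

        import numpy as np
        from sklearn.cluster import KMeans
        from sklearn.ensemble import RandomForestClassifier

        def auto_label_and_train(metadata, descriptors, seed=0):
            """Cluster per-video engagement metadata into two polarity groups,
            then fit a classifier on the low-level visual descriptors.

            metadata    : (n, m) engagement statistics (views, likes, comments, ...)
            descriptors : (n, d) visual descriptors, one row per video
            """
            metadata = np.asarray(metadata, dtype=float)
            km = KMeans(n_clusters=2, n_init=10, random_state=seed).fit(metadata)
            # Call the cluster with the higher mean engagement the positive class.
            means = [metadata[km.labels_ == c].mean() for c in (0, 1)]
            labels = (km.labels_ == int(np.argmax(means))).astype(int)
            clf = RandomForestClassifier(n_estimators=200, random_state=seed)
            clf.fit(descriptors, labels)
            return clf, labels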

    Value annotation of web resources: the ValueML Language

    In the multimedia design field, we have recently witnessed a shift of focus from products and the user's experience to social effects of technologies and the quality of life. In this context, values play an important role. They may be inscribed within an artifact as symbolic meanings or as a built-in use consequence. In spite of their growing relevance, there is not yet a markup language for value annotation. This paper describes a proposal for filling this gap. After a brief review of various perspectives on the concept of value and relevant taxonomies, we discuss the syntax and semantics of a preliminary version of the ValueML language, together with an example of annotation of a commercial video clip.

    Role of shot length in characterizing tempo and dramatic story sections in motion pictures

    Motivated by existing cinematic conventions known as film grammar, we propose a computational approach to determining tempo as a high-level movie content descriptor, as well as a means for deriving the dramatic story sections and events occurring in movies. In our approach, movie tempo is extracted from two easily computed aspects: shot length and motion. Story sections and events are generally associated with changes in tempo, and are thus identified by edges located in the tempo function. In this paper, we analyze our initial formulation of the tempo function, which was based on the assumption that the distributions of both shot length and motion in movies are normal. Given that the distribution of shot length is approximately Weibull, as confirmed in our experiments, we examine the impact of modelling and modifying the contribution of shot length to tempo. We derive an appropriate normalization function that faithfully encapsulates the role of shot length in tempo perception, and analyze the changes to the story sections identified in films.
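
    The tempo function itself is only outlined in the abstract; a hedged Python sketch of the general construction (normalized shot length and motion, combined and smoothed, with edges as candidate story-section boundaries) might look like this. The log transform stands in for the paper's derived normalization of the Weibull-distributed shot lengths and is an assumption, as are the weights and the edge rule.

        import numpy as np

        def tempo_function(shot_lengths, shot_motion, alpha=1.0, beta=1.0, smooth=5):
            """Per-shot tempo: shorter shots and more motion mean higher tempo."""
            s = np.log(np.asarray(shot_lengths, dtype=float))  # tame the Weibull tail
            m = np.asarray(shot_motion, dtype=float)
            z_s = (s.mean() - s) / s.std()   # shorter shot -> higher tempo
            z_m = (m - m.mean()) / m.std()   # more motion  -> higher tempo
            t = np.convolve(alpha * z_s + beta * z_m,
                            np.ones(smooth) / smooth, mode="same")
            d = np.abs(np.diff(t))           # edges = large local tempo changes
            edges = np.flatnonzero(d > d.mean() + 2 * d.std())
            return t, edges                  # edges index candidate boundaries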