
    Aesthetics assessment of videos through visual descriptors and automatic polarity annotation

    In a world where new technologies are ever more closely tied to multimedia information, developing tools that make this kind of data easy to handle has become an essential task, one that has attracted scientific interest in recent years. Among the research lines that have recently begun to develop, the study of subjective characteristics of audiovisual material from objective data is of particular interest because it can be applied to classification and recommendation systems. This document presents research focused on models that automatically predict the satisfaction or interest that a video, specifically a car advertisement, elicits in the YouTube users who watch it, based on the video's low-level descriptors. A novel aspect of this work is an approach to this kind of problem based on a procedure for labelling the videos automatically using unsupervised learning techniques. To this end, a set of car advertisements was collected together with the user-provided metadata associated with each video, which conveys the satisfaction users perceive when viewing them on YouTube. This metadata made it possible to design three cluster-analysis strategies for annotating the videos automatically, each using a different subset of the metadata according to how users provide it. In addition, a set of visual descriptors was extracted from each video using image and video processing techniques, and a machine learning system was then trained to study the relevance and usefulness of this descriptor set for predicting the aesthetic value of the videos as perceived by users.
    Grado en Ingeniería de Sistemas Audiovisuales
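    The unsupervised labelling idea lends itself to a compact illustration. Below is a minimal sketch, assuming each video carries view/like/dislike metadata: engagement ratios are clustered into two polarity groups, and the resulting labels supervise a classifier over low-level visual descriptors. The variable names, the two-cluster setup, and the stand-in descriptor values are illustrative assumptions, not the thesis's exact pipeline.

```python
# Minimal sketch of automatic polarity annotation via clustering, assuming
# each video comes with engagement metadata (views, likes, dislikes).
import numpy as np
from sklearn.cluster import KMeans
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Hypothetical metadata matrix: one row per video -> [views, likes, dislikes]
metadata = np.array([
    [12000, 450, 30],
    [800, 10, 55],
    [95000, 3100, 120],
    [400, 5, 40],
])

# Derive engagement ratios and cluster videos into two polarity groups
ratios = np.column_stack([
    metadata[:, 1] / metadata[:, 0],                     # likes per view
    metadata[:, 1] / (metadata[:, 1] + metadata[:, 2]),  # like share
])
labels = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(
    StandardScaler().fit_transform(ratios))

# Train a classifier on low-level visual descriptors (random stand-in values
# here; in practice colour, texture, motion features, etc.)
visual_descriptors = np.random.default_rng(0).normal(size=(4, 16))
clf = SVC().fit(visual_descriptors, labels)
```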

    Highly efficient low-level feature extraction for video representation and retrieval

    Witnessing the omnipresence of digital video media, the research community has raised the question of its meaningful use and management. Stored in immense multimedia databases, digital videos need to be retrieved and structured in an intelligent way, relying on their content and the rich semantics involved. Current content-based video indexing and retrieval systems face the problem of the semantic gap between the simplicity of the available visual features and the richness of user semantics. This PhD work focuses on the issues of efficiency and scalability in video indexing and retrieval, to facilitate a video representation model capable of semantic annotation. A highly efficient algorithm for temporal analysis and key-frame extraction is developed. It is based on prediction information extracted directly from compressed-domain features and on robust, scalable analysis in the temporal domain. Furthermore, a hierarchical quantisation of the colour features in the descriptor space is presented. Derived from the extracted set of low-level features, a video representation model that enables semantic annotation and contextual genre classification is designed. Results demonstrate the efficiency and robustness of the temporal analysis algorithm, which runs in real time while maintaining the high precision and recall of the detection task. Adaptive key-frame extraction and summarisation achieve a good overview of the visual content, while the colour quantisation algorithm efficiently creates a hierarchical set of descriptors. Finally, the video representation model, supported by the genre classification algorithm, achieves excellent results in an automatic annotation system by linking the video clips with a limited lexicon of related keywords.
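    As a rough illustration of the temporal-analysis step, the sketch below detects shot boundaries from colour-histogram differences between consecutive frames. Note that it operates in the pixel domain, whereas the thesis works directly on compressed-domain prediction information; the threshold and histogram sizes are arbitrary choices.

```python
# Illustrative shot-boundary detection via colour-histogram differences.
import cv2

def detect_shot_boundaries(path, threshold=0.5):
    cap = cv2.VideoCapture(path)
    boundaries, prev_hist, idx = [], None, 0
    while True:
        ok, frame = cap.read()
        if not ok:
            break
        # 2-D hue/saturation histogram of the current frame
        hsv = cv2.cvtColor(frame, cv2.COLOR_BGR2HSV)
        hist = cv2.calcHist([hsv], [0, 1], None, [32, 32], [0, 180, 0, 256])
        cv2.normalize(hist, hist)
        if prev_hist is not None:
            # Low correlation between consecutive histograms -> likely cut
            if cv2.compareHist(prev_hist, hist, cv2.HISTCMP_CORREL) < threshold:
                boundaries.append(idx)
        prev_hist, idx = hist, idx + 1
    cap.release()
    return boundaries
```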

    A comprehensive survey of multi-view video summarization

    There has been an exponential growth in the amount of visual data acquired daily from single- or multi-view surveillance camera networks. This massive amount of data requires efficient mechanisms, such as video summarization, to ensure that only significant data are reported and redundancy is reduced. Multi-view video summarization (MVS) is a less redundant and more concise way of providing information from the video content of all the cameras, in the form of either keyframes or video segments. This paper presents an overview of the existing strategies proposed for MVS, including their advantages and drawbacks. Our survey covers the generic steps in MVS, such as the pre-processing of video data, feature extraction, and post-processing, followed by summary generation. We also describe the datasets that are available for the evaluation of MVS. Finally, we examine the major current issues related to MVS and put forward recommendations for future research.
    This work was supported by the National Research Foundation of Korea (NRF) grant funded by the Korea government (MSIT) (No. 2019R1A2B5B01070067).
    Hussain, T.; Muhammad, K.; Ding, W.; Lloret, J.; Baik, S. W.; De Albuquerque, V. H. C. (2021). A comprehensive survey of multi-view video summarization. Pattern Recognition, 109:1-15. https://doi.org/10.1016/j.patcog.2020.107567
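    The generic MVS pipeline described above can be sketched in a few lines: pool per-frame features from all camera views, cluster them, and keep the frame nearest each cluster centre as a keyframe. The feature representation and cluster count here are assumptions made purely for illustration.

```python
# Minimal multi-view keyframe summarization sketch: cluster pooled
# per-frame features and pick the medoid frame of each cluster.
import numpy as np
from sklearn.cluster import KMeans

def summarize(features_per_view, n_keyframes=5):
    """features_per_view: list of (n_frames_i, dim) arrays, one per camera."""
    feats = np.vstack(features_per_view)                  # pool all views
    ids = [(v, f) for v, x in enumerate(features_per_view)
           for f in range(len(x))]                        # (view, frame) index
    km = KMeans(n_clusters=n_keyframes, n_init=10, random_state=0).fit(feats)
    summary = []
    for c in range(n_keyframes):
        members = np.where(km.labels_ == c)[0]
        dists = np.linalg.norm(feats[members] - km.cluster_centers_[c], axis=1)
        summary.append(ids[members[np.argmin(dists)]])    # frame nearest centre
    return sorted(summary)

# Toy stand-in: three views, 120 frames each, 64-dim features
views = [np.random.default_rng(v).normal(size=(120, 64)) for v in range(3)]
print(summarize(views))
```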

    Deep Learning for Video Object Segmentation: A Review

    As one of the fundamental problems in the field of video understanding, video object segmentation aims at segmenting objects of interest throughout the given video sequence. Recently, with the advancement of deep learning techniques, deep neural networks have shown outstanding performance improvements in many computer vision applications, with video object segmentation being one of the most advocated and intensively investigated. In this paper, we present a systematic review of the deep learning-based video segmentation literature, highlighting the pros and cons of each category of approaches. Concretely, we start by introducing the definition, background concepts and basic ideas of algorithms in this field. Subsequently, we summarise the datasets for training and testing a video object segmentation algorithm, as well as common challenges and evaluation metrics. Next, previous works are grouped and reviewed based on how they extract and use spatial and temporal features, and their architectures, contributions and differences from one another are elaborated. Finally, the quantitative and qualitative results of several representative methods on a dataset with many remaining challenges are provided and analysed, followed by further discussion of future research directions. This article is expected to serve as a tutorial and source of reference for learners who want to quickly grasp the current progress in this research area and for practitioners interested in applying video object segmentation methods to their problems. A public website is maintained to collect and track related works in this field: https://github.com/gaomingqi/VOS-Review
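    Of the evaluation metrics mentioned, the most widely used is region similarity (the Jaccard index, or intersection over union, usually denoted J). A minimal numpy implementation:

```python
# Region similarity (Jaccard index / IoU) between predicted and
# ground-truth segmentation masks, averaged over a sequence.
import numpy as np

def jaccard(pred, gt):
    """pred, gt: boolean masks of identical shape."""
    pred, gt = pred.astype(bool), gt.astype(bool)
    inter = np.logical_and(pred, gt).sum()
    union = np.logical_or(pred, gt).sum()
    return 1.0 if union == 0 else inter / union

def sequence_jaccard(pred_masks, gt_masks):
    # Per-sequence score: mean IoU over all annotated frames
    return float(np.mean([jaccard(p, g) for p, g in zip(pred_masks, gt_masks)]))
```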

    Tracking technical refinement in elite performers: The good, the better, and the ugly

    This study extends coaching research examining the practical implementation of technical refinement in elite-level golfers. In doing so, we provide an initial check of precepts pertaining to the Five-A Model and examine the dynamics between coaching, psychomotor, biomechanical and psychological inputs to the process. Three case studies of golfers attempting refinements to their already well-established techniques are reported. Kinematic data were supplemented with intra-individual movement variability and self-perceptions of mental effort as measures of tracking behaviour and motor control. Results showed different levels of success in refining technique and in the subsequent ability to return to executing under largely subconscious control. In one case, the technique was refined as intended but without a consistent reduction of conscious attention; in another, both outcomes were achieved; in the third case, neither was. Implications of these studies are discussed with reference to the interdisciplinary nature of the process and the importance of the initial and final stages.
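    For readers unfamiliar with the variability measure, one simple formulation is the between-trial standard deviation (or coefficient of variation) of a kinematic parameter. The study's actual measures may be richer; the sketch below is only a stand-in with made-up numbers.

```python
# Intra-individual movement variability as between-trial SD and CV
# of a single kinematic parameter.
import numpy as np

def variability(trials):
    """trials: one kinematic parameter, one value per trial."""
    trials = np.asarray(trials, dtype=float)
    sd = trials.std(ddof=1)             # between-trial standard deviation
    cv = sd / abs(trials.mean()) * 100  # coefficient of variation, %
    return sd, cv

# e.g. a hypothetical shaft angle at impact (degrees) over ten swings
sd, cv = variability([31.2, 30.8, 32.1, 31.5, 30.9, 31.7, 31.0, 31.4, 32.0, 31.1])
print(f"SD = {sd:.2f} deg, CV = {cv:.1f}%")
```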

    A high speed Tri-Vision system for automotive applications

    Purpose: Cameras are excellent ways of non-invasively monitoring the interior and exterior of vehicles. In particular, high-speed stereovision and multivision systems are important for transport applications such as driver eye tracking or collision avoidance. This paper addresses the synchronisation problem which arises when multivision camera systems are used to capture the high-speed motion common in such applications.
    Methods: An experimental, high-speed tri-vision camera system intended for real-time driver eye-blink and saccade measurement was designed, developed, implemented and tested using prototype, ultra-high-dynamic-range, automotive-grade image sensors specifically developed by E2V (formerly Atmel) Grenoble SA as part of the European FP6 project SENSATION (advanced sensor development for attention, stress, vigilance and sleep/wakefulness monitoring).
    Results: The developed system can sustain frame rates of 59.8 Hz at the full stereovision resolution of 1280 × 480, but this can reach 750 Hz when a 10 k pixel Region of Interest (ROI) is used, with a maximum global shutter speed of 1/48000 s and a shutter efficiency of 99.7%. The data can be reliably transmitted uncompressed over standard copper Camera-Link® cables over 5 metres. The synchronisation error between the left and right stereo images is less than 100 ps, and this has been verified both electrically and optically. Synchronisation is automatically established at boot-up and maintained during resolution changes. A third camera in the set can be configured independently. The dynamic range of the 10-bit sensors exceeds 123 dB, with a spectral sensitivity extending well into the infra-red range.
    Conclusion: The system was subjected to a comprehensive testing protocol, which confirms that the salient requirements for the driver monitoring application are adequately met and, in some respects, exceeded. The synchronisation technique presented may also benefit several other automotive stereovision applications, including near- and far-field obstacle detection and collision avoidance, road condition monitoring and others.
    Partially funded by the EU FP6 through the IST-507231 SENSATION project.
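    A quick back-of-envelope check shows the reported frame rates are not simply proportional to pixel count; the sketch below computes raw pixel throughput for both operating points. Attributing the gap to per-frame sensor overheads (shutter, row addressing) is our assumption, not a figure from the paper.

```python
# Back-of-envelope pixel-throughput check of the reported figures.
full_res_pixels = 1280 * 480  # full stereo resolution per frame
full_rate_hz = 59.8
roi_pixels = 10_000           # 10 k pixel region of interest
roi_rate_hz = 750.0

print(f"Full-frame throughput: {full_res_pixels * full_rate_hz / 1e6:.1f} Mpixel/s")
print(f"ROI throughput:        {roi_pixels * roi_rate_hz / 1e6:.1f} Mpixel/s")
# The ROI throughput is far below the full-frame figure: at small ROIs,
# fixed per-frame costs dominate the readout time, capping the frame rate.
```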

    An investigation into feature effectiveness for multimedia hyperlinking

    The growing amount of archival multimedia content available online is creating increasing opportunities for users who are interested in exploratory search behaviour such as browsing. The user experience with online collections could therefore be improved by enabling navigation and recommendation within multimedia archives, which can be supported by allowing a user to follow a set of hyperlinks created within or across documents. The main goal of this study is to compare the performance of different multimedia features for automatic hyperlink generation. In our work we construct multimedia hyperlinks by indexing and searching textual and visual features extracted from the blip.tv dataset. A user-driven evaluation strategy is then proposed by applying the Amazon Mechanical Turk (AMT) crowdsourcing platform, since we believe that AMT workers represent a good example of "real world" users. We conclude that textual features exhibit better performance than visual features for multimedia hyperlink construction. In general, a combination of ASR transcripts and metadata provides the best results.
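    The winning configuration, textual indexing over ASR transcripts plus metadata, can be sketched with a standard TF-IDF pipeline: each item's transcript and metadata are indexed as one document, and hyperlinks are proposed by cosine similarity. The toy corpus below is a stand-in for the blip.tv data, not the paper's actual setup.

```python
# TF-IDF hyperlink construction sketch: link each document to its most
# similar neighbour by cosine similarity.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity
import numpy as np

docs = [
    "cooking pasta recipe tutorial kitchen",         # ASR transcript + metadata
    "italian cooking sauce pasta demonstration",
    "python programming tutorial for beginners",
    "learn programming with python code examples",
]

tfidf = TfidfVectorizer().fit_transform(docs)
sim = cosine_similarity(tfidf)
np.fill_diagonal(sim, 0)                             # exclude self-links

for i, row in enumerate(sim):
    target = int(np.argmax(row))                     # top-1 hyperlink per doc
    print(f"doc {i} -> doc {target} (score {row[target]:.2f})")
```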