
    A Motion-Driven Approach for Fine-Grained Temporal Segmentation of User-Generated Videos

    This paper presents an algorithm for the temporal segmentation of user-generated videos into visually coherent parts that correspond to individual video capturing activities. The latter include camera pan and tilt, change in focal length, and camera displacement. The proposed approach identifies the aforementioned activities by extracting and evaluating the region-level spatio-temporal distribution of the optical flow over sequences of neighbouring video frames. The performance of the algorithm was evaluated, with the help of a newly constructed ground-truth dataset, against several state-of-the-art techniques and variations of them. Extensive evaluation indicates the competitiveness of the proposed approach in terms of detection accuracy, and highlights its suitability for analysing large collections of data in a time-efficient manner.
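    The following is a minimal, illustrative sketch (not the authors' implementation) of the core idea: computing the region-level distribution of optical flow between neighbouring frames, whose per-region motion vectors can then be evaluated to detect activities such as pans or tilts. The grid size, the Farneback parameters, and the "video.mp4" path are assumptions.

```python
# Minimal sketch (not the authors' implementation), assuming OpenCV is
# installed and a local "video.mp4" exists: region-level optical-flow
# statistics between neighbouring frames, from which uniform motion
# patterns (e.g. all vectors pointing left) would suggest a camera pan.
import cv2
import numpy as np

def region_flow_stats(prev_gray, gray, grid=(3, 3)):
    """Mean optical-flow vector per spatial region of the frame."""
    flow = cv2.calcOpticalFlowFarneback(prev_gray, gray, None,
                                        0.5, 3, 15, 3, 5, 1.2, 0)
    h, w = gray.shape
    rows, cols = grid
    stats = np.zeros((rows, cols, 2), np.float32)
    for r in range(rows):
        for c in range(cols):
            block = flow[r * h // rows:(r + 1) * h // rows,
                         c * w // cols:(c + 1) * w // cols]
            stats[r, c] = block.reshape(-1, 2).mean(axis=0)
    return stats

cap = cv2.VideoCapture("video.mp4")  # illustrative path
ok, frame = cap.read()
prev_gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
while True:
    ok, frame = cap.read()
    if not ok:
        break
    gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
    print(region_flow_stats(prev_gray, gray))  # one 3x3 grid of motion vectors per frame pair
    prev_gray = gray
cap.release()
```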

    Multimodal Video Annotation for Retrieval and Discovery of Newsworthy Video in a News Verification Scenario

    © 2019, Springer Nature Switzerland AG. This paper describes the combination of advanced technologies for social-media-based story detection, story-based video retrieval, and concept-based video (fragment) labeling under a novel approach for multimodal video annotation. This approach involves textual metadata, structural information, and visual concepts, together with a multimodal analytics dashboard that enables journalists to discover videos of news events, posted to social networks, in order to verify the details of the events shown. It outlines the characteristics of each individual method and describes how these techniques are blended to facilitate the content-based retrieval, discovery, and summarization of (parts of) news videos. A set of case-driven experiments conducted with the help of journalists indicates that the proposed multimodal video annotation mechanism, combined with a professional analytics dashboard which presents the collected and generated metadata about the news stories and their visual summaries, can support journalists in their content discovery and verification work.
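    As a rough illustration of the annotation idea, the hedged sketch below merges the three metadata streams the paper names (textual metadata, structural information, visual concepts) into a single searchable record per video fragment. All field names, the example values, and the overlap-based scoring rule are illustrative assumptions, not the paper's design.

```python
# Hedged sketch: one possible way to combine textual metadata, structural
# information, and visual concepts into a searchable fragment annotation.
from dataclasses import dataclass, field

@dataclass
class FragmentAnnotation:
    video_id: str
    shot_range: tuple                                  # structural info: (start_frame, end_frame)
    text_metadata: set = field(default_factory=set)    # e.g. story keywords from social posts
    visual_concepts: set = field(default_factory=set)  # e.g. concept labels per fragment

    def score(self, query_terms: set) -> int:
        """Naive retrieval score: query overlap across both modalities."""
        return len(query_terms & (self.text_metadata | self.visual_concepts))

# Hypothetical example record and query.
frag = FragmentAnnotation("vid42", (120, 310),
                          {"protest", "paris"}, {"crowd", "banner"})
print(frag.score({"paris", "crowd"}))  # -> 2
```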

    Detecting Tampered Videos with Multimedia Forensics and Deep Learning

    © 2019, Springer Nature Switzerland AG. User-Generated Content (UGC) has become an integral part of the news reporting cycle. As a result, the need to verify videos collected from social media and Web sources is becoming increasingly important for news organisations. While video verification is attracting a lot of attention, there has been limited effort so far in applying video forensics to real-world data. In this work we present an approach for automatic video manipulation detection inspired by manual verification approaches. In a typical manual verification setting, video filter outputs are visually interpreted by human experts. We use two such forensics filters designed for manual verification, one based on Discrete Cosine Transform (DCT) coefficients and a second based on video requantization errors, and combine them with deep Convolutional Neural Networks (CNNs) designed for image classification. We compare the performance of the proposed approach to other works from the state of the art, and discover that, while competing approaches perform better when trained with videos from the same dataset, one of the proposed filters demonstrates superior performance in cross-dataset settings. We discuss the implications of our work and the limitations of the current experimental setup, and propose directions for future research in this area.
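    A hedged sketch of what one such forensics filter might look like: a block-wise DCT coefficient map computed over a frame, whose output could then be resized and passed to an image-classification CNN. The block size, the chosen coefficient, and the "frame.png" path are assumptions, not the paper's exact filter.

```python
# Illustrative sketch, not the paper's implementation: a block-wise DCT
# "forensics filter" over a grayscale frame, assuming OpenCV is available.
import cv2
import numpy as np

def dct_filter_map(gray, block=8, coeff=(0, 1)):
    """Magnitude of one DCT coefficient per 8x8 block; tampered regions
    often show inconsistent compression statistics in such maps."""
    h, w = gray.shape
    h, w = h - h % block, w - w % block  # trim to a multiple of the block size
    out = np.zeros((h // block, w // block), np.float32)
    for y in range(0, h, block):
        for x in range(0, w, block):
            d = cv2.dct(gray[y:y + block, x:x + block].astype(np.float32))
            out[y // block, x // block] = abs(d[coeff])
    return out

frame = cv2.imread("frame.png", cv2.IMREAD_GRAYSCALE)  # illustrative path
fmap = dct_filter_map(frame)
# fmap would then be fed to a CNN trained to separate pristine from
# tampered frames, as the paper does with its two filters.
```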

    Capturing Nutrition Data for Sports: Challenges and Ethical Issues

    Presentation at the 29th International Conference on Multimedia Modeling (MMM 2023), 9-13 January 2023, Bergen, Norway: https://www.mmm2023.no/. Nutrition plays a key role in an athlete's performance, health, and mental well-being. Capturing nutrition data is crucial for analyzing those relations and performing necessary interventions. Using traditional methods to capture long-term nutritional data requires intensive labor, and is prone to errors and biases. Artificial Intelligence (AI) methods can be used to remedy such problems by using Image-Based Dietary Assessment (IBDA) methods, where athletes can take pictures of their food before consuming it. However, the current state of IBDA is not perfect. In this paper, we discuss the challenges faced in employing such methods to capture nutrition data. We also discuss ethical and legal issues that must be addressed before using these methods on a large scale.

    Multi-Modal Video Topic Segmentation with Dual-Contrastive Domain Adaptation

    Video topic segmentation unveils the coarse-grained semantic structure underlying videos and is essential for other video understanding tasks. Given the recent surge in multi-modal content, relying solely on a single modality is arguably insufficient. On the other hand, prior solutions for similar tasks like video scene/shot segmentation cater to short videos with clear visual shifts but falter for long videos with subtle changes, such as livestreams. In this paper, we introduce a multi-modal video topic segmenter that utilizes both video transcripts and frames, bolstered by a cross-modal attention mechanism. Furthermore, we propose a dual-contrastive learning framework adhering to the unsupervised domain adaptation paradigm, enhancing our model's adaptability to longer, more semantically complex videos. Experiments on short and long video corpora demonstrate that our proposed solution significantly surpasses baseline methods in terms of both accuracy and transferability, in both intra- and cross-domain settings. Comment: Accepted at the 30th International Conference on Multimedia Modeling (MMM 2024).
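    The PyTorch sketch below illustrates the cross-modal attention idea only (it is not the authors' architecture): transcript sentence embeddings attend over frame embeddings before per-sentence boundary prediction. The dimensions, head count, and residual fusion are illustrative assumptions.

```python
# Minimal cross-modal attention sketch for topic segmentation, assuming
# precomputed sentence and frame embeddings of the same dimensionality.
import torch
import torch.nn as nn

class CrossModalSegmenter(nn.Module):
    def __init__(self, dim=256, heads=4):
        super().__init__()
        self.attn = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.boundary = nn.Linear(dim, 2)  # boundary / non-boundary per sentence

    def forward(self, text_emb, frame_emb):
        # text_emb: (B, n_sentences, dim); frame_emb: (B, n_frames, dim)
        fused, _ = self.attn(query=text_emb, key=frame_emb, value=frame_emb)
        return self.boundary(fused + text_emb)  # residual fusion of modalities

model = CrossModalSegmenter()
logits = model(torch.randn(2, 10, 256), torch.randn(2, 40, 256))
print(logits.shape)  # torch.Size([2, 10, 2])
```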

    Multi-Layer Local Graph Words for Object Recognition

    In this paper, we propose a new multi-layer structural approach for the task of object-based image retrieval. In our work we tackle the problem of the structural organization of local features. The structural features we propose are nested multi-layered local graphs built upon sets of SURF feature points with Delaunay triangulation. A Bag-of-Visual-Words (BoVW) framework is applied on these graphs, giving birth to a Bag-of-Graph-Words representation. The multi-layer nature of the descriptors consists in scaling up from trivial Delaunay graphs (isolated feature points) by increasing the number of nodes layer by layer, up to graphs with the maximal number of nodes. A separate visual dictionary is built for each layer of graphs. The experiments conducted on the SIVAL and Caltech-101 data sets reveal that the graph features at different layers exhibit complementary performances on the same content and perform better than the baseline BoVW approach. Combining all existing layers yields a significant improvement of the object recognition performance compared to single-level approaches. Comment: International Conference on MultiMedia Modeling, Klagenfurt, Austria (2012).
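    A minimal sketch of the first layer of such a pipeline, under stated assumptions: ORB keypoints stand in here for the patented SURF detector, scipy's Delaunay triangulation builds the layer-1 local graphs over keypoint locations, and the "object.png" path and feature count are illustrative.

```python
# Sketch of layer-1 graph construction for a Bag-of-Graph-Words pipeline;
# ORB is a stand-in for SURF, which requires opencv-contrib's non-free build.
import cv2
import numpy as np
from scipy.spatial import Delaunay

img = cv2.imread("object.png", cv2.IMREAD_GRAYSCALE)  # illustrative path
kps = cv2.ORB_create(nfeatures=200).detect(img, None)
pts = np.array([kp.pt for kp in kps])

tri = Delaunay(pts)  # triangles over keypoints = the local graph structure
print(len(tri.simplices), "triangles")
# Each local graph (and its larger nested counterparts at higher layers)
# would then be described and quantized against that layer's own visual
# dictionary, e.g. via k-means, to form the Bag-of-Graph-Words histogram.
```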

    Keystroke Dynamics as Part of Lifelogging

    In this paper we present the case for including keystroke dynamics in lifelogging. We describe how we have used a simple keystroke logging application called Loggerman to create a dataset of longitudinal keystroke timing data spanning a period of more than 6 months for 4 participants. We perform a detailed analysis of this data by examining the timing information associated with bigrams, or pairs of adjacently-typed alphabetic characters. We show that for some participants there is very little day-on-day variation of the keystroke timing among the top-200 bigrams, while for others there is a lot, and that this correlates with the amount of typing each would do on a daily basis. We explore whether daily variations could correlate with sleep score from the previous night, but find no significant relationship between the two. Finally, we describe the public release of this data, as well as a series of pointers for future work, including correlating keystroke dynamics with mood and fatigue during the day. Comment: Accepted to the 27th International Conference on Multimedia Modeling, Prague, Czech Republic, June 2021.
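    A hedged sketch of the bigram-timing computation described above; the in-memory event format (key, timestamp in milliseconds) and the example data are assumptions and do not reflect Loggerman's actual output format.

```python
# Illustrative bigram timing analysis over a hypothetical keystroke stream.
from collections import defaultdict

def bigram_latencies(events):
    """Inter-key intervals for each pair of adjacently-typed letters."""
    timings = defaultdict(list)
    for (k1, t1), (k2, t2) in zip(events, events[1:]):
        if k1.isalpha() and k2.isalpha():  # alphabetic bigrams only
            timings[k1 + k2].append(t2 - t1)
    return timings

# Hypothetical (key, timestamp_ms) events; spaces break bigram chains.
events = [("t", 0), ("h", 95), ("e", 180), (" ", 260), ("t", 400), ("h", 490)]
lat = bigram_latencies(events)
top = sorted(lat, key=lambda b: len(lat[b]), reverse=True)[:200]  # top-200 bigrams
print({b: sum(lat[b]) / len(lat[b]) for b in top})  # mean latency per bigram
```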