456 research outputs found

    A Deep Siamese Network for Scene Detection in Broadcast Videos

    Get PDF
    We present a model that automatically divides broadcast videos into coherent scenes by learning a distance measure between shots. Experiments are performed to demonstrate the effectiveness of our approach by comparing our algorithm against recent proposals for automatic scene segmentation. We also propose an improved performance measure that aims to reduce the gap between numerical evaluation and expected results, and propose and release a new benchmark dataset.Comment: ACM Multimedia 201

    Automatic Synchronization of Multi-User Photo Galleries

    Full text link
    In this paper we address the issue of photo galleries synchronization, where pictures related to the same event are collected by different users. Existing solutions to address the problem are usually based on unrealistic assumptions, like time consistency across photo galleries, and often heavily rely on heuristics, limiting therefore the applicability to real-world scenarios. We propose a solution that achieves better generalization performance for the synchronization task compared to the available literature. The method is characterized by three stages: at first, deep convolutional neural network features are used to assess the visual similarity among the photos; then, pairs of similar photos are detected across different galleries and used to construct a graph; eventually, a probabilistic graphical model is used to estimate the temporal offset of each pair of galleries, by traversing the minimum spanning tree extracted from this graph. The experimental evaluation is conducted on four publicly available datasets covering different types of events, demonstrating the strength of our proposed method. A thorough discussion of the obtained results is provided for a critical assessment of the quality in synchronization.Comment: ACCEPTED to IEEE Transactions on Multimedi

    Video Summarization Using Unsupervised Deep Learning

    Get PDF
    In this thesis, we address the task of video summarization using unsupervised deep-learning architectures. Video summarization aims to generate a short summary by selecting the most informative and important frames (key-frames) or fragments (key-fragments) of the full-length video, and presenting them in temporally-ordered fashion. Our objective is to overcome observed weaknesses of existing video summarization approaches that utilize RNNs for modeling the temporal dependence of frames, related to: i) the small influence of the estimated frame-level importance scores in the created video summary, ii) the insufficiency of RNNs to model long-range frames' dependence, and iii) the small amount of parallelizable operations during the training of RNNs. To address the first weakness, we propose a new unsupervised network architecture, called AC-SUM-GAN, which formulates the selection of important video fragments as a sequence generation task and learns this task by embedding an Actor-Critic model in a Generative Adversarial Network. The feedback of a trainable Discriminator is used as a reward by the Actor-Critic model in order to explore a space of actions and learn a value function (Critic) and a policy (Actor) for video fragment selection. To tackle the remaining weaknesses, we investigate the use of attention mechanisms for video summarization and propose a new supervised network architecture, called PGL-SUM, that combines global and local multi-head attention mechanisms which take into account the temporal position of the video frames, in order to discover different modelings of the frames' dependencies at different levels of granularity. Based on the acquired experience, we then propose a new unsupervised network architecture, called CA-SUM, which estimates the frames' importance using a novel concentrated attention mechanism that focuses on non-overlapping blocks in the main diagonal of the attention matrix and takes into account the attentive uniqueness and diversity of the associated frames of the video. All the proposed architectures have been extensively evaluated on the most commonly-used benchmark datasets, demonstrating their competitiveness against other approaches and documenting the contribution of our proposals on advancing the current state-of-the-art on video summarization. Finally, we make a first attempt on producing explanations for the video summarization results. Inspired by relevant works in the Natural Language Processing domain, we propose an attention-based method for explainable video summarization and we evaluate the performance of various explanation signals using our CA-SUM architecture and two benchmark datasets for video summarization. The experimental results indicate the advanced performance of explanation signals formed using the inherent attention weights, and demonstrate the ability of the proposed method to explain the video summarization results using clues about the focus of the attention mechanism

    Video Summarization Using Deep Neural Networks: A Survey

    Get PDF
    Video summarization technologies aim to create a concise and complete synopsis by selecting the most informative parts of the video content. Several approaches have been developed over the last couple of decades and the current state of the art is represented by methods that rely on modern deep neural network architectures. This work focuses on the recent advances in the area and provides a comprehensive survey of the existing deep-learning-based methods for generic video summarization. After presenting the motivation behind the development of technologies for video summarization, we formulate the video summarization task and discuss the main characteristics of a typical deep-learning-based analysis pipeline. Then, we suggest a taxonomy of the existing algorithms and provide a systematic review of the relevant literature that shows the evolution of the deep-learning-based video summarization technologies and leads to suggestions for future developments. We then report on protocols for the objective evaluation of video summarization algorithms and we compare the performance of several deep-learning-based approaches. Based on the outcomes of these comparisons, as well as some documented considerations about the suitability of evaluation protocols, we indicate potential future research directions.Comment: Journal paper; Under revie

    AC-SUM-GAN: Connecting Actor-Critic and Generative Adversarial Networks for Unsupervised Video Summarization

    Get PDF
    This paper presents a new method for unsupervised video summarization. The proposed architecture embeds an Actor-Critic model into a Generative Adversarial Network and formulates the selection of important video fragments (that will be used to form the summary) as a sequence generation task. The Actor and the Critic take part in a game that incrementally leads to the selection of the video key-fragments, and their choices at each step of the game result in a set of rewards from the Discriminator. The designed training workflow allows the Actor and Critic to discover a space of actions and automatically learn a policy for key-fragment selection. Moreover, the introduced criterion for choosing the best model after the training ends, enables the automatic selection of proper values for parameters of the training process that are not learned from the data (such as the regularization factor σ). Experimental evaluation on two benchmark datasets (SumMe and TVSum) demonstrates that the proposed AC-SUM-GAN model performs consistently well and gives SoA results in comparison to unsupervised methods, that are also competitive with respect to supervised methods

    Hanging Noncalculous Gallbladder

    Get PDF
    The removal of acalculous and not acutely inflamed gall-bladder in patients with typical biliary pain remains a questionable procedure. This study was conducted to present our experience. In the period 1982- 90, 1089 cases of calculous and acalculous gallbladder disease were treated in our clinic. In this period, 27 patients were subjected to cholecystectomy because of an acalculous, non inflamed gallbladder which was elongated lying in an abnormal position with a long cystic duct. The mean duration ofsymptoms supportive of cholelithiasis, was 5 years. Oral cholecystogram and ultrasonography led to the diagnosis and other causes ofchronic abdominal pain were excluded. There were 13 lumbar, 9 pelvic and 5 iliac gallbladders, with poor function in 20 of them. During cholecystectomy, the organ was invested by peritoneum and suspended in 7 cases from a mesentery. On pathological examination mild chronic inflammation was reported in 19 cases and minimal changes in 8. The minimum follow up was one year and the maximum 9 years. Complete relief of symptoms was achieved in all the cases. In conclusion, cholecystectomy should be offered in these symptomatic "hanging" gallbladders

    A Stepwise, Label-based Approach for Improving the Adversarial Training in Unsupervised Video Summarization

    Get PDF
    In this paper we present our work on improving the efficiency of adversarial training for unsupervised video summarization. Our starting point is the SUM-GAN model, which creates a representative summary based on the intuition that such a summary should make it possible to reconstruct a video that is indistinguishable from the original one. We build on a publicly available implementation of a variation of this model, that includes a linear compression layer to reduce the number of learned parameters and applies an incremental approach for training the different components of the architecture. After assessing the impact of these changes to the model’s performance, we propose a stepwise, label-based learning process to improve the training efficiency of the adversarial part of the model. Before evaluating our model’s efficiency, we perform a thorough study with respect to the used evaluation protocols and we examine the possible performance on two benchmarking datasets, namely SumMe and TVSum. Experimental evaluations and comparisons with the state of the art highlight the competitiveness of the proposed method. An ablation study indicates the benefit of each applied change on the model’s performance, and points out the advantageous role of the introduced stepwise, label-based training strategy on the learning efficiency of the adversarial part of the architecture

    Deliverable D1.4 Visual, text and audio information analysis for hypervideo, final release

    Get PDF
    Having extensively evaluated the performance of the technologies included in the first release of WP1 multimedia analysis tools, using content from the LinkedTV scenarios and by participating in international benchmarking activities, concrete decisions regarding the appropriateness and the importance of each individual method or combination of methods were made, which, combined with an updated list of information needs for each scenario, led to a new set of analysis requirements that had to be addressed through the release of the final set of analysis techniques of WP1. To this end, coordinated efforts on three directions, including (a) the improvement of a number of methods in terms of accuracy and time efficiency, (b) the development of new technologies and (c) the definition of synergies between methods for obtaining new types of information via multimodal processing, resulted in the final bunch of multimedia analysis methods for video hyperlinking. Moreover, the different developed analysis modules have been integrated into a web-based infrastructure, allowing the fully automatic linking of the multitude of WP1 technologies and the overall LinkedTV platform

    Deliverable D1.1 State of the art and requirements analysis for hypervideo

    Get PDF
    This deliverable presents a state-of-art and requirements analysis report for hypervideo authored as part of the WP1 of the LinkedTV project. Initially, we present some use-case (viewers) scenarios in the LinkedTV project and through the analysis of the distinctive needs and demands of each scenario we point out the technical requirements from a user-side perspective. Subsequently we study methods for the automatic and semi-automatic decomposition of the audiovisual content in order to effectively support the annotation process. Considering that the multimedia content comprises of different types of information, i.e., visual, textual and audio, we report various methods for the analysis of these three different streams. Finally we present various annotation tools which could integrate the developed analysis results so as to effectively support users (video producers) in the semi-automatic linking of hypervideo content, and based on them we report on the initial progress in building the LinkedTV annotation tool. For each one of the different classes of techniques being discussed in the deliverable we present the evaluation results from the application of one such method of the literature to a dataset well-suited to the needs of the LinkedTV project, and we indicate the future technical requirements that should be addressed in order to achieve higher levels of performance (e.g., in terms of accuracy and time-efficiency), as necessary
    • …
    corecore