10 research outputs found

    Informative Content Extraction through Key Frames

    A huge amount of multimedia data is now available on the internet, much of it redundant, so going through all of it is impractical. This data is browsed, retrieved and processed for various purposes, but under time and speed constraints accessing it is very inefficient. Video summarization is one way to provide a non-redundant, effective, feature-based abstract view of a video that covers its entire content through selected key frames. The frames selected on the basis of features form the summarized video. There are two main techniques of video summarization: key-frame-based summarization and video skimming. This paper focuses on key frame extraction using video abstraction, visual descriptors and a bag-of-visual-words approach.

    Glimpse: A gaze-based measure of temporal salience

    Temporal salience considers how visual attention varies over time. Although visual salience has been widely studied from a spatial perspective, its temporal dimension has been mostly ignored, despite arguably being of utmost importance for understanding the temporal evolution of attention on dynamic content. To address this gap, we proposed GLIMPSE, a novel measure to compute temporal salience based on the observer-spatio-temporal consistency of raw gaze data. The measure is conceptually simple, training-free, and provides a semantically meaningful quantification of visual attention over time. As an extension, we explored scoring algorithms to estimate temporal salience from spatial salience maps predicted with existing computational models. However, these approaches generally fall short when compared with our proposed gaze-based measure. GLIMPSE could serve as the basis for several downstream tasks such as segmentation or summarization of videos. GLIMPSE's software and data are publicly available.

    Effective video summarization approach based on visual attention

    Video summarization is applied to reduce redundancy and to develop a concise representation of a video through its key frames. More recently, video summaries have been produced through visual attention modeling. In these schemes, the frames that stand out visually are extracted as key frames based on theories of human attention. Schemes for modeling visual attention have proven effective for video summarization. Nevertheless, their high computational cost restricts their usability in everyday situations. In this context, we propose a key frame extraction (KFE) method based on an efficient and accurate visual attention model. The computational effort is minimized by computing dynamic visual saliency from the temporal gradient instead of traditional optical flow techniques. In addition, an efficient technique using a discrete cosine transform is used for the static visual saliency. The dynamic and static visual attention metrics are merged by means of a non-linear weighted fusion technique. The results of the system are compared with some existing state-of-the-art techniques in terms of accuracy. The experimental results indicate the efficiency and high quality of the proposed model's key frame extraction. Qatar University - No. IRCC-2021-010.
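
    The temporal-gradient idea in the abstract above can be illustrated with a minimal sketch. This is not the authors' implementation: the function names, the per-frame averaging, the power-mean fusion rule and the `alpha` and `gamma` parameters are all assumptions made for illustration, and the static saliency scores are taken as given rather than computed with a discrete cosine transform.

```python
import numpy as np

def temporal_gradient_saliency(frames):
    """Dynamic saliency per frame from absolute frame differences.

    frames: array of shape (T, H, W) holding grayscale frames.
    Returns an array of T scores scaled to [0, 1].
    """
    frames = np.asarray(frames, dtype=float)
    diffs = np.abs(np.diff(frames, axis=0))    # (T-1, H, W) temporal gradient
    scores = diffs.mean(axis=(1, 2))           # one score per frame transition
    scores = np.concatenate([[0.0], scores])   # first frame has no predecessor
    peak = scores.max()
    return scores / peak if peak > 0 else scores

def fuse_saliency(static, dynamic, alpha=0.5, gamma=2.0):
    """Hypothetical non-linear weighted fusion (a weighted power mean).

    The abstract does not give the fusion formula; this is an assumed stand-in.
    """
    static = np.asarray(static, dtype=float)
    dynamic = np.asarray(dynamic, dtype=float)
    return (alpha * static**gamma + (1 - alpha) * dynamic**gamma) ** (1 / gamma)

def pick_key_frames(scores, k):
    """Indices of the k highest-scoring frames, in temporal order."""
    return sorted(np.argsort(scores)[-k:].tolist())
```

    Frame differencing needs only one subtraction per pixel, which is the kind of saving over optical flow that the abstract points to.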

    Towards key-frame extraction methods for 3D video: a review

    The increasing rate of creation and use of 3D video content leads to a pressing need for methods capable of lowering the cost of 3D video searching, browsing and indexing operations, with improved content selection performance. Video summarisation methods specifically tailored for 3D video content fulfil these requirements. This paper presents a review of the state-of-the-art of a crucial component of 3D video summarisation algorithms: the key-frame extraction methods. The methods reviewed cover 3D video key-frame extraction as well as shot boundary detection methods specific for use in 3D video. The performance metrics used to evaluate the key-frame extraction methods and the summaries derived from those key-frames are presented and discussed. The applications of these methods are also presented and discussed, followed by an exposition about current research challenges on 3D video summarisation methods

    Resource Allocation for Personalized Video Summarization


    Research genres and multiliteracies: channelling the audience's gaze in powerpoint presentations

    Doctoral thesis, Universidade Federal de Santa Catarina, Centro de Comunicação e Expressão, Programa de Pós-Graduação em Letras/Inglês e Literatura Correspondente. PowerPoint-supported presentations have become an important event for creating and sharing scientific knowledge within and across disciplines (LaPorte et al., 2002; Kunkel, 2004; Tardy, 2005; Adams, 2006). Yet little is known about the ways semiotic resources enabled by the PowerPoint technology of slide editing and management (e.g. slide dimensions, layout, colour) are combined with conventional resources of "research talks" (Swales, 2005[2004]) and contribute to building presentations that are valued in specific contexts. In order to inform our understanding of how research meanings are multimodally made under the influence of the software, in this thesis I investigate a set of fourteen PowerPoint Research Presentations (PPRPs) from Applied Linguistics. Two planes of cohesion are explored: (1) along the slideshows; and (2) between the slideshows and the performance. Regarding the first plane, the analysis of "periodicity" (Martin and Rose, 2007[2003]) revealed that applied linguists tend to foreground the software's 'modularised logic', construing 'serial expansion' (Martin and Rose, 2007[2003]). Others, however, customise slideshows so as to build 'Design Hierarchies', in which particular slides are assigned higher discursive statuses. These presenters construe a path for their audience's gaze through a configuration of semiotic resources of the display mode (e.g. slide position, background, layout, typography). As for the second plane of cohesion, I propose that slides and performance relate by 'synchronicity'. The tool recontextualizes the system of taxis (Halliday, 2009c; Halliday and Matthiessen, 2004) to account for the semantic interdependency between the displayed discourse and the performative discourse at a given point in PPRPs.
In each of the cohesive planes, I set out to identify the software resources that play a role in construing cohesive ties, and evaluate both their "functional specialization" (cf. Halliday, 2009e[1975]; Kress, 2008[2003]; Jewitt and Kress, 2008[2003]) and the demands they impose on presenters and on audiences in terms of genre, discipline, software and multimodal literacies. By indicating some of the ways in which the software influences the "process of semiotic production" (Kress and van Leeuwen, 2001) of such practice, I intend to move beyond prescriptive (e.g. Costa, 2001; Cyphert, 2004; DuFrene and Lehman, 2004; Grant, 2010) as well as technically focused (e.g. Downing and Garmon, 2002; Jones, 2003) accounts of PowerPoint. As a conclusion, I suggest that descriptions of the meaning potential in PPRPs and its conditions of access should be incorporated in pedagogies of academic multiliteracies (New London Group, 1996; Cope and Kalantzis, 2000).

    Video Summarization Using Unsupervised Deep Learning

    In this thesis, we address the task of video summarization using unsupervised deep-learning architectures. Video summarization aims to generate a short summary by selecting the most informative and important frames (key-frames) or fragments (key-fragments) of the full-length video, and presenting them in temporally-ordered fashion. Our objective is to overcome observed weaknesses of existing video summarization approaches that utilize RNNs for modeling the temporal dependence of frames, related to: i) the small influence of the estimated frame-level importance scores in the created video summary, ii) the insufficiency of RNNs to model long-range frames' dependence, and iii) the small amount of parallelizable operations during the training of RNNs. To address the first weakness, we propose a new unsupervised network architecture, called AC-SUM-GAN, which formulates the selection of important video fragments as a sequence generation task and learns this task by embedding an Actor-Critic model in a Generative Adversarial Network. The feedback of a trainable Discriminator is used as a reward by the Actor-Critic model in order to explore a space of actions and learn a value function (Critic) and a policy (Actor) for video fragment selection. To tackle the remaining weaknesses, we investigate the use of attention mechanisms for video summarization and propose a new supervised network architecture, called PGL-SUM, that combines global and local multi-head attention mechanisms which take into account the temporal position of the video frames, in order to discover different modelings of the frames' dependencies at different levels of granularity. 
Based on the acquired experience, we then propose a new unsupervised network architecture, called CA-SUM, which estimates the frames' importance using a novel concentrated attention mechanism that focuses on non-overlapping blocks in the main diagonal of the attention matrix and takes into account the attentive uniqueness and diversity of the associated frames of the video. All the proposed architectures have been extensively evaluated on the most commonly-used benchmark datasets, demonstrating their competitiveness against other approaches and documenting the contribution of our proposals on advancing the current state-of-the-art on video summarization. Finally, we make a first attempt on producing explanations for the video summarization results. Inspired by relevant works in the Natural Language Processing domain, we propose an attention-based method for explainable video summarization and we evaluate the performance of various explanation signals using our CA-SUM architecture and two benchmark datasets for video summarization. The experimental results indicate the advanced performance of explanation signals formed using the inherent attention weights, and demonstrate the ability of the proposed method to explain the video summarization results using clues about the focus of the attention mechanism
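
    The restriction of attention to non-overlapping blocks on the main diagonal, as described for CA-SUM above, can be pictured with a minimal sketch. It is an assumed reading of the abstract, not the thesis code: the function names, the dot-product attention and the fixed `block_size` are hypothetical, and the uniqueness and diversity terms are left out.

```python
import numpy as np

def block_diagonal_mask(n_frames, block_size):
    """Boolean (n_frames, n_frames) mask, True inside diagonal blocks."""
    block_id = np.arange(n_frames) // block_size
    return block_id[:, None] == block_id[None, :]

def concentrated_attention(features, block_size):
    """Softmax self-attention over frames, kept only inside diagonal blocks.

    features: array of shape (n_frames, dim), one feature vector per frame.
    Attention values outside the blocks are set to zero.
    """
    features = np.asarray(features, dtype=float)
    scores = features @ features.T / np.sqrt(features.shape[1])
    scores = scores - scores.max(axis=1, keepdims=True)   # numerical stability
    weights = np.exp(scores)
    weights /= weights.sum(axis=1, keepdims=True)         # row-wise softmax
    return weights * block_diagonal_mask(len(features), block_size)
```

    With `block_size=2` and five frames, the blocks cover frames {0, 1}, {2, 3} and {4}, so each frame keeps attention values only for frames in its own block.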

    Contribution to study and implementation of a bio-inspired perception system based on visual and auditory attention

    The main goal of this research is the design of an artificial perception system able to identify relevant events or scenes in a complex environment. The work carried out during this thesis focused on the study and design of a bio-inspired perception system based on both visual and auditory saliency. The main contributions of this thesis are auditory saliency combined with the recognition of environmental sounds and noises, and visual saliency combined with the recognition of relevant objects. The auditory saliency is computed by merging information extracted from the temporal and spectral representations of the acoustic signal with a visual saliency map of its spectrogram. The visual perception system consists of two distinct mechanisms: the first is based on visual saliency methods and the second identifies the foreground object. In addition, the originality of the proposed approach is that it allows the coherence between visual and auditory observations to be evaluated by merging the information extracted from the perceived visual and auditory signals. The experimental results confirmed the value of this method for identifying relevant scenes in a complex environment.

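    Thematic Collection of Papers of International Significance, Vol. 3 / International Scientific Conference “Archibald Reiss Days”, Belgrade, 3-4 March 2015 (Tematski zbornik radova međunarodnog značaja. Tom 3 / Međunarodni naučni skup “Dani Arčibalda Rajsa”)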

    This is the Thematic Collection of Papers presented at the International Scientific Conference “Archibald Reiss Days”, which was organized by the Academy of Criminalistic and Police Studies in Belgrade, in co-operation with the Ministry of Interior and the Ministry of Education, Science and Technological Development of the Republic of Serbia, National Police University of China, Lviv State University of Internal Affairs, Volgograd Academy of the Russian Internal Affairs Ministry, Faculty of Security in Skopje, Faculty of Criminal Justice and Security in Ljubljana, Police Academy “Alexandru Ioan Cuza” in Bucharest, Academy of Police Force in Bratislava and Police College in Banjaluka, and held at the Academy of Criminalistic and Police Studies on 3 and 4 March 2015. The International Scientific Conference “Archibald Reiss Days” was organized for the fifth time in a row, in memory of the founder and director of the first modern higher police school in Serbia, Rodolphe Archibald Reiss, PhD, after whom the Conference was named. The Thematic Collection of Papers contains 168 papers written by eminent scholars in the fields of law, security, criminalistics, police studies, forensics and informatics, as well as by members of the national security system participating in the education of the police, army and other security services, from Spain, Russia, Ukraine, Belarus, China, Poland, Armenia, Portugal, Turkey, Austria, Slovakia, Hungary, Slovenia, Macedonia, Croatia, Montenegro, Bosnia and Herzegovina, Republic of Srpska and Serbia.
Each paper has been reviewed by two reviewers, international experts competent in the field to which the paper relates, and the Thematic Conference Proceedings as a whole has been reviewed by five competent international reviewers. The papers published in the Thematic Collection of Papers give an overview of contemporary trends in the development of the police education system, the development of the police, and contemporary security, criminalistic and forensic concepts. Furthermore, they provide an analysis of rule of law activities in crime suppression and of the situation and trends in the above-mentioned fields, as well as suggestions on how to deal with these issues systematically. The Collection of Papers represents a significant contribution to the existing fund of scientific and expert knowledge in the field of criminalistic, security, penal and legal theory and practice. Publication of this Collection contributes to improving mutual cooperation between educational, scientific and expert institutions at national, regional and international level.