
881 research outputs found

A COMPUTATION METHOD/FRAMEWORK FOR HIGH LEVEL VIDEO CONTENT ANALYSIS AND SEGMENTATION USING AFFECTIVE LEVEL INFORMATION

Author: Arifin Sutjipoto
Publication venue: Electrical & Electronic Engineering, Imperial College London
Publication date: 01/10/2008

Video segmentation facilitates efficient video indexing and navigation in large digital video archives. It is an important process in a content-based video indexing and retrieval (CBVIR) system. Many automated solutions performed segmentation by utilizing information about the "facts" of the video. These "facts" come in the form of labels that describe the objects captured by the camera. Solutions of this type achieved good and consistent results for some video genres, such as news programs and informational presentations. The content format of such videos is generally quite standard, and automated solutions were designed to follow these format rules. For example, in [1], the presence of news anchorpersons was used as a cue to determine the start and end of a meaningful news segment. The same cannot be said for video genres such as movies and feature films, because makers of these videos use different filming techniques to elicit certain affective responses from their target audience. Humans usually perform manual video segmentation by trying to relate changes in time and locale to discontinuities in meaning [2]. As a result, viewers usually have doubts about the boundary locations of a meaningful video segment due to their different affective responses. This thesis presents an entirely new view of the problem of high-level video segmentation. We developed a novel probabilistic method for affective-level video content analysis and segmentation. Our method had two stages. In the first stage, affective content labels were assigned to video shots by means of a dynamic Bayesian network (DBN). A novel hierarchical-coupled dynamic Bayesian network (HCDBN) topology was proposed for this stage. The topology was based on the pleasure-arousal-dominance (P-A-D) model of affect representation [3]. In principle, this model can represent a large number of emotions.
In the second stage, the visual, audio and affective information of the video was used to compute a statistical feature vector representing the content of each shot. Affective-level video segmentation was achieved by applying spectral clustering to the feature vectors. We evaluated the first stage of our proposal by comparing its emotion detection ability with all the existing works related to the field of affective video content analysis. To evaluate the second stage, we used the time adaptive clustering (TAC) algorithm as our performance benchmark. The TAC algorithm was the best high-level video segmentation method [2]. However, it is very computationally intensive. To accelerate it, we developed a modified TAC (modTAC) algorithm designed to be mapped easily onto a field programmable gate array (FPGA) device. Both the TAC and modTAC algorithms were used as performance benchmarks for our proposed method. Since affective video content is a perceptual concept, the segmentation performance and human agreement rates were used as our evaluation criteria. To obtain our ground truth data and viewer agreement rates, a pilot panel study based on the work of Gross et al. [4] was conducted. Experimental results show the feasibility of our proposed method. For the first stage, an average improvement of as high as 38% was achieved over previous works. For the second stage, an improvement of as high as 37% was achieved over the TAC algorithm.
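The abstract above says only that spectral clustering was applied to per-shot feature vectors; it gives no implementation details. As a rough illustration of the general idea, the following minimal sketch performs a two-way spectral partition by taking the sign of the Fiedler vector of the graph Laplacian. The Gaussian affinity, the power-iteration solver, the function names and the toy feature values are all assumptions made for this example, not the thesis's method.

```python
import math
from typing import List, Sequence


def affinity_matrix(features: Sequence[Sequence[float]], sigma: float = 1.0) -> List[List[float]]:
    """Gaussian affinity between every pair of (hypothetical) shot feature vectors."""
    n = len(features)
    w = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            if i != j:
                d2 = sum((a - b) ** 2 for a, b in zip(features[i], features[j]))
                w[i][j] = math.exp(-d2 / (2.0 * sigma ** 2))
    return w


def fiedler_partition(w: List[List[float]], iters: int = 500) -> List[int]:
    """Two-way spectral partition: sign of the Fiedler vector of L = D - W."""
    n = len(w)
    deg = [sum(row) for row in w]
    # Gershgorin bound: every eigenvalue of L is at most 2 * max degree.
    shift = 2.0 * max(deg) + 1e-9
    v = [float(i) for i in range(n)]  # deterministic start vector
    for _ in range(iters):
        mean = sum(v) / n
        v = [x - mean for x in v]  # project out the constant eigenvector
        # Multiply by (shift * I - L); power iteration then converges
        # towards the eigenvector of L with the second-smallest eigenvalue.
        v = [
            shift * v[i] - (deg[i] * v[i] - sum(w[i][j] * v[j] for j in range(n)))
            for i in range(n)
        ]
        norm = math.sqrt(sum(x * x for x in v)) or 1.0
        v = [x / norm for x in v]
    return [1 if x > 0 else 0 for x in v]


def shot_boundaries(labels: Sequence[int]) -> List[int]:
    """Indices where the cluster label changes between consecutive shots."""
    return [i for i in range(1, len(labels)) if labels[i] != labels[i - 1]]


# Toy feature vectors for six consecutive shots (made-up values).
shots = [(0.0, 0.0), (0.1, 0.0), (0.2, 0.1), (5.0, 5.0), (5.1, 5.0), (5.2, 5.1)]
labels = fiedler_partition(affinity_matrix(shots))
boundaries = shot_boundaries(labels)
```

With more than two segments, a practical system would typically use several eigenvectors followed by k-means, usually through a numerical library rather than hand-written power iteration.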

Spiral - Imperial College Digital Repository

Storytelling with salient stills

Author: Massey Michael J
Publication venue: Massachusetts Institute of Technology
Publication date: 01/01/1996

Thesis (M.S.)--Massachusetts Institute of Technology, Program in Media Arts & Sciences, 1996. Includes bibliographical references (p. 59-63). Michael J. Massey. M.S.

DSpace@MIT


Multimodal Indexing of Presentation Videos

Author: Merler Michele
Publication venue: Columbia University Libraries/Information Services
Publication date: 01/01/2013

This thesis presents four novel methods to help users efficiently and effectively retrieve information from unstructured and unsourced multimedia sources, in particular the increasing amount and variety of presentation videos such as those in e-learning, conference recordings, corporate talks, and student presentations. We demonstrate a system to summarize, index and cross-reference such videos, and measure the quality of the produced indexes as perceived by the end users. We introduce four major semantic indexing cues: text, speaker faces, graphics, and mosaics, going beyond standard tag-based searches and simple video playback. This work aims at recognizing visual content "in the wild", where the system cannot rely on any additional information besides the video itself. For text, within a scene text detection and recognition framework, we present a novel locally optimal adaptive binarization algorithm, implemented with integral histograms. It determines an optimal threshold that maximizes the between-class variance within a subwindow, with computational complexity independent of the size of the window itself. We obtain character recognition rates of 74%, as validated against ground truth of 8 presentation videos spanning over 1 hour and 45 minutes, which almost doubles the baseline performance of an open source OCR engine. For speaker faces, we detect, track, match, and finally select a humanly preferred face icon per speaker, based on three quality measures: resolution, amount of skin, and pose. We register an 87% accordance (51 out of 58 speakers) between the face indexes automatically generated from three unstructured presentation videos of approximately 45 minutes each, and human preferences recorded through Mechanical Turk experiments.
For diagrams, we locate graphics inside frames showing a projected slide, cluster them according to an on-line algorithm based on a combination of visual and temporal information, and select and color-correct their representatives to match human preferences recorded through Mechanical Turk experiments. We register 71% accuracy (57 out of 81 unique diagrams properly identified, selected and color-corrected) on three hours of videos containing five different presentations. For mosaics, we combine two existing suturing measures to extend video images into an in-the-world coordinate system. A set of frames to be registered into a mosaic is sampled according to the PTZ camera movement, which is computed through least-squares estimation starting from the luminance constancy assumption. A local-feature-based stitching algorithm is then applied to estimate the homography among a set of video frames, and median blending is used to render pixels in overlapping regions of the mosaic. For two of these indexes, namely faces and diagrams, we present two novel MTurk-derived user data collections to determine viewer preferences, and show that they are matched in selection by our methods. The net result of this thesis allows users to search, inside a video collection as well as within a single video clip, for a segment of a presentation by professor X on topic Y, containing graph Z.
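The binarization step described in this abstract (a threshold that maximizes the between-class variance inside a subwindow, computed from integral histograms) can be illustrated with a small sketch. The code below combines the standard Otsu criterion with a plain integral histogram; it is not the thesis's locally optimal algorithm, and the function names, the tiny 8-bin image and the window coordinates are assumptions chosen for illustration.

```python
from typing import List, Sequence


def integral_histogram(img: Sequence[Sequence[int]], bins: int) -> List[List[List[int]]]:
    """ih[y][x] holds the histogram of the rectangle img[0:y][0:x]."""
    h, w = len(img), len(img[0])
    ih = [[[0] * bins for _ in range(w + 1)] for _ in range(h + 1)]
    for y in range(h):
        for x in range(w):
            cur, up, left, diag = ih[y + 1][x + 1], ih[y][x + 1], ih[y + 1][x], ih[y][x]
            for b in range(bins):
                cur[b] = up[b] + left[b] - diag[b]
            cur[img[y][x]] += 1
    return ih


def window_histogram(ih: List[List[List[int]]], y0: int, x0: int, y1: int, x1: int) -> List[int]:
    """Histogram of img[y0:y1][x0:x1] from four lookups, whatever the window size."""
    a, b, c, d = ih[y1][x1], ih[y0][x1], ih[y1][x0], ih[y0][x0]
    return [a[i] - b[i] - c[i] + d[i] for i in range(len(a))]


def otsu_threshold(hist: Sequence[int]) -> int:
    """Threshold t maximizing the between-class variance (class 0: values <= t)."""
    total = sum(hist)
    sum_all = sum(i * c for i, c in enumerate(hist))
    w0, sum0 = 0, 0.0
    best_t, best_var = 0, -1.0
    for t, c in enumerate(hist):
        w0 += c
        sum0 += t * c
        if w0 == 0:
            continue  # no pixels in class 0 yet
        w1 = total - w0
        if w1 == 0:
            break  # no pixels left for class 1
        mu0, mu1 = sum0 / w0, (sum_all - sum0) / w1
        var_between = w0 * w1 * (mu0 - mu1) ** 2
        if var_between > best_var:
            best_var, best_t = var_between, t
    return best_t


# Toy 2x4 image with 8 grey levels: dark left half, bright right half.
image = [[1, 1, 6, 6],
         [1, 1, 6, 6]]
ih = integral_histogram(image, bins=8)
full_hist = window_histogram(ih, 0, 0, 2, 4)
threshold = otsu_threshold(full_hist)
```

Each window histogram costs one pass over the bins regardless of window area, which is the property the abstract refers to; the price is the memory for one histogram per pixel.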

Columbia University Academic Commons

Characterization of unstructured video

Author: Iyengar Giridharan Ranganathan, 1969-
Publication venue: Massachusetts Institute of Technology
Publication date: 01/01/1999

Thesis (Ph.D.)--Massachusetts Institute of Technology, School of Architecture and Planning, Program in Media Arts and Sciences, 1999. Includes bibliographical references (p. 135-139). In this work, we examine video retrieval from a synthesis perspective in co-operation with the more common analysis perspective. Specifically, we target our algorithms at one particular domain: unstructured video material. The goal is to make this unstructured video available for manipulation in interesting ways, that is, to take video that may have been shot with no specific intent and use it in different settings. For example, we build a set of interfaces that enable taking a collection of home videos and making Christmas cards, refrigerator magnets, family dramas, etc. out of them. The work is divided into three parts. First, we study features and models for characterization of video. Examples are VideoBook with its extensions and Hidden Markov Models for video analysis. Second, we examine clustering as an approach for characterization of unstructured video. Clustering alleviates some of the common problems with "query-by-example" and presents groupings that rely on the user's abilities to make relevant connections. The clustering techniques we employ operate in the probability density space. One of our goals is to employ these techniques with sophisticated models such as Bayesian Networks and HMMs, which give similar descriptions. The clustering techniques we employ are shown to be optimal in an information-theoretic and Gibbs Free Energy sense. Finally, we present a set of interfaces that use these features and groupings to enable browsing and editing of unstructured video content. By Giridharan Ranganathan Iyengar. Ph.D.

DSpace@MIT

Integrated navigation and visualisation for skull base surgery

Author: Shapey Jonathan
Publication venue: UCL (University College London)
Publication date: 28/12/2021

Skull base surgery involves the management of tumours located on the underside of the brain and the base of the skull. Skull base tumours are intricately associated with several critical neurovascular structures, making surgery challenging and high risk. Vestibular schwannoma (VS) is a benign nerve sheath tumour arising from one of the vestibular nerves and is the commonest pathology encountered in skull base surgery. The goal of modern VS surgery is maximal tumour removal whilst preserving neurological function and maintaining quality of life, but despite advanced neurosurgical techniques, facial nerve paralysis remains a potentially devastating complication of this surgery. This thesis describes the development and integration of various advanced navigation and visualisation techniques to increase the precision and accuracy of skull base surgery. A novel Diffusion Magnetic Resonance Imaging (dMRI) acquisition and processing protocol for imaging the facial nerve in patients with VS was developed to improve delineation of the facial nerve preoperatively. An automated Artificial Intelligence (AI)-based framework was developed to segment VS from MRI scans. A user-friendly navigation system capable of integrating dMRI and tractography of the facial nerve, 3D tumour segmentation and intraoperative 3D ultrasound was developed and validated using an anatomically realistic acoustic phantom model of a head including the skull, brain and VS. The optical properties of five types of human brain tumour (meningioma, pituitary adenoma, schwannoma, low- and high-grade glioma) and nine different types of healthy brain tissue were examined across a wavelength spectrum of 400 nm to 800 nm in order to inform the development of an Intraoperative Hyperspectral Imaging (iHSI) system. Finally, functional and technical requirements of an iHSI system were established, and a prototype system was developed and tested in a first-in-patient study.

UCL Discovery

Being Young in Arab Detroit: Media and Identity in Post-9/11 America.

Author: Haddad Candice
Publication date: 01/01/2015

More than ten years after the events of 9/11, the Arab American community of the Dearborn and Detroit, Michigan area continues to feel the effects of the nation’s intense scrutiny of their lives and identities. As formal government and informal communal surveillance, threats of violence and deportation, and general anxieties escalated, the Arab Detroit community has been at the center of various efforts to understand Arab and Muslim Americans. Through an engagement with post-9/11 national news media discourses, participant-observation work at the Arab American National Museum (founded in 2005 and located in Dearborn, MI), interviews and focus groups with Arab American youth, and digital ethnography of Arab American youths’ online cultural productions, this dissertation examines what it means to be young and Arab American in Dearborn. Moving beyond well-worn distinctions between mainstream and grassroots media, this dissertation examines news discourse and television programming in relation to various media produced and circulated by Arab American youth. In doing so, this dissertation contributes to scholarship in a number of related areas including media representations of race and ethnicity, geography and identity, the increasingly vexed relationship between race and religion, and youth culture. PhD, Communication Studies, University of Michigan, Horace H. Rackham School of Graduate Studies. http://deepblue.lib.umich.edu/bitstream/2027.42/111598/1/cjhaddad_1.pd

Deep Blue Documents at the University of Michigan

Construction of super-resolution mosaics from low-resolution video. Application to video summarization and transmission error concealment.

Author: KRAEMER Petra
Publication date: 13/01/2021

The digitization of existing videos, together with the explosive growth of networked multimedia services such as digital television broadcasting and mobile communications, has produced an enormous quantity of compressed video. This calls for efficient indexing and navigation tools, but indexing before encoding is not common practice. The usual approach is to decode these videos completely and then build indexes. This is very costly and therefore not feasible in real time. Moreover, important information such as motion, lost during decoding, is re-estimated even though it is already present in the compressed stream. Our goal in this thesis is therefore to reuse data already present in the MPEG compressed stream for indexing and fast navigation. More precisely, we extract DC coefficients and motion vectors. In this thesis, we focus in particular on constructing mosaics from the DC images extracted from I-frames. A mosaic is built by registering and fusing all the images of a video sequence into a single coordinate system. This coordinate system is generally aligned with one of the images of the sequence: the reference image. The result is a single image that gives a global view of the sequence. We thus propose a complete system for constructing mosaics from the MPEG-1/2 stream that takes into account various problems arising in real video sequences, such as moving objects or illumination changes. An essential task in constructing a mosaic is estimating the motion between each image of the sequence and the reference image. Our method is based on a robust estimation of the global camera motion from the motion vectors of P-frames.
However, the global camera motion estimated for a P-frame may be incorrect, because it depends strongly on the accuracy of the encoded vectors. We detect the affected P-frames by taking into account the DC coefficients of the associated encoded error, and propose two methods for correcting these motions. A mosaic built from DC images has a very low resolution and suffers from aliasing effects due to the nature of DC images. To increase its resolution and improve its visual quality, we apply a super-resolution method based on iterative back-projection. Super-resolution methods are likewise based on registering and fusing the images of a video sequence, but are accompanied by image restoration. In this context, we developed a new method for estimating blur due to camera motion, as well as a corresponding spectral restoration method. Spectral restoration handles blur globally, but for objects whose motion is independent of the camera motion, local blur appears. For this reason, we propose a new super-resolution algorithm, derived from the iterative spatial restoration of Van Cittert and Jansson, that can restore local blur. Based on a segmentation of moving objects, we restore the background mosaic and the foreground objects separately. We adapted our blur estimation method accordingly. We first applied our method to the construction of video summaries, with the aim of fast mosaic-based navigation in compressed video. We then show how reusing intermediate results serves other indexing tasks, in particular shot change detection for I-frames and characterization of camera motion.
Finally, we explored the field of transmission error recovery. Our approach consists of building a mosaic while decoding a shot; if data is lost, the missing information can be concealed using this mosaic.
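The abstract names the iterative spatial restoration of Van Cittert and Jansson as the starting point for its super-resolution algorithm. As a hedged illustration of the basic Van Cittert update only (estimate plus a relaxation factor times the difference between the observation and the re-blurred estimate), here is a one-dimensional sketch. The circular boundary handling, the symmetric blur kernel, the relaxation factor and the test signal are assumptions for this example; Jansson's constraint function and the thesis's local-blur extension are not modelled.

```python
from typing import List, Sequence


def circular_blur(signal: Sequence[float], kernel: Sequence[float]) -> List[float]:
    """Circular filtering with a centred, symmetric, odd-length blur kernel."""
    n, half = len(signal), len(kernel) // 2
    return [
        sum(k * signal[(i + j - half) % n] for j, k in enumerate(kernel))
        for i in range(n)
    ]


def van_cittert(observed: Sequence[float], kernel: Sequence[float],
                iterations: int = 20, beta: float = 1.0) -> List[float]:
    """Basic Van Cittert iteration: f <- f + beta * (g - h * f), starting from f = g."""
    estimate = list(observed)
    for _ in range(iterations):
        reblurred = circular_blur(estimate, kernel)
        estimate = [f + beta * (g - r) for f, g, r in zip(estimate, observed, reblurred)]
    return estimate


def squared_error(a: Sequence[float], b: Sequence[float]) -> float:
    """Sum of squared differences between two equal-length signals."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


# Made-up test signal: a box blurred by a small symmetric kernel.
truth = [0.0, 0.0, 0.0, 1.0, 1.0, 1.0, 0.0, 0.0, 0.0, 0.0]
kernel = [0.25, 0.5, 0.25]
blurred = circular_blur(truth, kernel)
restored = van_cittert(blurred, kernel)
```

For this kernel the frequency response lies between 0 and 1, so with `beta = 1.0` each iteration shrinks the error of every frequency component except the one the blur removes entirely. Real images also contain noise, which this unconstrained form tends to amplify.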

Oskar Bordeaux

Multimedia Retrieval

Publication venue: Springer
Publication date: 01/01/2007

University of Twente Research Information

A Markov Random Field Based Approach to 3D Mosaicing and Registration Applied to Ultrasound Simulation

Author: Kutarnia Jason Francis
Publication venue: Digital WPI
Publication date: 15/07/2011

A novel Markov Random Field (MRF) based method for the mosaicing of 3D ultrasound volumes is presented in this dissertation. The motivation for this work is the production of training volumes for an affordable ultrasound simulator, which offers a low-cost/portable training solution for new users of diagnostic ultrasound, by providing the scanning experience essential for developing the necessary psycho-motor skills. It also has the potential for introducing ultrasound instruction into medical education curriculums. The interest in ultrasound training stems in part from the widespread adoption of point-of-care scanners, i.e. low cost portable ultrasound scanning systems in the medical community. This work develops a novel approach for producing 3D composite image volumes and validates the approach using clinically acquired fetal images from the obstetrics department at the University of Massachusetts Medical School (UMMS). Results using the Visible Human Female dataset as well as an abdominal trauma phantom are also presented. The process is broken down into five distinct steps, which include individual 3D volume acquisition, rigid registration, calculation of a mosaicing function, group-wise non-rigid registration, and finally blending. Each of these steps, common in medical image processing, has been investigated in the context of ultrasound mosaicing and has resulted in improved algorithms. Rigid and non-rigid registration methods are analyzed in a probabilistic framework and their sensitivity to ultrasound shadowing artifacts is studied. The group-wise non-rigid registration problem is initially formulated as a maximum likelihood estimation, where the joint probability density function is comprised of the partially overlapping ultrasound image volumes. This expression is simplified using a block-matching methodology and the resulting discrete registration energy is shown to be equivalent to a Markov Random Field. 
Graph-based methods common in computer vision are then used for optimization, resulting in a set of transformations that bring the overlapping volumes into alignment. This optimization is parallelized using a fusion approach, where the registration problem is divided into 8 independent sub-problems whose solutions are fused together at the end of each iteration. This method provided a speedup factor of 3.91 over the single-threaded approach with no noticeable reduction in accuracy during our simulations. Furthermore, the registration problem is simplified by introducing a mosaicing function, which partitions the composite volume into regions filled with data from unique partially overlapping source volumes. This mosaicing function attempts to minimize intensity and gradient differences between adjacent sources in the composite volume. Experimental results to demonstrate the performance of the group-wise registration algorithm are also presented. This algorithm is initially tested on deformed abdominal image volumes generated using a finite element model of the Visible Human Female to show the accuracy of its calculated displacement fields. In addition, the algorithm is evaluated using real ultrasound data from an abdominal phantom. Finally, composite obstetrics image volumes are constructed using clinical scans of pregnant subjects, where fetal movement makes registration/mosaicing especially difficult. Our solution to blending, which is the final step of the mosaicing process, is also discussed. The trainee will have a better experience if the volume boundaries are visually seamless, and this usually requires some blending prior to stitching. Also, regions of the volume where no data was collected during scanning should have an ultrasound-like appearance before being displayed in the simulator. This ensures the trainee's visual experience isn't degraded by unrealistic images. A discrete Poisson approach has been adapted to accomplish these tasks.
Following this, we will describe how a 4D fetal heart image volume can be constructed from swept 2D ultrasound. A 4D probe, such as the Philips X6-1 xMATRIX Array, would make this task simpler as it can acquire 3D ultrasound volumes of the fetal heart in real time; however, probes such as these aren't widespread yet. Once the theory has been introduced, we will describe the clinical component of this dissertation. For the purpose of acquiring actual clinical ultrasound data, from which training datasets were produced, 11 pregnant subjects were scanned by experienced sonographers at the UMMS following an approved IRB protocol. First, we will discuss the software/hardware configuration that was used to conduct these scans, which included some custom mechanical design. With the data collected using this arrangement, we generated seamless 3D fetal mosaics (that is, the training datasets), loaded them into our ultrasound training simulator, and then had them evaluated by the sonographers at the UMMS for accuracy. These mosaics were constructed from the raw scan data using the techniques previously introduced. Specific training objectives were established based on the input from our collaborators in the obstetrics sonography group. Important fetal measurements are reviewed, which form the basis for training in obstetrics ultrasound. Finally, clinical images demonstrating the sonographer making fetal measurements in practice, which were acquired directly by the Philips iU22 ultrasound machine from one of our 11 subjects, are compared with screenshots of corresponding images produced by our simulator.
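The abstract states only that a discrete Poisson approach was adapted for blending and for filling regions without data. To show what such an approach means in its simplest form, the sketch below fills a gap in a one-dimensional signal so that the discrete Laplacian inside the gap matches that of a source signal while the values just outside the gap stay fixed. The Gauss-Seidel solver, the function name and the sample values are assumptions for illustration; the dissertation works on 3D ultrasound volumes and may differ in every detail.

```python
from typing import List, Sequence


def poisson_fill_1d(target: Sequence[float], source: Sequence[float],
                    start: int, stop: int, sweeps: int = 200) -> List[float]:
    """Fill target[start:stop] so that its discrete Laplacian matches the source's.

    target[start - 1] and target[stop] are kept fixed as boundary values,
    so 1 <= start < stop <= len(target) - 1 is required.
    """
    f = list(target)
    for _ in range(sweeps):  # Gauss-Seidel sweeps over the unknown samples
        for i in range(start, stop):
            lap_src = source[i - 1] - 2.0 * source[i] + source[i + 1]
            # Solve f[i-1] - 2 f[i] + f[i+1] = lap_src for f[i].
            f[i] = 0.5 * (f[i - 1] + f[i + 1] - lap_src)
    return f


# A flat source has zero Laplacian, so the gap is filled by a straight line.
flat_fill = poisson_fill_1d([0.0, 9.0, 9.0, 9.0, 4.0], [7.0] * 5, 1, 4)

# A curved source transfers its curvature: the result follows i squared here.
curved_fill = poisson_fill_1d([0.0, 0.0, 0.0, 0.0, 16.0], [0.0, 1.0, 4.0, 9.0, 16.0], 1, 4)
```

Because only second differences are copied from the source, the filled region joins the fixed boundary values without a visible step, which is the seamless-boundary property the abstract is after.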

DigitalCommons@WPI

University of Maryland University College: UMUC Digital Repository

A Markov Random Field Based Approach to 3D Mosaicing and Registration Applied to Ultrasound Simulation

Author: Kutarnia Jason Francis
Publication venue: Digital WPI
Publication date: 27/08/2014

DigitalCommons@WPI