1,952 research outputs found

    Doctor of Philosophy

    Get PDF
    dissertationInteractive editing and manipulation of digital media is a fundamental component in digital content creation. One media in particular, digital imagery, has seen a recent increase in popularity of its large or even massive image formats. Unfortunately, current systems and techniques are rarely concerned with scalability or usability with these large images. Moreover, processing massive (or even large) imagery is assumed to be an off-line, automatic process, although many problems associated with these datasets require human intervention for high quality results. This dissertation details how to design interactive image techniques that scale. In particular, massive imagery is typically constructed as a seamless mosaic of many smaller images. The focus of this work is the creation of new technologies to enable user interaction in the formation of these large mosaics. While an interactive system for all stages of the mosaic creation pipeline is a long-term research goal, this dissertation concentrates on the last phase of the mosaic creation pipeline - the composition of registered images into a seamless composite. The work detailed in this dissertation provides the technologies to fully realize interactive editing in mosaic composition on image collections ranging from the very small to massive in scale

    Movie/Script: Alignment and Parsing of Video and Text Transcription

    Get PDF
    Movies and TV are a rich source of diverse and complex video of people, objects, actions and locales “in the wild”. Harvesting automatically labeled sequences of actions from video would enable creation of large-scale and highly-varied datasets. To enable such collection, we focus on the task of recovering scene structure in movies and TV series for object tracking and action retrieval. We present a weakly supervised algorithm that uses the screenplay and closed captions to parse a movie into a hierarchy of shots and scenes. Scene boundaries in the movie are aligned with screenplay scene labels and shots are reordered into a sequence of long continuous tracks or threads which allow for more accurate tracking of people, actions and objects. Scene segmentation, alignment, and shot threading are formulated as inference in a unified generative model and a novel hierarchical dynamic programming algorithm that can handle alignment and jump-limited reorderings in linear time is presented. We present quantitative and qualitative results on movie alignment and parsing, and use the recovered structure to improve character naming and retrieval of common actions in several episodes of popular TV series

    A robust patch-based synthesis framework for combining inconsistent images

    Get PDF
    Current methods for combining different images produce visible artifacts when the sources have very different textures and structures, come from far view points, or capture dynamic scenes with motions. In this thesis, we propose a patch-based synthesis algorithm to plausibly combine different images that have color, texture, structural, and geometric inconsistencies. For some applications such as cloning and stitching where a gradual blend is required, we present a new method for synthesizing a transition region between two source images, such that inconsistent properties change gradually from one source to the other. We call this process image melding. For gradual blending, we generalized patch-based optimization foundation with three key generalizations: First, we enrich the patch search space with additional geometric and photometric transformations. Second, we integrate image gradients into the patch representation and replace the usual color averaging with a screened Poisson equation solver. Third, we propose a new energy based on mixed L2/L0 norms for colors and gradients that produces a gradual transition between sources without sacrificing texture sharpness. Together, all three generalizations enable patch-based solutions to a broad class of image melding problems involving inconsistent sources: object cloning, stitching challenging panoramas, hole filling from multiple photos, and image harmonization. We also demonstrate another application which requires us to address inconsistencies across the images: high dynamic range (HDR) reconstruction using sequential exposures. In this application, the results will suffer from objectionable artifacts for dynamic scenes if the inconsistencies caused by significant scene motions are not handled properly. In this thesis, we propose a new approach to HDR reconstruction that uses information in all exposures while being more robust to motion than previous techniques. Our algorithm is based on a novel patch-based energy-minimization formulation that integrates alignment and reconstruction in a joint optimization through an equation we call the HDR image synthesis equation. This allows us to produce an HDR result that is aligned to one of the exposures yet contains information from all of them. These two applications (image melding and high dynamic range reconstruction) show that patch based methods like the one proposed in this dissertation can address inconsistent images and could open the door to many new image editing applications in the future

    Fluid Morphing for 2D Animations

    Get PDF
    Professionaalsel tasemel animeerimine on aeganõudev ja kulukas tegevus. Seda eriti sõltumatule arvutimängude tegijale. Siit tulenevalt osutub kasulikuks leida meetodeid, mis võimaldaks programmaatiliselt suurendada kaadrite arvu igas kahemõõtmelises raster animatsioonis. Vedeliku simulaatoriga eksperimenteerimine andis käesoleva töö autoritele idee, kuidas saavutada visuaalselt meeldiv kaadrite üleminek, kasutades selleks vedeliku dünaamikat. Tulemusena valmis programm, mis võib animaatori efektiivsust tõsta lausa mitmeid kordi. Autorid usuvad, et see avastus võib viia kahemõõtmeliste animatsioonide uuele võidukäigule — näiteks kaasaegsete arvutimängude kontekstis.Creation of professional animations is expensive and time-consuming, especially for the independent game developers. Therefore, it is rewarding to find a method that would programmatically increase the frame rate of any two-dimensional raster animation. Experimenting with a fluid simulator gave the authors an insight that to achieve visually pleasant and smooth animations, elements from fluid dynamics can be used. As a result, fluid image morphing was developed, allowing the animators to produce more significant frames than they would with the classic methods. The authors believe that this discovery could reintroduce hand drawn animations to modern computer games

    Fehlerkaschierte Bildbasierte Darstellungsverfahren

    Get PDF
    Creating photo-realistic images has been one of the major goals in computer graphics since its early days. Instead of modeling the complexity of nature with standard modeling tools, image-based approaches aim at exploiting real-world footage directly,as they are photo-realistic by definition. A drawback of these approaches has always been that the composition or combination of different sources is a non-trivial task, often resulting in annoying visible artifacts. In this thesis we focus on different techniques to diminish visible artifacts when combining multiple images in a common image domain. The results are either novel images, when dealing with the composition task of multiple images, or novel video sequences rendered in real-time, when dealing with video footage from multiple cameras.Fotorealismus ist seit jeher eines der großen Ziele in der Computergrafik. Anstatt die Komplexität der Natur mit standardisierten Modellierungswerkzeugen nachzubauen, gehen bildbasierte Ansätze den umgekehrten Weg und verwenden reale Bildaufnahmen zur Modellierung, da diese bereits per Definition fotorealistisch sind. Ein Nachteil dieser Variante ist jedoch, dass die Komposition oder Kombination mehrerer Quellbilder eine nichttriviale Aufgabe darstellt und häufig unangenehm auffallende Artefakte im erzeugten Bild nach sich zieht. In dieser Dissertation werden verschiedene Ansätze verfolgt, um Artefakte zu verhindern oder abzuschwächen, welche durch die Komposition oder Kombination mehrerer Bilder in einer gemeinsamen Bilddomäne entstehen. Im Ergebnis liefern die vorgestellten Verfahren neue Bilder oder neue Ansichten einer Bildsammlung oder Videosequenz, je nachdem, ob die jeweilige Aufgabe die Komposition mehrerer Bilder ist oder die Kombination mehrerer Videos verschiedener Kameras darstellt

    A COMPUTATION METHOD/FRAMEWORK FOR HIGH LEVEL VIDEO CONTENT ANALYSIS AND SEGMENTATION USING AFFECTIVE LEVEL INFORMATION

    No full text
    VIDEO segmentation facilitates e±cient video indexing and navigation in large digital video archives. It is an important process in a content-based video indexing and retrieval (CBVIR) system. Many automated solutions performed seg- mentation by utilizing information about the \facts" of the video. These \facts" come in the form of labels that describe the objects which are captured by the cam- era. This type of solutions was able to achieve good and consistent results for some video genres such as news programs and informational presentations. The content format of this type of videos is generally quite standard, and automated solutions were designed to follow these format rules. For example in [1], the presence of news anchor persons was used as a cue to determine the start and end of a meaningful news segment. The same cannot be said for video genres such as movies and feature films. This is because makers of this type of videos utilized different filming techniques to design their videos in order to elicit certain affective response from their targeted audience. Humans usually perform manual video segmentation by trying to relate changes in time and locale to discontinuities in meaning [2]. As a result, viewers usually have doubts about the boundary locations of a meaningful video segment due to their different affective responses. This thesis presents an entirely new view to the problem of high level video segmentation. We developed a novel probabilistic method for affective level video content analysis and segmentation. Our method had two stages. In the first stage, a®ective content labels were assigned to video shots by means of a dynamic bayesian 0. Abstract 3 network (DBN). A novel hierarchical-coupled dynamic bayesian network (HCDBN) topology was proposed for this stage. The topology was based on the pleasure- arousal-dominance (P-A-D) model of a®ect representation [3]. In principle, this model can represent a large number of emotions. In the second stage, the visual, audio and a®ective information of the video was used to compute a statistical feature vector to represent the content of each shot. Affective level video segmentation was achieved by applying spectral clustering to the feature vectors. We evaluated the first stage of our proposal by comparing its emotion detec- tion ability with all the existing works which are related to the field of a®ective video content analysis. To evaluate the second stage, we used the time adaptive clustering (TAC) algorithm as our performance benchmark. The TAC algorithm was the best high level video segmentation method [2]. However, it is a very computationally intensive algorithm. To accelerate its computation speed, we developed a modified TAC (modTAC) algorithm which was designed to be mapped easily onto a field programmable gate array (FPGA) device. Both the TAC and modTAC algorithms were used as performance benchmarks for our proposed method. Since affective video content is a perceptual concept, the segmentation per- formance and human agreement rates were used as our evaluation criteria. To obtain our ground truth data and viewer agreement rates, a pilot panel study which was based on the work of Gross et al. [4] was conducted. Experiment results will show the feasibility of our proposed method. For the first stage of our proposal, our experiment results will show that an average improvement of as high as 38% was achieved over previous works. As for the second stage, an improvement of as high as 37% was achieved over the TAC algorithm

    ScaleTrotter: Illustrative Visual Travels Across Negative Scales

    Full text link
    We present ScaleTrotter, a conceptual framework for an interactive, multi-scale visualization of biological mesoscale data and, specifically, genome data. ScaleTrotter allows viewers to smoothly transition from the nucleus of a cell to the atomistic composition of the DNA, while bridging several orders of magnitude in scale. The challenges in creating an interactive visualization of genome data are fundamentally different in several ways from those in other domains like astronomy that require a multi-scale representation as well. First, genome data has intertwined scale levels---the DNA is an extremely long, connected molecule that manifests itself at all scale levels. Second, elements of the DNA do not disappear as one zooms out---instead the scale levels at which they are observed group these elements differently. Third, we have detailed information and thus geometry for the entire dataset and for all scale levels, posing a challenge for interactive visual exploration. Finally, the conceptual scale levels for genome data are close in scale space, requiring us to find ways to visually embed a smaller scale into a coarser one. We address these challenges by creating a new multi-scale visualization concept. We use a scale-dependent camera model that controls the visual embedding of the scales into their respective parents, the rendering of a subset of the scale hierarchy, and the location, size, and scope of the view. In traversing the scales, ScaleTrotter is roaming between 2D and 3D visual representations that are depicted in integrated visuals. We discuss, specifically, how this form of multi-scale visualization follows from the specific characteristics of the genome data and describe its implementation. Finally, we discuss the implications of our work to the general illustrative depiction of multi-scale data
    corecore