100 research outputs found

    Non-disruptive use of light fields in image and video processing

    In the age of computational imaging, cameras capture not only an image but also additional data. This additional data can be used for photo-realistic rendering, enabling numerous post-processing possibilities such as perspective shift, depth scaling, digital refocus, 3D reconstruction, and much more. In computational photography, light field imaging technology captures the complete volumetric information of a scene. This technology has the highest potential to bring immersive experiences closer to reality. It has gained significance in both commercial and research domains. However, due to the lack of coding and storage formats and the incompatibility of the tools needed to process and enable the data, light fields are not exploited to their full potential. This dissertation approaches the integration of light field data into image and video processing. Towards this goal, it addresses the representation of light fields using advanced file formats designed for 2D image assemblies, to facilitate asset re-usability and interoperability between applications and devices. The novel 5D light field acquisition and the ongoing research on coding frameworks are presented. Multiple techniques for optimised sequencing of light field data are also proposed. As light fields contain complete 3D information of a scene, large amounts of data are captured, and this data is highly redundant. Hence, by pre-processing the data using the proposed approaches, excellent coding performance can be achieved.

    Storytelling with salient stills

    Thesis (M.S.)--Massachusetts Institute of Technology, Program in Media Arts & Sciences, 1996. Includes bibliographical references (p. 59-63). Michale J. Massey. M.S.

    Media gateway utilizando um GPU

    Master's degree in Computer and Telematics Engineering

    AXMEDIS 2008

    The AXMEDIS International Conference series aims to explore all subjects and topics related to cross-media and digital-media content production, processing, management, standards, representation, sharing, protection and rights management, to address the latest developments and future trends of the technologies and their applications, impacts and exploitation. The AXMEDIS events offer venues for exchanging concepts, requirements, prototypes, research ideas, and findings which could contribute to academic research and also benefit business and industrial communities. In the Internet as well as in the digital era, cross-media production and distribution represent key developments and innovations that are fostered by emergent technologies to ensure better value for money while optimising productivity and market coverage

    Novel source coding methods for optimising real time video codecs.

    The quality of the decoded video is affected by errors occurring in the various layers of the protocol stack. In this thesis, disjoint errors occurring in different layers of the protocol stack are investigated with the primary objective of demonstrating the flexibility of the source coding layer. In the first part of the thesis, the errors occurring in the editing layer, due to the coexistence of different video standards in the broadcast market, are addressed. The problems investigated are ‘Field Reversal’ and ‘Mixed Pulldown’. Field Reversal occurs when the interlaced video fields are not shown in the same order as they were captured. This results in a shaky video display, as the fields are not displayed in chronological order. Mixed Pulldown occurs when the video frame rate is up-sampled and down-sampled while digitised film material is being standardised to suit standard televisions. Novel image processing algorithms are proposed to solve these problems from the source coding layer. In the second part of the thesis, the errors occurring in the transmission layer due to data corruption are addressed. The use of block-level source error-resilient methods in place of bit-level channel coding methods is investigated and improvements are suggested. The secondary objective of the thesis is to optimise the architecture of the proposed algorithms for real-time implementation, since the problems are of a commercial nature. The Field Reversal and Mixed Pulldown algorithms were tested in real time at MTV (Music Television) and are made available commercially through ‘Cerify’, a Linux-based media testing box manufactured by Tektronix Plc. The channel error-resilient algorithms were tested in a laboratory environment using Matlab, and performance improvements were obtained.

    Network-Integrated Multimedia Middleware, Services, and Applications

    Today, there is a strong trend towards networked multimedia devices. However, common multimedia software architectures are restricted to performing all processing on a single system. Available software infrastructures for distributed computing (commonly referred to as middleware) only partly provide the facilities needed for supporting multimedia in distributed and dynamic environments. Approaches from the research community focus only on specific aspects and do not achieve the coverage needed for a full-featured multimedia middleware solution. The Network-Integrated Multimedia Middleware (NMM) presented in this thesis considers the network as an integral part. Despite the inherent heterogeneity of present networking and device technologies, the architecture allows control and cooperation to be extended to the network and enables the development of distributed multimedia applications that transparently use local and remote components in combination. The base architecture of this middleware is augmented by several middleware services that aim in particular at providing additional support for developing complex applications that involve mobile users and devices. To this end, previously unavailable services and corresponding abstractions are proposed, realized, and evaluated. The performance and applicability of the developed middleware and its additional services are demonstrated by describing different realized application scenarios.

    Emotion Recognition from Speech with Acoustic, Non-Linear and Wavelet-based Features Extracted in Different Acoustic Conditions

    ABSTRACT: In recent years, there has been great progress in automatic speech recognition. The challenge now is to recognize not only the semantic content of speech but also its so-called "paralinguistic" aspects, including the emotions and the personality of the speaker. This research work aims to develop a methodology for automatic emotion recognition from speech signals in non-controlled noise conditions. For that purpose, different sets of acoustic, non-linear, and wavelet-based features are used to characterize emotions in different databases created for this purpose.

    Design and analysis of a 3-dimensional cluster multicomputer architecture using optical interconnection for petaFLOP computing

    In this dissertation, the design and analysis of an extremely scalable distributed multicomputer architecture using optical interconnects, with the potential to deliver performance on the order of a petaFLOP, are presented in detail. The design takes advantage of optical technologies, harnessing the features inherent in optics, to produce a 3D stack that efficiently implements a large, fully connected system of nodes forming a true 3D architecture. To adopt optics in large-scale multiprocessor cluster systems, efficient routing and scheduling techniques are needed. To this end, novel self-routing strategies for all-optical packet-switched networks and on-line scheduling methods that can result in collision-free communication and achieve real-time operation in high-speed multiprocessor systems are proposed. The system is designed to allow failed or faulty nodes to stay in place without appreciable performance degradation. The approach is to develop a dynamic communication environment that can effectively adapt and evolve with a high density of missing units or nodes. A joint CPU/bandwidth controller that maximizes the resource allocation in this dynamic computing environment is introduced, with the objective of optimizing the distributed cluster architecture and preventing performance or system degradation in the presence of failed or faulty nodes. A thorough analysis, feasibility study and description of the characteristics of a 3-dimensional multicomputer system capable of achieving 100 teraFLOP performance are given in detail. Included in this dissertation are a throughput analysis of the routing schemes, using methods from discrete-time queuing systems, and computer simulation results for the different proposed algorithms. A prototype of the proposed 3D architecture is built and a test bed developed to obtain experimental results that further support the feasibility of the design and validate the initial assumptions, algorithms, simulations and the optimized distributed resource allocation scheme. Finally, as a prelude to further research, an efficient data routing strategy for highly scalable distributed mobile multiprocessor networks is introduced.

    Energy efficient enabling technologies for semantic video processing on mobile devices

    Semantic object-based processing will play an increasingly important role in future multimedia systems due to the ubiquity of digital multimedia capture/playback technologies and increasing storage capacity. Although the object-based paradigm has many undeniable benefits, numerous technical challenges remain before the applications become pervasive, particularly on computationally constrained mobile devices. A fundamental issue is the ill-posed problem of semantic object segmentation. Furthermore, on battery-powered mobile computing devices, the additional algorithmic complexity of semantic object-based processing compared to conventional video processing is highly undesirable from both a real-time operation and a battery life perspective. This thesis attempts to tackle these issues by first constraining the solution space and focusing on the human face as a primary semantic concept of use to users of mobile devices. A novel face detection algorithm is proposed, which from the outset was designed to be amenable to offloading from the host microprocessor to dedicated hardware, thereby providing real-time performance and reducing power consumption. The algorithm uses an Artificial Neural Network (ANN), whose topology and weights are evolved via a genetic algorithm (GA). The computational burden of the ANN evaluation is offloaded to a dedicated hardware accelerator, which is capable of processing any evolved network topology. Efficient arithmetic circuitry, which leverages modified Booth recoding, column compressors and carry-save adders, is adopted throughout the design. To tackle the increased computational costs associated with object tracking or object-based shape encoding, a novel energy-efficient binary motion estimation architecture is proposed. Energy is reduced in the proposed motion estimation architecture by minimising the redundant operations inherent in the binary data. Both architectures are shown to compare favourably with the relevant prior art.