
    Adaptive deinterlacing of video sequences using motion data

    In this work, an efficient motion-adaptive deinterlacing method with considerable improvement in picture quality is proposed. A temporal deinterlacing method performs well in static image areas, while a spatial method performs better in dynamic parts. In the proposed method, a motion-adaptive interpolator combines the results of a spatial method and a temporal method based on the motion activity level of the video sequence. A high-performance, low-complexity algorithm for motion detection is introduced. This algorithm uses five consecutive interlaced video fields for motion detection and is able to capture a wide range of motions, from slow to fast. The algorithm benefits from a hierarchical structure: it starts by detecting motion in large partitions of a given field and, depending on the motion activity level detected for a partition, may be applied recursively to its sub-blocks. Two different low-pass filters are used during motion detection to increase the algorithm's accuracy. The result of motion detection is then used in the proposed motion-adaptive interpolator. The performance of the proposed deinterlacing algorithm is compared to previous methods in the literature. In experiments on several standard video sequences, the proposed method shows excellent motion detection and deinterlacing performance.
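    To make the adaptive blending concrete, the sketch below fades between a temporal and a spatial estimate according to a per-pixel motion measure. It illustrates only the general motion-adaptive idea: the paper's five-field hierarchical detector and its two low-pass filters are replaced by a simple two-field difference, and the function names and threshold are assumptions.

```python
import numpy as np

def interpolate_missing_lines(cur_lines, prev_field, next_field, thresh=16.0):
    """Estimate the missing lines of the current field (illustrative sketch).

    cur_lines:  (H/2, W) lines present in the current field
    prev_field, next_field: (H/2, W) opposite-parity neighbour fields that
    occupy the missing line positions at t-1 and t+1.
    """
    prev_f = prev_field.astype(np.float32)
    next_f = next_field.astype(np.float32)
    cur_f = cur_lines.astype(np.float32)

    # Temporal estimate: average the co-located pixels of the two
    # neighbouring fields (ideal for static content).
    temporal = (prev_f + next_f) / 2

    # Spatial estimate: average the current-field lines directly above
    # and below each missing line (robust for moving content).
    below = np.roll(cur_f, -1, axis=0)
    below[-1] = cur_f[-1]                 # clamp at the bottom border
    spatial = (cur_f + below) / 2

    # Motion activity: same-position difference two field periods apart.
    alpha = np.clip(np.abs(prev_f - next_f) / thresh, 0.0, 1.0)

    # Fade from the temporal to the spatial estimate as motion rises.
    return ((1 - alpha) * temporal + alpha * spatial).astype(np.uint8)
```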

    Algorithm/Architecture Co-Exploration of Visual Computing: Overview and Future Perspectives

    Concurrently exploring both algorithmic and architectural optimizations is a new design paradigm. This survey paper addresses the latest research and future perspectives on the simultaneous development of video coding, processing, and computing algorithms together with emerging platforms that have multiple cores and reconfigurable architecture. As the algorithms in forthcoming visual systems become increasingly complex, many applications must support different profiles with different levels of performance. Hence, with the expectation that the visual experience will continuously improve, it is critical that advanced platforms provide higher performance, better flexibility, and lower power consumption. To achieve these goals, algorithm and architecture co-design is significant for characterizing the algorithmic complexity used to optimize the targeted architecture. This paper shows that seamlessly weaving the development of previously autonomous visual computing algorithms and multicore or reconfigurable architectures will inevitably become the leading trend in the future of video technology.

    Soft computing techniques for video de-interlacing

    This paper presents the application of soft computing techniques to video processing. Specifically, the research work has focused on the de-interlacing task, which is necessary whenever the transmission standard uses an interlaced format but the receiver requires progressive scanning, as happens with consumer displays such as LCD and plasma panels. A simple hierarchical solution that combines three simple fuzzy logic-based constituents (interpolators) is presented. Each interpolator is specialized in one of three key image features for de-interlacing: motion, edges, and possible repetition of picture areas. The resulting algorithm offers better results than others with lower or similar computational cost. A very interesting result is that our algorithm is competitive with motion-compensated algorithms.
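    As an illustration of how such fuzzy constituents can be combined, the sketch below weights three candidate interpolations by rule activations derived from the three features. The piecewise-linear memberships and their breakpoints are illustrative assumptions, not the paper's tuned rule base.

```python
import numpy as np

def membership(x, lo, hi):
    """Piecewise-linear fuzzy membership rising from 0 at `lo` to 1 at `hi`."""
    return float(np.clip((x - lo) / (hi - lo), 0.0, 1.0))

def fuzzy_interpolate(temporal, edge_spatial, area_repeat,
                      motion_level, edge_strength, repeat_score):
    """Takagi-Sugeno style blend of three specialized interpolator outputs."""
    # Rule activations derived from the three image features.
    mu_static = 1.0 - membership(motion_level, 4.0, 24.0)   # little motion
    mu_edge = membership(edge_strength, 10.0, 60.0)         # strong local edge
    mu_repeat = membership(repeat_score, 0.5, 0.9)          # repeated area found

    # Defuzzification: weighted average of the candidate pixel values.
    w = np.array([mu_static, mu_edge, mu_repeat]) + 1e-9
    outs = np.array([temporal, edge_spatial, area_repeat])
    return float(np.dot(w, outs) / w.sum())
```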

    Design and implementation of dedicated processors for real-time video processing systems

    RÉSUMÉ: Video processing systems are characterized by increasingly demanding performance requirements. New standards, such as HDMI 1.3 (High Definition Media Interface), require bandwidths of up to 340 megapixels per second per channel. It follows that the processors handling this kind of data must be very powerful. New design methodologies based on architecture description languages (ADLs) have emerged to meet these challenges. They allow dedicated processors to be designed end to end, with maximum flexibility. This flexibility, the great strength of such languages (e.g. LISA 2.0), lets us create a dedicated processor from baseline architectures regarded as general-purpose processors, by adding specialized instructions and modifying the architecture (adding specialized registers, changing bus widths). In this work, we concentrated on one class of image processing algorithms: deinterlacing. Deinterlacing reconstructs a complete video sequence from a sequence that was interlaced for reasons such as bandwidth reduction. Throughout this work, we took constant care to develop methodologies that are as general as possible and can be reused for other algorithms. One contribution of this thesis is the development of complete and modular test architectures for implementing a real-time video processor. We also developed a RAM management interface that allows the processors to be tested during development without modifying the complete system. The development of two innovative methodologies represents a further contribution to the design of dedicated processors. These two methodologies, based on an ADL, are synergistic and make it possible to implement and accelerate real-time video processing algorithms. We first obtained an acceleration factor of 11 with the first methodology, and then a factor of 282 with the second. ----------ABSTRACT: Video processing systems are characterized by rising performance specifications. New standards such as HDMI 1.3 require bandwidths as high as 340 megapixels per second per channel, demanding greater information processing power. New design methodologies based on architectural descriptions (ADL) respond to this challenge. Design methods and languages for architectural descriptions (such as LISA 2.0) allow developing tailor-made high-performance processors in a very flexible way. The flexibility of these languages lets the user add specialized instructions to an instruction set processor. They also allow modifying its architecture to create a processor with much improved performance compared to some baseline general-purpose processor. Our study focuses on a specific type of video processing algorithm called deinterlacing. Deinterlacing reconstructs a complete video sequence from an interlaced video sequence. Despite this algorithmic focus, in the course of this study we were concerned with developing broadly applicable methodologies usable for other algorithms.
This thesis aims to contribute to the existing body of work in the field by developing complete and modular test architectures that allow implementing processors capable of real-time video processing. The development of two innovative design methodologies represents an additional contribution. These synergetic methodologies are based on an ADL (Architecture Description Language). Our results confirm that they allow implementing processors capable of real-time video processing. We obtained an acceleration factor of 11 with the first design method, and the acceleration factor was further improved to 282 with the second method.

    The Design, Fabrication, and Flight Testing of an Academic Research Platform for High Resolution Terrain Imaging

    This thesis addresses the design, construction, and flight testing of an Unmanned Aircraft System (UAS) created to serve as a testbed for Intelligence, Surveillance, and Reconnaissance (ISR) research topics that require the rapid acquisition and processing of high-resolution aerial imagery and that are to be performed by academic research institutions. An analysis of the requirements of various ISR research applications and of the practical limitations of academic research yields a consolidated set of requirements by which the UAS is designed. An iterative design process is used to transition from these requirements through cycles of component selection, systems integration, flight tests, diagnostics, and subsystem redesign. The resulting UAS is designed as an academic research platform to support a variety of ISR research applications, ranging from human-machine interaction with UAS technology to orthorectified mosaic imaging. The lessons learned are provided to enable future researchers to create similar systems.

    Novel source coding methods for optimising real-time video codecs.

    The quality of the decoded video is affected by errors occurring in the various layers of the protocol stack. In this thesis, disjoint errors occurring in different layers of the protocol stack are investigated, with the primary objective of demonstrating the flexibility of the source coding layer. In the first part of the thesis, the errors occurring in the editing layer, due to the coexistence of different video standards in the broadcast market, are addressed. The problems investigated are ‘Field Reversal’ and ‘Mixed Pulldown’. Field Reversal is caused when interlaced video fields are not shown in the order in which they were captured, resulting in a shaky video display as the fields are not displayed chronologically. Mixed Pulldown occurs when the video frame rate is up-sampled and down-sampled as digitised film material is standardised to suit standard televisions. Novel image processing algorithms are proposed to solve these problems from the source coding layer. In the second part of the thesis, the errors occurring in the transmission layer due to data corruption are addressed. The use of block-level source error-resilient methods instead of bit-level channel coding methods is investigated and improvements are suggested. The secondary objective of the thesis is to optimise the proposed algorithms' architecture for real-time implementation, since the problems are of a commercial nature. The Field Reversal and Mixed Pulldown algorithms were tested in real time at MTV (Music Television) and are made available commercially through ‘Cerify’, a Linux-based media testing box manufactured by Tektronix Plc. The channel error-resilient algorithms were tested in a laboratory environment using Matlab, and performance improvements were obtained.
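    For the Field Reversal problem, one plausible source-layer check (a sketch under assumptions, not the thesis's actual algorithm) weaves consecutive field pairs under both order hypotheses and votes for the pairing that produces less inter-line combing:

```python
import numpy as np

def weave(top_field, bottom_field):
    """Interleave two fields (each H/2 x W) into a full H x W frame."""
    h, w = top_field.shape
    frame = np.empty((2 * h, w), dtype=np.float32)
    frame[0::2] = top_field
    frame[1::2] = bottom_field
    return frame

def combing_energy(frame):
    """Sum of absolute inter-line differences; mismatched fields inflate it."""
    return float(np.abs(np.diff(frame, axis=0)).sum())

def fields_look_reversed(top_fields, bottom_fields):
    """Vote across the clip: True if the swapped pairing combs less.

    Only discriminative on moving content; a production detector would
    also smooth the per-frame votes over time.
    """
    votes = 0
    for i in range(len(top_fields) - 1):
        as_captured = combing_energy(weave(top_fields[i], bottom_fields[i]))
        as_swapped = combing_energy(weave(top_fields[i + 1], bottom_fields[i]))
        votes += 1 if as_swapped < as_captured else -1
    return votes > 0
```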

    On the design of multimedia architectures : proceedings of a one-day workshop, Eindhoven, December 18, 2003

    Multimodal feature extraction and fusion for audio-visual speech recognition

    Multimodal signal processing analyzes a physical phenomenon through several types of measures, or modalities. This leads to the extraction of higher-quality and more reliable information than that obtained from single-modality signals. The advantage is two-fold. First, as the modalities are usually complementary, the end result of multimodal processing is more informative than that of each modality taken individually. This is true in all application domains: human-machine interaction, multimodal identification, or multimodal image processing. Second, as modalities are not always reliable, it is possible, when one modality becomes corrupted, to extract the missing information from another one. There are two essential challenges in multimodal signal processing. First, the features used from each modality need to be as relevant and as few as possible. Because multimodal systems have to process more than one modality, they run into errors caused by the curse of dimensionality much more easily than mono-modal ones. The curse of dimensionality refers to the fact that the number of equally distributed samples required to cover a region of space grows exponentially with the dimensionality of that space. This has important implications for classification, since accurate models can only be obtained if an adequate number of samples is available, and the required number of samples grows with the dimensionality of the features. Dimensionality reduction is thus a necessary step in any application dealing with complex signals, and it is achieved through selection, transforms, or a combination of the two. The second essential challenge is multimodal integration. Since the signals involved do not necessarily have the same data rate, range, or even dimensionality, combining information coming from such different sources is not straightforward. This can be done at different levels, from the basic signal level, by combining the signals themselves if they are compatible, up to the highest decision level, where only the individual decisions taken based on the signals are combined. Ideally, the fusion method should allow temporal variations in the relative importance of the two streams, to account for possible changes in their quality; however, this can only be done with methods operating at a high decision level. The aim of this thesis is to offer solutions to both of these challenges, in the context of audio-visual speech recognition and speaker localization, two applications from the field of human-machine interaction. Audio-visual speech recognition aims to improve the accuracy of speech recognizers by augmenting the audio with information extracted from the video, more particularly the movement of the speaker's lips. This works especially well when the audio is corrupted, leading in that case to significant gains in accuracy. Speaker localization means detecting who is the active speaker in an audio-video sequence containing several persons, which is useful for videoconferencing and the automated annotation of meetings. These two applications are the context in which we present our solutions to both feature selection and multimodal integration. First, we show how informative features can be extracted from the visual modality, using an information-theoretic framework which gives us a quantitative measure of the relevance of individual features.
We also prove that reducing redundancy between these features is important for avoiding the curse of dimensionality and improving recognition results. The methods that we present are novel in the field of audio-visual speech recognition, and we found that their use leads to significant improvements compared to the state of the art. Second, we present a method of multimodal fusion at the level of intermediate decisions, using a weight for each of the streams. The weights are adaptive, changing according to the estimated reliability of each stream. This makes the system tolerant to changes in the quality of either stream, and even to the temporary interruption of one of the streams. The reliability estimate is based on the entropy of the posterior probability distributions of each stream at the intermediate decision level. Our results are superior to those obtained with a state-of-the-art method based on maximizing the same posteriors. Moreover, we analyze the effect of a constraint typically imposed on stream weights in the literature, namely that they should sum to one. Our results show that removing this constraint can lead to improvements in recognition accuracy. Finally, we develop a method for audio-visual speaker localization, based on the correlation between the audio energy and the movement of the speaker's lips. Our method is based on a joint probability model of the audio and video, which is used to build a likelihood map showing the likely positions of the speaker's mouth. We show that our novel method performs better than a similar method from the literature. In conclusion, we analyze two different challenges of multimodal signal processing for two audio-visual problems, and offer innovative approaches to solving them.
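    As a minimal sketch of the entropy-based stream weighting described above (the mapping from entropy to weight and all names are assumptions; the thesis's exact formulation may differ), reliable low-entropy posteriors receive large weights, and the weights are deliberately not forced to sum to one:

```python
import numpy as np

def entropy(p, eps=1e-12):
    """Shannon entropy of a discrete posterior distribution."""
    p = np.clip(p, eps, 1.0)
    return float(-(p * np.log(p)).sum())

def fuse_streams(audio_post, video_post, eps=1e-12):
    """Combine per-class posteriors from two streams with adaptive weights.

    A peaked (low-entropy) posterior is treated as reliable and receives a
    weight near 1; a flat one receives a weight near 0. The weights are not
    normalised to sum to one, echoing the finding that removing that
    constraint can help.
    """
    h_max = np.log(len(audio_post))           # entropy of a uniform posterior
    w_a = 1.0 - entropy(audio_post) / h_max
    w_v = 1.0 - entropy(video_post) / h_max

    fused = (w_a * np.log(np.clip(audio_post, eps, 1.0))
             + w_v * np.log(np.clip(video_post, eps, 1.0)))
    return int(np.argmax(fused)), (w_a, w_v)
```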

    Enhancing a Neurosurgical Imaging System with a PC-based Video Processing Solution

    This work presents a PC-based prototype video processing application developed to be used with a specific neurosurgical imaging device, the OPMI® Pentero™ operating microscope, in the Department of Neurosurgery of Helsinki University Central Hospital at Töölö, Helsinki. The motivation for implementing the software was the lack of some clinically important features in the imaging system provided by the microscope. The imaging system is used as an online diagnostic aid during surgery. The microscope has two internal video cameras: one for regular white-light imaging and one for near-infrared fluorescence imaging, used for indocyanine green videoangiography. The footage of the microscope’s current imaging mode is accessed via the composite auxiliary output of the device. The microscope also has an external high-resolution white-light video camera, accessed via a composite output of a separate video hub. The PC was chosen as the video processing platform for its unparalleled combination of prototyping and high-throughput video processing capabilities. A thorough analysis of the platform and of efficient video processing methods was conducted in the thesis, and the results were used in the design of the imaging station. The features found feasible during the project were incorporated into a video processing application running on the GNU/Linux distribution Ubuntu. The clinical usefulness of the implemented features was ensured beforehand by consulting the neurosurgeons using the original system. The most significant shortcomings of the original imaging system were mended in this work. The key features of the developed application include live streaming, simultaneous streaming and recording, and playback of up to two video streams. The playback mode provides full media player controls, with frame-by-frame precision rewinding, in an intuitive and responsive interface. A single view and a side-by-side comparison mode are provided for the streams. The former gives more detail, while the latter can be used, for example, for before-after and anatomic-angiographic comparisons.
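    As a rough illustration of such a side-by-side comparison mode (a sketch only; the device indices, sizing, and the use of OpenCV are assumptions, not the thesis's implementation), two captured streams can be composed like this:

```python
import cv2
import numpy as np

# Hypothetical capture sources standing in for the two microscope feeds.
white_light = cv2.VideoCapture(0)
fluorescence = cv2.VideoCapture(1)

while True:
    ok_a, frame_a = white_light.read()
    ok_b, frame_b = fluorescence.read()
    if not (ok_a and ok_b):
        break
    # Match heights before horizontal concatenation.
    h = min(frame_a.shape[0], frame_b.shape[0])
    resize = lambda f: cv2.resize(f, (f.shape[1] * h // f.shape[0], h))
    cv2.imshow("comparison", np.hstack([resize(frame_a), resize(frame_b)]))
    if cv2.waitKey(1) & 0xFF == ord('q'):   # quit on 'q'
        break

white_light.release()
fluorescence.release()
cv2.destroyAllWindows()
```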