
    Segmentation-based mesh design for motion estimation

    In most standard video codecs, motion estimation between two frames is generally performed with the block matching algorithm (BMA). BMA represents the evolution of image content by decomposing each frame into 2-D blocks in translational motion. This prediction scheme usually leads to severe blocking artifacts when motion is large. Moreover, the systematic decomposition into regular blocks takes no account of image content, and some block parameters that carry no useful information must still be transmitted, which increases the bit rate. To overcome these shortcomings of BMA, we consider the two most important objectives in video coding: obtaining good quality on the one hand and transmitting at a very low bit rate on the other. To combine these two almost contradictory requirements, a motion compensation technique is needed that yields, as a transformation, good subjective quality and requires only the motion information to be transmitted. This thesis proposes a motion compensation scheme that designs 2-D triangular meshes from a segmentation of the image. The mesh is built from nodes spread irregularly along the image contours, so the resulting decomposition is content based. Furthermore, since the same node selection method is applied at the encoder and at the decoder, the only information that must be transmitted is the motion vectors of the nodes, and a very low bit rate can thus be achieved. Compared with BMA, our approach improves both subjective and objective quality with much less motion information. Chapter 1 introduces the project. Chapter 2 analyses the main compression techniques used in standard codecs, in particular the popular BMA and its shortcomings. Chapter 3 discusses in detail the proposed algorithm, called segmentation-based active mesh design. Chapter 4 describes motion estimation and compensation. Finally, Chapter 5 presents the simulation results and the conclusion.
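    As a rough illustration of the idea, the sketch below builds a content-based triangular mesh in Python: edge detection stands in for the thesis's segmentation step, nodes are subsampled along the detected contours, and a Delaunay triangulation produces the mesh. The libraries (OpenCV, SciPy), the Canny thresholds, and the node-spacing rule are illustrative assumptions, not the author's exact method.

```python
# A minimal sketch of segmentation-based mesh design, assuming OpenCV and SciPy.
# Edge detection stands in for a full segmentation, and Delaunay triangulation
# stands in for the thesis's mesh construction; all thresholds are illustrative.
import cv2
import numpy as np
from scipy.spatial import Delaunay

def build_content_based_mesh(frame_gray, node_spacing=8):
    """frame_gray: 8-bit grayscale frame. Returns (nodes, triangles)."""
    # Locate contours with an edge detector (a stand-in for segmentation).
    edges = cv2.Canny(frame_gray, 50, 150)
    ys, xs = np.nonzero(edges)
    contour_pts = np.stack([xs, ys], axis=1)

    # Subsample edge pixels so nodes are spread irregularly along contours
    # while keeping a minimum spacing between them.
    nodes = []
    for p in contour_pts:
        if all(np.hypot(*(p - q)) >= node_spacing for q in nodes):
            nodes.append(p)

    # Add the frame corners so the mesh covers the whole image.
    h, w = frame_gray.shape
    nodes.extend([[0, 0], [w - 1, 0], [0, h - 1], [w - 1, h - 1]])
    nodes = np.asarray(nodes, dtype=np.float32)

    # Triangulate the nodes to obtain the 2-D mesh used for motion compensation.
    return nodes, Delaunay(nodes).simplices
```

    Because the same node-selection procedure can be run at both encoder and decoder, only the per-node motion vectors would need to be transmitted, which is the source of the bit-rate saving described above.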

    Variable Block Size Motion Compensation In The Redundant Wavelet Domain

    Video is one of the most powerful forms of multimedia because of the extensive information it delivers. Video sequences are highly correlated both temporally and spatially, a fact which makes the compression of video possible. Modern video systems employ motion estimation and motion compensation (ME/MC) to de-correlate a video sequence temporally. ME/MC forms a prediction of the current frame using frames that have already been encoded; consequently, only the corresponding residual image needs to be transmitted instead of the original frame, together with a set of motion vectors that describe the scene motion observed at the encoder. The redundant wavelet transform (RDWT) provides several advantages over the conventional discrete wavelet transform (DWT): it overcomes the shift variance of the DWT, retains all the phase information of the wavelet coefficients, and provides multiple prediction possibilities for ME/MC in the wavelet domain. The general idea of variable size block motion compensation (VSBMC) is to partition a frame so that regions with uniform translational motion are covered by larger blocks while regions with complicated motion are covered by smaller blocks, leading to an adaptive distribution of motion vectors (MV) across the frame. This research proposes new adaptive partitioning schemes and decision criteria in the RDWT domain that exploit the motion content of a frame more effectively through various block sizes. It also proposes a selective subpixel accuracy algorithm for the motion vectors using a multiband approach, which reduces the computation of the conventional subpixel algorithm while maintaining the same accuracy. In addition, overlapped block motion compensation (OBMC) is used to reduce blocking artifacts. Finally, the research extends the proposed VSBMC to 3D video sequences. The experimental results show that VSBMC in the RDWT domain can be a powerful tool for video compression.
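    The quadtree-style splitting that motivates VSBMC can be sketched in a few lines of Python. This is a pixel-domain toy with a plain full search and an arbitrary SAD threshold; the RDWT-domain decision criteria and the multiband subpixel refinement described above are not reproduced.

```python
# A minimal sketch of variable-size block motion compensation via quadtree
# splitting, assuming NumPy. Block sizes, the SAD threshold, and the plain
# full search are illustrative choices only.
import numpy as np

def sad(a, b):
    return np.abs(a.astype(np.int32) - b.astype(np.int32)).sum()

def best_match(cur, ref, x, y, size, rng=7):
    """Full-search block matching; returns (best SAD, motion vector)."""
    block = cur[y:y+size, x:x+size]
    best = (np.inf, (0, 0))
    for dy in range(-rng, rng + 1):
        for dx in range(-rng, rng + 1):
            ry, rx = y + dy, x + dx
            if 0 <= ry and 0 <= rx and ry+size <= ref.shape[0] and rx+size <= ref.shape[1]:
                cost = sad(block, ref[ry:ry+size, rx:rx+size])
                if cost < best[0]:
                    best = (cost, (dx, dy))
    return best

def vsbmc(cur, ref, x, y, size, thresh=4.0, min_size=4):
    """Split a block into four while its per-pixel SAD exceeds the threshold."""
    cost, mv = best_match(cur, ref, x, y, size)
    if cost / (size * size) <= thresh or size <= min_size:
        return [(x, y, size, mv)]          # uniform motion: keep the large block
    half = size // 2
    out = []
    for oy in (0, half):                   # complicated motion: recurse on quadrants
        for ox in (0, half):
            out += vsbmc(cur, ref, x + ox, y + oy, half, thresh, min_size)
    return out
```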

    Motion compensation and very low bit rate video coding

    Recently, many activities of the International Telecommunication Union (ITU) and the International Organization for Standardization (ISO) have been leading to new standards for very low bit-rate video coding, such as H.263 and MPEG-4, after the successful application of the international standards H.261 and MPEG-1/2 for video coding above 64 kbps. At very low bit rates, however, the classic block-matching-based DCT video coding scheme suffers seriously from blocking artifacts which considerably degrade the quality of the reconstructed frames. To solve this problem, a new technique in which motion compensation is based on a dense motion field is presented in this dissertation, and four efficient video coding algorithms based on it are proposed. (1) After studying model-based video coding algorithms, we propose an optical-flow-based video coding algorithm with thresholding techniques. A statistical model is established for the distribution of intensity differences between two successive frames, and four thresholds are used to control the bit rate and the quality of the reconstructed frames. It outperforms typical model-based techniques in terms of complexity and reconstruction quality. (2) An efficient algorithm using DCT-coded optical flow. Dense motion fields are found to be well modeled by a first-order autoregressive model and can be efficiently compressed with the DCT, achieving a very low bit rate and higher visual quality than H.263/TMN5. (3) A region-based discrete wavelet transform video coding algorithm. This algorithm uses a dense motion field; regions are segmented according to their significance, the DWT is applied to the residual image region by region, and bits are adaptively allocated to regions. It improves the visual quality and PSNR of significant regions while maintaining a low bit rate. (4) A segmentation-based video coding algorithm for stereo sequences. A correlation-feedback algorithm with a Kalman filter is used to improve the accuracy of the optical flow fields. Three criteria, associated with 3-D information, 2-D connectivity and motion vector fields respectively, are defined for object segmentation, and a chain code is used to encode the shapes of the segmented objects. It can achieve very high compression ratios, up to several thousand.
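    To make the idea behind the second algorithm concrete, the sketch below block-DCT-codes one component of a dense motion field and reconstructs it. The 8x8 block size and the uniform quantization step are illustrative assumptions; the dissertation's actual rate control and entropy coding are not reproduced.

```python
# A minimal sketch of compressing a dense motion field with a block DCT,
# assuming SciPy. Block size and quantization step are illustrative.
import numpy as np
from scipy.fftpack import dctn, idctn

def code_motion_component(field, block=8, qstep=2.0):
    """Quantize the DCT of each block of one motion-vector component."""
    h, w = field.shape
    rec = np.zeros_like(field, dtype=np.float64)
    for y in range(0, h - h % block, block):
        for x in range(0, w - w % block, block):
            coeffs = dctn(field[y:y+block, x:x+block], norm='ortho')
            q = np.round(coeffs / qstep)               # what would be entropy coded
            rec[y:y+block, x:x+block] = idctn(q * qstep, norm='ortho')
    return rec

# Usage: reconstruct a toy smooth (AR(1)-like) horizontal flow component.
u = np.cumsum(np.random.randn(64, 64) * 0.1, axis=1)
u_rec = code_motion_component(u)
print("mean absolute reconstruction error:", np.abs(u - u_rec).mean())
```

    Because a smooth, strongly correlated motion field concentrates its energy in a few low-frequency DCT coefficients, it can be represented with far fewer bits than the raw per-pixel vectors, which is what allows the very low rates reported above.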

    Fast search algorithms for digital video coding

    PhD Thesis. Motion estimation is one of the important issues in video coding standards such as ISO MPEG-1/2 and ITU-T H.263. These international standards typically use a conventional Full Search (FS) algorithm to estimate the motion of pixels between pairs of image blocks. Since the FS method requires intensive computation and the distortion function must be evaluated many times for each target block, the process is very time consuming. To alleviate this problem, two new search algorithms, the Orthogonal Logarithmic Search (OLS) and the Diagonal Logarithmic Search (DLS), have been designed and implemented. Their performance is evaluated using standard 176x144-pixel quarter common intermediate format (QCIF) benchmark video sequences, and the results are compared with the traditional FS algorithm and a widely used fast search algorithm, the Three Step Search (3SS). Fast search algorithms are known as sub-optimal algorithms: they test only some of the candidate blocks in the search area and choose a match from that subset. They reduce computational complexity because they do not examine all candidate blocks and are therefore algorithmically faster; their quality is generally not as good as that of the FS algorithm but can be acceptable in terms of subjective quality. Two key metrics, time and peak signal-to-noise ratio (PSNR), are used to evaluate the new algorithms. The results show that the strength of the algorithms lies in their speed, as they are much faster than the FS and the 3SS: speed is improved by 85.37% and 22% over the FS and 3SS respectively for the OLS, and by 88.77% and 40% for the DLS. Furthermore, the prediction accuracy of the OLS and DLS is comparable to that of the 3SS. Thepsatri Rajabhat University: Royal Thai Government.
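    For reference, the widely used Three Step Search against which OLS and DLS are compared can be sketched in a few lines of NumPy. The OLS and DLS search patterns proposed in the thesis are not reproduced here, and the 16x16 block size and starting step of 4 are just the common defaults.

```python
# A minimal sketch of the classic Three Step Search (3SS) for block matching.
import numpy as np

def sad(a, b):
    return np.abs(a.astype(np.int32) - b.astype(np.int32)).sum()

def three_step_search(cur, ref, x, y, size=16, step=4):
    """Return the (dx, dy) motion vector for the block at (x, y) in `cur`."""
    block = cur[y:y+size, x:x+size]
    cx, cy = x, y                             # current centre of the search
    while step >= 1:
        best = (sad(block, ref[cy:cy+size, cx:cx+size]), cx, cy)
        for dy in (-step, 0, step):           # evaluate the 3x3 pattern around the centre
            for dx in (-step, 0, step):
                rx, ry = cx + dx, cy + dy
                if 0 <= rx <= ref.shape[1]-size and 0 <= ry <= ref.shape[0]-size:
                    cost = sad(block, ref[ry:ry+size, rx:rx+size])
                    if cost < best[0]:
                        best = (cost, rx, ry)
        _, cx, cy = best                      # recentre on the best candidate
        step //= 2                            # halve the step: 4 -> 2 -> 1
    return cx - x, cy - y
```

    With an initial step of 4, only three rounds of nine candidates each are evaluated per block, which is why such sub-optimal searches are so much faster than an exhaustive full search.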

    State of the art 3D technologies and MVV end to end system design

    The subject of this thesis is the analysis and review of all 3D technologies, both existing and under development for home environments, taking multiview video (MVV) technologies as the point of reference. All sections of the chain, from capture to display, are analysed. The goal is to design a possible satellite architecture for a future MVV television system under two possible scenarios, broadcast or interactive. The analysis covers technical considerations as well as commercial constraints.

    Enhanced life-size holographic telepresence framework with real-time three-dimensional reconstruction for dynamic scene

    Three-dimensional (3D) reconstruction captures and reproduces a 3D representation of a real object or scene, and 3D telepresence allows a user to feel the presence of a remote user transferred in digital form. Holographic displays are one alternative that removes the restriction of wearable hardware; they use light diffraction to present 3D images to viewers. However, capturing a life-size or full-body human in real time is still challenging because it involves a dynamic scene: the object to be reconstructed is constantly moving, changes shape, and requires multiple capture views. The volume of life-size data multiplies rapidly as more depth cameras are used, which leads to high computation time, especially for dynamic scenes, and transferring high-volume 3D images over a network in real time can also cause lag and latency. Hence, the aim of this research is to enhance a life-size holographic telepresence framework with real-time 3D reconstruction for dynamic scenes. Three stages are carried out. In the first stage, real-time 3D reconstruction with the Marching Squares algorithm is combined with the acquisition of dynamic scenes captured by a life-size setup of multiple Red Green Blue-Depth (RGB-D) cameras. The second stage transmits the data acquired from the multiple RGB-D cameras in real time and applies double compression for the life-size holographic telepresence. The third stage evaluates the life-size holographic telepresence framework integrated with the real-time 3D reconstruction of dynamic scenes. The findings show that enhancing the framework with real-time 3D reconstruction reduces computation time and improves the 3D representation of the remote user in a dynamic scene. With double compression, the life-size 3D representation remains smooth, and the delay and latency during frame synchronization in remote communication are minimized.
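    The Marching Squares step mentioned in the first stage can be illustrated with a small Python sketch that extracts iso-contour segments from a single 2-D slice (for example, a thresholded depth map). Edge midpoints are used instead of interpolated crossings to keep it short; the rest of the pipeline (multi-camera acquisition, double compression, holographic display) is not shown.

```python
# A minimal Marching Squares sketch, assuming NumPy: extract line segments
# approximating the iso-contour of a 2-D scalar field.
import numpy as np

# For each of the 16 corner configurations, the cell edges crossed by the contour.
SEGMENTS = {
    1: [("left", "top")], 2: [("top", "right")], 3: [("left", "right")],
    4: [("right", "bottom")], 5: [("left", "top"), ("right", "bottom")],
    6: [("top", "bottom")], 7: [("left", "bottom")], 8: [("bottom", "left")],
    9: [("top", "bottom")], 10: [("top", "right"), ("bottom", "left")],
    11: [("right", "bottom")], 12: [("left", "right")],
    13: [("top", "right")], 14: [("left", "top")],
}

def marching_squares(field, iso):
    """Return a list of segments ((x1, y1), (x2, y2)) for the iso-contour of `field`."""
    inside = field >= iso
    segs = []
    for y in range(field.shape[0] - 1):
        for x in range(field.shape[1] - 1):
            idx = (int(inside[y, x])              # top-left     -> bit 0
                   | int(inside[y, x+1]) << 1     # top-right    -> bit 1
                   | int(inside[y+1, x+1]) << 2   # bottom-right -> bit 2
                   | int(inside[y+1, x]) << 3)    # bottom-left  -> bit 3
            mid = {"top": (x + 0.5, y), "right": (x + 1, y + 0.5),
                   "bottom": (x + 0.5, y + 1), "left": (x, y + 0.5)}
            for a, b in SEGMENTS.get(idx, []):    # cases 0 and 15 produce nothing
                segs.append((mid[a], mid[b]))
    return segs
```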

    Adapting Single-View View Synthesis with Multiplane Images for 3D Video Chat

    Activities like one-on-one video chatting and video conferencing with multiple participants are more prevalent than ever today as we continue to tackle the pandemic. Bringing a 3D feel to video chat has long been a hot topic in the Vision and Graphics communities. In this thesis, we employ novel view synthesis to turn one-on-one video chatting into 3D. We tune the learning pipeline of Tucker and Snavely's single-view view synthesis paper, retraining it on the MannequinChallenge dataset, to better predict a layered representation of the scene viewed by either video chat participant at any given time. This intermediate representation of the local light field, called a Multiplane Image (MPI), can then be used to re-render the scene at an arbitrary viewpoint which, in our case, matches the head pose of the watcher in the opposite, concurrent video frame. We discuss how our pipeline, when implemented in real time, would allow both video chat participants to reveal occluded scene content and peer into each other's dynamic video scenes to a certain extent. It would enable full parallax up to the baselines of small head rotations and/or translations, similar to a VR headset's ability to determine the position and orientation of the wearer's head in 3D space and render any scene in alignment with this estimated head pose. We have attempted to improve the performance of the retrained model by extending MannequinChallenge with the much larger RealEstate10K dataset. We present a quantitative and qualitative comparison of the model variants and describe our impactful dataset curation process, among other aspects.
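    The core of rendering a novel view from an MPI is compositing its RGBA planes back to front with the standard over operator; a minimal NumPy sketch is below. The per-plane homography warp into the target viewpoint, which is where the estimated head pose would enter, is omitted, so this shows only the compositing step.

```python
# A minimal sketch of MPI rendering by back-to-front "over" compositing,
# assuming NumPy. Plane warping for the novel camera pose is omitted.
import numpy as np

def composite_mpi(planes):
    """planes: list of (H, W, 4) RGBA arrays ordered from farthest to nearest."""
    h, w, _ = planes[0].shape
    out = np.zeros((h, w, 3), dtype=np.float64)
    for plane in planes:                          # back to front
        rgb, alpha = plane[..., :3], plane[..., 3:4]
        out = rgb * alpha + out * (1.0 - alpha)   # "over" operator
    return out

# Usage: a half-transparent near plane blended over an opaque far plane.
far = np.zeros((4, 4, 4)); far[..., 2] = 1.0; far[..., 3] = 1.0     # opaque blue
near = np.zeros((4, 4, 4)); near[..., 0] = 1.0; near[..., 3] = 0.5  # half-transparent red
print(composite_mpi([far, near])[0, 0])           # -> [0.5, 0. , 0.5]
```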

    Efficient real-time video delivery in vehicular networks

    Vehicular Ad-hoc Networks (VANET) are a special type of network in which the nodes involved in the communication are vehicles. VANETs are created when several vehicles connect among themselves without the use of any infrastructure. In certain situations the absence of infrastructure is an advantage, but it also creates several challenges that must be overcome. One of the main problems related to the absence of infrastructure is the lack of a coordinator that can ensure a certain level of quality to enable the correct transmission of video and audio. Video transmission can be extremely useful in this type of network, as it can be used for videoconferencing or by traffic authorities to monitor the scene of an accident. In this thesis we focus on real-time video transmission, providing solutions for both unicast and multicast environments. Specifically, we built a real-world testbed and compared its results with simulation results to validate the behavior of the simulation models. Using that testbed we implemented and improved DACME, an admission control module able to provide Quality of Service (QoS) to unicast video transmissions; DACME proved to be a valid solution for obtaining a certain level of QoS in multi-hop environments. Concerning multicast video transmission, we developed and simulated several flooding schemes designed specifically for VANET environments. In this scope, the main contribution of this thesis is the Automatic Copies Distance Based (ACDB) flooding scheme. Thanks to its use of the perceived vehicular density, ACDB is a zeroconf scheme able to achieve good video quality in both urban and highway environments, being especially effective in highway environments.
    Torres Cortés, Á. (2016). Efficient real-time video delivery in vehicular networks [Unpublished doctoral thesis]. Universitat Politècnica de València. https://doi.org/10.4995/Thesis/10251/62685
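    The abstract does not spell out ACDB's forwarding rule, so the following is only a generic sketch of the kind of distance- and density-aware rebroadcast decision that VANET flooding schemes build on; every threshold and the waiting-time formula here are hypothetical and are not ACDB's actual algorithm.

```python
# A generic, hypothetical sketch of a distance-based rebroadcast decision for
# VANET flooding; not ACDB itself. All thresholds are illustrative.
import math
import random

def should_rebroadcast(my_pos, sender_pos, neighbour_count,
                       radio_range=250.0, dense_threshold=20):
    """Decide whether to forward a packet and how long to wait before doing so."""
    dx, dy = my_pos[0] - sender_pos[0], my_pos[1] - sender_pos[1]
    distance = math.hypot(dx, dy)

    # In dense traffic only vehicles near the edge of the sender's range forward,
    # limiting redundant copies; in sparse traffic, forward more aggressively.
    min_fraction = 0.7 if neighbour_count >= dense_threshold else 0.3
    if distance < min_fraction * radio_range:
        return False, None

    # Farther vehicles wait less, so they usually forward first and suppress others.
    wait_s = 0.05 * (1.0 - distance / radio_range) + random.uniform(0.0, 0.005)
    return True, max(wait_s, 0.0)

# Usage: a vehicle 200 m from the sender among 25 neighbours forwards quickly.
print(should_rebroadcast((200.0, 0.0), (0.0, 0.0), neighbour_count=25))
```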

    Recent Advances in Signal Processing

    The signal processing task is a very critical issue in the majority of new technological inventions and challenges in a variety of applications in both science and engineering fields. Classical signal processing techniques have largely worked with mathematical models that are linear, local, stationary, and Gaussian. They have always favored closed-form tractability over real-world accuracy. These constraints were imposed by the lack of powerful computing tools. During the last few decades, signal processing theories, developments, and applications have matured rapidly and now include tools from many areas of mathematics, computer science, physics, and engineering. This book is targeted primarily toward both students and researchers who want to be exposed to a wide variety of signal processing techniques and algorithms. It includes 27 chapters that can be categorized into five different areas depending on the application at hand. These five categories are ordered to address image processing, speech processing, communication systems, time-series analysis, and educational packages respectively. The book has the advantage of providing a collection of applications that are completely independent and self-contained; thus, the interested reader can choose any chapter and skip to another without losing continuity