
    Construction of super-resolution mosaics from low-resolution video. Application to video summarization and transmission error concealment.

    The digitization of existing video and the explosive growth of multimedia services over networks, such as digital television broadcasting and mobile communications, have produced an enormous quantity of compressed video. This calls for efficient indexing and browsing tools, but indexing before encoding is not customary. The usual approach is to fully decode these videos and then build indexes, which is very costly and therefore not feasible in real time. Moreover, important information such as motion, which is lost during decoding, is re-estimated even though it is already present in the compressed stream. The goal of this thesis is therefore to reuse the data already present in the MPEG compressed stream for indexing and fast browsing. More precisely, we extract DC coefficients and motion vectors. In this thesis we focus in particular on building mosaics from the DC images extracted from I-frames. A mosaic is built by registering and blending all the frames of a video sequence into a single coordinate system, which is generally aligned with one of the frames of the sequence: the reference frame. The result is a single image that gives a global view of the sequence. We propose a complete system for building mosaics from MPEG-1/2 streams that handles various problems arising in real video sequences, such as moving objects and lighting changes. An essential task in mosaic construction is estimating the motion between each frame of the sequence and the reference frame. Our method relies on a robust estimation of the global camera motion from the motion vectors of P-frames.
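The robust global camera motion step described above can be sketched as follows. This is a minimal illustration and not the thesis's estimator. It assumes a six-parameter affine model (vx = a0 + a1·x + a2·y, vy = b0 + b1·x + b2·y) fitted to macroblock-centre motion vectors. As a stand-in robust estimator it uses RANSAC-style consensus over random triples of vectors. The names `fit_affine` and `estimate_global_motion`, the 0.5-pixel inlier tolerance, and the iteration count are hypothetical. Vectors belonging to independently moving objects fall outside the consensus set and are excluded from the final least-squares refit.

```python
import random
from typing import List, Optional, Sequence, Tuple

# One sample: macroblock centre (x, y) and its decoded motion vector (vx, vy).
Sample = Tuple[float, float, float, float]
Model = Tuple[List[float], List[float]]  # (a0, a1, a2), (b0, b1, b2)

def _det3(m):
    return (m[0][0] * (m[1][1] * m[2][2] - m[1][2] * m[2][1])
            - m[0][1] * (m[1][0] * m[2][2] - m[1][2] * m[2][0])
            + m[0][2] * (m[1][0] * m[2][1] - m[1][1] * m[2][0]))

def _solve3(m, rhs) -> Optional[List[float]]:
    """Solve a 3x3 linear system by Cramer's rule; None if (near) singular."""
    d = _det3(m)
    if abs(d) < 1e-9:
        return None
    solution = []
    for col in range(3):
        mc = [row[:] for row in m]
        for r in range(3):
            mc[r][col] = rhs[r]
        solution.append(_det3(mc) / d)
    return solution

def fit_affine(samples: Sequence[Sample]) -> Optional[Model]:
    """Least-squares affine fit via the normal equations (exact for 3 points)."""
    m = [[0.0] * 3 for _ in range(3)]
    rx, ry = [0.0] * 3, [0.0] * 3
    for x, y, vx, vy in samples:
        row = (1.0, x, y)
        for i in range(3):
            for j in range(3):
                m[i][j] += row[i] * row[j]
            rx[i] += row[i] * vx
            ry[i] += row[i] * vy
    a, b = _solve3(m, rx), _solve3(m, ry)
    return None if a is None or b is None else (a, b)

def _residual(model: Model, s: Sample) -> float:
    (a, b), (x, y, vx, vy) = model, s
    ex = vx - (a[0] + a[1] * x + a[2] * y)
    ey = vy - (b[0] + b[1] * x + b[2] * y)
    return (ex * ex + ey * ey) ** 0.5

def estimate_global_motion(samples: List[Sample], tol: float = 0.5,
                           iterations: int = 200, seed: int = 0):
    """Return (model, inliers); outlier vectors are treated as local object motion."""
    rng = random.Random(seed)
    best: List[Sample] = []
    for _ in range(iterations):
        model = fit_affine(rng.sample(samples, 3))
        if model is None:
            continue  # collinear triple, no unique affine model
        inliers = [s for s in samples if _residual(model, s) <= tol]
        if len(inliers) > len(best):
            best = inliers
    if len(best) < 3:
        return None, []
    return fit_affine(best), best  # refit on the consensus set
```

In a real frame with hundreds of macroblocks, intra-coded and skipped blocks carry no usable vector and would be dropped before fitting.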
However, the global camera motion estimated for a P-frame may be incorrect, because it depends heavily on the accuracy of the encoded vectors. We detect the affected P-frames by examining the DC coefficients of the associated encoded prediction error, and we propose two methods to correct these motions. A mosaic built from DC images has very low resolution and suffers from aliasing caused by the nature of DC images (each DC-image pixel corresponds to the mean of an 8x8 block). To increase its resolution and improve its visual quality, we apply a super-resolution method based on iterative back-projection. Super-resolution methods are likewise based on registering and blending the frames of a video sequence, but they are combined with image restoration. In this context, we developed a new method for estimating the blur caused by camera motion, together with a corresponding spectral restoration method. Spectral restoration handles blur globally, but local blur appears where objects move independently of the camera. We therefore propose a new super-resolution algorithm, derived from the iterative spatial restoration of Van Cittert and Jansson, that can restore local blur. Based on a segmentation of moving objects, we restore the background mosaic and the foreground objects separately, and we adapted our blur estimation method accordingly. First, we applied our method to building video summaries, aimed at fast mosaic-based browsing of compressed video.
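Van Cittert's iterative restoration, on which the local-blur algorithm above is based, repeatedly adds back the difference between the observed signal g and a re-blurred estimate: f[k+1] = f[k] + beta * (g - h * f[k]), starting from f[0] = g. The following is a minimal 1-D sketch, assuming a known, symmetric blur kernel h and circular boundaries. The thesis works on 2-D mosaics with estimated and possibly local blur. Jansson's variant replaces the constant beta with a relaxation function of the current estimate that vanishes outside the admissible intensity range; that refinement is omitted. The function names are hypothetical.

```python
from typing import List, Sequence

def circular_convolve(signal: Sequence[float], kernel: Sequence[float]) -> List[float]:
    """Centred 1-D convolution with an odd-length kernel and wrap-around borders."""
    n, half = len(signal), len(kernel) // 2
    return [sum(h * signal[(i + half - j) % n] for j, h in enumerate(kernel))
            for i in range(n)]

def van_cittert(observed: Sequence[float], kernel: Sequence[float],
                iterations: int = 20, beta: float = 1.0) -> List[float]:
    """Van Cittert deconvolution: f <- f + beta * (g - h * f)."""
    estimate = list(observed)  # f[0] = g
    for _ in range(iterations):
        reblurred = circular_convolve(estimate, kernel)
        estimate = [f + beta * (g - r) for f, g, r in zip(estimate, observed, reblurred)]
    return estimate
```

With a kernel whose frequency response lies in [0, 1] (such as [0.25, 0.5, 0.25]) and beta = 1, each iteration shrinks the error at every frequency the blur did not completely remove. Frequencies the kernel zeroes out cannot be recovered, which is one reason registration of several frames (super-resolution) helps.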
Next, we showed how intermediate results can be reused for other indexing tasks, notably shot-change detection on I-frames and characterization of camera motion. Finally, we explored transmission error recovery. Our approach is to build a mosaic while decoding a shot; if data is lost, the missing information can be concealed using this mosaic.
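In its simplest form, the mosaic-based concealment idea above copies co-located mosaic pixels into the damaged block. This sketch makes several assumptions. Alignment between the current frame and the mosaic is a pure integer-pel translation (`offset`), blocks are square, and the lost block lies entirely inside the mosaic. The thesis's registration model may be richer, and `conceal_block` is a hypothetical name.

```python
from typing import List, Tuple

Image = List[List[int]]  # row-major luminance samples

def conceal_block(frame: Image, mosaic: Image, lost_block: Tuple[int, int],
                  offset: Tuple[int, int], block: int = 16) -> Image:
    """Overwrite lost block (column, row) of `frame` with mosaic pixels.

    `offset` = (dx, dy) is the position of the frame's top-left pixel inside
    the mosaic, e.g. accumulated from the estimated global motion (assumed
    integer-pel here).
    """
    bx, by = lost_block
    dx, dy = offset
    for y in range(by * block, (by + 1) * block):
        for x in range(bx * block, (bx + 1) * block):
            frame[y][x] = mosaic[y + dy][x + dx]
    return frame
```

Because the mosaic only stores the background, this works best for static scenery. Lost blocks that covered a moving object would still need temporal or spatial concealment.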

    Image analysis using visual saliency with applications in hazmat sign detection and recognition

    Visual saliency is the perceptual process that makes attractive objects stand out from their surroundings in the low-level human visual system. It has been modeled as a preprocessing step of the human visual system that selects the important visual information in a scene. We investigate bottom-up visual saliency using spectral analysis approaches. We present separate and composite model families that generalize existing frequency-domain visual saliency models. We propose several frequency-domain visual saliency models that generate saliency maps using new spectrum processing methods and an entropy-based saliency map selection approach: a group of saliency map candidates is obtained by inverse transform, and the candidate with minimum entropy is selected as the final saliency map. The proposed models based on the separate and composite model families are also extended to various color spaces. We develop an evaluation tool for benchmarking visual saliency models. Experimental results show that the proposed models predict eye fixations more accurately and efficiently than most state-of-the-art visual saliency models.

    We use these visual saliency models to detect the location of hazardous material (hazmat) signs in complex scenes, and we develop a hazmat sign location detection and content recognition system based on visual saliency. Saliency maps are used to extract salient regions that are likely to contain hazmat sign candidates, and a Fourier descriptor based contour matching method then locates the border of hazmat signs in these regions. This saliency-based approach increases the accuracy of sign location detection, reduces the number of false positive objects, and speeds up the overall image analysis process. We also propose a color recognition method to interpret the color inside the detected hazmat sign.
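Fourier descriptor contour matching, as used above to locate sign borders, compares shapes through the spectrum of their boundary points treated as complex numbers. This minimal sketch assumes closed contours resampled to the same number of points. Discarding the DC term removes translation. Taking magnitudes removes rotation and the choice of starting point. Dividing by the largest magnitude removes scale. The helper names are hypothetical, and this is a textbook descriptor rather than the thesis's exact method.

```python
import cmath
import math
from typing import List, Sequence, Tuple

def sample_polygon(vertices: Sequence[Tuple[float, float]], n: int) -> List[complex]:
    """Place n points evenly by arc length along a closed polygon."""
    corners = [complex(x, y) for x, y in vertices]
    edges = [(a, b, abs(b - a)) for a, b in zip(corners, corners[1:] + corners[:1])]
    step = sum(length for _, _, length in edges) / n
    points, edge, edge_start = [], 0, 0.0
    for k in range(n):
        s = k * step
        while s > edge_start + edges[edge][2]:  # advance to the edge containing s
            edge_start += edges[edge][2]
            edge += 1
        a, b, length = edges[edge]
        points.append(a + (s - edge_start) / length * (b - a))
    return points

def fourier_descriptor(points: Sequence[complex]) -> List[float]:
    """Normalised DFT magnitudes of the boundary, skipping the u = 0 term."""
    n = len(points)
    spectrum = [sum(z * cmath.exp(-2j * math.pi * u * k / n) for k, z in enumerate(points))
                for u in range(1, n)]
    mags = [abs(c) for c in spectrum]
    largest = max(mags)
    return [m / largest for m in mags]

def contour_distance(p: Sequence[complex], q: Sequence[complex]) -> float:
    """Euclidean distance between descriptors; small means similar shapes."""
    return math.dist(fourier_descriptor(p), fourier_descriptor(q))
```

A diamond-shaped hazmat placard seen roughly head-on would match a square template under this distance regardless of its position, size, or in-plane rotation. Strong projective distortion changes the descriptor, so such a matcher needs some tolerance or a rectification step.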
Experimental results show that the proposed hazmat sign location detection method can detect and recognize hazmat signs that are projectively distorted, blurred, or shaded, at various distances.

    In other work, we investigate error concealment for scalable video coding (SVC). When SVC-compressed video is transmitted over loss-prone networks, the decoded video can suffer severe visual degradation across multiple frames. To improve visual quality, we propose an inter-layer error concealment method that uses motion vector averaging and slice interleaving to deal with burst packet losses and error propagation. Experimental results show that the proposed error concealment methods outperform two existing methods.
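Motion vector averaging for a lost block can be sketched as follows. The exact inter-layer rule is not given here. As an assumption, the sketch averages the available neighboring enhancement-layer vectors with the co-located base-layer vector, scaled by the spatial resolution ratio (2 for dyadic spatial scalability). It falls back to a zero vector when nothing is available. `conceal_motion_vector` is a hypothetical name.

```python
from typing import Optional, Sequence, Tuple

MotionVector = Tuple[float, float]

def conceal_motion_vector(neighbors: Sequence[Optional[MotionVector]],
                          base_layer_mv: Optional[MotionVector] = None,
                          scale: float = 2.0) -> MotionVector:
    """Average the available candidate vectors for a lost macroblock.

    `neighbors` holds the vectors of adjacent blocks, with None for blocks
    that were also lost; the base-layer vector is upscaled to the
    enhancement-layer resolution before averaging.
    """
    candidates = [mv for mv in neighbors if mv is not None]
    if base_layer_mv is not None:
        candidates.append((base_layer_mv[0] * scale, base_layer_mv[1] * scale))
    if not candidates:
        return (0.0, 0.0)  # nothing to average: assume no motion
    n = len(candidates)
    return (sum(mv[0] for mv in candidates) / n, sum(mv[1] for mv in candidates) / n)
```

The concealed block would then be motion-compensated from the reference frame using this vector. Slice interleaving makes it more likely that a block's neighbors survive a burst loss, so the candidate list is rarely empty.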

    Error Resilient Video Coding Using Bitstream Syntax And Iterative Microscopy Image Segmentation

    There has been a dramatic increase in the amount of video traffic over the Internet in the past several years. For applications such as real-time video streaming and video conferencing, retransmission of lost packets is often impractical because of latency constraints. Popular video coding standards such as H.26x and VPx exploit spatial-temporal correlation for compression, which typically makes compressed bitstreams vulnerable to errors. We propose several adaptive spatial-temporal error concealment approaches for subsampling-based multiple description video coding. These adaptive methods are based on motion and mode information extracted from the H.26x video bitstreams. We also present an error resilience method using data duplication in VPx video bitstreams.

    A recent challenge in image processing is the analysis of biomedical images acquired using optical microscopy. Because of the size and complexity of these images, automated segmentation methods are required to obtain quantitative, objective and reproducible measurements of biological entities. In this thesis, we present two techniques for microscopy image analysis. Our first method, “Jelly Filling”, is intended to provide 3D segmentation of biological images with incomplete dye labeling. Intuitively, this method fills disjoint regions of an image with jelly-like fluids to iteratively refine segments that represent separable biological entities. Our second method selectively uses a shape-based function optimization approach and a 2D marked point process simulation to quantify nuclei by their locations and sizes. Experimental results show that the proposed methods are effective in addressing these challenges.
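Subsampling-based multiple description coding, mentioned above, splits each frame into descriptions that can be decoded independently. When one description is lost, the missing samples must be concealed from what did arrive. This minimal sketch assumes two descriptions made of even and odd rows and a single motion magnitude per frame. The thesis instead adapts using per-block motion and mode information from the bitstream. Static content is concealed temporally by copying the co-located row of the previous frame. Moving content is concealed spatially by averaging the rows above and below. The function names and the threshold are hypothetical.

```python
from typing import List, Optional, Sequence, Tuple

Frame = List[List[float]]

def split_descriptions(frame: Frame) -> Tuple[Frame, Frame]:
    """Two descriptions by row subsampling: even rows and odd rows."""
    return frame[0::2], frame[1::2]

def reconstruct_from_even(even_rows: Sequence[Sequence[float]], height: int,
                          previous: Optional[Frame] = None, motion: float = 0.0,
                          threshold: float = 1.0) -> Frame:
    """Rebuild a frame when only the even-row description arrived."""
    frame: List[Optional[List[float]]] = [None] * height
    for i, row in enumerate(even_rows):
        frame[2 * i] = list(row)
    for r in range(1, height, 2):
        if previous is not None and motion < threshold:
            frame[r] = list(previous[r])  # temporal: static scene, copy co-located row
        else:
            above = frame[r - 1]
            below = frame[r + 1] if r + 1 < height else above  # replicate at bottom edge
            frame[r] = [(a + b) / 2 for a, b in zip(above, below)]  # spatial interpolation
    return frame
```

The adaptive choice matters because temporal copying is nearly lossless for static regions but produces visible tearing on moving edges, where spatial interpolation is safer.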

    Error resilient packet switched H.264 video telephony over third generation networks

    Real-time video communication over wireless networks is a challenging problem because wireless channels suffer from fading, additive noise and interference, which translate into packet loss and delay. Since modern video encoders deliver video packets with decoding dependencies, packet loss and delay can significantly degrade the video quality at the receiver. Many error resilience mechanisms have been proposed to combat packet loss in wireless networks, but only a few were specifically designed for packet-switched video telephony over Third Generation (3G) networks. The first part of the thesis presents an error resilience technique for packet-switched video telephony that combines application-layer Forward Error Correction (FEC) using rateless codes, Reference Picture Selection (RPS) and cross-layer optimization. Rateless codes have lower encoding and decoding computational complexity than traditional error correcting codes, which makes them suitable for complexity-constrained hand-held devices. In addition, their redundancy does not need to be fixed in advance: any number of encoded symbols can be generated on the fly. Reference picture selection limits spatio-temporal error propagation, which improves video quality. Cross-layer optimization minimizes data loss at the application layer when data is lost at the data link layer. Experimental results on a High Speed Packet Access (HSPA) network simulator, using standard video sequences compressed with H.264, show that the proposed technique achieves significant Peak Signal to Noise Ratio (PSNR) and Percentage Degraded Video Duration (PDVD) improvements over a state-of-the-art error resilience technique known as Interactive Error Control (IEC), which combines Error Tracking with feedback-based Reference Picture Selection. The improvement comes at the cost of higher end-to-end delay.
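A Luby Transform (LT) code, the simplest rateless code, illustrates the rateless property described above: any number of encoded symbols can be generated on the fly, and decoding succeeds once enough have arrived. This is a swapped-in illustration. The thesis does not say here which rateless code it uses, and practical systems typically use Raptor codes. The sketch uses integer source symbols, the ideal soliton degree distribution, and a peeling decoder. All names are hypothetical.

```python
import random
from typing import Iterator, List, Optional, Sequence, Tuple

EncodedSymbol = Tuple[frozenset, int]  # (indices of XORed source symbols, value)

def ideal_soliton_weights(k: int) -> List[float]:
    """rho(1) = 1/k and rho(d) = 1/(d(d-1)) for d = 2..k."""
    return [1.0 / k] + [1.0 / (d * (d - 1)) for d in range(2, k + 1)]

def lt_encoder(source: Sequence[int], seed: int = 0) -> Iterator[EncodedSymbol]:
    """Yield an endless stream of encoded symbols: XORs of random source subsets."""
    rng = random.Random(seed)
    k = len(source)
    degrees, weights = list(range(1, k + 1)), ideal_soliton_weights(k)
    while True:
        d = rng.choices(degrees, weights=weights)[0]
        neighbors = rng.sample(range(k), d)
        value = 0
        for i in neighbors:
            value ^= source[i]
        yield frozenset(neighbors), value

def lt_decode(k: int, received: Sequence[EncodedSymbol]) -> Optional[List[int]]:
    """Peeling decoder: the source symbols, or None if not yet decodable."""
    recovered: List[Optional[int]] = [None] * k
    pending = [[set(idx), val] for idx, val in received]
    progress = True
    while progress:
        progress = False
        for sym in pending:
            idx = sym[0]
            for i in list(idx):  # strip neighbors that are already known
                if recovered[i] is not None:
                    idx.discard(i)
                    sym[1] ^= recovered[i]
            if len(idx) == 1:  # a degree-1 symbol reveals one source symbol
                recovered[idx.pop()] = sym[1]
                progress = True
        pending = [sym for sym in pending if sym[0]]
    return None if any(r is None for r in recovered) else recovered
```

The receiver simply keeps collecting symbols, whichever ones survive the channel, until `lt_decode` succeeds. The robust soliton distribution needs less reception overhead than the ideal one used here.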
In the second part of the thesis, the proposed technique is improved by making the FEC (rateless code) redundancy channel-adaptive: Automatic Repeat Request (ARQ) feedback is used to adjust the redundancy of the rateless codes to the channel conditions. Experimental results show that the channel-adaptive scheme achieves significant PSNR and PDVD improvements over the static scheme for a simulated Long Term Evolution (LTE) network. In the third part of the thesis, the performance of the previous two schemes is improved by making the transmitter predict when rateless decoding will fail. In that case, reference picture selection is invoked early and transmission of encoded symbols for that source block is aborted. Simulations for an LTE network show that this improves video quality and saves bandwidth. In the last part of the thesis, the performance of the adaptive technique is improved by exploiting the history of the wireless channel. In a Rayleigh fading wireless channel, RLC-PDU losses are correlated under certain conditions. This correlation is exploited to adjust the redundancy of the rateless code, which yields a higher rateless decoding success rate and higher video quality. Simulations for an LTE network show that the improvement was significant when the packet loss rate on each of the two wireless links was 10%. To facilitate the implementation of the proposed error resilience techniques in practical scenarios, RTP/UDP/IP packetization schemes are also proposed for each technique. Compared to existing work, the proposed error resilience techniques provide better video quality and place more emphasis on implementation issues in 3G networks.
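Channel-adaptive redundancy can be sketched as follows. The sender chooses how many encoded symbols to transmit so that, at the currently estimated loss rate, enough of them arrive for decoding with high probability. The sketch makes three assumptions. Losses are independent, which ignores the correlation exploited in the last part of the thesis. The loss estimate is an exponentially weighted average updated from ARQ feedback. The rateless decoder needs a fixed number of symbols beyond k. All names and defaults are hypothetical.

```python
import math

def delivery_probability(n: int, needed: int, loss_rate: float) -> float:
    """P(at least `needed` of `n` sent symbols arrive), losses i.i.d. Bernoulli."""
    q = 1.0 - loss_rate
    return sum(math.comb(n, r) * q ** r * loss_rate ** (n - r)
               for r in range(needed, n + 1))

def symbols_to_send(k: int, loss_rate: float, target: float = 0.99,
                    overhead: int = 2, max_symbols: int = 10_000) -> int:
    """Smallest n whose delivery probability reaches `target`."""
    needed = k + overhead  # assumed reception overhead of the rateless decoder
    n = needed
    while delivery_probability(n, needed, loss_rate) < target:
        n += 1
        if n > max_symbols:
            raise ValueError("loss rate too high for the requested target")
    return n

def update_loss_estimate(estimate: float, lost: int, sent: int,
                         alpha: float = 0.2) -> float:
    """EWMA of the loss rate, fed with per-block loss counts from ARQ feedback."""
    return (1.0 - alpha) * estimate + alpha * lost / sent
```

Under correlated (bursty) losses the binomial model is optimistic. A two-state Gilbert-Elliott model conditioned on recent losses is one common way to account for the channel history.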

    Evaluating and improving the performance of video content distribution in lossy networks

    The contributions of this research fall into three distinct but related areas. The work focuses on improving the efficiency of video content distribution over networks that are liable to packet loss, such as the Internet. First, the benefits and limitations of content distribution using Forward Error Correction (FEC) in conjunction with the Transmission Control Protocol (TCP) are presented. Since added FEC reduces the number of retransmissions, TCP has far fewer losses to recover from. For real-time applications, delay must be kept to a minimum and retransmissions are undesirable, so a balance must be struck between additional bandwidth and retransmission delay. Second, a hybrid transport is proposed, specifically for H.264 encoded video, as a compromise between the delay-prone TCP and the loss-prone UDP. It is argued that playback quality at the receiver often need not be 100% perfect, provided a certain level is assured. Reliable TCP is used to transmit, and guarantee delivery of, the most important packets. The delay associated with the proposal is measured, and its potential as an alternative to transporting video by either TCP or UDP alone is demonstrated. Finally, a new objective measurement is investigated for assessing the playback quality of video transported using TCP. A new metric is defined to characterise playback quality in terms of its continuity. Using packet traces generated from real TCP connections in a lossy environment, video playback is simulated whilst buffer behaviour is monitored to calculate pause intensity values. Subjective tests are conducted to verify the effectiveness of the metric, and they show that objective and subjective scores are closely correlated.
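The trace-driven playback simulation described above can be sketched as follows. The thesis's pause intensity formula is not reproduced here. The sketch computes only the quantities a continuity metric of this kind builds on: the number of pauses and the total pause time. It assumes a constant frame interval, a fixed startup delay, and in-order frame arrival times, as TCP would deliver them. `simulate_playback` is a hypothetical name.

```python
from typing import Sequence, Tuple

def simulate_playback(arrival_times: Sequence[float], frame_interval: float,
                      startup_delay: float) -> Tuple[int, float]:
    """Return (number of pauses, total pause time) for a frame arrival trace.

    Playback of frame 0 is scheduled at `startup_delay`; each later frame is
    due one `frame_interval` after the previous one was shown. A frame that
    has not arrived by its due time stalls playback until it arrives, and all
    later deadlines shift accordingly.
    """
    pauses, paused_time = 0, 0.0
    due = startup_delay
    for arrival in arrival_times:
        if arrival > due:  # buffer underrun: playback freezes
            pauses += 1
            paused_time += arrival - due
            due = arrival
        due += frame_interval
    return pauses, paused_time
```

Arrival times per frame can be derived from a TCP packet trace by taking the time at which the last byte of each frame is delivered in order.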