199 research outputs found

    Livrable D4.2 of the PERSEE project : Représentation et codage 3D - Rapport intermédiaire - Définitions des softs et architecture

    Get PDF
    51Livrable D4.2 du projet ANR PERSEECe rapport a été réalisé dans le cadre du projet ANR PERSEE (n° ANR-09-BLAN-0170). Exactement il correspond au livrable D4.2 du projet. Son titre : Représentation et codage 3D - Rapport intermédiaire - Définitions des softs et architectur

    Selected topics in video coding and computer vision

    Get PDF
    Video applications ranging from multimedia communication to computer vision have been extensively studied in the past decades. However, the emergence of new applications continues to raise questions that are only partially answered by existing techniques. This thesis studies three selected topics related to video: intra prediction in block-based video coding, pedestrian detection and tracking in infrared imagery, and multi-view video alignment.;In the state-of-art video coding standard H.264/AVC, intra prediction is defined on the hierarchical quad-tree based block partitioning structure which fails to exploit the geometric constraint of edges. We propose a geometry-adaptive block partitioning structure and a new intra prediction algorithm named geometry-adaptive intra prediction (GAIP). A new texture prediction algorithm named geometry-adaptive intra displacement prediction (GAIDP) is also developed by extending the original intra displacement prediction (IDP) algorithm with the geometry-adaptive block partitions. Simulations on various test sequences demonstrate that intra coding performance of H.264/AVC can be significantly improved by incorporating the proposed geometry adaptive algorithms.;In recent years, due to the decreasing cost of thermal sensors, pedestrian detection and tracking in infrared imagery has become a topic of interest for night vision and all weather surveillance applications. We propose a novel approach for detecting and tracking pedestrians in infrared imagery based on a layered representation of infrared images. Pedestrians are detected from the foreground layer by a Principle Component Analysis (PCA) based scheme using the appearance cue. To facilitate the task of pedestrian tracking, we formulate the problem of shot segmentation and present a graph matching-based tracking algorithm. Simulations with both OSU Infrared Image Database and WVU Infrared Video Database are reported to demonstrate the accuracy and robustness of our algorithms.;Multi-view video alignment is a process to facilitate the fusion of non-synchronized multi-view video sequences for various applications including automatic video based surveillance and video metrology. In this thesis, we propose an accurate multi-view video alignment algorithm that iteratively aligns two sequences in space and time. To achieve an accurate sub-frame temporal alignment, we generalize the existing phase-correlation algorithm to 3-D case. We also present a novel method to obtain the ground-truth of the temporal alignment by using supplementary audio signals sampled at a much higher rate. The accuracy of our algorithm is verified by simulations using real-world sequences

    The Multiplicative Zak Transform, Dimension Reduction, and Wavelet Analysis of LIDAR Data

    Get PDF
    This thesis broadly introduces several techniques within the context of timescale analysis. The representation, compression and reconstruction of DEM and LIDAR data types is studied with directional wavelet methods and the wedgelet decomposition. The optimality of the contourlet transform, and then the wedgelet transform is evaluated with a valuable new structural similarity index. Dimension reduction for material classification is conducted with a frame-based kernel pipeline and a spectral-spatial method using wavelet packets. It is shown that these techniques can improve on baseline material classification methods while significantly reducing the amount of data. Finally, the multiplicative Zak transform is modified to allow the study and partial characterization of wavelet frames

    A Panorama on Multiscale Geometric Representations, Intertwining Spatial, Directional and Frequency Selectivity

    Full text link
    The richness of natural images makes the quest for optimal representations in image processing and computer vision challenging. The latter observation has not prevented the design of image representations, which trade off between efficiency and complexity, while achieving accurate rendering of smooth regions as well as reproducing faithful contours and textures. The most recent ones, proposed in the past decade, share an hybrid heritage highlighting the multiscale and oriented nature of edges and patterns in images. This paper presents a panorama of the aforementioned literature on decompositions in multiscale, multi-orientation bases or dictionaries. They typically exhibit redundancy to improve sparsity in the transformed domain and sometimes its invariance with respect to simple geometric deformations (translation, rotation). Oriented multiscale dictionaries extend traditional wavelet processing and may offer rotation invariance. Highly redundant dictionaries require specific algorithms to simplify the search for an efficient (sparse) representation. We also discuss the extension of multiscale geometric decompositions to non-Euclidean domains such as the sphere or arbitrary meshed surfaces. The etymology of panorama suggests an overview, based on a choice of partially overlapping "pictures". We hope that this paper will contribute to the appreciation and apprehension of a stream of current research directions in image understanding.Comment: 65 pages, 33 figures, 303 reference

    Hierarchical Structuring of Video Previews by Leading-Cluster-Analysis

    Get PDF
    3noClustering of shots is frequently used for accessing video data and enabling quick grasping of the associated content. In this work we first group video shots by a classic hierarchical algorithm, where shot content is described by a codebook of visual words and different codebooks are compared by a suitable measure of distortion. To deal with the high number of levels in a hierarchical tree, a novel procedure of Leading-Cluster-Analysis is then proposed to extract a reduced set of hierarchically arranged previews. The depth of the obtained structure is driven both from the nature of the visual content information, and by the user needs, who can navigate the obtained video previews at various levels of representation. The effectiveness of the proposed method is demonstrated by extensive tests and comparisons carried out on a large collection of video data. of digital videos has not been accompanied by a parallel increase in its accessibility. In this context, video abstraction techniques may represent a key components of a practical video management system: indeed a condensed video may be effective for a quick browsing or retrieval tasks. A commonly accepted type of abstract for generic videos does not exist yet, and the solutions investigated so far depend usually on the nature and the genre of video data.openopenBenini, Sergio; Migliorati, Pierangelo; Leonardi, RiccardoBenini, Sergio; Migliorati, Pierangelo; Leonardi, Riccard

    Image coding using wavelets, interval wavelets and multi- layered wedgelets

    Get PDF
    Ph.DDOCTOR OF PHILOSOPH

    3D coding tools final report

    Get PDF
    Livrable D4.3 du projet ANR PERSEECe rapport a été réalisé dans le cadre du projet ANR PERSEE (n° ANR-09-BLAN-0170). Exactement il correspond au livrable D4.3 du projet. Son titre : 3D coding tools final repor
    • …
    corecore