9 research outputs found

    Techniques de codage d'images basées représentations parcimonieuses de scènes et prédiction spatiale multi-patches

    In recent years, the field of video compression has grown significantly with the arrival of the H.264/AVC standard and its successor HEVC. Spatial prediction in these standards is based on the unidirectional propagation of neighboring pixels. Although very effective at extending patterns with the same characteristics, this prediction performs poorly when extrapolating complex textures. This thesis explores new spatial prediction schemes to improve current intra prediction techniques by extending these local, one-dimensional schemes to global, multidimensional, multi-patch schemes. A hybrid prediction method based on template matching and block matching is investigated first. This hybrid approach is then extended to multi-patch prediction of the "Neighbor Embedding" (NE) type. The other part of the thesis is dedicated to the study of image epitomes in the context of image compression. The idea is to exploit spatial redundancies in the original image in order to first extract a summary image containing the texture patches most representative of the image, and then use this compact representation to rebuild the original image. The epitome concept has been incorporated into two compression schemes. One of these algorithms breaks with traditional techniques in that the image blocks are processed, at both the encoder and the decoder, in a spatial order that depends on the image content, so as to propagate image structures. This last compression algorithm also includes extended H.264 intra directional prediction modes and advanced multi-patch prediction methods. These solutions have been integrated into an H.264/AVC encoder in order to assess their coding performance with respect to the H.264 intra modes and the state of the art for these techniques.

    BEYOND MULTI-TARGET TRACKING: STATISTICAL PATTERN ANALYSIS OF PEOPLE AND GROUPS

    Every day, millions of surveillance cameras monitor the world, recording and collecting a huge amount of data. The collected data can be extremely useful, from behavior analysis that helps prevent unpleasant events to the analysis of traffic. However, these valuable data are seldom used, because of the amount of information that a human operator would have to attend to and examine manually. It would be like looking for a needle in a haystack. Automatic analysis is therefore becoming mandatory for extracting summarized high-level information (e.g., John, Sam and Anne are walking together as a group at the playground near the station) from the available redundant low-level data (e.g., an image sequence). The main goal of this thesis is to propose solutions and automatic algorithms that perform high-level analysis of a camera-monitored environment. In this way, the data are summarized in a high-level representation that is easier to understand. In particular, this work focuses on the analysis of moving people and their collective behaviors. The title of the thesis, beyond multi-target tracking, mirrors the purpose of the work: we propose methods that have target tracking as a common denominator and go beyond the standard techniques in order to provide a high-level description of the data. First, we investigate the target tracking problem, as it is the basis of all subsequent work. Target tracking estimates the position of each target in the image and its trajectory over time. We analyze the problem from two complementary perspectives: 1) the engineering point of view, where we address the problem in order to obtain the best results in terms of accuracy and performance; 2) the neuroscience point of view, where we propose an attentional model for tracking and recognition of objects and people, motivated by theories of the human perceptual system. Second, target tracking is extended to the camera network case, where the goal is to keep a unique identifier for each person across the whole network, i.e., to perform person re-identification. The goal is to recognize individuals in diverse locations over different non-overlapping camera views, or within the same camera, considering a large set of candidates. In this context, we propose a pipeline and appearance-based descriptors that enable us to define the problem properly and to reach state-of-the-art results. Finally, the highest level of description investigated in this thesis is the analysis (discovery and tracking) of social interaction between people. In particular, we focus on finding small groups of people. We introduce methods that embed notions of social psychology into computer vision algorithms. We then extend the detection of social interaction over time, proposing novel probabilistic models that deal with (joint) individual-group tracking.

    Computational Intelligence for the Micro Learning

    Developments in Web technology and mobile devices have blurred the time and space boundaries of people's daily activities, enabling people to work, be entertained, and learn through a mobile device at almost any time and anywhere. Together with the requirement for life-long learning, these developments have given birth to a new learning style, micro learning. Micro learning aims to make effective use of learners' fragmented spare time and to carry out personalised learning activities. However, the massive volume of users and online learning resources forces a micro learning system to be deployed in the context of enormous and ubiquitous data. Hence, manually managing the online resources or user information by traditional methods is no longer feasible. How to use computational-intelligence-based solutions to automatically manage and process different types of massive information is the biggest research challenge in realising the micro learning service. As a result, to support the micro learning service efficiently in the big data era, we need an intelligent system to manage the online learning resources and carry out different analysis tasks. To this end, an intelligent micro learning system is designed in this thesis. The design of this system is based on the service logic of the micro learning service. The system consists of three intelligent modules: the learning material pre-processing module, the learning resource delivery module and the intelligent assistant module. The pre-processing module interprets the content of the raw online learning resources and extracts key information from each resource. This step makes the online resources ready to be used by the other intelligent components of the system. The learning resource delivery module aims to recommend personalised learning resources to the target user based on his/her implicit and explicit user profiles. The goal of the intelligent assistant module is to provide evaluation or assessment services (such as student dropout rate prediction and final grade prediction) to the educational resource providers or instructors. The educational resource providers can further refine or modify the learning materials based on these assessment results.

    27th Annual European Symposium on Algorithms: ESA 2019, September 9-11, 2019, Munich/Garching, Germany

    Patch-based models for visual object classes

    This thesis concerns models for visual object classes that exhibit a reasonable amount of regularity, such as faces, pedestrians, cells and human brains. Such models are useful for making "within-object" inferences such as determining their individual characteristics and establishing their identity. For example, the model could be used to predict the identity of a face, the pose of a pedestrian or the phenotype of a cell, and to segment parts of a human brain. Existing object modelling techniques have several limitations. First, most current methods have targeted the above tasks individually using object-specific representations; therefore, they cannot be applied to other problems without major alterations. Second, most methods have been designed to work with small databases which do not contain the variations in pose, illumination, occlusion and background clutter seen in 'real world' images. Consequently, many existing algorithms fail when tested on unconstrained databases. Finally, the complexity of the training procedure in these methods makes it impractical to use large datasets. In this thesis, we investigate patch-based models for object classes. Our models are capable of exploiting very large databases of objects captured in uncontrolled environments. We represent the test image with a regular grid of patches from a library of images of the same object. All the domain-specific information is held in this library: we use one set of images of the object to help draw inferences about others. In each experimental chapter we investigate a different within-object inference task. In particular, we develop models for classification, regression, semantic segmentation and identity recognition. In each task, we achieve results that are comparable to or better than the state of the art. We conclude that patch-based representation can be successfully used for the above tasks and shows promise for other applications such as generation and localization.

    Epitomic image factorization via neighbor-embedding

    We describe a novel epitomic image representation scheme that factors the content of a given image into a condensed epitome and a low-resolution image, in order to reduce the memory needed to store images. Given an input image, we construct a condensed epitome such that all image patches can be successfully reconstructed from the factored representation by means of an optimized neighbor-embedding strategy. Under this new view of epitomic image representations, aligned with the manifold sampling assumption, we end up with a more generic epitome learning scheme with increased optimality, compactness, and reconstruction stability. We present the performance of the proposed method for image and video up-scaling (super-resolution), while extensions to other image and video processing tasks are straightforward.