
    Multi-Modality Human Action Recognition

    Human action recognition is useful in many applications across areas such as video surveillance, human-computer interaction (HCI), video retrieval, gaming, and security. Recently, human action recognition has become an active research topic in computer vision and pattern recognition, and a number of action recognition approaches have been proposed. However, most of these approaches are designed for RGB image sequences, where the action data are collected by an RGB/intensity camera; the recognition performance is therefore sensitive to occlusion, background, and the lighting conditions of the image sequences. If more information can be provided along with the image sequences, so that data sources beyond RGB video can be utilized, human actions could be better represented and recognized by the computer vision system.
In this dissertation, multi-modality human action recognition is studied. On one hand, we introduce the study of multi-spectral action recognition, which involves information from spectra beyond the visible, e.g. infrared and near infrared. Action recognition in individual spectra is explored and new methods are proposed; cross-spectral action recognition is then investigated and novel approaches are proposed in our work. On the other hand, since depth imaging technology has made significant progress recently and depth information can now be captured simultaneously with RGB video, depth-based human action recognition is also investigated. I first propose a method combining different types of depth data to recognize human actions. Then a thorough evaluation is conducted on spatiotemporal interest point (STIP) based features for depth-based action recognition. Finally, I advocate the study of fusing different features for depth-based action analysis. Moreover, human depression recognition is studied by combining a facial appearance model with a facial dynamic model.
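The feature-fusion theme above can be illustrated with a minimal late-fusion sketch in Python (a generic illustration, not the dissertation's actual pipeline; the 1-NN classifier and all names are invented for the example):

```python
import numpy as np

def l2_normalize(v, eps=1e-12):
    # Scale each modality's descriptor to unit length so neither dominates.
    return v / (np.linalg.norm(v) + eps)

def fuse_features(rgb_feat, depth_feat):
    # Late fusion: concatenate the per-modality normalized descriptors.
    return np.concatenate([l2_normalize(rgb_feat), l2_normalize(depth_feat)])

def nearest_neighbor_label(query, gallery, labels):
    # Classify a fused descriptor by its nearest training example.
    dists = np.linalg.norm(gallery - query, axis=1)
    return labels[int(np.argmin(dists))]
```

In practice each per-modality descriptor would come from a real feature extractor (e.g. STIP-based features); the point of the sketch is only the fusion and classification step.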

    Finding Objects of Interest in Images using Saliency and Superpixels

    The ability to automatically find objects of interest in images is useful in the areas of compression, indexing and retrieval, re-targeting, and so on. There are two classes of such algorithms: those that find any object of interest with no prior knowledge, independent of the task, and those that find specific objects of interest known a priori. The former class of algorithms tries to detect objects in images that stand out, i.e. are salient, by virtue of being different from the rest of the image, and consequently capture our attention. The detection is generic in this case, as there is no specific object we are trying to locate. The latter class of algorithms detects specific known objects of interest and often requires training using features extracted from known examples. In this thesis we address various aspects of finding objects of interest under the topics of saliency detection and object detection. We present two saliency detection algorithms that rely on the principle of center-surround contrast. These two algorithms are shown to be superior to several state-of-the-art techniques in terms of precision and recall measures with respect to a ground truth. They output full-resolution saliency maps, are simpler to implement, and are computationally more efficient than most existing algorithms. We further establish the relevance of our saliency detection algorithms by using them for the known applications of object segmentation and image re-targeting. We first present three different techniques for salient object segmentation using our saliency maps, based on clustering, graph cuts, and geodesic distance based labeling. We then demonstrate the use of our saliency maps for a popular technique of content-aware image resizing and compare the result with that of existing methods. Our saliency maps prove to be a much more effective replacement for conventional gradient maps for providing automatic content-awareness.
Just as it is important to find regions of interest in images, it is also important to find interesting images within a large collection of images. We therefore extend the notion of saliency detection in images to image databases and propose an algorithm for finding salient images in a database. Apart from finding such images, we also present two novel techniques for creating visually appealing summaries in the form of collages and mosaics. Finally, we address the problem of finding specific known objects of interest in images. Specifically, we deal with the feature extraction step that is a pre-requisite for any technique in this domain. In this context, we first present a superpixel segmentation algorithm that outperforms previous algorithms in terms of quantitative measures of under-segmentation error and boundary recall. Our superpixel segmentation algorithm also offers several other advantages over existing algorithms, such as compactness, uniform size, control over the number of superpixels, and computational efficiency. We prove the effectiveness of our superpixels by deploying them in existing algorithms, specifically an object class detection technique and a graph-based algorithm, and improving their performance. We also present the result of using our superpixels in a technique for detecting mitochondria in noisy medical images.
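The center-surround contrast principle can be sketched in a few lines of Python: a pixel is scored by how far its colour lies from the colour of its surround, here taken (in the simplest possible form) as the whole image. This is a generic illustration of the principle, not the thesis's actual algorithms:

```python
import numpy as np

def box_blur(img, k=3):
    # Small box blur (edge-padded) to suppress pixel noise before the contrast test.
    pad = k // 2
    p = np.pad(img.astype(float), ((pad, pad), (pad, pad), (0, 0)), mode="edge")
    h, w, c = img.shape
    out = np.zeros((h, w, c))
    for dy in range(k):
        for dx in range(k):
            out += p[dy:dy + h, dx:dx + w]
    return out / (k * k)

def center_surround_saliency(img):
    # A pixel is salient when its (slightly blurred) colour differs from the
    # mean colour of the whole image, i.e. from the largest possible surround.
    mean_color = img.reshape(-1, img.shape[-1]).mean(axis=0)
    return np.linalg.norm(box_blur(img) - mean_color, axis=-1)
```

Note that the output has the same resolution as the input, which is one of the properties emphasized above for the proposed saliency maps.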

    The European Settlement Map 2019 release

    ESM_2015 is the latest release of the European Settlement Map, produced in the frame of the GHSL project. It is produced with the Global Human Settlement Layer (GHSL) technology of the Joint Research Centre (JRC) in collaboration with the Directorate-General for Regional and Urban Policy, and the workflow was executed on the JRC Big Data Analytics platform. It follows up on the previous ESM_2012, derived from 2.5 m resolution SPOT-5/6 images acquired in the context of the pan-European GMES/Copernicus (Core_003) dataset for the reference year 2012. The ESM_2015 product exploits the Copernicus VHR_IMAGE_2015 dataset, made of Pleiades, Deimos-02, WorldView-2, WorldView-3, GeoEye-01, and Spot 6/7 satellite images ranging from 2014 to 2016. Unlike previous ESM versions, the built-up extraction is realized through supervised learning based on textural and morphological features, and not only by means of image filtering and processing techniques. The workflow is fully automated and does not include any post-processing. For the first time, a new layer containing non-residential buildings was derived using only remote sensing imagery and training data. The produced built-up map is delivered at 2 m pixel resolution (level 1 layer), while the residential/non-residential layer (level 2) is delivered at 10 m spatial resolution. ESM_2015 offers new opportunities in Earth observation related research by allowing the study of urbanisation and related features across Europe in urban and rural areas, from a continental to a country perspective, from regional to local, down to single blocks. ESM_2015 was validated against the LUCAS 2015 survey database at both 2 m and 10 m resolution (including the non-residential class). The validation resulted in a Balanced Accuracy of 0.81 for the 2 m resolution built-up layer and of 0.71 for the 10 m non-residential built-up layer. JRC.E.1 - Disaster Risk Management
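The Balanced Accuracy metric reported above can be made concrete with a short sketch (the confusion counts below are invented for illustration, not the actual LUCAS validation figures):

```python
def balanced_accuracy(tp, fn, tn, fp):
    # Mean of sensitivity (recall on built-up pixels) and specificity (recall
    # on non-built-up pixels); unlike plain accuracy, it is not inflated by
    # the overwhelming majority of non-built-up pixels in a settlement map.
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    return 0.5 * (sensitivity + specificity)
```

With, say, 90% of built-up pixels and 72% of non-built-up pixels correctly labelled, the Balanced Accuracy is 0.81, matching the figure quoted for the 2 m layer by construction of this illustrative example.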

    Very High Resolution (VHR) Satellite Imagery: Processing and Applications

    Recently, growing interest has appeared in the use of remote sensing imagery to provide synoptic maps of water quality parameters in coastal and inland water ecosystems; monitoring of complex land ecosystems for biodiversity conservation; precision agriculture for the management of soils, crops, and pests; urban planning; disaster monitoring; etc. However, for these maps to achieve their full potential, it is important to engage in periodic monitoring and analysis of multi-temporal changes. In this context, very high resolution (VHR) satellite-based optical, infrared, and radar imaging instruments provide reliable information with which to implement spatially based conservation actions. Moreover, they enable observations of parameters of our environment at broader spatial and finer temporal scales than those allowed through field observation alone. In this sense, recent very high resolution satellite technologies and image processing algorithms present the opportunity to develop quantitative techniques that have the potential to improve upon traditional techniques in terms of cost, mapping fidelity, and objectivity. Typical applications include multi-temporal classification, recognition and tracking of specific patterns, multisensor data fusion, analysis of land/marine ecosystem processes, environmental monitoring, etc. This book aims to collect new developments, methodologies, and applications of very high resolution satellite data for remote sensing. The selected works provide the research community with the most recent advances on all aspects of VHR satellite remote sensing.

    Fusing Multimedia Data Into Dynamic Virtual Environments

    In spite of the dramatic growth of virtual and augmented reality (VR and AR) technology, content creation for immersive and dynamic virtual environments remains a significant challenge. In this dissertation, we present our research in fusing multimedia data, including text, photos, panoramas, and multi-view videos, to create rich and compelling virtual environments. First, we present Social Street View, which renders geo-tagged social media in its natural geo-spatial context, provided by 360° panoramas. Our system takes into account visual saliency and uses maximal Poisson-disc placement with spatiotemporal filters to render social multimedia in an immersive setting. We also present a novel GPU-driven pipeline for saliency computation in 360° panoramas using spherical harmonics (SH). Our spherical residual model can be applied to virtual cinematography in 360° videos. We further present Geollery, a mixed-reality platform that renders an interactive mirrored world in real time with three-dimensional (3D) buildings, user-generated content, and geo-tagged social media. Our user study has identified several use cases for these systems, including immersive social storytelling, experiencing culture, and crowd-sourced tourism. We next present Video Fields, a web-based interactive system to create, calibrate, and render dynamic videos overlaid on 3D scenes. Our system renders dynamic entities from multiple videos, using early and deferred texture sampling. Video Fields can be used for immersive surveillance in virtual environments. Furthermore, we present the VRSurus and ARCrypt projects, which explore the applications of gesture recognition, haptic feedback, and visual cryptography for virtual and augmented reality. Finally, we present our work on Montage4D, a real-time system for seamlessly fusing multi-view video textures with dynamic meshes. We use geodesics on meshes with view-dependent rendering to mitigate spatial occlusion seams while maintaining temporal consistency.
Our experiments show significant enhancement in rendering quality, especially for salient regions such as faces. We believe that Social Street View, Geollery, Video Fields, and Montage4D will greatly facilitate applications such as virtual tourism, immersive telepresence, and remote education.

    Geometric and photometric affine invariant image registration

    This thesis presents a solution to the correspondence problem for the registration of wide-baseline images taken from uncalibrated cameras. We propose an affine invariant descriptor that combines the geometry and photometry of the scene to find correspondences between both views. The geometric affine invariant component of the descriptor is based on the affine arc-length metric, whereas the photometry is analysed by invariant colour moments. A graph structure represents the spatial distribution of the primitive features; i.e. nodes correspond to detected high-curvature points, whereas arcs represent connectivities given by extracted contours. After matching, we refine the search for correspondences using a maximum-likelihood robust algorithm. We have evaluated the system over synthetic and real data; the method is, however, susceptible to the propagation of errors introduced by approximations in the system. BAE Systems; Selex Sensors and Airborne Systems
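The affine arc-length metric behind the geometric component can be sketched numerically. For a planar curve (x(t), y(t)), the affine arc-length element is |x'y'' - x''y'|^(1/3) dt, which is preserved by equi-affine (area-preserving) transformations. The finite-difference version below is an illustrative sketch, not the thesis's implementation:

```python
import numpy as np

def affine_arc_length(x, y):
    # Discrete affine arc length: integrate |x'y'' - x''y'|^(1/3) along the
    # sampled curve. Equi-affine maps (determinant 1) leave it unchanged,
    # which is what makes it useful for wide-baseline matching.
    xp, yp = np.gradient(x), np.gradient(y)
    xpp, ypp = np.gradient(xp), np.gradient(yp)
    return float(np.sum(np.abs(xp * ypp - xpp * yp) ** (1.0 / 3.0)))

# A circle and its image under a det-1 linear map have equal affine length.
t = np.linspace(0.0, 2.0 * np.pi, 2000)
x, y = np.cos(t), np.sin(t)
u, v = 2.0 * x + 1.0 * y, 0.5 * y   # linear map [[2, 1], [0, 0.5]], det = 1
```

For the unit circle the invariant evaluates to 2π, so the numeric sum should land close to 6.283 for both curves.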

    Spectral-spatial Feature Extraction for Hyperspectral Image Classification

    As an emerging technology, hyperspectral imaging provides huge opportunities in both remote sensing and computer vision. The advantage of hyperspectral imaging comes from its high resolution and wide range in the electromagnetic spectral domain, which reflects the intrinsic properties of object materials. By combining spatial and spectral information, it is possible to extract a more comprehensive and discriminative representation for objects of interest than with traditional methods, thus facilitating basic pattern recognition tasks such as object detection, recognition, and classification. With advanced imaging technologies gradually becoming available to universities and industry, there is an increased demand for new methods that can fully explore the information embedded in hyperspectral images. In this thesis, three spectral-spatial feature extraction methods are developed for salient object detection, hyperspectral face recognition, and remote sensing image classification. Object detection is an important task for many applications based on hyperspectral imaging. While most traditional methods rely on the pixel-wise spectral response, many recent efforts have been put into extracting spectral-spatial features. In the first approach, we extend Itti's visual saliency model to the spectral domain and introduce a spectral-spatial distribution based saliency model for object detection. This procedure enables the extraction of salient spectral features in the scale space, which are related to the material properties and spatial layout of objects. Traditional 2D face recognition has been studied for many years and has achieved great success. Nonetheless, there is high demand to explore information beyond the structures and textures of the spatial domain in faces. Hyperspectral imaging meets such requirements by providing additional spectral information on objects, complementing the traditional spatial features extracted from 2D images.
In the second approach, we propose a novel 3D high-order texture pattern descriptor for hyperspectral face recognition, which effectively exploits both spatial and spectral features in hyperspectral images. Based on the local derivative pattern, our method encodes hyperspectral faces with multi-directional derivatives and a binarization function in spectral-spatial space. Compared to traditional face recognition methods, our method can describe distinctive micro-patterns which integrate the spatial and spectral information of faces. Mathematical morphology operations are limited to extracting spatial features from two-dimensional data and cannot cope with hyperspectral images due to the so-called ordering problem. In the third approach, we propose a novel multi-dimensional morphology descriptor, the tensor morphology profile (TMP), for hyperspectral image classification. TMP is a general framework to extract multi-dimensional structures in high-dimensional data. The n-order morphology profile is proposed to work with the n-order tensor, which can capture the inner high-order structures. By treating a hyperspectral image as a tensor, it is possible to extend morphology to high-dimensional data, so that powerful morphological tools can be used to analyze hyperspectral images with fused spectral-spatial information. Finally, we discuss the sampling strategy for the evaluation of spectral-spatial methods in remote sensing hyperspectral image classification. We find that the traditional pixel-based random sampling strategy for spectral processing leads to unfair or biased performance evaluation in the spectral-spatial processing context. When training and testing samples are randomly drawn from the same image, the dependence caused by overlap between them may be artificially enhanced by some spatial processing methods.
It is hard to determine whether the improvement in classification accuracy is caused by incorporating spatial information into the classifier or by increasing the overlap between training and testing samples. To partially solve this problem, we propose a novel controlled random sampling strategy for spectral-spatial methods. It significantly reduces the overlap between training and testing samples and provides a more objective and accurate evaluation.
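The intuition behind controlled sampling can be sketched by assigning whole spatial blocks, rather than individual pixels, to the train or test split; a test pixel's spatial neighbourhood then rarely contains training pixels. This is a simplified stand-in for the idea, not the authors' exact strategy:

```python
import numpy as np

def block_split(height, width, block=8, test_fraction=0.5, seed=0):
    # Decide train/test membership per spatial block, not per pixel, so that
    # spatial filters applied to a test pixel see far fewer training pixels.
    rng = np.random.default_rng(seed)
    n_by = -(-height // block)   # ceil division: number of block rows
    n_bx = -(-width // block)    # ceil division: number of block columns
    test_blocks = rng.random((n_by, n_bx)) < test_fraction
    mask = np.repeat(np.repeat(test_blocks, block, axis=0), block, axis=1)
    return mask[:height, :width]  # True = test pixel, False = train pixel
```

Compared with pixel-wise random sampling, overlap between the splits now occurs only along block borders instead of throughout the image.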

    Gaze-Based Human-Robot Interaction by the Brunswick Model

    We present a new paradigm for human-robot interaction based on social signal processing, and in particular on the Brunswick model. Originally, the Brunswick model deals with face-to-face dyadic interaction, assuming that the interactants communicate through a continuous exchange of non-verbal social signals in addition to the spoken messages. Social signals have to be interpreted through a proper recognition phase that considers visual and audio information. The Brunswick model allows the quality of the interaction to be evaluated quantitatively, using statistical tools that measure how effective the recognition phase is. In this paper we cast this theory to the case in which one of the interactants is a robot; the recognition phases performed by the robot and by the human then have to be revised with respect to the original model. The model is applied to Berrick, a recent open-source, low-cost robotic head platform, where gaze is the social signal considered.
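The statistical evaluation in the Brunswick (lens) model can be sketched as correlations between cues, the true state, and the observer's judgment. This is a generic lens-model computation with invented variable names, not the paper's specific instrument:

```python
import numpy as np

def pearson(a, b):
    # Pearson correlation between two 1-D series.
    return float(np.corrcoef(a, b)[0, 1])

def lens_model_stats(criterion, judgment, cues):
    # Lens-model view of an interaction: each cue (e.g. gaze direction) has an
    # ecological validity (cue vs. true state) and a utilization (cue vs. the
    # observer's judgment); "achievement" is how well judgment tracks the state.
    validities = [pearson(c, criterion) for c in cues]
    utilizations = [pearson(c, judgment) for c in cues]
    achievement = pearson(criterion, judgment)
    return validities, utilizations, achievement
```

High achievement requires both a valid cue and an observer (here, a robot) that actually utilizes it, which is exactly what makes the recognition phase measurable.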