Motion and disparity estimation with self adapted evolutionary strategy in 3D video coding
Real-world information obtained by humans is three-dimensional (3-D). In experimental user trials, subjective assessments have clearly demonstrated the increased impact of 3-D pictures compared with conventional flat-picture techniques. It is reasonable, therefore, that humans want an imaging system that produces pictures as natural and real as the things we see and experience every day. Three-dimensional imaging and, hence, 3-D television (3DTV) are very promising approaches expected to satisfy these desires. Integral imaging, which can capture true 3D color images with only one camera, has been seen as the right technology to offer stress-free viewing to audiences of more than one person. In this paper, we propose a novel approach that uses an Evolutionary Strategy (ES) for joint motion and disparity estimation to compress 3D integral video sequences. We propose to decompose the integral video sequence into viewpoint video sequences and jointly exploit motion and disparity redundancies to maximize compression using a self-adapted ES. A half-pixel refinement algorithm is then applied by interpolating macroblocks in the previous frame to further improve the video quality. Experimental results demonstrate that the proposed adaptable ES with half-pixel joint motion and disparity estimation can achieve up to 1.5 dB objective quality gain, without any additional computational cost, over our previous algorithm. Furthermore, the proposed technique achieves objective quality similar to that of the full search algorithm while reducing the computational cost by up to 90%.
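The half-pixel refinement step described above can be illustrated with a minimal sketch: starting from the best integer-pixel match, the eight surrounding half-pixel positions are evaluated on a bilinearly interpolated reference frame, and the one with the lowest sum of absolute differences (SAD) wins. This is a generic illustration of half-pel refinement, not the paper's actual implementation; all function names are invented here.

```python
import numpy as np

def bilinear_sample(frame, y, x):
    """Sample a frame at fractional coordinates (y, x) with bilinear weights."""
    y0, x0 = int(np.floor(y)), int(np.floor(x))
    dy, dx = y - y0, x - x0
    p = frame[y0:y0 + 2, x0:x0 + 2].astype(float)
    return ((1 - dy) * (1 - dx) * p[0, 0] + (1 - dy) * dx * p[0, 1]
            + dy * (1 - dx) * p[1, 0] + dy * dx * p[1, 1])

def half_pel_refine(ref, block, y, x):
    """Refine an integer-pel match at (y, x) by testing the 8 half-pixel
    offsets around it; return the offset with minimum SAD."""
    h, w = block.shape
    best, best_sad = (0.0, 0.0), np.abs(ref[y:y + h, x:x + w] - block).sum()
    for oy in (-0.5, 0.0, 0.5):
        for ox in (-0.5, 0.0, 0.5):
            if oy == 0.0 and ox == 0.0:
                continue  # integer-pel position already scored above
            pred = np.array([[bilinear_sample(ref, y + oy + i, x + ox + j)
                              for j in range(w)] for i in range(h)])
            sad = np.abs(pred - block).sum()
            if sad < best_sad:
                best, best_sad = (oy, ox), sad
    return best, best_sad
```

In a codec this search would run per macroblock after the ES delivers the integer-pel motion/disparity vector; here the exhaustive 8-neighbour test keeps the sketch short.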
Spatial prediction based on self-similarity compensation for 3D holoscopic image and video coding
WOS:000298962501022 (Web of Science Accession No.)
Holoscopic imaging, also known as integral imaging, provides a solution for glassless 3D and promises to change the market for 3D television. This paper first briefly describes the general concepts of holoscopic imaging, focusing mainly on the spatial correlations inherent to this new type of content, which arise from the micro-lens array used for both acquisition and display. The micro-images formed behind each micro-lens, from which only one pixel is viewed from a given observation point, have a high cross-correlation between them, which can be exploited for coding. A novel scheme for spatial prediction, exploiting the particular arrangement of holoscopic images, is proposed. The proposed scheme can be used for both still-image coding and intra coding of video. Experimental results based on an H.264/AVC video codec modified to handle 3D holoscopic images and video are presented, showing the superior performance of this approach.
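The self-similarity idea above amounts to block matching within the same image: because neighbouring micro-images are highly cross-correlated, a block can be predicted from an already-decoded region of the current frame. A minimal sketch of such a search follows (function name and the simplified causal window, restricted here to the upper-left region, are assumptions for illustration, not the paper's scheme):

```python
import numpy as np

def self_similarity_predict(img, by, bx, bs, search):
    """Find, within a causal window of the *same* image, the bs-by-bs block
    that best predicts the block at (by, bx), returning the prediction
    vector and its SAD. Only the region above/left of the current block is
    searched, as only previously coded samples are available at the decoder."""
    cur = img[by:by + bs, bx:bx + bs].astype(float)
    best_sad, best_vec = np.inf, (0, 0)
    for ry in range(max(0, by - search), by + 1):
        for rx in range(max(0, bx - search), bx + 1):
            if ry == by and rx == bx:
                continue  # skip the block itself
            cand = img[ry:ry + bs, rx:rx + bs].astype(float)
            sad = np.abs(cand - cur).sum()
            if sad < best_sad:
                best_sad, best_vec = sad, (ry - by, rx - bx)
    return best_vec, best_sad
```

On holoscopic content the micro-lens pitch makes matches at multiples of the micro-image size especially likely, which is what the prediction exploits.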
Cortical Dynamics of Navigation and Steering in Natural Scenes: Motion-Based Object Segmentation, Heading, and Obstacle Avoidance
Visually guided navigation through a cluttered natural scene is a challenging problem that animals and humans accomplish with ease. The ViSTARS neural model proposes how primates use motion information to segment objects and determine heading for purposes of goal approach and obstacle avoidance in response to video inputs from real and virtual environments. The model produces trajectories similar to those of human navigators. It does so by predicting how computationally complementary processes in cortical areas MT-/MSTv and MT+/MSTd compute object motion for tracking and self-motion for navigation, respectively. The model retina responds to transients in the input stream. Model V1 generates a local speed and direction estimate. This local motion estimate is ambiguous due to the neural aperture problem. Model MT+ interacts with MSTd via an attentive feedback loop to compute accurate heading estimates in MSTd that quantitatively simulate properties of human heading estimation data. Model MT- interacts with MSTv via an attentive feedback loop to compute accurate estimates of the speed, direction and position of moving objects. This object information is combined with heading information to produce steering decisions wherein goals behave like attractors and obstacles behave like repellers. These steering decisions lead to navigational trajectories that closely match human performance.
National Science Foundation (SBE-0354378, BCS-0235398); Office of Naval Research (N00014-01-1-0624); National Geospatial-Intelligence Agency (NMA201-01-1-2016)
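The attractor/repeller steering dynamic described above can be sketched in a few lines: the goal pulls the heading toward itself, while each obstacle pushes the heading away with a strength that decays with angular distance. This is a generic potential-field-style caricature, not the ViSTARS equations; the gains and decay constant are invented for illustration.

```python
import math

def wrap(a):
    """Wrap an angle to (-pi, pi]."""
    return math.atan2(math.sin(a), math.cos(a))

def steer(heading, goal, obstacles, k_goal=0.5, k_obs=1.0, sigma=0.5):
    """One heading update: the goal acts as an attractor, each obstacle as a
    repeller whose influence decays exponentially with angular distance."""
    turn = -k_goal * wrap(heading - goal)      # attraction toward the goal
    for ob in obstacles:
        d = wrap(heading - ob)
        # repulsion: push the heading further away from the obstacle,
        # strongest when the obstacle is nearly dead ahead
        turn += k_obs * d * math.exp(-abs(d) / sigma)
    return wrap(heading + turn)
```

Iterating this update produces trajectories that curve toward goals while bending around obstacles, the qualitative behaviour the model's steering stage exhibits.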
Dense light field coding: a survey
Light Field (LF) imaging is a promising solution for providing more immersive and closer to reality multimedia experiences to end-users with unprecedented creative freedom and flexibility for applications in different areas, such as virtual and augmented reality. Due to the recent technological advances in optics, sensor manufacturing and available transmission bandwidth, as well as the investment of many tech giants in this area, it is expected that soon many LF transmission systems will be available to both consumers and professionals. Recognizing this, novel standardization initiatives have recently emerged in both the Joint Photographic Experts Group (JPEG) and the Moving Picture Experts Group (MPEG), triggering the discussion on the deployment of LF coding solutions to efficiently handle the massive amount of data involved in such systems.
Since then, the topic of LF content coding has become a booming research area, attracting the attention of many researchers worldwide. In this context, this paper provides a comprehensive survey of the most relevant LF coding solutions proposed in the literature, focusing on angularly dense LFs. Special attention is placed on a thorough description of the different LF coding methods and on the main concepts related to this relevant area. Moreover, comprehensive insights are presented into open research challenges and future research directions for LF coding.
HEVC-based 3D holoscopic video coding using self-similarity compensated prediction
Holoscopic imaging, also known as integral, light field, and plenoptic imaging, is an appealing technology for glassless 3D video systems, which has recently emerged as a prospective candidate for future image and video applications, such as 3D television. However, to successfully introduce 3D holoscopic video applications into the market, adequate coding tools that can efficiently handle 3D holoscopic video are necessary. In this context, this paper discusses the requirements and challenges for 3D holoscopic video coding, and presents an efficient 3D holoscopic coding scheme based on High Efficiency Video Coding (HEVC). The proposed 3D holoscopic codec makes use of the self-similarity (SS) compensated prediction concept to efficiently explore the inherent correlation of the 3D holoscopic content in Intra- and Inter-coded frames, as well as a novel vector prediction scheme to take advantage of the peculiar characteristics of the SS prediction data. Extensive experiments were conducted, and have shown that the proposed solution is able to outperform HEVC as well as other coding solutions proposed in the literature. Moreover, a consistently better performance is also observed for a set of different quality metrics proposed in the literature for 3D holoscopic content, as well as for the visual quality of views synthesized from decompressed 3D holoscopic content.
Change blindness: eradication of gestalt strategies
Arrays of eight texture-defined rectangles were used as stimuli in a one-shot change blindness (CB) task in which there was a 50% chance that one rectangle would change orientation between two successive presentations separated by an interval. CB was eliminated by cueing the target rectangle in the first stimulus, reduced by cueing in the interval, and unaffected by cueing in the second presentation. This supports the idea that a representation was formed that persisted through the interval before being 'overwritten' by the second presentation (Landman et al., 2003, Vision Research 43, 149-164). Another possibility is that participants used some kind of grouping or Gestalt strategy. To test this we changed the spatial position of the rectangles in the second presentation by shifting them along imaginary spokes (by ±1 degree) emanating from the central fixation point. There was no significant difference in performance between this and the standard task [F(1,4)=2.565, p=0.185]. This may suggest two things: (i) Gestalt grouping is not used as a strategy in these tasks, and (ii) further weight is given to the argument that objects may be stored in and retrieved from a pre-attentional store during this task.
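The spoke manipulation above is a radial displacement: each rectangle is moved along the line joining it to the fixation point, its eccentricity changing by ±1 degree while its direction from fixation is preserved. A minimal sketch of that geometry (the function name and the assumption that coordinates are already in degrees of visual angle are illustrative, not from the paper):

```python
import math

def shift_along_spoke(px, py, fx, fy, delta):
    """Move point (px, py) radially along the spoke from fixation (fx, fy),
    changing its eccentricity by `delta` while keeping its direction."""
    dx, dy = px - fx, py - fy
    r = math.hypot(dx, dy)          # current eccentricity
    scale = (r + delta) / r         # new eccentricity / old eccentricity
    return (fx + dx * scale, fy + dy * scale)
```

Because only eccentricity changes, the angular layout of the eight rectangles around fixation, and hence any Gestalt grouping based on that layout, is disrupted as little as possible.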
Computational intelligence approaches to robotics, automation, and control [Volume guest editors]
No abstract available
Stereo Pictorial Structure for 2D Articulated Human Pose Estimation
In this paper, we consider the problem of 2D human pose estimation on stereo image pairs. In particular, we aim at estimating the location, orientation and scale of upper-body parts of people detected in stereo image pairs from realistic stereo videos that can be found on the Internet. To address this task, we propose a novel pictorial structure model to exploit the stereo information included in such stereo image pairs: the Stereo Pictorial Structure (SPS). To validate our proposed model, we contribute a new annotated dataset of stereo image pairs, the Stereo Human Pose Estimation Dataset (SHPED), obtained from YouTube stereoscopic video sequences, depicting people in challenging poses and diverse indoor and outdoor scenarios. The experimental results on SHPED indicate that SPS improves on state-of-the-art monocular models thanks to the appropriate use of the stereo information.
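A pictorial structure scores a configuration of body parts by per-part appearance costs plus pairwise deformation costs, and on a chain (or tree) of parts the minimizing configuration is found exactly by dynamic programming. The sketch below shows that inference on a plain chain; it is a generic illustration and does not reproduce the paper's stereo-aware unary terms, whose design is the SPS contribution.

```python
def best_chain_config(unary, pairwise):
    """Viterbi-style inference for a chain pictorial structure: choose one
    state per part minimizing the sum of appearance costs unary[i][s] and
    deformation costs pairwise(i, s, t) between consecutive parts."""
    n, S = len(unary), len(unary[0])
    cost = list(unary[0])            # best cost of a prefix ending in each state
    back = []                        # back-pointers, one table per transition
    for i in range(1, n):
        new_cost, bp = [], []
        for t in range(S):
            s = min(range(S), key=lambda s: cost[s] + pairwise(i, s, t))
            bp.append(s)
            new_cost.append(cost[s] + pairwise(i, s, t) + unary[i][t])
        cost, back = new_cost, back + [bp]
    t = min(range(S), key=lambda t: cost[t])
    states = [t]
    for bp in reversed(back):        # walk back-pointers from last part to first
        states.append(bp[states[-1]])
    states.reverse()
    return states, cost[t]
```

In a stereo setting the unary term for a part can additionally reward left/right image evidence that is consistent under the estimated disparity, which is the kind of information SPS exploits over monocular models.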
Hand gesture recognition using deep learning neural networks
This thesis was submitted for the award of Doctor of Philosophy and was awarded by Brunel University London.
Human-Computer Interaction (HCI) is a broad field involving different types of interaction, including gestures. Gesture recognition concerns non-verbal motions used as a means of communication in HCI. A system may be utilised to identify human gestures and convey information for device control. This represents a significant field within HCI involving device interfaces and users. The aim of gesture recognition is to record gestures that are formed in a certain way and then detected by a device such as a camera. Hand gestures can be used as a form of communication for many different applications. They may be used by people with different disabilities, including those with hearing impairments, speech impairments and stroke patients, to communicate and fulfil their basic needs.
Various studies have previously been conducted on hand gestures, some of which proposed different techniques to implement hand-gesture experiments. For image processing there are multiple tools to extract image features, and Artificial Intelligence offers varied classifiers for different types of data. 2D and 3D hand gestures require an effective algorithm to extract images and classify various mini-gestures and movements. This research addresses this issue using different algorithms. To detect 2D or 3D hand gestures, this research used image-processing tools such as Wavelet Transforms (WT) and Empirical Mode Decomposition (EMD) to extract image features, with Artificial Neural Network (ANN) and Convolutional Neural Network (CNN) classifiers to train and classify the data. These methods were examined in terms of multiple parameters, such as execution time, accuracy, sensitivity, specificity, positive predictive value, negative predictive value, positive likelihood, negative likelihood, receiver operating characteristic (ROC), area under the ROC curve and root mean square. This research presents four original contributions in the field of hand gestures. The first contribution is an implementation of two experiments using 2D hand-gesture video, where ten different gestures are detected at short and long distances using an iPhone 6 Plus with 4K resolution; the experiments use WT and EMD for feature extraction and ANN and CNN for classification. The second contribution comprises 3D hand-gesture video experiments, where twelve gestures are recorded using a holoscopic imaging system camera. The third contribution pertains to experimental work carried out to detect seven common hand gestures. Finally, disparity experiments were performed using the left and right 3D hand-gesture videos to discover disparities. The comparison shows the accuracy of CNN to be 100%, higher than that of the other techniques. CNN is clearly the most appropriate method to be used in a hand gesture system.
Imam Abdulrahman bin Faisal University
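Most of the evaluation parameters listed above derive directly from the confusion-matrix counts of a one-vs-rest classification for a given gesture class. A minimal sketch of those standard definitions (the function name is illustrative; this is not code from the thesis):

```python
def binary_metrics(tp, fp, tn, fn):
    """Standard evaluation metrics from confusion-matrix counts:
    tp/fp/tn/fn = true/false positives and negatives for one class."""
    sensitivity = tp / (tp + fn)              # true positive rate (recall)
    specificity = tn / (tn + fp)              # true negative rate
    ppv = tp / (tp + fp)                      # positive predictive value
    npv = tn / (tn + fn)                      # negative predictive value
    lr_pos = sensitivity / (1 - specificity)  # positive likelihood ratio
    lr_neg = (1 - sensitivity) / specificity  # negative likelihood ratio
    accuracy = (tp + tn) / (tp + fp + tn + fn)
    return {"sensitivity": sensitivity, "specificity": specificity,
            "ppv": ppv, "npv": npv, "lr_pos": lr_pos, "lr_neg": lr_neg,
            "accuracy": accuracy}
```

Sweeping the decision threshold of a classifier and plotting sensitivity against (1 - specificity) yields the ROC curve, whose area is the AUC also reported in the thesis.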
Spatial and temporal integration of binocular disparity in the primate brain
The primate visual system strongly relies on the small differences between the two retinal projections to perceive depth. However, it is not fully understood how those binocular disparities are computed and integrated by the nervous system. On the one hand, single-unit recordings in the macaque give access to the neuronal encoding of disparity at a very local level. On the other hand, functional neuroimaging (fMRI) studies in humans shed light on the cortical networks involved in disparity processing at a macroscopic level, but in a different species. In this thesis, we propose to use an fMRI approach in the macaque to bridge the gap between single-unit and fMRI recordings conducted in the non-human and human primate brain, respectively, by allowing direct comparisons between the two species. More specifically, we focused on the temporal and spatial processing of binocular disparities at the cortical but also at the perceptual level. Investigating cortical activity in response to motion-in-depth, we could show for the first time that 1) there is a dedicated network in the macaque that comprises areas beyond the MT cluster and its surroundings and that 2) there are homologies with the human network involved in processing very similar stimuli.
In a second study, we tried to establish a link between perceptual biases that reflect statistical regularities in the three-dimensional visual environment and cortical activity, by investigating whether such biases exist and can be related to specific responses at a macroscopic level. We found stronger activity for the stimulus reflecting natural statistics in one subject, demonstrating a potential influence of spatial regularities on cortical activity. Further work is needed to firmly conclude about such a link. Nonetheless, we robustly confirmed the existence of a vast cortical network responding to correlated disparities in the macaque brain. Finally, we could measure for the first time retinal corresponding points on the vertical meridian of a macaque subject performing a behavioural task (forced-choice procedure), and compare them to data we also collected in several human observers with the very same protocol. In the discussion sections, we show how these findings open the door to varied perspectives.