Motion and disparity estimation with self adapted evolutionary strategy in 3D video coding
Real-world information obtained by humans is three-dimensional (3-D). In experimental user trials, subjective assessments have clearly demonstrated the increased impact of 3-D pictures compared with conventional flat-picture techniques. It is reasonable, therefore, that humans want an imaging system that produces pictures as natural and real as the things we see and experience every day. Three-dimensional imaging and, hence, 3-D television (3DTV) are very promising approaches expected to satisfy these desires. Integral imaging, which can capture true 3D color images with only one camera, has been seen as the right technology to offer stress-free viewing to audiences of more than one person. In this paper, we propose a novel approach that uses an Evolutionary Strategy (ES) for joint motion and disparity estimation to compress 3D integral video sequences. We propose to decompose the integral video sequence into viewpoint video sequences and jointly exploit motion and disparity redundancies to maximize compression using a self-adapted ES. A half-pixel refinement algorithm is then applied by interpolating macroblocks in the previous frame to further improve the video quality. Experimental results demonstrate that the proposed adaptable ES with half-pixel joint motion and disparity estimation can achieve up to 1.5 dB objective quality gain, without any additional computational cost, over our previous algorithm. Furthermore, the proposed technique achieves objective quality similar to that of the full search algorithm while reducing the computational cost by up to 90%.
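The half-pixel refinement step described above can be illustrated with a minimal sketch: starting from the best integer-pixel match, the eight surrounding half-pixel positions are evaluated on a bilinearly interpolated reference frame, and the one with the lowest sum of absolute differences (SAD) wins. This is a generic illustration of half-pel refinement, not the paper's actual implementation; all function names are invented here.

```python
import numpy as np

def bilinear_sample(frame, y, x):
    """Sample a frame at fractional coordinates (y, x) with bilinear weights."""
    y0, x0 = int(np.floor(y)), int(np.floor(x))
    dy, dx = y - y0, x - x0
    p = frame[y0:y0 + 2, x0:x0 + 2].astype(float)
    return ((1 - dy) * (1 - dx) * p[0, 0] + (1 - dy) * dx * p[0, 1]
            + dy * (1 - dx) * p[1, 0] + dy * dx * p[1, 1])

def half_pel_refine(ref, block, y, x):
    """Refine an integer-pel match at (y, x) by testing the 8 half-pixel
    offsets around it; return the offset with minimum SAD."""
    h, w = block.shape
    best, best_sad = (0.0, 0.0), np.abs(ref[y:y + h, x:x + w] - block).sum()
    for oy in (-0.5, 0.0, 0.5):
        for ox in (-0.5, 0.0, 0.5):
            if oy == 0.0 and ox == 0.0:
                continue  # integer-pel position already scored above
            pred = np.array([[bilinear_sample(ref, y + oy + i, x + ox + j)
                              for j in range(w)] for i in range(h)])
            sad = np.abs(pred - block).sum()
            if sad < best_sad:
                best, best_sad = (oy, ox), sad
    return best, best_sad
```

In a codec this search would run per macroblock after the ES delivers the integer-pel motion/disparity vector; here the exhaustive 8-neighbour test keeps the sketch short.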
Spatial prediction based on self-similarity compensation for 3D holoscopic image and video coding
WOS:000298962501022 (Web of Science Accession No.)
Holoscopic imaging, also known as integral imaging, provides a solution for glassless 3D and promises to change the market for 3D television. This paper first briefly describes the general concepts of holoscopic imaging, focusing mainly on the spatial correlations inherent to this new type of content, which arise from the micro-lens array used for both acquisition and display. The micro-images formed behind each micro-lens, from which only one pixel is viewed from a given observation point, have a high cross-correlation between them, which can be exploited for coding. A novel scheme for spatial prediction, exploiting the particular arrangement of holoscopic images, is proposed. The proposed scheme can be used for both still-image coding and intra coding of video. Experimental results based on an H.264/AVC video codec modified to handle 3D holoscopic images and video are presented, showing the superior performance of this approach.
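The self-similarity idea above amounts to block matching within the same image: because neighbouring micro-images are highly cross-correlated, a block can be predicted from an already-decoded region of the current frame. A minimal sketch of such a search follows (function name and the simplified causal window, restricted here to the upper-left region, are assumptions for illustration, not the paper's scheme):

```python
import numpy as np

def self_similarity_predict(img, by, bx, bs, search):
    """Find, within a causal window of the *same* image, the bs-by-bs block
    that best predicts the block at (by, bx), returning the prediction
    vector and its SAD. Only the region above/left of the current block is
    searched, as only previously coded samples are available at the decoder."""
    cur = img[by:by + bs, bx:bx + bs].astype(float)
    best_sad, best_vec = np.inf, (0, 0)
    for ry in range(max(0, by - search), by + 1):
        for rx in range(max(0, bx - search), bx + 1):
            if ry == by and rx == bx:
                continue  # skip the block itself
            cand = img[ry:ry + bs, rx:rx + bs].astype(float)
            sad = np.abs(cand - cur).sum()
            if sad < best_sad:
                best_sad, best_vec = sad, (ry - by, rx - bx)
    return best_vec, best_sad
```

On holoscopic content the micro-lens pitch makes matches at multiples of the micro-image size especially likely, which is what the prediction exploits.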
Cortical Dynamics of Navigation and Steering in Natural Scenes: Motion-Based Object Segmentation, Heading, and Obstacle Avoidance
Visually guided navigation through a cluttered natural scene is a challenging problem that animals and humans accomplish with ease. The ViSTARS neural model proposes how primates use motion information to segment objects and determine heading for purposes of goal approach and obstacle avoidance in response to video inputs from real and virtual environments. The model produces trajectories similar to those of human navigators. It does so by predicting how computationally complementary processes in cortical areas MT-/MSTv and MT+/MSTd compute object motion for tracking and self-motion for navigation, respectively. The model retina responds to transients in the input stream. Model V1 generates a local speed and direction estimate. This local motion estimate is ambiguous due to the neural aperture problem. Model MT+ interacts with MSTd via an attentive feedback loop to compute accurate heading estimates in MSTd that quantitatively simulate properties of human heading estimation data. Model MT- interacts with MSTv via an attentive feedback loop to compute accurate estimates of the speed, direction and position of moving objects. This object information is combined with heading information to produce steering decisions wherein goals behave like attractors and obstacles behave like repellers. These steering decisions lead to navigational trajectories that closely match human performance.
National Science Foundation (SBE-0354378, BCS-0235398); Office of Naval Research (N00014-01-1-0624); National Geospatial-Intelligence Agency (NMA201-01-1-2016)
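The attractor/repeller steering dynamic described above can be sketched in a few lines: the goal pulls the heading toward itself, while each obstacle pushes the heading away with a strength that decays with angular distance. This is a generic potential-field-style caricature, not the ViSTARS equations; the gains and decay constant are invented for illustration.

```python
import math

def wrap(a):
    """Wrap an angle to (-pi, pi]."""
    return math.atan2(math.sin(a), math.cos(a))

def steer(heading, goal, obstacles, k_goal=0.5, k_obs=1.0, sigma=0.5):
    """One heading update: the goal acts as an attractor, each obstacle as a
    repeller whose influence decays exponentially with angular distance."""
    turn = -k_goal * wrap(heading - goal)      # attraction toward the goal
    for ob in obstacles:
        d = wrap(heading - ob)
        # repulsion: push the heading further away from the obstacle,
        # strongest when the obstacle is nearly dead ahead
        turn += k_obs * d * math.exp(-abs(d) / sigma)
    return wrap(heading + turn)
```

Iterating this update produces trajectories that curve toward goals while bending around obstacles, the qualitative behaviour the model's steering stage exhibits.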
Dense light field coding: a survey
Light Field (LF) imaging is a promising solution for providing more immersive and closer to reality multimedia experiences to end-users with unprecedented creative freedom and flexibility for applications in different areas, such as virtual and augmented reality. Due to the recent technological advances in optics, sensor manufacturing and available transmission bandwidth, as well as the investment of many tech giants in this area, it is expected that soon many LF transmission systems will be available to both consumers and professionals. Recognizing this, novel standardization initiatives have recently emerged in both the Joint Photographic Experts Group (JPEG) and the Moving Picture Experts Group (MPEG), triggering the discussion on the deployment of LF coding solutions to efficiently handle the massive amount of data involved in such systems.
Since then, the topic of LF content coding has become a booming research area, attracting the attention of many researchers worldwide. In this context, this paper provides a comprehensive survey of the most relevant LF coding solutions proposed in the literature, focusing on angularly dense LFs. Special attention is placed on a thorough description of the different LF coding methods and on the main concepts related to this relevant area. Moreover, comprehensive insights are presented into open research challenges and future research directions for LF coding.
HEVC-based 3D holoscopic video coding using self-similarity compensated prediction
Holoscopic imaging, also known as integral, light field, and plenoptic imaging, is an appealing technology for glassless 3D video systems, which has recently emerged as a prospective candidate for future image and video applications, such as 3D television. However, to successfully introduce 3D holoscopic video applications into the market, adequate coding tools that can efficiently handle 3D holoscopic video are necessary. In this context, this paper discusses the requirements and challenges for 3D holoscopic video coding, and presents an efficient 3D holoscopic coding scheme based on High Efficiency Video Coding (HEVC). The proposed 3D holoscopic codec makes use of the self-similarity (SS) compensated prediction concept to efficiently explore the inherent correlation of the 3D holoscopic content in Intra- and Inter-coded frames, as well as a novel vector prediction scheme to take advantage of the peculiar characteristics of the SS prediction data. Extensive experiments were conducted, and have shown that the proposed solution is able to outperform HEVC as well as other coding solutions proposed in the literature. Moreover, a consistently better performance is also observed for a set of different quality metrics proposed in the literature for 3D holoscopic content, as well as for the visual quality of views synthesized from decompressed 3D holoscopic content.
Change blindness: eradication of gestalt strategies
Arrays of eight texture-defined rectangles were used as stimuli in a one-shot change blindness (CB) task in which there was a 50% chance that one rectangle would change orientation between two successive presentations separated by an interval. CB was eliminated by cueing the target rectangle in the first stimulus, reduced by cueing in the interval, and unaffected by cueing in the second presentation. This supports the idea that a representation was formed that persisted through the interval before being 'overwritten' by the second presentation (Landman et al., 2003, Vision Research 43, 149-164). Another possibility is that participants used some kind of grouping or Gestalt strategy. To test this we changed the spatial position of the rectangles in the second presentation by shifting them along imaginary spokes (by ±1 degree) emanating from the central fixation point. There was no significant difference in performance between this and the standard task [F(1,4)=2.565, p=0.185]. This may suggest two things: (i) Gestalt grouping is not used as a strategy in these tasks, and (ii) further weight is given to the argument that objects may be stored in and retrieved from a pre-attentional store during this task.
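The spoke manipulation above is a radial displacement: each rectangle is moved along the line joining it to the fixation point, its eccentricity changing by ±1 degree while its direction from fixation is preserved. A minimal sketch of that geometry (the function name and the assumption that coordinates are already in degrees of visual angle are illustrative, not from the paper):

```python
import math

def shift_along_spoke(px, py, fx, fy, delta):
    """Move point (px, py) radially along the spoke from fixation (fx, fy),
    changing its eccentricity by `delta` while keeping its direction."""
    dx, dy = px - fx, py - fy
    r = math.hypot(dx, dy)          # current eccentricity
    scale = (r + delta) / r         # new eccentricity / old eccentricity
    return (fx + dx * scale, fy + dy * scale)
```

Because only eccentricity changes, the angular layout of the eight rectangles around fixation, and hence any Gestalt grouping based on that layout, is disrupted as little as possible.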
Computational intelligence approaches to robotics, automation, and control [Volume guest editors]
No abstract available
Stereo Pictorial Structure for 2D Articulated Human Pose Estimation
In this paper, we consider the problem of 2D human pose estimation on stereo image pairs. In particular, we aim at estimating the location, orientation and scale of upper-body parts of people detected in stereo image pairs from realistic stereo videos that can be found on the Internet. To address this task, we propose a novel pictorial structure model to exploit the stereo information included in such stereo image pairs: the Stereo Pictorial Structure (SPS). To validate our proposed model, we contribute a new annotated dataset of stereo image pairs, the Stereo Human Pose Estimation Dataset (SHPED), obtained from YouTube stereoscopic video sequences, depicting people in challenging poses and diverse indoor and outdoor scenarios. The experimental results on SHPED indicate that SPS improves on state-of-the-art monocular models thanks to the appropriate use of the stereo information.
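A pictorial structure scores a configuration of body parts by per-part appearance costs plus pairwise deformation costs, and on a chain (or tree) of parts the minimizing configuration is found exactly by dynamic programming. The sketch below shows that inference on a plain chain; it is a generic illustration and does not reproduce the paper's stereo-aware unary terms, whose design is the SPS contribution.

```python
def best_chain_config(unary, pairwise):
    """Viterbi-style inference for a chain pictorial structure: choose one
    state per part minimizing the sum of appearance costs unary[i][s] and
    deformation costs pairwise(i, s, t) between consecutive parts."""
    n, S = len(unary), len(unary[0])
    cost = list(unary[0])            # best cost of a prefix ending in each state
    back = []                        # back-pointers, one table per transition
    for i in range(1, n):
        new_cost, bp = [], []
        for t in range(S):
            s = min(range(S), key=lambda s: cost[s] + pairwise(i, s, t))
            bp.append(s)
            new_cost.append(cost[s] + pairwise(i, s, t) + unary[i][t])
        cost, back = new_cost, back + [bp]
    t = min(range(S), key=lambda t: cost[t])
    states = [t]
    for bp in reversed(back):        # walk back-pointers from last part to first
        states.append(bp[states[-1]])
    states.reverse()
    return states, cost[t]
```

In a stereo setting the unary term for a part can additionally reward left/right image evidence that is consistent under the estimated disparity, which is the kind of information SPS exploits over monocular models.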
Hand gesture recognition using deep learning neural networks
This thesis was submitted for the award of Doctor of Philosophy and was awarded by Brunel University London.
Human-Computer Interaction (HCI) is a broad field involving different types of interaction, including gestures. Gesture recognition concerns non-verbal motions used as a means of communication in HCI. A system may be utilised to identify human gestures and convey information for device control. This represents a significant field within HCI involving device interfaces and users. The aim of gesture recognition is to record gestures that are formed in a certain way and then detected by a device such as a camera. Hand gestures can be used as a form of communication for many different applications. They may be used by people with different disabilities, including those with hearing impairments, speech impairments and stroke patients, to communicate and fulfil their basic needs.
Various studies have previously been conducted on hand gestures, some of which proposed different techniques to implement hand-gesture experiments. For image processing there are multiple tools to extract image features, and Artificial Intelligence offers varied classifiers for different types of data. 2D and 3D hand gestures require an effective algorithm to extract images and classify various mini-gestures and movements. This research addresses this issue using different algorithms. To detect 2D or 3D hand gestures, this research used image-processing tools such as Wavelet Transforms (WT) and Empirical Mode Decomposition (EMD) to extract image features, with Artificial Neural Network (ANN) and Convolutional Neural Network (CNN) classifiers to train and classify the data. These methods were examined in terms of multiple parameters, such as execution time, accuracy, sensitivity, specificity, positive predictive value, negative predictive value, positive likelihood, negative likelihood, receiver operating characteristic (ROC), area under the ROC curve and root mean square. This research presents four original contributions in the field of hand gestures. The first contribution is an implementation of two experiments using 2D hand-gesture video, where ten different gestures are detected at short and long distances using an iPhone 6 Plus with 4K resolution; the experiments use WT and EMD for feature extraction and ANN and CNN for classification. The second contribution comprises 3D hand-gesture video experiments, where twelve gestures are recorded using a holoscopic imaging system camera. The third contribution pertains to experimental work carried out to detect seven common hand gestures. Finally, disparity experiments were performed using the left and right 3D hand-gesture videos to discover disparities. The comparison shows the accuracy of CNN to be 100%, higher than that of the other techniques. CNN is clearly the most appropriate method to be used in a hand gesture system.
Imam Abdulrahman bin Faisal University
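Most of the evaluation parameters listed above derive directly from the confusion-matrix counts of a one-vs-rest classification for a given gesture class. A minimal sketch of those standard definitions (the function name is illustrative; this is not code from the thesis):

```python
def binary_metrics(tp, fp, tn, fn):
    """Standard evaluation metrics from confusion-matrix counts:
    tp/fp/tn/fn = true/false positives and negatives for one class."""
    sensitivity = tp / (tp + fn)              # true positive rate (recall)
    specificity = tn / (tn + fp)              # true negative rate
    ppv = tp / (tp + fp)                      # positive predictive value
    npv = tn / (tn + fn)                      # negative predictive value
    lr_pos = sensitivity / (1 - specificity)  # positive likelihood ratio
    lr_neg = (1 - sensitivity) / specificity  # negative likelihood ratio
    accuracy = (tp + tn) / (tp + fp + tn + fn)
    return {"sensitivity": sensitivity, "specificity": specificity,
            "ppv": ppv, "npv": npv, "lr_pos": lr_pos, "lr_neg": lr_neg,
            "accuracy": accuracy}
```

Sweeping the decision threshold of a classifier and plotting sensitivity against (1 - specificity) yields the ROC curve, whose area is the AUC also reported in the thesis.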
Spatial and temporal integration of binocular disparity in the primate brain
The primate visual system strongly relies on the small differences between the two retinal projections to perceive depth. However, it is not fully understood how those binocular disparities are computed and integrated by the nervous system. On the one hand, single-unit recordings in the macaque give access to the neuronal encoding of disparity at a very local level. On the other hand, functional neuroimaging (fMRI) studies in humans shed light on the cortical networks involved in disparity processing at a macroscopic level, but in a different species. In this thesis, we propose to use an fMRI approach in the macaque to bridge the gap between single-unit and fMRI recordings conducted in the non-human and human primate brain, respectively, by allowing direct comparisons between the two species. More specifically, we focused on the temporal and spatial processing of binocular disparities at the cortical but also at the perceptual level. Investigating cortical activity in response to motion-in-depth, we could show for the first time that 1) there is a dedicated network in the macaque that comprises areas beyond the MT cluster and its surroundings and that 2) there are homologies with the human network involved in processing very similar stimuli.
In a second study, we tried to establish a link between perceptual biases that reflect statistical regularities in the three-dimensional visual environment and cortical activity, by investigating whether such biases exist and can be related to specific responses at a macroscopic level. We found stronger activity for the stimulus reflecting natural statistics in one subject, demonstrating a potential influence of spatial regularities on cortical activity. Further work is needed to firmly conclude about such a link. Nonetheless, we robustly confirmed the existence of a vast cortical network responding to correlated disparities in the macaque brain. Finally, we could measure for the first time retinal corresponding points on the vertical meridian of a macaque subject performing a behavioural task (forced-choice procedure), and compare them to data we also collected in several human observers with the very same protocol. In the discussion sections, we show how these findings open the door to varied perspectives.