Depth map compression via 3D region-based representation
In 3D video, view synthesis is used to create new virtual views between
encoded camera views. Errors in the coding of the depth maps introduce
geometry inconsistencies in synthesized views. In this paper, a new 3D plane
representation of the scene is presented which improves the performance of
current standard video codecs in the view synthesis domain. Two image segmentation
algorithms are proposed for generating a color and depth segmentation.
Using both partitions, depth maps are segmented into regions that are free of
sharp discontinuities, without having to explicitly signal all depth edges. The
resulting regions are represented using a planar model in the 3D world scene.
This 3D representation allows an efficient encoding while preserving the 3D
characteristics of the scene. The 3D planes open up the possibility of coding
multiview images with a single representation.
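A planar region model of this kind can be sketched as a per-region least-squares fit. The function below is a hypothetical illustration with synthetic data, not the paper's codec:

```python
import numpy as np

# Hypothetical sketch of the planar region model: fit z = a*x + b*y + c to
# the depth samples of one segmented region by least squares. The function
# name and the synthetic region are ours, not taken from the paper.
def fit_plane(xs, ys, zs):
    """Least-squares plane fit; returns coefficients (a, b, c)."""
    A = np.column_stack([xs, ys, np.ones_like(xs)])
    coeffs, *_ = np.linalg.lstsq(A, zs, rcond=None)
    return coeffs

# Synthetic region whose depth really is planar: z = 2x - y + 5.
xs = np.array([0.0, 1.0, 0.0, 1.0, 0.5])
ys = np.array([0.0, 0.0, 1.0, 1.0, 0.5])
zs = 2.0 * xs - 1.0 * ys + 5.0
a, b, c = fit_plane(xs, ys, zs)
print(round(a, 3), round(b, 3), round(c, 3))  # → 2.0 -1.0 5.0
```

Encoding three plane coefficients per region, rather than per-pixel depth, is what makes the representation cheap while preserving scene geometry.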
Efficient Debanding Filtering for Inverse Tone Mapped High Dynamic Range Videos
Object detection and activity recognition in digital image and video libraries
This thesis is a comprehensive study of object-based image and video retrieval, specifically for car and human detection and activity recognition. It focuses on the problem of connecting low-level features to high-level semantics by developing relational object and activity representations. With the rapid growth of multimedia information in the form of digital image and video libraries, there is an increasing need for intelligent database management tools. Traditional text-based query systems, which rely on a manual annotation process, are impractical for today's large libraries, which require an efficient information retrieval system. For this purpose, a hierarchical information retrieval system is proposed in which the shape, color and motion characteristics of objects of interest are captured in compressed and uncompressed domains. The proposed retrieval method provides object detection and activity recognition at different resolution levels, from low complexity to low false rates.
The thesis first examines the extraction of low-level features from images and videos using the intensity, color and motion of pixels and blocks. Local consistency based on these features and the geometrical characteristics of the regions is used to group object parts. The problem of managing the segmentation process is solved by a new approach that uses object-based knowledge to group the regions according to a global consistency. A new model-based segmentation algorithm is introduced that uses feedback from the relational representation of the object. The selected unary and binary attributes are further extended for application-specific algorithms. Object detection is achieved by matching the relational graphs of objects with the reference model. The major advantages of the algorithm are that it improves object extraction by reducing the dependence on the low-level segmentation process and that it combines boundary and region properties.
The thesis then addresses the problem of object detection and activity recognition in the compressed domain in order to reduce computational complexity. New algorithms for object detection and activity recognition in JPEG images and MPEG videos are developed. It is shown that significant information can be obtained from the compressed domain in order to connect to high-level semantics. Since our aim is to retrieve information from images and videos compressed with standard algorithms such as JPEG and MPEG, our approach differs from previous compressed-domain object detection techniques, in which the compression algorithms are governed by the characteristics of the objects of interest to be retrieved. An algorithm is developed that uses principal component analysis of MPEG motion vectors to detect human activities, namely walking, running, and kicking. Object detection in JPEG compressed still images and MPEG I-frames is achieved by using the DC-DCT coefficients of the luminance and chrominance values in the graph-based object detection algorithm. The thesis finally addresses the problem of object detection in lower-resolution and monochrome images. Specifically, it is demonstrated that the structural information of human silhouettes can be captured from AC-DCT coefficients.
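The principal-component idea behind the motion-vector activity classifier can be sketched as follows; the random data and nearest-centroid rule here are synthetic stand-ins, not the thesis's actual MPEG pipeline:

```python
import numpy as np

# Hedged sketch: project motion-vector fields onto their leading principal
# components and classify by the nearest class mean in that subspace.
# The two "activities" below are synthetic stand-ins for MPEG motion data.
rng = np.random.default_rng(0)

def pca_basis(X, k):
    """Return the mean and top-k principal directions of the rows of X."""
    mu = X.mean(axis=0)
    _, _, vt = np.linalg.svd(X - mu, full_matrices=False)
    return mu, vt[:k]

# Two synthetic activities with different dominant motion patterns.
walk = rng.normal(0, 0.1, (20, 8)) + np.array([1, 0, 1, 0, 1, 0, 1, 0])
run = rng.normal(0, 0.1, (20, 8)) + np.array([0, 2, 0, 2, 0, 2, 0, 2])
X = np.vstack([walk, run])
mu, basis = pca_basis(X, 2)

def project(v):
    return (v - mu) @ basis.T

centroids = [project(walk).mean(axis=0), project(run).mean(axis=0)]

def classify(v):
    d = [np.linalg.norm(project(v) - c) for c in centroids]
    return ["walking", "running"][int(np.argmin(d))]

print(classify(np.array([1, 0, 1, 0, 1, 0, 1, 0.1])))  # → walking
```

The appeal of working from motion vectors is that they come for free from the MPEG bitstream, so no decoding to pixels is needed before classification.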
Hashmod: A Hashing Method for Scalable 3D Object Detection
We present a scalable method for detecting objects and estimating their 3D
poses in RGB-D data. To this end, we rely on an efficient representation of
object views and employ hashing techniques to match these views against the
input frame in a scalable way. While a similar approach already exists for 2D
detection, we show how to extend it to estimate the 3D pose of the detected
objects. In particular, we explore different hashing strategies and identify
the one which is more suitable to our problem. We show empirically that the
complexity of our method is sublinear with the number of objects and we enable
detection and pose estimation of many 3D objects with high accuracy while
outperforming the state-of-the-art in terms of runtime.
Comment: BMVC 201
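The hashing idea, matching a query only against the bucket that shares its hash key rather than against every stored view, can be sketched as follows; the key length and random binary descriptors are assumptions for illustration, not the paper's actual scheme:

```python
import numpy as np

# Illustrative sketch of hashing-based view matching: binary view
# descriptors are indexed by a short prefix of bits, so a query compares
# against one bucket instead of all stored views. This is the source of
# the sublinear lookup cost; the descriptors here are random stand-ins.
rng = np.random.default_rng(1)
KEY_BITS = 8  # number of bits used as the hash key (an assumption)

def key(desc):
    return tuple(desc[:KEY_BITS])

# Store many random binary "object view" descriptors in a hash table.
views = rng.integers(0, 2, (1000, 32))
table = {}
for i, d in enumerate(views):
    table.setdefault(key(d), []).append(i)

def query(desc):
    """Return the stored index with the smallest Hamming distance in the bucket."""
    bucket = table.get(key(desc), [])
    if not bucket:
        return None
    return min(bucket, key=lambda i: int(np.sum(views[i] != desc)))

probe = views[42].copy()
probe[-1] ^= 1  # flip one trailing bit; the hash key is unchanged
print(query(probe))
```

With 1000 views spread over 256 buckets, each query touches only a handful of candidates, which is the scalability property the abstract claims.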
Detecting Faces in Impoverished Images
The ability to detect faces in images is of critical ecological significance. It is a pre-requisite for other important face perception tasks such as person identification, gender classification and affect analysis. Here we address the question of how the visual system classifies images into face and non-face patterns. We focus on face detection in impoverished images, which allow us to explore information thresholds required for different levels of performance. Our experimental results provide lower bounds on image resolution needed for reliable discrimination between face and non-face patterns and help characterize the nature of facial representations used by the visual system under degraded viewing conditions. Specifically, they enable an evaluation of the contribution of luminance contrast, image orientation and local context on face-detection performance.
Using Local Context To Improve Face Detection
Most face detection algorithms locate faces by classifying the content of a detection window while iterating over all positions and scales of the input image. Recent developments have accelerated this process to real-time performance at high levels of accuracy. However, even the best of today's computational systems are far from being able to compete with the detection capabilities of the human visual system. Psychophysical experiments have shown the importance of local context in the face detection process. In this paper we investigate the role of local context for face detection algorithms. In experiments on two large data sets we find that using local context can significantly increase the number of correct detections, particularly in cases of low resolution, uncommon poses or individual appearances, as well as occlusions.
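The sliding-window loop with an enlarged context window can be sketched as below; the brightness-based score is a toy stand-in for a trained face classifier, and all parameter values are assumptions:

```python
import numpy as np

# Illustrative sliding-window detection loop with a "local context"
# margin around each window, as the paper motivates. The mean-brightness
# score is a toy stand-in for a real face classifier.
def detect(img, win=4, context=2, step=2, thresh=0.5):
    """Scan all positions; score each window extended by `context` pixels."""
    hits = []
    h, w = img.shape
    for y in range(0, h - win + 1, step):
        for x in range(0, w - win + 1, step):
            y0, x0 = max(0, y - context), max(0, x - context)
            patch = img[y0:y + win + context, x0:x + win + context]
            if patch.mean() > thresh:  # stand-in classifier score
                hits.append((y, x))
    return hits

img = np.zeros((12, 12))
img[4:10, 4:10] = 1.0  # a bright "face" region
hits = detect(img)
print(hits)
```

The point of the `context` margin is that pixels just outside the face (hair, shoulders) carry evidence the bare window misses, which is especially useful at low resolutions.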
Survey of Techniques for Producing Blended Images: A Case Study Using Rollins College Archives
The Rollins College Archives are a treasure trove of historic resources relating to the college’s history, and they are often underutilized or overlooked by both the student body and the surrounding community. In particular, historic resources are all too often excluded from work in computer science and related fields. This project aims to bridge that gap by bringing the two areas together. To that end, the goal of this project is to merge past and present by blending historic photos with images of a present-day scene in order to reveal changes and juxtapositions of the same scene across eras. This research explores the possibility of accomplishing this principally through computational means. In order to achieve this, we delve into the domain of computer vision, utilizing techniques in feature detection and matching in order to ultimately blend images in novel ways. Image blending is a technique often used for the creation of unique images, or for emphasizing a contrast between two scenes through their convergence. Whether the blend is produced through masks with alpha values, seam carving, or other techniques, most implementations require a great deal of manual input, whether that entails point selection, mask generation, or setting an alpha value. In this project, we identify recognizable regions and features on a given image. We then use these to identify similar regions and features in a second image. Any matches found are then filtered, and the bad or incorrect matches are removed. The remaining matches are used to compute the difference in perspectives between the two images, and the coordinates of the matching points are used to warp the images into the same perspective. We explore various approaches to the problem of feature matching, including built-in library functions, as well as a region-based template-matching algorithm.
We also investigate techniques in image blending, such as automatic mask generation, Laplacian pyramid blending, and various off-the-shelf tools contained within Unity, and we test the application of our findings to 360-degree images.
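Laplacian pyramid blending, one of the techniques surveyed, can be sketched in a few lines; the box-filter down/upsampling below is a simplification of the usual Gaussian kernels, and the images are synthetic:

```python
import numpy as np

# Hedged sketch of Laplacian-pyramid blending: decompose both images into
# band-pass levels, mix each level with a downsampled mask, reconstruct.
# A 2x2 box filter stands in for the usual Gaussian smoothing.
def down(img):
    return img.reshape(img.shape[0] // 2, 2, img.shape[1] // 2, 2).mean(axis=(1, 3))

def up(img):
    return np.repeat(np.repeat(img, 2, axis=0), 2, axis=1)

def laplacian_pyramid(img, levels):
    pyr = []
    for _ in range(levels):
        small = down(img)
        pyr.append(img - up(small))  # band-pass residual at this level
        img = small
    pyr.append(img)  # low-pass base
    return pyr

def blend(a, b, mask, levels=3):
    pa, pb = laplacian_pyramid(a, levels), laplacian_pyramid(b, levels)
    masks = [mask]
    for _ in range(levels):
        masks.append(down(masks[-1]))
    out = masks[-1] * pa[-1] + (1 - masks[-1]) * pb[-1]
    for la, lb, m in zip(reversed(pa[:-1]), reversed(pb[:-1]), reversed(masks[:-1])):
        out = up(out) + m * la + (1 - m) * lb
    return out

a = np.ones((16, 16))   # bright image
b = np.zeros((16, 16))  # dark image
mask = np.zeros((16, 16)); mask[:, :8] = 1.0  # take the left half from a
res = blend(a, b, mask)
print(res.shape, round(float(res[0, 0]), 2), round(float(res[0, 15]), 2))  # → (16, 16) 1.0 0.0
```

Blending per frequency band is what hides the seam: low frequencies transition gradually across the mask boundary while fine detail stays sharp, which is why this technique needs far less manual mask tuning than a single alpha blend.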