75 research outputs found
Object tracking and matting for a class of dynamic image-based representations
Image-based rendering (IBR) is an emerging technology for photo-realistic rendering of scenes from a collection of densely sampled images and videos. Recently, an object-based approach was proposed for a class of dynamic image-based representations called plenoptic videos. This paper proposes an automatic object-tracking approach using the level-set method. Our tracking method, which utilizes both local and global features of the image sequences instead of only the global features exploited in previous approaches, achieves better tracking results, especially for objects with non-uniform energy distribution. Because of possible segmentation errors around object boundaries, natural matting with a Bayesian approach is also incorporated into our system. Furthermore, an MPEG-4-like object-based algorithm is developed for compressing the plenoptic videos, which consist of the alpha maps, depth maps, and textures of the segmented image-based objects from different plenoptic video streams. Experimental results show that satisfactory renderings can be obtained by the proposed approaches. © 2005 IEEE.
Object-based 2D-to-3D video conversion for effective stereoscopic content generation in 3D-TV applications
Three-dimensional television (3D-TV) has gained increasing popularity in the broadcasting domain, as it enables enhanced viewing experiences in comparison to conventional two-dimensional (2D) TV. However, its application has been constrained by the lack of essential content, i.e., stereoscopic videos. To alleviate this content shortage, an economical and practical solution is to reuse the huge media resources available in monoscopic 2D and convert them to stereoscopic 3D. Although stereoscopic video can be generated from monoscopic sequences using depth measurements extracted from cues such as focus blur, motion, and size, the quality of the resulting video may be poor, as such measurements are usually arbitrarily defined and inconsistent with the real scene. To help solve this problem, a novel method for object-based stereoscopic video generation is proposed, featuring i) optical-flow-based occlusion reasoning to determine depth ordinals, ii) object segmentation using improved region growing from masks of the determined depth layers, and iii) a hybrid depth-estimation scheme using content-based matching (against a small library of true stereo image pairs) and depth-ordinal-based regularization. Comprehensive experiments have validated the effectiveness of the proposed 2D-to-3D conversion method in generating stereoscopic videos with consistent depth measurements for 3D-TV applications.
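The occlusion reasoning in step i) can be illustrated with a minimal forward-backward flow consistency check, a standard technique for flagging occluded pixels; the function and the toy flow fields below are illustrative assumptions, not the paper's implementation.

```python
import numpy as np

def occlusion_mask(flow_fw, flow_bw, tol=0.5):
    """Flag pixels whose forward flow is not undone by the backward flow.

    flow_fw, flow_bw: (H, W, 2) arrays of (dx, dy) displacements.
    A pixel p is marked occluded when flow_fw(p) + flow_bw(p + flow_fw(p))
    has magnitude above `tol`, the classic forward-backward check.
    """
    h, w = flow_fw.shape[:2]
    ys, xs = np.mgrid[0:h, 0:w]
    # Destination of each pixel under the forward flow (nearest neighbour).
    xd = np.clip(np.round(xs + flow_fw[..., 0]).astype(int), 0, w - 1)
    yd = np.clip(np.round(ys + flow_fw[..., 1]).astype(int), 0, h - 1)
    residual = flow_fw + flow_bw[yd, xd]   # near zero for visible pixels
    return np.linalg.norm(residual, axis=-1) > tol

# Toy scene: a 3-pixel-wide object (columns 2-4) moves right by 2 pixels,
# covering background columns 5-6, which therefore have no consistent match.
h, w = 8, 8
fw = np.zeros((h, w, 2)); bw = np.zeros((h, w, 2))
fw[:, 2:5, 0] = 2.0     # forward flow of the object
bw[:, 4:7, 0] = -2.0    # backward flow of the object in the next frame
mask = occlusion_mask(fw, bw)
```

Pixels flagged in `mask` can then be assigned to the farther depth layer, which is the essence of using occlusion to determine depth ordering.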
A multi-camera approach to image-based rendering and 3-D/multiview display of ancient Chinese artifacts
Object-based coding for plenoptic videos
A new object-based coding system for a class of dynamic image-based representations called plenoptic videos (PVs) is proposed. PVs are simplified dynamic light fields, where the videos are taken at regularly spaced locations along line segments instead of a 2-D plane. In the proposed object-based approach, objects at different depth values are segmented to improve the rendering quality. By encoding PVs at the object level, desirable functionalities such as scalability of contents, error resilience, and interactivity with an individual image-based rendering (IBR) object can be achieved. Besides supporting the coding of texture and binary shape maps for IBR objects with arbitrary shapes, the proposed system also supports the coding of grayscale alpha maps as well as depth maps (geometry information) to respectively facilitate the matting and rendering of the IBR objects. Both temporal and spatial redundancies among the streams in the PV are exploited to improve the coding performance, while avoiding excessive complexity in selective decoding of PVs to support fast rendering speed. Advanced spatial/temporal prediction methods such as global disparity-compensated prediction, as well as direct prediction and its extensions are developed. The bit allocation and rate control scheme employing a new convex optimization-based approach are also introduced. Experimental results show that considerable improvements in coding performance are obtained for both synthetic and real scenes, while supporting the stated object-based functionalities. © 2006 IEEE.
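The idea of global disparity-compensated prediction can be sketched as estimating a single horizontal shift that best aligns one camera view with its neighbour, then predicting one from the other; this is a simplified illustration, not the paper's codec, and the names and parameters are assumptions.

```python
import numpy as np

def global_disparity(ref, cur, max_d=8):
    """Estimate one global horizontal shift (in pixels) aligning the
    current view with the reference view, by exhaustive SAD search."""
    best_d, best_sad = 0, np.inf
    w = ref.shape[1]
    for d in range(-max_d, max_d + 1):
        lo, hi = max(0, d), min(w, w + d)          # overlapping columns
        sad = np.abs(cur[:, lo:hi] - ref[:, lo - d:hi - d]).sum()
        sad /= (hi - lo)                           # normalise by overlap
        if sad < best_sad:
            best_sad, best_d = sad, d
    return best_d

# Synthetic pair: the second view is the first shifted right by 3 pixels.
rng = np.random.default_rng(1)
ref = rng.random((16, 32))
cur = np.roll(ref, 3, axis=1)
d = global_disparity(ref, cur)
```

In a codec, the recovered shift would be applied to the reference stream and only the (small) prediction residual of the neighbouring stream encoded.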
Image-based rendering and synthesis
Multiview imaging (MVI) is currently an active research focus, as it has a wide range of applications and opens up research in further topics, including virtual view synthesis for three-dimensional (3D) television (3DTV) and entertainment. However, multiview systems require a large amount of storage and are difficult to construct. Image-based rendering (IBR) is the concept behind visualizing 3D scenes and objects in a realistic way without full 3D model reconstruction. Using images as the primary substrate, IBR has many potential applications, including video games, virtual travel, and others. The technique creates new views of scenes reconstructed from a collection of densely sampled images or videos. IBR approaches fall into different classes: some assume known 3D models and lighting conditions and render with conventional graphics techniques; others, such as light-field or lumigraph rendering, depend on dense sampling, with little or no geometry, to render without recovering exact 3D models.
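The light-field end of this spectrum can be sketched as rendering a virtual viewpoint by blending the nearest captured views along the camera line, with no geometry recovered; this is a minimal illustration under assumed names, not a faithful lumigraph renderer.

```python
import numpy as np

def interpolate_view(views, positions, t):
    """Render a virtual view at camera coordinate t on the line segment
    by linearly blending the two nearest captured views.

    views: list of (H, W) images; positions: sorted 1-D camera coordinates.
    """
    positions = np.asarray(positions, dtype=float)
    if t <= positions[0]:
        return views[0]
    if t >= positions[-1]:
        return views[-1]
    i = np.searchsorted(positions, t) - 1          # left neighbour
    a = (t - positions[i]) / (positions[i + 1] - positions[i])
    return (1 - a) * views[i] + a * views[i + 1]

# Two captured views; render a virtual camera a quarter of the way along.
v_left = np.zeros((4, 4))     # image captured at position 0.0
v_right = np.ones((4, 4))     # image captured at position 1.0
virtual = interpolate_view([v_left, v_right], [0.0, 1.0], 0.25)
```

Dense sampling matters precisely because this blend only looks right when adjacent views are close enough that their parallax is negligible.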
An object-based compression system for a class of dynamic image-based representations
SPIE Conference on Visual Communications and Image Processing, Beijing, China, 12-15 July 2005. This paper proposes a new object-based compression system for a class of dynamic image-based representations called plenoptic videos (PVs). PVs are simplified dynamic light fields, where the videos are taken at regularly spaced locations along line segments instead of a 2-D plane. The proposed system employs an object-based approach, where objects at different depth values are segmented to improve the rendering quality, as in pop-up light fields. Furthermore, by coding the plenoptic video at the object level, desirable functionalities such as scalability of contents, error resilience, and interactivity with individual IBR objects can be achieved. Besides supporting the coding of texture and binary shape maps for IBR objects with arbitrary shapes, the proposed system also supports the coding of gray-scale alpha maps as well as geometry information in the form of depth maps to respectively facilitate the matting and rendering of the IBR objects. To improve the coding performance, the proposed compression system exploits both the temporal and spatial redundancy among the video object streams in the PV by employing disparity-compensated prediction or spatial prediction in its texture, shape, and depth coding processes. To demonstrate the principle and effectiveness of the proposed system, a multiple-video-camera system was built; experimental results show that considerable improvements in coding performance are obtained for both synthetic and real scenes, while supporting the stated object-based functionalities.
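The flavour of allocating bits across multiple object streams can be shown with the classic convex rate-distortion model D_i(R_i) = a_i * 2**(-2*R_i); the model and function below are assumptions for illustration only, and the paper's actual convex-optimization scheme is not reproduced here.

```python
import numpy as np

def allocate_bits(a, r_total):
    """Rates for N streams minimising total distortion under the model
    D_i(R_i) = a_i * 2**(-2 * R_i), subject to sum(R_i) = r_total.

    Setting the marginal distortions equal gives the closed-form solution
    R_i = r_total/N + 0.5 * (log2(a_i) - mean(log2(a))).
    Negative rates are not clamped in this sketch.
    """
    a = np.asarray(a, dtype=float)
    log_a = np.log2(a)
    return r_total / len(a) + 0.5 * (log_a - log_a.mean())

# A "hard" stream (a=4) receives more bits than an "easy" one (a=1),
# and at the optimum both streams end up with equal distortion.
rates = allocate_bits([1.0, 4.0], r_total=2.0)
```

The equal-marginal-distortion condition is exactly what a Lagrangian or convex solver enforces in practice when no closed form is available.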
Improving motion-compensation methods for moving dynamic objects in the video stream of a videoconferencing system
Videoconferencing gives us the opportunity to work and communicate in real time, as well as to use collective applications and interactive information exchange. Videoconferencing systems are among the basic components of command-and-control organization, ensuring the timeliness and the necessary quality of control over the execution of assigned tasks. The image quality and transmission time of video information are currently unsatisfactory for effective command and control of troops. Ways to increase the efficiency of command and operational activity are considered, based on motion-compensation methods that reduce the volume of video data in order to improve quality.
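Block-based motion compensation of the kind discussed above can be sketched with the classic three-step search, which finds a block's motion vector with far fewer SAD evaluations than exhaustive search; this is an illustrative implementation under assumed names, not the system described.

```python
import numpy as np

def three_step_search(ref, cur, by, bx, bs=8, step=4):
    """Motion vector of the bs x bs block at (by, bx) in `cur`, found by
    the classic three-step search over `ref`, minimising SAD."""
    block = cur[by:by + bs, bx:bx + bs]
    my = mx = 0
    h, w = ref.shape
    while step >= 1:
        best = None
        for dy in (-step, 0, step):
            for dx in (-step, 0, step):
                y, x = by + my + dy, bx + mx + dx
                if 0 <= y <= h - bs and 0 <= x <= w - bs:
                    sad = np.abs(ref[y:y + bs, x:x + bs] - block).sum()
                    if best is None or sad < best[0]:
                        best = (sad, dy, dx)
        my += best[1]; mx += best[2]
        step //= 2          # refine: 4 -> 2 -> 1
    return my, mx

# Smooth synthetic frame and a copy shifted by (dy, dx) = (3, 2).
yy, xx = np.mgrid[0:32, 0:32].astype(float)
ref = np.exp(-((yy - 16) ** 2 + (xx - 16) ** 2) / 40.0)   # smooth blob
cur = np.zeros_like(ref)
cur[:29, :30] = ref[3:, 2:]
mv = three_step_search(ref, cur, 10, 10)
```

Only the motion vectors and the (small) prediction residuals need to be transmitted, which is how such methods shrink the video data volume.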
Semi-automatic video object extraction using motion-estimation-based alpha matting
Object extraction is an important task in video editing applications, because independent objects are required for compositing. The extraction is performed by image matting: manual scribbles are first defined to mark the foreground and background regions, and the unknown region is then resolved by alpha estimation.
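The role of alpha estimation can be illustrated with the compositing equation I = alpha*F + (1 - alpha)*B; assuming known foreground and background colours per pixel (a strong simplification of real matting, which estimates F and B jointly with alpha), alpha follows in closed form:

```python
import numpy as np

def alpha_from_composite(img, fg, bg, eps=1e-8):
    """Per-pixel alpha from I = a*F + (1-a)*B, given known F and B.

    img, fg, bg: (H, W, 3) float arrays. Projects (I - B) onto (F - B),
    which is the least-squares alpha over the three colour channels.
    """
    diff = fg - bg
    a = ((img - bg) * diff).sum(-1) / ((diff * diff).sum(-1) + eps)
    return np.clip(a, 0.0, 1.0)

# A pixel that is 30% foreground everywhere recovers alpha = 0.3.
h, w = 4, 4
fg = np.ones((h, w, 3))
bg = np.zeros((h, w, 3))
img = 0.3 * fg + 0.7 * bg
alpha = alpha_from_composite(img, fg, bg)
```

The hard part of matting is exactly that F and B are unknown in the unknown region, which is why scribbles and priors are needed.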
Image matting faces two problems: pixels in the unknown region do not belong unambiguously to the foreground or the background, and, in the temporal domain, scribbles cannot be defined independently for every frame. To overcome these problems, an object-extraction method is proposed with three stages: adaptive-threshold estimation for alpha matting, accuracy improvement for image matting, and temporal-constraint estimation for scribble propagation. The Fuzzy C-Means (FCM) and Otsu algorithms are applied for adaptive-threshold estimation.
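Otsu's method, one of the two adaptive-threshold estimators named above, picks the threshold that maximizes the between-class variance of the intensity histogram; the following is a generic sketch of the algorithm, not the thesis code.

```python
import numpy as np

def otsu_threshold(gray, bins=256):
    """Otsu's method: choose the threshold maximising between-class
    variance of the histogram (e.g. to split an alpha map into
    foreground and background)."""
    hist, edges = np.histogram(gray, bins=bins, range=(0.0, 1.0))
    p = hist / hist.sum()
    centers = (edges[:-1] + edges[1:]) / 2
    w0 = np.cumsum(p)                 # class-0 probability
    mu = np.cumsum(p * centers)       # cumulative mean
    mu_t = mu[-1]                     # global mean
    with np.errstate(divide="ignore", invalid="ignore"):
        sigma_b = (mu_t * w0 - mu) ** 2 / (w0 * (1 - w0))
    sigma_b = np.nan_to_num(sigma_b)
    return centers[np.argmax(sigma_b)]

# Bimodal data: two clusters around 0.2 and 0.8; the threshold should
# land in the valley between them.
rng = np.random.default_rng(0)
gray = np.clip(np.concatenate([rng.normal(0.2, 0.05, 500),
                               rng.normal(0.8, 0.05, 500)]), 0.0, 1.0)
t = otsu_threshold(gray)
```

FCM plays an analogous role in the thesis: it partitions the alpha values into fuzzy clusters and derives the threshold from the cluster memberships.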
With FCM, evaluation using Mean Squared Error (MSE) shows that the average pixel error per frame is reduced from 30,325.10 to 26,999.33, while Otsu yields 28,921.70. The matting quality degraded by intensity changes in compressed images is restored using the two-dimensional Discrete Cosine Transform (DCT-2D), which reduces the Root Mean Squared Error (RMSE) from 16.68 to 11.44. Temporal-constraint estimation for scribble propagation is performed by predicting the motion vector from the current frame to the next. The exhaustive-search motion-vector prediction is improved by defining a matrix whose size adapts to the scribble; the motion vector is determined by the Sum of Absolute Differences (SAD) between the current and the next frame. Applied in the RGB color space, this reduces the average pixel error per frame from 3,058.55 to 1,533.35, and to 1,662.83 in the HSV color space.
KiMoHar, the proposed framework, comprises three contributions. First, image matting with the FCM adaptive threshold improves accuracy by 11.05%. Second, matting-quality restoration of compressed images with DCT-2D improves accuracy by 31.41%. Third, temporal-constraint estimation improves accuracy by 56.30% in the RGB color space and by 52.61% in HSV.
Alternative clustering methods, sub-pixel accurate object extraction from still images, and generic video segmentation
This paper presents a practical approach for object extraction from still images and video sequences that is both simple to use and easy to implement. Many image-segmentation projects focus on special cases or rely on complicated heuristics and classifiers to cope with every special case. The presented approach targets typical pictures and videos taken from everyday life, working under the assumption that the foreground objects are sufficiently perceptually different from the background. The approach incorporates experience and user feedback from several projects that have already integrated the algorithm. The segmentation runs in real time for video, is robust to noise, and provides sub-pixel accuracy for still images.
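Under the stated assumption that the foreground is perceptually different from the background, a minimal extraction sketch is a per-pixel colour-distance test against a background colour model; this is illustrative only, since the paper's actual algorithm is not detailed in the abstract.

```python
import numpy as np

def extract_foreground(img, bg_samples, k=2.5):
    """Mark pixels whose colour deviates from a background colour model
    by more than k standard deviations (per-channel, diagonal model).

    img: (H, W, 3) float image; bg_samples: pixels known to be background.
    """
    mean = bg_samples.reshape(-1, 3).mean(0)
    std = bg_samples.reshape(-1, 3).std(0) + 1e-6   # avoid division by zero
    z = np.abs(img - mean) / std
    return (z > k).any(-1)                          # outlier in any channel

# Uniform grey background with one clearly different bright patch.
bg_samples = np.full((8, 8, 3), 0.5)   # patch known to be background
img = np.full((16, 16, 3), 0.5)
img[4:8, 4:8] = 1.0                    # the foreground object
mask = extract_foreground(img, bg_samples)
```

Real systems refine such a hard mask at object boundaries, which is where the paper's sub-pixel accuracy comes in.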