123 research outputs found
3-D Motion Estimation and Wireframe Adaptation Including Photometric Effects for Model-Based Coding of Facial Image Sequences
We propose a novel formulation in which 3-D global and local motion estimation and the adaptation of a generic wireframe model to a particular speaker are considered simultaneously within an optical-flow-based framework that includes the photometric effects of the motion. We use a flexible wireframe model whose local structure is characterized by the normal vectors of the patches, which are related to the coordinates of the nodes. Geometrical constraints that describe the propagation of the movement of the nodes are introduced and then efficiently utilized to reduce the number of independent structure parameters. A stochastic relaxation algorithm is used to determine the optimum global motion estimates and the parameters describing the structure of the wireframe model. Results with both simulated and real facial image sequences are provided.
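For illustration, below is a minimal sketch of a stochastic relaxation (simulated-annealing-style) search over a parameter vector, the kind of optimizer the abstract refers to. The objective function, parameter vector, step size, and cooling schedule are hypothetical placeholders, not the formulation used in the paper.

```python
import numpy as np

def stochastic_relaxation(objective, theta0, n_iters=5000, step=0.05,
                          t0=1.0, cooling=0.999, rng=None):
    """Generic simulated-annealing-style minimization of a scalar cost
    (e.g., an optical-flow residual) over a parameter vector theta."""
    rng = rng or np.random.default_rng(0)
    theta, cost = theta0.copy(), objective(theta0)
    best_theta, best_cost = theta.copy(), cost
    temp = t0
    for _ in range(n_iters):
        cand = theta + step * rng.standard_normal(theta.shape)
        cand_cost = objective(cand)
        # Always accept downhill moves; accept uphill moves with Boltzmann probability.
        if cand_cost < cost or rng.random() < np.exp((cost - cand_cost) / temp):
            theta, cost = cand, cand_cost
            if cost < best_cost:
                best_theta, best_cost = theta.copy(), cost
        temp *= cooling
    return best_theta, best_cost

# Toy usage: recover a 6-parameter global motion vector from a quadratic cost.
true_params = np.array([0.1, -0.05, 0.02, 1.0, -2.0, 0.5])
objective = lambda p: np.sum((p - true_params) ** 2)
est, err = stochastic_relaxation(objective, np.zeros(6))
```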
Effect of Architectures and Training Methods on the Performance of Learned Video Frame Prediction
We analyze the performance of feedforward versus recurrent neural network (RNN) architectures and the associated training methods for learned frame prediction. To this end, we trained a residual fully convolutional neural network (FCNN), a convolutional RNN (CRNN), and a convolutional long short-term memory (CLSTM) network for next-frame prediction using the mean square loss. We performed both stateless and stateful training for the recurrent networks. Experimental results show that the residual FCNN architecture performs best in terms of peak signal-to-noise ratio (PSNR), at the expense of higher training and test (inference) computational complexity. The CRNN can be trained stably and very efficiently using the stateful truncated backpropagation through time procedure, and it requires an order of magnitude less inference runtime to achieve near real-time frame prediction with acceptable performance.
Comment: Accepted for publication at IEEE ICIP 201
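As a minimal PyTorch sketch of residual next-frame prediction in the spirit of this abstract: the network predicts a residual that is added to the most recent input frame and is trained with the mean square loss. The layer count, channel width, and history length are illustrative assumptions, not the architectures benchmarked in the paper.

```python
import torch
import torch.nn as nn

class ResidualFramePredictor(nn.Module):
    """Predicts frame t+1 as (last input frame + learned residual)."""

    def __init__(self, history=4, channels=64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(history, channels, 3, padding=1), nn.ReLU(inplace=True),
            nn.Conv2d(channels, channels, 3, padding=1), nn.ReLU(inplace=True),
            nn.Conv2d(channels, 1, 3, padding=1),
        )

    def forward(self, frames):            # frames: (N, history, H, W), grayscale
        residual = self.net(frames)
        return frames[:, -1:] + residual  # residual connection to the last frame

model = ResidualFramePredictor()
optimizer = torch.optim.Adam(model.parameters(), lr=1e-4)
criterion = nn.MSELoss()                  # mean square loss, as in the abstract

frames = torch.rand(2, 4, 64, 64)         # dummy batch of 4-frame histories
target = torch.rand(2, 1, 64, 64)         # dummy next frames
loss = criterion(model(frames), target)
loss.backward()
optimizer.step()
```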
An improvement to MBASIC algorithm for 3D motion and depth estimation
In model-based coding of facial images, the accuracy of motion and depth parameter estimates strongly affects the coding efficiency. MBASIC is a simple and effective iterative algorithm (recently proposed by Aizawa et al.) for 3-D motion and depth estimation when the initial depth estimates are relatively accurate. In this correspondence, we analyze its performance in the presence of errors in the initial depth estimates and propose a modification to the MBASIC algorithm that significantly improves its robustness to random errors with only a small increase in the computational load.
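To illustrate the general two-step structure of such iterative motion/depth estimators, the sketch below alternates between a least-squares motion estimate for fixed depths and a per-point depth update for fixed motion, under an orthographic-projection, small-rotation model. This is an illustrative simplification, not the MBASIC algorithm or the proposed modification.

```python
import numpy as np

def estimate_motion(pts, pts_next, Z):
    """Least-squares motion (wx, wy, wz, Tx, Ty) for fixed depths Z under an
    orthographic, small-rotation rigid motion model (illustrative only)."""
    x, y = pts[:, 0], pts[:, 1]
    dx, dy = pts_next[:, 0] - x, pts_next[:, 1] - y
    zeros, ones = np.zeros_like(x), np.ones_like(x)
    rows_x = np.stack([zeros,  Z, -y, ones, zeros], axis=1)   # x' - x equation
    rows_y = np.stack([-Z, zeros,  x, zeros, ones], axis=1)   # y' - y equation
    A = np.concatenate([rows_x, rows_y])
    b = np.concatenate([dx, dy])
    theta, *_ = np.linalg.lstsq(A, b, rcond=None)
    return theta                                              # wx, wy, wz, Tx, Ty

def update_depth(pts, pts_next, theta, Z_prev):
    """Per-point least-squares depth update for fixed motion parameters."""
    wx, wy, wz, Tx, Ty = theta
    x, y = pts[:, 0], pts[:, 1]
    a = pts_next[:, 0] - x - Tx + wz * y
    b = pts_next[:, 1] - y - Ty - wz * x
    denom = wx**2 + wy**2
    if denom < 1e-12:       # depth is unobservable without rotation about x or y
        return Z_prev
    return (wy * a - wx * b) / denom

def alternate(pts, pts_next, Z_init, n_iters=10):
    Z = Z_init.copy()
    for _ in range(n_iters):
        theta = estimate_motion(pts, pts_next, Z)
        Z = update_depth(pts, pts_next, theta, Z)
    return theta, Z

# Toy usage: synthetic correspondences with noisy initial depth estimates.
rng = np.random.default_rng(0)
pts = rng.uniform(-1, 1, (50, 2))
Z_true = rng.uniform(2, 4, 50)
wx_t, wy_t, wz_t, tx_t, ty_t = 0.02, -0.03, 0.01, 0.1, -0.05
x, y = pts[:, 0], pts[:, 1]
pts_next = np.stack([x - wz_t * y + wy_t * Z_true + tx_t,
                     wz_t * x + y - wx_t * Z_true + ty_t], axis=1)
theta_hat, Z_hat = alternate(pts, pts_next, Z_true + rng.normal(0, 0.3, 50))
```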
DCT Coding of nonrectangularly sampled images
Discrete cosine transform (DCT) coding is widely used for compression of rectangularly sampled images. In this letter, we address efficient DCT coding of nonrectangularly sampled images. To this end, we discuss an efficient method for the computation of the DCT on nonrectangular sampling grids using the Smith normal decomposition. Simulation results are provided.
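As a rough illustration of the underlying idea, the sketch below rearranges a quincunx-sampled image onto a rectangular grid with a simple unimodular (shear-like) index map and then applies an ordinary separable block DCT. The quincunx pattern, block size, and index map are illustrative assumptions; this does not reproduce the Smith-normal-form construction described in the letter.

```python
import numpy as np
from scipy.fft import dctn

# Quincunx sampling: keep pixels (m, n) with m + n even.
img = np.random.rand(64, 64)
M, N = img.shape

# Index map (m, n) -> (m, (n - m mod 2) // 2) sends the quincunx lattice
# onto an ordinary rectangular grid of half the width.
rect = np.empty((M, N // 2))
for m in range(M):
    cols = np.arange(m % 2, N, 2)            # columns with m + n even
    rect[m] = img[m, cols]

# Ordinary separable 8x8 block DCT on the rectangularized samples.
B = 8
coeffs = np.empty_like(rect)
for i in range(0, M, B):
    for j in range(0, rect.shape[1], B):
        coeffs[i:i + B, j:j + B] = dctn(rect[i:i + B, j:j + B], norm='ortho')
```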
Multi-Scale Deformable Alignment and Content-Adaptive Inference for Flexible-Rate Bi-Directional Video Compression
The lack of ability to adapt the motion compensation model to video content is an important limitation of current end-to-end learned video compression models. This paper advances the state of the art by proposing an adaptive motion-compensation model for end-to-end rate-distortion optimized hierarchical bi-directional video compression. In particular, we propose two novelties: i) a multi-scale deformable alignment scheme at the feature level combined with multi-scale conditional coding, and ii) motion-content adaptive inference. In addition, we employ a gain unit, which enables a single model to operate at multiple rate-distortion operating points. We also exploit the gain unit to control bit allocation between intra-coded and bi-directionally coded frames by fine-tuning the corresponding models for truly flexible-rate learned video coding. Experimental results demonstrate state-of-the-art rate-distortion performance, exceeding that of all prior art in learned video coding.
Comment: Accepted for publication in IEEE International Conference on Image Processing (ICIP) 202
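Below is a minimal sketch of feature-level deformable alignment at a single scale, using torchvision's deformable convolution: offsets are predicted jointly from the reference and current feature maps and used to warp the reference features. The offset-prediction layer, channel counts, and single-scale setup are illustrative assumptions rather than the paper's multi-scale architecture.

```python
import torch
import torch.nn as nn
from torchvision.ops import DeformConv2d

class FeatureAligner(nn.Module):
    """Aligns reference features to the current frame's features with a
    deformable convolution whose offsets are predicted from both feature maps."""

    def __init__(self, channels=64, kernel_size=3):
        super().__init__()
        # Two offset values (dx, dy) per kernel tap.
        self.offset_pred = nn.Conv2d(2 * channels, 2 * kernel_size * kernel_size,
                                     3, padding=1)
        self.deform = DeformConv2d(channels, channels, kernel_size, padding=1)

    def forward(self, feat_ref, feat_cur):
        offsets = self.offset_pred(torch.cat([feat_ref, feat_cur], dim=1))
        return self.deform(feat_ref, offsets)

aligner = FeatureAligner()
feat_ref = torch.rand(1, 64, 32, 32)   # features of a (past or future) reference
feat_cur = torch.rand(1, 64, 32, 32)   # features of the frame being coded
aligned = aligner(feat_ref, feat_cur)  # (1, 64, 32, 32)
```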
Report of the AHG on MPEG-7 Semantic Information Representation
This AHG was created at the last MPEG meeting in Maui to work in parallel with the CE on the Semantic DS, so as to continue the refinement, in terms of significance, usage, and syntax, of the DSs that were proposed during the Maui meeting [2]. Following the discussions on the email reflector, the results of a meeting of the US delegation in February, and the discussion during the AHG meeting on March 19th, 2000, in Noordwijkerhout, some clarifications were made, though total convergence has not yet been reached. During the US delegation meeting, an alternative syntax was also proposed for consideration in the continuation of the CE; it is likely that this will lead to the formulation of competing solutions, from which the best syntax and elementary components of the Semantic DS will be selected during the CE process following the 51st MPEG meeting. Below are summaries of the discussions that took place, in reference to the individual mandates of this AHG.
Fast Outlier Rejection by Using Parallax-Based Rigidity Constraint for Epipolar Geometry Estimation
A novel approach is presented for rejecting correspondence outliers between frames using the parallax-based rigidity constraint for epipolar geometry estimation. In this approach, the invariance of the 3-D relative projective structure of a stationary scene over different views is exploited to eliminate outliers, which are mostly due to independently moving objects in a typical scene. The proposed approach is compared against a well-known RANSAC-based algorithm with the help of a test bed. The results show that using the proposed technique as a preprocessing step before the RANSAC-based approach significantly decreases the overall execution time of outlier rejection.
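The sketch below shows only the shape of such a pre-filter-then-RANSAC pipeline. `parallax_rigidity_filter` is a hypothetical placeholder for the parallax-based rigidity test described in the paper, and the OpenCV call merely illustrates the subsequent RANSAC-based epipolar geometry estimation.

```python
import numpy as np
import cv2

def parallax_rigidity_filter(pts1, pts2):
    """Hypothetical placeholder for the parallax-based rigidity test: it should
    return a boolean mask of correspondences consistent with a single rigid
    (stationary) scene.  Here it simply keeps everything."""
    return np.ones(len(pts1), dtype=bool)

def estimate_epipolar_geometry(pts1, pts2):
    # Cheap pre-filtering step removes gross outliers (e.g., points on
    # independently moving objects) before the more expensive RANSAC stage.
    keep = parallax_rigidity_filter(pts1, pts2)
    pts1, pts2 = pts1[keep], pts2[keep]

    # Standard RANSAC-based fundamental matrix estimation on the survivors.
    F, inlier_mask = cv2.findFundamentalMat(
        pts1, pts2, cv2.FM_RANSAC, ransacReprojThreshold=1.0, confidence=0.99)
    return F, inlier_mask
```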
3D human action recognition in multiple view scenarios
This paper presents a novel view-independent approach to the recognition of human gestures performed by several people in low-resolution sequences from multiple calibrated cameras. In contrast to other multi-ocular gesture recognition systems, which classify a fusion of features coming from different views, our system first performs data fusion (a 3D representation of the scene) and then feature extraction and classification. Motion descriptors introduced by Bobick et al. for 2D data are extended to 3D, and a set of features based on 3D invariant statistical moments is computed. Finally, a Bayesian classifier is employed to perform recognition over a small set of actions. Results are provided showing the effectiveness of the proposed algorithm in a SmartRoom scenario.
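Below is a rough sketch of such a pipeline: a 3D motion history volume accumulated from voxelized occupancy grids, a few 3D central-moment features, and a Gaussian naive Bayes classifier. The voxel resolution, decay parameter, moment set, and classifier are illustrative assumptions, not the exact descriptors or classifier from the paper.

```python
import numpy as np
from sklearn.naive_bayes import GaussianNB

def motion_history_volume(occupancy_seq, tau=10):
    """Accumulate a 3D motion history volume from binary voxel grids,
    extending the 2D motion-history idea to volumetric data."""
    mhv = np.zeros(occupancy_seq[0].shape, dtype=float)
    prev = occupancy_seq[0]
    for occ in occupancy_seq[1:]:
        motion = occ ^ prev                          # voxels that changed state
        mhv = np.where(motion, tau, np.maximum(mhv - 1, 0))
        prev = occ
    return mhv / tau                                 # normalize to [0, 1]

def central_moment_features(volume,
                            orders=((2, 0, 0), (0, 2, 0), (0, 0, 2),
                                    (1, 1, 0), (1, 0, 1), (0, 1, 1))):
    """A few scale-normalized 3D central moments as a simple shape descriptor."""
    zs, ys, xs = np.nonzero(volume > 0)
    w = volume[zs, ys, xs]
    m000 = w.sum()
    cz, cy, cx = (w * zs).sum() / m000, (w * ys).sum() / m000, (w * xs).sum() / m000
    feats = []
    for p, q, r in orders:
        mu = (w * (zs - cz)**p * (ys - cy)**q * (xs - cx)**r).sum()
        feats.append(mu / m000 ** (1 + (p + q + r) / 3))
    return np.array(feats)

# Toy usage: random voxel sequences standing in for two hypothetical action classes.
rng = np.random.default_rng(0)
X, y = [], []
for label in (0, 1):
    for _ in range(20):
        seq = [rng.random((16, 16, 16)) < (0.05 + 0.05 * label) for _ in range(12)]
        X.append(central_moment_features(motion_history_volume(seq)))
        y.append(label)
clf = GaussianNB().fit(np.array(X), np.array(y))
```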