A system for 3-D reconstruction of a rigid object from monocular video sequences is introduced. Initially, an object pose is estimated in each image by locating similar (unknown) texture, assuming a flat depth map for all images. Shape-from-silhouette is then applied to construct a 3-D model, which is used to obtain better pose estimates with a model-based method. Before the process is repeated by building a new 3-D model, the pose estimates are adjusted to reduce error by maximizing a quality measure for shape-from-silhouette volume reconstruction. Translation of the object in the input sequence is compensated in two stages. The volume feedback is terminated when the updates in the pose estimates become small. The final output is a pose index (the last set of pose estimates) and a 3-D model of the object. Experiments on a real video sequence of a human head show good performance of the system. Our method has the following advantages: (1) No model is assumed for the object. (2) Feature points are neither detected nor tracked, so no problematic feature matching or lengthy point tracking is required. (3) The method generates a high-level pose index for the input images, which can be used for content-based retrieval. Our method can also be applied to 3-D object tracking in video.
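As an illustration of the shape-from-silhouette step only, the following is a minimal sketch of voxel carving, not the system's actual implementation. It makes several simplifying assumptions that are not stated in the abstract: orthographic projection, object rotation about the vertical axis only, square binary silhouettes with the same resolution as the voxel grid, and no translation. The function name `carve_visual_hull` and its parameters are hypothetical.

```python
import numpy as np


def carve_visual_hull(silhouettes, angles, grid_size=33):
    """Carve a cubic voxel grid using binary silhouettes (illustrative sketch).

    silhouettes: sequence of boolean arrays of shape (grid_size, grid_size),
                 indexed as [row (vertical), column (horizontal)].
    angles:      assumed rotation of the object about the vertical (y) axis
                 for each image, in radians (the pose estimates).
    Returns a boolean array of shape (grid_size,) * 3; True marks a voxel
    that projects inside every silhouette.
    """
    # Voxel centre coordinates, normalised to [-1, 1] along each axis.
    c = np.linspace(-1.0, 1.0, grid_size)
    x, y, z = np.meshgrid(c, c, c, indexing="ij")
    occupied = np.ones((grid_size,) * 3, dtype=bool)

    for sil, theta in zip(silhouettes, angles):
        # Rotate voxel centres about the y axis, then project along z
        # (orthographic camera assumption).
        u = np.cos(theta) * x + np.sin(theta) * z
        v = y
        # Map normalised image coordinates to the nearest pixel index.
        col = np.rint((u + 1.0) / 2.0 * (grid_size - 1)).astype(int)
        row = np.rint((v + 1.0) / 2.0 * (grid_size - 1)).astype(int)
        inside = (col >= 0) & (col < grid_size) & (row >= 0) & (row < grid_size)
        # A voxel survives this view only if it lands on a silhouette pixel;
        # voxels projecting outside the image are carved away.
        hit = np.zeros_like(occupied)
        hit[inside] = sil[row[inside], col[inside]]
        occupied &= hit

    return occupied
```

In a feedback loop of the kind described above, the carved volume would be rebuilt after each update of the pose estimates; the quality measure used to adjust the poses is not specified here and is therefore left out of the sketch.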