10,809 research outputs found
Shape basis interpretation for monocular deformable 3D reconstruction
© 2019 IEEE. Personal use of this material is permitted. Permission from IEEE must be obtained for all other uses, in any current or future media, including reprinting/republishing this material for advertising or promotional purposes, creating new collective works, for resale or redistribution to servers or lists, or reuse of any copyrighted component of this work in other works.In this paper, we propose a novel interpretable shape model to encode object non-rigidity. We first use the initial frames of a monocular video to recover a rest shape, used later to compute a dissimilarity measure based on a distance matrix measurement. Spectral analysis is then applied to this matrix to obtain a reduced shape basis, that in contrast to existing approaches, can be physically interpreted. In turn, these pre-computed shape bases are used to linearly span the deformation of a wide variety of objects. We introduce the low-rank basis into a sequential approach to recover both camera motion and non-rigid shape from the monocular video, by simply optimizing the weights of the linear combination using bundle adjustment. Since the number of parameters to optimize per frame is relatively small, specially when physical priors are considered, our approach is fast and can potentially run in real time. Validation is done in a wide variety of real-world objects, undergoing both inextensible and extensible deformations. Our approach achieves remarkable robustness to artifacts such as noisy and missing measurements and shows an improved performance to competing methods.Peer ReviewedPostprint (author's final draft
Harnessing AI for Speech Reconstruction using Multi-view Silent Video Feed
Speechreading or lipreading is the technique of understanding and getting
phonetic features from a speaker's visual features such as movement of lips,
face, teeth and tongue. It has a wide range of multimedia applications such as
in surveillance, Internet telephony, and as an aid to a person with hearing
impairments. However, most of the work in speechreading has been limited to
text generation from silent videos. Recently, research has started venturing
into generating (audio) speech from silent video sequences but there have been
no developments thus far in dealing with divergent views and poses of a
speaker. Thus although, we have multiple camera feeds for the speech of a user,
but we have failed in using these multiple video feeds for dealing with the
different poses. To this end, this paper presents the world's first ever
multi-view speech reading and reconstruction system. This work encompasses the
boundaries of multimedia research by putting forth a model which leverages
silent video feeds from multiple cameras recording the same subject to generate
intelligent speech for a speaker. Initial results confirm the usefulness of
exploiting multiple camera views in building an efficient speech reading and
reconstruction system. It further shows the optimal placement of cameras which
would lead to the maximum intelligibility of speech. Next, it lays out various
innovative applications for the proposed system focusing on its potential
prodigious impact in not just security arena but in many other multimedia
analytics problems.Comment: 2018 ACM Multimedia Conference (MM '18), October 22--26, 2018, Seoul,
Republic of Kore
Compressive Hyperspectral Imaging Using Progressive Total Variation
Compressed Sensing (CS) is suitable for remote acquisition of hyperspectral
images for earth observation, since it could exploit the strong spatial and
spectral correlations, llowing to simplify the architecture of the onboard
sensors. Solutions proposed so far tend to decouple spatial and spectral
dimensions to reduce the complexity of the reconstruction, not taking into
account that onboard sensors progressively acquire spectral rows rather than
acquiring spectral channels. For this reason, we propose a novel progressive CS
architecture based on separate sensing of spectral rows and joint
reconstruction employing Total Variation. Experimental results run on raw
AVIRIS and AIRS images confirm the validity of the proposed system.Comment: To be published on ICASSP 2014 proceeding
Graph Signal Processing: Overview, Challenges and Applications
Research in Graph Signal Processing (GSP) aims to develop tools for
processing data defined on irregular graph domains. In this paper we first
provide an overview of core ideas in GSP and their connection to conventional
digital signal processing. We then summarize recent developments in developing
basic GSP tools, including methods for sampling, filtering or graph learning.
Next, we review progress in several application areas using GSP, including
processing and analysis of sensor network data, biological data, and
applications to image processing and machine learning. We finish by providing a
brief historical perspective to highlight how concepts recently developed in
GSP build on top of prior research in other areas.Comment: To appear, Proceedings of the IEE
- …