PAMPC: Perception-Aware Model Predictive Control for Quadrotors
We present the first perception-aware model predictive control framework for
quadrotors that unifies control and planning with respect to action and
perception objectives. Our framework leverages numerical optimization to
compute trajectories that satisfy the system dynamics and require control
inputs within the limits of the platform. Simultaneously, it optimizes
perception objectives for robust and reliable sensing by maximizing the
visibility of a point of interest and minimizing its velocity in the image
plane. Considering both perception and action objectives for motion planning
and control is challenging due to the possible conflicts arising from their
respective requirements. For example, for a quadrotor to track a reference
trajectory, it needs to rotate to align its thrust with the direction of the
desired acceleration. However, the perception objective might require
minimizing such rotation to maximize the visibility of a point of interest. A
model-based optimization framework, able to consider both perception and action
objectives and couple them through the system dynamics, is therefore necessary.
Our perception-aware model predictive control framework works in a
receding-horizon fashion by iteratively solving a non-linear optimization
problem. It is capable of running in real-time, fully onboard our lightweight,
small-scale quadrotor using a low-power ARM computer, together with a
visual-inertial odometry pipeline. We validate our approach in experiments
demonstrating (I) the contradiction between perception and action objectives,
and (II) improved behavior in extremely challenging lighting conditions.
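The receding-horizon loop described above can be sketched in a few lines. The toy below is not the paper's perception-aware formulation: it replaces the quadrotor's nonlinear dynamics and perception objectives with a 1D double integrator and a quadratic tracking cost, so that each horizon problem reduces to regularized least squares. All names, weights, and the horizon length are invented for illustration.

```python
import numpy as np

DT, HORIZON = 0.1, 10  # illustrative step size and horizon length

def solve_horizon(x0, v0, target):
    """Find accelerations u[0..H-1] minimizing position error plus effort."""
    # Position after k+1 steps is linear in the inputs: stack into A @ u = b.
    A = np.zeros((HORIZON, HORIZON))
    for k in range(HORIZON):
        for j in range(k + 1):
            # contribution of input u[j] to the position at step k+1
            A[k, j] = (k - j + 0.5) * DT**2
    # b = tracking error of the input-free (coasting) trajectory
    b = np.full(HORIZON, target) - (x0 + v0 * DT * np.arange(1, HORIZON + 1))
    # Regularized least squares = quadratic control-effort penalty.
    return np.linalg.solve(A.T @ A + 0.1 * np.eye(HORIZON), A.T @ b)

def run(target=1.0, steps=80):
    x, v = 0.0, 0.0
    for _ in range(steps):
        u = solve_horizon(x, v, target)   # re-solve the whole horizon ...
        a = u[0]                          # ... but apply only the first input
        x += v * DT + 0.5 * a * DT**2
        v += a * DT
    return x

print(f"final position: {run():.3f}")
```

The key receding-horizon idea survives the simplification: the optimization is re-solved at every step from the latest state, and only the first input is applied. PAMPC does the same, except each step solves a nonlinear program that also scores visibility and image-plane velocity of the point of interest.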
Semantic multimedia remote display for mobile thin clients
Current remote display technologies for mobile thin clients convert practically all types of graphical content into sequences of images rendered by the client. Consequently, important information concerning the content semantics is lost. The present paper goes beyond this bottleneck by developing a semantic multimedia remote display. The principle consists of representing the graphical content as a real-time interactive multimedia scene graph. The underlying architecture features novel components for scene-graph creation and management, as well as for user interactivity handling. The experimental setup considers the Linux X windows system and BiFS/LASeR multimedia scene technologies on the server and client sides, respectively. The implemented solution was benchmarked against currently deployed solutions (VNC and Microsoft-RDP), by considering text editing and WWW browsing applications. The quantitative assessments demonstrate: (1) visual quality expressed by seven objective metrics, e.g., PSNR values between 30 and 42 dB or SSIM values larger than 0.9999; (2) downlink bandwidth gain factors ranging from 2 to 60; (3) real-time user event management expressed by network round-trip time reduction by factors of 4-6 and by uplink bandwidth gain factors from 3 to 10; (4) feasible CPU activity, larger than in the RDP case but reduced by a factor of 1.5 with respect to the VNC-HEXTILE.
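The bandwidth gains above come from the core idea of sending scene-graph updates instead of rendered pixels. The sketch below illustrates only that idea: the node types and the update-message encoding are invented for this example, whereas the paper feeds real BiFS/LASeR scenes from the X windows system.

```python
# Instead of shipping rendered pixels, the server keeps graphical content as
# a scene graph and sends the client compact node updates.

class Node:
    def __init__(self, node_id, kind, **attrs):
        self.node_id, self.kind, self.attrs = node_id, kind, attrs

class SceneGraph:
    def __init__(self):
        self.nodes = {}

    def upsert(self, node):
        """Insert or update a node, returning the wire message to send."""
        is_new = node.node_id not in self.nodes
        self.nodes[node.node_id] = node
        return {"op": "insert" if is_new else "update",
                "id": node.node_id, "kind": node.kind, **node.attrs}

scene = SceneGraph()
msg1 = scene.upsert(Node(1, "text", value="Hello", x=10, y=20))   # new node
msg2 = scene.upsert(Node(1, "text", value="Hello!", x=10, y=20))  # edit it
print(msg1["op"], msg2["op"])  # insert update
```

A text edit thus costs one small message rather than a re-encoded screen region, which is where downlink gain factors of the reported magnitude become plausible.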
Automatic Subjective Quality Estimation of 3D Stereoscopic Videos: NR-RR Approach
A method for estimating the subjective quality score of 3D stereoscopic
video is proposed, based on decision trees. The
output of this estimation can be fed into encoding and transmission
units for compensation. The proposed method operates with
minimum dependency on reference video. Content characteristics,
no reference (NR) and reduced reference (RR) quality metrics
are extracted and summarised prior to the training stage. Content features
are based on spatio-temporal activities within depth layers.
Quality features include NR blockiness, NR blurriness and RR
3D stereoscopic video quality metric. Because the quality estimation must be
both fast and accurate, decision trees are employed,
achieving an accuracy of 0.94.
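The feature-to-score mapping can be pictured as a small decision tree over the NR/RR features named above. The tree below is hand-written purely for illustration: the paper trains its trees from data, and every split threshold and returned score here is invented.

```python
# Toy decision tree mapping per-sequence features (NR blockiness, NR
# blurriness, RR 3D quality metric, depth-layer activity) to a 1-5
# subjective-score estimate. Thresholds and leaf values are made up.

def estimate_mos(features):
    """Estimate a 1-5 subjective score from a feature dict (all in [0, 1])."""
    if features["blockiness"] > 0.6:          # strong compression artifacts
        return 1.5 if features["blurriness"] > 0.5 else 2.0
    if features["rr_3d_metric"] < 0.4:        # stereoscopic degradation
        return 2.5
    # high depth-layer activity tends to mask mild artifacts
    return 4.5 if features["depth_activity"] > 0.7 else 3.5

good = {"blockiness": 0.1, "blurriness": 0.1,
        "rr_3d_metric": 0.9, "depth_activity": 0.8}
print(estimate_mos(good))  # -> 4.5
```

Evaluating such a tree is a handful of comparisons per sequence, which is what makes the approach attractive when the estimate must be fed back into encoding and transmission units in real time.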
FaceVR: Real-Time Facial Reenactment and Eye Gaze Control in Virtual Reality
We introduce FaceVR, a novel method for gaze-aware facial reenactment in the Virtual Reality (VR) context. The key component of FaceVR is a robust algorithm to perform real-time facial motion capture of an actor who is wearing a head-mounted display (HMD), as well as a new data-driven approach for eye tracking from monocular videos. In addition to these face reconstruction components, FaceVR incorporates photo-realistic re-rendering in real time, thus allowing artificial modifications of face and eye appearances. For instance, we can alter facial expressions, change gaze directions, or remove the VR goggles in realistic re-renderings. In a live setup with a source and a target actor, we apply these newly-introduced algorithmic components. We assume that the source actor is wearing a VR device, and we capture his facial expressions and eye movement in real time. For the target video, we mimic a similar tracking process; however, we use the source input to drive the animations of the target video, thus enabling gaze-aware facial reenactment. To render the modified target video on a stereo display, we augment our capture and reconstruction process with stereo data. In the end, FaceVR produces compelling results for a variety of applications, such as gaze-aware facial reenactment, reenactment in virtual reality, removal of VR goggles, and re-targeting of somebody's gaze direction in a video conferencing call.
RAI-Net: Range-Adaptive LiDAR Point Cloud Frame Interpolation Network
LiDAR point cloud frame interpolation, which synthesizes the intermediate
frame between the captured frames, has emerged as an important issue for many
applications. In particular, it can reduce the amount of point cloud data to
transmit: the intermediate frame is predicted from the reference frames to
upsample the stream to a higher frame rate. However, due to the high-dimensional and
sparse characteristics of point clouds, it is more difficult to predict the
intermediate frame for LiDAR point clouds than videos. In this paper, we
propose a novel LiDAR point cloud frame interpolation method, which exploits
range images (RIs) as an intermediate representation with CNNs to conduct the
frame interpolation process. Since the inherent characteristics of RIs
differ from those of color images, we introduce spatially adaptive convolutions
to extract range features adaptively, while a highly efficient flow estimation
method is presented to generate optical flows. The proposed model then warps
the input frames and range features based on the optical flows to synthesize
the interpolated frame. Extensive experiments on the KITTI dataset have clearly
demonstrated that our method consistently achieves superior frame interpolation
results with better perceptual quality than those of state-of-the-art video
frame interpolation methods. The proposed method could be integrated into any
LiDAR point cloud compression systems for inter prediction.
Comment: Accepted by the IEEE International Symposium on Broadband Multimedia Systems and Broadcasting 202
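The warping step in the abstract — resampling input frames and range features along estimated optical flows — can be sketched on a toy range image. The snippet below uses nearest-neighbor backward warping for brevity and a synthetic constant flow; the paper's model learns its flows and features with CNNs, and the array shapes here are illustrative.

```python
import numpy as np

def warp(range_image, flow):
    """Sample range_image (H, W) at positions shifted by flow (H, W, 2)."""
    h, w = range_image.shape
    ys, xs = np.mgrid[0:h, 0:w]
    # Nearest-neighbor backward warp: each output pixel reads from the
    # source location displaced by its flow vector, clipped to the image.
    src_y = np.clip(np.round(ys + flow[..., 1]).astype(int), 0, h - 1)
    src_x = np.clip(np.round(xs + flow[..., 0]).astype(int), 0, w - 1)
    return range_image[src_y, src_x]

ri = np.arange(16, dtype=float).reshape(4, 4)   # toy 4x4 range image
flow = np.zeros((4, 4, 2))
flow[..., 0] = 1.0                              # shift sampling right by 1 px
out = warp(ri, flow)
print(out[0])  # row 0 of ri is [0,1,2,3]; sampling at x+1 gives [1,2,3,3]
```

The interpolation network then blends two such warped frames (and their range features) into the synthesized intermediate frame, before projecting the result back from the range image to a 3D point cloud.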