StableFlow: a physics-inspired digital video stabilization
This thesis addresses the problem of digital video stabilization. With the widespread use
of handheld devices and unmanned aerial vehicles (UAVs) that have the ability to record
videos, digital video stabilization becomes more important, as the recorded videos are
often shaky, which undermines their visual quality. Digital video stabilization has been
studied for decades, yielding an extensive body of literature; however, current
approaches are either computationally expensive or under-perform in terms of visual
quality. In this thesis, we first introduce a novel study of the effect of image
denoising on feature-based digital video stabilization. Then, we introduce SteadyFlow, a
novel technique for real-time stabilization inspired by the mass-spring-damper model. A
video frame is modelled as a mass suspended in each direction by a critically damped
spring and damper that can be fine-tuned to adapt to different shaking patterns. The
proposed technique is tested on video sequences with different types of shakiness and
diverse content. The obtained results significantly outperform state-of-the-art
stabilization techniques in terms of visual quality while running in real time.
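The mass-spring-damper idea can be sketched as follows: treat the stabilized frame position as a unit mass pulled toward each measured (shaky) position by a critically damped spring. This is an illustrative reconstruction under assumed parameters (`omega`, `dt`), not the thesis's actual implementation.

```python
import numpy as np

def critically_damped_smooth(positions, omega=8.0, dt=1 / 30):
    """Smooth a 1-D camera trajectory by treating the stabilized frame
    position as a unit mass pulled toward each measured position by a
    critically damped spring (damping ratio 1, so c = 2 * sqrt(k * m))."""
    k = omega ** 2          # spring stiffness (mass m = 1)
    c = 2.0 * omega         # critical damping coefficient
    x, v = positions[0], 0.0
    out = []
    for target in positions:
        a = k * (target - x) - c * v   # spring force plus damper force
        v += a * dt                    # semi-implicit Euler integration
        x += v * dt
        out.append(x)
    return np.array(out)
```

Because the damping is critical, the frame settles onto the intended path as fast as possible without oscillating; tuning `omega` trades smoothness against lag for different shaking patterns.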
Fine-grained Activities of People Worldwide
Every day, humans perform many closely related activities that involve subtle
discriminative motions, such as putting on a shirt vs. putting on a jacket, or
shaking hands vs. giving a high five. Activity recognition by ethical visual AI
could provide insights into our patterns of daily life; however, existing
activity recognition datasets do not capture the massive diversity of these
human activities around the world. To address this limitation, we introduce
Collector, a free mobile app to record video while simultaneously annotating
objects and activities of consented subjects. This new data collection platform
was used to curate the Consented Activities of People (CAP) dataset, the first
large-scale, fine-grained activity dataset of people worldwide. The CAP dataset
contains 1.45M video clips of 512 fine-grained activity labels of daily life,
collected by 780 subjects in 33 countries. We provide activity classification
and activity detection benchmarks for this dataset, and analyze baseline
results to gain insight into how people around the world perform common
activities. The dataset, benchmarks, evaluation tools, public leaderboards and
mobile apps are available for use at visym.github.io/cap.
Real-time video stabilization without phantom movements for micro aerial vehicles
In recent years, micro aerial vehicles (MAVs) have become popular for several applications such as rescue, surveillance, and mapping. Undesired motion between consecutive frames is a common problem in video recorded by MAVs. Different approaches, applied in video post-processing, address this issue; however, only a few algorithms can be applied in real time. An additional, critical problem is the presence of false movements in the stabilized video. In this paper, we present a new video stabilization approach that can be used in real time without generating false movements. Our proposal combines a low-pass filter with control action information to estimate the motion intention.
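The low-pass-filter half of such an approach might look like the sketch below, which smooths the accumulated camera path to estimate the intended motion and returns the per-frame correction offsets. The paper's actual filter design and its use of control-action information are not reproduced here; all names and the filter constant are illustrative.

```python
def estimate_intention(motions, alpha=0.1):
    """Estimate intended camera motion with a first-order low-pass
    (exponential) filter over the accumulated inter-frame motion;
    the residual is the unwanted shake to warp away."""
    path, total = [], 0.0
    for m in motions:
        total += m                       # accumulate raw camera path
        path.append(total)
    intended, corrections = [], []
    smooth = path[0]
    for p in path:
        smooth += alpha * (p - smooth)   # low-pass: follow slow motion only
        intended.append(smooth)
        corrections.append(smooth - p)   # warp offset to stabilize the frame
    return intended, corrections
```

A smaller `alpha` yields a smoother intended path but a larger lag, which is exactly the kind of lag that, uncompensated, shows up as the false (phantom) movements the paper targets.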
Adaptive online deployment for resource constrained mobile smart clients
Nowadays, mobile devices are increasingly used as a platform for applications. Contrary to prior-generation handheld devices configured with a predefined set of applications, today's leading-edge devices provide a platform for flexible and customized application deployment. However, these applications have to deal with the limitations (e.g. CPU speed, memory) of mobile devices and thus cannot handle complex tasks. In order to cope with these handheld limitations and the ever-changing device context (e.g. network connections, remaining battery time), we present a middleware solution that dynamically offloads parts of the software to the most appropriate server. Without a priori knowledge of the application, it calculates the optimal deployment, which lowers the CPU usage at the mobile client whilst keeping the bandwidth used minimal. The information needed to calculate this optimum is gathered on the fly from runtime information. Experimental results show that the proposed solution enables effective execution of complex applications in a constrained environment. Moreover, we demonstrate that the overhead of the middleware components is below 2%.
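A deployment calculation of this kind could, under strong simplifying assumptions, be sketched as a greedy knapsack-style choice of which components to offload. The middleware's actual optimization is not described in enough detail in the abstract to reproduce, so the data layout and the greedy policy below are purely illustrative.

```python
def choose_offload(components, bandwidth_budget):
    """Greedy sketch: offload the components with the best CPU-saved per
    unit-of-bandwidth ratio until the bandwidth budget is exhausted.
    `components` maps name -> (cpu_cost, bandwidth_cost); the names and
    the greedy policy are illustrative, not the paper's algorithm."""
    ranked = sorted(components.items(),
                    key=lambda kv: kv[1][0] / kv[1][1], reverse=True)
    remote, used = [], 0.0
    for name, (cpu, bw) in ranked:
        if used + bw <= bandwidth_budget:  # fits in remaining bandwidth
            remote.append(name)
            used += bw
    return remote
```

In a real system the costs would come from the runtime profiling the abstract mentions, and the budget from the currently available network connection.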
Medical student case presentation performance and perception when using mobile learning technology in the emergency department
Hand-held mobile learning technology provides opportunities for clinically relevant self-instructional modules to augment traditional bedside teaching. Using this technology as a teaching tool has not been well studied. We sought to evaluate medical students' case presentation performance and perception when viewing short, just-in-time mobile learning videos on the iPod Touch prior to patient encounters. Twenty-two fourth-year medical students were randomized to receive or not to receive instruction by video, using the iPod Touch, prior to patient encounters. After seeing a patient, they presented the case to their faculty, who completed a standard data collection sheet. Students were surveyed on their perceived confidence and effectiveness after using these videos. Twenty-two students completed a total of 67 patient encounters. There was a statistically significant improvement in presentations when the videos were viewed for the first time (p = 0.032). There was no difference when the presentations were summed over the entire rotation (p = 0.671). The reliable (alpha = 0.97) survey indicated that the videos were a useful teaching tool and gave students more confidence in their presentations. Medical students' patient presentations improved with the use of mobile instructional videos following first-time use, suggesting that mobile learning videos may be useful in medical student education. If direct bedside teaching is unavailable, just-in-time iPod Touch videos can be an alternative instructional strategy to improve first-time patient presentations by medical students.
MonoPerfCap: Human Performance Capture from Monocular Video
We present the first marker-less approach for temporally coherent 3D
performance capture of a human with general clothing from monocular video. Our
approach reconstructs articulated human skeleton motion as well as medium-scale
non-rigid surface deformations in general scenes. Human performance capture is
a challenging problem due to the large range of articulation, potentially fast
motion, and considerable non-rigid deformations, even from multi-view data.
Reconstruction from monocular video alone is drastically more challenging,
since strong occlusions and the inherent depth ambiguity lead to a highly
ill-posed reconstruction problem. We tackle these challenges by a novel
approach that employs sparse 2D and 3D human pose detections from a
convolutional neural network using a batch-based pose estimation strategy.
Joint recovery of per-batch motion allows us to resolve the ambiguities of the
monocular reconstruction problem based on a low-dimensional trajectory
subspace. In addition, we propose refinement of the surface geometry based on
fully automatically extracted silhouettes to enable medium-scale non-rigid
alignment. We demonstrate state-of-the-art performance capture results that
enable exciting applications such as video editing and free viewpoint video,
previously infeasible from monocular video. Our qualitative and quantitative
evaluation demonstrates that our approach significantly outperforms previous
monocular methods in terms of accuracy, robustness, and the scene complexity
that can be handled. Comment: Accepted to ACM TOG 2018, to be presented at SIGGRAPH 201
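The low-dimensional trajectory subspace idea can be illustrated with a generic PCA projection of a batch of joint trajectories: motion estimated jointly over the batch is constrained to the few directions that explain most of its variation. This is a simplified stand-in, not the paper's actual subspace construction.

```python
import numpy as np

def project_to_subspace(trajectory, k=5):
    """Project a (frames x dims) joint trajectory onto its first k
    principal components -- a low-dimensional trajectory subspace in the
    spirit of the batch-based regularization above (illustrative only)."""
    mean = trajectory.mean(axis=0)
    centered = trajectory - mean
    # SVD yields the principal directions of the batch trajectory
    u, s, vt = np.linalg.svd(centered, full_matrices=False)
    basis = vt[:k]                      # top-k principal directions
    coeffs = centered @ basis.T         # low-dimensional coordinates
    return coeffs @ basis + mean        # reconstruction in the subspace
```

Restricting the solution to such a subspace is what lets jointly recovered per-batch motion disambiguate the otherwise ill-posed monocular depth.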
An Inertial Device-based User Interaction with Occlusion-free Object Handling in a Handheld Augmented Reality
Augmented Reality (AR) is a technology that merges virtual objects with real environments in real time. In AR, the interaction between the end-user and the AR system has always been a frequently discussed topic. Handheld AR is a newer approach that delivers enriched 3D virtual objects when a user looks through the device's video camera. One of the most widely adopted handheld devices today is the smartphone, which is equipped with a powerful processor, cameras for capturing still images and video, and a range of sensors capable of tracking the location, orientation, and motion of the user. These modern smartphones offer a sophisticated platform for implementing handheld AR applications. However, handheld displays often provide interfaces built on interaction metaphors that were developed for head-mounted displays and may depend on hardware inappropriate for handheld use. Therefore, this paper discusses a proposed real-time inertial device-based interaction technique for 3D object manipulation. It also explains the methods used for selection, holding, translation, and rotation. The technique aims to overcome the limitations of 3D object manipulation by letting the user hold the device with both hands, without needing to stretch out one hand to manipulate the 3D object. This paper also reviews previous work in the fields of AR and handheld AR. Finally, it presents experimental results offering new metaphors for manipulating 3D objects using handheld devices.
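One building block of inertial device-based manipulation is turning the smartphone's gyroscope rates into an orientation for the held virtual object. The following is a minimal, hypothetical sketch (quaternion Euler integration of angular rates), not the paper's actual technique; the function name and call convention are assumptions.

```python
def integrate_gyro(quaternion, gyro_rates, dt):
    """Rotate a virtual object by integrating device gyroscope rates
    (rad/s) into its orientation quaternion (w, x, y, z) -- a minimal
    sketch of the inertial-manipulation idea, names illustrative."""
    w, x, y, z = quaternion
    gx, gy, gz = gyro_rates
    # quaternion derivative: 0.5 * q * (0, gx, gy, gz)
    dw = 0.5 * (-x * gx - y * gy - z * gz)
    dx = 0.5 * ( w * gx + y * gz - z * gy)
    dy = 0.5 * ( w * gy - x * gz + z * gx)
    dz = 0.5 * ( w * gz + x * gy - y * gx)
    w, x, y, z = w + dw * dt, x + dx * dt, y + dy * dt, z + dz * dt
    n = (w * w + x * x + y * y + z * z) ** 0.5
    return (w / n, x / n, y / n, z / n)  # renormalize to unit length
```

Because the device itself supplies the rotation, the user can keep both hands on the phone, which is precisely the two-handed grip the paper argues for.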
Video Magnification for Structural Analysis Testing
The goal of this thesis is to allow a user to see the minute motion of an object at different frequencies, using a computer program, to aid vibration testing analysis without complex setups of accelerometers or expensive laser vibrometers. MIT's phase-based video motion processing was modified to enable modal determination of structures in the field using a cell-phone camera. The algorithm was modified by implementing a stabilization algorithm and by permitting the magnification filter to operate on multiple frequency ranges, enabling visualization of the natural frequencies of structures in the field. To support multiple frequency ranges, a new function was developed to apply the magnification filter at each relevant frequency range within the original video. The stabilization algorithm was intended to allow the camera to be hand-held instead of requiring a tripod mount. Two stabilization methods were tested: fixed-point video stabilization and image registration. Neither method removed the global motion from the hand-held video, even after masking was implemented, which resulted in poor results. Specifically, fixed-point stabilization either removed little motion or created sharp motions, and image registration introduced a pulsing effect. The best results occurred when the observed object contrasted with the background, was the largest feature in the video frame, and the video was captured from a tripod at an appropriate angle. The final program can amplify motion in user-selected frequency bands and can be used as an aid in structural analysis testing.
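The idea of a magnification filter operating on a chosen frequency band can be sketched with a simple per-pixel temporal bandpass built from two first-order IIR low-pass filters. MIT's actual phase-based pipeline is considerably more sophisticated (it magnifies local phase, not raw intensity); the parameters and names below are illustrative.

```python
import numpy as np

def magnify_band(frames, low, high, fps, amplify=10.0):
    """Amplify motion in one temporal frequency band: bandpass each
    pixel with the difference of two first-order IIR low-pass filters,
    scale the band, and add it back to the frame (an intensity-based
    sketch of the magnification-filter idea, not the full pipeline)."""
    a_lo = 1.0 - np.exp(-2 * np.pi * low / fps)   # slow filter gain
    a_hi = 1.0 - np.exp(-2 * np.pi * high / fps)  # fast filter gain
    lp_lo = lp_hi = frames[0].astype(float)
    out = []
    for f in frames:
        f = f.astype(float)
        lp_lo = lp_lo + a_lo * (f - lp_lo)   # low-pass at `low` Hz
        lp_hi = lp_hi + a_hi * (f - lp_hi)   # low-pass at `high` Hz
        band = lp_hi - lp_lo                 # temporal bandpass residual
        out.append(f + amplify * band)       # magnified frame
    return out
```

Running this once per relevant frequency range mirrors the thesis's multi-band extension: each pass makes the vibration in its own band visibly larger while leaving other temporal frequencies nearly untouched.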