
    StableFlow: a physics inspired digital video stabilization

    This thesis addresses the problem of digital video stabilization. With the widespread use of handheld devices and unmanned aerial vehicles (UAVs) that can record video, digital video stabilization is becoming more important, as the footage is often shaky, which undermines its visual quality. Digital video stabilization has been studied for decades, yielding an extensive literature; however, current approaches are either computationally expensive or under-perform in terms of visual quality. In this thesis, we first introduce a novel study of the effect of image denoising on feature-based digital video stabilization. Then, we introduce StableFlow, a novel technique for real-time stabilization inspired by the mass-spring-damper model. A video frame is modelled as a mass suspended in each direction by a critically damped spring and damper, which can be fine-tuned to adapt to different shaking patterns. The proposed technique is tested on video sequences that have different types of shakiness and diverse video content. The obtained results significantly outperform state-of-the-art stabilization techniques in terms of visual quality while running in real time.
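The mass-spring-damper idea in the abstract above can be illustrated with a minimal sketch. This is not the thesis implementation: it assumes a one-dimensional camera path (for example, accumulated horizontal translation per frame), and the function names, the `stiffness` value, and the semi-implicit Euler integrator are all choices made here for illustration. The only element taken from the abstract is the critically damped spring, for which the damping coefficient is c = 2*sqrt(k*m).

```python
import math


def smooth_path(raw, stiffness=40.0, mass=1.0, dt=1.0 / 30.0):
    """Follow the raw camera path with a mass on a critically damped spring.

    raw: per-frame camera positions along one axis (hypothetical input format).
    Returns the smoothed path, one value per frame.
    """
    # Critical damping: the fastest return to rest without oscillation.
    damping = 2.0 * math.sqrt(stiffness * mass)
    pos, vel = raw[0], 0.0
    smoothed = []
    for target in raw:
        # The spring pulls the mass toward the raw position; the damper resists velocity.
        accel = (stiffness * (target - pos) - damping * vel) / mass
        vel += accel * dt  # semi-implicit Euler: update velocity first
        pos += vel * dt    # then advance position with the new velocity
        smoothed.append(pos)
    return smoothed


def corrections(raw, smoothed):
    """Per-frame shift that would move each frame onto the smoothed path."""
    return [s - r for r, s in zip(raw, smoothed)]
```

In this sketch a lower `stiffness` gives a smoother but more sluggish path, which is one way to read the abstract's remark that the spring and damper can be tuned to different shaking patterns.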

    Fine-grained Activities of People Worldwide

    Every day, humans perform many closely related activities that involve subtle discriminative motions, such as putting on a shirt vs. putting on a jacket, or shaking hands vs. giving a high five. Activity recognition by ethical visual AI could provide insights into our patterns of daily life; however, existing activity recognition datasets do not capture the massive diversity of these human activities around the world. To address this limitation, we introduce Collector, a free mobile app to record video while simultaneously annotating objects and activities of consented subjects. This new data collection platform was used to curate the Consented Activities of People (CAP) dataset, the first large-scale, fine-grained activity dataset of people worldwide. The CAP dataset contains 1.45M video clips of 512 fine-grained activity labels of daily life, collected by 780 subjects in 33 countries. We provide activity classification and activity detection benchmarks for this dataset, and analyze baseline results to gain insight into how people around the world perform common activities. The dataset, benchmarks, evaluation tools, public leaderboards and mobile apps are available for use at visym.github.io/cap.

    Real-time video stabilization without phantom movements for micro aerial vehicles

    In recent times, micro aerial vehicles (MAVs) have become popular for applications such as rescue, surveillance, and mapping. Undesired motion between consecutive frames is a problem in video recorded by MAVs. Several approaches address this issue in video post-processing; however, only a few algorithms can be applied in real time. An additional and critical problem is the presence of false movements in the stabilized video. In this paper, we present a new video stabilization approach that can be used in real time without generating false movements. Our proposal uses a combination of a low-pass filter and control action information to estimate the motion intention. Peer reviewed. Postprint (published version).
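The abstract above does not say how the low-pass filter and the control action information are combined, so the following is only one plausible reading, sketched under stated assumptions. It uses a first-order exponential low-pass filter whose gain is raised while a pilot command is active, so that intentional motion is tracked quickly and the filter lag does not show up as a false movement. The function name, the boolean `commanded` signal, and both gain values are hypothetical.

```python
def estimate_intended_motion(trajectory, commanded, alpha_hover=0.1, alpha_cmd=0.6):
    """Estimate intended camera motion with a command-aware low-pass filter.

    trajectory: measured camera position per frame along one axis.
    commanded:  per-frame booleans, True while a control command is active
                (an assumed input; the abstract does not specify the format).
    """
    estimate = trajectory[0]
    intended = []
    for measured, is_commanded in zip(trajectory, commanded):
        # Heavy smoothing while hovering, light smoothing while the pilot steers.
        alpha = alpha_cmd if is_commanded else alpha_hover
        estimate += alpha * (measured - estimate)
        intended.append(estimate)
    return intended
```

With these example gains, a step in the measured position is followed much faster when the matching frames are flagged as commanded than when they are not.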

    Adaptive online deployment for resource constrained mobile smart clients

    Mobile devices are increasingly used as a platform for applications. In contrast to prior-generation handheld devices, which were configured with a predefined set of applications, today's leading-edge devices provide a platform for flexible and customized application deployment. However, these applications have to deal with the limitations (e.g. CPU speed, memory) of these mobile devices and thus cannot handle complex tasks. In order to cope with the handheld limitations and the ever-changing device context (e.g. network connections, remaining battery time), we present a middleware solution that dynamically offloads parts of the software to the most appropriate server. Without a priori knowledge of the application, the middleware calculates the optimal deployment, which lowers the CPU usage at the mobile client whilst keeping the used bandwidth minimal. The information needed to calculate this optimum is gathered on the fly from runtime information. Experimental results show that the proposed solution enables effective execution of complex applications in a constrained environment. Moreover, we demonstrate that the overhead from the middleware components is below 2%.

    Medical student case presentation performance and perception when using mobile learning technology in the emergency department

    Hand-held mobile learning technology provides opportunities for clinically relevant self-instructional modules to augment traditional bedside teaching. Using this technology as a teaching tool has not been well studied. We sought to evaluate medical students’ case presentation performance and perception when viewing short, just-in-time mobile learning videos using the iPod touch prior to patient encounters. Twenty-two fourth-year medical students were randomized to receive or not to receive instruction by video, using the iPod touch, prior to patient encounters. After seeing a patient, they presented the case to their faculty, who completed a standard data collection sheet. Students were surveyed on their perceived confidence and effectiveness after using these videos. Twenty-two students completed a total of 67 patient encounters. There was a statistically significant improvement in presentations when the videos were viewed for the first time (p = 0.032). There was no difference when the presentations were summed for the entire rotation (p = 0.671). The survey, which was reliable (alpha = 0.97), indicated that the videos were a useful teaching tool and gave students more confidence in their presentations. Medical student patient presentations were improved with the use of mobile instructional videos following first-time use, suggesting mobile learning videos may be useful in medical student education. If direct bedside teaching is unavailable, just-in-time iPod touch videos can be an alternative instructional strategy to improve first-time patient presentations by medical students.

    MonoPerfCap: Human Performance Capture from Monocular Video

    We present the first marker-less approach for temporally coherent 3D performance capture of a human with general clothing from monocular video. Our approach reconstructs articulated human skeleton motion as well as medium-scale non-rigid surface deformations in general scenes. Human performance capture is a challenging problem due to the large range of articulation, potentially fast motion, and considerable non-rigid deformations, even from multi-view data. Reconstruction from monocular video alone is drastically more challenging, since strong occlusions and the inherent depth ambiguity lead to a highly ill-posed reconstruction problem. We tackle these challenges with a novel approach that employs sparse 2D and 3D human pose detections from a convolutional neural network using a batch-based pose estimation strategy. Joint recovery of per-batch motion allows us to resolve the ambiguities of the monocular reconstruction problem based on a low-dimensional trajectory subspace. In addition, we propose refinement of the surface geometry based on fully automatically extracted silhouettes to enable medium-scale non-rigid alignment. We demonstrate state-of-the-art performance capture results that enable exciting applications such as video editing and free viewpoint video, previously infeasible from monocular video. Our qualitative and quantitative evaluation demonstrates that our approach significantly outperforms previous monocular methods in terms of accuracy, robustness and scene complexity that can be handled. Comment: Accepted to ACM TOG 2018, to be presented on SIGGRAPH 201

    An Inertial Device-based User Interaction with Occlusion-free Object Handling in a Handheld Augmented Reality

    Augmented Reality (AR) is a technology that merges virtual objects with real environments in real time. In AR, the interaction between the end user and the AR system has always been a frequently discussed topic. Handheld AR is a newer approach that delivers enriched 3D virtual objects when a user looks through the device’s video camera. Among the most widely accepted handheld devices today are smartphones, which are equipped with powerful processors, cameras for capturing still images and video, and a range of sensors capable of tracking the location, orientation and motion of the user. These modern smartphones offer a sophisticated platform for implementing handheld AR applications. However, handheld displays often inherit interaction metaphors that were developed for head-mounted displays, and these may rely on hardware that is inappropriate for handheld use. Therefore, this paper discusses a proposed real-time inertial device-based interaction technique for 3D object manipulation. It also explains the methods used for selection, holding, translation and rotation. The technique aims to overcome a limitation in 3D object manipulation by letting a user hold the device with both hands, without needing to stretch out one hand to manipulate the 3D object. This paper also reviews previous work in the field of AR and handheld AR. Finally, the paper provides experimental results that offer new metaphors for manipulating 3D objects using handheld devices.

    Video Magnification for Structural Analysis Testing

    The goal of this thesis is to allow a user to see minute motion of an object at different frequencies, using a computer program, to aid in vibration testing analysis without the use of complex setups of accelerometers or expensive laser vibrometers. MIT’s phase-based video motion processing was modified to enable modal determination of structures in the field using a cell phone camera. The algorithm was modified by implementing a stabilization algorithm and permitting the magnification filter to operate on multiple frequency ranges to enable visualization of the natural frequencies of structures in the field. To implement multiple frequency ranges, a new function was developed to apply the magnification filter at each relevant frequency range within the original video. The stabilization algorithm would allow for a camera to be hand-held instead of requiring a tripod mount. The following methods for stabilization were tested: fixed point video stabilization and image registration. Neither method removed the global motion from the hand-held video, even after masking was implemented, which led to poor results. Specifically, fixed point did not remove much motion or created sharp motions, and image registration introduced a pulsing effect. The best results occurred when the object being observed had contrast from the background, was the largest feature in the video frame, and the video was captured from a tripod at an appropriate angle. The final program can amplify the motion in user-selected frequency bands and can be used as an aid in structural analysis testing.
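The idea of applying a magnification filter to several frequency ranges can be sketched on a single time series. This is a deliberately simplified illustration, not the thesis code and not the phase-based method the abstract refers to: it uses a plain linear temporal band-pass on one signal (for example, the brightness of one pixel over time), computed with a naive discrete Fourier transform. The function names, the `bands` format of `(low_hz, high_hz)` pairs, and the `gain` parameter are assumptions made here.

```python
import cmath
import math


def dft(x):
    """Naive O(n^2) discrete Fourier transform, adequate for short signals."""
    n = len(x)
    return [sum(x[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
            for k in range(n)]


def idft(spectrum):
    """Inverse of dft(); returns the real part of the reconstructed signal."""
    n = len(spectrum)
    return [(sum(spectrum[k] * cmath.exp(2j * math.pi * k * t / n)
                 for k in range(n)) / n).real
            for t in range(n)]


def magnify_bands(signal, fps, bands, gain):
    """Amplify the temporal content of `signal` inside each (low_hz, high_hz) band.

    Frequencies inside any band are scaled by (1 + gain); all others are kept.
    """
    n = len(signal)
    spectrum = dft(signal)
    for k in range(n):
        # Bins k and n - k carry the same physical frequency (positive/negative).
        freq = min(k, n - k) * fps / n
        if any(low <= freq <= high for low, high in bands):
            spectrum[k] *= 1.0 + gain
    return idft(spectrum)
```

For example, `magnify_bands(signal, fps=30.0, bands=[(1.5, 2.5), (4.5, 5.5)], gain=4.0)` would amplify components near 2 Hz and 5 Hz in one pass and leave everything else untouched.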