
    Automatic mashup generation of multiple-camera videos

    The amount of user-generated video content is growing enormously with the increasing availability and affordability of technologies for video capturing (e.g. camcorders, mobile phones), storage (e.g. magnetic and optical devices, online storage services), and sharing (e.g. broadband internet, social networks). It has become a common sight at social occasions such as parties, concerts, weddings, and vacations that many people shoot videos at approximately the same time. Such concurrent recordings provide multiple views of the same event. In professional video production, the use of multiple cameras is very common: to compose a video that is interesting to watch, audio and video segments from different recordings are mixed into a single video stream. For non-professional recordings, however, mixing different camera recordings is not common, as the process is considered very time consuming and requires expertise. In this thesis, we research how to automatically combine multiple-camera recordings into a single video stream, called a mashup. Since non-professional recordings are generally characterized by low signal quality and a lack of artistic appeal, our objective is to use mashups to enrich the viewing experience of such recordings. To define a target application and collect requirements for a mashup, we conducted a study involving experts on video editing and general camera users by means of interviews and focus groups. Based on the study results, we decided to work on the domain of concert videos. We listed requirements for concert video mashups such as image quality, diversity, and synchronization. Following these requirements, we proposed a solution approach for mashup generation and introduced a formal model consisting of pre-processing, mashup-composition, and post-processing steps. 
This thesis describes the pre-processing and mashup-composition steps, which result in the automatic generation of a mashup satisfying a set of the elicited requirements. In the pre-processing step, we synchronized the multiple-camera recordings so that they are represented on a common timeline. We proposed and developed synchronization methods based on detecting and matching audio and video features extracted from the recorded content, with three realizations using different features: still-camera flashes in video, audio fingerprints, and audio onsets. The realizations are independent of the frame rate of the recordings and the number of cameras, and provide synchronization-offset accuracy at the frame level. Based on their performance on a common dataset, the audio-fingerprint and audio-onset methods were found to be the most suitable for generating mashups of concert videos. In the mashup-composition step, we proposed an optimization-based solution to compose a mashup from the synchronized recordings. The solution maximizes an objective function containing a number of parameters representing the requirements that influence mashup quality, subject to a number of constraints representing the requirements that must be fulfilled in a mashup. Different audio-visual feature extraction and analysis techniques were employed to measure the degree of fulfillment of the requirements represented in the objective function. We developed an algorithm, first-fit, to compose a mashup that satisfies the constraints and maximizes the objective function. Finally, to validate our solution approach, we compared the mashups generated by the first-fit algorithm with those generated by two other methods: a naive method, in which a mashup satisfies only the requirements given as constraints, and a manual method, in which a mashup was created by a professional. 
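As a toy illustration of the audio-onset realization, assuming onsets have been reduced to per-frame binary activations, the synchronization offset between two cameras can be estimated from the peak of their cross-correlation. The function name and the data below are illustrative, not taken from the thesis:

```python
import numpy as np

def sync_offset(onsets_a, onsets_b):
    """Return the frame shift that best aligns recording b to recording a,
    given per-frame onset activations (1.0 where an audio onset occurs).
    Independent of frame rate, as long as both sequences share one."""
    corr = np.correlate(onsets_a, onsets_b, mode="full")
    return int(np.argmax(corr)) - (len(onsets_b) - 1)

a = np.zeros(12)
a[[3, 6, 8]] = 1.0   # camera A hears onsets at frames 3, 6, 8
b = np.zeros(12)
b[[0, 3, 5]] = 1.0   # camera B started 3 frames later: same onsets at 0, 3, 5
print(sync_offset(a, b))  # → 3
```

The peak of the full cross-correlation occurs at the lag where the most onsets coincide, which is why the method degrades gracefully when a few onsets are missed by one camera.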
In the objective evaluation, first-fit mashups scored higher than both the manual and naive mashups. To assess end-user satisfaction, we also conducted a user study in which we measured user preferences for the mashups generated by the three methods on different aspects of mashup quality. On all aspects, the naive mashup scored significantly lower, while the manual and first-fit mashups scored similarly. We conclude that the perceived quality of a mashup generated by the naive method is lower than that of the first-fit and manual mashups, while the perceived quality of the mashups generated by the first-fit and manual methods is similar.
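The first-fit composition step can be sketched as a greedy pass over the synchronized timeline: at each cut point, candidate cameras are ranked by an objective score, and the first one satisfying the hard constraints is taken. The per-frame quality scores, the minimum shot length, and the no-repeat diversity rule below are simplified stand-ins for the thesis's actual requirements:

```python
def first_fit_mashup(quality, min_len=2):
    """Greedy first-fit composition sketch. quality[c][t] is a per-camera,
    per-frame score standing in for the objective-function terms. Hard
    constraints here: each shot lasts at least min_len frames, and
    consecutive shots come from different cameras (a stand-in for the
    diversity requirement). Assumes at least two cameras."""
    n_cams, n_frames = len(quality), len(quality[0])
    mashup, t, prev = [], 0, None
    while t < n_frames:
        length = min(min_len, n_frames - t)
        # Rank candidates by mean score over the upcoming segment...
        ranked = sorted(range(n_cams),
                        key=lambda c: -sum(quality[c][t:t + length]) / length)
        # ...and take the first one that satisfies the constraints.
        cam = next(c for c in ranked if c != prev)
        mashup.append((cam, t, t + length))
        prev, t = cam, t + length
    return mashup

# Two cameras, eight frames: camera 0 is sharp early, camera 1 late.
quality = [[0.9] * 4 + [0.2] * 4,
           [0.3] * 4 + [0.8] * 4]
print(first_fit_mashup(quality))  # → [(0, 0, 2), (1, 2, 4), (0, 4, 6), (1, 6, 8)]
```

Note how the diversity constraint occasionally forces a lower-scoring camera (frames 4-6), which is exactly the trade-off the objective-plus-constraints formulation makes explicit.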

    Experimental investigation of the tire wear process using camera-assisted observation assessed by numerical modeling

    This paper presents a novel experimental method to study the abrasion mechanism of car tires. It is based on the detection of microscopic movements associated with material damage (cracking) on the rubber tread, referred to as degrading-layer relaxation. This relaxation correlates with the wear rate and, interestingly, the direction of the pattern's movement is opposite to the lateral forces during cornering. To measure and analyze the microscopic movements, a new camera-based method with feature-point matching and video stabilization was developed. Besides the extensive experimental investigation, the formation and propagation of microcracks are investigated using a simplified numerical model in which a phase-field approach coupled with viscoelastic constitutive behavior is implemented in a finite element framework.
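A minimal sketch of the displacement-measurement idea: once stabilization has removed global camera motion, the residual creep of a tread patch between frames can be located as the peak of a 2-D cross-correlation (here computed via FFT on synthetic data; the paper's actual pipeline matches feature points):

```python
import numpy as np

def patch_shift(ref, cur):
    """Estimate the integer (dy, dx) displacement of a tread patch between
    two stabilized frames from the peak of their circular 2-D
    cross-correlation. A simplified stand-in for the paper's feature-point
    matching; assumes global camera motion has already been removed."""
    ref = ref - ref.mean()
    cur = cur - cur.mean()
    corr = np.fft.ifft2(np.fft.fft2(cur) * np.conj(np.fft.fft2(ref))).real
    dy, dx = np.unravel_index(np.argmax(corr), corr.shape)
    h, w = ref.shape
    # Map wrap-around peaks back to signed shifts.
    return (int(dy - h) if dy > h // 2 else int(dy),
            int(dx - w) if dx > w // 2 else int(dx))

rng = np.random.default_rng(7)
ref = rng.random((32, 32))                # synthetic tread texture
cur = np.roll(ref, (2, -1), axis=(0, 1))  # patch crept 2 px down, 1 px left
print(patch_shift(ref, cur))              # → (2, -1)
```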

    StableFlow: a physics inspired digital video stabilization

    This thesis addresses the problem of digital video stabilization. With the widespread use of handheld devices and unmanned aerial vehicles (UAVs) able to record video, digital video stabilization has become more important, as such videos are often shaky, undermining their visual quality. Digital video stabilization has been studied for decades, yielding an extensive literature in the field; however, current approaches are either computationally expensive or under-perform in terms of visual quality. In this thesis, we first present a novel study of the effect of image denoising on feature-based digital video stabilization. We then introduce StableFlow, a novel technique for real-time stabilization inspired by the mass-spring-damper model. A video frame is modelled as a mass suspended in each direction by a critically damped spring and damper, which can be fine-tuned to adapt to different shaking patterns. The proposed technique is tested on video sequences with different types of shakiness and diverse video content. The obtained results significantly outperform state-of-the-art stabilization techniques in terms of visual quality while running in real time.
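The mass-spring-damper idea can be sketched in one dimension: the stabilized frame position is a unit mass pulled toward each shaky trajectory sample by a critically damped spring (damping coefficient 2ω for unit mass). The natural frequency, time step, and demo trajectory below are illustrative, not values from the thesis:

```python
import math

def critically_damped_track(shaky, omega=4.0, dt=1 / 30):
    """Smooth a 1-D camera trajectory by integrating a unit mass attached
    to each shaky sample through a critically damped spring-damper
    (stiffness omega**2, damping 2*omega). Larger omega follows the shaky
    path more tightly; this is a sketch, not the thesis implementation."""
    x, v = shaky[0], 0.0
    out = []
    for target in shaky:
        a = omega * omega * (target - x) - 2.0 * omega * v  # spring + damper
        v += a * dt   # semi-implicit Euler keeps the integration stable
        x += v * dt
        out.append(x)
    return out

# Demo: a smooth pan contaminated with alternating-frame jitter.
shaky = [math.sin(t / 10) + 0.2 * (-1) ** t for t in range(100)]
smooth = critically_damped_track(shaky)

def jitter(path):
    """Sum of squared frame-to-frame differences (a crude shakiness proxy)."""
    return sum((path[i + 1] - path[i]) ** 2 for i in range(len(path) - 1))

print(jitter(smooth) < jitter(shaky))  # → True
```

Critical damping is the design choice that matters here: it is the fastest return to the target that never overshoots, so the stabilized frame neither lags visibly nor oscillates.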

    Generalized Trackball and 3D Touch Interaction

    This thesis addresses the problem of 3D interaction by means of touch and mouse input. We propose a multitouch-enabled adaptation of the classical mouse-based trackball interaction scheme. In addition, we introduce a new interaction metaphor based on visiting the space around a virtual object while remaining at a given distance from it. This approach allows intuitive navigation of topologically complex shapes, enabling inexperienced users to visit hard-to-reach parts.
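In its simplest spherical form, the "visit at a given distance" metaphor amounts to placing the camera on an offset sphere around the target; the thesis generalizes this to arbitrary shapes, so the sketch below only captures the basic idea (names and conventions are illustrative):

```python
import math

def orbit_camera(target, distance, yaw, pitch):
    """Place the camera on a sphere of radius `distance` centred on `target`,
    from which it looks at the target -- the simplest spherical case of
    navigating at a fixed distance. Angles in radians; y is up."""
    cx = target[0] + distance * math.cos(pitch) * math.sin(yaw)
    cy = target[1] + distance * math.sin(pitch)
    cz = target[2] + distance * math.cos(pitch) * math.cos(yaw)
    return (cx, cy, cz)

# Wherever the user drags, the camera stays exactly `distance` away.
eye = orbit_camera((1.0, 2.0, 3.0), 5.0, yaw=0.7, pitch=0.3)
print(round(math.dist(eye, (1.0, 2.0, 3.0)), 6))  # → 5.0
```

The invariant (constant distance to the surface being inspected) is what makes the navigation feel predictable on complex shapes: the user controls only the two angles, never the range.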

    Audio-coupled video content understanding of unconstrained video sequences

    Unconstrained video understanding is a difficult task. The main aim of this thesis is to recognise the nature of the objects, activities, and environment in a given video clip using both audio and video information. Traditionally, audio and video information have not been applied together to solve such a complex task; for the first time, we propose, develop, implement, and test a new framework for multi-modal (audio and video) data analysis for context understanding and labelling of unconstrained videos. The framework relies on feature selection techniques and introduces a novel algorithm (PCFS) that is faster than the well-established SFFS algorithm. We use the framework to study the benefits of combining audio and video information in a number of different problems. We begin by developing two independent content recognition modules. The first is based on image sequence analysis alone and uses a range of colour, shape, texture, and statistical features from image regions, together with a trained classifier, to recognise the objects, activities, and environment present. The second module uses audio information only and recognises activities and environment. Both approaches are preceded by detailed pre-processing to ensure that correct video segments containing both audio and video content are present, and that the developed system is robust to changes in camera movement, illumination, random object behaviour, etc. For both audio and video analysis, we use a hierarchical approach of multi-stage classification, so that difficult classification tasks can be decomposed into simpler, smaller tasks. When combining both modalities, we compare fusion techniques at different levels of integration and propose a novel algorithm that combines the advantages of feature-level and decision-level fusion. The analysis is evaluated on a large amount of test data comprising unconstrained videos collected for this work. 
Finally, we propose a decision correction algorithm, showing that further steps towards effectively combining multi-modal classification information with semantic knowledge generate the best possible results.
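Decision-level fusion of the two modality-specific classifiers can be sketched as a weighted combination of per-class scores; the weight and class labels below are hypothetical, and the thesis's actual fused algorithm additionally exploits feature-level information:

```python
def fuse_decisions(audio_scores, video_scores, w_audio=0.4):
    """Decision-level fusion sketch: combine per-class posteriors from the
    independent audio and video classifiers by a weighted sum and pick the
    top class. The weight is a hypothetical tuning knob, not a value from
    the thesis. Both dicts must share the same class keys."""
    fused = {c: w_audio * audio_scores[c] + (1 - w_audio) * video_scores[c]
             for c in audio_scores}
    return max(fused, key=fused.get), fused

audio = {"indoor": 0.7, "outdoor": 0.3}  # audio classifier leans indoor
video = {"indoor": 0.2, "outdoor": 0.8}  # video classifier leans outdoor
label, scores = fuse_decisions(audio, video)
print(label)  # → outdoor (video outvotes audio at w_audio = 0.4)
```

Fusing at the decision level keeps the two classifiers independent; feature-level fusion instead concatenates modality features before a single classifier, which is why hybrids of the two are attractive.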

    Investigation of Pedestrian-Cyclist Interactions through Machine Vision

    For pedestrian-cyclist facilities where collisions and the resulting injuries may not be fully covered in police reports, there is a need for improved safety indicators. After fifteen hours of video observation at Pickard Passageway, College Station, there appear to be four broad types of pedestrian-cyclist interaction: passing, weaving, turning, and avoiding. Within each of these behavior categories there are both safe and unsafe maneuvers. To determine whether an event qualifies as a safety-critical event or near-miss, multiple factors should be taken into account, including relative distance, sudden change in velocity, and sudden change in path. While an improved understanding of the general interactions between pedestrians and cyclists in these underpass facilities can advance the safety research field, analyzing each path manually would take a prohibitive amount of time. This paper suggests ways in which machine learning can support the behavior categorization of pedestrian-cyclist interactions for safety evaluation at such facilities, across the identification, classification, and safety evaluation phases.
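The near-miss criterion described above (small relative distance combined with a sudden change in velocity or path) can be sketched as a simple rule; all thresholds below are illustrative placeholders, not values calibrated in the paper:

```python
def is_near_miss(min_gap_m, speed_change_mps, heading_change_deg,
                 gap_thresh=1.0, speed_thresh=1.5, heading_thresh=30.0):
    """Flag a pedestrian-cyclist interaction as safety-critical when the
    minimum relative distance is small AND either road user reacts with a
    sudden change in velocity or path. Thresholds are illustrative
    placeholders, not values calibrated in the paper."""
    evasive = (speed_change_mps >= speed_thresh
               or heading_change_deg >= heading_thresh)
    return min_gap_m <= gap_thresh and evasive

print(is_near_miss(0.6, 2.0, 5.0))   # → True  (close pass with hard braking)
print(is_near_miss(3.0, 2.0, 40.0))  # → False (evasive, but never close)
print(is_near_miss(0.5, 0.2, 5.0))   # → False (close, but no reaction)
```

In practice each input would come from the machine-vision tracking stage (trajectories per road user), with the rule applied per interaction event rather than per frame.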

    Content-preserving image stitching with piecewise rectangular boundary constraints

    This paper proposes an approach to content-preserving image stitching with regular boundary constraints, which aims to stitch multiple images into a panoramic image with a piecewise-rectangular boundary. Existing methods treat image stitching and rectangling as two separate steps, which may yield suboptimal results because the stitching process is unaware of the subsequent warping needed for rectangling. We address these limitations by formulating image stitching with regular boundaries as a unified optimization. Starting from the initial stitching result produced by traditional warping-based optimization, we obtain the irregular boundary from the warped meshes by polygon Boolean operations, which robustly handle arbitrary mesh compositions. By analyzing the irregular boundary, we construct a piecewise-rectangular boundary. Based on this, we further incorporate line and regular-boundary preservation constraints into the image stitching framework and iteratively optimize to obtain an optimal piecewise-rectangular boundary. In this way the boundary of the stitching result is made as close as possible to a rectangle while unwanted distortions are reduced. We further extend our method to video stitching by integrating temporal coherence into the optimization. Experiments show that our method efficiently produces visually pleasing panoramas with regular boundaries and unnoticeable distortions.
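As a toy illustration of what the optimization drives toward, one can measure how far a stitched boundary deviates from a rectangle. This helper is a simplified stand-in for the paper's regular-boundary preservation energy, not its actual formulation:

```python
def boundary_rectangularity(points):
    """Mean distance from each boundary sample to the nearest edge of its
    axis-aligned bounding rectangle: zero for a perfectly rectangular
    boundary, positive when the boundary sags inward. A toy stand-in for
    the paper's regular-boundary energy, not its actual formulation."""
    xs = [x for x, _ in points]
    ys = [y for _, y in points]
    x0, x1, y0, y1 = min(xs), max(xs), min(ys), max(ys)
    dists = [min(x - x0, x1 - x, y - y0, y1 - y) for x, y in points]
    return sum(dists) / len(dists)

rect = [(0, 0), (2, 0), (2, 1), (0, 1), (1, 0), (2, 0.5)]  # all on the rectangle
print(boundary_rectangularity(rect))                   # → 0.0
print(boundary_rectangularity(rect + [(1, 0.5)]) > 0)  # inward sag → True
```

A joint formulation minimizes a term like this together with content-preservation terms over the warp mesh, instead of rectangling as an afterthought.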