1,039 research outputs found

    Personalized retrieval of sports video


    MediaSync: Handbook on Multimedia Synchronization

    This book provides an approachable overview of the most recent advances in the fascinating field of media synchronization (mediasync), gathering contributions from the most representative and influential experts. Understanding the challenges of this field in the current multi-sensory, multi-device, and multi-protocol world is not an easy task. The book revisits the foundations of mediasync, including theoretical frameworks and models; highlights ongoing research efforts, such as hybrid broadband broadcast (HBB) delivery and the modeling of users' perception (i.e., Quality of Experience, or QoE); and paves the way for the future (e.g., towards the deployment of multi-sensory and ultra-realistic experiences). Although many advances around mediasync have been devised and deployed, this area of research is receiving renewed attention in order to overcome the remaining challenges in the next-generation (heterogeneous and ubiquitous) media ecosystem. Given the significant advances in this research area, its current relevance, and the multiple disciplines it involves, a reference book on mediasync has become necessary, and this book fills that gap. In particular, it addresses key aspects and reviews the most relevant contributions within the mediasync research space, from different perspectives. MediaSync: Handbook on Multimedia Synchronization is the perfect companion for scholars and practitioners who want to acquire solid knowledge of this research area, and also to approach the challenges behind ensuring the best mediated experiences by providing adequate synchronization between the media elements that constitute those experiences.

    Automatic Labelling of 3D Motion Capture Markers using Neural Networks

    The work focuses on the development of a machine learning framework based on neural networks that predicts the label assignments of 3D markers from a motion capture system, to be integrated into an existing application devoted to biomechanical analysis in the medical and sports fields. Starting from a dataset of 3D files acquired in the field and in the laboratory from a heterogeneous set of patients and athletes, each containing specific movements, the goal is to speed up an analysis process that until now has required manual labeling of the three-dimensional point cloud before the data can be analyzed. The implemented framework relies on rigorous pre-processing and post-processing devoted to cleaning the data, both to improve the result and to handle missing and extraneous markers, while the core of the algorithm is a series of LSTM neural networks. Training the network on a first tractable subset was a success, yielding a test accuracy of 95%.
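As a rough illustration of the kind of sequence model this abstract describes, the sketch below runs one marker trajectory (a sequence of x, y, z positions) through a toy LSTM cell and maps the final hidden state to per-label probabilities. All names, layer sizes, and the random initialization here are illustrative assumptions, not the thesis's actual implementation.

```python
import math
import random

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

class LSTMCell:
    """A minimal LSTM cell (forward pass only), in pure Python for clarity."""

    def __init__(self, n_in, n_hid, seed=0):
        rng = random.Random(seed)
        def mat(rows, cols):
            return [[rng.uniform(-0.1, 0.1) for _ in range(cols)]
                    for _ in range(rows)]
        # One weight matrix and bias per gate: input, forget, output, candidate.
        self.W = {g: mat(n_hid, n_in + n_hid) for g in "ifoc"}
        self.b = {g: [0.0] * n_hid for g in "ifoc"}
        self.n_hid = n_hid

    def step(self, x, h, c):
        z = list(x) + list(h)  # concatenate input and previous hidden state
        def lin(g):
            return [sum(w * v for w, v in zip(row, z)) + b
                    for row, b in zip(self.W[g], self.b[g])]
        i = [sigmoid(v) for v in lin("i")]    # input gate
        f = [sigmoid(v) for v in lin("f")]    # forget gate
        o = [sigmoid(v) for v in lin("o")]    # output gate
        g = [math.tanh(v) for v in lin("c")]  # candidate cell state
        c_new = [fv * cv + iv * gv for fv, cv, iv, gv in zip(f, c, i, g)]
        h_new = [ov * math.tanh(cv) for ov, cv in zip(o, c_new)]
        return h_new, c_new

def label_scores(trajectory, n_labels=4):
    """Run a trajectory through the LSTM, then softmax the final hidden
    state into per-label probabilities (illustrative untrained weights)."""
    cell = LSTMCell(n_in=3, n_hid=8)
    rng = random.Random(1)
    W_out = [[rng.uniform(-0.1, 0.1) for _ in range(cell.n_hid)]
             for _ in range(n_labels)]
    h, c = [0.0] * cell.n_hid, [0.0] * cell.n_hid
    for pos in trajectory:
        h, c = cell.step(pos, h, c)
    logits = [sum(w * v for w, v in zip(row, h)) for row in W_out]
    m = max(logits)
    exps = [math.exp(v - m) for v in logits]
    s = sum(exps)
    return [e / s for e in exps]
```

In a trained system the weights would of course come from fitting on the labeled subset; the point here is only the shape of the computation, namely per-frame recurrence over marker positions followed by a classification head.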

    Single View Modeling and View Synthesis

    This thesis develops new algorithms to produce 3D content from a single camera. Today, amateurs can use hand-held camcorders to capture and display the 3D world in 2D, using mature technologies. However, there is always a strong desire to record and re-explore the 3D world in 3D. To achieve this goal, current approaches usually make use of a camera array, which suffers from tedious setup and calibration processes, as well as lack of portability, limiting its application to lab experiments. In this thesis, I try to produce 3D content using a single camera, making it as simple as shooting pictures. This requires a new front-end capturing device rather than a regular camcorder, as well as more sophisticated algorithms. First, in order to capture highly detailed object surfaces, I designed and developed a depth camera based on a novel technique called light fall-off stereo (LFS). The LFS depth camera outputs color+depth image sequences at 30 fps, which is necessary for capturing dynamic scenes. Based on the output color+depth images, I developed a new approach that builds 3D models of dynamic and deformable objects. While the camera can only capture part of a whole object at any instant, partial surfaces are assembled together to form a complete 3D model by a novel warping algorithm. Inspired by the success of single-view 3D modeling, I extended my exploration to 2D-to-3D video conversion that does not use a depth camera. I developed a semi-automatic system that converts monocular videos into stereoscopic videos via view synthesis. It combines motion analysis with user interaction, aiming to shift as much of the depth-inference work as possible from the user to the computer. I developed two new methods that analyze the optical flow in order to provide additional qualitative depth constraints. The automatically extracted depth information is presented in the user interface to assist the user's labeling work.
In this thesis, I developed new algorithms to produce 3D content from a single camera. Depending on the input data, my algorithms can build high-fidelity 3D models of dynamic and deformable objects if depth maps are provided. Otherwise, they can turn video clips into stereoscopic video.
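To illustrate the depth-based view synthesis step the abstract describes, the sketch below warps a left-eye image into a right-eye view by shifting each pixel by a disparity proportional to inverse depth, so that nearer pixels move farther. The baseline value, z-buffer occlusion handling, and fill policy are illustrative assumptions, not the system developed in the thesis (which also fills disocclusions and uses sub-pixel warping in practice).

```python
def synthesize_right_view(left, depth, baseline=4.0, fill=0):
    """Warp a left image (rows of pixel values) into a right-eye view
    using a per-pixel depth map of the same shape.

    Disparity is proportional to inverse depth, so nearer pixels shift
    more; when two source pixels land on the same target column, the
    nearer one wins (a simple z-buffer test). Holes left by
    disocclusions are filled with `fill`.
    """
    h, w = len(left), len(left[0])
    right = [[fill] * w for _ in range(h)]
    zbuf = [[float("inf")] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            d = int(round(baseline / depth[y][x]))  # disparity in pixels
            tx = x - d  # right-eye column for this pixel
            if 0 <= tx < w and depth[y][x] < zbuf[y][tx]:
                right[y][tx] = left[y][x]
                zbuf[y][tx] = depth[y][x]
    return right
```

For example, with a single row `[1, 2, 3, 4]` whose first two pixels are at depth 1.0 and last two at depth 2.0, a baseline of 2.0 shifts the near pixels by two columns and the far pixels by one, producing the occlusion and disocclusion pattern a stereo pair exhibits.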