344 research outputs found

    A survey of real-time crowd rendering

    Get PDF
    In this survey we review, classify, and compare existing approaches for real-time crowd rendering. We first give an overview of character animation techniques, as they are tightly coupled to crowd rendering performance, and then analyze the state of the art in crowd rendering. We discuss different representations for level-of-detail (LoD) rendering of animated characters, including polygon-based, point-based, and image-based techniques, and review different criteria for runtime LoD selection. Besides LoD approaches, we review classic acceleration schemes, such as frustum culling and occlusion culling, and describe how they can be adapted to handle crowds of animated characters. We also discuss acceleration techniques specific to crowd rendering, such as primitive pseudo-instancing, palette skinning, and dynamic key-pose caching, which benefit from current graphics hardware, and address further factors affecting the performance and realism of crowds, such as lighting, shadowing, clothing, and variability. Finally, we provide an exhaustive comparison of the most relevant approaches in the field. Peer reviewed. Postprint (author's final draft).
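    As a rough illustration of the runtime LoD selection and culling the survey discusses, the Python sketch below picks a representation per character from its camera distance and applies a crude view-cone cull. The distance bands, the Character structure, and the thresholds are hypothetical, not taken from any surveyed system.

```python
# Illustrative sketch (not from the survey): distance-based LoD selection per
# character plus a coarse view-cone cull. All thresholds are made up.
from dataclasses import dataclass
import math

@dataclass
class Character:
    x: float
    y: float
    z: float  # world position

# Hypothetical LoD bands: (max distance, representation)
LOD_BANDS = [(15.0, "full mesh"), (40.0, "simplified mesh"), (120.0, "impostor")]

def select_lod(character: Character, cam_pos, cam_dir, fov_cos=0.5):
    """Return the representation to render, or None if the character is culled."""
    dx = character.x - cam_pos[0]
    dy = character.y - cam_pos[1]
    dz = character.z - cam_pos[2]
    dist = math.sqrt(dx * dx + dy * dy + dz * dz)
    if dist > 1e-6:
        # Crude frustum (view-cone) culling: skip characters behind the camera
        # or outside the field-of-view cone.
        cos_angle = (dx * cam_dir[0] + dy * cam_dir[1] + dz * cam_dir[2]) / dist
        if cos_angle < fov_cos:
            return None
    for max_dist, representation in LOD_BANDS:
        if dist <= max_dist:
            return representation
    return None  # beyond the farthest band: not rendered

# Example: a character 30 units in front of a camera at the origin looking down +z
print(select_lod(Character(0.0, 0.0, 30.0), (0.0, 0.0, 0.0), (0.0, 0.0, 1.0)))
```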

    Edge Based RGB-D SLAM and SLAM Based Navigation

    Get PDF

    Deep Convolutional Pooling Transformer for Deepfake Detection

    Full text link
    Recently, Deepfake has drawn considerable public attention due to security and privacy concerns in social media digital forensics. As the widely spreading Deepfake videos on the Internet become more realistic, traditional detection techniques fail to distinguish between real and fake. Most existing deep learning methods mainly focus on local features and relations within the face image, using convolutional neural networks as a backbone. However, local features and relations are insufficient for a model to learn enough general information for Deepfake detection. Therefore, existing Deepfake detection methods have reached a bottleneck in further improving detection performance. To address this issue, we propose a deep convolutional Transformer that incorporates decisive image features both locally and globally. Specifically, we apply convolutional pooling and re-attention to enrich the extracted features and enhance efficacy. Moreover, we employ the rarely discussed image keyframes in model training for performance improvement and visualize the feature quantity gap between key and normal image frames caused by video compression. We finally illustrate transferability with extensive experiments on several Deepfake benchmark datasets. The proposed solution consistently outperforms several state-of-the-art baselines in both within- and cross-dataset experiments. Comment: Accepted for publication in ACM TOM.
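    A hedged sketch of the general idea of mixing convolutional pooling with attention follows; it is a simplified stand-in rather than the authors' architecture, and the layer sizes, pooling stride, and block layout are assumptions.

```python
# Simplified sketch: a Transformer block whose keys/values are convolutionally
# pooled feature tokens, so attention mixes local (CNN) and global information.
import torch
import torch.nn as nn

class ConvPoolAttentionBlock(nn.Module):
    def __init__(self, dim=256, heads=4, pool_stride=2):
        super().__init__()
        # Strided convolution pools the token grid before attention, shrinking
        # the key/value sequence and emphasising local structure.
        self.pool = nn.Conv2d(dim, dim, kernel_size=3, stride=pool_stride, padding=1)
        self.norm_q = nn.LayerNorm(dim)
        self.norm_kv = nn.LayerNorm(dim)
        self.attn = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.mlp = nn.Sequential(nn.Linear(dim, dim * 4), nn.GELU(), nn.Linear(dim * 4, dim))

    def forward(self, feat):                              # feat: (B, C, H, W) CNN feature map
        b, c, h, w = feat.shape
        q = feat.flatten(2).transpose(1, 2)               # (B, H*W, C) query tokens
        kv = self.pool(feat).flatten(2).transpose(1, 2)   # pooled key/value tokens
        attn_out, _ = self.attn(self.norm_q(q), self.norm_kv(kv), self.norm_kv(kv))
        x = q + attn_out                                  # residual connection
        x = x + self.mlp(x)
        return x.transpose(1, 2).reshape(b, c, h, w)

# Example: a 14x14 feature map from a CNN backbone
block = ConvPoolAttentionBlock()
print(block(torch.randn(2, 256, 14, 14)).shape)           # torch.Size([2, 256, 14, 14])
```

    Pooling only the key/value tokens keeps the attention cost modest while the residual path preserves per-location detail; this is one plausible reading of "convolutional pooling", not a reproduction of the paper's block.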

    Video ControlNet: Towards Temporally Consistent Synthetic-to-Real Video Translation Using Conditional Image Diffusion Models

    Full text link
    In this study, we present an efficient and effective approach for achieving temporally consistent synthetic-to-real video translation in videos of varying lengths. Our method leverages off-the-shelf conditional image diffusion models, allowing us to perform multiple synthetic-to-real image generations in parallel. By utilizing the optical flow available from the synthetic videos, our approach seamlessly enforces temporal consistency among corresponding pixels across frames. This is achieved through joint noise optimization, which effectively minimizes spatial and temporal discrepancies. To the best of our knowledge, our proposed method is the first to accomplish diverse and temporally consistent synthetic-to-real video translation using conditional image diffusion models. Furthermore, our approach does not require any training or fine-tuning of the diffusion models. Extensive experiments on various benchmarks for synthetic-to-real video translation demonstrate the effectiveness of our approach, both quantitatively and qualitatively. Finally, we show that our method outperforms other baseline methods in terms of both temporal consistency and visual quality.
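    The core idea of enforcing temporal consistency through joint noise optimization can be sketched as follows. This is a toy stand-in rather than the paper's implementation: the flow convention (pixel offsets from frame t to t+1), the loss, and the optimizer settings are assumptions.

```python
# Toy sketch: per-frame latent noise is optimised so that, once backward-warped
# by the synthetic video's optical flow, corresponding pixels agree across frames.
import torch
import torch.nn.functional as F

def warp(latent, flow):
    """Backward-warp a latent (B, C, H, W) with a dense flow field (B, 2, H, W)."""
    b, _, h, w = latent.shape
    ys, xs = torch.meshgrid(torch.arange(h), torch.arange(w), indexing="ij")
    base = torch.stack((xs, ys), dim=0).float().unsqueeze(0)   # (1, 2, H, W) pixel grid
    coords = base + flow                                       # sampling coordinates
    # Normalise to [-1, 1] for grid_sample.
    coords_x = 2.0 * coords[:, 0] / (w - 1) - 1.0
    coords_y = 2.0 * coords[:, 1] / (h - 1) - 1.0
    grid = torch.stack((coords_x, coords_y), dim=-1)           # (B, H, W, 2)
    return F.grid_sample(latent, grid, align_corners=True)

def temporal_loss(noise_frames, flows):
    """Discrepancy between each frame's noise and the next one warped back to it."""
    loss = 0.0
    for t in range(len(noise_frames) - 1):
        loss = loss + F.mse_loss(warp(noise_frames[t + 1], flows[t]), noise_frames[t])
    return loss

# Jointly optimise the noise of a short 4-frame clip (placeholder zero flow).
frames = [torch.randn(1, 4, 64, 64, requires_grad=True) for _ in range(4)]
flows = [torch.zeros(1, 2, 64, 64) for _ in range(3)]
opt = torch.optim.Adam(frames, lr=1e-2)
for _ in range(50):
    opt.zero_grad()
    temporal_loss(frames, flows).backward()
    opt.step()
```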

    Multi-Sensor Mapping in Natural Environments: Three-Dimensional Reconstruction and Temporal Alignment

    Get PDF
    The objective of this thesis is the adaptation and development of robotic techniques suitable for the geometric three-dimensional reconstruction of natural environments, leading to the temporal alignment of natural outdoor surveys. This objective has been achieved by adapting the state of the art in field robotics and computer vision, such as sensor fusion and visual Simultaneous Localization and Mapping (SLAM). Throughout this thesis, we combine data generated by cameras, lasers, and an inertial measurement unit in order to geometrically reconstruct the surrounding scene as well as to estimate the trajectory. By supporting cameras with laser depth information, we show that it is possible to stabilize state-of-the-art visual odometry and to recover scale for visual maps. We also show that factor graphs are powerful tools for sensor fusion and can be used in a generalized approach involving multiple sensors. Using semantic knowledge, we constrain the Iterative Closest Point (ICP) algorithm in order to build keyframes as well as to align them both spatially and temporally. Hierarchical clustering of ICP-generated transformations is then used both to eliminate outliers and to find alignment consensus, followed by an optimization scheme based on a factor graph that includes loop closure. Data was captured in the natural environment using a portable, wearable sensor suite, conceived in the first months of this thesis, consisting of three cameras, a three-dimensional lidar, and an inertial navigation system. The data was acquired at monthly intervals over 12 months, by revisiting the same trajectory between August 2020 and July 2021. Finally, it has been shown that it is possible to align monthly surveys taken over a year using the conceived sensor suite, and to provide insightful metrics for change evaluation in natural environments. (Ph.D. thesis)
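    For illustration, here is a minimal pose-graph example in the spirit of the factor-graph fusion and loop closure described above, using the GTSAM Python bindings (an assumed dependency). The thesis's actual graphs fuse camera, lidar, and inertial constraints; the odometry and loop-closure measurements below are made up.

```python
# Minimal 2D pose graph: a prior on the first keyframe, odometry factors between
# consecutive keyframes, and one loop-closure factor, optimised jointly.
import numpy as np
import gtsam

graph = gtsam.NonlinearFactorGraph()
prior_noise = gtsam.noiseModel.Diagonal.Sigmas(np.array([0.1, 0.1, 0.05]))
odom_noise = gtsam.noiseModel.Diagonal.Sigmas(np.array([0.2, 0.2, 0.1]))

graph.add(gtsam.PriorFactorPose2(1, gtsam.Pose2(0.0, 0.0, 0.0), prior_noise))
graph.add(gtsam.BetweenFactorPose2(1, 2, gtsam.Pose2(2.0, 0.0, 0.0), odom_noise))
graph.add(gtsam.BetweenFactorPose2(2, 3, gtsam.Pose2(2.0, 0.0, np.pi / 2), odom_noise))
graph.add(gtsam.BetweenFactorPose2(3, 4, gtsam.Pose2(2.0, 0.0, np.pi / 2), odom_noise))
graph.add(gtsam.BetweenFactorPose2(4, 5, gtsam.Pose2(2.0, 0.0, np.pi / 2), odom_noise))
# Loop closure: keyframe 5 re-observes keyframe 2 (e.g. via an ICP alignment).
graph.add(gtsam.BetweenFactorPose2(5, 2, gtsam.Pose2(2.0, 0.0, np.pi / 2), odom_noise))

# Drifted initial guesses for each keyframe pose, refined by nonlinear optimisation.
initial = gtsam.Values()
initial.insert(1, gtsam.Pose2(0.1, -0.1, 0.05))
initial.insert(2, gtsam.Pose2(2.2, 0.1, -0.05))
initial.insert(3, gtsam.Pose2(4.1, 0.1, 1.5))
initial.insert(4, gtsam.Pose2(4.0, 2.1, 3.1))
initial.insert(5, gtsam.Pose2(2.1, 2.1, -1.6))
result = gtsam.LevenbergMarquardtOptimizer(graph, initial).optimize()
print(result)
```

    Each odometry edge constrains consecutive keyframes; the loop-closure edge ties the trajectory back on itself, and the optimizer distributes the accumulated drift over the whole graph.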

    Robust and Efficient Camera-based Scene Reconstruction

    Get PDF
    For the simultaneous reconstruction of 3D scene geometry and camera poses from images or videos, there are two major approaches. On the one hand, a sparse reconstruction can be obtained by extracting recognizable features from multiple images that correspond to the same 3D points in the scene; from these features, the positions of the 3D points as well as the camera poses can be estimated such that they best explain the observed feature positions in the images. On the other hand, for video data, a dense reconstruction can be obtained by alternating, frame by frame, between tracking the camera pose and updating a depth map that represents the scene. In this dissertation, we introduce several improvements to both reconstruction strategies. We start by improving the reliability of image feature matches, which leads to faster and more robust subsequent processing. We then present a sparse reconstruction pipeline completely optimized for high-resolution, high-frame-rate video, exploiting the redundancy in the data to gain efficiency. For (semi-)dense reconstruction on camera rigs, which is prone to calibration inaccuracies, we show how to model and recover the rig calibration online during the reconstruction process. Finally, we explore the applicability of machine learning based on neural networks to the relative camera pose problem, focusing mainly on generating optimal training data. Robust and fast 3D reconstruction of the environment is in demand in several currently emerging applications, ranging from set scanning for movies and computer games, over inside-out-tracking-based augmented reality devices, to autonomous robots and drones as well as self-driving cars.
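    For context, a compact sketch of the baseline matching-plus-relative-pose pipeline that the first contribution improves on, using OpenCV. The image paths and the intrinsic matrix K are placeholders, and this is the standard ratio-test plus RANSAC recipe rather than the dissertation's improved matcher or rig-calibration scheme.

```python
import cv2
import numpy as np

# Detect and describe local features in two consecutive frames (placeholder paths).
img1 = cv2.imread("frame_000.png", cv2.IMREAD_GRAYSCALE)
img2 = cv2.imread("frame_001.png", cv2.IMREAD_GRAYSCALE)
sift = cv2.SIFT_create()
kp1, des1 = sift.detectAndCompute(img1, None)
kp2, des2 = sift.detectAndCompute(img2, None)

# Lowe ratio test: keep a match only if it is clearly better than the runner-up,
# discarding most ambiguous correspondences before geometric verification.
matches = cv2.BFMatcher().knnMatch(des1, des2, k=2)
good = [pair[0] for pair in matches
        if len(pair) == 2 and pair[0].distance < 0.75 * pair[1].distance]

pts1 = np.float32([kp1[m.queryIdx].pt for m in good])
pts2 = np.float32([kp2[m.trainIdx].pt for m in good])

# Relative camera pose from the essential matrix; RANSAC rejects the remaining
# outlier matches. K is an assumed pinhole intrinsic matrix.
K = np.array([[700.0, 0.0, 320.0],
              [0.0, 700.0, 240.0],
              [0.0, 0.0, 1.0]])
E, _ = cv2.findEssentialMat(pts1, pts2, K, method=cv2.RANSAC,
                            prob=0.999, threshold=1.0)
_, R, t, _ = cv2.recoverPose(E, pts1, pts2, K)
print("relative rotation:\n", R, "\ntranslation direction:\n", t.ravel())
```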