8 research outputs found

    Edited nearest neighbour for selecting keyframe summaries of egocentric videos

    A keyframe summary of a video must be concise, comprehensive and diverse. Current video summarisation methods may not be able to enforce diversity of the summary if the events have highly similar visual content, as is the case of egocentric videos. We cast the problem of selecting a keyframe summary as a problem of prototype (instance) selection for the nearest neighbour classifier (1-nn). Assuming that the video is already segmented into events of interest (classes), and represented as a dataset in some feature space, we propose a Greedy Tabu Selector algorithm (GTS) which picks one frame to represent each class. An experiment with the UT (Egocentric) video database and seven feature representations illustrates the proposed keyframe summarisation method. GTS leads to an improved match to the user ground truth compared to the closest-to-centroid baseline summarisation method. Best results were obtained with feature spaces obtained from a convolutional neural network (CNN).
    Funding: Leverhulme Trust, UK (RPG-2015-188); Sao Paulo Research Foundation - FAPESP (2016/06441-7)
    Affiliations: Bangor Univ, Sch Comp Sci, Dean St, Bangor LL57 1UT, Gwynedd, Wales; Fed Univ Sao Paulo UNIFESP, Inst Sci & Technol, BR-12247014 Sao Jose Dos Campos, SP, Brazil
    Indexed in Web of Science
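The prototype-selection view of summarisation can be sketched in a few lines. The Python fragment below is an illustrative sketch only, not the authors' GTS implementation: `closest_to_centroid` reproduces the baseline, `nn_accuracy` scores a candidate summary by 1-nn class accuracy, and `greedy_sweep` is a plain greedy pass (without the tabu list of GTS) that swaps a prototype whenever the swap raises that score. All function names are our own.

```python
import numpy as np

def closest_to_centroid(frames_by_class):
    """Baseline: for each event (class), pick the frame nearest its centroid."""
    summary = {}
    for label, X in frames_by_class.items():
        centroid = X.mean(axis=0)
        summary[label] = int(np.argmin(np.linalg.norm(X - centroid, axis=1)))
    return summary

def nn_accuracy(frames_by_class, summary):
    """Fraction of frames whose nearest selected prototype carries the right label."""
    labels = sorted(summary)
    protos = np.stack([frames_by_class[l][summary[l]] for l in labels])
    correct = total = 0
    for label, X in frames_by_class.items():
        d = np.linalg.norm(X[:, None, :] - protos[None, :, :], axis=2)
        preds = [labels[j] for j in d.argmin(axis=1)]
        correct += sum(p == label for p in preds)
        total += len(X)
    return correct / total

def greedy_sweep(frames_by_class, summary):
    """One greedy pass: swap each class prototype if 1-nn accuracy improves."""
    best = nn_accuracy(frames_by_class, summary)
    for label, X in frames_by_class.items():
        for i in range(len(X)):
            trial = {**summary, label: i}
            acc = nn_accuracy(frames_by_class, trial)
            if acc > best:
                summary, best = trial, acc
    return summary, best
```

A tabu-based selector would additionally forbid revisiting recently rejected swaps; the greedy pass above only shows the selection criterion itself.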

    Deep Convolutional Pooling Transformer for Deepfake Detection

    Recently, Deepfake has drawn considerable public attention due to security and privacy concerns in social media digital forensics. As the widely spreading Deepfake videos on the Internet become more realistic, traditional detection techniques fail to distinguish between real and fake. Most existing deep learning methods mainly focus on local features and relations within the face image, using convolutional neural networks as a backbone. However, local features and relations are insufficient for a model to learn enough general information for Deepfake detection, so existing methods have reached a bottleneck in further improving detection performance. To address this issue, we propose a deep convolutional Transformer that incorporates the decisive image features both locally and globally. Specifically, we apply convolutional pooling and re-attention to enrich the extracted features and enhance efficacy. Moreover, we employ the rarely discussed image keyframes in model training for performance improvement and visualize the feature-quantity gap between key and normal image frames caused by video compression. Finally, we illustrate transferability with extensive experiments on several Deepfake benchmark datasets. The proposed solution consistently outperforms several state-of-the-art baselines in both within- and cross-dataset experiments.
    Comment: Accepted to be published in ACM TOM
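As a rough illustration of the two ingredients the abstract names, local pooling of convolutional features and global attention, consider the NumPy sketch below. It is illustrative only, with no learned weights, and is not the paper's architecture: a feature map is average-pooled into a smaller set of tokens, and scaled dot-product self-attention then lets every token aggregate context from all others.

```python
import numpy as np

def avg_pool_tokens(feat, k=2):
    """Average-pool an (H, W, C) feature map into (H//k * W//k) tokens of dim C."""
    H, W, C = feat.shape
    feat = feat[:H - H % k, :W - W % k]          # drop rows/cols that don't fit
    pooled = feat.reshape(H // k, k, W // k, k, C).mean(axis=(1, 3))
    return pooled.reshape(-1, C)

def self_attention(tokens):
    """Single-head scaled dot-product self-attention over the token set."""
    d = tokens.shape[1]
    scores = tokens @ tokens.T / np.sqrt(d)
    scores -= scores.max(axis=1, keepdims=True)  # numerical stability
    w = np.exp(scores)
    w /= w.sum(axis=1, keepdims=True)            # each row is a softmax
    return w @ tokens                            # every token mixes global context
```

Each output token is a convex combination of all input tokens, which is the mechanism by which attention supplies the global information that purely local convolutions lack.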

    Fast-forward video visualization

    The use of surveillance cameras can generate very large amounts of video material. When this material has to be reviewed manually, fast-forward playback can shorten the viewing time. With digital video, high speed-up factors can be achieved by skipping frames. These jumps, however, can cause change blindness and make the observer miss important events. This diploma thesis therefore presents methods that attempt, in different ways, to make the information contained in the skipped frames visible again. Biologically motivated blending produces motion blur on moving objects. The difference method renders multiple instances of each object along its trajectory. With tracking, objects are annotated with arrows and trails that convey information about their past and future motion. Since surveillance videos often contain stretches in which nothing happens, adaptive fast-forward can apply different speed-up factors according to the priority of the events. Three different speed visualizations are presented that are intended to better convey the transitions between speed-up factors. A user study then examines the fast-forward methods in more detail, testing object recognition and the following of motion. In an additional task, the speed visualizations are compared with respect to their effectiveness and the workload they impose on the viewer.
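The simplest of these ideas, blending the skipped frames instead of discarding them, can be sketched directly. The snippet below is an illustrative Python/NumPy sketch under the assumption that frames arrive as equally sized numpy arrays; it is not the thesis implementation:

```python
import numpy as np

def fast_forward_blend(frames, speedup=8):
    """Fast-forward by a factor `speedup`, averaging each group of skipped
    frames so moving objects leave a motion-blur trace instead of vanishing."""
    out = []
    for start in range(0, len(frames) - speedup + 1, speedup):
        group = np.stack(frames[start:start + speedup]).astype(np.float32)
        out.append(group.mean(axis=0).astype(frames[0].dtype))
    return out
```

Averaging each group of `speedup` frames approximates the motion blur of the biologically motivated blending: a fast-moving object leaves a faint trace across its path rather than jumping between positions.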

    Collaborative Localization and Mapping for Autonomous Planetary Exploration : Distributed Stereo Vision-Based 6D SLAM in GNSS-Denied Environments

    Mobile robots are a crucial element of present and future scientific missions to explore the surfaces of foreign celestial bodies such as Moon and Mars. Deploying teams of robots improves efficiency and robustness in such challenging environments. As long communication round-trip times to Earth render the teleoperation of robotic systems inefficient or outright impossible, on-board autonomy is a key to success. The robots operate in Global Navigation Satellite System (GNSS)-denied environments and thus have to rely on space-suitable on-board sensors such as stereo camera systems. They need to be able to localize themselves online, to model their surroundings, and to share information about the environment and their position therein. These capabilities constitute the basis for the local autonomy of each system as well as for any coordinated joint action within the team, such as collaborative autonomous exploration. In this thesis, we present a novel approach for stereo vision-based on-board and online Simultaneous Localization and Mapping (SLAM) for multi-robot teams given the challenges imposed by planetary exploration missions. We combine distributed local and decentralized global estimation methods to get the best of both worlds: a local reference filter on each robot provides real-time local state estimates required for robot control and fast reactive behaviors. We designed a novel graph topology to incorporate these state estimates into an online incremental graph optimization that computes global pose and map estimates serving as input to higher-level autonomy functions. In order to model the 3D geometry of the environment, we generate dense 3D point cloud and probabilistic voxel-grid maps from noisy stereo data.
We distribute the computational load and reduce the required communication bandwidth between robots by locally aggregating high-bandwidth vision data into partial maps that are then exchanged between robots and composed into global models of the environment. We developed methods for intra- and inter-robot map matching to recognize previously visited locations in semi- and unstructured environments based on their estimated local geometry, which is largely invariant to lighting conditions as well as to different sensors and viewpoints in heterogeneous multi-robot teams. A decoupling of observable and unobservable states in the local filter allows us to introduce a novel optimization: by enforcing all submaps to be gravity-aligned, we can reduce the dimensionality of the map matching from 6D to 4D. In addition to map matches, the robots use visual fiducial markers to detect each other. In this context, we present a novel method for modeling the errors of the loop closure transformations estimated from these detections. We demonstrate the robustness of our methods by integrating them on a total of five different ground-based and aerial mobile robots that were deployed in 31 real-world experiments for quantitative evaluations in semi- and unstructured indoor and outdoor settings. In addition, we validated our SLAM framework through several demonstrations at four public events in Moon- and Mars-like environments. These include, among others, autonomous multi-robot exploration tests at a Moon-analogue site on top of the volcano Mt. Etna, Italy, as well as the collaborative mapping of a Mars-like environment with a heterogeneous robotic team of flying and driving robots in more than 35 public demonstration runs.
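The dimensionality reduction from 6D to 4D can be illustrated with a small alignment sketch. Assuming gravity-aligned submaps (roll and pitch already pinned by the filter) and known point correspondences, the remaining registration has four degrees of freedom: yaw plus 3D translation. The following Python sketch is our own illustration, not the thesis implementation:

```python
import numpy as np

def align_gravity_aligned(src, dst):
    """Estimate the 4-DoF transform (yaw + 3D translation) mapping the
    gravity-aligned point set src onto dst (both N x 3, corresponding rows)."""
    src_c, dst_c = src.mean(axis=0), dst.mean(axis=0)
    # Gravity alignment removes roll and pitch, so the rotation is yaw-only:
    # solve a 2D Procrustes problem in the x-y plane.
    P = src[:, :2] - src_c[:2]
    Q = dst[:, :2] - dst_c[:2]
    H = P.T @ Q
    yaw = np.arctan2(H[0, 1] - H[1, 0], H[0, 0] + H[1, 1])
    c, s = np.cos(yaw), np.sin(yaw)
    R = np.array([[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]])
    t = dst_c - R @ src_c
    return yaw, R, t
```

With roll and pitch fixed by gravity, the rotation search collapses to a 2D Procrustes problem in the horizontal plane, which is the essence of reducing the map matching from 6D to 4D.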