22 research outputs found

    The PS-Battles Dataset - an Image Collection for Image Manipulation Detection

    Full text link
    The boost of available digital media has led to a significant increase in derivative work. With tools for manipulating objects becoming more and more mature, it can be very difficult to determine whether one piece of media was derived from another one or tampered with. As derivations can be done with malicious intent, there is an urgent need for reliable and easily usable tampering detection methods. However, even media considered semantically untampered by humans might have already undergone compression steps or light post-processing, making automated detection of tampering susceptible to false positives. In this paper, we present the PS-Battles dataset which is gathered from a large community of image manipulation enthusiasts and provides a basis for media derivation and manipulation detection in the visual domain. The dataset consists of 102'028 images grouped into 11'142 subsets, each containing the original image as well as a varying number of manipulated derivatives.Comment: The dataset introduced in this paper can be found on https://github.com/dbisUnibas/PS-Battle

    Enabling energy-awareness for internet video

    Get PDF
    Continuous improvements to the state of the art have made it easier to create, send and receive vast quantities of video over the Internet. Catalysed by these developments, video is now the largest, and fastest growing type of traffic on modern IP networks. In 2015, video was responsible for 70% of all traffic on the Internet, with an compound annual growth rate of 27%. On the other hand, concerns about the growing energy consumption of ICT in general, continue to rise. It is not surprising that there is a significant energy cost associated with these extensive video usage patterns. In this thesis, I examine the energy consumption of typical video configurations during decoding (playback) and encoding through empirical measurements on an experimental test-bed. I then make extrapolations to a global scale to show the opportunity for significant energy savings, achievable by simple modifications to these video configurations. Based on insights gained from these measurements, I propose a novel, energy-aware Quality of Experience (QoE) metric for digital video - the Energy - Video Quality Index (EnVI). Then, I present and evaluate vEQ-benchmark, a benchmarking and measurement tool for the purpose of generating EnVI scores. The tool enables fine-grained resource-usage analyses on video playback systems, and facilitates the creation of statistical models of power usage for these systems. I propose GreenDASH, an energy-aware extension of the existing Dynamic Adaptive Streaming over HTTP standard (DASH). GreenDASH incorporates relevant energy-usage and video quality information into the existing standard. It could enable dynamic, energy-aware adaptation for video in response to energy-usage and user ‘green’ preferences. I also evaluate the subjective perception of such energy-aware, adaptive video streaming by means of a user study featuring 36 participants. I examine how video may be adapted to save energy without a significant impact on the Quality of Experience of these users. In summary, this thesis highlights the significant opportunities for energy savings if Internet users gain an awareness about their energy usage, and presents a technical discussion how this can be achieved by straightforward extensions to the current state of the art

    A vision-based fully automated approach to robust image cropping detection

    Get PDF
    The definition of valid and robust methodologies for assessing the authenticity of digital information is nowadays critical to contrast social manipulation through the media. A key research topic in multimedia forensics is the development of methods for detecting tampered content in large image collections without any human intervention. This paper introduces AMARCORD (Automatic Manhattan-scene AsymmetRically CrOpped imageRy Detector), a fully automated detector for exposing evidences of asymmetrical image cropping on Manhattan-World scenes. The proposed solution estimates and exploits the camera principal point, i.e., a physical feature extracted directly from the image content that is quite insensitive to image processing operations, such as compression and resizing, typical of social media platforms. Robust computer vision techniques are employed throughout, so as to cope with large sources of noise in the data and improve detection performance. The method leverages a novel metric based on robust statistics, and is also capable to decide autonomously whether the image at hand is tractable or not. The results of an extensive experimental evaluation covering several cropping scenarios demonstrate the effectiveness and robustness of our approac

    Managing Network Delay for Browser Multiplayer Games

    Get PDF
    Latency is one of the key performance elements affecting the quality of experience (QoE) in computer games. Latency in the context of games can be defined as the time between the user input and the result on the screen. In order for the QoE to be satisfactory the game needs to be able to react fast enough to player input. In networked multiplayer games, latency is composed of network delay and local delays. Some major sources of network delay are queuing delay and head-of-line (HOL) blocking delay. Network delay in the Internet can be even in the order of seconds. In this thesis we discuss what feasible networking solutions exist for browser multiplayer games. We conduct a literature study to analyze the Differentiated Services architecture, some salient Active Queue Management (AQM) algorithms (RED, PIE, CoDel and FQ-CoDel), the Explicit Congestion Notification (ECN) concept and network protocols for web browser (WebSocket, QUIC and WebRTC). RED, PIE and CoDel as single-queue implementations would be sub-optimal for providing low latency to game traffic. FQ-CoDel is a multi-queue AQM and provides flow separation that is able to prevent queue-building bulk transfers from notably hampering latency-sensitive flows. WebRTC Data-Channel seems promising for games since it can be used for sending arbitrary application data and it can avoid HOL blocking. None of the network protocols, however, provide completely satisfactory support for the transport needs of multiplayer games: WebRTC is not designed for client-server connections, QUIC is not designed for traffic patterns typical for multiplayer games and WebSocket would require parallel connections to mitigate the effects of HOL blocking

    A study of the influence of the sensor sampling frequency on the performance of wearable fall detectors

    Get PDF
    Last decade has witnessed a major research interest on wearable fall detection systems. Sampling rate in these detectors strongly affects the power consumption and required complexity of the employed wearables. This study investigates the effect of the sampling frequency on the efficacy of the detection process. For this purpose, we train a convolutional neural network to directly discriminate falls from conventional activities based on the raw acceleration signals captured by a transportable sensor. Then, we analyze the changes in the performance of this classifier when the sampling rate is progressively reduced. In contrast with previous studies, the detector is tested against a wide set of public repositories of benchmarking traces. The quality metrics achieved for the different frequencies and the analysis of the spectrum of the signals reveal that a sampling rate of 20 Hz can be enough to maximize the effectiveness of a fall detector.This research was funded by the Andalusian Regional Government (-Junta de Andalucía-) under grants FEDER UMA18-FEDERJA-022 and PAIDI P18-RT-1652, and by the Universidad de Málaga, Campus de Excelencia Internacional Andalucia Tech. Funding for open access charge: Universidad de Malaga / CBUA

    Examining the Impact of Provenance-Enabled Media on Trust and Accuracy Perceptions

    Full text link
    In recent years, industry leaders and researchers have proposed to use technical provenance standards to address visual misinformation spread through digitally altered media. By adding immutable and secure provenance information such as authorship and edit date to media metadata, social media users could potentially better assess the validity of the media they encounter. However, it is unclear how end users would respond to provenance information, or how to best design provenance indicators to be understandable to laypeople. We conducted an online experiment with 595 participants from the US and UK to investigate how provenance information altered users' accuracy perceptions and trust in visual content shared on social media. We found that provenance information often lowered trust and caused users to doubt deceptive media, particularly when it revealed that the media was composited. We additionally tested conditions where the provenance information itself was shown to be incomplete or invalid, and found that these states have a significant impact on participants' accuracy perceptions and trust in media, leading them, in some cases, to disbelieve honest media. Our findings show that provenance, although enlightening, is still not a concept well-understood by users, who confuse media credibility with the orthogonal (albeit related) concept of provenance credibility. We discuss how design choices may contribute to provenance (mis)understanding, and conclude with implications for usable provenance systems, including clearer interfaces and user education.Comment: Accepted to CSCW 202

    An analytical comparison of datasets of Real-World and simulated falls intended for the evaluation of wearable fall alerting systems

    Get PDF
    Automatic fall detection is one of the most promising applications of wearables in the field of mobile health. The characterization of the effectiveness of wearable fall detectors is hampered by the inherent difficulty of testing these devices with real-world falls. In fact, practically all the proposals in the literature assess the detection algorithms with ‘scripted’ falls that are simulated in a controlled laboratory environment by a group of volunteers (normally young and healthy participants). Aiming at appraising the adequacy of this method, this work systematically compares the statistical characteristics of the acceleration signals from two databases with real falls and those computed from the simulated falls provided by 18 well-known repositories commonly employed by the related works. The results show noteworthy differences between the dynamics of emulated and real-life falls, which undermines the testing procedures followed to date and forces to rethink the strategies for evaluating wearable fall detectors.Funding for open access charge: Universidad de Málaga / CBUA. This research was funded by FEDER Funds (under grant UMA18-FEDERJA-022), Andalusian Regional Government (-Junta de Andalucía- grant PAIDI P18-RT-1652) and Universidad de Málaga, Campus de Excelencia Internacional Andalucia Tech

    MediaSync: Handbook on Multimedia Synchronization

    Get PDF
    This book provides an approachable overview of the most recent advances in the fascinating field of media synchronization (mediasync), gathering contributions from the most representative and influential experts. Understanding the challenges of this field in the current multi-sensory, multi-device, and multi-protocol world is not an easy task. The book revisits the foundations of mediasync, including theoretical frameworks and models, highlights ongoing research efforts, like hybrid broadband broadcast (HBB) delivery and users' perception modeling (i.e., Quality of Experience or QoE), and paves the way for the future (e.g., towards the deployment of multi-sensory and ultra-realistic experiences). Although many advances around mediasync have been devised and deployed, this area of research is getting renewed attention to overcome remaining challenges in the next-generation (heterogeneous and ubiquitous) media ecosystem. Given the significant advances in this research area, its current relevance and the multiple disciplines it involves, the availability of a reference book on mediasync becomes necessary. This book fills the gap in this context. In particular, it addresses key aspects and reviews the most relevant contributions within the mediasync research space, from different perspectives. Mediasync: Handbook on Multimedia Synchronization is the perfect companion for scholars and practitioners that want to acquire strong knowledge about this research area, and also approach the challenges behind ensuring the best mediated experiences, by providing the adequate synchronization between the media elements that constitute these experiences

    Unsupervised quantification of entity consistency between photos and text in real-world news

    Get PDF
    Das World Wide Web und die sozialen Medien übernehmen im heutigen Informationszeitalter eine wichtige Rolle für die Vermittlung von Nachrichten und Informationen. In der Regel werden verschiedene Modalitäten im Sinne der Informationskodierung wie beispielsweise Fotos und Text verwendet, um Nachrichten effektiver zu vermitteln oder Aufmerksamkeit zu erregen. Kommunikations- und Sprachwissenschaftler erforschen das komplexe Zusammenspiel zwischen Modalitäten seit Jahrzehnten und haben unter Anderem untersucht, wie durch die Kombination der Modalitäten zusätzliche Informationen oder eine neue Bedeutungsebene entstehen können. Die Anzahl gemeinsamer Konzepte oder Entitäten (beispielsweise Personen, Orte und Ereignisse) zwischen Fotos und Text stellen einen wichtigen Aspekt für die Bewertung der Gesamtaussage und Bedeutung eines multimodalen Artikels dar. Automatisierte Ansätze zur Quantifizierung von Bild-Text-Beziehungen können für zahlreiche Anwendungen eingesetzt werden. Sie ermöglichen beispielsweise eine effiziente Exploration von Nachrichten, erleichtern die semantische Suche von Multimedia-Inhalten in (Web)-Archiven oder unterstützen menschliche Analysten bei der Evaluierung der Glaubwürdigkeit von Nachrichten. Allerdings gibt es bislang nur wenige Ansätze, die sich mit der Quantifizierung von Beziehungen zwischen Fotos und Text beschäftigen. Diese Ansätze berücksichtigen jedoch nicht explizit die intermodalen Beziehungen von Entitäten, welche eine wichtige Rolle in Nachrichten darstellen, oder basieren auf überwachten multimodalen Deep-Learning-Techniken. Diese überwachten Lernverfahren können ausschließlich die intermodalen Beziehungen von Entitäten detektieren, die in annotierten Trainingsdaten enthalten sind. Um diese Forschungslücke zu schließen, wird in dieser Arbeit ein unüberwachter Ansatz zur Quantifizierung der intermodalen Konsistenz von Entitäten zwischen Fotos und Text in realen multimodalen Nachrichtenartikeln vorgestellt. Im ersten Teil dieser Arbeit werden neuartige Verfahren auf Basis von Deep Learning zur Extrahierung von Informationen aus Fotos vorgestellt, um Ereignisse (Events), Orte, Zeitangaben und Personen automatisch zu erkennen. Diese Verfahren bilden eine wichtige Voraussetzung, um die Beziehungen von Entitäten zwischen Bild und Text zu bewerten. Zunächst wird ein Ansatz zur Ereignisklassifizierung präsentiert, der neuartige Optimierungsfunktionen und Gewichtungsschemata nutzt um Ontologie-Informationen aus einer Wissensdatenbank in ein Deep-Learning-Verfahren zu integrieren. Das Training erfolgt anhand eines neu vorgestellten Datensatzes, der 570.540 Fotos und eine Ontologie mit 148 Ereignistypen enthält. Der Ansatz übertrifft die Ergebnisse von Referenzsystemen die keine strukturierten Ontologie-Informationen verwenden. Weiterhin wird ein DeepLearning-Ansatz zur Schätzung des Aufnahmeortes von Fotos vorgeschlagen, der Kontextinformationen über die Umgebung (Innen-, Stadt-, oder Naturaufnahme) und von Erdpartitionen unterschiedlicher Granularität verwendet. Die vorgeschlagene Lösung übertrifft die bisher besten Ergebnisse von aktuellen Forschungsarbeiten, obwohl diese deutlich mehr Fotos zum Training verwenden. Darüber hinaus stellen wir den ersten Datensatz zur Schätzung des Aufnahmejahres von Fotos vor, der mehr als eine Million Bilder aus den Jahren 1930 bis 1999 umfasst. Dieser Datensatz wird für das Training von zwei Deep-Learning-Ansätzen zur Schätzung des Aufnahmejahres verwendet, welche die Aufgabe als Klassifizierungs- und Regressionsproblem behandeln. Beide Ansätze erzielen sehr gute Ergebnisse und übertreffen Annotationen von menschlichen Probanden. Schließlich wird ein neuartiger Ansatz zur Identifizierung von Personen des öffentlichen Lebens und ihres gemeinsamen Auftretens in Nachrichtenfotos aus der digitalen Bibliothek Internet Archiv präsentiert. Der Ansatz ermöglicht es unstrukturierte Webdaten aus dem Internet Archiv mit Metadaten, beispielsweise zur semantischen Suche, zu erweitern. Experimentelle Ergebnisse haben die Effektivität des zugrundeliegenden Deep-Learning-Ansatzes zur Personenerkennung bestätigt. Im zweiten Teil dieser Arbeit wird ein unüberwachtes System zur Quantifizierung von BildText-Beziehungen in realen Nachrichten vorgestellt. Im Gegensatz zu bisherigen Verfahren liefert es automatisch neuartige Maße der intermodalen Konsistenz für verschiedene Entitätstypen (Personen, Orte und Ereignisse) sowie den Gesamtkontext. Das System ist nicht auf vordefinierte Datensätze angewiesen, und kann daher mit der Vielzahl und Diversität von Entitäten und Themen in Nachrichten umgehen. Zur Extrahierung von Entitäten aus dem Text werden geeignete Methoden der natürlichen Sprachverarbeitung eingesetzt. Examplarbilder für diese Entitäten werden automatisch aus dem Internet beschafft. Die vorgeschlagenen Methoden zur Informationsextraktion aus Fotos werden auf die Nachrichten- und heruntergeladenen Exemplarbilder angewendet, um die intermodale Konsistenz von Entitäten zu quantifizieren. Es werden zwei Aufgaben untersucht um die Qualität des vorgeschlagenen Ansatzes in realen Anwendungen zu bewerten. Experimentelle Ergebnisse für die Dokumentverifikation und die Beschaffung von Nachrichten mit geringer (potenzielle Fehlinformation) oder hoher multimodalen Konsistenz zeigen den Nutzen und das Potenzial des Ansatzes zur Unterstützung menschlicher Analysten bei der Untersuchung von Nachrichten.In today’s information age, the World Wide Web and social media are important sources for news and information. Different modalities (in the sense of information encoding) such as photos and text are typically used to communicate news more effectively or to attract attention. Communication scientists, linguists, and semioticians have studied the complex interplay between modalities for decades and investigated, e.g., how their combination can carry additional information or add a new level of meaning. The number of shared concepts or entities (e.g., persons, locations, and events) between photos and text is an important aspect to evaluate the overall message and meaning of an article. Computational models for the quantification of image-text relations can enable many applications. For example, they allow for more efficient exploration of news, facilitate semantic search and multimedia retrieval in large (web) archives, or assist human assessors in evaluating news for credibility. To date, only a few approaches have been suggested that quantify relations between photos and text. However, they either do not explicitly consider the cross-modal relations of entities – which are important in the news – or rely on supervised deep learning approaches that can only detect the cross-modal presence of entities covered in the labeled training data. To address this research gap, this thesis proposes an unsupervised approach that can quantify entity consistency between photos and text in multimodal real-world news articles. The first part of this thesis presents novel approaches based on deep learning for information extraction from photos to recognize events, locations, dates, and persons. These approaches are an important prerequisite to measure the cross-modal presence of entities in text and photos. First, an ontology-driven event classification approach that leverages new loss functions and weighting schemes is presented. It is trained on a novel dataset of 570,540 photos and an ontology with 148 event types. The proposed system outperforms approaches that do not use structured ontology information. Second, a novel deep learning approach for geolocation estimation is proposed that uses additional contextual information on the environmental setting (indoor, urban, natural) and from earth partitions of different granularity. The proposed solution outperforms state-of-the-art approaches, which are trained with significantly more photos. Third, we introduce the first large-scale dataset for date estimation with more than one million photos taken between 1930 and 1999, along with two deep learning approaches that treat date estimation as a classification and regression problem. Both approaches achieve very good results that are superior to human annotations. Finally, a novel approach is presented that identifies public persons and their co-occurrences in news photos extracted from the Internet Archive, which collects time-versioned snapshots of web pages that are rarely enriched with metadata relevant to multimedia retrieval. Experimental results confirm the effectiveness of the deep learning approach for person identification. The second part of this thesis introduces an unsupervised approach capable of quantifying image-text relations in real-world news. Unlike related work, the proposed solution automatically provides novel measures of cross-modal consistency for different entity types (persons, locations, and events) as well as the overall context. The approach does not rely on any predefined datasets to cope with the large amount and diversity of entities and topics covered in the news. State-of-the-art tools for natural language processing are applied to extract named entities from the text. Example photos for these entities are automatically crawled from the Web. The proposed methods for information extraction from photos are applied to both news images and example photos to quantify the cross-modal consistency of entities. Two tasks are introduced to assess the quality of the proposed approach in real-world applications. Experimental results for document verification and retrieval of news with either low (potential misinformation) or high cross-modal similarities demonstrate the feasibility of the approach and its potential to support human assessors to study news
    corecore