40 research outputs found

    GrOVe: Ownership Verification of Graph Neural Networks using Embeddings

    Graph neural networks (GNNs) have emerged as a state-of-the-art approach to model and draw inferences from large-scale graph-structured data in various application settings such as social networking. The primary goal of a GNN is to learn an embedding for each graph node in a dataset that encodes both the node features and the local graph structure around the node. Embeddings generated by a GNN for a graph node are unique to that GNN. Prior work has shown that GNNs are prone to model extraction attacks. Model extraction attacks and defenses have been explored extensively in other, non-graph settings. While detecting or preventing model extraction appears to be difficult, deterring it via effective ownership verification techniques offers a potential defense. In non-graph settings, fingerprinting models, or the data used to build them, has been shown to be a promising approach toward ownership verification. We present GrOVe, a state-of-the-art GNN model fingerprinting scheme that, given a target model and a suspect model, can reliably determine whether the suspect model was trained independently of the target model or is a surrogate of the target model obtained via model extraction. We show that GrOVe can distinguish between surrogate and independent models even when the independent model uses the same training dataset and architecture as the original target model. Using six benchmark datasets and three model architectures, we show that GrOVe consistently achieves low false-positive and false-negative rates. We demonstrate that GrOVe is robust against known fingerprint evasion techniques while remaining computationally efficient. Comment: 11 pages, 5 figures
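
    A minimal sketch of the embedding-based ownership idea described above (not the authors' code): given node embeddings produced by a target model and a suspect model on the same probe nodes, build per-node distance vectors and train a classifier that separates "surrogate" from "independent" suspects. The placeholder embeddings and the MLP classifier are assumptions for illustration.

```python
# Hedged sketch in the spirit of GrOVe: classify suspect models from
# embedding-distance vectors computed on shared probe nodes.
import numpy as np
from sklearn.neural_network import MLPClassifier

rng = np.random.default_rng(0)

def distance_vectors(emb_a: np.ndarray, emb_b: np.ndarray) -> np.ndarray:
    """Element-wise squared differences between aligned node embeddings."""
    return (emb_a - emb_b) ** 2

# Placeholder embeddings standing in for GNN outputs on shared probe nodes.
target = rng.normal(size=(200, 64))
surrogate = target + rng.normal(scale=0.1, size=(200, 64))   # close to the target
independent = rng.normal(size=(200, 64))                      # unrelated model

X = np.vstack([distance_vectors(target, surrogate),
               distance_vectors(target, independent)])
y = np.array([1] * 200 + [0] * 200)   # 1 = surrogate, 0 = independent

clf = MLPClassifier(hidden_layer_sizes=(32,), max_iter=500, random_state=0).fit(X, y)
print("training accuracy:", clf.score(X, y))
```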

    Protecting the Intellectual Property of Diffusion Models by the Watermark Diffusion Process

    Diffusion models have emerged as state-of-the-art deep generative architectures in response to the increasing demand for generation tasks. Training large diffusion models to good performance requires substantial resources, making them valuable intellectual property to protect. Most existing ownership solutions, including watermarking, focus on discriminative models. This paper proposes WDM, a novel watermarking method for diffusion models, comprising watermark embedding, extraction, and verification. WDM embeds the watermark data by training or fine-tuning the diffusion model to learn a Watermark Diffusion Process (WDP), distinct from the standard diffusion process used for the task data. The embedded watermark can be extracted by sampling with the shared reverse noise from the learned WDP, without degrading performance on the original task. We also provide theoretical foundations and analysis of the proposed method by connecting the WDP to the diffusion process with a modified Gaussian kernel. Extensive experiments demonstrate its effectiveness and robustness against various attacks.
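
    To make the idea concrete, the sketch below shows a standard DDPM forward step next to a watermark-style variant driven by a fixed, shared noise sequence (the "key"). It is an assumption-laden illustration of the concept, not the paper's WDP implementation; the noise schedule and tensor shapes are placeholders.

```python
# Hedged sketch: standard forward diffusion vs. a keyed "watermark" process.
import torch

def forward_step(x0, t, alphas_cumprod, noise):
    """Standard forward kernel q(x_t | x_0) = N(sqrt(a_bar_t) x_0, (1 - a_bar_t) I)."""
    a_bar = alphas_cumprod[t]
    return a_bar.sqrt() * x0 + (1 - a_bar).sqrt() * noise

T = 1000
betas = torch.linspace(1e-4, 0.02, T)
alphas_cumprod = torch.cumprod(1.0 - betas, dim=0)

x0_task = torch.randn(4, 3, 32, 32)   # stand-in task data
x0_mark = torch.randn(4, 3, 32, 32)   # stand-in watermark data

# Watermark process: same kernel family, but driven by a fixed, shared noise
# sequence so the mark can later be recovered by sampling with that key.
shared_noise = torch.randn_like(x0_mark)   # would be kept secret by the model owner
xt_task = forward_step(x0_task, 500, alphas_cumprod, torch.randn_like(x0_task))
xt_mark = forward_step(x0_mark, 500, alphas_cumprod, shared_noise)
print(xt_task.shape, xt_mark.shape)
```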

    Personality-aware Human-centric Multimodal Reasoning: A New Task

    Multimodal reasoning, an area of artificial intelligence that aims to make inferences from multimodal signals such as vision, language, and speech, has drawn increasing attention in recent years. People with different personalities may respond differently to the same situation; however, such individual personalities were ignored in previous studies. In this work, we introduce a new Personality-aware Human-centric Multimodal Reasoning (Personality-aware HMR) task and construct a corresponding dataset based on the television show The Big Bang Theory, to predict the behavior of a specific person at a specific moment given the multimodal information from their past and future moments. The Myers-Briggs Type Indicator (MBTI) was annotated and used in the task to represent individuals' personalities. We benchmark the task with three baseline methods, two adapted from related tasks and one newly proposed for our task. The experimental results demonstrate that personality can effectively improve the performance of human-centric multimodal reasoning. To address the lack of personality annotations in real-life scenes, we further introduce an extended task called Personality-predicted HMR, along with corresponding methods, which first predict the MBTI personality and then use the predicted personality to aid multimodal reasoning. The experimental results show that our method can accurately predict personality and achieves satisfactory multimodal reasoning performance without relying on personality annotations.
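
    One simple way to inject a personality signal into a multimodal predictor, sketched under assumptions (this is a hypothetical fusion module, not one of the paper's baselines): embed the 16 MBTI types and concatenate the embedding with the fused vision/language/audio features before the behaviour classifier.

```python
# Hedged sketch: MBTI-conditioned behaviour prediction via feature concatenation.
import torch
import torch.nn as nn

MBTI_TYPES = 16  # the 16 MBTI personality types

class PersonalityAwarePredictor(nn.Module):
    def __init__(self, feat_dim=512, pers_dim=32, num_behaviours=10):
        super().__init__()
        self.pers_emb = nn.Embedding(MBTI_TYPES, pers_dim)
        self.head = nn.Sequential(
            nn.Linear(feat_dim + pers_dim, 256), nn.ReLU(),
            nn.Linear(256, num_behaviours),
        )

    def forward(self, multimodal_feats, mbti_idx):
        pers = self.pers_emb(mbti_idx)                  # (B, pers_dim)
        return self.head(torch.cat([multimodal_feats, pers], dim=-1))

model = PersonalityAwarePredictor()
feats = torch.randn(4, 512)          # stand-in fused multimodal features
mbti = torch.tensor([0, 5, 11, 15])  # indices into the 16 MBTI types
print(model(feats, mbti).shape)      # torch.Size([4, 10])
```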

    Identifying Appropriate Intellectual Property Protection Mechanisms for Machine Learning Models: A Systematization of Watermarking, Fingerprinting, Model Access, and Attacks

    The commercial use of Machine Learning (ML) is spreading; at the same time, ML models are becoming more complex and more expensive to train, which makes Intellectual Property Protection (IPP) of trained models a pressing issue. Unlike other domains that can build on a solid understanding of the threats, attacks, and defenses available to protect their IP, ML-related research in this regard is still very fragmented. This is also due to the lack of a unified view and a common taxonomy of these aspects. In this paper, we systematize our findings on IPP in ML, focusing on the threats and attacks identified and the defenses proposed at the time of writing. We develop a comprehensive threat model for IP in ML and categorize attacks and defenses within a unified and consolidated taxonomy, thus bridging research from both the ML and security communities.

    Gesture retrieval and its application to the study of multimodal communication

    Comprehending communication depends on analyzing the different modalities of conversation, including audio, visual, and others. This is a natural process for humans, but in digital libraries, where preservation and dissemination of digital information are crucial, it is a complex task. A rich conversational model, encompassing all modalities and their co-occurrences, is required to effectively analyze and interact with digital information. Currently, the analysis of co-speech gestures in videos is done through manual annotation by linguistic experts based on textual searches. However, this approach is limited and does not fully utilize the visual modality of gestures. This paper proposes a visual gesture retrieval method using a deep learning architecture to extend current research in this area. The method is based on body keypoints and uses an attention mechanism to focus on specific keypoint groups. Experiments were conducted on a subset of the NewsScape dataset, which presents challenges such as multiple people, camera perspective changes, and occlusions. A user study was conducted to assess the usability of the results, establishing a baseline for future gesture retrieval methods in real-world video collections. The results demonstrate the high potential of the proposed method in multimodal communication research and highlight the significance of visual gesture retrieval in enhancing interaction with video content. The integration of visual similarity search for gestures into the open-source multimedia retrieval stack vitrivr can greatly contribute to the field of computational linguistics. This research advances the understanding of the role of the visual modality in co-speech gestures and highlights the need for further development in this area.
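
    The following is a hedged sketch of keypoint-based gesture comparison (not the paper's model): two gesture clips are represented as sequences of body keypoints, and fixed per-group weights stand in for the learned attention over keypoint groups. The keypoint layout, group indices, and weights are assumptions chosen for illustration.

```python
# Hedged sketch: attention-weighted keypoint distance between two gesture clips.
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical layout: 30 frames x 17 COCO-style keypoints x (x, y).
query = rng.normal(size=(30, 17, 2))
candidate = query + rng.normal(scale=0.05, size=(30, 17, 2))

groups = {"arms": [5, 6, 7, 8, 9, 10], "torso": [0, 11, 12], "legs": [13, 14, 15, 16]}
weights = {"arms": 0.7, "torso": 0.2, "legs": 0.1}   # attention-like emphasis on arms

def gesture_distance(a, b):
    """Weighted mean L2 distance over keypoint groups, averaged over frames."""
    total = 0.0
    for name, idx in groups.items():
        d = np.linalg.norm(a[:, idx] - b[:, idx], axis=-1).mean()
        total += weights[name] * d
    return total

print("distance:", gesture_distance(query, candidate))
```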

    Backdoor Attacks and Defences on Deep Neural Networks

    Nowadays, due to the huge amount of resources required for network training, pre-trained models are commonly exploited in all kinds of deep learning tasks, such as image classification and natural language processing. These models are deployed directly in real environments, or only fine-tuned on a limited set of data collected, for instance, from the Internet. However, a natural question arises: can we trust pre-trained models or the data downloaded from the Internet? The answer is ‘No’. An attacker can easily perform a so-called backdoor attack to hide a backdoor in a pre-trained model by poisoning the dataset used for training, or indirectly by releasing poisoned data on the Internet as bait. Such an attack is stealthy, since the hidden backdoor does not affect the behaviour of the network under normal operating conditions, and the malicious behaviour is activated only when a triggering signal is presented at the network input. In this thesis, we present a general framework for backdoor attacks and defences, and overview the state-of-the-art backdoor attacks and the corresponding defences in the field of image classification by casting them in the introduced framework. Focusing on the face recognition domain, we propose two new backdoor attacks that are effective under different threat models. Finally, we design a universal method to defend against backdoor attacks regardless of the specific attack setting, namely the poisoning strategy and the triggering signal.
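
    A minimal illustrative sketch of the generic poisoning step described above (not the thesis' specific attacks): stamp a small trigger patch onto a fraction of training images and relabel them with the attacker's target class. The trigger shape, poisoning rate, and target class are arbitrary assumptions.

```python
# Hedged sketch of backdoor data poisoning with a corner-patch trigger.
import numpy as np

rng = np.random.default_rng(0)
images = rng.random((1000, 32, 32, 3)).astype(np.float32)   # stand-in training set
labels = rng.integers(0, 10, size=1000)

def poison(images, labels, rate=0.05, target_class=7):
    imgs, lbls = images.copy(), labels.copy()
    idx = rng.choice(len(imgs), size=int(rate * len(imgs)), replace=False)
    imgs[idx, -4:, -4:, :] = 1.0          # 4x4 white square in the corner as the trigger
    lbls[idx] = target_class              # attacker-chosen label
    return imgs, lbls, idx

p_imgs, p_lbls, poisoned_idx = poison(images, labels)
print(f"poisoned {len(poisoned_idx)} of {len(images)} samples")
```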

    Spatially Localised Immersive Contemporary and Historic Photo Presentation on Mobile Devices in Augmented Reality

    These days, taking a photo is the most common way of capturing a moment. Some of these photos captured in the moment are never seen again; others are almost immediately shared with the world. Yet the context of the captured moment can only be shared to a limited extent. The continuous improvement of mobile devices has not only led to higher-resolution cameras and, thus, visually more appealing pictures, but also to a broader and more precise range of accompanying sensor metadata. Positional and bearing information can provide context for photos and is thus an integral aspect of the captured moment. However, it is commonly only used to sort photos by time and possibly group them by place. Such precise sensor metadata, combined with the increased computing power of mobile devices, can enable increasingly powerful Augmented Reality (AR) capabilities, especially for communicating the context of a captured photo. Users can thereby witness the captured moment in its real location and experience its spatial contextualization. With suitable data augmentation, such context-preserving presentation can be extended even to non-digitally-born content, including historical images. This offers new immersive ways to experience the cultural history of one's current location. In this paper, we present an approach for location-based image presentation in AR on mobile devices. With this approach, users can experience captured moments in their physical context. We demonstrate the power of this approach with a prototype implementation and evaluate it in a user study.
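
    As a rough illustration of how positional metadata can anchor a photo in space (a hypothetical helper, not the prototype's code), the sketch below converts a photo's stored GPS position into a local east/north offset and a bearing relative to the viewer, which an AR layer could then use for placement. The coordinates and the equirectangular approximation are assumptions.

```python
# Hedged sketch: GPS metadata -> local offset and bearing for AR placement.
import math

EARTH_RADIUS_M = 6_371_000

def local_offset(viewer_lat, viewer_lon, photo_lat, photo_lon):
    """Approximate east/north offset in metres using an equirectangular projection."""
    d_lat = math.radians(photo_lat - viewer_lat)
    d_lon = math.radians(photo_lon - viewer_lon)
    east = EARTH_RADIUS_M * d_lon * math.cos(math.radians(viewer_lat))
    north = EARTH_RADIUS_M * d_lat
    return east, north

east, north = local_offset(47.5596, 7.5886, 47.5601, 7.5890)
distance = math.hypot(east, north)
heading = (math.degrees(math.atan2(east, north)) + 360) % 360
print(f"{distance:.1f} m away, bearing {heading:.0f} deg from north")
```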

    Automatic understanding of multimodal content for Web-based learning

    Web-based learning has become an integral part of everyday life for all ages and backgrounds. On the one hand, the advantages of this type of learning, such as availability, accessibility, flexibility, and cost, are apparent. On the other hand, the oversupply of content can lead to learners struggling to find optimal resources efficiently. The interdisciplinary research field Search as Learning (SAL) is concerned with the analysis and improvement of Web-based learning processes, on both the learner side and the computer-science side. So far, automatic approaches that assess and recommend learning resources in SAL focus on textual, resource, and behavioral features, but commonly ignore multimodal aspects. This work addresses this research gap by proposing several approaches to the question of how multimodal retrieval methods can help support learning on the Web. First, we evaluate whether the textual metadata of the TIB AV-Portal can be exploited and enriched by semantic word embeddings to generate video recommendations and, in addition, whether a video summarization technique can improve exploratory search. We then turn to the challenging task of knowledge gain prediction, which estimates the potential learning success given a specific learning resource. We used data from two user studies for our approaches: the first observes the knowledge gain when learning with videos in a Massive Open Online Course (MOOC) setting, while the second provides an informal Web-based learning setting in which the subjects have unrestricted access to the Internet. We then extend the purely textual features with visual, audio, and cross-modal features for a holistic representation of learning resources. By correlating these features with the achieved knowledge gain, we can estimate the impact of a particular learning resource on learning success. We further investigate the influence of multimodal data on the learning process by examining how the combination of visual and textual content conveys information in general. For this purpose, we draw on work from linguistics and visual communication, which has investigated the relationship between image and text by means of different metrics and categorizations for several decades. We concretize these metrics to make them usable for machine learning purposes; this process includes the derivation of semantic image-text classes from these metrics. We evaluate all proposals with comprehensive experiments and discuss their impacts and limitations at the end of the thesis.
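
    The core analysis step of relating resource features to learning success can be sketched as follows, under assumed feature names and synthetic data (this is not the thesis' pipeline): compute the correlation of each multimodal feature with the knowledge gain measured in a user study.

```python
# Hedged sketch: correlating multimodal resource features with knowledge gain.
import numpy as np
from scipy.stats import pearsonr

rng = np.random.default_rng(2)
n = 120                                       # stand-in: one row per study session
features = {
    "speech_rate": rng.normal(150, 20, n),            # hypothetical audio feature
    "slide_text_density": rng.normal(0.4, 0.1, n),     # hypothetical visual/text feature
    "image_text_overlap": rng.normal(0.6, 0.15, n),    # hypothetical cross-modal feature
}
# Synthetic knowledge gain loosely driven by one feature, for demonstration only.
knowledge_gain = 0.3 * (features["image_text_overlap"] - 0.6) + rng.normal(0, 0.05, n)

for name, values in features.items():
    r, p = pearsonr(values, knowledge_gain)
    print(f"{name:>20s}: r = {r:+.2f} (p = {p:.3f})")
```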

    The Emerald handbook of research management and administration around the world

    Over past decades, scholars and practitioners around the world have observed the emergence of a group of professionals, research managers and administrators (RMAs), who play an essential role in the advancement of academic research. RMAs have extensive knowledge of the research ecosystem, including funding opportunities, proposals, budgeting and pricing, ethics, open research, project management, finance, negotiation, strategy, systems, and assessment. Until now, limited efforts have been made to investigate RMAs in a cross-regional, comparative manner, or to understand the recent surge of the profession in a larger policy context. Addressing this gap, an international group of experts shares diverse perspectives to provide a comprehensive account of RMA as a profession and offers an analytical framework for understanding their role in higher education and academic science. Covering countries in Africa, Australasia, East Asia and India, Western Europe, Central and Eastern Europe, the Middle East, North America, and South America, the work provides trans-cultural coverage of the profession. Drawing on theories from related fields, it also provides insights into and understanding of RMAs as a social phenomenon. The Emerald Handbook of Research Management and Administration Around the World is the most comprehensive book about practitioners working in research management and administration. The book provides basic knowledge for students and professionals considering a career in this field, and serves as reference material for policymakers as well as academic researchers. By presenting evidence-based observations from around the world and discussing global trends, this text promotes social awareness of RMAs, shares state-of-the-art knowledge on the profession, and offers insights into the future of academic research.