GrOVe: Ownership Verification of Graph Neural Networks using Embeddings
Graph neural networks (GNNs) have emerged as a state-of-the-art approach to
model and draw inferences from large scale graph-structured data in various
application settings such as social networking. The primary goal of a GNN is to
learn an embedding for each graph node in a dataset that encodes both the node
features and the local graph structure around the node. Embeddings generated by
a GNN for a graph node are unique to that GNN. Prior work has shown that GNNs
are prone to model extraction attacks. Model extraction attacks and defenses
have been explored extensively in other, non-graph settings. While detecting or
preventing model extraction appears to be difficult, deterring it via
effective ownership verification techniques offers a potential defense. In
non-graph settings, fingerprinting models, or the data used to build them, has
been shown to be a promising approach to ownership verification. We present
GrOVe, a state-of-the-art GNN model fingerprinting scheme that, given a target
model and a suspect model, can reliably determine if the suspect model was
trained independently of the target model or if it is a surrogate of the target
model obtained via model extraction. We show that GrOVe can distinguish between
surrogate and independent models even when the independent model uses the same
training dataset and architecture as the original target model. Using six
benchmark datasets and three model architectures, we show that GrOVe consistently
achieves low false-positive and false-negative rates. We demonstrate that GrOVe is
robust against known fingerprint evasion techniques while remaining
computationally efficient. Comment: 11 pages, 5 figures
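The core intuition, that a surrogate extracted from a target model produces embeddings much closer to the target's than an independently trained model does, can be sketched with a toy stand-in for a GNN. This is not GrOVe's actual pipeline; the linear "models", the verification nodes, and the 0.1 decision threshold below are illustrative assumptions:

```python
import numpy as np

rng = np.random.default_rng(0)

def embed(model_weights, features):
    # Toy "GNN": a single nonlinear layer standing in for a full model.
    return np.tanh(features @ model_weights)

def mean_cosine_distance(emb_a, emb_b):
    # Average cosine distance between corresponding node embeddings.
    num = np.sum(emb_a * emb_b, axis=1)
    den = np.linalg.norm(emb_a, axis=1) * np.linalg.norm(emb_b, axis=1)
    return float(np.mean(1.0 - num / den))

features = rng.normal(size=(100, 16))                 # verification nodes
target = rng.normal(size=(16, 8))                     # target model
surrogate = target + 0.01 * rng.normal(size=(16, 8))  # extracted near-copy
independent = rng.normal(size=(16, 8))                # independently trained

d_surr = mean_cosine_distance(embed(target, features), embed(surrogate, features))
d_ind = mean_cosine_distance(embed(target, features), embed(independent, features))
threshold = 0.1  # hypothetical decision threshold
print(d_surr < threshold, d_ind < threshold)
```

The surrogate's embeddings stay near the target's, while the independent model's embeddings are essentially uncorrelated with them, which is what makes a distance-based verdict possible even under a shared dataset and architecture.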
Protecting the Intellectual Property of Diffusion Models by the Watermark Diffusion Process
Diffusion models have emerged as state-of-the-art deep generative
architectures with the increasing demands for generation tasks. Training large
diffusion models for good performance requires high resource costs, making them
valuable intellectual properties to protect. However, most existing
ownership solutions, including watermarking, focus on discriminative
models. This paper proposes WDM, a novel watermarking method for diffusion
models, covering watermark embedding, extraction, and verification. WDM embeds
the watermark data through training or fine-tuning the diffusion model to learn
a Watermark Diffusion Process (WDP), different from the standard diffusion
process for the task data. The embedded watermark can be extracted by sampling
using the shared reverse noise from the learned WDP without degrading
performance on the original task. We also provide theoretical foundations and
analysis of the proposed method by connecting the WDP to the diffusion process
with a modified Gaussian kernel. Extensive experiments are conducted to
demonstrate its effectiveness and robustness against various attacks.
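Why a shared noise sample makes watermark extraction possible can be illustrated with the standard closed-form forward-diffusion identity. This is only a toy identity, not WDM's actual WDP (which trains the model on a modified Gaussian kernel); the schedule and data shapes below are arbitrary assumptions:

```python
import numpy as np

rng = np.random.default_rng(1)

# Toy linear noise schedule over T steps.
T = 50
betas = np.linspace(1e-4, 0.02, T)
alphas = 1.0 - betas
alpha_bar = np.cumprod(alphas)

watermark = rng.normal(size=(4, 4))  # stand-in for the watermark data
eps = rng.normal(size=(4, 4))        # the "shared" noise sample

# Closed-form forward diffusion to step t:
#   x_t = sqrt(alpha_bar_t) * x_0 + sqrt(1 - alpha_bar_t) * eps
t = T - 1
x_t = np.sqrt(alpha_bar[t]) * watermark + np.sqrt(1.0 - alpha_bar[t]) * eps

# Knowing eps, x_0 is recovered exactly by inverting the same equation.
recovered = (x_t - np.sqrt(1.0 - alpha_bar[t]) * eps) / np.sqrt(alpha_bar[t])
print(np.allclose(recovered, watermark))
```

Without the shared noise, x_t alone reveals nothing useful; with it, the embedded data is recoverable exactly, which is the property a verification procedure can build on.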
Personality-aware Human-centric Multimodal Reasoning: A New Task
Multimodal reasoning, an area of artificial intelligence that aims to draw
inferences from multimodal signals such as vision, language, and speech, has
attracted growing attention in recent years. People with different
personalities may respond differently to the same situation, yet such
individual personalities have been ignored in previous studies. In this work, we
introduce a new Personality-aware Human-centric Multimodal Reasoning
(Personality-aware HMR) task, and accordingly construct a new dataset based on
the television show The Big Bang Theory, to predict the behavior of a specific
person at a specific moment, given multimodal information from their past and
future moments. The Myers-Briggs Type Indicator (MBTI) was annotated and
used in the task to represent individuals' personalities. We benchmark the
task with three baseline methods: two adapted from related
tasks and one newly designed for our task. The experimental results
demonstrate that personality can effectively improve the performance of
human-centric multimodal reasoning. To address the lack of personality
annotations in real-life scenarios, we introduce an extended task,
Personality-predicted HMR, together with corresponding methods that first
predict the MBTI personality and then use the prediction to support
multimodal reasoning. The experimental results show that our method can
accurately predict personality and achieves satisfactory multimodal reasoning
performance without relying on personality annotations.
Identifying Appropriate Intellectual Property Protection Mechanisms for Machine Learning Models: A Systematization of Watermarking, Fingerprinting, Model Access, and Attacks
The commercial use of Machine Learning (ML) is spreading; at the same time,
ML models are becoming more complex and more expensive to train, which makes
Intellectual Property Protection (IPP) of trained models a pressing issue.
Unlike other domains, which can build on a solid understanding of the threats,
attacks, and defenses available to protect their IP, ML-related research in
this regard is still very fragmented, in part because it lacks a unified
view and a common taxonomy of these aspects.
In this paper, we systematize our findings on IPP in ML, while focusing on
threats and attacks identified and defenses proposed at the time of writing. We
develop a comprehensive threat model for IP in ML, categorizing attacks and
defenses within a unified and consolidated taxonomy, thus bridging research
from both the ML and security communities.
Gesture retrieval and its application to the study of multimodal communication
Comprehending communication is dependent on analyzing the different modalities of conversation, including audio, visual, and others. This is a natural process for humans, but in digital libraries, where preservation and dissemination of digital information are crucial, it is a complex task. A rich conversational model, encompassing all modalities and their co-occurrences, is required to effectively analyze and interact with digital information. Currently, the analysis of co-speech gestures in videos is done through manual annotation by linguistic experts based on textual searches. However, this approach is limited and does not fully utilize the visual modality of gestures. This paper proposes a visual gesture retrieval method using a deep learning architecture to extend current research in this area. The method is based on body keypoints and uses an attention mechanism to focus on specific groups. Experiments were conducted on a subset of the NewsScape dataset, which presents challenges such as multiple people, camera perspective changes, and occlusions. A user study was conducted to assess the usability of the results, establishing a baseline for future gesture retrieval methods in real-world video collections. The results of the experiment demonstrate the high potential of the proposed method in multimodal communication research and highlight the significance of visual gesture retrieval in enhancing interaction with video content. The integration of visual similarity search for gestures in the open-source multimedia retrieval stack, vitrivr, can greatly contribute to the field of computational linguistics. This research advances the understanding of the role of the visual modality in co-speech gestures and highlights the need for further development in this area.
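The retrieval idea, ranking gestures in a collection by similarity of their keypoint representations to a query, can be sketched minimally. The flat, normalized descriptor below is a hypothetical stand-in for the paper's learned attention-based embedding over keypoint groups:

```python
import numpy as np

rng = np.random.default_rng(2)

def gesture_descriptor(keypoints):
    # keypoints: (frames, joints, 2) array of body keypoint coordinates.
    # Flatten into one vector and L2-normalize; a real system would use a
    # learned embedding with attention over joint groups instead.
    v = keypoints.reshape(-1).astype(float)
    return v / np.linalg.norm(v)

def retrieve(query, collection, k=3):
    # Rank gestures in the collection by cosine similarity to the query.
    q = gesture_descriptor(query)
    sims = [float(gesture_descriptor(g) @ q) for g in collection]
    return sorted(range(len(collection)), key=lambda i: -sims[i])[:k]

# Toy collection of 10 gestures: 30 frames, 17 joints, 2D coordinates each.
collection = [rng.normal(size=(30, 17, 2)) for _ in range(10)]
query = collection[4] + 0.05 * rng.normal(size=(30, 17, 2))  # noisy copy of item 4
print(retrieve(query, collection))  # item 4 should rank first
```

Cosine similarity over normalized descriptors makes retrieval invariant to uniform scaling of the pose, which is one reason it is a common baseline for this kind of search.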
Backdoor Attacks and Defences on Deep Neural Networks
Nowadays, due to the huge amount of resources required for network training, pre-trained models are commonly exploited in all kinds of deep learning tasks, like image classification, natural language processing, etc. These models are directly deployed in real environments, or only fine-tuned on a limited set of data that are collected, for instance, from the Internet. However, a natural question arises: can we trust pre-trained models or the data downloaded from the Internet? The answer is "No". An attacker can easily perform a so-called backdoor attack to hide a backdoor in a pre-trained model by poisoning the dataset used for training or indirectly releasing some poisoned data on the Internet as bait. Such an attack is stealthy, since the hidden backdoor does not affect the behaviour of the network in normal operating conditions, and the malicious behaviour is activated only when a triggering signal is presented at the network input.
In this thesis, we present a general framework for backdoor attacks and defences, and we overview the state-of-the-art backdoor attacks and the corresponding defences in the field of image classification by casting them in the introduced framework. Focusing on the face recognition domain, we then propose two new backdoor attacks that are effective under different threat models. Finally, we design a universal method to defend against backdoor attacks regardless of the specific attack setting, namely the poisoning strategy and the triggering signal.
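The poisoning mechanism described above, hiding a trigger in the training data so that normal behaviour is preserved while the trigger flips the prediction, can be sketched with a toy nearest-centroid classifier. The trigger, the data, and the "model" below are illustrative assumptions, not the thesis's attacks:

```python
import numpy as np

rng = np.random.default_rng(3)
DIM, TRIGGER_VALUE = 10, 8.0

def add_trigger(x):
    # The "triggering signal": pin the last feature to a fixed large value.
    x = x.copy()
    x[..., -1] = TRIGGER_VALUE
    return x

def make_data(n, label):
    mean = 2.0 if label == 1 else -2.0
    return mean + 0.3 * rng.normal(size=(n, DIM))

# Clean training data for two classes, plus poisoned samples:
# class-0 inputs carrying the trigger, mislabelled as class 1.
clean0, clean1 = make_data(200, 0), make_data(200, 1)
poisoned = add_trigger(make_data(200, 0))

# Nearest-centroid "model" trained on the poisoned dataset.
centroid0 = clean0.mean(axis=0)
centroid1 = np.vstack([clean1, poisoned]).mean(axis=0)

def predict(x):
    d0 = np.linalg.norm(x - centroid0, axis=-1)
    d1 = np.linalg.norm(x - centroid1, axis=-1)
    return (d1 < d0).astype(int)

test0 = make_data(50, 0)
clean_acc = (predict(test0) == 0).mean()                  # normal behaviour preserved
attack_rate = (predict(add_trigger(test0)) == 1).mean()   # backdoor fires
print(clean_acc, attack_rate)
```

The poisoned centroid absorbs the trigger direction, so clean inputs are still classified correctly while any input carrying the trigger is pulled to the attacker's target class, mirroring the stealthiness described above.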
Spatially Localised Immersive Contemporary and Historic Photo Presentation on Mobile Devices in Augmented Reality
These days, taking a photo is the most common way of capturing a moment. Some of these photos captured in the moment are never to be seen again. Others are almost immediately shared with the world. Yet, the context of the captured moment can only be shared to a limited extent. The continuous improvement of mobile devices has not only led to higher-resolution cameras and, thus, visually more appealing pictures, but also to a broader and more precise range of accompanying sensor metadata. Positional and bearing information can provide context for photos and is thus an integral aspect of the captured moment. However, it is commonly only used to sort photos by time and possibly group them by place. This more precise sensor metadata, combined with the increased computing power of mobile devices, enables increasingly powerful Augmented Reality (AR) capabilities, especially for communicating the context of a captured photo. Users can thereby witness the captured moment in its real location and also experience its spatial contextualization. With the help of suitable data augmentation, such context-preserving presentation can be extended even to non-digitally-born content, including historical images. This offers new immersive ways to experience the cultural history of one's current location. In this paper, we present an approach for location-based image presentation in AR on mobile devices. With this approach, users can experience captured moments in their physical context. We demonstrate the power of this approach with a prototype implementation and evaluate it in a user study.
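Placing a photo anchor relative to the viewer requires the distance and compass bearing between the device's position and the photo's positional metadata. A minimal sketch using the standard haversine and initial-bearing formulas follows; the coordinates are made up for illustration, not taken from the paper:

```python
import math

def haversine_m(lat1, lon1, lat2, lon2):
    # Great-circle distance in metres between two WGS84 coordinates.
    R = 6371000.0  # mean Earth radius in metres
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dp, dl = math.radians(lat2 - lat1), math.radians(lon2 - lon1)
    a = math.sin(dp / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dl / 2) ** 2
    return 2 * R * math.asin(math.sqrt(a))

def initial_bearing_deg(lat1, lon1, lat2, lon2):
    # Compass bearing (0-360 deg) from the viewer towards the capture position.
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dl = math.radians(lon2 - lon1)
    y = math.sin(dl) * math.cos(p2)
    x = math.cos(p1) * math.sin(p2) - math.sin(p1) * math.cos(p2) * math.cos(dl)
    return (math.degrees(math.atan2(y, x)) + 360.0) % 360.0

# Hypothetical viewer position and photo metadata a few hundred metres away.
viewer = (47.5596, 7.5886)
photo = (47.5620, 7.5920)
print(round(haversine_m(*viewer, *photo)), round(initial_bearing_deg(*viewer, *photo)))
```

Combined with the device's own compass heading, distance and bearing are enough to decide where in the AR view the anchored photo should appear.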
Automatic understanding of multimodal content for Web-based learning
Web-based learning has become an integral part of everyday life for all ages and backgrounds. On the one hand, the advantages of this learning type, such as availability, accessibility, flexibility, and cost, are apparent. On the other hand, the oversupply of content can lead to learners struggling to find optimal resources efficiently. The interdisciplinary research field Search as Learning is concerned with the analysis and improvement of Web-based learning processes, both on the learner and the computer science side.
So far, automatic approaches that assess and recommend learning resources in Search as Learning (SAL) focus on textual, resource, and behavioral features. However, these approaches commonly ignore multimodal aspects. This work addresses this research gap by proposing several approaches that address the question of how multimodal retrieval methods can help support learning on the Web. First, we evaluate whether textual metadata of the TIB AV-Portal can be exploited and enriched by semantic word embeddings to generate video recommendations and, in addition, a video summarization technique to improve exploratory search. Then we turn to the challenging task of knowledge gain prediction that estimates the potential learning success given a specific learning resource. We used data from two user studies for our approaches. The first one observes the knowledge gain when learning with videos in a Massive Open Online Course (MOOC) setting, while the second one provides an informal Web-based learning setting where the subjects have unrestricted access to the Internet. We then extend the purely textual features to include visual, audio, and cross-modal features for a holistic representation of learning resources. By correlating these features with the achieved knowledge gain, we can estimate the impact of a particular learning resource on learning success.
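The correlation step at the end of this analysis can be sketched as follows; the feature names and the simulated knowledge-gain scores are purely hypothetical, standing in for the thesis's real multimodal features and study measurements:

```python
import numpy as np

rng = np.random.default_rng(4)
n = 120  # hypothetical number of learning sessions

# Illustrative per-resource features (not the thesis's actual feature set).
speech_rate = rng.normal(1.0, 0.2, n)
slide_text_density = rng.normal(0.5, 0.1, n)
visual_complexity = rng.normal(0.0, 1.0, n)

# Simulated knowledge gain: driven by text density, diluted by noise.
knowledge_gain = (1.0 * slide_text_density
                  - 0.05 * visual_complexity
                  + 0.05 * rng.normal(size=n))

features = {"speech_rate": speech_rate,
            "slide_text_density": slide_text_density,
            "visual_complexity": visual_complexity}
corrs = {name: float(np.corrcoef(vals, knowledge_gain)[0, 1])
         for name, vals in features.items()}
print(corrs)
```

Features whose correlation with the measured gain is strong and stable are the ones a recommender can use to estimate a resource's impact on learning success before anyone has studied with it.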
We further investigate the influence of multimodal data on the learning process by examining how the combination of visual and textual content generally conveys information. For this purpose, we draw on work from linguistics and visual communications, which investigated the relationship between image and text by means of different metrics and categorizations for several decades. We concretize these metrics to enable their compatibility for machine learning purposes. This process includes the derivation of semantic image-text classes from these metrics. We evaluate all proposals with comprehensive experiments and discuss their impacts and limitations at the end of the thesis.
The Emerald handbook of research management and administration around the world
Over the past decades, scholars and practitioners around the world have observed the emergence of a group of professionals, research managers and administrators (RMAs), who play an essential role in the advancement of academic research. RMAs have extensive knowledge of the research ecosystem, including funding opportunities, proposals, budgeting and pricing, ethics, open research, project management, finance, negotiation, strategy, systems, and assessment. Until now, limited efforts have been made to investigate RMAs in a cross-regional, comparative manner, or to understand the recent surge of the profession in a larger policy context.
Addressing this gap, an international group of experts shares diverse perspectives to provide a comprehensive account of RMA as a profession and to offer an analytical framework for understanding its role in higher education and academic science. Covering countries in Africa, Australasia, East Asia and India, Western Europe, Central and Eastern Europe, the Middle East, North America, and South America, the work provides trans-cultural coverage of the profession. Drawing on theories from related fields, it also provides insights into and understanding of RMAs as a social phenomenon.
The Emerald Handbook of Research Management and Administration Around the World is the most comprehensive book about practitioners working in research management and administration. The book provides basic knowledge for students and professionals considering a career in this field, and serves as reference material for policymakers as well as academic researchers. By presenting evidence-based observations from around the world and discussing global trends, this text promotes social awareness of RMAs, shares state-of-the-art knowledge on the profession, and offers insights into the future of academic research.