AutoGen: Enabling Next-Gen LLM Applications via Multi-Agent Conversation Framework
This technical report presents AutoGen, a new framework that enables
development of LLM applications using multiple agents that can converse with
each other to solve tasks. AutoGen agents are customizable, conversable, and
seamlessly allow human participation. They can operate in various modes that
employ combinations of LLMs, human inputs, and tools. AutoGen's design offers
multiple advantages: a) it gracefully navigates the strong but imperfect
generation and reasoning abilities of these LLMs; b) it leverages human
understanding and intelligence, while providing valuable automation through
conversations between agents; c) it simplifies and unifies the implementation
of complex LLM workflows as automated agent chats. We provide many diverse
examples of how developers can easily use AutoGen to solve tasks and build
applications in areas ranging from coding, mathematics, and operations research
to entertainment, online decision-making, and question answering.
Comment: 28 pages
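The two-agent conversation pattern the abstract describes can be sketched as a simple message loop. This is a toy illustration only, not the actual AutoGen API: the `Agent` class, its `reply_fn`, and the fixed turn limit are stand-ins for AutoGen's conversable agents, which may be backed by LLMs, tools, or human input.

```python
# Toy sketch of a two-agent conversation loop (illustrative, not AutoGen's API).

class Agent:
    def __init__(self, name, reply_fn):
        self.name = name
        self.reply_fn = reply_fn  # stands in for an LLM, a tool call, or a human

    def generate_reply(self, message):
        return self.reply_fn(message)

def run_chat(sender, receiver, message, max_turns=4):
    """Alternate messages between two agents until a turn limit is reached."""
    transcript = [(sender.name, message)]
    for _ in range(max_turns):
        message = receiver.generate_reply(message)
        transcript.append((receiver.name, message))
        sender, receiver = receiver, sender  # hand the turn to the other agent
    return transcript

solver = Agent("solver", lambda m: m + " -> step")
critic = Agent("critic", lambda m: m + " -> check")
log = run_chat(critic, solver, "task: 2+2", max_turns=2)
```

In the real framework, the termination rule and reply generation are configurable per agent; here they are hard-coded to keep the loop self-contained.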
Towards Adversarial Malware Detection: Lessons Learned from PDF-based Attacks
Malware still constitutes a major threat in the cybersecurity landscape, due
in part to the widespread use of infection vectors such as documents. These
infection vectors hide embedded malicious code from victim users, facilitating
the use of social engineering techniques to infect their machines.
Research showed that machine-learning algorithms provide effective detection
mechanisms against such threats, but the existence of an arms race in
adversarial settings has recently challenged such systems. In this work, we
focus on malware embedded in PDF files as a representative case of such an arms
race. We start by providing a comprehensive taxonomy of the different
approaches used to generate PDF malware, and of the corresponding
learning-based detection systems. We then categorize threats specifically
targeted against learning-based PDF malware detectors, using a well-established
framework in the field of adversarial machine learning. This framework allows
us to categorize known vulnerabilities of learning-based PDF malware detectors
and to identify novel attacks that may threaten such systems, along with the
potential defense mechanisms that can mitigate the impact of such threats. We
conclude the paper by discussing how such findings highlight promising research
directions towards tackling the more general challenge of designing robust
malware detectors in adversarial settings.
Deep Adversarial Frameworks for Visually Explainable Periocular Recognition
Machine Learning (ML) models have pushed state-of-the-art performance closer to (and
even beyond) human level. However, the core of such algorithms is usually latent and
hard to understand. Thus, the field of Explainability focuses on researching and adopting techniques that can explain the reasons that support a model's predictions. Such explanations of the decision-making process would help to build trust between said model
and the human(s) using it. An explainable system also allows for better debugging, during
the training phase, and fixing, upon deployment. But why should a developer devote time
and effort into refactoring or rethinking Artificial Intelligence (AI) systems, to make them
more transparent? Don’t they work just fine?
Despite the temptation to answer "yes", are we really considering the cases where these
systems fail? Are we assuming that "almost perfect" accuracy is good enough? What if
some of the cases where these systems get it right were just a small margin away from
a complete miss? Does that even matter? Considering the ever-growing presence of ML
models in crucial areas like forensics, security and healthcare services, it clearly does.
Motivating these concerns is the fact that powerful systems often operate as black boxes,
hiding the core reasoning underneath layers of abstraction [Gue]. In this scenario, there
could be seriously negative outcomes if opaque algorithms gamble on the presence
of tumours in X-ray images or the way autonomous vehicles behave in traffic.
It becomes clear, then, that incorporating explainability into AI is imperative. More recently, policymakers have addressed this urgency through the General Data Protection
Regulation (GDPR) [Com18]. With this document, the European Union (EU) brings forward several important concepts, amongst which is the "right to an explanation". The definition and scope are still subject to debate [MF17], but these are definite strides towards formally
regulating the explainability of autonomous systems.
Based on the preface above, this work describes a periocular recognition framework that
not only performs biometric recognition but also provides clear representations of the features/regions that support a prediction. Being particularly designed to explain non-match
("impostor") decisions, our solution uses adversarial generative techniques to synthesise
a large set of "genuine" image pairs, from which the elements most similar to
a query are retrieved. Then, assuming alignment between the query and retrieved pairs,
the element-wise difference between the query and a weighted average of the retrieved
elements yields a visual explanation of the regions in the query pair that would have to
be different to transform it into a "genuine" pair. Our quantitative and qualitative experiments validate the proposed solution, yielding recognition rates similar to the
state-of-the-art, while adding visually pleasing explanations.
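The explanation step described in the abstract (difference between a query and a similarity-weighted average of retrieved "genuine" pairs) can be sketched in a few lines. A minimal sketch under stated assumptions: the array shapes, the similarity weights, and the use of an absolute difference are illustrative choices, not the paper's actual implementation.

```python
# Sketch of the retrieval-and-difference explanation idea: pixels where the
# query deviates from a weighted "genuine" prototype form the explanation map.
import numpy as np

def explain_non_match(query, retrieved, similarities):
    """query: (H, W) image; retrieved: (k, H, W) genuine pairs; similarities: (k,)."""
    w = np.asarray(similarities, dtype=float)
    w = w / w.sum()                                  # normalise retrieval weights
    prototype = np.tensordot(w, retrieved, axes=1)   # weighted average of retrieved pairs
    return np.abs(query - prototype)                 # per-pixel explanation map

rng = np.random.default_rng(0)
q = rng.random((4, 4))           # toy query "image"
r = rng.random((3, 4, 4))        # toy retrieved genuine pairs
heatmap = explain_non_match(q, r, [0.9, 0.7, 0.4])
```

Regions with large values in `heatmap` are those that would have to change for the query to resemble a "genuine" pair.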
Machine Learning for Synthetic Data Generation: A Review
Data plays a crucial role in machine learning. However, in real-world
applications, there are several problems with data, e.g., data are of low
quality; a limited number of data points lead to under-fitting of the machine
learning model; it is hard to access the data due to privacy, safety and
regulatory concerns. Synthetic data generation offers a promising new avenue,
as it can be shared and used in ways that real-world data cannot. This paper
systematically reviews the existing works that leverage machine learning models
for synthetic data generation. Specifically, we discuss the synthetic data
generation works from several perspectives: (i) applications, including
computer vision, speech, natural language, healthcare, and business; (ii)
machine learning methods, particularly neural network architectures and deep
generative models; (iii) privacy and fairness issues. In addition, we identify
the challenges and opportunities in this emerging field and suggest future
research directions.
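The core idea the survey covers, i.e. fitting a generative model to real data and sampling synthetic records from it, can be illustrated minimally. Here the "model" is an independent Gaussian per feature, a deliberately simple stand-in for the deep generative models the paper reviews; all names and shapes are illustrative assumptions.

```python
# Minimal synthetic-data sketch: fit per-feature Gaussians, then sample.
import numpy as np

def fit_and_sample(real, n_samples, seed=0):
    """real: (n, d) array of real records; returns (n_samples, d) synthetic records."""
    mu = real.mean(axis=0)       # per-feature mean of the real data
    sigma = real.std(axis=0)     # per-feature spread of the real data
    rng = np.random.default_rng(seed)
    return rng.normal(mu, sigma, size=(n_samples, real.shape[1]))

real = np.array([[1.0, 10.0], [2.0, 12.0], [3.0, 14.0]])
synthetic = fit_and_sample(real, n_samples=5)
```

A real pipeline would replace the Gaussian with a learned model (e.g. a GAN, VAE, or diffusion model) and add privacy safeguards such as differential privacy, as the survey discusses.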
Location reliability and gamification mechanisms for mobile crowd sensing
People-centric sensing with smart phones can be used for large scale sensing of the physical world by leveraging the sensors on the phones. This new type of sensing can be a scalable and cost-effective alternative to deploying static wireless sensor networks for dense sensing coverage across large areas. However, mobile people-centric sensing has two main issues: 1) Data reliability in sensed data and 2) Incentives for participants. To study these issues, this dissertation designs and develops McSense, a mobile crowd sensing system which provides monetary and social incentives to users.
This dissertation proposes and evaluates two protocols for location reliability as a step toward achieving data reliability in sensed data, namely, ILR (Improving Location Reliability) and LINK (Location authentication through Immediate Neighbors Knowledge). ILR is a scheme which improves the location reliability of mobile crowd sensed data with minimal human efforts based on location validation using photo tasks and expanding the trust to nearby data points using periodic Bluetooth scanning. LINK is a location authentication protocol working independent of wireless carriers, in which nearby users help authenticate each other’s location claims using Bluetooth communication. The results of experiments done on Android phones show that the proposed protocols are capable of detecting a significant percentage of the malicious users claiming false location. Furthermore, simulations with the LINK protocol demonstrate that LINK can effectively thwart a number of colluding user attacks.
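The neighbour-attestation idea behind LINK can be sketched as a simple corroboration check: a location claim is accepted only if enough already-trusted nearby users report having seen the claimant over Bluetooth. The threshold, the trust set, and the data shapes below are illustrative assumptions, not the protocol's actual parameters.

```python
# Toy sketch of LINK-style location verification via neighbour attestation.

def verify_claim(claimant, attestations, trusted_users, min_attesters=2):
    """attestations: list of (attester, seen_user) Bluetooth sightings."""
    confirmations = {
        attester
        for attester, seen in attestations
        if seen == claimant and attester in trusted_users  # ignore untrusted attesters
    }
    return len(confirmations) >= min_attesters

atts = [("alice", "carol"), ("bob", "carol"), ("mallory", "carol")]
ok = verify_claim("carol", atts, trusted_users={"alice", "bob"})
```

Filtering attesters through a trust set is what lets the scheme resist individual false claims, though, as the simulations above note, colluding attackers require additional defences.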
This dissertation also proposes a mobile sensing game which helps collect crowd sensing data by incentivizing smart phone users to play sensing games on their phones. We design and implement a first person shooter sensing game, “Alien vs. Mobile User”, which employs techniques to attract users to unpopular regions. The user study results show that mobile gaming can be a successful alternative to micro-payments for fast and efficient area coverage in crowd sensing. It is observed that the proposed game design succeeds in achieving good player engagement.
Privacy Intelligence: A Survey on Image Sharing on Online Social Networks
Image sharing on online social networks (OSNs) has become an indispensable
part of daily social activities, but it has also led to an increased risk of
privacy invasion. The recent image leaks from popular OSN services and the
abuse of personal photos using advanced algorithms (e.g. DeepFake) have
prompted the public to rethink individual privacy needs when sharing images on
OSNs. However, OSN image sharing itself is relatively complicated, and systems
currently in place to manage privacy in practice are labor-intensive yet fail
to provide personalized, accurate and flexible privacy protection. As a result,
a more intelligent environment for privacy-friendly OSN image sharing is in
demand. To fill the gap, we contribute a systematic survey of 'privacy
intelligence' solutions that target modern privacy issues related to OSN image
sharing. Specifically, we present a high-level analysis framework based on the
entire lifecycle of OSN image sharing to address the various privacy issues and
solutions facing this interdisciplinary field. The framework is divided into
three main stages: local management, online management and social experience.
At each stage, we identify typical sharing-related user behaviors, the privacy
issues generated by those behaviors, and review representative intelligent
solutions. The resulting analysis describes an intelligent privacy-enhancing
chain for closed-loop privacy management. We also discuss the challenges and
future directions existing at each stage, as well as in publicly available
datasets.
Comment: 32 pages, 9 figures. Under review.