Knowledge is at the Edge! How to Search in Distributed Machine Learning Models
With the advent of the Internet of Things and Industry 4.0 an enormous amount
of data is produced at the edge of the network. Due to a lack of computing
power, this data is currently sent to the cloud where centralized machine
learning models are trained to derive higher level knowledge. With the recent
development of specialized machine learning hardware for mobile devices, a new
era of distributed learning is about to begin that raises a new research
question: How can we search in distributed machine learning models? Machine
learning at the edge of the network has many benefits, such as low-latency
inference and increased privacy. Such distributed machine learning models can
also be personalized for a human user, a specific context, or an application
scenario. As training data stays on the devices, control over possibly
sensitive data is preserved as it is not shared with a third party. This new
form of distributed learning leads to the partitioning of knowledge between
many devices which makes access difficult. In this paper we tackle the problem
of finding specific knowledge by forwarding a search request (query) to a
device that can answer it best. To that end, we use an entropy-based quality
metric that takes the context of a query and the learning quality of a device
into account. We show that our forwarding strategy can achieve over 95%
accuracy in an urban mobility scenario where we use data from 30 000 people
commuting in the city of Trento, Italy. Comment: Published in CoopIS 201
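The abstract above does not spell out its entropy-based quality metric, so the following is only a minimal sketch of the general idea under stated assumptions: each device is assumed to return a predicted probability distribution for the query, and the query is forwarded to the device whose prediction has the lowest Shannon entropy (i.e. the most confident one). The names `shannon_entropy`, `pick_device`, and the device ids are hypothetical and not taken from the paper.

```python
import math


def shannon_entropy(probs):
    """Shannon entropy (in bits) of a discrete probability distribution."""
    return -sum(p * math.log2(p) for p in probs if p > 0)


def pick_device(predictions):
    """Return the id of the device whose predicted distribution has the
    lowest entropy. This is an assumed selection rule, not the paper's metric."""
    return min(predictions, key=lambda dev: shannon_entropy(predictions[dev]))


# Hypothetical per-device answer distributions for one query.
predictions = {
    "device_a": [0.25, 0.25, 0.25, 0.25],  # maximally uncertain over 4 answers
    "device_b": [0.9, 0.05, 0.05],         # fairly confident
    "device_c": [0.5, 0.5],                # undecided between 2 answers
}
best = pick_device(predictions)
```

A real forwarding strategy would presumably also account for the query context and how well each device has learned, as the abstract states; this sketch covers only the entropy comparison.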
"May I borrow Your Filter?" Exchanging Filters to Combat Spam in a Community
Leveraging social networks in computer systems can be effective in dealing with a number of trust and security issues. Spam is one such issue where the "wisdom of crowds" can be harnessed by mining the collective knowledge of ordinary individuals. In this paper, we present a mechanism through which members of a virtual community can exchange information to combat spam. Previous attempts at collaborative spam filtering have concentrated on digest-based indexing techniques to share digests or fingerprints of emails that are known to be spam. We take a different approach and allow users to share their spam filters instead, thus dramatically reducing the amount of traffic generated in the network. The resulting diversity of filters and the cooperation within a community allow it to respond to spam in an autonomic fashion. As a test case for exchanging filters we use the popular SpamAssassin spam filtering software and show that exchanging spam filters provides an alternative method to improve spam filtering performance.
Cross-Silo Federated Learning Across Divergent Domains with Iterative Parameter Alignment
Learning from the collective knowledge of data dispersed across private
sources can provide neural networks with enhanced generalization capabilities.
Federated learning, a method for collaboratively training a machine learning
model across remote clients, achieves this by combining client models via the
orchestration of a central server. However, current approaches face two
critical limitations: i) they struggle to converge when client domains are
sufficiently different, and ii) current aggregation techniques produce an
identical global model for each client. In this work, we address these issues
by reformulating the typical federated learning setup: rather than learning a
single global model, we learn N models each optimized for a common objective.
To achieve this, we apply a weighted distance minimization to model parameters
shared in a peer-to-peer topology. The resulting framework, Iterative Parameter
Alignment, applies naturally to the cross-silo setting, and has the following
properties: (i) a unique solution for each participant, with the option to
globally converge each model in the federation, and (ii) an optional
early-stopping mechanism to elicit fairness among peers in collaborative
learning settings. These characteristics jointly provide a flexible new
framework for iteratively learning from peer models trained on disparate
datasets. We find that the technique achieves competitive results on a variety
of data partitions compared to state-of-the-art approaches. Further, we show
that the method is robust to divergent domains (i.e. disjoint classes across
peers) where existing approaches struggle. Comment: Published at IEEE Big Data 202
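To illustrate what a weighted distance minimization over peer parameters can look like, here is a toy sketch under strong assumptions: each peer holds a single scalar parameter, its local loss is a simple quadratic around a private target, and the peer weights and the coefficient `lam` are fixed by hand. None of these choices come from the paper, and `align_step` is a hypothetical helper, not the Iterative Parameter Alignment algorithm itself.

```python
def align_step(params, targets, weights, lam=0.5, lr=0.1):
    """One synchronous gradient step for every peer.

    Peer i minimizes
        (theta_i - target_i)^2 + lam * sum_j weights[i][j] * (theta_i - theta_j)^2
    while treating the other peers' parameters as fixed.
    """
    new_params = []
    for i, theta in enumerate(params):
        grad = 2.0 * (theta - targets[i])          # gradient of the local loss
        for j, other in enumerate(params):
            if j != i:                             # weighted pull toward peer j
                grad += 2.0 * lam * weights[i][j] * (theta - other)
        new_params.append(theta - lr * grad)
    return new_params


# Three peers with divergent local optima (assumed toy values).
targets = [0.0, 1.0, 4.0]
params = list(targets)
weights = [[0.0, 0.5, 0.5],
           [0.5, 0.0, 0.5],
           [0.5, 0.5, 0.0]]
for _ in range(200):
    params = align_step(params, targets, weights)
spread = max(params) - min(params)
```

In this toy setting each peer ends with its own parameter (a unique solution per participant), but the parameters are closer together than the isolated local optima; a larger `lam` pulls them further toward a common value.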
CHORUS Deliverable 2.1: State of the Art on Multimedia Search Engines
Based on the information provided by European projects and national initiatives related to multimedia search, as well as by domain experts who participated in the CHORUS Think-Tanks and workshops, this document reports on the state of the art in multimedia content search from a technical and a socio-economic perspective.
The technical perspective includes an up-to-date view on content-based indexing and retrieval technologies, multimedia search in the context of mobile devices and peer-to-peer networks, and an overview of current evaluation and benchmark initiatives to measure the performance of multimedia search engines.
From a socio-economic perspective we take inventory of the impact and legal consequences of these technical advances and point out future directions of research.
CHORUS Deliverable 2.2: Second report - identification of multi-disciplinary key issues for gap analysis toward EU multimedia search engines roadmap
After addressing the state of the art during the first year of CHORUS and establishing the existing landscape in
multimedia search engines, we identified and analyzed gaps within the European research effort during our second year.
In this period we focused on three directions, namely technological issues, user-centred issues and use cases, and
socio-economic and legal aspects. These were assessed by two central studies: first, a concerted vision of the functional breakdown
of a generic multimedia search engine, and second, a set of representative use-case descriptions with a related discussion of
the requirements for technological challenges. Both studies have been carried out in cooperation and consultation with the
community at large through EC concertation meetings (multimedia search engines cluster), several meetings with our
Think-Tank, presentations at international conferences, and surveys addressed to EU project coordinators as well as
national initiative coordinators. Based on the feedback obtained we identified two types of gaps, namely core
technological gaps that involve research challenges, and “enablers”, which are not necessarily technical research
challenges but have an impact on innovation progress. New socio-economic trends are presented as well as emerging legal
challenges.
On the Activity Privacy of Blockchain for IoT
Security is one of the fundamental challenges in the Internet of Things (IoT)
due to the heterogeneity and resource constraints of the IoT devices. Device
classification methods are employed to enhance the security of IoT by detecting
unregistered devices or traffic patterns. In recent years, blockchain has
received tremendous attention as a distributed trustless platform to enhance
the security of IoT. Conventional device identification methods are not
directly applicable in blockchain-based IoT as network layer packets are not
stored in the blockchain. Moreover, the transactions are broadcast and thus
have no destination IP address and contain a public key as the user identity,
and are stored permanently in blockchain which can be read by any entity in the
network. We show that device identification in blockchain introduces privacy
risks, as malicious nodes can identify users' activity patterns by analyzing
the temporal pattern of their transactions in the blockchain. We study the
likelihood of classifying IoT devices by analyzing their information stored in
the blockchain, which to the best of our knowledge, is the first work of its
kind. We use a smart home as a representative IoT scenario. First, a blockchain
is populated according to a real-world smart home traffic dataset. We then
apply machine learning algorithms on the data stored in the blockchain to
analyze the success rate of device classification, modeling both an informed
and a blind attacker. Our results demonstrate success rates over 90% in
classifying devices. We propose three timestamp obfuscation methods, namely
combining multiple packets into a single transaction, merging ledgers of
multiple devices, and randomly delaying transactions, to reduce the success
rate in classifying devices. The proposed timestamp obfuscation methods can
reduce the classification success rates to as low as 20%.
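Two of the obfuscation ideas named above, randomly delaying transactions and combining multiple packets into a single transaction, can be sketched as follows. This is an illustrative sketch only: the uniform delay distribution, the fixed batch size, the choice to stamp a batch with its last packet's time, and the helper names `delay_transactions` and `batch_packets` are all assumptions, not details from the paper.

```python
import random


def delay_transactions(timestamps, max_delay, rng):
    """Add an independent uniform random delay in [0, max_delay] seconds
    to each timestamp and return the delayed timestamps in sorted order."""
    return sorted(t + rng.uniform(0.0, max_delay) for t in timestamps)


def batch_packets(timestamps, batch_size):
    """Combine every `batch_size` consecutive packets into one transaction,
    stamped with the time of the last packet in the batch."""
    ordered = sorted(timestamps)
    return [ordered[min(i + batch_size, len(ordered)) - 1]
            for i in range(0, len(ordered), batch_size)]


# Hypothetical packet times (seconds) from one smart-home device.
packets = [0, 10, 20, 30, 40]
delayed = delay_transactions(packets, max_delay=5.0, rng=random.Random(0))
batched = batch_packets(packets, batch_size=2)
```

Both transformations blur the temporal pattern an attacker would see on the ledger, at the cost of added latency (delays) or coarser granularity (batching).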
Heterogeneous Federated Learning: State-of-the-art and Research Challenges
Federated learning (FL) has drawn increasing attention owing to its potential
use in large-scale industrial applications. Existing federated learning works
mainly focus on model homogeneous settings. However, practical federated
learning typically faces the heterogeneity of data distributions, model
architectures, network environments, and hardware devices among participant
clients. Heterogeneous Federated Learning (HFL) is much more challenging, and
corresponding solutions are diverse and complex. Therefore, a systematic survey
on this topic about the research challenges and state-of-the-art is essential.
In this survey, we first summarize the various research challenges in HFL
from five aspects: statistical heterogeneity, model heterogeneity,
communication heterogeneity, device heterogeneity, and additional challenges.
In addition, recent advances in HFL are reviewed and a new taxonomy of existing
HFL methods is proposed with an in-depth analysis of their pros and cons. We
classify existing methods from three different levels according to the HFL
procedure: data-level, model-level, and server-level. Finally, several critical
and promising future research directions in HFL are discussed, which may
facilitate further developments in this field. A periodically updated
collection on HFL is available at https://github.com/marswhu/HFL_Survey. Comment: 42 pages, 11 figures, and 4 tables