1,963 research outputs found
Recommended from our members
MapReduce based RDF assisted distributed SVM for high throughput spam filtering
This thesis was submitted for the degree of Doctor of Philosophy and was awarded by Brunel UniversityElectronic mail has become cast and embedded in our everyday lives. Billions of legitimate emails are sent on a daily basis. The widely established underlying infrastructure, its widespread availability as well as its ease of use have all acted as catalysts to such pervasive proliferation. Unfortunately, the same can be alleged about unsolicited bulk email, or rather spam. Various methods, as well as enabling architectures are available to try to mitigate spam permeation. In this respect, this dissertation compliments existing survey work in this area by contributing an extensive literature review of traditional and emerging spam filtering approaches. Techniques, approaches and architectures employed for spam filtering are appraised, critically assessing respective strengths and weaknesses.
Velocity, volume and variety are key characteristics of the spam challenge. MapReduce (M/R) has become increasingly popular as an Internet scale, data intensive processing platform. In the context of machine learning based spam filter training, support vector machine (SVM) based techniques have been proven effective. SVM training is however a computationally intensive process. In this dissertation, a M/R based distributed SVM algorithm for scalable spam filter training, designated MRSMO, is presented. By distributing and processing subsets of the training data across multiple participating computing nodes, the distributed SVM reduces spam filter training time significantly. To mitigate the accuracy degradation introduced by the adopted approach, a Resource Description Framework (RDF) based feedback loop is evaluated. Experimental results demonstrate that this improves the accuracy levels of the distributed SVM beyond the original sequential counterpart.
Effectively exploiting large scale, ‘Cloud’ based, heterogeneous processing capabilities for M/R in what can be considered a non-deterministic environment requires the consideration of a number of perspectives. In this work, gSched, a Hadoop M/R based, heterogeneous aware task to node matching and allocation scheme is designed. Using MRSMO as a baseline, experimental evaluation indicates that gSched improves on the performance of the out-of-the box Hadoop counterpart in a typical Cloud based infrastructure.
The focal contribution to knowledge is a scalable, heterogeneous infrastructure and machine learning based spam filtering scheme, able to capitalize on collaborative accuracy improvements through RDF based, end user feedback. MapReduce based RDF Assisted Distributed SVM for High Throughput Spam Filterin
A survey on cost-effective context-aware distribution of social data streams over energy-efficient data centres
Social media have emerged in the last decade as a viable and ubiquitous means of communication. The ease of user content generation within these platforms, e.g. check-in information, multimedia data, etc., along with the proliferation of Global Positioning System (GPS)-enabled, always-connected capture devices lead to data streams of unprecedented amount and a radical change in information sharing. Social data streams raise a variety of practical challenges, including derivation of real-time meaningful insights from effectively gathered social information, as well as a paradigm shift for content distribution with the leverage of contextual data associated with user preferences, geographical characteristics and devices in general. In this article we present a comprehensive survey that outlines the state-of-the-art situation and organizes challenges concerning social media streams and the infrastructure of the data centres supporting the efficient access to data streams in terms of content distribution, data diffusion, data replication, energy efficiency and network infrastructure. We systematize the existing literature and proceed to identify and analyse the main research points and industrial efforts in the area as far as modelling, simulation and performance evaluation are concerned
Methods for improving resilience in communication networks and P2P overlays
Resilience to failures and deliberate attacks is becoming an essential requirement in most communication networks today. This also applies to P2P Overlays which on the one hand are created on top of communication infrastructures, and therefore are equally affected by failures of the underlying infrastructure, but which on the other hand introduce new possibilities like the creation of arbitrary links within the overlay.
In this article, we present a survey of strategies to improve resilience in communication networks as well as in P2P overlay networks. Furthermore, our intention is to point out differences and similarities in the resilience-enhancing measures for both types of networks.
By revising some basic concepts from graph theory, we show that many concepts for communication networks are based on well-known graph-theoretical problems. Especially, some methods for the construction of protection paths in advance of a failure are based on very hard problems, indeed many of them are in NP and can only be solved heuristically or on certain topologies.
P2P overlay networks evidently benefit from resilience-enhancing strategies in the underlying communication infrastructure, but beyond that, their specific properties pose the need for more sophisticated mechanisms. The dynamic nature of peers requires to take some precautions, like estimating the reliability of peers, redundantly storing information, and provisioning a reliable routing
Exploring Machine Learning Models for Federated Learning: A Review of Approaches, Performance, and Limitations
In the growing world of artificial intelligence, federated learning is a
distributed learning framework enhanced to preserve the privacy of individuals'
data. Federated learning lays the groundwork for collaborative research in
areas where the data is sensitive. Federated learning has several implications
for real-world problems. In times of crisis, when real-time decision-making is
critical, federated learning allows multiple entities to work collectively
without sharing sensitive data. This distributed approach enables us to
leverage information from multiple sources and gain more diverse insights. This
paper is a systematic review of the literature on privacy-preserving machine
learning in the last few years based on the Preferred Reporting Items for
Systematic Reviews and Meta-Analyses (PRISMA) guidelines. Specifically, we have
presented an extensive review of supervised/unsupervised machine learning
algorithms, ensemble methods, meta-heuristic approaches, blockchain technology,
and reinforcement learning used in the framework of federated learning, in
addition to an overview of federated learning applications. This paper reviews
the literature on the components of federated learning and its applications in
the last few years. The main purpose of this work is to provide researchers and
practitioners with a comprehensive overview of federated learning from the
machine learning point of view. A discussion of some open problems and future
research directions in federated learning is also provided
Toxicity in the Decentralized Web and the Potential for Model Sharing
The "Decentralised Web" (DW) is an evolving concept, which encompasses technologies aimed at providing greater transparency and openness on the web. The DW relies on independent servers (aka instances) that mesh together in a peer-to-peer fashion to deliver a range of services (e.g. micro-blogs, image sharing, video streaming). However, toxic content moderation in this decentralised context is challenging. This is because there is no central entity that can define toxicity, nor a large central pool of data that can be used to build universal classifiers. It is therefore unsurprising that there have been several high-profile cases of the DW being misused to coordinate and disseminate harmful material. Using a dataset of 9.9M posts from 117K users on Pleroma (a popular DW microblogging service), we quantify the presence of toxic content. We find that toxic content is prevalent and spreads rapidly between instances. We show that automating per-instance content moderation is challenging due to the lack of sufficient training data available and the effort required in labelling. We therefore propose and evaluate ModPair, a model sharing system that effectively detects toxic content, gaining an average per-instance macro-F1 score 0.89
Introducing the new paradigm of Social Dispersed Computing: Applications, Technologies and Challenges
[EN] If last decade viewed computational services as a utility then surely
this decade has transformed computation into a commodity. Computation
is now progressively integrated into the physical networks in
a seamless way that enables cyber-physical systems (CPS) and the
Internet of Things (IoT) meet their latency requirements. Similar to
the concept of ¿platform as a service¿ or ¿software as a service¿, both
cloudlets and fog computing have found their own use cases. Edge
devices (that we call end or user devices for disambiguation) play the
role of personal computers, dedicated to a user and to a set of correlated
applications. In this new scenario, the boundaries between
the network node, the sensor, and the actuator are blurring, driven
primarily by the computation power of IoT nodes like single board
computers and the smartphones. The bigger data generated in this
type of networks needs clever, scalable, and possibly decentralized
computing solutions that can scale independently as required. Any
node can be seen as part of a graph, with the capacity to serve as a
computing or network router node, or both. Complex applications can
possibly be distributed over this graph or network of nodes to improve
the overall performance like the amount of data processed over time.
In this paper, we identify this new computing paradigm that we call
Social Dispersed Computing, analyzing key themes in it that includes
a new outlook on its relation to agent based applications. We architect
this new paradigm by providing supportive application examples that
include next generation electrical energy distribution networks, next
generation mobility services for transportation, and applications for
distributed analysis and identification of non-recurring traffic congestion
in cities. The paper analyzes the existing computing paradigms
(e.g., cloud, fog, edge, mobile edge, social, etc.), solving the ambiguity
of their definitions; and analyzes and discusses the relevant foundational
software technologies, the remaining challenges, and research
opportunities.Garcia Valls, MS.; Dubey, A.; Botti, V. (2018). Introducing the new paradigm of Social Dispersed Computing: Applications, Technologies and Challenges. Journal of Systems Architecture. 91:83-102. https://doi.org/10.1016/j.sysarc.2018.05.007S831029
- …