1,963 research outputs found

Recommended from our members

MapReduce based RDF assisted distributed SVM for high throughput spam filtering

Author: Caruana Godwin
Publication venue: Brunel University School of Engineering and Design PhD Theses
Publication date: 01/01/2013

This thesis was submitted for the degree of Doctor of Philosophy and was awarded by Brunel University. Electronic mail has become embedded in our everyday lives. Billions of legitimate emails are sent on a daily basis. The widely established underlying infrastructure, its widespread availability and its ease of use have all acted as catalysts for such pervasive proliferation. Unfortunately, the same can be said of unsolicited bulk email, or spam. Various methods and enabling architectures are available to try to mitigate spam permeation. In this respect, this dissertation complements existing survey work in this area by contributing an extensive literature review of traditional and emerging spam filtering approaches. Techniques, approaches and architectures employed for spam filtering are appraised, critically assessing their respective strengths and weaknesses. Velocity, volume and variety are key characteristics of the spam challenge. MapReduce (M/R) has become increasingly popular as an Internet scale, data intensive processing platform. In the context of machine learning based spam filter training, support vector machine (SVM) based techniques have been proven effective. SVM training is, however, a computationally intensive process. In this dissertation, an M/R based distributed SVM algorithm for scalable spam filter training, designated MRSMO, is presented. By distributing and processing subsets of the training data across multiple participating computing nodes, the distributed SVM reduces spam filter training time significantly. To mitigate the accuracy degradation introduced by the adopted approach, a Resource Description Framework (RDF) based feedback loop is evaluated. Experimental results demonstrate that this improves the accuracy levels of the distributed SVM beyond the original sequential counterpart.
Effectively exploiting large scale, ‘Cloud’ based, heterogeneous processing capabilities for M/R in what can be considered a non-deterministic environment requires the consideration of a number of perspectives. In this work, gSched, a Hadoop M/R based, heterogeneous aware task to node matching and allocation scheme is designed. Using MRSMO as a baseline, experimental evaluation indicates that gSched improves on the performance of the out-of-the-box Hadoop counterpart in a typical Cloud based infrastructure. The focal contribution to knowledge is a scalable, heterogeneous infrastructure and machine learning based spam filtering scheme, able to capitalize on collaborative accuracy improvements through RDF based, end user feedback.

Brunel University Research Archive

A survey on cost-effective context-aware distribution of social data streams over energy-efficient data centres

Author: Bashroush Rabih
Fernández Cerero Damián
Fernández Montes González Alejandro
Kilanioti Irene
Mettouris Christos
Nejkovic Valentina
Papadopoulos George A.
Publication venue: 'Elsevier BV'
Publication date: 01/01/2019

Social media have emerged in the last decade as a viable and ubiquitous means of communication. The ease of user content generation within these platforms, e.g. check-in information, multimedia data, etc., along with the proliferation of Global Positioning System (GPS)-enabled, always-connected capture devices, has led to data streams of unprecedented volume and a radical change in information sharing. Social data streams raise a variety of practical challenges, including the derivation of meaningful real-time insights from effectively gathered social information, as well as a paradigm shift in content distribution that leverages contextual data associated with user preferences, geographical characteristics and devices in general. In this article we present a comprehensive survey that outlines the state of the art and organizes the challenges concerning social media streams and the infrastructure of the data centres supporting efficient access to data streams in terms of content distribution, data diffusion, data replication, energy efficiency and network infrastructure. We systematize the existing literature and proceed to identify and analyse the main research points and industrial efforts in the area as far as modelling, simulation and performance evaluation are concerned

idUS. Depósito de Investigación Universidad de Sevilla

Methods for improving resilience in communication networks and P2P overlays

Author: Brinkmeier Michael
Fischer Mathias
Grau Sascha
Schäfer Günter
Strufe Thorsten
Publication venue: 'Walter de Gruyter GmbH'
Publication date: 01/01/2009

Resilience to failures and deliberate attacks is becoming an essential requirement in most communication networks today. This also applies to P2P overlays, which on the one hand are created on top of communication infrastructures, and are therefore equally affected by failures of the underlying infrastructure, but which on the other hand introduce new possibilities such as the creation of arbitrary links within the overlay. In this article, we present a survey of strategies to improve resilience in communication networks as well as in P2P overlay networks. Furthermore, our intention is to point out differences and similarities in the resilience-enhancing measures for both types of networks. By revisiting some basic concepts from graph theory, we show that many concepts for communication networks are based on well-known graph-theoretical problems. In particular, some methods for the construction of protection paths in advance of a failure are based on very hard problems; indeed, many of them are in NP and can only be solved heuristically or on certain topologies. P2P overlay networks evidently benefit from resilience-enhancing strategies in the underlying communication infrastructure, but beyond that, their specific properties call for more sophisticated mechanisms. The dynamic nature of peers requires taking some precautions, like estimating the reliability of peers, redundantly storing information, and provisioning a reliable routing

TUbiblio

Digitale Bibliothek Thüringen

Exploring Machine Learning Models for Federated Learning: A Review of Approaches, Performance, and Limitations

Author: Jafarigol Elaheh
Razzaghi Talayeh
Trafalis Theodore
Zamankhani Mona
Publication date: 17/11/2023

In the growing world of artificial intelligence, federated learning is a distributed learning framework enhanced to preserve the privacy of individuals' data. Federated learning lays the groundwork for collaborative research in areas where the data is sensitive. Federated learning has several implications for real-world problems. In times of crisis, when real-time decision-making is critical, federated learning allows multiple entities to work collectively without sharing sensitive data. This distributed approach enables us to leverage information from multiple sources and gain more diverse insights. This paper is a systematic review of the literature on privacy-preserving machine learning in the last few years based on the Preferred Reporting Items for Systematic Reviews and Meta-Analyses (PRISMA) guidelines. Specifically, we have presented an extensive review of supervised/unsupervised machine learning algorithms, ensemble methods, meta-heuristic approaches, blockchain technology, and reinforcement learning used in the framework of federated learning, in addition to an overview of federated learning applications. This paper reviews the literature on the components of federated learning and its applications in the last few years. The main purpose of this work is to provide researchers and practitioners with a comprehensive overview of federated learning from the machine learning point of view. A discussion of some open problems and future research directions in federated learning is also provided

arXiv.org e-Print Archive

Toxicity in the Decentralized Web and the Potential for Model Sharing

Author: Anaobi Ishaku Hassan
Castro Ignacio
Cristofaro Emiliano De
Raman Aravindh
Sastry Nishanth
Tyson Gareth
Zia Haris Bin
Publication venue: 'American College of Medical Physics (ACMP)'
Publication date: 06/06/2022

The "Decentralised Web" (DW) is an evolving concept, which encompasses technologies aimed at providing greater transparency and openness on the web. The DW relies on independent servers (aka instances) that mesh together in a peer-to-peer fashion to deliver a range of services (e.g. micro-blogs, image sharing, video streaming). However, toxic content moderation in this decentralised context is challenging. This is because there is no central entity that can define toxicity, nor a large central pool of data that can be used to build universal classifiers. It is therefore unsurprising that there have been several high-profile cases of the DW being misused to coordinate and disseminate harmful material. Using a dataset of 9.9M posts from 117K users on Pleroma (a popular DW microblogging service), we quantify the presence of toxic content. We find that toxic content is prevalent and spreads rapidly between instances. We show that automating per-instance content moderation is challenging due to the lack of sufficient training data and the effort required in labelling. We therefore propose and evaluate ModPair, a model sharing system that effectively detects toxic content, achieving an average per-instance macro-F1 score of 0.89

UCL Discovery

Introducing the new paradigm of Social Dispersed Computing: Applications, Technologies and Challenges

Author: Botti Vicent
Dubey Abhishek
García-Valls Marisol
Publication venue: 'Elsevier BV'
Publication date: 01/01/2018

If the last decade viewed computational services as a utility, then surely this decade has transformed computation into a commodity. Computation is now progressively integrated into the physical networks in a seamless way that enables cyber-physical systems (CPS) and the Internet of Things (IoT) to meet their latency requirements. Similar to the concept of "platform as a service" or "software as a service", both cloudlets and fog computing have found their own use cases. Edge devices (that we call end or user devices for disambiguation) play the role of personal computers, dedicated to a user and to a set of correlated applications. In this new scenario, the boundaries between the network node, the sensor, and the actuator are blurring, driven primarily by the computation power of IoT nodes like single-board computers and smartphones. The larger volumes of data generated in this type of network need clever, scalable, and possibly decentralized computing solutions that can scale independently as required. Any node can be seen as part of a graph, with the capacity to serve as a computing node, a network router node, or both. Complex applications can possibly be distributed over this graph or network of nodes to improve overall performance, such as the amount of data processed over time. In this paper, we identify this new computing paradigm, which we call Social Dispersed Computing, and analyze its key themes, including a new outlook on its relation to agent-based applications. We architect this new paradigm by providing supportive application examples that include next generation electrical energy distribution networks, next generation mobility services for transportation, and applications for distributed analysis and identification of non-recurring traffic congestion in cities.
The paper analyzes the existing computing paradigms (e.g., cloud, fog, edge, mobile edge, social, etc.), resolving the ambiguity of their definitions; and analyzes and discusses the relevant foundational software technologies, the remaining challenges, and research opportunities.
Garcia Valls, MS.; Dubey, A.; Botti, V. (2018). Introducing the new paradigm of Social Dispersed Computing: Applications, Technologies and Challenges. Journal of Systems Architecture. 91:83-102. https://doi.org/10.1016/j.sysarc.2018.05.007

Crossref

RiuNet