691 research outputs found
Recommended from our members
MapReduce based RDF assisted distributed SVM for high throughput spam filtering
This thesis was submitted for the degree of Doctor of Philosophy and was awarded by Brunel UniversityElectronic mail has become cast and embedded in our everyday lives. Billions of legitimate emails are sent on a daily basis. The widely established underlying infrastructure, its widespread availability as well as its ease of use have all acted as catalysts to such pervasive proliferation. Unfortunately, the same can be alleged about unsolicited bulk email, or rather spam. Various methods, as well as enabling architectures are available to try to mitigate spam permeation. In this respect, this dissertation compliments existing survey work in this area by contributing an extensive literature review of traditional and emerging spam filtering approaches. Techniques, approaches and architectures employed for spam filtering are appraised, critically assessing respective strengths and weaknesses.
Velocity, volume and variety are key characteristics of the spam challenge. MapReduce (M/R) has become increasingly popular as an Internet scale, data intensive processing platform. In the context of machine learning based spam filter training, support vector machine (SVM) based techniques have been proven effective. SVM training is however a computationally intensive process. In this dissertation, a M/R based distributed SVM algorithm for scalable spam filter training, designated MRSMO, is presented. By distributing and processing subsets of the training data across multiple participating computing nodes, the distributed SVM reduces spam filter training time significantly. To mitigate the accuracy degradation introduced by the adopted approach, a Resource Description Framework (RDF) based feedback loop is evaluated. Experimental results demonstrate that this improves the accuracy levels of the distributed SVM beyond the original sequential counterpart.
Effectively exploiting large scale, âCloudâ based, heterogeneous processing capabilities for M/R in what can be considered a non-deterministic environment requires the consideration of a number of perspectives. In this work, gSched, a Hadoop M/R based, heterogeneous aware task to node matching and allocation scheme is designed. Using MRSMO as a baseline, experimental evaluation indicates that gSched improves on the performance of the out-of-the box Hadoop counterpart in a typical Cloud based infrastructure.
The focal contribution to knowledge is a scalable, heterogeneous infrastructure and machine learning based spam filtering scheme, able to capitalize on collaborative accuracy improvements through RDF based, end user feedback. MapReduce based RDF Assisted Distributed SVM for High Throughput Spam Filterin
Efficient Decision Support Systems
This series is directed to diverse managerial professionals who are leading the transformation of individual domains by using expert information and domain knowledge to drive decision support systems (DSSs). The series offers a broad range of subjects addressed in specific areas such as health care, business management, banking, agriculture, environmental improvement, natural resource and spatial management, aviation administration, and hybrid applications of information technology aimed to interdisciplinary issues. This book series is composed of three volumes: Volume 1 consists of general concepts and methodology of DSSs; Volume 2 consists of applications of DSSs in the biomedical domain; Volume 3 consists of hybrid applications of DSSs in multidisciplinary domains. The book is shaped upon decision support strategies in the new infrastructure that assists the readers in full use of the creative technology to manipulate input data and to transform information into useful decisions for decision makers
Processing spam: Conducting processed listening and rhythmedia to (re)produce people and territories
This thesis provides a transdisciplinary investigation of âdeviantâ media categories, specifically spam and noise, and the way they are constructed and used to (re)produce territories and people. Spam, I argue, is a media phenomenon that has always existed, and received different names in different times. The changing definitions of spam, the reasons and actors behind these changes are thus the focus of this research. It brings to the forefront a longer history of the politics of knowledge production with and in media, and its consequences. This thesis makes a contribution to the media and communication field by looking at neglected media phenomena through fields such as sound studies, software studies, law and history to have richer understanding that disciplinary boundaries fail to achieve.
The thesis looks at three different case studies: the conceptualisation of noise in the early 20th century through Bell Telephone Company, web metric standardisation in the European Union 2000s legislation, and unwanted behaviours on Facebook. What these cases show is that media practitioners have been constructing âdeviantâ categories in different media and periods by using seven sonic epistemological strategies: training of the (digital) body, restructuring of territories, new experts, standardising measurements (tools and units), filtering, de-politicising and licensing.
Informed by my empirical work, I developed two concepts - processed listening and rhythmedia - offering a new theoretical framework to analyse how media practitioners construct power relations by knowing people in mediated territories and then spatially and temporally (re)ordering them. Shifting the attention from theories of vision allows media researchers to have a better understanding of practitioners who work in multi-layered digital/datafied spaces, tuning in and out to continuously measure and record peopleâs behaviours. Such knowledge is being fed back in a recursive feedback-loop conducted by a particular rhythmedia constantly processing, ordering, shaping and regulating people, objects and spaces. Such actions (re)configure the boundaries of what it means to be human, worker and medium
Collective Multi-relational Network Mining
Our world is becoming increasingly interconnected, and the study of networks and graphs are becoming more important than ever. Domains such as biological and pharmaceutical networks, online social networks, the World Wide Web, recommender systems, and scholarly networks are just a few examples that include explicit or implicit network structures. Most networks are formed between different types of nodes and contain different types of links. Leveraging these multi-relational and heterogeneous structures is an important factor in developing better models for these real-world networks. Another important aspect of developing models for network data to make predictions about entities such as nodes or links, is the connections between such entities. These connections invalidate the i.i.d. assumptions about the data in most traditional machine learning methods. Hence, unlike models for non-network data where predictions about entities are made independently of each other, the inter-connectivity of the entities in networks should cause the inferred information about one entity to change the models belief about other related entities.
In this dissertation, I present models that can effectively leverage the multi-relational nature of networks and collectively make predictions on links and nodes. In both tasks, I empirically show the importance of considering the multi-relational characteristics and collective predictions. In the first part, I present models to make predictions on nodes by leveraging the graph structure, links generation sequence, and making collective predictions. I apply the node classification methods to detect social spammers in evolving multi-relational social networks and show their effectiveness in identifying spammers without the need of using the textual content. In the second part, I present a generalized augmented multi-relational bi-typed network. I then propose a template for link inference models on these networks and show their application in pharmaceutical discoveries and recommender systems. In the third part, I show that my proposed collective link prediction model is an instance of a general graph-based prediction model that relies on a neighborhood graph for predictions. I then propose a framework that can dynamically adapt the neighborhood graph based on the state of variables from intermediate inference results, as well as structural properties of the relations connecting them to improve the predictive performance of the model
Concepts in Action
This open access book is a timely contribution in presenting recent issues, approaches, and results that are not only central to the highly interdisciplinary field of concept research but also particularly important to newly emergent paradigms and challenges. The contributors present a unique, holistic picture for the understanding and use of concepts from a wide range of fields including cognitive science, linguistics, philosophy, psychology, artificial intelligence, and computer science. The chapters focus on three distinct points of view that lie at the core of concept research: representation, learning, and application. The contributions present a combination of theoretical, experimental, computational, and applied methods that appeal to students and researchers working in these fields
Concepts in Action
This open access book is a timely contribution in presenting recent issues, approaches, and results that are not only central to the highly interdisciplinary field of concept research but also particularly important to newly emergent paradigms and challenges. The contributors present a unique, holistic picture for the understanding and use of concepts from a wide range of fields including cognitive science, linguistics, philosophy, psychology, artificial intelligence, and computer science. The chapters focus on three distinct points of view that lie at the core of concept research: representation, learning, and application. The contributions present a combination of theoretical, experimental, computational, and applied methods that appeal to students and researchers working in these fields
- âŠ