571 research outputs found

MapReduce based RDF assisted distributed SVM for high throughput spam filtering

Author: Caruana Godwin
Publication venue: Brunel University School of Engineering and Design PhD Theses
Publication date: 01/01/2013

This thesis was submitted for the degree of Doctor of Philosophy and was awarded by Brunel University. Electronic mail has become cast and embedded in our everyday lives. Billions of legitimate emails are sent on a daily basis. The widely established underlying infrastructure, its widespread availability as well as its ease of use have all acted as catalysts to such pervasive proliferation. Unfortunately, the same can be alleged about unsolicited bulk email, or rather spam. Various methods, as well as enabling architectures, are available to try to mitigate spam permeation. In this respect, this dissertation complements existing survey work in this area by contributing an extensive literature review of traditional and emerging spam filtering approaches. Techniques, approaches and architectures employed for spam filtering are appraised, critically assessing respective strengths and weaknesses. Velocity, volume and variety are key characteristics of the spam challenge. MapReduce (M/R) has become increasingly popular as an Internet scale, data intensive processing platform. In the context of machine learning based spam filter training, support vector machine (SVM) based techniques have been proven effective. SVM training is however a computationally intensive process. In this dissertation, an M/R based distributed SVM algorithm for scalable spam filter training, designated MRSMO, is presented. By distributing and processing subsets of the training data across multiple participating computing nodes, the distributed SVM reduces spam filter training time significantly. To mitigate the accuracy degradation introduced by the adopted approach, a Resource Description Framework (RDF) based feedback loop is evaluated. Experimental results demonstrate that this improves the accuracy levels of the distributed SVM beyond the original sequential counterpart.
Effectively exploiting large scale, ‘Cloud’ based, heterogeneous processing capabilities for M/R in what can be considered a non-deterministic environment requires the consideration of a number of perspectives. In this work, gSched, a Hadoop M/R based, heterogeneous aware task to node matching and allocation scheme is designed. Using MRSMO as a baseline, experimental evaluation indicates that gSched improves on the performance of the out-of-the-box Hadoop counterpart in a typical Cloud based infrastructure. The focal contribution to knowledge is a scalable, heterogeneous infrastructure and machine learning based spam filtering scheme, able to capitalize on collaborative accuracy improvements through RDF based, end user feedback.

Brunel University Research Archive
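
The abstract above describes training an SVM on subsets of the data spread across several nodes and then combining the results. The sketch below illustrates only that general partition, train, and combine pattern. It is not the thesis's MRSMO algorithm: it swaps in a Pegasos-style subgradient solver for a linear SVM, combines partition models by plain weight averaging, simulates the map and reduce phases sequentially in one process, and omits the RDF feedback loop entirely. All function names and the toy data are hypothetical.

```python
import random


def dot(w, x):
    return sum(wi * xi for wi, xi in zip(w, x))


def train_linear_svm(samples, epochs=20, lam=0.01, seed=0):
    """Pegasos-style subgradient descent for a linear SVM (labels +1 / -1).

    Assumption: a stand-in solver for illustration, not the SMO variant
    used in the thesis.
    """
    rng = random.Random(seed)
    w = [0.0] * len(samples[0][0])
    t = 0
    for _ in range(epochs):
        order = list(samples)
        rng.shuffle(order)
        for x, y in order:
            t += 1
            eta = 1.0 / (lam * t)          # decaying step size
            margin = y * dot(w, x)         # computed before the update
            w = [(1.0 - eta * lam) * wi for wi in w]   # regularisation shrink
            if margin < 1.0:               # hinge loss is active
                w = [wi + eta * y * xi for wi, xi in zip(w, x)]
    return w


def map_phase(samples, num_partitions):
    """Train one model per data partition (each call stands in for a node)."""
    partitions = [samples[i::num_partitions] for i in range(num_partitions)]
    return [train_linear_svm(part, seed=i) for i, part in enumerate(partitions)]


def reduce_phase(weight_vectors):
    """Combine partition models by averaging their weights (an assumption)."""
    n = len(weight_vectors)
    return [sum(column) / n for column in zip(*weight_vectors)]


def predict(w, x):
    return 1 if dot(w, x) >= 0 else -1


def make_toy_data(n_per_class=100, seed=42):
    """Two well-separated clusters; the trailing 1.0 acts as a bias feature."""
    rng = random.Random(seed)
    data = []
    for _ in range(n_per_class):
        data.append(([2 + rng.uniform(-1, 1), 2 + rng.uniform(-1, 1), 1.0], 1))
        data.append(([-2 + rng.uniform(-1, 1), -2 + rng.uniform(-1, 1), 1.0], -1))
    return data


data = make_toy_data()
w = reduce_phase(map_phase(data, num_partitions=4))
accuracy = sum(predict(w, x) == y for x, y in data) / len(data)
print(f"averaged model accuracy on toy data: {accuracy:.2f}")
```

Each partition sees only a quarter of the data, which is where the training-time saving would come from on real nodes. On harder, non-separable data the averaged model can be less accurate than one trained on everything, which is the kind of degradation the abstract says its feedback loop is meant to offset.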

Review on recent advances in information mining from big consumer opinion data for product design

Author: Ji Ping
Jin Jian
Kwong C. K.
Liu Ying
Publication venue: 'ASME International'
Publication date: 17/09/2018

In this paper, based on more than ten years' studies on this dedicated research thrust, a comprehensive review concerning information mining from big consumer opinion data in order to assist product design is presented. First, the research background and the essential terminologies regarding online consumer opinion data are introduced. Next, studies concerning information extraction and information utilization of big consumer opinion data for product design are reviewed. Studies on information extraction of big consumer opinion data are explained from various perspectives, including data acquisition, opinion target recognition, feature identification and sentiment analysis, opinion summarization and sampling, etc. Reviews on information utilization of big consumer opinion data for product design are explored in terms of how to extract critical customer needs from big consumer opinion data, how to connect the voice of the customers with product design, how to make effective comparisons and reasonable ranking on similar products, how to identify ever-evolving customer concerns efficiently, and so on. Furthermore, significant and practical aspects of research trends are highlighted for future studies. This survey will help researchers and practitioners understand the latest developments in relevant studies and applications centered on how big consumer opinion data can be processed, analyzed, and exploited in aiding product design.

Online Research @ Cardiff

Cyber Security and Critical Infrastructures

Publication venue: 'MDPI AG'
Publication date: 16/09/2022

This book contains the manuscripts that were accepted for publication in the MDPI Special Topic "Cyber Security and Critical Infrastructure" after a rigorous peer-review process. Authors from academia, government and industry contributed their innovative solutions, consistent with the interdisciplinary nature of cybersecurity. The book contains 16 articles: an editorial explaining current challenges, innovative solutions, real-world experiences including critical infrastructure, 15 original papers that present state-of-the-art innovative solutions to attacks on critical systems, and a review of cloud, edge computing, and fog's security and privacy issues

Directory of Open Access Books (DOAB)

BlogForever D2.6: Data Extraction Methodology

Author: Banos V.
Davis R.
Gkotsis G.
Pincent E.
Stepanyan K.
Publication date: 25/10/2013

This report outlines an inquiry into the area of web data extraction, conducted within the context of blog preservation. The report reviews theoretical advances and practical developments for implementing data extraction. The inquiry is extended through an experiment that demonstrates the effectiveness and feasibility of implementing some of the suggested approaches. More specifically, the report discusses an approach based on unsupervised machine learning that employs the RSS feeds and HTML representations of blogs. It outlines the possibilities of extracting semantics available in blogs and demonstrates the benefits of exploiting available standards such as microformats and microdata. The report proceeds to propose a methodology for extracting and processing blog data to further inform the design and development of the BlogForever platform

ZENODO

NEUROSURGERY ENTHUSIASTIC WOMEN SOCIETY

Security In The Internet Of Things - A Systematic Mapping Study

Author: Khakurel Jayden
Knutas Antti
Porras Jari
Pänkäläinen Jouni
Publication venue: AIS Electronic Library (AISeL)
Publication date: 01/01/2018

The Internet of Things (IoT) concept is emerging and evolving rapidly. Various technical solutions for multiple purposes have been proposed for its implementation. The rapid evolution and utilization of IoT technologies has raised security concerns and created a feeling of uncertainty among IoT adopters. The purpose of this paper is to examine the current research trends related to security concerns of the IoT concept and provide a detailed understanding of the topic. We thus applied systematic mapping study as the methodological approach. Based on the chosen search strategy, 38 articles (of close to 3500 articles in the field) were selected for a closer examination. Out of these articles, the concerns, solutions and research gaps for the security in the IoT concept were extracted. The mapping study identifies nine main concerns and 11 solutions. However, the findings also reveal challenges, such as secure privacy management and cloud integration that still require efficient solutions

Crossref

ScholarSpace at University of Hawai'i at Manoa

AIS Electronic Library (AISeL)

A Novel Design Science Approach for Integrating Chinese User-Generated Content in Non-Chinese Market Intelligence

Author: Baur Aaron
Bick Markus
Bühler Julian
Lipenkova Janna
Publication venue: AIS Electronic Library (AISeL)
Publication date: 13/12/2015

Market research has long relied on reactive means of data gathering, such as questionnaires or focus groups. With the wide-spread use of social media, millions of comments about customer opinions and feedback regarding products and brands are available. However, before using this ‘wisdom of the crowd’ as a source for marketing research, several challenges have to be tackled: the sheer volume of posts, their unstructured format, and the dozens of different languages used on the internet. All of them make automated usage of this data challenging. In this paper, we draw on dashboard design principles and follow a design science research approach to develop a framework for search, integration, and analysis of cross-language user-generated content. With ‘MarketMiner’, we implement the framework in the automotive industry by analyzing Chinese auto forums. The results are promising in that MarketMiner can dramatically improve utilization of foreign-language social media content for market intelligence purposes

AIS Electronic Library (AISeL)

DESIGN AND EXPLORATION OF NEW MODELS FOR SECURITY AND PRIVACY-SENSITIVE COLLABORATION SYSTEMS

Author: Sandhu Ramandeep Kaur
Publication venue: VCU Scholars Compass
Publication date: 01/01/2022

Collaboration has been an area of interest in many domains including education, research, healthcare supply chain, Internet of things, and music. It enhances problem solving through expertise sharing, ideas sharing, learning and resource sharing, and improved decision making. To address the limitations in the existing literature, this dissertation presents a design science artifact and a conceptual model for collaborative environments. The first artifact is a blockchain based collaborative information exchange system that utilizes blockchain technology and semi-automated ontology mappings to enable secure and interoperable health information exchange among different health care institutions. The conceptual model proposed in this dissertation explores the factors that influence professionals’ continued use of video-conferencing (VC) applications. It investigates the role that perceived risks and benefits play in influencing professionals’ attitude towards VC apps and, consequently, their active and automatic use.

VCU Scholars Compass

Text Mining for Information Systems Researchers: An Annotated Topic Modeling Tutorial

Author: Debortoli Stefan
Junglas Iris
Müller Oliver
vom Brocke Jan
Publication venue: 'Association for Information Systems'
Publication date: 01/01/2016

Analysts have estimated that more than 80 percent of today’s data is stored in unstructured form (e.g., text, audio, image, video)—much of it expressed in rich and ambiguous natural language. Traditionally, to analyze natural language, one has used qualitative data-analysis approaches, such as manual coding. Yet, the size of text data sets obtained from the Internet makes manual analysis virtually impossible. In this tutorial, we discuss the challenges encountered when applying automated text-mining techniques in information systems research. In particular, we showcase how to use probabilistic topic modeling via Latent Dirichlet allocation, an unsupervised text-mining technique, with a LASSO multinomial logistic regression to explain user satisfaction with an IT artifact by automatically analyzing more than 12,000 online customer reviews. For fellow information systems researchers, this tutorial provides guidance for conducting text-mining studies on their own and for evaluating the quality of others

Crossref

The IT University of Copenhagen's Repository

AIS Electronic Library (AISeL)

A survey on opinion summarization techniques for social media

Author: Haggag Mohamed H.
Mohamed Ensaf Hussein
Moussa Mohammed Elsaid
Publication venue: Arab Journals Platform
Publication date: 14/06/2020

The volume of data on social media is huge and keeps increasing. The need for efficient processing of this extensive information has resulted in increasing research interest in knowledge engineering tasks such as Opinion Summarization. This survey shows the current opinion summarization challenges for social media, then the necessary pre-summarization steps like preprocessing, feature extraction, noise elimination, and handling of synonym features. Next, it covers the various approaches used in opinion summarization like Visualization, Abstractive, Aspect based, Query-focused, Real Time, and Update Summarization, and highlights other Opinion Summarization approaches such as Contrastive, Concept-based, Community Detection, Domain Specific, Bilingual, Social Bookmarking, and Social Media Sampling. It covers the different datasets used in opinion summarization and the future work suggested for each technique. Finally, it provides different ways of evaluating opinion summarization.

Arab Journals Platform