Search CORE

7,675 research outputs found

KACST Arabic Text Classification Project: Overview and Preliminary Results

Author: Al-Rajeh A.
Alharbi S.
Almuhareb A.
Althubaity A.
Khorsheed M.
Publication venue
Publication date: 01/01/2008
Field of study

Electronically formatted Arabic free-texts can be found in abundance these days on the World Wide Web, often linked to commercial enterprises and/or government organizations. Vast tracts of knowledge and relations lie hidden within these texts, knowledge that can be exploited once the correct intelligent tools have been identified and applied. For example, text mining may help with text classification and categorization. Text classification aims to automatically assign text to a predefined category based on identifiable linguistic features. Such a process has different useful applications including, but not restricted to, E-Mail spam detection, web pages content filtering, and automatic message routing. In this paper an overview of King Abdulaziz City for Science and Technology (KACST) Arabic Text Classification Project will be illustrated along with some preliminary results. This project will contribute to the better understanding and elaboration of Arabic text classification techniques

Southampton (e-Prints Soton)

An ontology enhanced parallel SVM for scalable spam filter training

Author: Bauer
Blanco
Blanzieri
Blei
Breiman
Cao
Caruana
Chawla
Colas
Cristianini
Dean
Do
Gansterer
Godwin Caruana
Graf
Hall
Huang
Kearns
Kim
Maozhen Li
Mei
Platt
Suykens
Taura
Vapnik
Wang
Woodsend
Yang Liu
Zanghirati
Zhang
Publication venue: 'Elsevier BV'
Publication date: 01/05/2013
Field of study

This is the post-print version of the final paper published in Neurocomputing. The published article is available from the link below. Changes resulting from the publishing process, such as peer review, editing, corrections, structural formatting, and other quality control mechanisms may not be reflected in this document. Changes may have been made to this work since it was submitted for publication. Copyright @ 2013 Elsevier B.V.Spam, under a variety of shapes and forms, continues to inflict increased damage. Varying approaches including Support Vector Machine (SVM) techniques have been proposed for spam filter training and classification. However, SVM training is a computationally intensive process. This paper presents a MapReduce based parallel SVM algorithm for scalable spam filter training. By distributing, processing and optimizing the subsets of the training data across multiple participating computer nodes, the parallel SVM reduces the training time significantly. Ontology semantics are employed to minimize the impact of accuracy degradation when distributing the training data among a number of SVM classifiers. Experimental results show that ontology based augmentation improves the accuracy level of the parallel SVM beyond the original sequential counterpart

Crossref

Brunel University Research Archive

Evaluation of Bayesian Spam Filter and SVM Spam Filter

Author
Publication venue
Publication date: 01/01/2004
Field of study

CD-RO

Future University Hakodate Academic Archive