593 research outputs found
Machine Learning Approaches for Modeling Spammer Behavior
Spam is commonly defined as unsolicited or unwanted email sent over the
Internet, and it poses a potential threat to Internet security. Users spend
valuable time deleting spam emails. More importantly, ever-increasing spam
occupies server storage space and consumes network bandwidth. Keyword-based
spam filtering strategies will eventually become less successful at modeling
spammer behavior, because spammers constantly change their tricks to circumvent
these filters. The evasive tactics that spammers use are themselves patterns, and
these patterns can be modeled to combat spam. This paper investigates the
possibilities of modeling spammer behavioral patterns with well-known
classification algorithms such as the Naïve Bayesian classifier (Naïve Bayes),
Decision Tree Induction (DTI) and Support Vector Machines (SVMs). Preliminary
experimental results demonstrate a promising detection rate of around 92%,
which is a considerable improvement in performance over similar spammer
behavior modeling research.
Comment: 12 pages, 3 figures, 5 tables, Submitted to AIRS 201
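As a rough illustration of the kind of classifier the abstract above names, the following is a minimal Naïve Bayes sketch with Laplace smoothing. It is not the paper's implementation: the feature names (`obfuscated_word`, `forged_sender`, and so on) and the toy training set are hypothetical stand-ins for whatever behavioral patterns the authors actually extract.

```python
import math
from collections import Counter, defaultdict


def train_naive_bayes(samples):
    """Count class and per-class token frequencies from (tokens, label) pairs."""
    class_counts = Counter()
    token_counts = defaultdict(Counter)
    vocab = set()
    for tokens, label in samples:
        class_counts[label] += 1
        for tok in tokens:
            token_counts[label][tok] += 1
            vocab.add(tok)
    return class_counts, token_counts, vocab


def predict(model, tokens):
    """Return the label with the highest smoothed log-posterior."""
    class_counts, token_counts, vocab = model
    total = sum(class_counts.values())
    best_label, best_score = None, float("-inf")
    for label, count in class_counts.items():
        score = math.log(count / total)  # log prior
        denom = sum(token_counts[label].values()) + len(vocab)
        for tok in tokens:
            if tok in vocab:  # ignore features never seen in training
                score += math.log((token_counts[label][tok] + 1) / denom)
        if score > best_score:
            best_label, best_score = label, score
    return best_label


# Hypothetical behavioral features, invented for this example only.
TRAIN = [
    (["obfuscated_word", "forged_sender", "image_only"], "spam"),
    (["obfuscated_word", "url_shortener"], "spam"),
    (["forged_sender", "url_shortener", "image_only"], "spam"),
    (["plain_text", "known_sender"], "ham"),
    (["known_sender", "reply_thread"], "ham"),
    (["plain_text", "reply_thread", "known_sender"], "ham"),
]
MODEL = train_naive_bayes(TRAIN)
```

Calling `predict(MODEL, ["forged_sender", "obfuscated_word"])` labels the message as spam on this toy data; a real evaluation would need a held-out corpus.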
The application of user log for online business environment using content-based Image retrieval system
Over the past few years, inter-query learning has gained much attention in the research and development of content-based image retrieval (CBIR) systems. This is largely due to the capability of the inter-query approach to learn from the retrieval patterns of previous query sessions. However, much of the research in this field has focused on analyzing image retrieval patterns stored in the database. This is not suitable for a dynamic environment such as the World Wide Web (WWW), where images are constantly added or removed. A better alternative is to use an image's visual features to capture the knowledge gained from previous query sessions. Building on previous work (Chung et al., 2006), this paper proposes a framework of inter-query learning for WWW-CBIR systems. Such a framework can be extremely useful for online companies whose core business involves providing multimedia content-based services and products to their customers.
Evolutionary discriminative confidence estimation for spoken term detection
The final publication is available at Springer via http://dx.doi.org/10.1007/s11042-011-0913-z
Spoken term detection (STD) is the task of searching for occurrences
of spoken terms in audio archives. It relies on robust confidence estimation
to make a hit/false alarm (FA) decision. In order to optimize the decision
in terms of the STD evaluation metric, the confidence has to be discriminative.
Multi-layer perceptrons (MLPs) and support vector machines (SVMs) exhibit
good performance in producing discriminative confidence; however they are
severely limited by the continuous objective functions, and are therefore less
capable of dealing with complex decision tasks. This leads to a substantial
performance reduction when measuring detection of out-of-vocabulary (OOV)
terms, where the high diversity in term properties usually leads to a complicated
decision boundary.
In this paper we present a new discriminative confidence estimation approach
based on evolutionary discriminant analysis (EDA). Unlike MLPs and
SVMs, EDA uses the classification error as its objective function, resulting
in a model optimized towards the evaluation metric. In addition, EDA combines
heterogeneous projection functions and classification strategies in decision
making, leading to a highly flexible classifier that is capable of dealing
with complex decision tasks. Finally, the evolutionary strategy of EDA reduces the risk of local minima. We tested the EDA-based confidence with a
state-of-the-art phoneme-based STD system on an English meeting domain
corpus, which employs a phoneme speech recognition system to produce lattices
within which the phoneme sequences corresponding to the enquiry terms
are searched. The test corpora comprise 11 hours of speech data recorded with
individual head-mounted microphones from 30 meetings carried out at several
institutes including ICSI; NIST; ISL; LDC; the Virginia Polytechnic Institute
and State University; and the University of Edinburgh. The experimental results
demonstrate that EDA considerably outperforms MLPs and SVMs on
both classification and confidence measurement in STD, and the advantage
is found to be more significant on OOV terms than on in-vocabulary (INV)
terms. In terms of classification performance, EDA achieved an equal error
rate (EER) of 11% on OOV terms, compared to 34% and 31% with MLPs and
SVMs respectively; for INV terms, an EER of 15% was obtained with EDA
compared to 17% obtained with MLPs and SVMs. In terms of STD performance
for OOV terms, EDA presented a significant relative improvement of
1.4% and 2.5% in terms of average term-weighted value (ATWV) over MLPs
and SVMs respectively.
This work was partially supported by the French Ministry of Industry
(Innovative Web call) under contract 09.2.93.0966, ‘Collaborative Annotation for Video
Accessibility’ (ACAV) and by ‘The Adaptable Ambient Living Assistant’ (ALIAS) project
funded through the joint national Ambient Assisted Living (AAL) programme
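The equal error rate (EER) quoted in the abstract above is the operating point at which the false alarm rate equals the miss rate. A minimal sketch of how it could be estimated from confidence scores follows; the function name, the threshold sweep over observed scores, and the toy data are assumptions for illustration and do not reproduce the authors' evaluation tooling.

```python
def equal_error_rate(scores, labels):
    """Estimate the EER by sweeping a threshold over the observed scores.

    labels: 1 for a true hit, 0 for a false alarm candidate.
    A detection is accepted when its score is >= the threshold.
    """
    positives = sum(labels)
    negatives = len(labels) - positives
    if positives == 0 or negatives == 0:
        raise ValueError("need at least one hit and one false alarm candidate")
    best_gap, eer = float("inf"), None
    for threshold in sorted(set(scores)):
        accepted = [s >= threshold for s in scores]
        false_alarms = sum(a and y == 0 for a, y in zip(accepted, labels))
        misses = sum((not a) and y == 1 for a, y in zip(accepted, labels))
        fa_rate = false_alarms / negatives
        miss_rate = misses / positives
        gap = abs(fa_rate - miss_rate)
        if gap < best_gap:
            # Keep the threshold where the two error rates are closest.
            best_gap, eer = gap, (fa_rate + miss_rate) / 2
    return eer


# Toy confidence scores, invented for this example only.
SCORES = [0.9, 0.8, 0.7, 0.6, 0.4, 0.3, 0.2, 0.1]
LABELS = [1, 1, 0, 1, 0, 1, 0, 0]
```

On this toy data the two error rates meet at a threshold of 0.6, where one of four hits is missed and one of four non-hits is accepted, so `equal_error_rate(SCORES, LABELS)` returns 0.25.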
Review of Classification Algorithms with Changing Inter-Class Distances
Peer reviewed. Publisher PD
MapReduce based RDF assisted distributed SVM for high throughput spam filtering
This thesis was submitted for the degree of Doctor of Philosophy and was awarded by Brunel University.
Electronic mail has become cast and embedded in our everyday lives. Billions of legitimate emails are sent on a daily basis. The widely established underlying infrastructure, its widespread availability, and its ease of use have all acted as catalysts for such pervasive proliferation. Unfortunately, the same can be said of unsolicited bulk email, or spam. Various methods, as well as enabling architectures, are available to try to mitigate spam permeation. In this respect, this dissertation complements existing survey work in this area by contributing an extensive literature review of traditional and emerging spam filtering approaches. Techniques, approaches and architectures employed for spam filtering are appraised, critically assessing their respective strengths and weaknesses.
Velocity, volume and variety are key characteristics of the spam challenge. MapReduce (M/R) has become increasingly popular as an Internet-scale, data-intensive processing platform. In the context of machine learning based spam filter training, support vector machine (SVM) based techniques have proven effective. SVM training is, however, a computationally intensive process. In this dissertation, an M/R-based distributed SVM algorithm for scalable spam filter training, designated MRSMO, is presented. By distributing and processing subsets of the training data across multiple participating computing nodes, the distributed SVM reduces spam filter training time significantly. To mitigate the accuracy degradation introduced by the adopted approach, a Resource Description Framework (RDF) based feedback loop is evaluated. Experimental results demonstrate that this improves the accuracy levels of the distributed SVM beyond the original sequential counterpart.
Effectively exploiting large-scale, ‘Cloud’ based, heterogeneous processing capabilities for M/R in what can be considered a non-deterministic environment requires the consideration of a number of perspectives. In this work, gSched, a Hadoop M/R based, heterogeneity-aware task-to-node matching and allocation scheme, is designed. Using MRSMO as a baseline, experimental evaluation indicates that gSched improves on the performance of the out-of-the-box Hadoop counterpart in a typical Cloud based infrastructure.
The focal contribution to knowledge is a scalable, heterogeneous infrastructure and machine learning based spam filtering scheme, able to capitalize on collaborative accuracy improvements through RDF based, end user feedback.
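The thesis abstract above describes training on subsets of the data across several nodes and then combining the results. The sketch below shows only that map-then-reduce shape and is not MRSMO: it swaps in a simple linear SVM trained by hinge-loss sub-gradient descent for the map step and plain weight averaging for the reduce step, and every name and data point in it is hypothetical.

```python
def train_linear_svm(partition, epochs=20, lam=0.01):
    """Map step: hinge-loss sub-gradient descent on one data partition.

    partition: list of (features, label) with label +1 (spam) or -1 (ham).
    """
    dim = len(partition[0][0])
    w = [0.0] * dim
    step = 0
    for _ in range(epochs):
        for x, y in partition:
            step += 1
            eta = 1.0 / (lam * step)  # decaying learning rate
            margin = y * sum(wi * xi for wi, xi in zip(w, x))
            w = [wi * (1 - eta * lam) for wi in w]  # regularization shrink
            if margin < 1:  # margin violated: move towards this sample
                w = [wi + eta * y * xi for wi, xi in zip(w, x)]
    return w


def average_models(models):
    """Reduce step: average the per-partition weight vectors."""
    return [sum(ws) / len(models) for ws in zip(*models)]


def classify(w, x):
    """Return +1 (spam) or -1 (ham) from the sign of the linear score."""
    return 1 if sum(wi * xi for wi, xi in zip(w, x)) >= 0 else -1


# Hypothetical two-feature data: [spam_signal, ham_signal].
PARTITIONS = [
    [([1.0, 0.0], 1), ([0.0, 1.0], -1)],
    [([0.0, 1.5], -1), ([2.0, 0.0], 1), ([1.0, 0.0], 1)],
]
# In a real M/R job each partition would be handled by a separate node.
GLOBAL_W = average_models([train_linear_svm(p) for p in PARTITIONS])
```

Here the map calls run sequentially in one process; on a cluster they would run in parallel, which is where the training-time saving described in the abstract would come from. Averaging is only one possible combination rule and may lose accuracy relative to sequential training.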
Online Tracking of the Contents of Conscious Perception Using Real-Time fMRI
Perception is an active process that interprets and structures the stimulus input based on assumptions about its possible causes. We use real-time functional magnetic resonance imaging (rtfMRI) to investigate a particularly powerful demonstration of dynamic object integration in which the same physical stimulus intermittently elicits categorically different conscious object percepts. In this study, we simulated an outline object that is moving behind a narrow slit. With such displays, the physically identical stimulus can elicit categorically different percepts that either correspond closely to the physical stimulus (vertically moving line segments) or represent a hypothesis about the underlying cause of the physical stimulus (a horizontally moving object that is partly occluded). In the latter case, the brain must construct an object from the input sequence. Combining rtfMRI with machine learning techniques, we show that it is possible to determine online the momentary state of a subject’s conscious percept from time-resolved BOLD activity. In addition, we found that feedback about the currently decoded percept increased the decoding rates compared to prior fMRI recordings of the same stimulus without feedback presentation. The analysis of the trained classifier revealed a brain network that discriminates contents of conscious perception with antagonistic interactions between early sensory areas that represent physical stimulus properties and higher-tier brain areas. During integrated object percepts, brain activity decreases in early sensory areas and increases in higher-tier areas. We conclude that it is possible to use BOLD responses to reliably track the contents of conscious visual perception with a relatively high temporal resolution. We suggest that our approach can also be used to investigate the neural basis of auditory object formation and discuss the results in the context of predictive coding theory.