
An improved switching hybrid recommender system using naive Bayes classifier and collaborative filtering

Authors: Ghazanfar Mustansar; Prugel-Bennett Adam
Publication date: 20/04/2010

Recommender systems apply machine learning and data mining techniques to filter unseen information and can predict whether a user would like a given resource. A number of recommendation algorithms have been proposed to date, of which collaborative filtering and content-based filtering are the two most popular and widely adopted techniques. Collaborative filtering recommender systems recommend items by identifying other users with similar tastes and using their opinions for recommendation, whereas content-based recommender systems recommend items based on the content information of the items. These systems suffer from scalability, data sparsity, over-specialization, and cold-start problems, resulting in poor-quality recommendations and reduced coverage. Hybrid recommender systems combine individual systems to avoid some of these limitations. In this paper, we propose a unique switching hybrid recommendation approach that combines a naive Bayes classification approach with collaborative filtering. Experimental results on two different data sets show that the proposed algorithm is scalable and provides better performance, in terms of both accuracy and coverage, than other algorithms, while eliminating some recorded problems of recommender systems.

Southampton (e-Prints Soton)
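The abstract above does not spell out the switching rule, so the following Python sketch is only one plausible reading, not the authors' method. It assumes ratings stored as a nested dict, a naive Bayes classifier whose features are other users' ratings of the target item, a user-based collaborative filter with Pearson weights, and a hypothetical confidence `threshold` that decides which of the two predictions to trust. All function names are illustrative.

```python
import math

# Hypothetical data layout: ratings[user][item] = integer rating on a 1..5 scale.
RATING_VALUES = (1, 2, 3, 4, 5)


def nb_predict(ratings, user, item, alpha=1.0):
    """Naive Bayes: classes are rating values, features are other users' ratings of `item`.
    Returns (most probable rating, its posterior probability)."""
    own = ratings[user]
    k = len(RATING_VALUES)
    log_post = {}
    for c in RATING_VALUES:
        # Laplace-smoothed prior P(r_user = c)
        n_c = sum(1 for r in own.values() if r == c)
        logp = math.log((n_c + alpha) / (len(own) + alpha * k))
        for v, v_ratings in ratings.items():
            if v == user or item not in v_ratings:
                continue
            x = v_ratings[item]
            # P(r_v = x | r_user = c), estimated on items both users rated
            shared = [j for j, r in own.items() if r == c and j in v_ratings]
            hits = sum(1 for j in shared if v_ratings[j] == x)
            logp += math.log((hits + alpha) / (len(shared) + alpha * k))
        log_post[c] = logp
    # Normalise log-posteriors into probabilities (shift by max for stability)
    m = max(log_post.values())
    z = sum(math.exp(lp - m) for lp in log_post.values())
    probs = {c: math.exp(lp - m) / z for c, lp in log_post.items()}
    best = max(probs, key=probs.get)
    return best, probs[best]


def pearson(a, b):
    """Pearson correlation over the items rated by both users."""
    common = [j for j in a if j in b]
    if len(common) < 2:
        return 0.0
    ma = sum(a[j] for j in common) / len(common)
    mb = sum(b[j] for j in common) / len(common)
    num = sum((a[j] - ma) * (b[j] - mb) for j in common)
    den = math.sqrt(sum((a[j] - ma) ** 2 for j in common)
                    * sum((b[j] - mb) ** 2 for j in common))
    return num / den if den else 0.0


def cf_predict(ratings, user, item):
    """User-based CF: mean-centred, Pearson-weighted average of neighbours' ratings."""
    own = ratings[user]
    mu = sum(own.values()) / len(own)
    num = den = 0.0
    for v, v_ratings in ratings.items():
        if v == user or item not in v_ratings:
            continue
        w = pearson(own, v_ratings)
        v_mean = sum(v_ratings.values()) / len(v_ratings)
        num += w * (v_ratings[item] - v_mean)
        den += abs(w)
    return mu + num / den if den else mu


def switching_predict(ratings, user, item, threshold=0.6):
    """Assumed switching rule: trust naive Bayes when it is confident, else fall back to CF."""
    label, conf = nb_predict(ratings, user, item)
    if conf >= threshold:
        return float(label), "naive_bayes"
    return cf_predict(ratings, user, item), "collaborative_filtering"
```

The threshold is the tuning knob in this sketch: lowering it routes more predictions through the classifier, raising it routes more through collaborative filtering.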

A Low Dimensional Approximation For Competence In Bacillus Subtilis

Authors: Dasmahapatra Srinandan; Nguyen An; Prugel-Bennett Adam
Publication venue: Institute of Electrical and Electronics Engineers (IEEE)
Publication date: 12/05/2015

The behaviour of a high-dimensional stochastic system described by a Chemical Master Equation (CME) depends on many parameters, rendering explicit simulation an inefficient method for exploring the properties of such models. Capturing their behaviour with low-dimensional models makes analysis of system behaviour tractable. In this paper, we present low-dimensional models for the noise-induced excitable dynamics in Bacillus subtilis, whereby a key protein, ComK, which drives a complex chain of reactions leading to bacterial competence, is expressed rapidly in large quantities (competent state) before subsiding to low levels of expression (vegetative state). These rapid reactions suggest applying an adiabatic approximation to the dynamics of the regulatory model; however, this approximation leads to competence durations that are incorrect by a factor of 2. We apply a modified version of an iterative functional procedure that faithfully approximates the time-course of the trajectories in terms of a 2-dimensional model involving the proteins ComK and ComS. Furthermore, to describe the bimodal bivariate marginal probability distribution obtained from Gillespie simulations of the CME, we introduce a tunable multiplicative noise term in a 2-dimensional Langevin model whose stationary state is described by the time-independent solution of the corresponding Fokker-Planck equation.
Comment: 12 pages, to be published in IEEE/ACM Transactions on Computational Biology and Bioinformatics

arXiv.org e-Print Archive

Southampton (e-Prints Soton)

Crossref

Incremental Kernel Mapping Algorithms for Scalable Recommender Systems

Authors: Ghazanfar Mustansar; Prugel-Bennett Adam; Szedmak Sandor

Recommender systems apply machine learning techniques to filter unseen information and can predict whether a user would like a given item. Kernel Mapping Recommender (KMR) system algorithms have been proposed that offer state-of-the-art performance. One potential drawback of the KMR algorithms is that training is done in a single step, so they cannot accommodate incremental updates as new data arrive, which makes them unsuitable for dynamic environments. Following this line of research, we propose a new heuristic that can build the model incrementally, without retraining the whole model from scratch, when new data (items or users) are added to the recommender system dataset. Furthermore, we propose a novel perceptron-type algorithm, a fast incremental algorithm for building the model that maintains a good level of accuracy and scales well with the data. We show empirically on two datasets that the proposed algorithms give quite accurate results while providing significant computational savings.

Southampton (e-Prints Soton)

Unsupervised clustering approach for network anomaly detection

Authors: Prugel-Bennett Adam; Syarif Iwan; Wills Gary B.
Publication date: 24/04/2012

This paper describes the advantages of using the anomaly detection approach over the misuse detection technique for detecting unknown network intrusions or attacks. It also investigates the performance of various clustering algorithms when applied to anomaly detection. Five different clustering algorithms are used: k-Means, improved k-Means, k-Medoids, EM clustering and distance-based outlier detection. Our experiment shows that the misuse detection techniques, which implemented four different classifiers (naïve Bayes, rule induction, decision tree and nearest neighbour), failed to detect network traffic containing a large number of unknown intrusions: the highest accuracy was only 63.97% and the lowest false positive rate was 17.90%. On the other hand, the anomaly detection module showed promising results, with the distance-based outlier detection algorithm outperforming the other algorithms at an accuracy of 80.15%. The accuracy was 78.06% for EM clustering, 76.71% for k-Medoids, 65.40% for improved k-Means and 57.81% for k-Means. Unfortunately, our anomaly detection module produces a high false positive rate (more than 20%) for all four clustering algorithms. Therefore, our future work will focus on reducing the false positive rate and improving the accuracy using more advanced machine learning techniques.

Southampton (e-Prints Soton)
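Of the algorithms compared in the anomaly detection entry above, distance-based outlier detection scored best. A minimal Python sketch of one common distance-based scheme follows, assuming each connection is already encoded as a numeric feature vector and that a point's outlier score is its Euclidean distance to its k-th nearest neighbour. The paper's exact scoring rule, features and thresholds are not given in the abstract, and the data below are hypothetical.

```python
import math


def knn_outlier_scores(points, k=2):
    """Score each point by the Euclidean distance to its k-th nearest neighbour."""
    scores = []
    for i, p in enumerate(points):
        dists = sorted(math.dist(p, q) for j, q in enumerate(points) if j != i)
        scores.append(dists[k - 1])
    return scores


def flag_outliers(points, k=2, n_outliers=1):
    """Return the indices of the n points with the largest k-NN distance."""
    scores = knn_outlier_scores(points, k)
    ranked = sorted(range(len(points)), key=lambda i: scores[i], reverse=True)
    return sorted(ranked[:n_outliers])


# Hypothetical 2-feature connection records; the last one is far from the rest.
traffic = [(0.1, 0.2), (0.2, 0.1), (0.15, 0.15), (0.12, 0.18), (5.0, 5.0)]
print(flag_outliers(traffic))  # prints [4]
```

Ranking by score and taking the top n avoids having to pick an absolute distance threshold, at the cost of having to guess how many intrusions to expect.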

Application of bagging, boosting and stacking to intrusion detection

Authors: Prugel-Bennett Adam; Syarif Iwan; Wills Gary; Zaluska Ed
Publication date: 13/07/2012

This paper investigates the possibility of using ensemble algorithms to improve the performance of network intrusion detection systems. We use three different ensemble methods, bagging, boosting and stacking, in order to improve the accuracy and reduce the false positive rate. We use four different data mining algorithms, naïve Bayes, J48 (decision tree), JRip (rule induction) and IBk (nearest neighbour), as base classifiers for those ensemble methods. Our experiment shows that the prototype, which implements four base classifiers and three ensemble algorithms, achieves an accuracy of more than 99% in detecting known intrusions but fails to detect novel intrusions, with accuracy rates of only around 60%. The use of bagging, boosting and stacking does not significantly improve the accuracy. Stacking is the only method that was able to reduce the false positive rate by a significant amount (46.84%); unfortunately, this method has the longest execution time and is therefore impractical to implement in the intrusion detection field.

Southampton (e-Prints Soton)
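Bagging, the first of the three ensemble methods in the entry above, can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the paper's Weka setup: a plain 1-nearest-neighbour learner stands in for IBk, the number of bootstrap models (`n_models`) and the `seed` are arbitrary choices, and boosting and stacking are not shown.

```python
import math
import random
from collections import Counter


def one_nn(train):
    """Base learner: 1-nearest-neighbour, standing in for Weka's IBk with k = 1."""
    def predict(x):
        _, label = min(train, key=lambda row: math.dist(row[0], x))
        return label
    return predict


def bagging(train, base_learner, n_models=11, seed=0):
    """Bagging: fit one base model per bootstrap sample, predict by majority vote."""
    rng = random.Random(seed)
    models = []
    for _ in range(n_models):
        # Bootstrap sample: len(train) draws with replacement
        sample = [rng.choice(train) for _ in range(len(train))]
        models.append(base_learner(sample))

    def predict(x):
        votes = Counter(model(x) for model in models)
        return votes.most_common(1)[0][0]

    return predict
```

Using an odd number of models avoids ties in a two-class vote; each model sees a slightly different resampled training set, which is what lets the vote reduce variance.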

On the landscape of combinatorial optimization problems

Authors: Prugel-Bennett Adam; Tayarani Najaran Mohammad
Publication venue: Institute of Electrical and Electronics Engineers (IEEE)
Publication date: 11/09/2013

This paper compares the fitness landscapes of four classic optimization problems: Max-Sat, graph coloring, traveling salesman, and quadratic assignment. We focus on two types of properties: local average properties of the landscape, and properties of the local optima. For the local optima we give a fairly comprehensive description of the properties, including the expected time to reach a local optimum, the number of local optima at different cost levels, the distance between optima, and the expected probability of reaching the optima. Principal component analysis is used to understand the correlations between the local optima. Most of the properties that we examine have not been studied previously, particularly those concerned with the local optima. We compare and contrast the behavior of the four different problems. Although the problems are very different at the low level, many of the long-range properties exhibit a remarkable degree of similarity.

Southampton (e-Prints Soton)
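One way to make "the expected time to reach a local optimum" concrete for Max-Sat is steepest-descent hill climbing over single-variable flips, counting the improving moves taken. The Python sketch below assumes DIMACS-style clauses (a literal `+v` means x_v is true, `-v` means it is false) and a one-flip neighbourhood; the local search and neighbourhood actually used in the paper may differ.

```python
import random


def unsatisfied(clauses, assignment):
    """Count clauses with no true literal; assignment[v] holds the value of x_v."""
    return sum(
        not any(assignment[abs(lit)] == (lit > 0) for lit in clause)
        for clause in clauses
    )


def hill_climb(clauses, n_vars, seed=0):
    """Steepest descent from a random assignment until no single flip improves.
    Returns (assignment for x_1..x_n, cost at the local optimum, number of moves)."""
    rng = random.Random(seed)
    # Index 0 is unused so that variable v maps to index v
    assignment = [False] + [rng.random() < 0.5 for _ in range(n_vars)]
    cost = unsatisfied(clauses, assignment)
    steps = 0
    while True:
        best_v, best_cost = None, cost
        for v in range(1, n_vars + 1):
            assignment[v] = not assignment[v]  # try the flip
            c = unsatisfied(clauses, assignment)
            assignment[v] = not assignment[v]  # undo it
            if c < best_cost:
                best_v, best_cost = v, c
        if best_v is None:  # no improving flip: local optimum reached
            return assignment[1:], cost, steps
        assignment[best_v] = not assignment[best_v]
        cost = best_cost
        steps += 1
```

Averaging `steps` over many random seeds gives an empirical estimate of the time to reach a local optimum, and collecting the returned assignments gives a sample of the optima whose costs and mutual Hamming distances can then be studied.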

SVM Parameter Optimization using Grid Search and Genetic Algorithm to Improve Classification Performance

Authors: Prugel-Bennett Adam; Syarif Iwan; Wills Gary
Publication venue: Universitas Ahmad Dahlan
Publication date: 01/12/2016

Machine learning algorithms have been widely used to solve various kinds of data classification problems. Classification problems, especially for high-dimensional datasets, have attracted many researchers seeking efficient approaches to address them. However, the classification problem becomes very complicated and computationally expensive, especially when the number of possible combinations of variables is very high. Support Vector Machines (SVMs) have been shown to perform much better when dealing with high-dimensional datasets and numerical features. Although an SVM works well with default parameter values, its performance can be improved significantly through parameter optimization. We applied two methods, Grid Search and a Genetic Algorithm (GA), to optimize the SVM parameters. Our experiment showed that SVM parameter optimization using grid search always finds a near-optimal parameter combination within the given ranges. However, grid search was very slow; it was therefore reliable only for low-dimensional datasets with few parameters. SVM parameter optimization using a GA can solve this problem of grid search, and the GA proved more stable than grid search. Based on average running time on 9 datasets, the GA was almost 16 times faster than grid search. Furthermore, the GA's results were slightly better than those of grid search on 8 of the 9 datasets.

Journal of Education and Learning (EduLearn)

TELKOMNIKA (Telecommunication Computing Electronics and Control)

UAD Journal Management System
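The grid-search half of the SVM comparison above can be sketched generically in Python. To stay self-contained, the sketch does not train an SVM: it assumes an RBF kernel with penalty `C` and kernel width `gamma`, `toy_score` is a hypothetical stand-in for cross-validated accuracy, and the exponentially spaced ranges are those suggested in the LIBSVM practical guide, not necessarily the ranges used in the paper. The GA alternative is not shown.

```python
import itertools
import math


def log_grid(lo_exp, hi_exp, base=2.0, step=2):
    """Exponentially spaced values base**lo_exp, base**(lo_exp + step), ... up to base**hi_exp."""
    return [base ** e for e in range(lo_exp, hi_exp + 1, step)]


def grid_search(score_fn, c_values, gamma_values):
    """Evaluate every (C, gamma) pair exhaustively; return the best pair and its score."""
    best_params, best_score = None, -math.inf
    for c, gamma in itertools.product(c_values, gamma_values):
        s = score_fn(c, gamma)
        if s > best_score:
            best_params, best_score = (c, gamma), s
    return best_params, best_score


def toy_score(c, gamma):
    """Hypothetical stand-in for cross-validated SVM accuracy, peaked at C = 2**3, gamma = 2**-5."""
    return -((math.log2(c) - 3) ** 2 + (math.log2(gamma) + 5) ** 2)


c_grid = log_grid(-5, 15)      # 2**-5, 2**-3, ..., 2**15  (11 values)
gamma_grid = log_grid(-15, 3)  # 2**-15, 2**-13, ..., 2**3 (10 values)
best, score = grid_search(toy_score, c_grid, gamma_grid)
print(best, score)
```

Even this coarse grid needs 11 × 10 = 110 evaluations, and in practice each would be a full cross-validation run; that multiplicative growth is the slowness the abstract reports. A GA instead evolves a population of candidate (C, gamma) pairs rather than enumerating every combination.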