
    An improved switching hybrid recommender system using naive Bayes classifier and collaborative filtering

    Recommender systems apply machine learning and data mining techniques to filter unseen information and can predict whether a user would like a given resource. To date, a number of recommendation algorithms have been proposed, of which collaborative filtering and content-based filtering are the two best-known and most widely adopted techniques. Collaborative filtering recommender systems recommend items by identifying other users with similar taste and using their opinions for recommendation, whereas content-based recommender systems recommend items based on the content information of the items. These systems suffer from scalability, data sparsity, overspecialization, and cold-start problems, resulting in poor-quality recommendations and reduced coverage. Hybrid recommender systems combine individual systems to avoid some of these limitations. In this paper, we propose a unique switching hybrid recommendation approach that combines a Naive Bayes classification approach with collaborative filtering. Experimental results on two different data sets show that the proposed algorithm is scalable and provides better performance, in terms of accuracy and coverage, than other algorithms, while at the same time eliminating some of the reported problems of recommender systems.
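The abstract does not state the switching criterion, so the following is only a minimal sketch under assumed rules: a user-based collaborative filter (cosine similarity over co-rated items) is used when at least `min_neighbours` similar users have rated the target item, and otherwise the prediction falls back to a Bernoulli naive Bayes classifier trained on the active user's own ratings and the items' content features. The names `switching_predict`, `min_neighbours`, and the toy data are hypothetical, not the authors' method.

```python
from math import log, sqrt

def similarity(u, v):
    """Cosine similarity over the items both users have rated (0.0 if none)."""
    common = set(u) & set(v)
    num = sum(u[i] * v[i] for i in common)
    den = sqrt(sum(u[i] ** 2 for i in common)) * sqrt(sum(v[i] ** 2 for i in common))
    return num / den if den else 0.0

def cf_predict(ratings, user, item):
    """User-based CF: similarity-weighted average of neighbours' ratings for `item`."""
    pairs = []
    for other, theirs in ratings.items():
        if other != user and item in theirs:
            s = similarity(ratings[user], theirs)
            if s > 0:
                pairs.append((s, theirs[item]))
    if not pairs:
        return None, 0
    total = sum(s for s, _ in pairs)
    return sum(s * r for s, r in pairs) / total, len(pairs)

def nb_predict(ratings, features, user, item):
    """Bernoulli naive Bayes over content features, trained on the user's own ratings."""
    history = ratings[user]
    if not history:
        return None
    vocab = set().union(*features.values())
    classes = {}
    for it, r in history.items():
        classes.setdefault(r, []).append(features[it])
    best, best_score = None, float("-inf")
    for r, examples in classes.items():
        n = len(examples)
        score = log(n / len(history))  # class prior P(rating = r)
        for f in vocab:
            # Laplace-smoothed P(feature f present | rating = r)
            p = (sum(f in ex for ex in examples) + 1) / (n + 2)
            score += log(p) if f in features[item] else log(1 - p)
        if score > best_score:
            best, best_score = r, score
    return float(best)

def switching_predict(ratings, features, user, item, min_neighbours=2):
    """Hypothetical switch: CF when enough neighbours rated the item, else naive Bayes."""
    pred, n = cf_predict(ratings, user, item)
    if n >= min_neighbours:
        return pred, "cf"
    return nb_predict(ratings, features, user, item), "naive_bayes"

# Hypothetical toy data: user -> {item: rating}, item -> set of content features.
ratings = {
    "alice": {"i1": 5, "i2": 4, "i3": 1},
    "bob": {"i1": 5, "i2": 5, "i4": 4},
    "carol": {"i1": 4, "i3": 2, "i4": 5},
}
features = {"i1": {"action"}, "i2": {"action", "scifi"}, "i3": {"romance"},
            "i4": {"action"}, "i5": {"romance"}}

print(switching_predict(ratings, features, "alice", "i4"))  # two neighbours rated i4: CF branch
print(switching_predict(ratings, features, "alice", "i5"))  # nobody rated i5: naive Bayes branch
```

The naive Bayes branch needs no other users, which is why a switch of this kind can help with new or sparsely rated items.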

    A Low Dimensional Approximation For Competence In Bacillus Subtilis

    The behaviour of a high-dimensional stochastic system described by a Chemical Master Equation (CME) depends on many parameters, making explicit simulation an inefficient way to explore the properties of such models. Capturing their behaviour with low-dimensional models makes analysis of the system tractable. In this paper, we present low-dimensional models for the noise-induced excitable dynamics in Bacillus subtilis, whereby a key protein, ComK, which drives a complex chain of reactions leading to bacterial competence, is expressed rapidly in large quantities (competent state) before subsiding to low levels of expression (vegetative state). These rapid reactions suggest applying an adiabatic approximation to the dynamics of the regulatory model, which, however, yields competence durations that are incorrect by a factor of 2. We apply a modified version of an iterative functional procedure that faithfully approximates the time course of the trajectories in terms of a 2-dimensional model involving the proteins ComK and ComS. Furthermore, to describe the bimodal bivariate marginal probability distribution obtained from Gillespie simulations of the CME, we introduce a tunable multiplicative noise term in a 2-dimensional Langevin model whose stationary state is described by the time-independent solution of the corresponding Fokker-Planck equation. Comment: 12 pages, to be published in IEEE/ACM Transactions on Computational Biology and Bioinformatics.
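The abstract mentions Gillespie simulations of the CME. As a minimal sketch of that technique only, the code shows Gillespie's direct method for a hypothetical one-species birth-death process (production at constant rate `k_prod`, first-order degradation at rate `k_deg * x`). This is not the ComK/ComS regulatory model, whose reactions and rate constants are not given here; all names and parameter values are illustrative assumptions.

```python
import random

def gillespie_birth_death(k_prod, k_deg, x0, t_end, seed=0):
    """Direct-method SSA for the hypothetical reactions 0 -> X and X -> 0."""
    rng = random.Random(seed)
    t, x = 0.0, x0
    times, counts = [t], [x]
    while True:
        a1 = k_prod          # propensity of production
        a2 = k_deg * x       # propensity of degradation
        a0 = a1 + a2
        if a0 == 0:
            break            # no reaction can fire
        t += rng.expovariate(a0)  # waiting time ~ Exponential(a0)
        if t > t_end:
            break
        # Pick which reaction fires, with probability proportional to its propensity.
        if rng.random() * a0 < a1:
            x += 1
        else:
            x -= 1
        times.append(t)
        counts.append(x)
    return times, counts

times, counts = gillespie_birth_death(k_prod=50.0, k_deg=1.0, x0=0, t_end=200.0, seed=1)
print(len(times), sum(counts) / len(counts))  # count fluctuates around k_prod / k_deg
```

Each event costs one exponential and one uniform draw, which is why exhaustive simulation of a many-parameter CME quickly becomes expensive and reduced models are attractive.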

    Application of bagging, boosting and stacking to intrusion detection

    This paper investigates the possibility of using ensemble algorithms to improve the performance of network intrusion detection systems. We apply three different ensemble methods, bagging, boosting and stacking, in order to improve accuracy and reduce the false positive rate. We use four different data mining algorithms, naïve Bayes, J48 (decision tree), JRip (rule induction) and IBk (nearest neighbour), as base classifiers for these ensemble methods. Our experiments show that the prototype, which implements the four base classifiers and three ensemble algorithms, achieves an accuracy of more than 99% in detecting known intrusions, but fails to detect novel intrusions reliably, with accuracy rates of only around 60%. Bagging, boosting and stacking did not significantly improve accuracy. Stacking was the only method that substantially reduced the false positive rate (by 46.84%); unfortunately, it also has the longest execution time, which makes it impractical for deployment in the intrusion detection field.
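To make the bagging step concrete, here is a minimal sketch: each base model is trained on a bootstrap sample (drawn with replacement) and predictions are combined by majority vote. A plain 1-nearest-neighbour learner stands in for IBk; the functions `train_1nn` and `bagging`, the feature vectors, and the `"normal"`/`"attack"` labels are hypothetical and not taken from the paper's prototype. Boosting and stacking are not shown.

```python
import random
from collections import Counter

def train_1nn(X, y):
    """A 1-nearest-neighbour base learner (a stand-in for IBk with k = 1)."""
    data = list(zip(X, y))
    def predict(x):
        _, label = min(data, key=lambda d: sum((a - b) ** 2 for a, b in zip(d[0], x)))
        return label
    return predict

def bagging(X, y, train, n_models=11, seed=0):
    """Train `n_models` copies of `train` on bootstrap samples; predict by majority vote."""
    rng = random.Random(seed)
    n = len(X)
    models = []
    for _ in range(n_models):
        idx = [rng.randrange(n) for _ in range(n)]  # bootstrap sample, with replacement
        models.append(train([X[i] for i in idx], [y[i] for i in idx]))
    def predict(x):
        votes = Counter(m(x) for m in models)
        return votes.most_common(1)[0][0]
    return predict

# Hypothetical 2-feature connection records.
X = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (1.0, 1.0), (0.5, 0.5),
     (10.0, 10.0), (11.0, 10.0), (10.0, 11.0), (11.0, 11.0), (10.5, 10.5)]
y = ["normal"] * 5 + ["attack"] * 5

ensemble = bagging(X, y, train_1nn)
print(ensemble((0.2, 0.3)), ensemble((10.2, 10.8)))
```

Using an odd number of models avoids ties in a two-class vote.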

    SVM Parameter Optimization using Grid Search and Genetic Algorithm to Improve Classification Performance

    Machine learning algorithms have been widely used to solve many kinds of data classification problems. Classification problems, especially for high-dimensional datasets, have attracted many researchers seeking efficient approaches to address them. However, classification becomes very complicated and computationally expensive when the number of possible combinations of variables is very high. Support Vector Machines (SVMs) have been shown to perform particularly well on high-dimensional datasets with numerical features. Although an SVM works well with default parameter values, its performance can be improved significantly through parameter optimization. We applied two methods, Grid Search and a Genetic Algorithm (GA), to optimize the SVM parameters. Our experiments showed that SVM parameter optimization using grid search always finds a near-optimal parameter combination within the given ranges. However, grid search was very slow, so it was dependable only on low-dimensional datasets with few parameters. SVM parameter optimization using a GA can overcome this limitation of grid search, and the GA proved to be more stable than grid search. Based on the average running time on 9 datasets, the GA was almost 16 times faster than grid search. Furthermore, the GA's results were slightly better than those of grid search on 8 of the 9 datasets.
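The cost of grid search is easy to see in code: every (C, gamma) pair on the grid is evaluated, so the number of evaluations is the product of the grid sizes and grows exponentially with the number of parameters. The sketch assumes an RBF-kernel SVM tuned over exponentially spaced C and gamma values; `toy_score` is a hypothetical stand-in for a cross-validated accuracy (in practice each call would train and evaluate an SVM), and the exponent ranges are illustrative assumptions.

```python
import itertools
import math

def grid_search(score, c_exponents, gamma_exponents):
    """Evaluate every (C, gamma) = (2**ec, 2**eg) pair and keep the best-scoring one."""
    best = None
    for ec, eg in itertools.product(c_exponents, gamma_exponents):
        C, gamma = 2.0 ** ec, 2.0 ** eg
        s = score(C, gamma)
        if best is None or s > best[0]:
            best = (s, C, gamma)
    return best

def toy_score(C, gamma):
    """Hypothetical smooth objective peaking at C = 2**3, gamma = 2**-5."""
    return -((math.log2(C) - 3) ** 2 + (math.log2(gamma) + 5) ** 2)

c_exps = range(-5, 16, 2)      # C in 2**-5 .. 2**15
g_exps = range(-15, 4, 2)      # gamma in 2**-15 .. 2**3
best = grid_search(toy_score, c_exps, g_exps)
n_evals = len(c_exps) * len(g_exps)
print(best, n_evals)
```

A GA replaces this exhaustive loop with a population of candidate (C, gamma) pairs evolved by selection, crossover, and mutation, so it evaluates far fewer points at the price of no exhaustive guarantee.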

    Annotation of Heterogenous Media Using OntoMedia

    While ontologies exist for the annotation of monomedia, interoperability between these schemes is an important issue. The OntoMedia ontology consists of a generic core capable of representing a diverse range of media, as well as extension ontologies that focus on specific formats. This paper provides an overview of the OntoMedia ontologies, together with a detailed case study of their application to video, a scripted form, and an associated short story.

    Novel centroid selection approaches for KMeans-clustering based recommender systems

    Recommender systems can filter unseen information to predict whether a particular user would prefer a given item when making a choice. Over the years, this process has depended on robust applications of data mining and machine learning techniques, which are known to have scalability issues when applied to recommender systems. In this paper, we propose a k-means clustering-based recommendation algorithm that addresses the scalability issues associated with traditional recommender systems. An issue with traditional k-means clustering algorithms is that they choose the initial k centroids randomly, which leads to inaccurate recommendations and increased cost for offline training of clusters. The work in this paper highlights how centroid selection in k-means-based recommender systems can improve performance as well as reduce cost. The proposed centroid selection method exploits underlying data correlation structures and is shown to achieve superior accuracy and performance compared with traditional centroid selection strategies, which choose centroids randomly. The proposed approach has been validated with an extensive set of experiments on five different datasets (from the movie, book, and music domains). These experiments demonstrate that the proposed approach yields better-quality clusters and converges faster than existing approaches, which in turn improves the accuracy of the recommendations provided.
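The paper's correlation-based centroid selection is not specified in the abstract, so the sketch swaps in a different, well-known non-random seeding, k-means++, purely to illustrate how seeding plugs into Lloyd's k-means iterations. In k-means++, each new initial centroid is drawn with probability proportional to its squared distance from the nearest centroid already chosen. The function names and the toy points are hypothetical.

```python
import random

def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length tuples."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def kmeans_pp_init(points, k, rng):
    """k-means++ seeding (a swapped-in technique, not the paper's method)."""
    centroids = [rng.choice(points)]
    while len(centroids) < k:
        d2 = [min(sq_dist(p, c) for c in centroids) for p in points]
        if sum(d2) == 0:
            break  # every point already coincides with a centroid
        centroids.append(rng.choices(points, weights=d2)[0])
    return centroids

def kmeans(points, k, iters=100, seed=0):
    """Lloyd's algorithm starting from k-means++ centroids."""
    rng = random.Random(seed)
    centroids = kmeans_pp_init(points, k, rng)
    for _ in range(iters):
        clusters = [[] for _ in centroids]
        for p in points:
            j = min(range(len(centroids)), key=lambda i: sq_dist(p, centroids[i]))
            clusters[j].append(p)
        # Recompute each centroid as its cluster mean; keep the old one if the cluster is empty.
        new = [tuple(sum(xs) / len(c) for xs in zip(*c)) if c else centroids[i]
               for i, c in enumerate(clusters)]
        if new == centroids:
            break  # converged
        centroids = new
    return centroids

# Hypothetical 2-D user profiles forming three tight groups.
points = [(0.0, 0.0), (0.1, 0.0), (0.0, 0.1), (0.1, 0.1),
          (10.0, 0.0), (10.1, 0.0), (10.0, 0.1), (10.1, 0.1),
          (0.0, 10.0), (0.1, 10.0), (0.0, 10.1), (0.1, 10.1)]
print(kmeans(points, 3))
```

Spreading the initial centroids apart tends to reduce the number of Lloyd iterations compared with purely random seeding, which is the same cost argument the abstract makes for its own selection method.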

    Novel Significance Weighting Schemes for Collaborative Filtering: Generating Improved Recommendations in Sparse Environments

    Recommender systems apply machine learning and data mining techniques to filter unseen information and can predict whether a user would like a given resource. To date, a number of recommender system algorithms have been proposed, of which collaborative filtering is the best-known and most widely adopted. Collaborative filtering recommender systems recommend items by identifying other similar users, in the case of user-based collaborative filtering, or similar items, in the case of item-based collaborative filtering. Significance weighting schemes assign different weights to the neighbouring users/items found for an active user/item. Several significance weighting schemes have been proposed [1], [2], [3], [4]. In this paper, we argue that these schemes are flawed because they cannot be applied to general recommender system datasets. We propose corrected, generalized significance weighting schemes based on several novel heuristics and, through extensive experiments on three different data sets, show how significance weighting schemes affect the performance of a recommender system. Furthermore, we argue that the conventional weighted sum prediction formula used in item-based [5] collaborative filtering is not correct for very sparse datasets. We propose a corrected prediction formula and evaluate it empirically.
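For orientation, the sketch shows the two ingredients the abstract discusses, in their conventional forms only; the paper's corrected schemes and corrected formula are not given here and are not reproduced. Significance weighting is illustrated with a commonly used heuristic that scales a similarity by min(n, threshold) / threshold, where n is the number of co-rated users (the threshold of 50 is an assumption, and this is not necessarily one of the schemes in [1]-[4]). Prediction uses the standard item-based weighted sum, P(u, i) = Σ_j s(i, j) r(u, j) / Σ_j |s(i, j)|. All names and numbers are hypothetical.

```python
def significance_weight(sim, n_common, threshold=50):
    """Devalue a similarity computed from fewer than `threshold` co-rating users."""
    return sim * min(n_common, threshold) / threshold

def item_weighted_sum(user_ratings, target_sims):
    """Conventional item-based weighted sum over items j the user has rated.

    user_ratings: {item j: r(u, j)}; target_sims: {item j: s(i, j)} for target item i.
    """
    num = den = 0.0
    for j, s in target_sims.items():
        if j in user_ratings:
            num += s * user_ratings[j]
            den += abs(s)
    return num / den if den else None  # None: no rated neighbour, no prediction

user_ratings = {"a": 4, "b": 2}
sims = {"a": significance_weight(0.9, n_common=100),   # well supported: weight kept
        "b": significance_weight(0.6, n_common=10)}    # few co-ratings: devalued
print(item_weighted_sum(user_ratings, sims))

# With a single rated neighbour, the similarity cancels out of the formula,
# so even a weak neighbour fully determines the prediction.
print(item_weighted_sum({"a": 5}, {"a": 0.125}))
```

The last call shows one property of the weighted sum that matters in very sparse data: the normalizing denominator removes any information about how strong the supporting similarities were.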