4,034 research outputs found
Polychotomiser for case-based reasoning beyond the traditional Bayesian classification approach
This work implements an enhanced Bayesian classifier with better performance as compared to the ordinary naïve Bayes classifier when used with domains and datasets of varying characteristics. Text classification is an active and on-going research field of Artificial Intelligence (AI). Text classification is defined as the task of learning methods for categorising collections of electronic text documents into their annotated classes, based on its contents. An increasing number of statistical approaches have been developed for text classification, including k-nearest neighbor classification, naïve Bayes classification, decision tree, rules induction, and the algorithm implementing the structural risk minimisation theory called the support vector machine. Among the approaches used in these applications, naïve Bayes classifiers have been widely used because of its simplicity. However this generative method has been reported to be less accurate than the discriminative methods such as SVM. Some researches have proven that the naïve Bayes classifier performs surprisingly well in many other domains with certain specialised characteristics. The main aim of this work is to quantify the weakness of traditional naïve Bayes classification and introduce an enhance Bayesian classification approach with additional innovative techniques to perform better than the traditional naïve Bayes classifier. Our research goal is to develop an enhanced Bayesian probabilistic classifier by introducing different tournament structures ranking algorithms along with a high relevance keywords extraction facility and an accurately calculated weighting factors facility. These were done to improve the performance of the classification tasks for specific datasets with different characteristics. Other researches have used general datasets, such as Reuters-21578 and 20_newsgroups to validate the performance of their classifiers. Our approach is easily adapted to datasets with different characteristics in terms of the degree of similarity between classes, multi-categorised documents, and different dataset organisations. As previously mentioned we introduce several techniques such as tournament structures ranking algorithms, higher relevance keyword extraction, and automatically computed document dependent (ACDD) weighting factors. Each technique has unique response while been implemented in datasets with different characteristics but has shown to give outstanding performance in most cases. We have successfully optimised our techniques for individual datasets with different characteristics based on our experimental results
Using online linear classifiers to filter spam Emails
The performance of two online linear classifiers - the Perceptron and Littlestone’s Winnow – is explored for two anti-spam filtering benchmark corpora - PU1 and Ling-Spam. We study the performance for varying numbers of features, along with three different feature selection methods: Information Gain (IG), Document Frequency (DF) and Odds Ratio. The size of the training set and the number of training iterations are also investigated for both classifiers. The experimental results show that both the Perceptron and Winnow perform much better when using IG or DF than using Odds Ratio. It is further demonstrated that when using IG or DF, the classifiers are insensitive to the number of features and the number of training iterations, and not greatly sensitive to the size of training set. Winnow is shown to slightly outperform the Perceptron. It is also demonstrated that both of these online classifiers perform much better than a standard Naïve Bayes method. The theoretical and implementation computational complexity of these two classifiers are very low, and they are very easily adaptively updated. They outperform most of the published results, while being significantly easier to train and adapt. The analysis and promising experimental results indicate that the Perceptron and Winnow are two very competitive classifiers for anti-spam filtering
A traffic classification method using machine learning algorithm
Applying concepts of attack investigation in IT industry, this idea has been developed to design
a Traffic Classification Method using Data Mining techniques at the intersection of Machine
Learning Algorithm, Which will classify the normal and malicious traffic. This classification will
help to learn about the unknown attacks faced by IT industry. The notion of traffic classification
is not a new concept; plenty of work has been done to classify the network traffic for
heterogeneous application nowadays. Existing techniques such as (payload based, port based
and statistical based) have their own pros and cons which will be discussed in this
literature later, but classification using Machine Learning techniques is still an open field to explore and has provided very promising results up till now
Stacking classifiers for anti-spam filtering of e-mail
We evaluate empirically a scheme for combining classifiers, known as stacked
generalization, in the context of anti-spam filtering, a novel cost-sensitive
application of text categorization. Unsolicited commercial e-mail, or "spam",
floods mailboxes, causing frustration, wasting bandwidth, and exposing minors
to unsuitable content. Using a public corpus, we show that stacking can improve
the efficiency of automatically induced anti-spam filters, and that such
filters can be used in real-life applications
Створення та тестування спеціалізованих словників для аналізу тексту
Practitioners in many domains–e.g., clinical psychologists, college instructors,
researchers–collect written responses from clients. A well-developed method that has been applied
to texts from sources like these is the computer application Linguistic Inquiry and Word Count
(LIWC). LIWC uses the words in texts as cues to a person’s thought processes, emotional states,
intentions, and motivations. In the present study, we adopt analytic principles from LIWC and
develop and test an alternative method of text analysis using naïve Bayes methods. We further
show how output from the naïve Bayes analysis can be used for mark up of student work in order
to provide immediate, constructive feedback to students and instructors.Робота фахівців-практиків у багатьох галузях, наприклад, клінічних
психологів, викладачів кол д ів, дослідників п р дбача збір пись ових відповід хніх
клі нтів чи студ нтів. обр розробл ни тод, яки застосову ться сьогодні до т кстів
такого типу, ц ко п’ют рни додаток Linguistic Inquiry and Word Count (LIWC).
Програма LIWC тракту слова в т кстах як індикатори нтальних проц сів людини,
оці них станів, на ірів і отивів. У статті використано аналітичні принципи LIWC,
розробл но та прот стовано альт рнативни тод аналізу т ксту з використання тодів
на вного ба сового класифікатора. Автори д онструють, як р зультати аналізу за на вни
ба сови класифікаторо о уть бути використані для аналізу студ нтсько роботи з
тою надання н га ного, конструктивного зворотного зв’язку і студ нта і викладача
- …