1,472 research outputs found

Evaluation methods and decision theory for classification of streaming data with temporal dependence

Author: Bifet Albert
Holmes Geoffrey
Pfahringer Bernhard
Read Jesse
Žliobaitė Indrė
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 01/01/2014

Predictive modeling on data streams plays an important role in modern data analysis, where data arrives continuously and needs to be mined in real time. In the stream setting the data distribution often evolves over time, and models that update themselves during operation are becoming the state of the art. This paper formalizes a learning and evaluation scheme for such predictive models. We theoretically analyze the evaluation of classifiers on streaming data with temporal dependence. Our findings suggest that the commonly accepted data stream classification measures, such as classification accuracy and the Kappa statistic, fail to diagnose cases of poor performance when temporal dependence is present; they should therefore not be used as sole performance indicators. Moreover, classification accuracy can be misleading if used as a proxy for evaluating change detectors on datasets that have temporal dependence. We formulate the decision theory for streaming data classification with temporal dependence and develop a new evaluation methodology for data stream classification that takes temporal dependence into account. We propose a combined measure of classification performance that takes temporal dependence into account, and we recommend using it as the main performance measure in classification of streaming data.

Crossref

Research Commons@Waikato
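
The abstract's claim that accuracy and the Kappa statistic can hide poor performance under temporal dependence can be illustrated with a small sketch. It compares a classifier against a no-change baseline that always predicts the previous label. The function names, the toy stream, and the baseline-normalized score below are illustrative assumptions, not taken from the abstract, and may differ from the combined measure the paper actually proposes.

```python
from collections import Counter


def accuracy(y_true, y_pred):
    """Fraction of positions where the prediction matches the label."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def cohen_kappa(y_true, y_pred):
    """Standard Kappa: accuracy normalized by chance agreement."""
    n = len(y_true)
    p0 = accuracy(y_true, y_pred)
    true_counts = Counter(y_true)
    pred_counts = Counter(y_pred)
    pe = sum(true_counts[c] * pred_counts[c] for c in true_counts) / (n * n)
    return (p0 - pe) / (1 - pe) if pe < 1 else 0.0


def no_change_accuracy(y_true):
    """Accuracy of a baseline that predicts the previous label.

    The first instance has no predecessor, so it is skipped.
    """
    hits = sum(cur == prev for cur, prev in zip(y_true[1:], y_true[:-1]))
    return hits / (len(y_true) - 1)


def kappa_vs_no_change(y_true, y_pred):
    """Kappa-style score normalized by the no-change baseline (assumed form)."""
    p0 = accuracy(y_true[1:], y_pred[1:])
    pe = no_change_accuracy(y_true)
    return (p0 - pe) / (1 - pe) if pe < 1 else 0.0


# Toy stream with long runs of the same label (strong temporal dependence).
stream = [0] * 10 + [1] * 10 + [0] * 10
# A "classifier" that has learned nothing and only repeats the last label.
lagging = [0] + stream[:-1]

print(accuracy(stream, lagging))            # high accuracy
print(cohen_kappa(stream, lagging))         # high Kappa
print(kappa_vs_no_change(stream, lagging))  # 0.0: no better than the baseline
```

On this toy stream the label-repeating predictor scores well on accuracy and Kappa, yet the baseline-normalized score is zero, which is the kind of case the abstract says the usual measures fail to diagnose.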

The effect of locality based learning on software defect prediction

Author: Lemon Bryan
Publication venue: The Research Repository @ WVU
Publication date: 01/08/2010

Software defect prediction poses many problems during classification. A common way to improve software defect prediction is to train on data that is similar, or local, to the testing data. Prior work [12, 64] shows that locality improves the performance of classifiers. This approach has been commonly applied to the field of software defect prediction. In this thesis, we compare the performance of many classifiers, both locality based and non-locality based. We propose a novel classifier called Clump, with the goals of improving classification while providing an explanation as to how the decisions were reached. We also explore the effects of standard clustering and relevancy filtering algorithms. Through experimentation, we show that locality does not improve classification performance when applied to software defect prediction. The performance of the algorithms is impacted more by the datasets used than by the algorithmic choices made. More research is needed to explore locality based learning and the impact of the datasets chosen.

The Research Repository @ WVU (West Virginia University)

Credit Card Fraud Detection Using Machine Learning Techniques

Author: Elhusseny Nermin Samy
Idrees Amira M., AMI
Ouf Shimaa Mohamed
Publication venue: Arab Journals Platform
Publication date: 03/07/2022

This is a systematic literature review that surveys previous studies on credit card fraud detection and highlights the different machine learning techniques applied to this problem. Credit cards are now widely used in daily life. The world has only recently begun to shift toward financial inclusion, with marginalized people being introduced to the financial sector. As a result of the high volume of e-commerce, there has been a significant increase in credit card fraud. Fraud detection is one of the most important parts of today's banking sector. Fraud is one of the most serious concerns in terms of monetary losses, not just for financial institutions but also for individuals. Technology and usage patterns continue to evolve, which makes credit card fraud detection a particularly difficult task. Traditional statistical approaches for identifying credit card fraud take much more time, and the accuracy of their results cannot be guaranteed. Machine learning algorithms have been widely employed in the detection of credit card fraud. The main goal of this review is to present previous research on Credit Card Fraud Detection (CCFD) and how it addressed this problem using different machine learning techniques.

Arab Journals Platform

Text classification supervised algorithms with term frequency inverse document frequency and global vectors for word representation: a comparative study

Author: Bahassine Said
Benabbes Khalid
Hamou Aadi Fatima Zahrae Ait
Housni Khalid
Labd Zakia
Publication venue: Institute of Advanced Engineering and Science
Publication date: 01/02/2024

Over the previous two decades, there has been a rise in the quantity of text documents stored digitally. Organizing and categorizing those documents automatically is known as text categorization: documents are classified into a set of predefined categories so that they can be preserved and sorted more efficiently. Identifying appropriate structures, architectures, and methods for text classification presents a challenge for researchers, because of the significant impact this task has on content management, contextual search, opinion mining, product review analysis, spam filtering, and text sentiment mining. This study analyzes the generic categorization strategy and examines supervised machine learning approaches and their ability to comprehend complex models and nonlinear data interactions. Among these methods are k-nearest neighbors (KNN), support vector machine (SVM), and ensemble learning algorithms employing various evaluation techniques. Thereafter, the constraints of each technique are evaluated, along with how the techniques can be applied to real-life situations.

Institute of Advanced Engineering and Science

Improving Naive Bayes with Online Feature Selection for Quick Adaptation to Evolving Feature Usefulness

Author: Buttler D J
Cardenas A F
Pon R K
Publication venue: Lawrence Livermore National Laboratory
Publication date: 19/09/2007

The definition of what makes an article interesting varies from user to user and continually evolves even for a single user. As a result, for news recommendation systems, useless document features cannot be determined a priori, and all features are usually considered for interestingness classification. Consequently, the presence of currently useless features degrades classification performance [1], particularly over the initial set of news articles being classified. The initial set of documents is critical for a user when considering which particular news recommendation system to adopt. To address these problems, we introduce an improved version of the naive Bayes classifier with online feature selection. We use correlation to determine the utility of each feature and take advantage of the conditional independence assumption used by naive Bayes for online feature selection and classification. The augmented naive Bayes classifier performs 28% better than the traditional naive Bayes classifier in recommending news articles from the Yahoo! RSS feeds.

UNT Digital Library
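
The idea in the abstract above, correlation-based feature utility combined with the conditional independence assumption of naive Bayes, can be sketched roughly as follows. This is a minimal illustration, not the authors' algorithm: the class name `OnlineSelectiveNB`, the binary features and labels, the phi-coefficient as the correlation measure, and the `min_abs_corr` threshold are all assumptions made for the example.

```python
import math


class OnlineSelectiveNB:
    """Bernoulli naive Bayes that ignores weakly correlated features (sketch)."""

    def __init__(self, n_features, min_abs_corr=0.1, alpha=1.0):
        self.n_features = n_features
        self.min_abs_corr = min_abs_corr  # hypothetical selection threshold
        self.alpha = alpha                # Laplace smoothing constant
        self.class_counts = [0, 0]
        # feature_counts[c][j] = number of class-c examples with x[j] == 1
        self.feature_counts = [[0] * n_features, [0] * n_features]

    def update(self, x, y):
        """Incorporate one labeled example; only counts are stored."""
        self.class_counts[y] += 1
        for j, value in enumerate(x):
            if value:
                self.feature_counts[y][j] += 1

    def correlation(self, j):
        """Phi coefficient between feature j and the label, from the counts."""
        n0, n1 = self.class_counts
        n = n0 + n1
        a = self.feature_counts[1][j]  # x_j = 1 and y = 1
        b = self.feature_counts[0][j]  # x_j = 1 and y = 0
        x1 = a + b
        denom = x1 * (n - x1) * n1 * n0
        if denom == 0:
            return 0.0
        return (a * n0 - b * n1) / math.sqrt(denom)

    def predict(self, x):
        """Classify using only the currently useful features."""
        selected = [j for j in range(self.n_features)
                    if abs(self.correlation(j)) >= self.min_abs_corr]
        total = sum(self.class_counts)
        scores = []
        for c in (0, 1):
            score = math.log((self.class_counts[c] + self.alpha)
                             / (total + 2 * self.alpha))
            # Conditional independence: dropping a feature just omits its term.
            for j in selected:
                p1 = ((self.feature_counts[c][j] + self.alpha)
                      / (self.class_counts[c] + 2 * self.alpha))
                score += math.log(p1 if x[j] else 1.0 - p1)
            scores.append(score)
        return 1 if scores[1] > scores[0] else 0


# Feature 0 tracks the label; feature 1 is uncorrelated noise.
model = OnlineSelectiveNB(n_features=2)
for x, y in [([1, 1], 1), ([1, 0], 1), ([0, 1], 0), ([0, 0], 0)]:
    model.update(x, y)

print(model.correlation(0), model.correlation(1))
print(model.predict([1, 0]), model.predict([0, 1]))
```

Because the per-feature likelihood terms are independent given the class, selection in this sketch needs no retraining: the same counts serve both the correlation estimate and the classifier, and a feature can re-enter as soon as its correlation rises above the threshold.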

Bayesian networks for classification, clustering, and high-dimensional data visualisation

Author: Ruz Heredia Gonzalo Andres

This thesis presents new developments for a particular class of Bayesian networks which are limited in the number of parent nodes that each node in the network can have. This restriction yields structures which have low complexity (number of edges), thus enabling the formulation of optimal learning algorithms for Bayesian networks from data. The new developments are focused on three topics: classification, clustering, and high-dimensional data visualisation (topographic map formation). For classification purposes, a new learning algorithm for Bayesian networks is introduced which generates simple Bayesian network classifiers. This approach creates a completely new class of networks which previously was limited mostly to two well-known models, the naive Bayesian (NB) classifier and the Tree Augmented Naive Bayes (TAN) classifier. The proposed learning algorithm enhances the NB model by adding a Bayesian monitoring system. Therefore, the complexity of the resulting network is determined according to the input data, yielding structures which model the data distribution more realistically and so improve the classification performance. Research on Bayesian networks for clustering has not been as popular as for classification tasks. A new unsupervised learning algorithm for three types of Bayesian network classifiers, which enables them to carry out clustering tasks, is introduced. The resulting models can perform cluster assignments in a probabilistic way using the posterior probability of a data point belonging to one of the clusters. A key characteristic of the proposed clustering models, which traditional clustering techniques do not have, is the ability to show the probabilistic dependencies amongst the variables for each cluster. This feature enables a better understanding of each cluster. The final part of this thesis introduces one of the first developments for Bayesian networks to perform topographic mapping.
A new unsupervised learning algorithm for the NB model is presented which enables the projection of high-dimensional data into a two-dimensional space for visualisation purposes. The Bayesian network formalism of the model allows the learning algorithm to generate a density model of the input data and provides a cost function to monitor convergence during the training process. Other mapping techniques lack these important features, a limitation which has been overcome in this research.

Online Research @ Cardiff
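
The probabilistic cluster assignment mentioned in the abstract above can be illustrated for the simplest case, a naive Bayes structure over binary variables. This is a hedged sketch only: the function name `cluster_posteriors`, the Bernoulli variables, and the example parameters are assumptions for illustration, and the thesis's own models and learning algorithm are not reproduced here.

```python
import math


def cluster_posteriors(x, priors, feature_probs):
    """Posterior P(cluster | x) under a naive Bayes model with binary variables.

    priors[k]           -- assumed prior probability of cluster k
    feature_probs[k][j] -- assumed P(x[j] = 1 | cluster k)
    """
    log_joint = []
    for prior, probs in zip(priors, feature_probs):
        log_p = math.log(prior)
        for value, p in zip(x, probs):
            log_p += math.log(p if value else 1.0 - p)
        log_joint.append(log_p)
    # Normalize in log space for numerical stability.
    peak = max(log_joint)
    weights = [math.exp(lp - peak) for lp in log_joint]
    total = sum(weights)
    return [w / total for w in weights]


# Two hypothetical clusters with made-up parameters.
posterior = cluster_posteriors(
    x=[1, 1],
    priors=[0.5, 0.5],
    feature_probs=[[0.9, 0.8], [0.1, 0.2]],
)
print(posterior)
```

Each data point receives a full probability distribution over clusters rather than a single hard label, which is what "cluster assignments in a probabilistic way" amounts to in this simplified setting.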