299 research outputs found

Analysis of group evolution prediction in complex networks

Author: Bródka Piotr
Kazienko Przemysław
Koziarski Michał
Saganowski Stanisław
Publication venue: 'Public Library of Science (PLoS)'
Publication date: 01/01/2019

In a world in which acceptance by and identification with social communities are highly desired, the ability to predict the evolution of groups over time appears to be a vital but very complex research problem. Therefore, we propose a new, adaptable, generic and multi-stage method for Group Evolution Prediction (GEP) in complex networks that facilitates reasoning about the future states of recently discovered groups. The precise GEP modularity enabled us to carry out extensive and versatile empirical studies on many real-world complex/social networks to analyze the impact of numerous setups and parameters such as time window type and size, group detection method, evolution chain length, prediction models, etc. Additionally, many new predictive features reflecting the group state at a given time have been identified and tested. Some other research problems, like enriching learning evolution chains with external data, have been analyzed as well.

arXiv.org e-Print Archive

Directory of Open Access Journals

A survey of cost-sensitive decision tree induction algorithms

Author: Susan Lomax
Sunil Vadera
Publication venue: 'Association for Computing Machinery (ACM)'
Publication date: 01/02/2013

The past decade has seen significant interest in the problem of inducing decision trees that take account of the costs of misclassification and the costs of acquiring the features used for decision making. This survey identifies over 50 algorithms, including approaches that are direct adaptations of accuracy-based methods, use genetic algorithms, use anytime methods, and utilize boosting and bagging. The survey brings together these different studies and novel approaches to cost-sensitive decision tree learning, and provides a useful taxonomy, a historical timeline of how the field has developed, and a reference point for future research in this field.
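The misclassification-cost idea in the abstract above can be illustrated with a minimal sketch. It is not taken from the survey: it shows only the generic decision rule of choosing the label with the lowest expected cost, and the function name `min_expected_cost_label` and the cost values are assumptions made for illustration.

```python
from typing import Sequence


def min_expected_cost_label(probs: Sequence[float],
                            cost: Sequence[Sequence[float]]) -> int:
    """Return the label j that minimises sum_i probs[i] * cost[i][j].

    cost[i][j] is the (hypothetical) cost of predicting class j
    when the true class is i.
    """
    n_classes = len(probs)
    expected = [
        sum(probs[i] * cost[i][j] for i in range(n_classes))
        for j in range(n_classes)
    ]
    return min(range(n_classes), key=expected.__getitem__)


# Illustrative costs (assumed): missing a positive costs 10, a false alarm costs 1.
COST = [[0.0, 1.0],
        [10.0, 0.0]]

# Class 0 is more probable, yet predicting class 1 is cheaper in expectation.
print(min_expected_cost_label([0.8, 0.2], COST))  # -> 1
```

With these assumed costs, a leaf whose class distribution is 80% negative would still be labelled positive, which is the kind of behaviour an accuracy-based tree cannot express.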

University of Salford Institutional Repository

Crossref

Click-through rate prediction : a comparative study of ensemble techniques in real-time bidding

Author: Blanc Maria do Canto e Castro Faria
Publication date: 03/04/2019

Dissertation presented as a partial requirement for obtaining the Master’s degree in Information Management, with a specialization in Business Intelligence and Knowledge Management. Real-Time Bidding is an automated mechanism to buy and sell ads in real time that uses data collected from internet users to accurately deliver the right audience to the best-matched advertisers. It goes beyond contextual advertising by basing the bidding on user data, and it differs from the sponsored search auction, where the bid price is associated with keywords. There is extensive literature regarding the classification and prediction of performance metrics such as click-through rate, impression rate and bidding price. However, there is limited research on the application of advanced machine learning techniques, such as ensemble methods, to predicting the click-through rate of real-time bidding campaigns. This paper presents an in-depth analysis of predicting click-through rate in real-time bidding campaigns by comparing the classification results from six traditional classification models (Linear Discriminant Analysis, Logistic Regression, Regularised Regression, Decision Trees, k-Nearest Neighbors and Support Vector Machines) with two popular ensemble learning techniques (Voting and Bootstrap Aggregation). The goal of our research is to determine whether ensemble methods can accurately predict click-through rate and how they compare to standard classifiers. Results showed that ensemble techniques outperformed simple classifiers. Moreover, they highlight the excellent performance of linear algorithms (Linear Discriminant Analysis and Regularised Regression).
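The two ensemble techniques named in the abstract above, voting and bootstrap aggregation, can be sketched in a few lines. This is a generic illustration under stated assumptions, not the dissertation's code: `majority_vote` and `bootstrap_sample` are hypothetical helper names, and the model outputs are made-up labels.

```python
import random
from collections import Counter


def majority_vote(predictions):
    """Hard voting: predictions is a list of per-model label lists of equal length."""
    return [Counter(column).most_common(1)[0][0] for column in zip(*predictions)]


def bootstrap_sample(data, seed=0):
    """Draw len(data) items with replacement, as bagging does for each base model."""
    rng = random.Random(seed)
    return rng.choices(data, k=len(data))


# Made-up click / no-click predictions from three hypothetical classifiers.
model_a = [1, 0, 1, 0]
model_b = [1, 1, 0, 0]
model_c = [0, 1, 1, 0]
print(majority_vote([model_a, model_b, model_c]))  # -> [1, 1, 1, 0]
```

An odd number of binary voters avoids ties; with an even number, this sketch falls back on whichever label `Counter` encountered first.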

Repositório da Universidade Nova de Lisboa

An under-Sampled Approach for Handling Skewed Data Distribution using Cluster Disjuncts

Author: Syed Ziaur Rahman
Publication venue: Global Journals Inc. (US)
Publication date: 14/05/2014

In data mining and knowledge discovery, hidden and valuable knowledge is discovered from data sources. The traditional algorithms used for knowledge discovery are bottlenecked by the wide range of available data sources. Class imbalance is one of the problems that arise when a data source provides unequal classes, i.e. examples of one class in a training data set vastly outnumber examples of the other class(es). Researchers have rigorously studied several techniques to alleviate the problem of class imbalance, including resampling algorithms and feature selection approaches. In this paper, we present a new hybrid framework dubbed Majority Under-sampling based on Cluster Disjunct (MAJOR_CD) for learning from skewed training data. This algorithm provides a simpler and faster alternative by using the cluster disjunct concept. We conduct experiments on twelve UCI data sets from various application domains, using five algorithms for comparison on six evaluation metrics. The empirical study suggests that MAJOR_CD is effective in addressing the class imbalance problem.
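For orientation, the simplest form of majority under-sampling can be sketched as follows. This is plain random under-sampling, not MAJOR_CD: the cluster-disjunct step is not described in the abstract above and is not reproduced here. The function name `undersample_majority` is an assumption.

```python
import random
from collections import defaultdict


def undersample_majority(samples, labels, seed=0):
    """Randomly drop examples so every class keeps as many as the smallest class."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for x, y in zip(samples, labels):
        by_class[y].append(x)
    target = min(len(xs) for xs in by_class.values())
    out_x, out_y = [], []
    for y, xs in by_class.items():
        # Classes already at the target size are kept whole; larger ones are sampled.
        kept = xs if len(xs) == target else rng.sample(xs, target)
        out_x.extend(kept)
        out_y.extend([y] * target)
    return out_x, out_y


# Ten majority examples (label 0) and two minority examples (label 1).
x_bal, y_bal = undersample_majority(list(range(12)), [0] * 10 + [1] * 2)
print(y_bal.count(0), y_bal.count(1))  # -> 2 2
```

Random selection discards majority examples blindly; cluster-based schemes aim to choose which examples to keep more carefully.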

Global Journal of Computer Science and Technology (GJCST)

EPRENNID: An evolutionary prototype reduction based ensemble for nearest neighbor classification of imbalanced data

Author: Sarah Vluymans
Isaac Triguero
Chris Cornelis
Yvan Saeys
Publication venue: 'Elsevier BV'
Publication date: 01/01/2016

Classification problems with an imbalanced class distribution have received an increased amount of attention within the machine learning community over the last decade. They are encountered in a growing number of real-world situations and pose a challenge to standard machine learning techniques. We propose a new hybrid method specifically tailored to handle class imbalance, called EPRENNID. It performs an evolutionary prototype reduction focused on providing diverse solutions to prevent the method from overfitting the training set. It also allows us to explicitly reduce the underrepresented class, which the most common preprocessing solutions handling class imbalance usually protect. As part of the experimental study, we show that the proposed prototype reduction method outperforms state-of-the-art preprocessing techniques. The preprocessing step yields multiple prototype sets that are later used in an ensemble, performing a weighted voting scheme with the nearest neighbor classifier. EPRENNID is experimentally shown to significantly outperform previous proposals

Nottingham ePrints

Nottingham eTheses

Crossref

Repository@Nottingham

Ghent University Academic Bibliography

Comparative analysis using supervised learning methods in anti-money laundering of Bitcoin data

Author: Alarab I.
Nacer M.I.
Prakoonwit Simant
Publication venue: 'Association for Computing Machinery (ACM)'
Publication date: 30/06/2020

With the advance of Bitcoin technology, the Bitcoin blockchain, in which the user's identity is hidden behind a pseudonym known as an address, has become an attractive den for money laundering. Although this trait permits concealment in plain sight, the public ledger of the Bitcoin blockchain gives investigators more power and allows collective intelligence for anti-money laundering and forensic analysis. This fascinating paradox arises from the strength of Bitcoin technology. Machine learning techniques have attained promising results in forensic analysis for spotting suspicious behaviour in the Bitcoin blockchain. This paper presents a comparative analysis of the performance of classical supervised learning methods in predicting licit and illicit transactions in the network, using a recently published data set derived from the Bitcoin blockchain. In addition, an ensemble learning method that combines the given supervised learning models is utilised and outperforms the classical methods. Our main contribution is to show that an ensemble learning approach outperforms the classical learning models used in the original paper on the Elliptic data set, a time series of the Bitcoin transaction graph with transactions as nodes and directed payment flows as edges. Using the same data set, we show that the proposed method predicts licit/illicit transactions with an accuracy of 98.13% and an F1 score of 83.36%. We discuss the variety of supervised learning methods and their capabilities for assisting forensic analysis, and propose future work directions.
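The gap between the accuracy and the F1 score reported above is typical of imbalanced data. The sketch below shows how the two metrics are computed and why they can diverge; the labels are made up for illustration and have nothing to do with the Elliptic data set, and `accuracy_and_f1` is a hypothetical helper name.

```python
def accuracy_and_f1(y_true, y_pred, positive=1):
    """Return (accuracy, F1 for the positive class) for two equal-length label lists."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    accuracy = sum(1 for t, p in pairs if t == p) / len(pairs)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return accuracy, f1


# Made-up example: 5 positives among 100 samples; 3 are caught, 1 false alarm.
y_true = [1] * 5 + [0] * 95
y_pred = [1, 1, 1, 0, 0] + [1] + [0] * 94
acc, f1 = accuracy_and_f1(y_true, y_pred)
print(f"accuracy={acc:.2f} f1={f1:.2f}")  # -> accuracy=0.97 f1=0.67
```

Because the many true negatives count toward accuracy but not toward F1, a classifier can score high on the former while still missing a sizeable share of the rare class.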

Crossref

Bournemouth University Research Online

PROPAGATION OF MISCLASSIFIED INSTANCES TO HANDLE NONSTATIONARY IMBALANCED DATA STREAM

Author: MEENAKSHI A. THALOR
S. T. PATIL
Publication venue: Taylor's University
Publication date: 01/04/2018

Learning on a data stream with nonstationary and imbalanced properties is an interesting and complicated problem in data mining, as a change in class distribution may result in class imbalance. Many real-time problems such as intrusion detection, credit card fraud detection, weather forecasting and many more applications suffer from concept drift as well as class imbalance as they change with time. The aim of this paper is to present an effective learning approach for nonstationary imbalanced data streams which emphasises misclassified examples, with a focus on two-class problems. At the end of the paper, the proposed algorithm is compared with existing similar approaches using various evaluation metrics.

Directory of Open Access Journals

Stacked Generalizations in Imbalanced Fraud Data Sets using Resampling Methods

Author: Bastian Nathaniel D.
Kerwin Kathleen
Publication date: 03/04/2020

This study uses stacked generalization, which is a two-step process of combining machine learning methods, called meta or super learners, for improving the performance of algorithms in step one (by minimizing the error rate of each individual algorithm to reduce its bias in the learning set) and then in step two inputting the results into the meta learner with its stacked blended output (demonstrating improved performance with the weakest algorithms learning better). The method is essentially an enhanced cross-validation strategy. Although the process uses great computational resources, the resulting performance metrics on resampled fraud data show that increased system cost can be justified. A fundamental key to fraud data is that it is inherently not systematic and, as of yet, the optimal resampling methodology has not been identified. Building a test harness that accounts for all permutations of algorithm sample set pairs demonstrates that the complex, intrinsic data structures are all thoroughly tested. Using a comparative analysis on fraud data that applies stacked generalizations provides useful insight needed to find the optimal mathematical formula to be used for imbalanced fraud data sets. Comment: 19 pages, 3 figures, 8 tables
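The two-step process described above can be sketched in plain Python. This is a toy illustration under stated assumptions, not the study's test harness: the base learners are one-feature threshold stumps, the meta learner is a lookup table over base predictions, and every name (`fit_stump`, `out_of_fold_predictions`, `fit_meta`, `fit_stack`) and every data value is hypothetical.

```python
from collections import Counter, defaultdict


def fit_stump(X, y, feature):
    """Base learner: threshold halfway between the two class means of one feature."""
    mean = {}
    for label in (0, 1):
        values = [row[feature] for row, t in zip(X, y) if t == label]
        mean[label] = sum(values) / len(values)
    threshold = (mean[0] + mean[1]) / 2
    high_label = 1 if mean[1] >= mean[0] else 0
    return lambda row: high_label if row[feature] > threshold else 1 - high_label


def out_of_fold_predictions(X, y, features, k):
    """Step one: each row is predicted by base models that never saw it in training."""
    meta_rows = [None] * len(X)
    for fold in range(k):
        train = [i for i in range(len(X)) if i % k != fold]
        test = [i for i in range(len(X)) if i % k == fold]
        X_tr, y_tr = [X[i] for i in train], [y[i] for i in train]
        models = [fit_stump(X_tr, y_tr, f) for f in features]
        for i in test:
            meta_rows[i] = tuple(m(X[i]) for m in models)
    return meta_rows


def fit_meta(meta_rows, y):
    """Step two: meta learner maps each tuple of base predictions to its majority label."""
    votes = defaultdict(Counter)
    for row, t in zip(meta_rows, y):
        votes[row][t] += 1
    table = {row: c.most_common(1)[0][0] for row, c in votes.items()}
    default = Counter(y).most_common(1)[0][0]
    return lambda row: table.get(row, default)


def fit_stack(X, y, features=(0, 1), k=4):
    meta = fit_meta(out_of_fold_predictions(X, y, features, k), y)
    base = [fit_stump(X, y, f) for f in features]  # refit base models on all data
    return lambda row: meta(tuple(m(row) for m in base))


# Toy data (assumed): feature 0 separates the classes, feature 1 is noise.
X = [(0.1, 1), (0.9, 1), (0.2, 2), (0.8, 2), (0.15, 3), (0.85, 3), (0.05, 4), (0.95, 4)]
y = [0, 1, 0, 1, 0, 1, 0, 1]
stack = fit_stack(X, y)
print(stack((0.0, 3)), stack((1.0, 3)))  # -> 0 1
```

On this toy data the meta learner ends up trusting the stump on feature 0 and ignoring the noisy one, which is the behaviour stacking is meant to learn from out-of-fold predictions.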

arXiv.org e-Print Archive

USMA Digital Commons (United States Military Academy, West Point)

Selecting Representative Data Sets

Author: Borovicka Tomas
Jirina Jr., Marcel
Jirina Marcel
Kordik Pavel
Publication venue: 'IntechOpen'
Publication date: 12/09/2012

IntechOpen

Rule-based classification approach for railway wagon health monitoring

Author: Shafiullah G. M.
Shawkat Ali A. B. M.
Thompson Adam
Wolfs Peter J.
Publication venue: 'Institute of Electrical and Electronics Engineers (IEEE)'
Publication date: 01/01/2010

Modern machine learning techniques have encouraged interest in the development of vehicle health monitoring systems that ensure secure and reliable operation of rail vehicles. In an earlier study, an energy-efficient data acquisition method was investigated to develop a monitoring system for railway applications using modern machine learning techniques, more specifically classification algorithms. A suitable classifier was proposed for railway monitoring based on relative weighted performance metrics. To improve the performance of the existing approach, a rule-based learning method using statistical analysis is proposed in this paper to select a unique classifier for the same application. The selected algorithm works more efficiently and improves the overall performance of railway monitoring systems. This study has been conducted using six classifiers, namely REPTree, J48, Decision Stump, IBK, PART and OneR, with twenty-five datasets. The Waikato Environment for Knowledge Analysis (WEKA) learning tool has been used in this study to develop the prediction models.

Deakin Research Online