Search CORE

7,430 research outputs found

Data mining for detecting Bitcoin Ponzi schemes

Author: Bartoletti Massimo
Pes Barbara
Serusi Sergio
Publication venue
Publication date: 01/01/2018
Field of study

Soon after its introduction in 2009, Bitcoin has been adopted by cyber-criminals, which rely on its pseudonymity to implement virtually untraceable scams. One of the typical scams that operate on Bitcoin are the so-called Ponzi schemes. These are fraudulent investments which repay users with the funds invested by new users that join the scheme, and implode when it is no longer possible to find new investments. Despite being illegal in many countries, Ponzi schemes are now proliferating on Bitcoin, and they keep alluring new victims, who are plundered of millions of dollars. We apply data mining techniques to detect Bitcoin addresses related to Ponzi schemes. Our starting point is a dataset of features of real-world Ponzi schemes, that we construct by analysing, on the Bitcoin blockchain, the transactions used to perform the scams. We use this dataset to experiment with various machine learning algorithms, and we assess their effectiveness through standard validation protocols and performance metrics. The best of the classifiers we have experimented can identify most of the Ponzi schemes in the dataset, with a low number of false positives

arXiv.org e-Print Archive

Crossref

Archivio istituzionale della ricerca - Università di Cagliari

Separation of pulsar signals from noise with supervised machine learning algorithms

Author: Bethapudi Suryarao
Desai Shantanu
Publication venue: 'Elsevier BV'
Publication date: 01/01/2018
Field of study

We evaluate the performance of four different machine learning (ML) algorithms: an Artificial Neural Network Multi-Layer Perceptron (ANN MLP ), Adaboost, Gradient Boosting Classifier (GBC), XGBoost, for the separation of pulsars from radio frequency interference (RFI) and other sources of noise, using a dataset obtained from the post-processing of a pulsar search pi peline. This dataset was previously used for cross-validation of the SPINN-based machine learning engine, used for the reprocessing of HTRU-S survey data arXiv:1406.3627. We have used Synthetic Minority Over-sampling Technique (SMOTE) to deal with high class imbalance in the dataset. We report a variety of quality scores from all four of these algorithms on both the non-SMOTE and SMOTE datasets. For all the above ML methods, we report high accuracy and G-mean in both the non-SMOTE and SMOTE cases. We study the feature importances using Adaboost, GBC, and XGBoost and also from the minimum Redundancy Maximum Relevance approach to report algorithm-agnostic feature ranking. From these methods, we find that the signal to noise of the folded profile to be the best feature. We find that all the ML algorithms report FPRs about an order of magnitude lower than the corresponding FPRs obtained in arXiv:1406.3627, for the same recall value.Comment: 14 pages, 2 figures. Accepted for publication in Astronomy and Computin

arXiv.org e-Print Archive

Research Archive of Indian Institute of Technology Hyderabad

A Novel GAN-based Fault Diagnosis Approach for Imbalanced Industrial Time Series

Author: Cheng Cheng
Jiang Wenqian
Ma Guijun
Yuan Ye
Zhou Beitong
Publication venue
Publication date: 01/04/2019
Field of study

This paper proposes a novel fault diagnosis approach based on generative adversarial networks (GAN) for imbalanced industrial time series where normal samples are much larger than failure cases. We combine a well-designed feature extractor with GAN to help train the whole network. Aimed at obtaining data distribution and hidden pattern in both original distinguishing features and latent space, the encoder-decoder-encoder three-sub-network is employed in GAN, based on Deep Convolution Generative Adversarial Networks (DCGAN) but without Tanh activation layer and only trained on normal samples. In order to verify the validity and feasibility of our approach, we test it on rolling bearing data from Case Western Reserve University and further verify it on data collected from our laboratory. The results show that our proposed approach can achieve excellent performance in detecting faulty by outputting much larger evaluation scores

arXiv.org e-Print Archive

Shallow Triple Stream Three-dimensional CNN (STSTNet) for Micro-expression Recognition

Author: Gan Y. S.
Huang Yen-Chang
Khor Huai-Qian
Liong Sze-Teng
See John
Publication venue: 'Institute of Electrical and Electronics Engineers (IEEE)'
Publication date: 01/01/2019
Field of study

In the recent year, state-of-the-art for facial micro-expression recognition have been significantly advanced by deep neural networks. The robustness of deep learning has yielded promising performance beyond that of traditional handcrafted approaches. Most works in literature emphasized on increasing the depth of networks and employing highly complex objective functions to learn more features. In this paper, we design a Shallow Triple Stream Three-dimensional CNN (STSTNet) that is computationally light whilst capable of extracting discriminative high level features and details of micro-expressions. The network learns from three optical flow features (i.e., optical strain, horizontal and vertical optical flow fields) computed based on the onset and apex frames of each video. Our experimental results demonstrate the effectiveness of the proposed STSTNet, which obtained an unweighted average recall rate of 0.7605 and unweighted F1-score of 0.7353 on the composite database consisting of 442 samples from the SMIC, CASME II and SAMM databases.Comment: 5 pages, 1 figure, Accepted and published in IEEE FG 201

arXiv.org e-Print Archive

Heriot Watt Pure

Crossref

SHDL@MMU Digital Repository

Enhanced Machine Learning Techniques for Early HARQ Feedback Prediction in 5G

Author: Göktepe Barış
Hellge Cornelius
Samek Wojciech
Schierl Thomas
Strodthoff Nils
Publication venue: 'Institute of Electrical and Electronics Engineers (IEEE)'
Publication date: 25/10/2019
Field of study

We investigate Early Hybrid Automatic Repeat reQuest (E-HARQ) feedback schemes enhanced by machine learning techniques as a path towards ultra-reliable and low-latency communication (URLLC). To this end, we propose machine learning methods to predict the outcome of the decoding process ahead of the end of the transmission. We discuss different input features and classification algorithms ranging from traditional methods to newly developed supervised autoencoders. These methods are evaluated based on their prospects of complying with the URLLC requirements of effective block error rates below

10^{-5}

at small latency overheads. We provide realistic performance estimates in a system model incorporating scheduling effects to demonstrate the feasibility of E-HARQ across different signal-to-noise ratios, subcode lengths, channel conditions and system loads, and show the benefit over regular HARQ and existing E-HARQ schemes without machine learning.Comment: 14 pages, 15 figures; accepted versio

arXiv.org e-Print Archive

Fraunhofer-ePrints

Wireless Data Acquisition for Edge Learning: Data-Importance Aware Retransmission

Author: Huang Kaibin
Liu Dongzhu
Zhang Jun
Zhu Guangxu
Publication venue
Publication date: 01/01/2019
Field of study

By deploying machine-learning algorithms at the network edge, edge learning can leverage the enormous real-time data generated by billions of mobile devices to train AI models, which enable intelligent mobile applications. In this emerging research area, one key direction is to efficiently utilize radio resources for wireless data acquisition to minimize the latency of executing a learning task at an edge server. Along this direction, we consider the specific problem of retransmission decision in each communication round to ensure both reliability and quantity of those training data for accelerating model convergence. To solve the problem, a new retransmission protocol called data-importance aware automatic-repeat-request (importance ARQ) is proposed. Unlike the classic ARQ focusing merely on reliability, importance ARQ selectively retransmits a data sample based on its uncertainty which helps learning and can be measured using the model under training. Underpinning the proposed protocol is a derived elegant communication-learning relation between two corresponding metrics, i.e., signal-to-noise ratio (SNR) and data uncertainty. This relation facilitates the design of a simple threshold based policy for importance ARQ. The policy is first derived based on the classic classifier model of support vector machine (SVM), where the uncertainty of a data sample is measured by its distance to the decision boundary. The policy is then extended to the more complex model of convolutional neural networks (CNN) where data uncertainty is measured by entropy. Extensive experiments have been conducted for both the SVM and CNN using real datasets with balanced and imbalanced distributions. Experimental results demonstrate that importance ARQ effectively copes with channel fading and noise in wireless data acquisition to achieve faster model convergence than the conventional channel-aware ARQ.Comment: This is an updated version: 1) extension to general classifiers; 2) consideration of imbalanced classification in the experiments. Submitted to IEEE Journal for possible publicatio

arXiv.org e-Print Archive

Crossref

King's Research Portal