28,008 research outputs found

Data mining for detecting Bitcoin Ponzi schemes

Author: Bartoletti Massimo
Pes Barbara
Serusi Sergio
Publication date: 01/01/2018

Soon after its introduction in 2009, Bitcoin was adopted by cyber-criminals, who rely on its pseudonymity to implement virtually untraceable scams. One of the typical scams operating on Bitcoin is the so-called Ponzi scheme: a fraudulent investment that repays users with the funds invested by new users who join the scheme, and that implodes when it is no longer possible to find new investments. Despite being illegal in many countries, Ponzi schemes are now proliferating on Bitcoin, and they keep luring new victims, who are plundered of millions of dollars. We apply data mining techniques to detect Bitcoin addresses related to Ponzi schemes. Our starting point is a dataset of features of real-world Ponzi schemes, which we construct by analysing, on the Bitcoin blockchain, the transactions used to perform the scams. We use this dataset to experiment with various machine learning algorithms, and we assess their effectiveness through standard validation protocols and performance metrics. The best of the classifiers we experimented with can identify most of the Ponzi schemes in the dataset, with a low number of false positives.

arXiv.org e-Print Archive

Crossref

Archivio istituzionale della ricerca - Università di Cagliari

Bibliographic Analysis on Research Publications using Authors, Categorical Labels and the Citation Network

Author: Buntine Wray
Lim Kar Wai
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 21/09/2016

Bibliographic analysis considers the author's research areas, the citation network and the paper content, among other things. In this paper, we combine these three in a topic model that produces a bibliographic model of authors, topics and documents, using a nonparametric extension of a combination of the Poisson mixed-topic link model and the author-topic model. This gives rise to the Citation Network Topic Model (CNTM). We propose a novel and efficient inference algorithm for the CNTM to explore subsets of research publications from CiteSeerX. The publication datasets are organised into three corpora, totalling about 168k publications with about 62k authors. The queried datasets are made available online. On three publicly available corpora in addition to the queried datasets, our proposed model demonstrates improved performance in both model fitting and document clustering, compared to several baselines. Moreover, our model allows extraction of additional useful knowledge from the corpora, such as the visualisation of the author-topics network. Additionally, we propose a simple method to incorporate supervision into topic modelling to achieve further improvement on the clustering task. Comment: Preprint for Journal Machine Learning

arXiv.org e-Print Archive

The Australian National University

Continuous Representation of Location for Geolocation and Lexical Dialectology using Mixture Density Networks

Author: Baldwin Timothy
Cohn Trevor
Rahimi Afshin
Publication date: 01/01/2017

We propose a method for embedding two-dimensional locations in a continuous vector space using a neural network-based model incorporating mixtures of Gaussian distributions, presenting two model variants for text-based geolocation and lexical dialectology. Evaluated over Twitter data, the proposed model outperforms conventional regression-based geolocation and provides a better estimate of uncertainty. We also show the effectiveness of the representation for predicting words from location in lexical dialectology, and evaluate it using the DARE dataset. Comment: Conference on Empirical Methods in Natural Language Processing (EMNLP 2017) September 2017, Copenhagen, Denmark
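
As a rough illustration of the "mixtures of Gaussian distributions" mentioned in this abstract, the sketch below scores a two-dimensional location under a Gaussian mixture and returns its negative log-likelihood, the kind of quantity a mixture density network is typically trained to minimise. It is not the authors' model: the function names, the diagonal-covariance simplification, and the example numbers are all assumptions made for illustration, and in a real network the weights, means and spreads would be predicted from text rather than fixed by hand.

```python
import math

def gaussian_2d_pdf(point, mean, sigma):
    """Density of an axis-aligned 2D Gaussian (diagonal covariance, an assumption)."""
    dx = (point[0] - mean[0]) / sigma[0]
    dy = (point[1] - mean[1]) / sigma[1]
    norm = 2.0 * math.pi * sigma[0] * sigma[1]
    return math.exp(-0.5 * (dx * dx + dy * dy)) / norm

def mixture_nll(point, weights, means, sigmas):
    """Negative log-likelihood of a 2D point under a Gaussian mixture.

    weights must be non-negative and sum to 1; means and sigmas hold one
    (x, y) pair per mixture component.
    """
    density = sum(
        w * gaussian_2d_pdf(point, m, s)
        for w, m, s in zip(weights, means, sigmas)
    )
    return -math.log(density)

# Hypothetical two-component mixture over (x, y) coordinates.
weights = [0.7, 0.3]
means = [(0.0, 0.0), (5.0, 5.0)]
sigmas = [(1.0, 1.0), (2.0, 2.0)]

near = mixture_nll((0.1, -0.2), weights, means, sigmas)
far = mixture_nll((20.0, 20.0), weights, means, sigmas)
print(near < far)  # a point close to a component is more likely than a distant one
```

Because the mixture keeps several components, the spread of the predicted density can serve as an uncertainty estimate, which a single-point regression output cannot provide.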

arXiv.org e-Print Archive

Crossref

University of Queensland eSpace

An Overview of the Use of Neural Networks for Data Mining Tasks

Author: Alberts B
Alpaydin E
Ando T
Blake CL
Bramer MA
Castanheira LG
Han J
Lu H
Mitchell M
Ni X
Quinlan RJ
Rumelhart DE
Shafer JC
Shendure J
Simić D
Stahl F
Steinwart I
Surjandari I
Wei JS
Widrow B
Witten IH
Zaslavsky B
Zhang D
Publication venue: 'Wiley'
Publication date: 01/01/2012

In recent years the area of data mining has experienced considerable demand for technologies that extract knowledge from large and complex data sources. There is substantial commercial interest, as well as research investigation, in the area, aiming to develop new and improved approaches for extracting information, relationships, and patterns from datasets. Artificial Neural Networks (NN) are popular biologically inspired intelligent methodologies, whose classification, prediction and pattern recognition capabilities have been utilised successfully in many areas, including science, engineering, medicine, business, banking, telecommunication, and many other fields. This paper highlights, from a data mining perspective, the implementation of NN, using supervised and unsupervised learning, for pattern recognition, classification, prediction and cluster analysis, and focuses the discussion on their usage in bioinformatics and financial data analysis tasks.

Central Archive at the University of Reading

Crossref

Portsmouth University Research Portal (Pure)

Bournemouth University Research Online

Testing stock market convergence: a non-linear factor approach

Author: A Bernard
B Hobijn
Burcu Erdogan
F Busetti
G Bekaert
GA Hardouvelis
Guglielmo Maria Caporale
HJ Stock
J Bai
JM Campa
M Fratzscher
MA Ferreira
N Islam
PC Phillips
R Brooks
RJ Barro
S Cavaglia
S Heston
SP Baca
Vladimir Kuzin
WN Goetzmann
Z Chen
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 30/04/2014

This paper applies the Phillips and Sul (Econometrica 75(6):1771–1855, 2007) method to test for convergence in stock returns to an extensive dataset including monthly stock price indices for five EU countries (Germany, France, the Netherlands, Ireland and the UK) as well as the US between 1973 and 2008. We carry out the analysis on both sectors and individual industries within sectors. As a first step, we use the Stock and Watson (J Am Stat Assoc 93(441):349–358, 1998) procedure to filter the data in order to extract the long-run component of the series; then, following Phillips and Sul (Econometrica 75(6):1771–1855, 2007), we estimate the relative transition parameters. In the case of sectoral indices we find convergence in the middle of the sample period, followed by divergence, and detect four (two large and two small) clusters. The analysis at a disaggregate, industry level again points to convergence in the middle of the sample, and subsequent divergence, but a much larger number of clusters is now found. Splitting the cross-section into two subgroups including euro area countries, the UK and the US respectively, provides evidence of a global convergence/divergence process not obviously influenced by EU policies.
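
The "relative transition parameters" mentioned in this abstract can be sketched as follows, assuming the standard Phillips and Sul definition: each series is divided by the cross-sectional mean at the same date, and the cross-sectional variance of the resulting paths around 1 shrinks toward zero under convergence. The function names and the toy panel are hypothetical, and the sketch skips the filtering step the abstract describes, so it illustrates only the transition-path calculation and not the paper's full procedure.

```python
def relative_transition(panel):
    """Relative transition paths h[i][t] = X[i][t] / mean over i of X[i][t].

    panel is a list of equal-length series (one list per unit), assumed to be
    already filtered so that each holds only its long-run component.
    """
    n_units = len(panel)
    n_periods = len(panel[0])
    # Cross-sectional mean at each date.
    means = [sum(series[t] for series in panel) / n_units for t in range(n_periods)]
    return [[series[t] / means[t] for t in range(n_periods)] for series in panel]

def cross_sectional_variance(h):
    """H[t] = average over units of (h[i][t] - 1)^2; it tends to 0 under convergence."""
    n_units = len(h)
    n_periods = len(h[0])
    return [sum((row[t] - 1.0) ** 2 for row in h) / n_units for t in range(n_periods)]

# Hypothetical panel: two series that drift toward a common level.
panel = [
    [80.0, 90.0, 100.0],
    [120.0, 110.0, 100.0],
]
h = relative_transition(panel)
H = cross_sectional_variance(h)
print(H)  # decreasing values indicate the two series are converging
```

A rising H[t] later in a sample would correspond to the divergence phase described in the abstract.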

Crossref

Brunel University Research Archive

Bayesian nonparametric sparse VAR models

Author: Billio Monica
Casarin Roberto
Rossini Luca
Publication date: 29/10/2018

High-dimensional vector autoregressive (VAR) models require a large number of parameters to be estimated and may suffer from inferential problems. We propose a new Bayesian nonparametric (BNP) Lasso prior (BNP-Lasso) for high-dimensional VAR models that can improve estimation efficiency and prediction accuracy. Our hierarchical prior overcomes overparametrization and overfitting issues by clustering the VAR coefficients into groups and by shrinking the coefficients of each group toward a common location. Clustering and shrinking effects induced by the BNP-Lasso prior are well suited for the extraction of causal networks from time series, since they account for some stylized facts in real-world networks, namely sparsity, community structures and heterogeneity in edge intensity. In order to fully capture the richness of the data and to achieve a better understanding of financial and macroeconomic risk, it is therefore crucial that the model used to extract the network accounts for these stylized facts. Comment: Forthcoming in "Journal of Econometrics" ---- Revised Version of the paper "Bayesian nonparametric Seemingly Unrelated Regression Models" ---- Supplementary Material available on request

arXiv.org e-Print Archive

VU Research Portal

Archivio istituzionale della ricerca - Università degli Studi di Venezia Ca' Foscari