100 research outputs found

Deep Recurrent Modelling of Stationary Bitcoin Price Formation Using the Order Flow

Author: DMW Powers
E Bacry
M Dixon
MD Gould
R Cont
S Hochreiter
Publication date: 31/03/2020

In this paper we propose a deep recurrent model based on the order flow for the stationary modelling of high-frequency directional price movements. The order flow is the microsecond stream of orders arriving at the exchange, driving the formation of prices seen on the price chart of a stock or currency. To test the stationarity of our proposed model, we train it on data from before the 2017 Bitcoin bubble period and test it during and after the bubble. We show that, without any retraining, the proposed model is temporally stable even as Bitcoin trading shifts into an extremely volatile "bubble trouble" period. The significance of the result is shown by benchmarking against existing state-of-the-art models in the literature for modelling price formation using deep learning. Comment: 10 pages, The 19th International Conference on Artificial Intelligence and Soft Computin

arXiv.org e-Print Archive

Crossref

UCL Discovery

A matter of words: NLP for quality evaluation of Wikipedia medical articles

Author: B Stvilia
DMW Powers
E Marzini
F Cabitza
G Pasi
K Wecel
K Wu
M Hall
NV Chawla
O Bodenreider
SA Azer
TL Saaty
TM Cover
Publication date: 01/01/2016

Automatic quality evaluation of Web information is a task with many fields of application and of great relevance, especially in critical domains like the medical one. We start from the intuition that the quality of the content of medical Web documents is affected by features related to the specific domain: first, the usage of a specific vocabulary (Domain Informativeness); then, the adoption of specific codes (like those used in the infoboxes of Wikipedia articles) and the type of document (e.g., historical and technical ones). In this paper, we propose to leverage specific domain features to improve the results of the evaluation of Wikipedia medical articles. In particular, we evaluate the articles adopting an "actionable" model, whose features are related to the content of the articles, so that the model can also directly suggest strategies for improving the quality of a given article. We rely on Natural Language Processing (NLP) and dictionary-based techniques to extract the bio-medical concepts in a text. We demonstrate the effectiveness of our approach by classifying the medical articles of the Wikipedia Medicine Portal, which have been previously manually labeled by the Wiki Project team. The results of our experiments confirm that, by considering domain-oriented features, it is possible to obtain considerable improvements with respect to existing solutions, mainly for those articles that other approaches have classified less correctly. Besides being interesting in their own right, the results call for further research in the area of domain-specific features suitable for Web data quality assessment.

arXiv.org e-Print Archive

Crossref

Catalogo dei prodotti della ricerca

Archivio della ricerca- Università di Roma La Sapienza

Online Research Database In Technology

Archivio istituzionale della ricerca - Università di Padova

Determining the Veracity of Rumours on Twitter

Author: AJ Smola
C Castillo
CM Bishop
D Koller
DMW Powers
F Pedregosa
GR Lomax
I Guyon
J Fox
J Mai
JRC Nurse
JW Pennebaker
K Kelton
M Verleysen
R Lukyanenko
RY Wang
T Hastie
Y Gil
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 01/01/2016

While social networks can provide an ideal platform for up-to-date information from individuals across the world, they have also proved to be a place where rumours fester and accidental or deliberate misinformation often emerges. In this article, we aim to support the task of making sense of social media data, and specifically, seek to build an autonomous message-classifier that filters relevant and trustworthy information from Twitter. For our work, we collected about 100 million public tweets, including users’ past tweets, from which we identified 72 rumours (41 true, 31 false). We considered over 80 trustworthiness measures including the authors’ profile and past behaviour, the social network connections (graphs), and the content of tweets themselves. We ran modern machine-learning classifiers over those measures to produce trustworthiness scores at various time windows from the outbreak of the rumour. Such time windows were key as they allowed useful insight into the progression of the rumours. Our findings show that our model was significantly more accurate than those of similar studies in the literature. We also identified critical attributes of the data that give rise to the trustworthiness scores assigned. Finally, we developed a software demonstration that provides a visual user interface to allow the user to examine the analysis.

arXiv.org e-Print Archive

Central Archive at the University of Reading

Crossref

Open Research Online (The Open University)

Oxford University Research Archive

Kent Academic Repository

A strategy to incorporate prior knowledge into correlation network cutoff selection

Author: A Fabregat
A-L Barabási
AK Rider
B Pei
B Zhang
BW Matthews
CE Shannon
D Croft
D Szklarczyk
DMW Powers
E Benedetti
F Dieterle
G Altay
G Camilli
G Sales
H Carter
I Rudan
J Krumsiek
J Linde
J Schafer
J Schäfer
JE Huffman
JN Weinstein
K Baba
KA Hoadley
KT Do
M Ante
M Balbin
M Giurgiu
MHJ Selman
N Swainston
OJ Dunn
P Langfelder
R Albert
R Jefferis
R Tibshirani
S Hammoudeh
S Kim
V Stavrakas
Y Benjamini
Y Li
Y Yang
Y Zuo
Z Wang
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 01/01/2020

Correlation networks are frequently used to statistically extract biological interactions between omics markers. Network edge selection is typically based on the statistical significance of the correlation coefficients. This procedure, however, is not guaranteed to capture biological mechanisms. We here propose an alternative approach for network reconstruction: a cutoff selection algorithm that maximizes the overlap of the inferred network with available prior knowledge. We first evaluate the approach on IgG glycomics data, for which the biochemical pathway is known and well-characterized. Importantly, even in the case of incomplete or incorrect prior knowledge, the optimal network is close to the true optimum. We then demonstrate the generalizability of the approach with applications to untargeted metabolomics and transcriptomics data. For the transcriptomics case, we demonstrate that the optimized network is superior to statistical networks in systematically retrieving interactions that were not included in the biological reference used for optimization
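The cutoff selection idea in this abstract can be illustrated with a minimal Python sketch. It assumes an F1 score as the overlap measure and a fixed grid of candidate cutoffs; the abstract does not state which overlap statistic or search procedure the authors use, and the function names and toy data below are hypothetical.

```python
from typing import Dict, FrozenSet, Iterable, Optional, Set, Tuple

Edge = FrozenSet[str]


def edge(a: str, b: str) -> Edge:
    """Undirected edge between two markers."""
    return frozenset((a, b))


def f1_overlap(inferred: Set[Edge], prior: Set[Edge]) -> float:
    """F1 score between inferred edges and prior-knowledge edges.

    Assumption: F1 stands in for the overlap measure, which the
    abstract does not specify.
    """
    true_positives = len(inferred & prior)
    if true_positives == 0:
        return 0.0
    precision = true_positives / len(inferred)
    recall = true_positives / len(prior)
    return 2 * precision * recall / (precision + recall)


def select_cutoff(
    correlations: Dict[Edge, float],
    prior: Set[Edge],
    cutoffs: Iterable[float],
) -> Tuple[Optional[float], float]:
    """Return the cutoff whose network overlaps most with prior knowledge.

    An edge is kept when the absolute correlation reaches the cutoff.
    Ties are resolved in favour of the first cutoff tried.
    """
    best_cutoff, best_score = None, -1.0
    for cutoff in cutoffs:
        inferred = {e for e, r in correlations.items() if abs(r) >= cutoff}
        score = f1_overlap(inferred, prior)
        if score > best_score:
            best_cutoff, best_score = cutoff, score
    return best_cutoff, best_score


# Hypothetical toy data: pairwise correlations between four markers.
toy_correlations = {
    edge("A", "B"): 0.92,
    edge("B", "C"): 0.81,
    edge("A", "C"): 0.40,
    edge("C", "D"): -0.75,
    edge("A", "D"): 0.10,
    edge("B", "D"): 0.35,
}
toy_prior = {edge("A", "B"), edge("B", "C"), edge("C", "D")}

best_cutoff, best_score = select_cutoff(
    toy_correlations, toy_prior, [0.2, 0.5, 0.7, 0.9]
)
print(best_cutoff, best_score)
```

On the toy data the loosest cutoff admits edges absent from the prior knowledge and the strictest one drops known edges, so an intermediate cutoff wins.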

Crossref

Directory of Open Access Journals

Edinburgh Research Explorer

PuSH

MPG.PuRe


A computational study on outliers in world music

Author: A Flexer
A Holzapfel
A Honingh
A Livshin
A Lomax
B Nettl
BL Sturm
C Guastavino
C Panagiotakis
Chun-Hsi Huang
CM Bishop
CT Lu
D Bountouridis
D Chen
D Clarke
D Schnitzer
DMW Powers
E Gómez
Emmanouil Benetos
F Pachet
G Tzanetakis
H Lee
I Ben-Gal
J Salamon
J Serrà
JJ Aucouturier
JP Bello
JS Downie
JT Titon
L Sun
M Mauch
M Müller
M Schedl
MA Bartsch
MA Schmuckler
Maria Panteli
N Kroher
P Casas
P Filzmoser
P Toiviainen
PE Savage
PJ Rousseeuw
PV Bohlman
R Typke
S Abdallah
S Bhattacharyya
S Brown
S Le Bomin
S McAdams
S Sadie
SC Johnson
SE Trehub
Simon Dixon
T Collins
T Rzeszutek
TH Grubesic
V Hodge
Y Lu
Z Fu
Ò Celma
Publication venue: 'Public Library of Science (PLoS)'
Publication date: 01/01/2017

The comparative analysis of world music cultures has been the focus of several ethnomusicological studies in the last century. With the advances of Music Information Retrieval and the increased accessibility of sound archives, large-scale analysis of world music with computational tools is today feasible. We investigate music similarity in a corpus of 8200 recordings of folk and traditional music from 137 countries around the world. In particular, we aim to identify music recordings that are most distinct compared to the rest of our corpus. We refer to these recordings as ‘outliers’. We use signal processing tools to extract music information from audio recordings, data mining to quantify similarity and detect outliers, and spatial statistics to account for geographical correlation. Our findings suggest that Botswana is the country with the most distinct recordings in the corpus and China is the country with the most distinct recordings when considering spatial correlation. Our analysis includes a comparison of musical attributes and styles that contribute to the ‘uniqueness’ of the music of each country

City Research Online

Crossref

Directory of Open Access Journals

Queen Mary Research Online

Rethinking classification results based on read speech, or: why improvements do not always transfer to other speaking styles

Author: A Field
A Juneja
A Salomon
AM Abdelatti Ali
B Schölkopf
Barbara Schuppler
C Cortes
CY Espy-Wilson
DMW Powers
F Metze
F Pernkopf
J Frankel
JM Kessens
K Johnson
K Kirchhoff
K Manjunath
KJ Kohler
M Saraçlar
O Scharenborg
P Niyogi
R Ogden
S Chang
S Greenberg
S King
SM Siniscalchi
T Pruthi
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 01/01/2017

Crossref

TUGraz OPEN Library

Neuropsychological predictors of conversion from mild cognitive impairment to Alzheimer’s disease: a feature selection ensemble combining stability and predictability

Author: A Ben
A Kalousis
AL Blum
AL Spedding
Alexandre de Mendonça
Alzheimer Association
American Psychiatric Association
AV Carreiro
B Seijo-Pardo
BC Dickerson
C Bastin
C Cabral
C Salvatore
D Silva
DE Barnes
Dina Silva
DMW Powers
E Grober
E Moradi
F Portet
Francisco L. Ferreira
G Zhao
H Amieva
I Guyon
I Kononenko
J Demsar
J Li
J Maroco
J Ye
JL Lustgarten
L Nanni
L Vandewater
M Guerreiro
M Irish
M Prince
Manuela Guerreiro
MJ Summers
N Meinshausen
NM Samtani
NV Chawla
OM Doyle
P Johnson
P Langley
P Scheltens
P Willett
P Yang
RC Petersen
RE Schapire
S Belleville
S Nogueira
Sandra Cardoso
Sara C. Madeira
SF Eskildsen
SG Mueller
SI Dimitriadis
SJ Lee
T Hastie
T Pereira
Telma Pereira
V Bolón-canedo
Y Saeys
Z-H Zhou
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 01/01/2018

Background: Predicting progression from Mild Cognitive Impairment (MCI) to Alzheimer’s Disease (AD) is a major open issue in AD-related research. Neuropsychological assessment has proven to be useful in identifying MCI patients who are likely to convert to dementia. However, the large battery of neuropsychological tests (NPTs) performed in clinical practice and the limited number of training examples pose a challenge for machine learning of prognostic models. In this context, it is paramount to pursue approaches that effectively seek reduced sets of relevant features. Subsets of NPTs from which prognostic models can be learnt should not only be good predictors, but also stable, promoting generalizable and explainable models. Methods: We propose a feature selection (FS) ensemble combining stability and predictability to choose the most relevant NPTs for prognostic prediction in AD. First, we combine the outcome of multiple (filter and embedded) FS methods. Then, we use a wrapper-based approach optimizing both stability and predictability to compute the number of selected features. We use two large prospective studies (ADNI and the Portuguese Cognitive Complaints Cohort, CCC) to evaluate the approach and assess the predictive value of a large number of NPTs. Results: The best subsets of features include approximately 30 and 20 (from the original 79 and 40) features, for ADNI and CCC data, respectively, yielding stability above 0.89 and 0.95, and AUC above 0.87 and 0.82. Most NPTs learnt using the proposed feature selection ensemble have been identified in the literature as strong predictors of conversion from MCI to AD. Conclusions: The FS ensemble approach was able to 1) identify subsets of stable and relevant predictors from a consensus of multiple FS methods using baseline NPTs and 2) learn reliable prognostic models of conversion from MCI to AD using these subsets of features. The machine learning models learnt from these features outperformed the models trained without FS and achieved competitive results when compared to commonly used FS algorithms. Furthermore, the selected features are derived from a consensus of methods and are thus more robust, while freeing users from choosing the most appropriate FS method for their classification task.
PTDC/EEI-SII/1937/2014; SFRH/BD/95846/2013; SFRH/BD/118872/2016
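Two ingredients named in this abstract, combining the outcome of several FS methods and measuring the stability of the selected subsets, can be sketched in Python as follows. This is an illustration under stated assumptions, not the authors' procedure: mean-rank aggregation and the average pairwise Jaccard index are stand-ins chosen here, and the feature names are hypothetical.

```python
from itertools import combinations
from typing import List, Sequence, Set


def mean_rank_aggregate(rankings: Sequence[Sequence[str]]) -> List[str]:
    """Order features by their mean position across several rankings.

    Assumption: every ranking lists the same features, best first.
    Ties in mean rank are broken alphabetically.
    """
    features = set(rankings[0])
    mean_rank = {
        f: sum(r.index(f) for r in rankings) / len(rankings) for f in features
    }
    return sorted(features, key=lambda f: (mean_rank[f], f))


def jaccard_stability(subsets: Sequence[Set[str]]) -> float:
    """Average pairwise Jaccard index of the selected feature subsets.

    A value of 1.0 means the same subset was selected every time.
    """
    pairs = list(combinations(subsets, 2))
    if not pairs:
        return 1.0
    total = 0.0
    for a, b in pairs:
        union = a | b
        total += len(a & b) / len(union) if union else 1.0
    return total / len(pairs)


# Hypothetical rankings produced by three different FS methods.
toy_rankings = [
    ["memory", "fluency", "attention", "naming"],
    ["memory", "attention", "fluency", "naming"],
    ["fluency", "memory", "naming", "attention"],
]
consensus = mean_rank_aggregate(toy_rankings)

# Hypothetical top-2 subsets selected on three resamples of the data.
toy_subsets = [
    {"memory", "fluency"},
    {"memory", "attention"},
    {"memory", "fluency"},
]
stability = jaccard_stability(toy_subsets)
print(consensus, stability)
```

A wrapper in the spirit of the abstract would repeat this for several subset sizes and keep the size that balances stability against a predictive score such as AUC.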

Crossref

Directory of Open Access Journals

Universidade de Lisboa: Repositório.UL

Sapientia

Modeling immersive media experiences by sensing impact on subjects

Author: AM von der Pütten
AR Dores
B Blankertz
DJ Schutter
DMW Powers
Eleni Kroupi
J Kittler
J Pan
J Schäfer
Jong-Seok Lee
JS Lee
K Muller
KC Bilchick
M Emoto
M Slater
Martin Rerabek
MV Sanchez-Vives
O Jensen
Philippe Hanhart
R Davidson
R Picard
S Scholler
SD Kulkarni
Touradj Ebrahimi
V Hayward
Publication venue: 'Springer Science and Business Media LLC'

Crossref

Predicting the valence of a scene from observers’ eye movements

Author: A Borji
A Bulling
Adham Atyabi
AL Yarbus
Antti Rantanen
C Lithari
D Parkhurst
DE Irwin
DL Olson
DMW Powers
F Vitu
H van Steenbergen
HA Wadlinger
Hamed R.-Tavakoli
J Goh
J Mikels
J Simola
Janne Heikkilä
JG Tichon
JM Henderson
JM Susskind
K Humphrey
K Mogg
KT Ma
L Itti
L Nummenmaa
L Wang
M Shahrokh Esfahani
MA Just
MM Bradley
MNA Wahab
MR Greene
NM Chen
Peter James Hills
PJ Lang
S Sanei
Samia Nefti-Meziani
Seppo J. Laukka
T Anderson
Y Niu
YW Chen
Publication venue: 'Public Library of Science (PLoS)'
Publication date: 01/01/2015

Multimedia analysis benefits from understanding the emotional content of a scene in a variety of tasks such as video genre classification and content-based image retrieval. Recently, there has been increasing interest in applying human bio-signals, particularly eye movements, to recognize the emotional gist of a scene such as its valence. To determine the emotional category of images using eye movements, existing methods often learn a classifier using several features that are extracted from eye movements. Although it has been shown that eye movement is potentially useful for recognition of scene valence, the contribution of each feature is not well studied. To address the issue, we study the contribution of features extracted from eye movements in the classification of images into pleasant, neutral, and unpleasant categories. We assess ten features and their fusion. The features are histogram of saccade orientation, histogram of saccade slope, histogram of saccade length, histogram of saccade duration, histogram of saccade velocity, histogram of fixation duration, fixation histogram, top-ten salient coordinates, and saliency map. We use a machine learning approach to analyze the performance of the features, training a support vector machine and exploiting various feature fusion schemes. The experiments reveal that ‘saliency map’, ‘fixation histogram’, ‘histogram of fixation duration’, and ‘histogram of saccade slope’ are the features that contribute most. The selected features signify the influence of fixation information and angular behavior of eye movements in the recognition of the valence of images.
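As an illustration of one feature named in this abstract, the histogram of saccade orientation, here is a minimal Python sketch. It assumes that a saccade is the displacement between two consecutive fixation points and that orientations are binned uniformly over the full circle; the abstract does not give the authors' exact definition or bin count, so the function name, the default of 8 bins and the toy scanpath are hypothetical.

```python
import math
from typing import List, Sequence, Tuple

Point = Tuple[float, float]


def saccade_orientation_histogram(
    fixations: Sequence[Point], n_bins: int = 8
) -> List[float]:
    """Normalized histogram of saccade directions.

    Each saccade is taken as the vector between consecutive fixations
    (an assumption), and its angle in [0, 2*pi) is placed in one of
    n_bins equal-width bins. Zero-length saccades are skipped.
    """
    counts = [0] * n_bins
    bin_width = 2 * math.pi / n_bins
    for (x0, y0), (x1, y1) in zip(fixations, fixations[1:]):
        dx, dy = x1 - x0, y1 - y0
        if dx == 0 and dy == 0:
            continue
        angle = math.atan2(dy, dx) % (2 * math.pi)
        # The extra modulo guards against an angle landing on 2*pi
        # through floating-point rounding.
        counts[int(angle // bin_width) % n_bins] += 1
    total = sum(counts)
    if total == 0:
        return [0.0] * n_bins
    return [c / total for c in counts]


# Hypothetical scanpath of four fixations (pixel coordinates).
toy_fixations = [(0.0, 0.0), (2.0, 1.0), (4.0, 2.0), (3.0, 4.0)]
toy_histogram = saccade_orientation_histogram(toy_fixations, n_bins=4)
print(toy_histogram)
```

A fixed-length vector of this kind is the sort of input a support vector machine could consume, alone or concatenated with the other histograms.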

Public Library of Science (PLOS)

University of Salford Institutional Repository

Crossref

Directory of Open Access Journals

PubMed Central

Aaltodoc Publication Archive

FigShare