
    Analysis of Models for Decentralized and Collaborative AI on Blockchain

    Machine learning has recently enabled large advances in artificial intelligence, but these results can be highly centralized. The large datasets required are generally proprietary; predictions are often sold on a per-query basis; and published models can quickly become out of date without effort to acquire more data and maintain them. Published proposals to provide models and data for free for certain tasks include Microsoft Research's Decentralized and Collaborative AI on Blockchain. The framework allows participants to collaboratively build a dataset and use smart contracts to share a continuously updated model on a public blockchain. The initial proposal gave an overview of the framework but omitted many details of the models used and of the incentive mechanisms in real-world scenarios. In this work, we evaluate the use of several models and configurations in order to propose best practices when using the Self-Assessment incentive mechanism, so that models remain accurate and well-intentioned participants who submit correct data have the chance to profit. We analyzed simulations for each of three models: Perceptron, Naïve Bayes, and a Nearest Centroid Classifier, with three different datasets: predicting a sport from user activity on Endomondo, sentiment analysis on movie reviews from IMDB, and determining whether a news article is fake. For each dataset, we compare several factors when models are hosted in smart contracts on a public blockchain: their accuracy over time, the balances of a good and a bad user, and transaction costs (or gas) for deploying, updating, collecting refunds, and collecting rewards. A free and open-source implementation for the Ethereum blockchain and simulations written in Python is provided at https://github.com/microsoft/0xDeCA10B. This version has updated gas costs using newer optimizations written after the original publication. Comment: Accepted to ICBC 202
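    The abstract does not spell out why a Nearest Centroid Classifier suits on-chain hosting, but its appeal is that each update is just a running-mean adjustment, which keeps per-transaction computation (and hence gas) small. The sketch below illustrates that incremental update rule only; the class and method names are illustrative assumptions and are not taken from the 0xDeCA10B repository.

```python
# Minimal sketch (not the 0xDeCA10B implementation): a Nearest Centroid
# Classifier with incremental updates, the kind of cheap per-sample update
# that is attractive inside a smart contract.
import numpy as np

class IncrementalNearestCentroid:
    def __init__(self, num_features):
        self.centroids = {}   # label -> running mean vector
        self.counts = {}      # label -> number of samples seen
        self.num_features = num_features

    def update(self, x, label):
        """Fold one (data, label) pair into the running centroid for `label`."""
        x = np.asarray(x, dtype=float)
        if label not in self.centroids:
            self.centroids[label] = np.zeros(self.num_features)
            self.counts[label] = 0
        self.counts[label] += 1
        # Incremental mean: c += (x - c) / n
        self.centroids[label] += (x - self.centroids[label]) / self.counts[label]

    def predict(self, x):
        """Return the label of the closest centroid."""
        x = np.asarray(x, dtype=float)
        return min(self.centroids,
                   key=lambda lbl: np.linalg.norm(x - self.centroids[lbl]))
```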

    Identifying and Alleviating Concept Drift in Streaming Tensor Decomposition

    Tensor decompositions are used in a variety of data mining applications, from social networks to medicine, and are extremely useful for discovering latent structures or concepts in the data. Many real-world applications are dynamic in nature, and so are their data. To deal with this dynamic nature, a variety of online tensor decomposition algorithms exist. A central assumption in all of those algorithms is that the number of latent concepts remains fixed throughout the entire stream. However, this need not be the case. Every incoming batch in the stream may have a different number of latent concepts, and the difference in latent concepts from one tensor batch to another can provide insights into how our findings in a particular application behave and deviate over time. In this paper, we define "concept" and "concept drift" in the context of streaming tensor decomposition as the manifestation of the variability of latent concepts throughout the stream. Furthermore, we introduce SeekAndDestroy, an algorithm that detects concept drift in streaming tensor decomposition and is able to produce results robust to that drift. To the best of our knowledge, this is the first work that investigates concept drift in streaming tensor decomposition. We extensively evaluate SeekAndDestroy on synthetic datasets, which exhibit a wide variety of realistic drift. Our experiments demonstrate the effectiveness of SeekAndDestroy, both in the detection of concept drift and in the alleviation of its effects, producing results of similar quality to decomposing the entire tensor in one shot. Additionally, on real datasets, SeekAndDestroy outperforms other streaming baselines while discovering novel useful components. Comment: 16 pages, accepted at ECML-PKDD 201
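    As a rough illustration of the drift signal the paper targets (not the SeekAndDestroy algorithm itself), the sketch below estimates the number of latent concepts in each incoming batch with a crude proxy, namely the number of singular values of the mode-1 unfolding needed to capture most of the spectrum, and flags a drift whenever that count changes between consecutive batches. The energy threshold and the rank proxy are assumptions made for illustration.

```python
# Toy concept-drift detector for a stream of tensor batches (numpy arrays).
import numpy as np

def estimated_num_concepts(batch, energy=0.95):
    """Count singular values of the mode-1 unfolding needed to keep `energy` of the spectrum."""
    unfolding = batch.reshape(batch.shape[0], -1)
    s = np.linalg.svd(unfolding, compute_uv=False)
    cumulative = np.cumsum(s) / s.sum()
    return int(np.searchsorted(cumulative, energy) + 1)

def detect_concept_drift(batches):
    """Yield (batch_index, previous_count, new_count) whenever the concept count changes."""
    previous = None
    for t, batch in enumerate(batches):
        current = estimated_num_concepts(batch)
        if previous is not None and current != previous:
            yield t, previous, current
        previous = current
```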

    SEIR immune strategy for instance weighted naive Bayes classification

    © Springer International Publishing Switzerland 2015. Naive Bayes (NB) has been applied widely in many classification tasks. However, in real-world applications, the pronounced advantage of NB is often challenged by insufficient training samples. Specifically, high variance may occur when the number of training samples is limited, and the estimated class distribution of an NB classifier is inaccurate if the number of training instances is small. To handle this issue, in this paper we proposed a SEIR (Susceptible, Exposed, Infectious and Recovered) immune-strategy-based instance weighting algorithm for naive Bayes classification, namely SWNB. The immune instance weighting allows the SWNB algorithm to adjust itself to the data without explicit specification of functional or distributional forms of the underlying model. Experiments and comparisons on 20 benchmark datasets demonstrated that the proposed SWNB algorithm outperformed existing state-of-the-art instance-weighted NB algorithms and other related computational intelligence methods.
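    The sketch below shows the generic instance-weighted naive Bayes scheme the abstract builds on: each training instance contributes its weight, rather than a unit count, to the class priors and conditional frequencies. How SWNB derives those weights from the SEIR immune strategy is not reproduced here; the weights argument is assumed to be given.

```python
# Instance-weighted naive Bayes for categorical features (illustrative only).
from collections import defaultdict
import math

def train_weighted_nb(X, y, weights):
    class_w = defaultdict(float)        # class -> total instance weight
    cond_w = defaultdict(float)         # (feature idx, value, class) -> weight
    feature_values = defaultdict(set)   # feature idx -> observed values
    for xi, yi, wi in zip(X, y, weights):
        class_w[yi] += wi
        for j, v in enumerate(xi):
            cond_w[(j, v, yi)] += wi
            feature_values[j].add(v)
    return class_w, cond_w, feature_values

def predict_weighted_nb(x, class_w, cond_w, feature_values, alpha=1.0):
    total = sum(class_w.values())
    best, best_score = None, -math.inf
    for c, cw in class_w.items():
        score = math.log(cw / total)                 # weighted class prior
        for j, v in enumerate(x):
            num = cond_w[(j, v, c)] + alpha          # Laplace smoothing
            den = cw + alpha * len(feature_values[j])
            score += math.log(num / den)
        if score > best_score:
            best, best_score = c, score
    return best
```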

    Fast Generation of Best Interval Patterns for Nonmonotonic Constraints

    In pattern mining, the main challenge is the exponential explosion of the set of patterns. Typically, a constraint for pattern selection is introduced to address this problem. One of the first constraints proposed in pattern mining is the support (frequency) of a pattern in a dataset. Frequency is an anti-monotonic function, i.e., given an infrequent pattern, all of its superpatterns are infrequent. However, many other constraints for pattern selection are neither monotonic nor anti-monotonic, which makes it difficult to generate patterns satisfying them. In this paper we introduce the notion of "generalized monotonicity" and the Sofia algorithm, which allow generating the best patterns in polynomial time for some nonmonotonic constraints, modulo constraint computation and pattern extension operations. In particular, this algorithm is polynomial for data on itemsets and interval tuples. Here we consider stability and the delta-measure, which are nonmonotonic constraints, and apply them to interval tuple datasets. In the experiments, we compute the best interval tuple patterns w.r.t. these measures and show the advantage of our approach over postfiltering approaches.
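    For readers unfamiliar with the anti-monotonicity mentioned above, the sketch below shows the classic levelwise (Apriori-style) use of it for itemsets: once a pattern is infrequent, none of its superpatterns is generated. This illustrates the background constraint only, not the Sofia algorithm or its generalized monotonicity.

```python
# Levelwise frequent-itemset mining that relies on the anti-monotonicity of support.
def frequent_itemsets(transactions, min_support):
    """transactions: list of sets of items; returns frequent itemsets as frozensets."""
    items = {frozenset([i]) for t in transactions for i in t}
    level = {p for p in items
             if sum(p <= t for t in transactions) >= min_support}
    frequent = set(level)
    while level:
        # Extend only frequent patterns: anti-monotonicity guarantees that
        # supersets of infrequent patterns cannot be frequent.
        candidates = {a | b for a in level for b in level if len(a | b) == len(a) + 1}
        level = {c for c in candidates
                 if sum(c <= t for t in transactions) >= min_support}
        frequent |= level
    return frequent

# Toy usage:
# frequent_itemsets([{"a", "b"}, {"a", "b", "c"}, {"a", "c"}], min_support=2)
```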

    Influence of topography on tide propagation and amplification in semi-enclosed basins

    An idealized model for tide propagation and amplification in semi-enclosed rectangular basins is presented, accounting for depth differences by a combination of longitudinal and lateral topographic steps. The basin geometry is formed by several adjacent compartments of identical width, each having either a uniform depth or two depths separated by a transverse topographic step. The problem is forced by an incoming Kelvin wave at the open end, while allowing waves to radiate outward. The solution in each compartment is written as a superposition of (semi-)analytical wave solutions in an infinite channel, each individually satisfying the depth-averaged linear shallow water equations on the f plane, including bottom friction. A collocation technique is employed to satisfy continuity of elevation and flux across the longitudinal topographic steps between the compartments. The model results show that the tidal wave in shallow parts displays slower propagation, enhanced dissipation and amplified amplitudes. This reveals a resonance mechanism, occurring when the length of the shallow end is roughly an odd multiple of the quarter Kelvin wavelength. Alternatively, for sufficiently wide basins, Poincaré waves may also become resonant. A transverse step implies different wavelengths of the incoming and reflected Kelvin waves, leading to increased amplitudes in shallow regions and a shift of amphidromic points in the direction of the deeper part. Including the shallow parts near the basin's closed end (thus capturing the Kelvin resonance mechanism) is essential to reproduce semi-diurnal and diurnal tide observations in the Gulf of California, the Adriatic Sea and the Persian Gulf.
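    Written out with standard shallow-water symbols (T the tidal period, g gravity, h the depth of the shallow end; this notation is assumed here rather than taken from the paper), the quarter-wavelength resonance condition stated above reads:

```latex
% Kelvin wave speed and wavelength, and the quarter-wavelength resonance condition.
\[
  c = \sqrt{g h}, \qquad \lambda_K = c\,T = \sqrt{g h}\,T,
  \qquad
  L_{\text{shallow}} \approx (2n - 1)\,\frac{\lambda_K}{4}, \quad n = 1, 2, \dots
\]
```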

    Ensembles of jittered association rule classifiers

    The ensembling of classifiers tends to improve predictive accuracy. To obtain an ensemble with N classifiers, one typically needs to run N learning processes. In this paper we introduce and explore Model Jittering Ensembling, where a single model is perturbed in order to obtain variants that can be used as an ensemble. We use sets of classification association rules as base classifiers. The two jittering ensembling methods we propose are Iterative Reordering Ensembling (IRE) and Post Bagging (PB). Both methods start by learning one rule set over a single run, and then produce multiple rule sets without relearning. Empirical results on 36 data sets are positive and show that both strategies tend to reduce error with respect to the single-model association rule classifier. A bias–variance analysis reveals that while both IRE and PB are able to reduce the variance component of the error, IRE is particularly effective in reducing the bias component. We show that Model Jittering Ensembling can represent a very good speed-up w.r.t. multiple-model learning ensembling. We also compare Model Jittering with various state-of-the-art classifiers in terms of predictive accuracy and computational efficiency. This work was partially supported by FCT project Rank! (PTDC/EIA/81178/2006) and by the AdI project Palco3.0, financed by QREN and the Fundo Europeu de Desenvolvimento Regional (FEDER), and also supported by the Fundação Ciência e Tecnologia, FEDER and the Programa de Financiamento Plurianual de Unidades de I & D. Thanks are due to William Cohen for kindly providing the executable code for the SLIPPER implementation. Our gratitude goes also to our anonymous reviewers, who have helped to significantly improve this paper by sharing their knowledge and their informed criticism with the authors.
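    As a generic illustration of the single-model perturbation idea only, the sketch below re-ranks one learned rule list under bootstrap-style noise on the rules' confidence estimates and lets the resulting variants vote. The paper's actual IRE and Post Bagging procedures differ, so the function names and the noise scheme here are assumptions.

```python
# Perturb a single rule list into an ensemble of re-ranked variants and vote.
import random

def jittered_ensembles(rules, n_variants=10, seed=0):
    """rules: list of (condition_fn, predicted_class, confidence) tuples."""
    rng = random.Random(seed)
    variants = []
    for _ in range(n_variants):
        noisy = sorted(rules,
                       key=lambda r: r[2] * rng.uniform(0.8, 1.2),  # jitter confidences
                       reverse=True)
        variants.append(noisy)
    return variants

def predict(variants, x, default_class):
    """Each variant fires its first matching rule; the variants then vote."""
    votes = {}
    for rule_list in variants:
        pred = next((cls for cond, cls, _ in rule_list if cond(x)), default_class)
        votes[pred] = votes.get(pred, 0) + 1
    return max(votes, key=votes.get)
```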

    Smoothing a rugged protein folding landscape by sequence-based redesign

    The rugged folding landscapes of functional proteins put them at risk of misfolding and aggregation. Serine protease inhibitors, or serpins, are paradigms for this delicate balance between function and misfolding. Serpins exist in a metastable state that undergoes a major conformational change in order to inhibit proteases. However, the conformational lability of the native serpin fold renders them susceptible to misfolding, which underlies misfolding diseases such as α1-antitrypsin deficiency. To investigate how serpins balance function and folding, we used consensus design to create conserpin, a synthetic serpin that folds reversibly and is functional, thermostable, and polymerization resistant. Characterization of its structure, folding and dynamics suggests that consensus design has remodeled the folding landscape to reconcile competing requirements for stability and function. This approach may offer general benefits for engineering functional proteins that have risky folding landscapes, including the removal of aggregation-prone intermediates, and for modifying scaffolds for use as protein therapeutics. BTP is a Medical Research Council Career Development Fellow. AAN and JJH are supported by the Wellcome Trust (grant number WT 095195). SM acknowledges fellowship support from the Australian Research Council (FT100100960). NAB is an Australian Research Council Future Fellow (110100223). GIW is an Australian Research Council Discovery Outstanding Researcher Award Fellow (DP140100087). AMB is a National Health and Medical Research Council Senior Research Fellow (1022688). JCW is an NHMRC Senior Principal Research Fellow and also acknowledges the support of an ARC Federation Fellowship. We thank the Australian Synchrotron for beam time and technical assistance. This work was supported by the Multi-modal Australian ScienceS Imaging and Visualisation Environment (MASSIVE) (www.massive.org.au). We acknowledge the Monash Protein Production Unit and the Monash Macromolecular Crystallization Facility.
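    A minimal sketch of the consensus-design step named above: given a multiple sequence alignment of homologues, take the most common residue at each aligned position. Real consensus design involves additional curation (gap handling, phylogenetic weighting, manual back-mutations) that is omitted here.

```python
# Derive a consensus sequence from an aligned set of homologous sequences.
from collections import Counter

def consensus_sequence(alignment):
    """alignment: list of equal-length aligned sequences (strings, '-' for gaps)."""
    consensus = []
    for column in zip(*alignment):
        counts = Counter(residue for residue in column if residue != '-')
        if counts:
            consensus.append(counts.most_common(1)[0][0])
    return ''.join(consensus)

# Toy usage:
# consensus_sequence(["MKT-LV", "MRT-LV", "MKTALI"])  # -> "MKTALV"
```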

    DEvIANT: Discovering Significant Exceptional (Dis-)Agreement Within Groups

    We strive to find contexts (i.e., subgroups of entities) under which exceptional (dis-)agreement occurs among a group of individuals, in any type of data featuring individuals (e.g., parliamentarians, customers) performing observable actions (e.g., votes, ratings) on entities (e.g., legislative procedures, movies). To this end, we introduce the problem of discovering statistically significant exceptional contextual intra-group agreement patterns. To handle the sparsity inherent to voting and rating data, we use Krippendorff's Alpha measure for assessing the agreement among individuals. We devise a branch-and-bound algorithm, named DEvIANT, to discover such patterns. DEvIANT exploits both closure operators and tight optimistic estimates. We derive analytic approximations for the confidence intervals (CIs) associated with patterns for a computationally efficient significance assessment. We prove that these approximate CIs are nested along specialization of patterns. This allows us to incorporate pruning properties in DEvIANT to quickly discard non-significant patterns. An empirical study on several datasets demonstrates the efficiency and the usefulness of DEvIANT. Technical report associated with the ECML/PKDD 2019 paper entitled "DEvIANT: Discovering Significant Exceptional (Dis-)Agreement Within Groups".
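    Since the method leans on Krippendorff's Alpha, a minimal sketch of the nominal-data version of that measure (standard coincidence-matrix formulation, missing ratings allowed) is given below; DEvIANT's optimistic estimates, CI approximations and pruning are not reproduced.

```python
# Krippendorff's Alpha for nominal ratings with possibly missing values.
from collections import Counter
from itertools import permutations

def krippendorff_alpha_nominal(ratings):
    """ratings: list of units, each a list of the observed (non-missing) ratings."""
    coincidences = Counter()
    for unit in ratings:
        m = len(unit)
        if m < 2:
            continue  # units with fewer than two ratings carry no pair information
        for a, b in permutations(unit, 2):
            coincidences[(a, b)] += 1.0 / (m - 1)
    n_value = Counter()
    for (a, _), w in coincidences.items():
        n_value[a] += w
    n_total = sum(n_value.values())
    if n_total <= 1:
        return float('nan')
    observed = sum(w for (a, b), w in coincidences.items() if a != b) / n_total
    expected = sum(n_value[a] * n_value[b]
                   for a in n_value for b in n_value if a != b) / (n_total * (n_total - 1))
    if expected == 0:
        return 1.0
    return 1.0 - observed / expected

# Toy usage: two raters agree on unit 1, disagree on unit 2.
# krippendorff_alpha_nominal([["yes", "yes"], ["yes", "no"]])
```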