Analysis of Models for Decentralized and Collaborative AI on Blockchain
Machine learning has recently enabled large advances in artificial
intelligence, but these results can be highly centralized. The large datasets
required are generally proprietary; predictions are often sold on a per-query
basis; and published models can quickly become out of date without effort to
acquire more data and maintain them. Published proposals to provide models and
data for free for certain tasks include Microsoft Research's Decentralized and
Collaborative AI on Blockchain. The framework allows participants to
collaboratively build a dataset and use smart contracts to share a continuously
updated model on a public blockchain. The initial proposal gave an overview of
the framework but omitted many details of the models used and of the incentive
mechanisms in real-world scenarios. In this work, we evaluate several models
and configurations in order to propose best practices for using the
Self-Assessment incentive mechanism, so that models can remain accurate and
well-intentioned participants who submit correct data have the chance to profit.
We have analyzed simulations for each of three models: Perceptron, Naïve
Bayes, and a Nearest Centroid Classifier, with three different datasets:
predicting a sport with user activity from Endomondo, sentiment analysis on
movie reviews from IMDB, and determining if a news article is fake. We compare
several factors for each dataset when models are hosted in smart contracts on a
public blockchain: their accuracy over time, balances of a good and bad user,
and transaction costs (or gas) for deploying, updating, collecting refunds, and
collecting rewards. A free and open source implementation for the Ethereum
blockchain and simulations written in Python is provided at
https://github.com/microsoft/0xDeCA10B. This version has updated gas costs
using newer optimizations written after the original publication.
Comment: Accepted to ICBC 202
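The Perceptron is one of the three models the paper evaluates, and it suits on-chain hosting because each update is a single vector addition over one submitted sample. A minimal Python sketch of that mistake-driven update (an illustration, not the repository's Solidity contract or simulation code):

```python
# Mistake-driven Perceptron update of the kind a smart contract could afford:
# the only state change per submission is one vector addition.
# Illustrative sketch only; not the 0xDeCA10B implementation.

def perceptron_update(w, b, x, y, lr=1):
    """Update weights only when the current prediction is wrong (y in {-1, +1})."""
    score = sum(wi * xi for wi, xi in zip(w, x)) + b
    pred = 1 if score > 0 else -1
    if pred != y:
        w = [wi + lr * y * xi for wi, xi in zip(w, x)]
        b = b + lr * y
    return w, b

def predict(w, b, x):
    return 1 if sum(wi * xi for wi, xi in zip(w, x)) + b > 0 else -1

# Toy linearly separable data: label is the sign of the first feature.
data = [([2, 1], 1), ([1, -1], 1), ([-2, 1], -1), ([-1, -2], -1)]
w, b = [0, 0], 0
for _ in range(10):                    # a few passes suffice for separable data
    for x, y in data:
        w, b = perceptron_update(w, b, x, y)

print(all(predict(w, b, x) == y for x, y in data))   # True
```

Because updates fire only on mistakes, a well-trained on-chain model stops changing state (and stops costing gas for weight writes) once submissions agree with it.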
Identifying and Alleviating Concept Drift in Streaming Tensor Decomposition
Tensor decompositions are used in various data mining applications from
social network to medical applications and are extremely useful in discovering
latent structures or concepts in the data. Many real-world applications are
dynamic in nature and so are their data. To deal with this dynamic nature of
data, there exist a variety of online tensor decomposition algorithms. A
central assumption in all those algorithms is that the number of latent
concepts remains fixed throughout the entire stream. However, this need not be
the case. Every incoming batch in the stream may have a different number of
latent concepts, and the difference in latent concepts from one tensor batch to
another can provide insights into how our findings in a particular application
behave and deviate over time. In this paper, we define "concept" and "concept
drift" in the context of streaming tensor decomposition, as the manifestation
of the variability of latent concepts throughout the stream. Furthermore, we
introduce SeekAndDestroy, an algorithm that detects concept drift in streaming
tensor decomposition and is able to produce results robust to that drift. To
the best of our knowledge, this is the first work that investigates concept
drift in streaming tensor decomposition. We extensively evaluate SeekAndDestroy
on synthetic datasets, which exhibit a wide variety of realistic drift. Our
experiments demonstrate the effectiveness of SeekAndDestroy, both in the
detection of concept drift and in the alleviation of its effects, producing
results with similar quality to decomposing the entire tensor in one shot.
Additionally, in real datasets, SeekAndDestroy outperforms other streaming
baselines, while discovering novel useful components.
Comment: 16 Pages, Accepted at ECML-PKDD 201
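The paper's notion of drift — the number of latent concepts changing from one batch to the next — can be illustrated with a deliberately simplified proxy: treat each incoming batch as a matricized tensor slice, estimate its numerical rank, and flag batches where the count changes. SeekAndDestroy's actual detection and alignment machinery is far more involved; this is only a toy sketch of the drift definition.

```python
import numpy as np

def latent_concept_count(batch, tol=None):
    """Toy proxy for the number of latent concepts in a batch:
    the numerical rank of the matricized batch."""
    return np.linalg.matrix_rank(batch, tol=tol)

def detect_drift(batches):
    """Flag every batch whose concept count differs from the previous batch's."""
    counts = [latent_concept_count(b) for b in batches]
    drift_at = [i for i in range(1, len(counts)) if counts[i] != counts[i - 1]]
    return counts, drift_at

rng = np.random.default_rng(0)
rank2 = rng.standard_normal((20, 2)) @ rng.standard_normal((2, 15))  # 2 concepts
rank3 = rng.standard_normal((20, 3)) @ rng.standard_normal((3, 15))  # 3 concepts
counts, drift_at = detect_drift([rank2, rank2, rank3])
print(counts, drift_at)   # counts [2, 2, 3]; drift flagged at batch index 2
```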
SEIR Immune Strategy for Instance-Weighted Naive Bayes Classification
© Springer International Publishing Switzerland 2015. Naive Bayes (NB) has been popularly applied in many classification tasks. However, in real-world applications, the pronounced advantage of NB is often challenged by insufficient training samples: high variance may occur when the number of training samples is limited, and the estimated class distribution of an NB classifier is inaccurate if the number of training instances is small. To handle this issue, we propose a SEIR (Susceptible, Exposed, Infectious and Recovered) immune-strategy-based instance weighting algorithm for naive Bayes classification, namely SWNB. The immune instance weighting allows the SWNB algorithm to adjust itself to the data without explicit specification of functional or distributional forms of the underlying model. Experiments and comparisons on 20 benchmark datasets demonstrate that SWNB outperforms existing state-of-the-art instance-weighted NB algorithms and other related computational intelligence methods.
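The core mechanism — replacing raw counts in the NB estimates with per-instance weights — can be sketched as follows. The SEIR-based learning of the weights themselves is omitted; here the weights are simply taken as given, and the categorical NB with Laplace smoothing is a generic formulation, not the paper's exact model.

```python
import math
from collections import defaultdict

def fit_weighted_nb(X, y, w, alpha=1.0):
    """Naive Bayes with per-instance weights: every count becomes a weighted sum.
    X: list of categorical feature tuples; y: labels; w: instance weights."""
    class_w = defaultdict(float)       # weighted class "counts"
    feat_w = defaultdict(float)        # (class, position, value) -> weighted count
    values = [set() for _ in X[0]]     # observed values per feature position
    for xi, yi, wi in zip(X, y, w):
        class_w[yi] += wi
        for j, v in enumerate(xi):
            feat_w[(yi, j, v)] += wi
            values[j].add(v)
    total = sum(class_w.values())

    def predict(x):
        best, best_score = None, float('-inf')
        for c, cw in class_w.items():
            score = math.log(cw / total)                 # weighted class prior
            for j, v in enumerate(x):
                # Laplace-smoothed, weight-based conditional estimate
                score += math.log((feat_w[(c, j, v)] + alpha) /
                                  (cw + alpha * len(values[j])))
            if score > best_score:
                best, best_score = c, score
        return best
    return predict

X = [('sunny', 'hot'), ('sunny', 'mild'), ('rainy', 'mild'), ('rainy', 'cool')]
y = ['no', 'no', 'yes', 'yes']
predict = fit_weighted_nb(X, y, w=[1.0, 1.0, 1.0, 1.0])
print(predict(('rainy', 'mild')))   # 'yes'
```

With uniform weights this reduces to ordinary NB; raising the weight of selected instances shifts both the priors and the conditionals toward them, which is the lever the SEIR strategy turns.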
Fast Generation of Best Interval Patterns for Nonmonotonic Constraints
In pattern mining, the main challenge is the exponential explosion of the set of patterns. Typically, to solve this problem, a constraint for pattern selection is introduced. One of the first constraints proposed in pattern mining is the support (frequency) of a pattern in a dataset. Frequency is an anti-monotonic function, i.e., given an infrequent pattern, all its superpatterns are infrequent. However, many other constraints for pattern selection are neither monotonic nor anti-monotonic, which makes it difficult to generate patterns satisfying them. In this paper we introduce the notion of "generalized monotonicity" and the Sofia algorithm, which generates best patterns in polynomial time for some nonmonotonic constraints, modulo constraint computation and pattern extension operations. In particular, the algorithm is polynomial for data on itemsets and interval tuples. We consider stability and the delta-measure, which are nonmonotonic constraints, and apply them to interval tuple datasets. In the experiments, we compute the best interval tuple patterns w.r.t. these measures and show the advantage of our approach over post-filtering approaches.
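The anti-monotonic baseline the abstract contrasts with — frequency — is what makes classical levelwise (Apriori-style) pruning sound: a candidate can be discarded as soon as any subset is infrequent. A minimal sketch of that pruning, for contrast with the nonmonotonic constraints Sofia targets:

```python
from itertools import combinations

def frequent_itemsets(transactions, min_support):
    """Levelwise mining exploiting anti-monotonicity of support:
    a candidate is kept only if all of its (k-1)-subsets are frequent."""
    transactions = [frozenset(t) for t in transactions]

    def support(itemset):
        return sum(itemset <= t for t in transactions)

    items = sorted({i for t in transactions for i in t})
    current = [frozenset([i]) for i in items
               if support(frozenset([i])) >= min_support]
    result = {s: support(s) for s in current}
    k = 2
    while current:
        # join step, then prune candidates with an infrequent (k-1)-subset
        candidates = {a | b for a in current for b in current if len(a | b) == k}
        candidates = [c for c in candidates
                      if all(frozenset(s) in result for s in combinations(c, k - 1))]
        current = [c for c in candidates if support(c) >= min_support]
        result.update({c: support(c) for c in current})
        k += 1
    return result

T = [{'a', 'b', 'c'}, {'a', 'b'}, {'a', 'c'}, {'b', 'c'}, {'a', 'b', 'c'}]
freq = frequent_itemsets(T, min_support=3)
print(sorted((tuple(sorted(s)), n) for s, n in freq.items()))
```

For a nonmonotonic constraint such as stability, no such subset argument holds, which is exactly the gap the generalized monotonicity of Sofia is designed to fill.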
Influence of topography on tide propagation and amplification in semi-enclosed basins
An idealized model for tide propagation and amplification in semi-enclosed rectangular basins is presented, accounting for depth differences by a combination of longitudinal and lateral topographic steps. The basin geometry is formed by several adjacent compartments of identical width, each having either a uniform depth or two depths separated by a transverse topographic step. The problem is forced by an incoming Kelvin wave at the open end, while allowing waves to radiate outward. The solution in each compartment is written as the superposition of (semi-)analytical wave solutions in an infinite channel, individually satisfying the depth-averaged linear shallow water equations on the f plane, including bottom friction. A collocation technique is employed to satisfy continuity of elevation and flux across the longitudinal topographic steps between the compartments. The model results show that the tidal wave in shallow parts displays slower propagation, enhanced dissipation and amplified amplitudes. This reveals a resonance mechanism, occurring when the length of the shallow end is roughly an odd multiple of the quarter Kelvin wavelength. Alternatively, for sufficiently wide basins, Poincaré waves may also become resonant. A transverse step implies different wavelengths of the incoming and reflected Kelvin wave, leading to increased amplitudes in shallow regions and a shift of amphidromic points in the direction of the deeper part. Including the shallow parts near the basin's closed end (thus capturing the Kelvin resonance mechanism) is essential to reproduce semi-diurnal and diurnal tide observations in the Gulf of California, the Adriatic Sea and the Persian Gulf.
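The quarter-wavelength resonance condition can be made concrete with the standard frictionless shallow-water relations: the Kelvin wavelength is λ = T√(gH), and resonance is expected when the shallow end's length is near an odd multiple of λ/4. The depth value below is illustrative, not taken from the paper:

```python
import math

def kelvin_wavelength(period_s, depth_m, g=9.81):
    """Frictionless Kelvin wavelength: lambda = T * sqrt(g * H)."""
    return period_s * math.sqrt(g * depth_m)

def resonant_lengths(period_s, depth_m, n_modes=3):
    """Basin lengths at odd multiples of a quarter wavelength: (2n+1) * lambda / 4."""
    lam = kelvin_wavelength(period_s, depth_m)
    return [(2 * n + 1) * lam / 4 for n in range(n_modes)]

T_M2 = 12.42 * 3600                    # semidiurnal (M2) tidal period in seconds
lengths = resonant_lengths(T_M2, depth_m=50)   # assumed shallow-end depth of 50 m
print([round(L / 1e3) for L in lengths])       # resonant lengths in km
```

For a 50 m deep shallow end forced at the M2 period, the first resonant length comes out near 250 km, giving a sense of the basin scales at which the mechanism operates.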
Ensembles of jittered association rule classifiers
The ensembling of classifiers tends to improve predictive accuracy. To obtain an ensemble with N classifiers, one typically needs to run N learning processes. In this paper we introduce and explore Model Jittering Ensembling, where a single model is perturbed in order to obtain variants that can be used as an ensemble. We use sets of classification association rules as base classifiers. The two jittering ensembling methods we propose are Iterative Reordering Ensembling (IRE) and Post Bagging (PB). Both methods start by learning one rule set over a single run, and then produce multiple rule sets without relearning. Empirical results on 36 data sets are positive and show that both strategies tend to reduce error with respect to the single-model association rule classifier. A bias-variance analysis reveals that while both IRE and PB are able to reduce the variance component of the error, IRE is particularly effective in reducing the bias component. We show that Model Jittering Ensembling can represent a very good speed-up w.r.t. multiple-model learning ensembling. We also compare Model Jittering with various state-of-the-art classifiers in terms of predictive accuracy and computational efficiency. This work was partially supported by FCT project Rank! (PTDC/EIA/81178/2006) and by AdI project Palco3.0, financed by QREN and Fundo Europeu de Desenvolvimento Regional (FEDER), and also supported by Fundacao Ciencia e Tecnologia, FEDER and Programa de Financiamento Plurianual de Unidades de I & D. Thanks are due to William Cohen for kindly providing the executable code for the SLIPPER implementation. Our gratitude goes also to our anonymous reviewers, who have helped to significantly improve this paper by sharing their knowledge and their informed criticism with the authors.
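The Post Bagging idea — learn one rule set, then build ensemble members by resampling the rules themselves rather than relearning from data — can be sketched with invented rules. The rules, the always-kept default rule, and the highest-confidence-match prediction policy below are illustrative assumptions, not the paper's exact procedure:

```python
import random
from collections import Counter

# One learned rule set: (antecedent items, predicted class, confidence).
# These rules are invented for illustration, not learned from data.
RULES = [({'a'}, 'pos', 0.90),
         ({'b'}, 'neg', 0.80),
         ({'a', 'b'}, 'pos', 0.95),
         (set(), 'pos', 0.55)]        # default rule: empty antecedent always matches

def classify(rules, instance):
    """Predict with the highest-confidence rule whose antecedent matches."""
    matching = [r for r in rules if r[0] <= instance]
    return max(matching, key=lambda r: r[2])[1]

def post_bagging_predict(rules, instance, n_members=11, seed=0):
    """Post Bagging sketch: bootstrap-resample the rule set (no relearning)
    and majority-vote. The default rule is kept in every member so each
    ensemble member can always produce a prediction."""
    rng = random.Random(seed)
    votes = []
    for _ in range(n_members):
        sample = [rng.choice(rules[:-1]) for _ in rules[:-1]] + [rules[-1]]
        votes.append(classify(sample, instance))
    return Counter(votes).most_common(1)[0][0]

print(post_bagging_predict(RULES, {'a'}))   # 'pos'
```

The speed-up the abstract claims falls out directly: each extra ensemble member costs only a resampling of an in-memory rule list, not another pass over the training data.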
Smoothing a rugged protein folding landscape by sequence-based redesign
The rugged folding landscapes of functional proteins put them at risk of misfolding and aggregation. Serine protease inhibitors, or serpins, are paradigms for this delicate balance between function and misfolding. Serpins exist in a metastable state that undergoes a major conformational change in order to inhibit proteases. However, conformational lability of the native serpin fold renders them susceptible to misfolding, which underlies misfolding diseases such as α1-antitrypsin deficiency. To investigate how serpins balance function and folding, we used consensus design to create conserpin, a synthetic serpin that folds reversibly, is functional, thermostable, and polymerization resistant. Characterization of its structure, folding and dynamics suggests that consensus design has remodeled the folding landscape to reconcile competing requirements for stability and function. This approach may offer general benefits for engineering functional proteins that have risky folding landscapes, including the removal of aggregation-prone intermediates, and for modifying scaffolds for use as protein therapeutics. BTP is a Medical Research Council Career Development Fellow. AAN and JJH are supported by the Wellcome Trust (grant number WT 095195). SM acknowledges fellowship support from the Australian Research Council (FT100100960). NAB is an Australian Research Council Future Fellow (110100223). GIW is an Australian Research Council Discovery Outstanding Researcher Award Fellow (DP140100087). AMB is a National Health and Medical Research Senior Research Fellow (1022688). JCW is an NHMRC Senior Principal Research Fellow and also acknowledges the support of an ARC Federation Fellowship. We thank the Australian Synchrotron for beam time and technical assistance. This work was supported by the Multi-modal Australian ScienceS Imaging and Visualisation Environment (MASSIVE) (www.massive.org.au). We acknowledge the Monash Protein Production Unit and the Monash Macromolecular Crystallization Facility.
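At its simplest, consensus design takes the most frequent residue at each column of a family alignment. The toy alignment below is invented, and real consensus serpin design involves far more than column majorities; this only illustrates the operation the abstract names:

```python
from collections import Counter

def consensus(alignment):
    """Most frequent residue per column of a gapless multiple sequence alignment."""
    return ''.join(Counter(col).most_common(1)[0][0] for col in zip(*alignment))

# Tiny invented alignment standing in for a serpin family alignment.
family = ['MKVLA',
          'MRVLA',
          'MKVIA',
          'MKVLG']
print(consensus(family))   # 'MKVLA'
```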
DEvIANT: Discovering Significant Exceptional (Dis-)Agreement Within Groups
We strive to find contexts (i.e., subgroups of entities) under which exceptional (dis-)agreement occurs among a group of individuals, in any type of data featuring individuals (e.g., parliamentarians, customers) performing observable actions (e.g., votes, ratings) on entities (e.g., legislative procedures, movies). To this end, we introduce the problem of discovering statistically significant exceptional contextual intra-group agreement patterns. To handle the sparsity inherent to voting and rating data, we use Krippendorff's Alpha measure for assessing the agreement among individuals. We devise a branch-and-bound algorithm, named DEvIANT, to discover such patterns. DEvIANT exploits both closure operators and tight optimistic estimates. We derive analytic approximations for the confidence intervals (CIs) associated with patterns for a computationally efficient significance assessment. We prove that these approximate CIs are nested along specialization of patterns. This allows us to incorporate pruning properties in DEvIANT to quickly discard non-significant patterns. An empirical study on several datasets demonstrates the efficiency and the usefulness of DEvIANT. Technical report associated with the ECML/PKDD 2019 paper entitled "DEvIANT: Discovering Significant Exceptional (Dis-)Agreement Within Groups".
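Krippendorff's Alpha for nominal data, the agreement measure DEvIANT builds on, is computed from a coincidence matrix and naturally tolerates the missing votes that make such data sparse. A minimal sketch of the standard nominal formulation (α = 1 − D_o/D_e), independent of the DEvIANT algorithm itself:

```python
from collections import defaultdict

def krippendorff_alpha_nominal(units):
    """Krippendorff's alpha for nominal data.
    units: per-unit lists of values (one per coder; simply omit missing votes)."""
    o = defaultdict(float)                      # coincidence matrix o[(c, k)]
    for values in units:
        m = len(values)
        if m < 2:
            continue                            # units with < 2 votes carry no pairs
        for i, c in enumerate(values):
            for j, k in enumerate(values):
                if i != j:
                    o[(c, k)] += 1.0 / (m - 1)
    n_c = defaultdict(float)                    # marginal totals per category
    for (c, _), v in o.items():
        n_c[c] += v
    n = sum(n_c.values())
    # Nominal metric: disagreement counts all off-diagonal coincidences equally.
    D_o = sum(v for (c, k), v in o.items() if c != k) / n
    D_e = sum(n_c[c] * n_c[k] for c in n_c for k in n_c if c != k) / (n * (n - 1))
    return 1.0 - D_o / D_e

units = [['a', 'a'], ['a', 'a'], ['b', 'b'], ['a', 'b']]
print(round(krippendorff_alpha_nominal(units), 4))   # 0.5333
```

Perfect agreement gives α = 1, chance-level agreement gives α ≈ 0; DEvIANT's contribution is finding contexts where α deviates exceptionally, with significance guarantees.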