140 research outputs found

    Towards Data Mining in Large and Fully Distributed Peer-To-Peer Overlay Networks

    The Internet, which is becoming an increasingly dynamic and extremely heterogeneous network, has recently become a platform for huge, fully distributed peer-to-peer overlay networks containing millions of nodes, typically for the purpose of information dissemination and file sharing. This paper targets the problem of analyzing data that are scattered over such a huge and dynamic set of nodes, where each node may store very little data but the total amount of data is immense due to the large number of nodes. We present distributed algorithms for effectively calculating basic statistics of data using the recently introduced newscast model of computation, and we demonstrate how to implement basic data mining algorithms based on these techniques. We argue that the suggested techniques are efficient, robust and scalable, and that they preserve the privacy of data.
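
    As a rough illustration of how a basic statistic such as the mean can be computed without a central coordinator, the sketch below uses generic pairwise gossip averaging. It is not the paper's newscast protocol: the function name `gossip_average`, the uniform random peer selection, the round count and the seed are all assumptions made for this example.

```python
import random


def gossip_average(values, rounds=50, seed=0):
    """Pairwise gossip averaging (illustrative, not the newscast protocol).

    Each node starts with its own local value. In every round, each node
    picks a random peer and both replace their estimates with the pair's
    mean. The sum of all estimates is preserved, so every estimate drifts
    towards the global mean.
    """
    rng = random.Random(seed)      # fixed seed keeps the run reproducible
    estimates = list(values)       # one local estimate per simulated node
    n = len(estimates)
    for _ in range(rounds):
        for i in range(n):
            j = rng.randrange(n)   # hypothetical peer choice: uniform at random
            if j == i:
                continue           # a node does not gossip with itself
            mean = (estimates[i] + estimates[j]) / 2.0
            estimates[i] = mean
            estimates[j] = mean
    return estimates


if __name__ == "__main__":
    print(gossip_average([1.0, 2.0, 3.0, 10.0]))
```

    No node ever sees more than one peer's current estimate per exchange, which is one reason schemes of this kind are often described as scalable and privacy-friendly.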

    RHESSI Spectral Fits of Swift GRBs

    One of the challenges of the Swift era has been accurately determining Epeak for the prompt GRB emission. RHESSI, which is sensitive from 30 keV to 17 MeV, can extend spectral coverage above the Swift-BAT bandpass. Using the public Swift data, we present results of joint spectral fits for 26 bursts co-observed by RHESSI and Swift-BAT through May 2007. We compare these fits to estimates of Epeak which rely on BAT data alone. A Bayesian Epeak estimator gives better correspondence with our measured results than an estimator relying on correlations with the Swift power law indices. Comment: 4 pages, 1 figure. To appear in the proceedings of Gamma Ray Bursts 2007, Santa Fe, New Mexico, November 5-9 200

    Structure and dielectric response in the high $T_c$ ferroelectric Bi(Zn,Ti)O$_3$-PbTiO$_3$ solid solutions

    Theoretical ab initio and experimental methods were used to investigate the $x$Bi(Zn,Ti)O$_3$-$(1-x)$PbTiO$_3$ (BZT-PT) solid solution. We find that hybridization between Zn $4p$ and O $2p$ orbitals allows the formation of short, covalent Zn-O bonds, enabling favorable coupling between A-site and B-site displacements. This leads to large polarization, strong tetragonality and an elevated ferroelectric to paraelectric phase transition temperature. Inhomogeneities in local structure near the $90^\circ$ domain boundaries can be deduced from the asymmetric peak broadening in the neutron and x-ray diffraction spectra. These extrinsic effects make the ferroelectric to paraelectric phase transition diffuse in BZT-PT solid solutions.

    On Evidence Weighted Mixture Classification

    2005 Joint Annual Meeting of the Interface and the Classification Society of North America, St. Louis, Missouri, 8-12 June 2005. Calculation of the marginal likelihood or evidence is a problem central to model selection and model averaging in a Bayesian framework. Many sampling methods, especially (Reversible Jump) Markov chain Monte Carlo techniques, have been devised to avoid explicit calculation of the evidence, but they are limited to models with a common parameterisation. It is desirable to extend model averaging to models with disparate architectures and parameterisations. In this paper we present a straightforward general computational scheme for calculating the evidence, applicable to any model for which samples can be drawn from the posterior distribution of parameters conditioned on the data. The scheme is demonstrated on a simple feature subset selection example.

    Representing classifier confidence in the safety critical domain: an illustration from mortality prediction in trauma cases

    Copyright © 2007 Springer Verlag. The final publication is available at link.springer.com. This work proposes a novel approach to assessing confidence measures for software classification systems in demanding applications such as those in the safety critical domain. Our focus is the Bayesian framework for developing a model-averaged probabilistic classifier implemented using Markov chain Monte Carlo (MCMC) and where appropriate its reversible jump variant (RJ-MCMC). Within this context we suggest a new technique, building on the reject region idea, to identify areas in feature space that are associated with "unsure" classification predictions. We term such areas "uncertainty envelopes" and they are defined in terms of the full characteristics of the posterior predictive density in different regions of the feature space. We argue this is more informative than use of a traditional reject region which considers only point estimates of predictive probabilities. Results from the method we propose are illustrated on synthetic data and also usefully applied to real life safety critical systems involving medical trauma data.

    Computing with confidence: a Bayesian approach

    Bayes’ rule is introduced as a coherent strategy for multiple recomputations of classifier system output, and thus as a basis for assessing the uncertainty associated with a particular system's results, i.e. a basis for confidence in the accuracy of each computed result. We use a Markov chain Monte Carlo method for efficient selection of recomputations to approximate the computationally intractable elements of the Bayesian approach. The estimate of the confidence to be placed in any classification result provides a sound basis for rejection of some classification results. We present uncertainty envelopes as one way to derive these confidence estimates from the population of recomputed results. We show that a coarse SURE or UNSURE confidence rating based on a threshold of agreed classifications works well, not only in pinpointing those results that are reliable but also in indicating input data problems, such as corrupted or incomplete data, or application of an inadequate classifier model.
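
    The coarse SURE or UNSURE rating described above can be sketched as a simple agreement test over the recomputed class labels for one input. This is only a minimal reading of the abstract: the function name `confidence_rating`, the 0.9 threshold and the use of the majority label are assumptions for illustration, not details taken from the paper.

```python
from collections import Counter


def confidence_rating(recomputed_labels, threshold=0.9):
    """Rate one input as SURE or UNSURE from its recomputed class labels.

    `recomputed_labels` holds the class predicted by each recomputation
    (for example, each sampled model) for a single input. The rating is
    SURE when the share of recomputations agreeing with the majority
    label reaches `threshold` (a hypothetical default), else UNSURE.
    """
    if not recomputed_labels:
        raise ValueError("need at least one recomputed classification")
    label, count = Counter(recomputed_labels).most_common(1)[0]
    agreement = count / len(recomputed_labels)
    return label, ("SURE" if agreement >= threshold else "UNSURE")


if __name__ == "__main__":
    print(confidence_rating([1] * 95 + [0] * 5))
    print(confidence_rating([1] * 60 + [0] * 40))
```

    Inputs rated UNSURE would be the candidates for rejection or for a closer look at data quality, in line with the abstract's remark about corrupted or incomplete data.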

    A Bayesian methodology for estimating uncertainty of decisions in safety-critical systems

    Published as chapter in Frontiers in Artificial Intelligence and Applications. Volume 149, IOS Press Book, 2006. Integrated Intelligent Systems for Engineering Design. Edited by Xuan F. Zha, R.J. Howlett. ISBN 978-1-58603-675-1, pp. 82-96. This version deposited in arxiv.org: http://arxiv.org/abs/1012.0322. Uncertainty of decisions in safety-critical engineering applications can be estimated on the basis of the Bayesian Markov Chain Monte Carlo (MCMC) technique of averaging over decision models. The use of decision tree (DT) models assists experts to interpret causal relations and find factors of the uncertainty. Bayesian averaging also allows experts to estimate the uncertainty accurately when a priori information on the favored structure of DTs is available. Then an expert can select a single DT model, typically the Maximum a Posteriori model, for interpretation purposes. Unfortunately, a priori information on favored structure of DTs is not always available. For this reason, we suggest a new prior on DTs for the Bayesian MCMC technique. We also suggest a new procedure of selecting a single DT and describe an application scenario. In our experiments on the Short-Term Conflict Alert data our technique outperforms the existing Bayesian techniques in predictive accuracy of the selected single DTs. Supported by a grant from the EPSRC under the Critical Systems Program, grant GR/R24357/0

    Comparison of the Bayesian and Randomised Decision Tree Ensembles within an Uncertainty Envelope Technique

    Copyright © 2006 Springer. The final publication is available at link.springer.com. Multiple Classifier Systems (MCSs) allow evaluation of the uncertainty of classification outcomes, which is of crucial importance for safety critical applications. The uncertainty of classification is determined by a trade-off between the amount of data available for training, the classifier diversity and the required performance. The interpretability of MCSs can also give useful information for experts responsible for making reliable classifications. For this reason Decision Trees (DTs) seem to be attractive classification models for experts. The required diversity of MCSs exploiting such classification models can be achieved by using two techniques: Bayesian model averaging and the randomised DT ensemble. Both techniques have shown promising results when applied to real-world problems. In this paper we experimentally compare the classification uncertainty of Bayesian model averaging with a restarting strategy and the randomised DT ensemble on a synthetic dataset and some domain problems commonly used in the machine learning community. To make the Bayesian DT averaging feasible, we use a Markov Chain Monte Carlo technique. The classification uncertainty is evaluated within an Uncertainty Envelope technique dealing with the class posterior distribution and a given confidence probability. Exploring a full posterior distribution, this technique produces realistic estimates which can be easily interpreted in statistical terms. In our experiments we found that the Bayesian DTs are superior to the randomised DT ensembles within the Uncertainty Envelope technique.

    A Bayesian Methodology for Estimating Uncertainty of Decisions in Safety-Critical Systems

    In: Integrated Intelligent Systems for Engineering Design (editors: Zha, X.F. and Howlett, R.J.). Frontiers in Artificial Intelligence and Applications vol. 14

    Estimating Classification Uncertainty of Bayesian Decision Tree Technique on Financial Data

    Copyright © 2007 Springer. The final publication is available at link.springer.com. Book title: Perception-based Data Mining and Decision Making in Economics and Finance. Summary: Bayesian averaging over classification models allows the uncertainty of classification outcomes to be evaluated, which is of crucial importance for making reliable decisions in applications such as finance, in which risks have to be estimated. The uncertainty of classification is determined by a trade-off between the amount of data available for training, the diversity of a classifier ensemble and the required performance. The interpretability of classification models can also give useful information for experts responsible for making reliable classifications. For this reason Decision Trees (DTs) seem to be attractive classification models. The required diversity of the DT ensemble can be achieved by using Bayesian model averaging over all possible DTs. In practice, the Bayesian approach can be implemented on the basis of a Markov Chain Monte Carlo (MCMC) technique of random sampling from the posterior distribution. For sampling large DTs, the MCMC method is extended by the Reversible Jump technique, which allows DTs to be induced under given priors. When prior information on the DT size is unavailable, the sweeping technique, which defines the prior implicitly, shows better performance. Within this chapter we explore the classification uncertainty of the Bayesian MCMC techniques on some datasets from the StatLog Repository and real financial data. The classification uncertainty is compared within an Uncertainty Envelope technique dealing with the class posterior distribution and a given confidence probability. This technique provides realistic estimates of the classification uncertainty which can be easily interpreted in statistical terms with the aim of risk evaluation.