
    Principal alarms in multivariate statistical process control

    This paper describes a methodology for simulating multivariate out-of-control situations using in-control data. The method is based on finding the independent factors of the variability of the process and shifting these factors one by one. These shifts are then translated into the observed variables. The shifts provoked by the most important factors are called principal alarms. The principal alarms are plotted to visualize the main deviations of the process. A resampling procedure for ARL estimation using principal alarms is also proposed. An application to a real industrial process illustrates the usefulness of the methodology.
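The factor-shifting idea described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: it assumes the independent factors are obtained by principal component analysis of the in-control covariance matrix, it relies on NumPy, and the names `principal_alarms`, `simulate_out_of_control`, `shift_size`, and `n_factors` are hypothetical.

```python
import numpy as np

def principal_alarms(X, shift_size=3.0, n_factors=2):
    """Return one shift vector per factor, expressed in the observed variables.

    Assumption: factors are the principal components of the in-control data X
    (rows = observations, columns = variables).
    """
    X = np.asarray(X, dtype=float)
    cov = np.cov(X, rowvar=False)
    # eigh returns eigenvalues in ascending order for a symmetric matrix
    eigvals, eigvecs = np.linalg.eigh(cov)
    # keep the n_factors directions with the largest variance
    order = np.argsort(eigvals)[::-1][:n_factors]
    alarms = []
    for j in order:
        factor_std = np.sqrt(max(eigvals[j], 0.0))
        # shift factor j by shift_size standard deviations,
        # then translate the shift back to the observed variables
        alarms.append(shift_size * factor_std * eigvecs[:, j])
    return np.array(alarms)

def simulate_out_of_control(X, alarm):
    """Add one alarm vector to every in-control observation."""
    return np.asarray(X, dtype=float) + alarm
```

Each row of the returned array is a candidate "principal alarm" in this sketch; adding it to the in-control observations yields simulated out-of-control data that could then feed a resampling estimate of the ARL.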

    SMART: Unique splitting-while-merging framework for gene clustering

    Copyright © 2014 Fa et al. This is an open-access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.
    Successful clustering algorithms are highly dependent on parameter settings. Clustering performance degrades significantly unless parameters are properly set, yet it is difficult to set these parameters a priori. To address this issue, in this paper we propose a unique splitting-while-merging clustering framework, named “splitting merging awareness tactics” (SMART), which does not require any a priori knowledge of either the number of clusters or even the possible range of this number. Unlike existing self-splitting algorithms, which over-cluster the dataset into a large number of clusters and then merge some similar clusters, our framework can split and merge clusters automatically during the process and produces the most reliable clustering results by intrinsically integrating many clustering techniques and tasks. The SMART framework is implemented with two distinct clustering paradigms in two algorithms: competitive learning and finite mixture model. Nevertheless, within the proposed SMART framework, many other algorithms can be derived for different clustering paradigms. The minimum message length algorithm is integrated into the framework as the clustering selection criterion. The usefulness of the SMART framework and its algorithms is tested on demonstration datasets and simulated gene expression datasets. Moreover, two real microarray gene expression datasets are studied using this approach. Across many performance metrics, all numerical results show that SMART is superior to the existing self-splitting algorithms and traditional algorithms it was compared with.
    Three main properties of the proposed SMART framework are summarized as: (1) needing no parameters dependent on the respective dataset or a priori knowledge about the datasets, (2) being extendible to many different applications, and (3) offering superior performance compared with counterpart algorithms.
    National Institute for Health Researc

    Using geographical information systems for management of back-pain data

    This is the post-print version of the article. The official published version can be accessed from the link below. Copyright © 2002 MCB UP Ltd.
    In the medical world, statistical visualisation has largely been confined to the realm of relatively simple geographical applications. This remains the case, even though hospitals have been collecting spatial data relating to patients. In particular, hospitals have a wealth of back-pain information, which includes pain drawings, usually detailing the spatial distribution and type of pain suffered by back-pain patients. Proposes several technological solutions, which permit data within back-pain datasets to be digitally linked to the pain drawings in order to provide methods of computer-based data management and analysis. In particular, proposes the use of geographical information systems (GIS), until now a tool used mainly in the geographic and cartographic domains, to provide novel and powerful ways of visualising and managing back-pain data. A comparative evaluation of the proposed solutions shows that, although it adds complexity and cost, the GIS-based solution is the most appropriate for visualisation and analysis of back-pain datasets.

    Evaluation methods and decision theory for classification of streaming data with temporal dependence

    Predictive modeling on data streams plays an important role in modern data analysis, where data arrives continuously and needs to be mined in real time. In the stream setting the data distribution often evolves over time, and models that update themselves during operation are becoming the state of the art. This paper formalizes a learning and evaluation scheme for such predictive models. We theoretically analyze the evaluation of classifiers on streaming data with temporal dependence. Our findings suggest that the commonly accepted data stream classification measures, such as classification accuracy and the Kappa statistic, fail to diagnose cases of poor performance when temporal dependence is present; they should therefore not be used as sole performance indicators. Moreover, classification accuracy can be misleading if used as a proxy for evaluating change detectors with datasets that have temporal dependence. We formulate the decision theory for streaming data classification with temporal dependence and develop a new evaluation methodology for data stream classification that takes temporal dependence into account. We propose a combined measure for classification performance that takes temporal dependence into account, and we recommend using it as the main performance measure in classification of streaming data.
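The point about accuracy and the Kappa statistic can be illustrated with a small sketch. It is not the combined measure the paper proposes (the abstract does not define it); it only compares a classifier against a hypothetical "no-change" baseline that repeats the previous true label, which is one simple way to expose temporal dependence. The toy stream and all function names are assumptions made for illustration.

```python
from collections import Counter

def accuracy(y_true, y_pred):
    """Fraction of positions where prediction and label agree."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)

def kappa(y_true, y_pred):
    """Cohen's Kappa: accuracy corrected for chance agreement."""
    n = len(y_true)
    p0 = accuracy(y_true, y_pred)
    true_counts = Counter(y_true)
    pred_counts = Counter(y_pred)
    pe = sum(true_counts[c] * pred_counts[c] for c in true_counts) / (n * n)
    return (p0 - pe) / (1 - pe) if pe < 1 else 0.0

def no_change_accuracy(y_true):
    """Accuracy of a baseline that predicts the previous true label.

    The first element has no predecessor and is skipped.
    """
    hits = sum(cur == prev for cur, prev in zip(y_true[1:], y_true[:-1]))
    return hits / (len(y_true) - 1)

# Toy stream with strong temporal dependence: long runs of the same label.
y_true = [0] * 10 + [1] * 10 + [0] * 10 + [1] * 10
# Hypothetical classifier that reacts two steps late to every label change.
y_pred = [0, 0] + y_true[:-2]

acc = accuracy(y_true, y_pred)
kap = kappa(y_true, y_pred)
baseline = no_change_accuracy(y_true)
```

On this toy stream the lagging classifier scores well on both accuracy and Kappa, yet the trivial no-change baseline is more accurate, which is the kind of poor performance that the two standard measures fail to flag.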