
    Estimating process capability index Cpm using a bootstrap sequential sampling procedure

    Construction of a confidence interval for the process capability index Cpm is often based on a normal approximation with a fixed sample size. In this article, we describe a different approach to constructing a fixed-width confidence interval for Cpm with preassigned accuracy, using a combination of bootstrap and sequential sampling schemes. The optimal sample size required to achieve a preassigned confidence level is obtained using both two-stage and modified two-stage sequential procedures. The procedure developed is also validated using an extensive simulation study.
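
    The abstract gives no formulas or code. As a rough, hedged illustration of the quantities involved, the sketch below assumes the standard definition Cpm = (USL - LSL) / (6 * sqrt(s^2 + (mean - T)^2)) and uses a plain percentile bootstrap with a naive batch-wise stopping rule, standing in for (but not reproducing) the paper's two-stage and modified two-stage sequential procedures; all function names and numbers are illustrative.

    import numpy as np

    def cpm(x, usl, lsl, target):
        """Cpm = (USL - LSL) / (6 * sqrt(s^2 + (mean - T)^2))."""
        s2 = np.var(x, ddof=1)
        return (usl - lsl) / (6.0 * np.sqrt(s2 + (np.mean(x) - target) ** 2))

    def bootstrap_ci(x, usl, lsl, target, level=0.95, n_boot=2000, rng=None):
        """Percentile bootstrap interval for Cpm (illustrative, not the paper's scheme)."""
        rng = np.random.default_rng(rng)
        stats = [cpm(rng.choice(x, size=len(x), replace=True), usl, lsl, target)
                 for _ in range(n_boot)]
        return tuple(np.percentile(stats, [100 * (1 - level) / 2, 100 * (1 + level) / 2]))

    def fixed_width_sampling(draw_batch, usl, lsl, target, width=0.2, n0=30, max_n=2000):
        """Keep sampling in batches until the bootstrap interval is narrower than `width`
        (a naive sequential stopping rule used here only for illustration)."""
        x = draw_batch(n0)
        while True:
            lo, hi = bootstrap_ci(x, usl, lsl, target)
            if hi - lo <= width or len(x) >= max_n:
                return x, (lo, hi)
            x = np.concatenate([x, draw_batch(n0)])

    # Example: a process running slightly off its target of 10.0
    rng = np.random.default_rng(1)
    sample, ci = fixed_width_sampling(lambda n: rng.normal(10.1, 0.5, n),
                                      usl=12.0, lsl=8.0, target=10.0)
    print(len(sample), ci)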

    Prognostic Methods for Integrating Data from Complex Diseases

    Statistics in medical research has seen a vast surge with the development of high-throughput biotechnologies that provide thousands of measurements for each patient. These multi-layered data have the clear potential to improve disease prognosis. Data integration is increasingly becoming essential in this context, to address problems such as limited statistical power, inconsistencies between studies, the need for more reliable biomarkers, and the need for a broader understanding of the disease. This thesis focuses on addressing the challenges in the development of statistical methods while contributing to the methodological advancements in this field. We propose a clinical data analysis framework to obtain a model with good prediction accuracy while addressing missing data and model instability. A detailed pre-processing pipeline is proposed for miRNA data that removes unwanted noise and offers improved concordance with qRT-PCR data. Platform-specific models are developed to uncover biomarkers using mRNA, protein and miRNA data, to identify the source with the most important prognostic information. This thesis explores two types of data integration: horizontal, the integration of the same type of data, and vertical, the integration of data from different platforms for the same patients. We use multiple miRNA datasets to develop a meta-analysis framework addressing the challenges in horizontal data integration using a multi-step validation protocol. For vertical data integration, we extend the pre-validation principle and derive platform-dependent weights to utilise the weighted Lasso. Our study revealed that the integration of multi-layered data is instrumental in improving prediction accuracy and in obtaining more biologically relevant biomarkers. A novel visualisation technique for examining prediction accuracy at the patient level revealed vital findings with translational impact in personalised medicine.
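
    The abstract mentions platform-dependent weights feeding a weighted Lasso but gives no implementation details. The sketch below is a generic illustration, not the thesis's method: it applies per-feature penalty weights with scikit-learn using the standard rescaling equivalence (dividing column j by weight w_j and refitting a plain Lasso is the same as penalising w_j * |beta_j|); the data and weights are made up.

    import numpy as np
    from sklearn.linear_model import Lasso

    def weighted_lasso(X, y, weights, alpha=0.1):
        """Lasso with per-feature penalty factors `weights`
        (a larger weight means stronger shrinkage for that feature).
        Implemented via the rescaling trick: fit a plain Lasso on
        X[:, j] / w_j and scale the coefficient back by 1 / w_j."""
        weights = np.asarray(weights, dtype=float)
        model = Lasso(alpha=alpha).fit(X / weights, y)
        return model.coef_ / weights, model.intercept_

    # Toy example: six features from two hypothetical "platforms",
    # with the second platform penalised twice as hard as the first.
    rng = np.random.default_rng(0)
    X = rng.normal(size=(100, 6))
    y = 2.0 * X[:, 0] - 1.5 * X[:, 3] + rng.normal(scale=0.5, size=100)
    w = np.array([1.0, 1.0, 1.0, 2.0, 2.0, 2.0])
    beta, intercept = weighted_lasso(X, y, w)
    print(np.round(beta, 2))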

    Process Capability Calculations with Nonnormal Data in the Medical Device Manufacturing Industry

    U.S. Food and Drug Administration (FDA) recalls of medical devices are at historically high levels despite efforts by manufacturers to meet stringent agency requirements to ensure quality and patient safety. A factor in the release of potentially dangerous devices might be the interpretation of nonnormal test data by statistically unsophisticated engineers. The purpose of this study was to test the hypothesis that testing by lot provides a better indicator of true process behavior than process capability indices (PCIs) calculated from the mixed lots that often occur in a typical production situation. The foundations of this research were in the prior work of Bertalanffy, Kane, Shewhart, and Taylor. The research questions examined whether lot traceability allows the combined distribution to be decomposed for more accurate calculation of the PCIs used to monitor medical device production. The study used simulated data; although the simulated data were random, the design was quasiexperimental because the simulated data were controlled through parameter selection. The results of this study indicate that decomposition does not increase the accuracy of the PCI. The conclusion is that a systems approach using the PCI, additional statistical tools, and expert knowledge could yield more accurate results than decomposition alone. More accurate results could ensure the production of safer medical devices by correctly identifying noncapable processes (i.e., processes that may not produce required results), while also preventing needless waste of resources and delays in potentially life-saving technology reaching patients in cases where processes evaluate as noncapable when they are actually capable.
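
    The abstract reports a conclusion rather than a calculation. As a rough illustration of the underlying issue (hypothetical specification limits and lot parameters, and the standard Cpk formula rather than anything taken from the study), the sketch below compares a capability index computed on a mixture of two lots with different means against the per-lot indices.

    import numpy as np

    def cpk(x, usl, lsl):
        """Standard Cpk = min(USL - mean, mean - LSL) / (3 * s)."""
        m, s = np.mean(x), np.std(x, ddof=1)
        return min(usl - m, m - lsl) / (3.0 * s)

    rng = np.random.default_rng(42)
    usl, lsl = 10.6, 9.4

    # Two lots produced around different means (e.g. after a tool change).
    lot_a = rng.normal(9.8, 0.1, 200)
    lot_b = rng.normal(10.2, 0.1, 200)
    mixed = np.concatenate([lot_a, lot_b])   # the mixed-lot sample a line might see

    print("lot A Cpk:", round(cpk(lot_a, usl, lsl), 2))
    print("lot B Cpk:", round(cpk(lot_b, usl, lsl), 2))
    print("mixed Cpk:", round(cpk(mixed, usl, lsl), 2))  # deflated by the bimodal mixture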

    Machine Learning

    Machine learning can be defined in various ways, but it broadly refers to a scientific domain concerned with the design and development of theoretical and implementation tools that allow building systems with some human-like intelligent behavior. More specifically, machine learning addresses the ability of such systems to improve automatically through experience.

    Explainable machine learning for project management control

    Project control is a crucial phase within project management aimed at ensuring —in an integrated manner— that the project objectives are met according to plan. Earned Value Management —along with its various refinements— is the most popular and widespread method for top-down project control. For project control under uncertainty, Monte Carlo simulation and statistical/machine learning models extend the earned value framework by allowing the analysis of deviations, expected times and costs during project progress. Recent advances in explainable machine learning, in particular attribution methods based on Shapley values, can be used to link project control to activity properties, facilitating the interpretation of interrelations between activity characteristics and control objectives. This work proposes a new methodology that adds an explainability layer based on SHAP —Shapley Additive exPlanations— to different machine learning models fitted to Monte Carlo simulations of the project network during tracking control points. Specifically, our method allows for both prospective and retrospective analyses, which have different utilities: forward analysis helps to identify key relationships between the different tasks and the desired outcomes, thus being useful to make execution/replanning decisions; and backward analysis serves to identify the causes of project status during project progress. Furthermore, this method is general, model-agnostic and provides quantifiable and easily interpretable information, hence constituting a valuable tool for project control in uncertain environments
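
    The abstract names SHAP and Monte Carlo simulation but not a concrete pipeline. The sketch below is a minimal, generic illustration rather than the paper's methodology: a toy three-activity serial project is simulated, a gradient-boosted surrogate model maps activity durations to project duration, and the open-source shap package attributes the prediction to each activity; all distributions and names are assumptions.

    import numpy as np
    import shap
    from sklearn.ensemble import GradientBoostingRegressor

    rng = np.random.default_rng(7)
    n_sim = 5000

    # Monte Carlo simulation of a toy project: three activities in series,
    # each with a triangular duration distribution (purely illustrative).
    durations = np.column_stack([
        rng.triangular(4, 5, 9, n_sim),     # activity A
        rng.triangular(2, 3, 8, n_sim),     # activity B
        rng.triangular(6, 7, 10, n_sim),    # activity C
    ])
    project_duration = durations.sum(axis=1)

    # Surrogate model mapping simulated activity durations to project duration.
    model = GradientBoostingRegressor().fit(durations, project_duration)

    # Shapley-value attributions: how much each activity pushes a given
    # simulated run above or below the average project duration.
    explainer = shap.TreeExplainer(model)
    shap_values = explainer.shap_values(durations[:100])
    print(np.abs(shap_values).mean(axis=0))   # mean |SHAP| per activity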

    Statistical modelling of the near-Earth magnetic field in space weather

    Space weather refers to electromagnetic disturbances in the near-Earth environment as a result of the Sun-Earth interaction. Severe space weather events such as magnetic storms can cause disruption to a wide range of technologies and infrastructure, including communications systems, electronic circuits and power grids. Because of its high potential impact, space weather has been included in the UK National Risk Register since 2011. Space weather monitoring and early magnetic storm detection can be used to mitigate risk in sensitive technological systems. The aim of this project is to investigate the electromagnetic disturbances in the near-Earth environment by developing statistical models that quantify the variations and uncertainties in the near-Earth magnetic field. Data on the near-Earth magnetic field arise from in-situ satellite measurements and computer model outputs. The Cluster II mission (Escoubet et al., 2001a) has four satellites that provide in-situ measurements of the near-Earth magnetic field at time-varying locations along their trajectories. The computer model consists of an internal part that calculates the magnetic field sourced from the Earth itself and an external part that estimates the magnetic field resulting from the Sun-Earth interaction. These magnetic fields, termed the internal field and the external field, add up to the total magnetic field. Numerical estimates of the internal field and the external field are obtained respectively from the IGRF-11 model (Finlay et al., 2010) and the Tsyganenko-96 (T96) model (Tsyganenko, 2013), given the times and locations as inputs. The IGRF model outputs are invariant to space weather conditions, whereas the T96 model outputs change with the input space weather parameters. The time-varying space weather parameters for the T96 model comprise the solar wind ram pressure, the y- and z-components of the interplanetary magnetic field, and the disturbance storm time index. These parameters are estimated time series of the solar wind conditions at the magnetopause, i.e. the boundary of the magnetosphere on the day side, and of the disturbance level at the Earth's surface. Real-time values of the T96 model input parameters are available at hourly resolution from https://omniweb.gsfc.nasa.gov/. The overall aim of the thesis is to build spatio-temporal models that can be used to understand uncertainties and constraints leveraged from 3D mathematical models of space weather events. These spatio-temporal models can then be used to help understand the design parameters that need to be varied in building a precise and reliable sensor network.
    Chapter 1 provides an introduction to space weather in terms of the near-Earth magnetic field environment. Beginning with an overview of the near-Earth magnetic field environment, Chapter 2 describes the sources used to generate in-situ satellite measurements and computer model outputs, namely the Cluster II mission, the IGRF model, and the T96 model. The process of sampling the magnetic field data from the different data sources, and the space-time dependence in the hourly sampled magnetic field data, are also covered in this chapter. Converting the space-time structure in the magnetic field data into a time series structure, with a function relating position in space to time, Chapter 3 explores the temporal variations in the sampled in-situ satellite measurements. Through a hierarchical approach, the satellite measurements are related to the computer model outputs. This chapter proposes statistical methods for dealing with the non-stationary features, temporal autocorrelation, and volatility present in the time series data. With the aim of better characterising the electromagnetic environment around the Earth, Chapter 4 develops time-series models of the near-Earth magnetic field utilising in-situ (CLUSTER) magnetic field data. Regression models linking the CLUSTER satellite observations to two physical models of the magnetic field (T96 and IGRF) are fitted to each orbit in the period 2003-2013. The time series of model parameter estimates are then analysed to examine any long-term patterns, variations and associations with storm indices. In addition to explaining how the two physical models calibrate against the observed satellite measurements, these statistical models capture the inherent volatility in the magnetic field and allow us to identify other factors associated with the magnetic field variation, such as the position of each satellite relative to the Earth and the Sun. Mixed-effect models that include these factors are constructed for the parameters estimated from the regression models in order to evaluate the performance of the two computer models. Following the calibration of the computer models against the satellite measurements, Chapter 5 uses these computer models to investigate the association between variations in the near-Earth magnetic field and storms. To identify the signatures of storm onsets at different locations in the magnetosphere, change-point detection methods are applied to time series of magnetic field signals generated from the computer models along various feasible satellite orbits. The detection results inform potential sampling strategies for the near-Earth magnetic field that are predictive of storms, through selecting achievable satellite orbits for placing satellite sensors and detecting changes in the time series of magnetic signals. Chapter 6 provides a summary of the main findings of this thesis, identifies some limitations of the work carried out in the main chapters, and includes a discussion of future research. An appendix provides details of the coordinate transformations used to convert the time- and position-dependent magnetic field data into an appropriate coordinate system.
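
    Chapter 4's per-orbit regressions linking observed CLUSTER fields to the IGRF and T96 outputs are described above without a specific form. The sketch below shows one assumed, minimal version of such a calibration: an observed field component regressed on the two model predictions by ordinary least squares with statsmodels, using simulated placeholder series rather than CLUSTER data.

    import numpy as np
    import statsmodels.api as sm

    rng = np.random.default_rng(3)
    n = 500  # hourly samples along one (simulated) orbit

    # Placeholder "model outputs" for a single field component (nT): a smooth
    # term standing in for IGRF and a more variable term standing in for T96.
    b_igrf = 50.0 + 5.0 * np.sin(np.linspace(0, 4 * np.pi, n))
    b_t96 = 10.0 * rng.standard_normal(n).cumsum() / np.sqrt(n)

    # Simulated "observed" field: an offset, scaled model terms, and noise.
    b_obs = 2.0 + 0.95 * b_igrf + 1.10 * b_t96 + rng.normal(scale=1.5, size=n)

    # Per-orbit calibration regression: B_obs ~ intercept + B_IGRF + B_T96.
    X = sm.add_constant(np.column_stack([b_igrf, b_t96]))
    fit = sm.OLS(b_obs, X).fit()
    print(fit.params)         # intercept and the two calibration coefficients
    print(fit.resid.std())    # residual scatter, a crude volatility measure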

    Customer lifetime value: a framework for application in the insurance industry - building a business process to generate and maintain an automatic estimation agent

    Research project submitted as partial fulfilment for the Master's Degree in Statistics and Information Management, specialization in Knowledge Management and Business Intelligence.
    In recent years the topic of Customer Lifetime Value (CLV), or in its expanded version Customer Equity (CE), has become popular as a strategic tool across several industries, in particular in retail and services. Although the core concepts of CLV modelling have been studied for several years and the mathematics that underpins the concept is well understood, the application to specific industries is not trivial. The complexities associated with the development of a CLV programme as a business process are not insignificant, creating a myriad of obstacles to its implementation. This research project builds a framework to develop and implement the CLV concept as a maintainable business process, with a focus on the insurance industry, in particular the non-life line of business. Key concepts such as churn modelling, portfolio premium stationarity, fiscal policies and balance sheet information must be integrated into the CLV framework. In addition, an automatic estimation machine (AEM) is developed to standardize CLV calculations. The concept of an AEM is important, given that CLV information “must be fit for purpose” when used in other business processes. The field work is carried out in a Portuguese Bancassurance company which is part of an important Portuguese financial group. Firstly, this is done by investigating how to translate and apply known CLV concepts in the insurance industry context. Secondly, a sensitivity study is carried out to establish the optimal parameter strategy, by incorporating and comparing several data mining techniques applied to churn prediction and customer base segmentation. Scenarios for balance sheet information usage and other actuarial concepts are analyzed to calibrate the cash flow component of the CLV framework. Thirdly, an Automatic Estimation Agent is defined for application to the firm's current or expanding portfolio, and the advantages of using an SOA approach for deployment are also examined. Additionally, a comparative impact study is carried out between two valuation views: premium/cost driven versus CLV driven. Finally, a framework for a BPM is presented, not only for building the AEM but also for its maintenance according to an explicit performance threshold.
    The topic of customer lifetime value (Customer Lifetime Value, CLV), or in its expanded version the patrimonial valuation of the customer (Customer Equity), has gained some relevance as a strategic tool in several industries, in particular in retail and services. Although the main concepts underlying CLV have already been developed and the financial mathematics may be considered trivial, its practical application is not. The complexities associated with the development of a CLV programme, especially in the form of a business process, are not insignificant, and there is a myriad of obstacles to its implementation. This research project develops the adaptation framework, activities and processes needed to apply the concept to the insurance industry, specifically for a company operating in the non-life sector. Key concepts, such as the modelling of portfolio churn, premium stationarity, fiscal policies and balance sheet information, will have to be integrated into the customer lifetime value modelling programme. One of the deliverables will be an “automatic estimation machine” for customer lifetime value; this tool will serve to standardise CLV calculations, which is also important because the CLV information will be used in other business processes, for example distribution or sales. The field work is carried out in a Bancassurance-type insurance company belonging to an important Portuguese financial group. The first step of the work is to understand the CLV concept and how to apply it to insurance. Secondly, a sensitivity study is carried out to determine the optimal parameter strategy through the application of modelling techniques. Thirdly, some details of the automatic estimation machine are addressed, together with its use from the point of view of business services and systems (e.g. via SOA). In parallel, a comparative impact study is carried out between the two views of business valuation: loss ratio versus CLV. Finally, a process design is presented for the continued maintenance of the use of this concept in supporting the business.
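
    The abstract describes the CLV framework without formulas. As a generic illustration of the underlying arithmetic only (the textbook discounted-retention form of CLV with made-up figures, not the company's model or parameters), the sketch below computes a per-policyholder lifetime value from an expected annual margin, a churn-based retention probability, and a discount rate.

    def customer_lifetime_value(annual_margin, retention_rate, discount_rate, horizon_years=20):
        """CLV = sum over t of margin * retention^t / (1 + d)^t
        (simple annual-cohort form; t = 0 is the current contract year)."""
        return sum(
            annual_margin * retention_rate ** t / (1.0 + discount_rate) ** t
            for t in range(horizon_years)
        )

    # Hypothetical non-life policyholder: 120 EUR expected annual margin,
    # 85% probability of renewing each year, 4% discount rate.
    clv = customer_lifetime_value(annual_margin=120.0, retention_rate=0.85, discount_rate=0.04)
    print(round(clv, 2))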

    Vol. 7, No. 2 (Full Issue)
