1,373 research outputs found

    An application of extreme value theory in medical sciences

    Master's thesis in Biostatistics, presented to the Universidade de Lisboa through the Faculdade de Ciências, in 2018. High blood pressure values are considered a risk factor for cardiovascular diseases, see [Hajar, 2016]. These diseases are the leading cause of death in Portugal. With the aim of profiling the Portuguese population with respect to cardiovascular disease risk, a study was carried out in 2005 by the Associação Nacional de Farmácias (National Pharmacy Association) through its Department of Pharmaceutical Services. The main interest of the present study is to model high values of systolic blood pressure in individuals suffering from a particular category of hypertension. A similar study was developed to model high total cholesterol levels, see [de Zea Bermudez and Mendes, 2012]. This dissertation has two main aims: to study the geographical distribution of high systolic blood pressure values (in individuals with normal diastolic blood pressure) in Portugal, i.e., to fit extreme value models for each district of Portugal and the islands, and to analyse in particular the group at greatest risk, i.e., the elderly. To that end, the Peaks Over Threshold methodology was applied. This methodology consists of fitting a model to the excesses (or exceedances) above a sufficiently high systolic blood pressure threshold. The resulting models are able to estimate high quantiles and tail probabilities of systolic blood pressure. In the present dissertation, individuals were divided into four distinct groups: those with normal values of both systolic and diastolic blood pressure, and those exceeding the values laid down by the medical authorities in one or both indices, see Table 6.1.
Within this last group we consider individuals suffering from isolated systolic hypertension, characterized by a systolic blood pressure greater than or equal to 140 mmHg and a diastolic blood pressure below 90 mmHg. We aim to study high systolic blood pressure values in this group. As a first step, a descriptive study was carried out of the individuals who attended the campaign and suffer from isolated systolic hypertension, in order to assess the effect of other variables of interest on systolic blood pressure levels. The variables considered in this preliminary analysis were age, whose relationship with high systolic blood pressure is well known, see [Pinto, 2007]; gender; tobacco consumption; body mass index; and district. Extreme value analysis using the Peaks Over Threshold methodology consists of several stages. First, a sufficiently high threshold must be obtained, so that a generalized Pareto distribution can be fitted to its excesses. This distribution has shape parameter k and scale parameter s, see expression (3.2). This first stage is often difficult, and the literature offers several methodologies for it. There are exploratory methods, such as the one described by [Coles, 2001], which uses the mean excess function to identify the desired threshold. [DuMouchel, 1983] suggests using the empirical 0.9 quantile as the threshold. There are also methods that fit the model for several candidate thresholds and assess which one produces the best fit, for example via the Cramér-von Mises and Anderson-Darling tests, see [Choulakian and Stephens, 2001]. Within this group we also highlight a Bayesian method that uses measures of surprise, see [Lee et al., 2015]. All the methods mentioned above are used throughout the dissertation.
Once this stage is complete, a generalized Pareto distribution is fitted to the excesses over the selected threshold. Maximum likelihood is the most usual methodology for this fit, since the resulting parameter estimators enjoy relevant properties. As a first step, we apply the Peaks Over Threshold methodology to individuals suffering from isolated systolic hypertension in each district of mainland Portugal and the islands. Here the difficulties inherent in extreme value analysis are explored, along with some problems found in the data, which are examined in the following chapter. There, we analyse the systolic blood pressure values of elderly individuals (aged 55 or over) and consider a method that addresses the problem of multiple testing for ordered hypotheses, which arise from applying the Cramér-von Mises and Anderson-Darling tests to different partitions of the sample; we also consider jittering models to deal with the discretization of the data. The fitted models are able to estimate extreme quantiles and tail probabilities of the systolic blood pressure in each group.
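The Peaks Over Threshold fit described above can be sketched in Python. This is a minimal illustration on simulated readings (not the campaign's data), with an arbitrary fixed threshold and `scipy.stats.genpareto` standing in for the generalized Pareto model:

```python
import numpy as np
from scipy.stats import genpareto

rng = np.random.default_rng(0)
sbp = rng.normal(130, 15, 5000)            # simulated systolic readings (mmHg)

threshold = 160.0                          # a sufficiently high threshold u
excesses = sbp[sbp > threshold] - threshold

# Fit a generalized Pareto distribution (shape k, scale s) to the excesses,
# with location fixed at zero
k, _, s = genpareto.fit(excesses, floc=0)

p_exceed_u = excesses.size / sbp.size      # empirical P(SBP > u)

def tail_prob(x):
    """P(SBP > x) for x above u: P(SBP > u) * P(excess > x - u)."""
    return p_exceed_u * genpareto.sf(x - threshold, k, loc=0, scale=s)

def high_quantile(p):
    """The level exceeded with probability p (for p < P(SBP > u))."""
    return threshold + genpareto.ppf(1 - p / p_exceed_u, k, loc=0, scale=s)
```

In a real analysis the fixed `threshold = 160.0` would instead come from one of the selection methods discussed above (mean excess function, the 0.9-quantile rule, or goodness-of-fit scans over candidate thresholds).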

    Evaluating the Differences of Gridding Techniques for Digital Elevation Models Generation and Their Influence on the Modeling of Stony Debris Flows Routing: A Case Study From Rovina di Cancia Basin (North-Eastern Italian Alps)

    Debris flows are among the most hazardous phenomena in mountain areas. To cope with debris flow hazard, it is common to delineate the risk-prone areas through routing models. The most important input to debris flow routing models is the topographic data, usually in the form of Digital Elevation Models (DEMs). The quality of DEMs depends on the accuracy, density, and spatial distribution of the sampled points; on the characteristics of the surface; and on the applied gridding methodology. Therefore, the choice of the interpolation method affects the realistic representation of the channel and fan morphology, and thus potentially the debris flow routing modeling outcomes. In this paper, we initially investigate the performance of common interpolation methods (i.e., linear triangulation, natural neighbor, nearest neighbor, Inverse Distance to a Power, ANUDEM, Radial Basis Functions, and ordinary kriging) in building DEMs of the complex topography of a debris flow channel located in the Venetian Dolomites (North-eastern Italian Alps), using small-footprint full-waveform Light Detection And Ranging (LiDAR) data. The investigation is carried out through a combination of statistical analysis of vertical accuracy, algorithm robustness, and spatial clustering of vertical errors, together with a multi-criteria shape reliability assessment. After that, we examine the influence of the tested interpolation algorithms on the performance of a Geographic Information System (GIS)-based cell model for simulating stony debris flow routing. In detail, we investigate both the correlation between the uncertainty in DEM heights resulting from the gridding procedure and that of the corresponding simulated erosion/deposition depths, and the effect of the interpolation algorithms on the simulated areas, erosion and deposition volumes, solid-liquid discharges, and channel morphology after the event.
The comparison among the tested interpolation methods highlights that the ANUDEM and ordinary kriging algorithms are not suitable for building DEMs of complex topography. Conversely, linear triangulation, the natural neighbor algorithm, and the thin-plate spline plus tension and completely regularized spline functions ensure the best trade-off between accuracy and shape reliability. Nevertheless, the evaluation of the effects of gridding techniques on debris flow routing modeling reveals that the choice of the interpolation algorithm does not significantly affect the model outcomes.
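The checkpoint-based accuracy comparison between gridding techniques can be sketched as follows. The terrain surface, point locations, and the three methods compared are hypothetical stand-ins for the paper's LiDAR data and algorithm set, using `scipy.interpolate.griddata` (whose "linear" method is triangulation-based):

```python
import numpy as np
from scipy.interpolate import griddata

rng = np.random.default_rng(1)
# Hypothetical scattered elevation samples (stand-in for LiDAR ground points)
xy = rng.uniform(0, 100, (2000, 2))
z = 5 * np.sin(xy[:, 0] / 10) + 0.1 * xy[:, 1]   # smooth synthetic terrain

# Hold out checkpoints to assess the vertical accuracy of each gridding method
train, check = xy[:1800], xy[1800:]
z_train, z_check = z[:1800], z[1800:]

results = {}
for method in ("linear", "nearest", "cubic"):
    z_hat = griddata(train, z_train, check, method=method)
    ok = ~np.isnan(z_hat)                  # points outside the hull give NaN
    rmse = float(np.sqrt(np.mean((z_hat[ok] - z_check[ok]) ** 2)))
    results[method] = rmse
```

A full study would add the remaining interpolators, spatial clustering of the errors, and the shape-reliability criteria described above.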

    Contributions to Statistical Image Analysis for High Content Screening.

    Images of cells incubated with fluorescent small-molecule probes can be used to infer where the compounds distribute within cells. Identifying the spatial pattern of compound localization within each cell is a very important problem for which adequate statistical methods do not yet exist. First, we asked whether a classifier for subcellular localization categories can be developed based on a training set of manually classified cells. Because of challenges in the images, such as uneven field illumination, low resolution, high noise, variation in intensity and contrast, and cell-to-cell variability in probe distributions, we constructed texture features from contrast quantiles conditioned on intensities; classification experiments on artificial cells with the same marginal distribution but different conditional distributions supported that this conditioning approach is beneficial for distinguishing different localization distributions. Using these conditional features, we obtained satisfactory performance in image classification, and performed dimension reduction and data visualization. Since high content images are subject to several major forms of artifacts, we are interested in the implications of measurement errors and artifacts for our ability to draw scientifically meaningful conclusions from high content images. Specifically, we considered three forms of artifacts: saturation, blurring and additive noise. For each type of artifact, we artificially introduced it in increasing amounts, and aimed to understand the resulting bias by the 'Simulation Extrapolation' (SIMEX) method, applied to the measurement errors for pairwise centroid distances, the degree of eccentricity in the class-specific distributions, and the angles between the dominant axes of variability for different categories. Finally, we briefly considered the analysis of time-point images, on which small-molecule studies will increasingly focus.
Specifically, we consider the evolving patterns of subcellular staining from the moment that a compound is introduced into the cell culture medium to the point that a steady-state distribution is reached. We construct the degree to which the subcellular staining pattern is concentrated in or near the nucleus as a feature of the time-course data set, and aim to determine whether different compounds accumulate in different regions at different times, as characterized by their position in the cell relative to the nucleus.
Ph.D. Statistics. University of Michigan, Horace H. Rackham School of Graduate Studies. http://deepblue.lib.umich.edu/bitstream/2027.42/91460/1/liufy_1.pd
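The SIMEX idea used above (deliberately adding extra artifact at increasing levels and extrapolating the naive estimate back to zero measurement error) can be sketched as follows. The data and the target quantity (a simple variance inflated by additive noise) are hypothetical, not the dissertation's centroid-distance or eccentricity measures:

```python
import numpy as np

rng = np.random.default_rng(2)

# Hypothetical measurements: true values observed with known additive noise
true_x = rng.normal(0.0, 3.0, 500)
sigma = 1.0                                  # known measurement-error s.d.
x_obs = true_x + rng.normal(0.0, sigma, 500)

# Naive estimator of interest: the variance, which noise inflates by sigma^2
lambdas = np.array([0.0, 0.5, 1.0, 1.5, 2.0])
est = []
for lam in lambdas:
    # Add extra noise with variance lam * sigma^2; average over replicates
    reps = [np.var(x_obs + rng.normal(0.0, np.sqrt(lam) * sigma, x_obs.size))
            for _ in range(200)]
    est.append(np.mean(reps))

# Extrapolate the estimate-vs-lambda trend back to lambda = -1,
# i.e. to zero total measurement error
coef = np.polyfit(lambdas, est, deg=2)
simex_estimate = float(np.polyval(coef, -1.0))
```

The quadratic extrapolant is one conventional choice; the same recipe applies with any naive estimator in place of `np.var`.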

    A Framework for the Estimation of Disaggregated Statistical Indicators Using Tree-Based Machine Learning Methods

    The thesis combines four papers that introduce a coherent framework based on mixed effects random forests (MERFs) for the estimation of spatially disaggregated economic and inequality indicators and associated uncertainties. Chapter 1 focuses on flexible domain prediction using MERFs. We discuss characteristics of semi-parametric point and uncertainty estimates for domain-specific means. Extensive model- and design-based simulations highlight advantages of MERFs in comparison to 'traditional' small area estimation (SAE) methods based on linear mixed models (LMMs). Chapter 2 introduces the use of MERFs under limited covariate information. Access to population-level micro-data for auxiliary information imposes barriers for researchers and practitioners. We introduce an approach that adaptively incorporates aggregated auxiliary information through calibration weights in the absence of unit-level auxiliary data. We apply the proposed method to German survey data and use aggregated covariate census information from the same year to estimate the average opportunity cost of care work for 96 planning regions in Germany. In Chapter 3, we discuss the estimation of non-linear poverty and inequality indicators. Our proposed method allows estimating domain-specific cumulative distribution functions from which desired (non-linear) poverty estimators can be obtained. We evaluate the proposed point and uncertainty estimators in a design-based simulation and focus on a case study uncovering spatial patterns of poverty for the Mexican state of Veracruz. Additionally, Chapter 3 informs a methodological discussion on the differences and advantages between the use of predictive algorithms and (linear) statistical models in the context of SAE. The final Chapter 4 complements the previous research by implementing the discussed methods for point and uncertainty estimation in the open-source R package SAEforest. The package facilitates the use of the discussed methods and accessibly adds MERFs to the existing toolbox for SAE and official statistics.
Overall, this work aims to synergize aspects from two statistical spheres ('traditional' parametric models and non-parametric predictive algorithms) by critically discussing and adapting tree-based methods for applications in SAE. In this perspective, the thesis contributes to the existing literature along three dimensions: 1) the methodological development of alternative semi-parametric methods for the estimation of non-linear domain-specific indicators and means under unit-level and aggregated auxiliary covariates; 2) the proposition of a general framework that enables further discussion between 'traditional' and algorithmic approaches to SAE, as well as an extensive comparison between LMM-based methods and MERFs in applications and in several model- and design-based simulations; and 3) the provision of an open-source software package that facilitates the usability of the methods, making MERFs and general SAE methodology accessible for tailored research applications by statistical, institutional and political practitioners.
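A minimal sketch of the MERF alternating scheme (a random forest for the non-linear fixed part, shrunken domain means for a random intercept) may help fix ideas. This is a simplified illustration on synthetic data, not the thesis's estimator or the SAEforest implementation:

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor

rng = np.random.default_rng(3)

# Synthetic survey data: covariates X, domain labels, and a response with a
# non-linear fixed part plus a domain-level random intercept
n, n_domains = 1000, 20
X = rng.normal(size=(n, 3))
domain = rng.integers(0, n_domains, n)
u_true = rng.normal(0.0, 2.0, n_domains)
y = np.sin(X[:, 0]) + X[:, 1] ** 2 + u_true[domain] + rng.normal(0.0, 0.5, n)

# Alternate between fitting the forest on y minus the current random effects
# and updating the random effects from out-of-bag residual domain means
u = np.zeros(n_domains)
rf = RandomForestRegressor(n_estimators=200, oob_score=True, random_state=0)
for _ in range(5):
    rf.fit(X, y - u[domain])                 # fixed part f(X), given u
    resid = y - rf.oob_prediction_           # OOB avoids overfitted residuals
    for d in range(n_domains):               # random effects, given f(X)
        m = domain == d
        u[d] = resid[m].mean() * m.sum() / (m.sum() + 1.0)  # crude shrinkage

def predict_domain(X_new, d):
    """Domain-specific prediction: forest part plus estimated random effect."""
    return rf.predict(X_new) + u[d]
```

A proper MERF would estimate the variance components and shrinkage factors within the loop rather than use the fixed heuristic shown here.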

    Application of Statistical Methods and Process Models for the Design and Analysis of Activated Sludge Wastewater Treatment Plants (WWTPs)

    The purpose of this study is to investigate statistical procedures to quantify uncertainty and explicitly evaluate its impact on wastewater treatment plants (WWTPs). The goal is to develop a statistics-based procedure for designing WWTPs that provides reliable protection of water quality, instead of making overly conservative assumptions and adopting empirical safety factors. An innovative Monte Carlo based procedure was developed to quantify the risk of violating effluent limits as a function of various design decisions. A simulation program called StatASPS was developed to conduct Monte Carlo simulations combined with the ASM1 model. A random influent generator was developed to describe the statistical characteristics of the influent components of WWTPs. Prior to modeling, a two-directional exponential smoothing (TES) method was developed to replace non-randomly missing data during weekends and holidays. The best models were selected based on various statistics and the ability to forecast future values. The time series models were then used to generate random influent variables with the same statistical characteristics as the original data. The best Monte Carlo simulations were obtained using historical influent data and site-specific parameter distributions, as shown by the applications to both the Oak Ridge and Seneca WWTPs. This indicates that parameter uncertainty was more effective than influent variability in predicting the uncertainty in plant performance. The final simulations were conducted using one month's influent data, owing to the limitations of the available computing technology. Application of the method to the two plants demonstrated that it provides a reliable and reasonable estimate of the uncertainty of plant performance. The best predictions of plant uncertainty were obtained by determining the distribution of the most sensitive parameter and holding all other model parameters constant.
The StatASPS procedure proved to be a reliable and reasonable method for designing cost-effective WWTPs. With further development, this procedure could provide engineers and regulators with a high degree of confidence that a plant will perform as required, without resorting to overly conservative assumptions or large safety factors.
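The Monte Carlo idea above (propagating influent variability and parameter uncertainty through a process model to estimate the risk of violating an effluent limit) can be sketched with a toy first-order removal model standing in for ASM1; all distributions and constants below are hypothetical:

```python
import numpy as np

rng = np.random.default_rng(4)

# Toy steady-state process model standing in for ASM1: first-order substrate
# removal in a completely mixed reactor, S_eff = S_in / (1 + k * HRT)
HRT = 0.5                                      # hydraulic retention time (d)
limit = 25.0                                   # hypothetical effluent limit (mg/L)

n_sims = 10_000
S_in = rng.normal(250.0, 40.0, n_sims)         # variable influent concentration
k = rng.lognormal(np.log(30.0), 0.25, n_sims)  # uncertain rate constant (1/d)

S_eff = S_in / (1.0 + k * HRT)
risk_of_violation = float(np.mean(S_eff > limit))
```

In the StatASPS setting, the sampled inputs would instead be time-series influent realizations and the distribution of the most sensitive ASM1 parameter, and the algebraic model would be replaced by a full plant simulation.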

    ISBIS 2016: Meeting on Statistics in Business and Industry

    This book includes the abstracts of the talks presented at the 2016 International Symposium on Business and Industrial Statistics, held in Barcelona, June 8-10, 2016, hosted at the Universitat Politècnica de Catalunya - Barcelona TECH by the Department of Statistics and Operations Research. The meeting took place in the ETSEIB Building (Escola Tècnica Superior d'Enginyeria Industrial) at Avda Diagonal 647. The meeting organizers celebrated the continued success of the ISBIS and ENBIS societies, and the meeting drew together the international community of statisticians, both academics and industry professionals, who share the goal of making statistics the foundation for decision making in business and related applications. The Scientific Program Committee:
    David Banks, Duke University
    Amílcar Oliveira, DCeT - Universidade Aberta and CEAUL
    Teresa A. Oliveira, DCeT - Universidade Aberta and CEAUL
    Nalini Ravishankar, University of Connecticut
    Xavier Tort Martorell, Universitat Politècnica de Catalunya, Barcelona TECH
    Martina Vandebroek, KU Leuven
    Vincenzo Esposito Vinzi, ESSEC Business School