412 research outputs found

    On the uses and abuses of regression models: a call for reform of statistical practice and teaching

    When students and users of statistical methods first learn about regression analysis there is an emphasis on the technical details of models and estimation methods that invariably runs ahead of the purposes for which these models might be used. More broadly, statistics is widely understood to provide a body of techniques for "modelling data", underpinned by what we describe as the "true model myth", according to which the task of the statistician/data analyst is to build a model that closely approximates the true data generating process. By way of our own historical examples and a brief review of mainstream clinical research journals, we describe how this perspective leads to a range of problems in the application of regression methods, including misguided "adjustment" for covariates, misinterpretation of regression coefficients and the widespread fitting of regression models without a clear purpose. We then outline an alternative approach to the teaching and application of regression methods, which begins by focussing on clear definition of the substantive research question within one of three distinct types: descriptive, predictive, or causal. The simple univariable regression model may be introduced as a tool for description, while the development and application of multivariable regression models should proceed differently according to the type of question. Regression methods will no doubt remain central to statistical practice as they provide a powerful tool for representing variation in a response or outcome variable as a function of "input" variables, but their conceptualisation and usage should follow from the purpose at hand. Comment: 24 pages main document including 3 figures, plus 15 pages supplementary material. Based on plenary lecture (President's Invited Speaker) delivered to ISCB43, Newcastle, UK, August 2022. Submitted for publication 12-Sep-2

    Contributions aux modèles de régression avec réponses manquantes : risques concurrents et données longitudinales

    Missing data are a common occurrence in medical studies. In regression modeling, missing outcomes limit our capability to draw inferences about the covariate effects of medical interest, which are those describing the distribution of the entire set of planned outcomes. In addition to losing precision, the validity of any method used to draw inferences from the observed data will require that some assumption about the mechanism leading to missing outcomes holds. Rubin (1976, Biometrika, 63:581-592) called the missingness mechanism MAR (for "missing at random") if the probability of an outcome being missing does not depend on missing outcomes when conditioning on the observed data, and MNAR (for "missing not at random") otherwise. This distinction has important implications regarding the modeling requirements to draw valid inferences from the available data, but generally it is not possible to assess from these data whether the missingness mechanism is MAR or MNAR. Hence, sensitivity analyses should be routinely performed to assess the robustness of inferences to assumptions about the missingness mechanism. In the field of incomplete multivariate data, in which the outcomes are gathered in a vector for which some components may be missing, MAR methods are widely available and increasingly used, and several MNAR modeling strategies have also been proposed. On the other hand, although some sensitivity analysis methodology has been developed, this is still an active area of research. The first aim of this dissertation was to develop a sensitivity analysis approach for continuous longitudinal data with drop-outs, that is, continuous outcomes that are ordered in time and completely observed for each individual up to a certain time-point, at which the individual drops out so that all the subsequent outcomes are missing.
    The proposed approach consists of assessing the inferences obtained across a family of MNAR pattern-mixture models indexed by a so-called sensitivity parameter that quantifies the departure from MAR. The approach was prompted by a randomized clinical trial investigating the benefits of a treatment for sleep-maintenance insomnia, from which 22% of the individuals had dropped out before the study end. The second aim was to build on the existing theory for incomplete multivariate data to develop methods for competing risks data with missing causes of failure. The competing risks model is an extension of the standard survival analysis model in which failures from different causes are distinguished. Strategies for modeling competing risks functionals, such as the cause-specific hazards (CSH) and the cumulative incidence function (CIF), generally assume that the cause of failure is known for all patients, but this is not always the case. Some methods for regression with missing causes under the MAR assumption have already been proposed, especially for semi-parametric modeling of the CSH. But other useful models have received little attention, and MNAR modeling and sensitivity analysis approaches have never been considered in this setting. We propose a general framework for semi-parametric regression modeling of the CIF under MAR using inverse probability weighting and multiple imputation ideas. Also under MAR, we propose a direct likelihood approach for parametric regression modeling of the CSH and the CIF. Furthermore, we consider MNAR pattern-mixture models in the context of sensitivity analyses. In the competing risks literature, a starting point for methodological developments for handling missing causes was a stage II breast cancer randomized clinical trial in which 23% of the deceased women had missing cause of death. We use these data to illustrate the practical value of the proposed approaches.
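
The idea of indexing a family of MNAR pattern-mixture models by a sensitivity parameter can be illustrated with a deliberately simplified sketch. It is not the dissertation's model: it assumes a single outcome, no covariates, and that the mean among drop-outs equals the mean among completers shifted by `delta`, so that `delta = 0` corresponds to the MAR-type assumption in this toy setting. The function name `pattern_mixture_mean` and the data values are hypothetical.

```python
def pattern_mixture_mean(observed, n_missing, delta):
    """Overall mean under a toy pattern-mixture model.

    Assumption (illustrative only): the unobserved outcomes have mean
    mean(observed) + delta, where delta is the sensitivity parameter.
    """
    if not observed:
        raise ValueError("at least one observed outcome is required")
    mean_observed = sum(observed) / len(observed)
    prop_missing = n_missing / (len(observed) + n_missing)
    # Mixture of the two patterns: completers and drop-outs.
    return (1 - prop_missing) * mean_observed + prop_missing * (mean_observed + delta)


# Hypothetical data: three observed outcomes and one drop-out.
observed_outcomes = [1.0, 2.0, 3.0]
for delta in (-2.0, 0.0, 2.0, 4.0):
    estimate = pattern_mixture_mean(observed_outcomes, n_missing=1, delta=delta)
    print(f"delta={delta:+.1f}  overall mean={estimate:.2f}")
```

Reporting the estimate over a range of `delta` values, rather than at a single value, is what makes this a sensitivity analysis: the reader can see how far from `delta = 0` the conclusion survives.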

    Confounding-adjustment methods for the causal difference in medians

    Background: With continuous outcomes, the average causal effect is typically defined using a contrast of expected potential outcomes. However, in the presence of skewed outcome data, the expectation (population mean) may no longer be meaningful. In practice the typical approach is to continue defining the estimand this way or transform the outcome to obtain a more symmetric distribution, although neither approach may be entirely satisfactory. Alternatively the causal effect can be redefined as a contrast of median potential outcomes, yet discussion of confounding-adjustment methods to estimate the causal difference in medians is limited. In this study we described and compared confounding-adjustment methods to address this gap. Methods: The methods considered were multivariable quantile regression, an inverse probability weighted (IPW) estimator, weighted quantile regression (another form of IPW) and two little-known implementations of g-computation for this problem. Methods were evaluated within a simulation study under varying degrees of skewness in the outcome and applied to an empirical study using data from the Longitudinal Study of Australian Children. Results: Simulation results indicated the IPW estimator, weighted quantile regression and g-computation implementations minimised bias across all settings when the relevant models were correctly specified, with g-computation additionally minimising the variance. Multivariable quantile regression, which relies on a constant-effect assumption, consistently yielded biased results. Application to the empirical study illustrated the practical value of these methods. Conclusion: The presented methods provide appealing avenues for estimating the causal difference in medians. Peer reviewed
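
A minimal sketch of an IPW-style estimator for a difference in medians is given here for orientation. It is one plausible reading of the idea, not the authors' implementation: it assumes the propensity scores (probability of treatment given confounders) have already been estimated and are passed in, and the helper names `weighted_median` and `ipw_median_difference` are hypothetical.

```python
def weighted_median(values, weights):
    """Smallest value whose cumulative weight reaches half of the total weight."""
    pairs = sorted(zip(values, weights))
    if not pairs:
        raise ValueError("empty input")
    half_total = 0.5 * sum(weight for _, weight in pairs)
    cumulative = 0.0
    for value, weight in pairs:
        cumulative += weight
        if cumulative >= half_total:
            return value
    return pairs[-1][0]  # guard against floating-point shortfall


def ipw_median_difference(outcome, treated, propensity):
    """Weighted median among the treated minus weighted median among the untreated.

    Treated units are weighted by 1/p and untreated units by 1/(1-p),
    where p is the (assumed known or pre-estimated) propensity score.
    """
    y1 = [y for y, a in zip(outcome, treated) if a == 1]
    w1 = [1.0 / p for a, p in zip(treated, propensity) if a == 1]
    y0 = [y for y, a in zip(outcome, treated) if a == 0]
    w0 = [1.0 / (1.0 - p) for a, p in zip(treated, propensity) if a == 0]
    return weighted_median(y1, w1) - weighted_median(y0, w0)


# Hypothetical data with a constant propensity of 0.5 (as in a randomized study),
# so the estimate reduces to a plain difference in medians.
outcome = [1, 2, 3, 10, 20, 30]
treated = [0, 0, 0, 1, 1, 1]
propensity = [0.5] * 6
print(ipw_median_difference(outcome, treated, propensity))
```

Weighting each arm to resemble the full population before taking the median avoids the constant-effect assumption that the abstract identifies as the weakness of multivariable quantile regression, provided the propensity model is correctly specified.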

    RECONSTRUCCIÓN DE LA MEMORIA HISTÓRICA DEL CONFLICTO ARMADO EN EL MUNICIPIO DE EL CARMEN DE VIBORAL, ANTIOQUIA

    This document responds to the need to reconstruct events from memory by asking: why remember? It recognizes the importance of recovering the collective memory of what happened during the armed conflict in the municipality of El Carmen de Viboral between 1990 and 2010, in order to understand what happened, why and how it happened, who the actors were, and what must be done so that it is not repeated. It also seeks to recover part of the historical memory of what the rural communities and the transport guild of El Carmen de Viboral, Antioquia, lived through over those two decades. The reconstruction draws on social cartography exercises that map the areas of the municipality most affected by the armed conflict, the access roads to the veredas (rural districts) and the urban area. Beyond social cartography for locating the most affected areas, the methodology includes focus groups made up of leaders from the most affected veredas, with whom accounts were built of the events and memories most significant to them. Five focus groups were formed in five sectors of the municipality: four in the veredas and corregimientos, organized around the roads giving vehicle access to the main veredas, and a fifth drawn from the municipality's transport sector. Drivers and members of the transport guild were approached because this sector was among those hit hardest by the violence of the 1990s and 2000s: its members were stigmatized and accused of aiding the armed groups present in the municipality, and were subjected to multiple violent acts such as threats, disappearances, torture and killings. Finally, interviews were conducted with government secretaries and personeros (municipal ombudsmen) who held office during the harshest years of the conflict.

    On the use of the not-at-random fully conditional specification (NARFCS) procedure in practice.

    The not-at-random fully conditional specification (NARFCS) procedure provides a flexible means for the imputation of multivariable missing data under missing-not-at-random conditions. Recent work has outlined difficulties with eliciting the sensitivity parameters of the procedure from expert opinion due to their conditional nature. Failure to adequately account for this conditioning will generate imputations that are inconsistent with the assumptions of the user. In this paper, we clarify the importance of correct conditioning of NARFCS sensitivity parameters and develop procedures to calibrate these sensitivity parameters by relating them to more easily elicited quantities, in particular, the sensitivity parameters from simpler pattern mixture models. Additionally, we consider how to include the missingness indicators as part of the imputation models of NARFCS, recommending including all of them in each model as default practice. Algorithms are developed to perform the calibration procedure and demonstrated on data from the Avon Longitudinal Study of Parents and Children, as well as with simulation studies

    Una aproximación multi-agente para el soporte al proceso de extracción- transformación-carga en bodegas de datos

    To provide an adequate solution in terms of robustness and automation for the Extract-Transform-Load (ETL) process in data warehouses, this article presents a multi-agent model that gathers the strengths of other approaches such as wrappers and ad-hoc solutions. The model considers the heterogeneity and availability of the data sources as well as their distributed nature. It was validated in an experiment using simulated and real data, which demonstrated not only its technical feasibility but also its effectiveness in terms of the percentage of data processed and the time taken to process it.

    Prototipo de mano mecatrónica para aplicaciones en robótica industrial

    One of the principal limitations of industrial robots is directly related to the use of multiple end effectors, which are exchanged regularly so that the same robot can execute different tasks. Considering the time lost while changing end effectors, this work focuses on the development of a prototype end effector of the mechatronic hand type, which could be used as a generic end effector in manipulation tasks. The prototype permits the manipulation of different geometric volumes, emulating the basic movements of the human hand, and can be operated by an instrumented glove or wirelessly through an Android device. 89 pages. Undergraduate degree: Mechatronics Engineer

    Effect of the RGB Wavelengths of LED Light on Growth Rates of Nile Tilapia Fry in Biofloc Technology (BFT) Systems

    This research evaluates the effect of light wavelength on the growth rates of Nile tilapia fry, with the aim of improving sustainability in aquaculture production. For this purpose, four tanks of water with tilapia were studied. Three tanks were illuminated with LED lamps, each with a monochromatic peak wavelength (λ): the blue light (BL) tank with λ = 451.67 nm, the green light (GL) tank with λ = 513.33 nm and the red light (RL) tank with λ = 627.27 nm. All tanks were illuminated with a light intensity of 0.832 ⁄2 and had a photoperiod of 18L:6D throughout the study. The fourth tank was illuminated only by natural light (NL) and served as the control tank. Each treatment, including the fourth, was randomly assigned to a 150 L tank stocked with 122 Nile tilapia fry. The fry had an initial average weight of 0.24 ± 0.01 g and were grown for 73 days. The average final weights for the BL, GL, RL and NL treatments were 15.54 g, 16.84 g, 17.27 g and 16.22 g, respectively. The results suggest that Nile tilapia fry were positively influenced by the red light wavelength, which produced the greatest mass gain
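
The reported weights allow a quick arithmetic check of the mass-gain ranking. The sketch below uses only the figures quoted in the abstract (initial weight 0.24 g, 73 days, four final weights). The specific growth rate formula, 100 × (ln final − ln initial) / days, is a common aquaculture convention that the abstract does not mention, so it is included here as an assumption rather than as the authors' metric.

```python
import math

INITIAL_WEIGHT_G = 0.24   # initial average weight from the abstract
DAYS = 73                 # duration of the trial from the abstract
FINAL_WEIGHT_G = {"BL": 15.54, "GL": 16.84, "RL": 17.27, "NL": 16.22}


def weight_gain(final_g, initial_g=INITIAL_WEIGHT_G):
    """Absolute mass gain in grams."""
    return final_g - initial_g


def specific_growth_rate(final_g, initial_g=INITIAL_WEIGHT_G, days=DAYS):
    """Specific growth rate in % per day (assumed metric, not from the abstract)."""
    return 100.0 * (math.log(final_g) - math.log(initial_g)) / days


gains = {name: weight_gain(w) for name, w in FINAL_WEIGHT_G.items()}
sgr = {name: specific_growth_rate(w) for name, w in FINAL_WEIGHT_G.items()}
best = max(gains, key=gains.get)

for name in FINAL_WEIGHT_G:
    print(f"{name}: gain = {gains[name]:.2f} g, SGR = {sgr[name]:.2f} %/day")
print("greatest mass gain:", best)
```

Because every treatment starts from the same initial weight, ranking by final weight, by mass gain or by specific growth rate gives the same order, with the red light treatment first.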