455 research outputs found

    Variable selection via penalized regression and the genetic algorithm using information complexity, with applications for high-dimensional -omics data

    This dissertation is a collection of examples, algorithms, and techniques for researchers interested in selecting influential variables from statistical regression models. Chapters 1, 2, and 3 provide background information that will be used throughout the remaining chapters, on topics including but not limited to information complexity, model selection, covariance estimation, stepwise variable selection, penalized regression, and especially the genetic algorithm (GA) approach to variable subsetting. In chapter 4, we fully develop the framework for performing GA subset selection in logistic regression models. We present advantages of this approach against stepwise and elastic net regularized regression in selecting variables from a classical set of ICU data. We further compare these results to an entirely new procedure for variable selection developed explicitly for this dissertation, called the post hoc adjustment of measured effects (PHAME). In chapter 5, we reproduce many of the same results from chapter 4 for the first time in a multinomial logistic regression setting. The utility and convenience of the PHAME procedure is demonstrated on a set of cancer genomic data. Chapter 6 marks a departure from supervised learning problems as we shift our focus to unsupervised problems involving mixture distributions of count data from epidemiologic fields. We start off by reintroducing Minimum Hellinger Distance estimation alongside model selection techniques as a worthy alternative to the EM algorithm for generating mixtures of Poisson distributions. We also create for the first time a GA that derives mixtures of negative binomial distributions. The work from chapter 6 is incorporated into chapters 7 and 8, where we conclude the dissertation with a novel analysis of mixtures of count data regression models. 
We provide algorithms based on single and multi-target genetic algorithms which solve the mixture of penalized count data regression models problem, and demonstrate the usefulness of this technique on HIV count data that were used in a previous study published by Gray, Massaro, et al. (2015) as well as on time-to-event data taken from the cancer genomic data sets from earlier

    Factors affecting child mortality in Lesotho using 2009 and 2014 LDHS data.

    Masters Degree. University of KwaZulu-Natal, Pietermaritzburg. The child mortality rate is an important indicator of social development, quality of life, welfare, and the overall health of a society. In most countries, especially developing countries, the death of a child is usually caused by communicable, preventable diseases and poor health. Global progress in reducing under-five mortality has been made since 1990: under-five deaths declined from 12.7 million in 1990 to approximately 6 million in 2015. All regions except the developing countries in Sub-Saharan Africa, Central Asia, Southern Asia and Oceania had reduced the rate by 52% or more by 2013. Lesotho is a developing country with one of the highest rates of infant and child mortality. This study examines the factors influencing child mortality in Lesotho using the Lesotho Demographic and Health Surveys (LDHS) for 2009 and 2014. Survey logistic regression (SLR), a model within the generalized linear model framework, was used to identify the factors related to under-five mortality while accounting for the complexity of the sampling design. The SLR model cannot account for variability arising from correlation between subjects from the same cluster or household, so a generalized linear mixed model was then applied. To relax the normality and linearity assumptions of the parametric models, a semi-parametric generalized additive model was finally fitted to the data. Identifying the determinants of child mortality will inform the planning of intervention programs and the formulation of policy aimed at reducing child mortality and achieving the MDGs. This study aims to improve existing knowledge of child mortality in Lesotho by studying these determinants in detail. Building on previous studies, it recommends intervention designs and policy formulation.
Overall, the findings showed that birth order number, weight of child at birth, age of child, breastfeeding, wealth index, educational attainment, mother’s age, type of place of residence, and number of children living were the key determinants of under-five mortality in Lesotho. The study indicates that policy makers should strengthen child health interventions in order to reduce under-five mortality. The results can inform policy formulation to control and reduce child mortality. The government should continually assess current programs in order to review and develop programs that are more applicable

    A Survey on Causal Discovery Methods for Temporal and Non-Temporal Data

    Causal Discovery (CD) is the process of identifying the cause-effect relationships among variables from data. Over the years, several methods have been developed, primarily based on the statistical properties of data, to uncover the underlying causal mechanism. In this study we introduce the common terminology of causal discovery and provide a comprehensive discussion of the approaches designed to identify causal edges in different settings. We further discuss some of the benchmark datasets available for evaluating the performance of causal discovery algorithms, readily available tools for performing causal discovery, and the common metrics used to evaluate these methods. Finally, we conclude by presenting the common challenges involved in CD and discussing the applications of CD in multiple areas of interest

    Proceedings of the 35th International Workshop on Statistical Modelling: July 20-24, 2020, Bilbao, Basque Country, Spain

    466 p. The International Workshop on Statistical Modelling (IWSM) is a reference workshop for promoting statistical modelling and applications of statistics among researchers, academics and industrialists in a broad sense. Unfortunately, the global COVID-19 pandemic did not allow the 35th edition of the IWSM to be held in Bilbao in July 2020. Despite the situation and following the spirit of the Workshop and the Statistical Modelling Society, we are delighted to bring you the proceedings book of extended abstracts

    CLADAG 2021 BOOK OF ABSTRACTS AND SHORT PAPERS

    The book collects the short papers presented at the 13th Scientific Meeting of the Classification and Data Analysis Group (CLADAG) of the Italian Statistical Society (SIS). The meeting has been organized by the Department of Statistics, Computer Science and Applications of the University of Florence, under the auspices of the Italian Statistical Society and the International Federation of Classification Societies (IFCS). CLADAG is a member of the IFCS, a federation of national, regional, and linguistically-based classification societies. It is a non-profit, non-political scientific organization, whose aims are to further classification research

    Attitudes towards old age and age of retirement across the world: findings from the future of retirement survey

    The 21st century has been described as the first era in human history in which the world will no longer be young, bringing drastic changes to many aspects of our lives, including socio-demographics, finances, and attitudes towards old age and retirement. This talk briefly introduces the Global Ageing Survey (GLAS) 2004 and 2005, popularly known as “The Future of Retirement”. These surveys provide a unique data source, collected in 21 countries and territories, that allows researchers to better understand individual and societal changes as we age with regard to savings, retirement and healthcare. In 2004, approximately 10,000 people aged 18+ were surveyed in nine countries and one territory (Brazil, Canada, China, France, Hong Kong, India, Japan, Mexico, UK and USA). In 2005, the number was increased to twenty-one by adding Egypt, Germany, Indonesia, Malaysia, Poland, Russia, Saudi Arabia, Singapore, Sweden, Turkey and South Korea. Moreover, an additional 6320 private sector employers were surveyed in 2005, some 300 in each country, with a view to elucidating the attitudes of employers to issues relating to older workers. The paper examines attitudes towards old age and retirement across the world and indicates some policy implications

    Improving Outcomes in Machine Learning and Data-Driven Learning Systems using Structural Causal Models

    The field of causal inference has experienced rapid growth and development in recent years. Its significance in addressing a diverse array of problems and its relevance across various research and application domains are increasingly being acknowledged. However, the current state-of-the-art approaches to causal inference have not yet gained widespread adoption in mainstream data science practices. This research endeavor begins by seeking to motivate enthusiasm for contemporary approaches to causal investigation utilizing observational data. It explores the existing applications and potential future prospects for employing causal inference methods to enhance desired outcomes in data-driven learning applications across various domains, with a particular focus on their relevance in artificial intelligence (AI). Following this motivation, this dissertation proceeds to offer a broad review of fundamental concepts, theoretical frameworks, methodological advancements, and existing techniques pertaining to causal inference. The research advances by investigating the problem of data-driven root cause analysis through the lens of causal structure modeling. Data-driven approaches to root cause analysis (RCA) have received attention recently due to their ability to exploit increasing data availability for more effective root cause identification in complex processes. Advancements in the field of causal inference enable unbiased causal investigations using observational data. This study proposes a data-driven RCA method and a time-to-event (TTE) data simulation procedure built on the structural causal model (SCM) framework. A novel causality-based method is introduced for learning a representation of root cause mechanisms, termed in this work as root cause graphs (RCGs), from observational TTE data. Three case scenarios are used to generate TTE datasets for evaluating the proposed method. 
The utility of the proposed RCG recovery method is demonstrated by using recovered RCGs to guide the estimation of root cause treatment effects. In the presence of mediation, RCG-guided models produce superior estimates of root cause total effects compared to models that adjust for all covariates. The author delves into the subject of integrating causal inference and machine learning. Incorporating causal inference into machine learning offers many benefits including enhancing model interpretability and robustness to changes in data distributions. This work considers the task of feature selection for prediction model development in the context of potentially changing environments. First, a filter feature selection approach that improves on the select k-best method and prioritizes causal features is introduced and compared to the standard select k-best algorithm. Secondly, a causal feature selection algorithm which adapts to covariate shifts in the target domain is proposed for domain adaptation. Causal approaches to feature selection are demonstrated to be capable of yielding optimal prediction performance when modeling assumptions are met. Additionally, they can mitigate the degrading effects of some forms of dataset shifts on prediction performance

    Reliability and Cost Impacts for Attritable Systems

    Attritable systems trade system attributes like reliability and reparability to achieve lower acquisition cost and decrease cost risk. Ultimately, it is hoped that by trading these attributes the number of systems that can be acquired will increase. However, the effect of trading these attributes on system-level reliability and cost risk is difficult to express for complicated reparable systems like an air vehicle. Failure-time and cost data from a baseline limited-life air vehicle are analyzed for this reliability and reparability trade study. The appropriateness of various reliability and cost estimation techniques is examined for these data. This research employs the cumulative incidence function as an input to discrete-time non-homogeneous Markov chain models to overcome the hurdles of representing the failure-time data of a reparable system with competing failure modes that vary with time. This research quantifies the probability of system survival to a given sortie, S(n), average unit flyaway cost (AUFC), and cost risk metrics to convey the value of reliability and reparability trades. Investigation of the benefit of trading system reparability shows a marked increase in cost risk. For trades in subsystem reliability, the analysis calculates the decrease in subsystem cost required to make such a trade advantageous. This research results in a trade-space analysis tool that can be used to guide the development of future attritable air vehicles