41 research outputs found

    Heterogeneity-aware Clustered Distributed Learning for Multi-source Data Analysis

    In diverse fields ranging from finance to omics, it is increasingly common for data to be distributed across multiple individual sources (referred to as "clients" in some studies). Integrating raw data, although powerful, is often infeasible, for example, when privacy protection is a concern. Distributed learning techniques have been developed to integrate summary statistics rather than raw data. Many existing distributed learning studies make the stringent assumption that all clients share the same model. To accommodate data heterogeneity, some federated learning methods allow for client-specific models. In this article, we consider the scenario in which clients form clusters: clients in the same cluster share the same model, and different clusters have different models. Accounting for this clustering structure can lead to a better understanding of the "interconnections" among clients and reduce the number of parameters. To this end, we develop a novel penalization approach. Specifically, group penalization is imposed for regularized estimation and selection of important variables, and fusion penalization is imposed to automatically cluster clients. An effective ADMM algorithm is developed, and the estimation, selection, and clustering consistency properties are established under mild conditions. Simulation and data analysis further demonstrate the practical utility and superiority of the proposed approach.
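
In ADMM schemes for group and fusion penalties, a common building block is group soft-thresholding, the proximal operator of a Euclidean-norm penalty. The sketch below assumes a lasso-type (convex) group penalty and a pairwise fusion constraint eta_kl = theta_k - theta_l with an unscaled dual variable y_kl; the abstract does not state the exact penalty or update rules, so `fusion_update` and its parameters are hypothetical illustrations, not the authors' algorithm. A zero difference vector means two clients are fused into one cluster.

```python
import math
from typing import List


def group_soft_threshold(v: List[float], t: float) -> List[float]:
    """Proximal operator of t * ||v||_2 (group soft-thresholding).

    Shrinks the whole vector toward zero by t in Euclidean norm and
    returns the zero vector when ||v||_2 <= t.
    """
    norm = math.sqrt(sum(x * x for x in v))
    if norm <= t:
        return [0.0] * len(v)
    scale = 1.0 - t / norm
    return [scale * x for x in v]


def fusion_update(theta_k: List[float], theta_l: List[float],
                  y_kl: List[float], lam: float, rho: float) -> List[float]:
    """Hypothetical ADMM update of eta_kl = theta_k - theta_l.

    Assumes the augmented Lagrangian term
    y_kl'(theta_k - theta_l - eta) + (rho / 2) ||theta_k - theta_l - eta||^2
    plus a penalty lam * ||eta||_2, whose minimizer over eta is
    group soft-thresholding of (theta_k - theta_l + y_kl / rho) at lam / rho.
    """
    diff = [a - b + y / rho for a, b, y in zip(theta_k, theta_l, y_kl)]
    return group_soft_threshold(diff, lam / rho)


# Two nearly identical client coefficient vectors are fused (difference set to zero).
print(fusion_update([1.0, 2.0], [1.1, 1.9], [0.0, 0.0], lam=1.0, rho=1.0))
```

The same operator can serve the group (variable-selection) penalty by applying it to each covariate's coefficient group; nonconvex penalties such as MCP would replace the shrinkage rule.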

    Structured analysis of the high-dimensional FMR model

    The finite mixture of regression (FMR) model is a popular tool for accommodating data heterogeneity. In the analysis of FMR models with high-dimensional covariates, it is necessary to conduct regularized estimation and to distinguish important covariates from noise. In the literature, little attention has been paid to the differences among important covariates, which can reveal the underlying structure of covariate effects. Specifically, important covariates can be classified into two types: those that behave the same in different subpopulations and those that behave differently. It is of interest to conduct structured analysis to identify such structures, which will enable researchers to better understand covariates and their associations with outcomes. Here, the FMR model with high-dimensional covariates is considered. A structured penalization approach is developed for regularized estimation, selection of important variables, and, equally importantly, identification of the underlying covariate effect structure. The proposed approach can be effectively realized, and its statistical properties are rigorously established. Simulation demonstrates its superiority over alternatives. In the analysis of cancer gene expression data, interesting models/structures missed by existing analyses are identified.
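
The covariate structure described above can be read off a coefficient matrix with one row per covariate and one column per subpopulation (mixture component). The sketch below is illustrative only: it assumes an already-estimated matrix and a hypothetical numerical tolerance `tol`, and it does not perform the structured penalized estimation itself.

```python
from typing import Dict, List


def classify_covariates(beta: List[List[float]], tol: float = 1e-8) -> Dict[int, str]:
    """Label each covariate of a p x K coefficient matrix.

    Rows are covariates, columns are subpopulations (mixture components).
    - "noise": zero effect in every component
    - "common": the same nonzero effect in every component
    - "differential": effects that differ across components
    """
    labels: Dict[int, str] = {}
    for j, row in enumerate(beta):
        if all(abs(b) <= tol for b in row):
            labels[j] = "noise"
        elif max(row) - min(row) <= tol:
            labels[j] = "common"
        else:
            labels[j] = "differential"
    return labels


# Hypothetical two-component example with three covariates.
print(classify_covariates([[0.0, 0.0], [1.5, 1.5], [0.8, -0.3]]))
```

In a penalized fit, the "common" rows are what a fusion-type penalty on within-covariate differences would produce, while a group penalty on each row drives "noise" rows to zero.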

    Health insurance coverage and impact: a survey in three cities in China.

    BACKGROUND: China has one of the world's largest health insurance systems, composed of government-run basic health insurance and commercial health insurance. The basic health insurance has undergone system-wide reform in recent years. Meanwhile, there has also been significant development in the commercial health insurance sector. A phone survey was conducted in three major cities in China in July and August 2011. The goal was to provide an updated description of the effect of health insurance on the population covered. Of special interest were insurance coverage, gross and out-of-pocket medical cost, and coping strategies. RESULTS: Records of 5,097 households were collected. Analysis showed that smaller households, higher income, lower expense, presence of at least one inpatient treatment, and living in rural areas were significantly associated with a lower overall coverage rate. In the separate analyses of basic and commercial health insurance, similar factors were found to have significant associations. Higher income, presence of chronic disease, presence of inpatient treatment, higher coverage rates, and living in urban areas were significantly associated with higher gross medical cost. A similar set of factors was significantly associated with higher out-of-pocket cost. Households with lower income, inpatient treatment, higher commercial insurance coverage, and living in rural areas were significantly more likely to pursue coping strategies other than salary. CONCLUSIONS: The surveyed cities and surrounding rural areas had socioeconomic status far above China's average. However, there was still a need to further improve coverage. Even for households with coverage, there was considerable out-of-pocket medical cost, particularly for households with inpatient treatments and/or chronic diseases. A small percentage of households were unable to self-finance out-of-pocket medical cost. Such observations suggest possible targets for further improving the health insurance system.

    New ratio DEA software for measuring efficiency of industrial departments

    As environmental problems become increasingly serious in the course of social development, they have drawn considerable attention from governments. Industry promotes economic development but at the same time produces substantial pollution, such as smoke and waste. Evaluating the efficiency of industrial departments helps district governments decide which departments should be prioritized for development. In this study, a new ratio model in data envelopment analysis (DEA) is proposed and applied to evaluate the industrial departments of Chongqing City, China. Some suggestions are also given. © 2012 ACADEMY PUBLISHER
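
For orientation, the classic ratio form of DEA with a single input and a single output scores each unit by its output/input ratio relative to the best ratio observed. The sketch below shows only that textbook case; the paper's "new ratio model" is not specified in the abstract, and this code does not handle multiple inputs/outputs or undesirable outputs such as pollution. The department data in the example are hypothetical.

```python
from typing import List


def ratio_efficiency(inputs: List[float], outputs: List[float]) -> List[float]:
    """Relative efficiency under the single-input, single-output ratio form of DEA.

    Efficiency of unit o = (y_o / x_o) / max_j (y_j / x_j); the best units score 1.0.
    """
    if len(inputs) != len(outputs) or not inputs:
        raise ValueError("inputs and outputs must be non-empty and of equal length")
    if any(x <= 0 for x in inputs):
        raise ValueError("inputs must be positive")
    ratios = [y / x for x, y in zip(inputs, outputs)]
    best = max(ratios)
    return [r / best for r in ratios]


# Hypothetical three departments: input (e.g. cost) and output (e.g. value added).
print(ratio_efficiency([2.0, 4.0, 5.0], [4.0, 6.0, 10.0]))
```

With several inputs and outputs, the weights are no longer fixed and each unit's score is usually obtained by solving a linear program instead.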

    Health Insurance Coverage and Impact: A Survey in Three Cities in China

    Funding: Fundamental Research Funds for the Central Universities [2010221040]; Fujian Social Science Funds from China [2011C042

    Three-part model for fractional response variables with application to Chinese household health insurance coverage

    A survey on health insurance was conducted in July and August 2011 in three major cities in China. In this study, we analyze the household coverage rate, an important index of the quality of health insurance. The coverage rate is restricted to the unit interval [0, 1], and it may differ from other rate data in that both endpoints carry probability mass; that is, there are nonzero probabilities of zero coverage and of full coverage. Such data may also be encountered in economics, finance, medicine, and many other areas. Existing approaches may not be able to properly accommodate such data. In this study, we develop a three-part model that properly describes fractional response variables with non-ignorable zeros and ones. We investigate estimation and inference under two proportional constraints on the regression parameters. Such constraints may lead to clearer interpretation, fewer unknown parameters, and hence more accurate estimation. A simulation study is conducted to compare the performance of constrained and unconstrained models and shows that estimation under the constraints can be more efficient. The analysis of household health insurance coverage data suggests that household size, income, expense, and presence of chronic disease are associated with insurance coverage. Funding: National Natural Science Foundation of China [71201139]; National Bureau of Statistics Funds [2011LD002]; Fundamental Research Funds for the Central Universities from China [2010221040].
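
A three-part model splits a response in [0, 1] into a point mass at 0, a point mass at 1, and a continuous part on (0, 1). The sketch below assumes one common specification: a multinomial logit for the three parts (interior as baseline, hypothetical coefficients `a0`, `a1`) and a beta density with logit mean (coefficients `g`, precision `phi`) for the interior. The paper's exact parameterization and its two proportional constraints are not given in the abstract and are not imposed here.

```python
import math
from typing import Sequence


def _dot(a: Sequence[float], b: Sequence[float]) -> float:
    return sum(x * y for x, y in zip(a, b))


def three_part_loglik(y: float, x: Sequence[float], a0: Sequence[float],
                      a1: Sequence[float], g: Sequence[float], phi: float) -> float:
    """Log-likelihood of one observation y in [0, 1] under an assumed three-part model.

    Parts 1 and 2: multinomial logit for {Y = 0, Y = 1, 0 < Y < 1}, with the
    interior category as baseline (coefficients a0 and a1).
    Part 3: given 0 < Y < 1, Y ~ Beta(mu * phi, (1 - mu) * phi), logit(mu) = x'g.
    """
    e0, e1 = math.exp(_dot(x, a0)), math.exp(_dot(x, a1))
    denom = 1.0 + e0 + e1
    p0, p1, p_mid = e0 / denom, e1 / denom, 1.0 / denom
    if y == 0.0:
        return math.log(p0)  # zero coverage
    if y == 1.0:
        return math.log(p1)  # full coverage
    if not 0.0 < y < 1.0:
        raise ValueError("y must lie in [0, 1]")
    mu = 1.0 / (1.0 + math.exp(-_dot(x, g)))
    a, b = mu * phi, (1.0 - mu) * phi
    # Beta log-density; lgamma(a + b) equals lgamma(phi) because a + b = phi.
    log_beta = (math.lgamma(phi) - math.lgamma(a) - math.lgamma(b)
                + (a - 1.0) * math.log(y) + (b - 1.0) * math.log1p(-y))
    return math.log(p_mid) + log_beta


# Intercept-only example: each part has probability 1/3, interior is Beta(1, 1).
print(three_part_loglik(0.5, [1.0], [0.0], [0.0], [0.0], 2.0))
```

Summing this function over households gives the sample log-likelihood that a maximizer (constrained or unconstrained) would work with.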

    Variable selection for credit risk model using data mining technique

    With the emergence of the current financial crisis, the importance of credit risk management in financial institutions has become increasingly evident. Four mainstream credit risk rating models have been developed; however, their applicability in the Taiwan market has yet to be evaluated. In this paper, six major credit risk models, namely the Merton Option Pricing Model, Discriminant Analysis Model, Logistic Regression (Logit) Model, Probit Model, Survival Analysis Model, and Artificial Neural Network Model, were examined in order to identify the common variables applicable to each model. The common variables were then applied directly to each model. Transition matrix and mapping methods were used to estimate long-term default probabilities, which were in turn used to develop an appropriate credit risk model. © 2011 ACADEMY PUBLISHER
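
A standard way to turn a one-year rating transition matrix into long-term default probabilities is to raise it to a power, assuming a time-homogeneous Markov chain with default as an absorbing last state. The sketch below shows that technique only; the paper's specific "mapping methods" are not described in the abstract, and the three-state matrix (two ratings plus default) is hypothetical.

```python
from typing import List

Matrix = List[List[float]]


def mat_mul(a: Matrix, b: Matrix) -> Matrix:
    """Plain matrix product of a (n x m) and b (m x p)."""
    m, p = len(b), len(b[0])
    return [[sum(row[k] * b[k][j] for k in range(m)) for j in range(p)] for row in a]


def cumulative_default_prob(p: Matrix, years: int) -> List[float]:
    """Probability of default within `years` for each starting rating.

    Assumes p is a row-stochastic one-period transition matrix whose last
    state is default and absorbing, so entry [i][-1] of p**years is the
    cumulative default probability from rating i.
    """
    n = len(p)
    result = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    for _ in range(years):
        result = mat_mul(result, p)
    return [row[-1] for row in result]


# Hypothetical ratings A, B and absorbing default D.
P = [[0.90, 0.08, 0.02],
     [0.10, 0.80, 0.10],
     [0.00, 0.00, 1.00]]
print(cumulative_default_prob(P, 2))
```

The Markov and time-homogeneity assumptions are strong; empirical transition matrices often violate them, which is one reason mapping or calibration steps are used in practice.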

    Out-of-pocket medical cost: univariate and multivariate logistic regressions.

    Numbers are “odds ratio (p-value)”. “Baseline” represents the reference group for OR calculation. Sample size = 5070.
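
For the univariate case with a binary factor, the logistic-regression odds ratio equals the cross-product ratio of a 2×2 table, and its Wald test uses the standard error sqrt(1/a + 1/b + 1/c + 1/d) on the log scale. The sketch below illustrates how an "odds ratio (p-value)" pair of that kind arises; the counts in the example are hypothetical, and the multivariate (covariate-adjusted) estimates in the table require fitting a full logistic regression instead.

```python
import math
from typing import Tuple


def odds_ratio_wald(a: int, b: int, c: int, d: int) -> Tuple[float, float]:
    """Odds ratio and two-sided Wald p-value from a 2x2 table.

    a, b: comparison group with / without the outcome
    c, d: baseline (reference) group with / without the outcome
    """
    if min(a, b, c, d) == 0:
        raise ValueError("zero cell: Wald odds ratio undefined without a correction")
    log_or = math.log((a * d) / (b * c))
    se = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    z = log_or / se
    # Two-sided p-value: 2 * (1 - Phi(|z|)) = erfc(|z| / sqrt(2)).
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return math.exp(log_or), p_value


# Hypothetical counts: odds ratio 4.0 with p-value roughly 0.011.
print(odds_ratio_wald(20, 10, 10, 20))
```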