
    A Comprehensive Empirical Study of Bugs in Open-Source Federated Learning Frameworks

    Federated learning (FL) is a distributed machine learning (ML) paradigm that allows multiple clients to collaboratively train shared ML models without exposing their private data. It has gained substantial popularity in recent years, especially since data protection laws and regulations came into force in many countries. To foster the application of FL, a variety of FL frameworks have been proposed that allow non-experts to easily train ML models. Understanding bugs in FL frameworks is therefore critical for developing better FL frameworks and could encourage the development of bug detection, localization, and repair tools. We conduct the first empirical study to comprehensively collect, taxonomize, and characterize bugs in FL frameworks. Specifically, we manually collect and classify 1,119 bugs from all 676 closed issues and 514 merged pull requests in 17 popular and representative open-source FL frameworks on GitHub. We classify these bugs into 12 bug symptoms, 12 root causes, and 18 fix patterns, and we study their correlations and distributions across 23 functionalities. We identify nine major findings and discuss their implications and future research directions.
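To make the FL paradigm described above concrete, here is a minimal sketch of server-side federated averaging (FedAvg), one widely used aggregation rule. It is not drawn from the study, which analyzes framework bugs rather than proposing an algorithm. Clients send only model parameters, never raw data, and the server weights each client's parameters by its local sample count. The function name `federated_average` and the flat list-of-floats model representation are illustrative assumptions.

```python
from typing import List, Sequence


def federated_average(client_weights: Sequence[Sequence[float]],
                      client_sizes: Sequence[int]) -> List[float]:
    """Hypothetical FedAvg aggregation: average client parameter vectors,
    weighting each client by the number of local training samples it used."""
    if not client_weights or len(client_weights) != len(client_sizes):
        raise ValueError("need one weight vector per client")
    total = sum(client_sizes)
    if total <= 0:
        raise ValueError("total sample count must be positive")
    dim = len(client_weights[0])
    # Every client must send parameters for the same model architecture.
    if any(len(w) != dim for w in client_weights):
        raise ValueError("all clients must share the same model shape")
    global_weights = [0.0] * dim
    for weights, n in zip(client_weights, client_sizes):
        for i, w in enumerate(weights):
            global_weights[i] += w * n / total
    return global_weights
```

For example, `federated_average([[1.0, 2.0], [3.0, 4.0]], [1, 3])` returns `[2.5, 3.5]`, because the second client holds three quarters of the samples. The explicit shape check is a design choice: silently aggregating mismatched client updates would corrupt the global model.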

    Time Kinetics and prognosis roles of calcitonin after surgery for medullary thyroid carcinoma

    Background: Medullary thyroid carcinoma (MTC) is a malignant tumor with low incidence. Most studies to date have focused on prognostic risk factors for MTC; however, the time kinetics of calcitonin normalization (CN) and biochemical persistence/recurrence (BP), and the risk factors related to them, have yet to be elucidated. Methods: A retrospective study of 190 MTC patients was conducted. Risk factors related to CN and BP were analyzed, predictors of calcitonin normalization time (CNT) and biochemical persistence/recurrence time (BPT) were identified, and the prognostic roles of CNT and BPT were evaluated. Results: The 5- and 10-year disease-free survival (DFS) rates were 86.7% and 70.2%, respectively, and the 5- and 10-year overall survival (OS) rates were 97.6% and 78.8%, respectively. CN was achieved in 120 (63.2%) patients, whereas BP was present in 76 (40.0%) patients at the last follow-up. After curative surgery, 39 (32.5%) and 106 (88.3%) of the patients who achieved CN did so within 1 week and 1 month, respectively. All patients who failed to achieve CN progressed to BP over time, and 32 of these 70 developed structural recurrence. The median CNT and BPT were 1 month (range, 1 day to 84 months) and 6 months (range, 3 days to 63 months), respectively. A lymph node ratio (LNR) > 0.23 and male sex were independent predictors of CN and BP. LNR > 0.23 (hazard ratio [HR], 0.24; 95% CI, 0.13–0.46; P  0.23 (HR, 5.10; 95% CI, 2.15–12.11; P  1400ng/L (HR, 2.34; 95% CI, 1.29–4.25; P = 0.005) for shorter BPT. In survival analysis, primary tumor size > 2 cm (HR, 5.81; 95% CI, 2.20–15.38; P  1 month (HR, 5.69; 95% CI, 1.17–27.61; P = 0.031) and multifocality (HR, 3.10; 95% CI, 1.45–6.65; P = 0.004) were independent predictors of DFS. Conclusion: Early changes in calcitonin (Ctn) after curative surgery can predict the long-term risks of biochemical and structural recurrence, providing useful real-time prognostic information. LNR significantly affects the time kinetics of biochemical prognosis.
Tumor burden and CNT play a crucial role in MTC survival, so the intensity of follow-up should be tailored accordingly.
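The abstract reports 5- and 10-year DFS and OS rates but does not state how they were estimated. The Kaplan-Meier product-limit estimator is a standard choice for survival rates with censored follow-up, so the sketch below assumes it. The function names and the toy cohort are hypothetical and are not the study's data.

```python
from typing import List, Sequence, Tuple


def kaplan_meier(times: Sequence[float], events: Sequence[int]) -> List[Tuple[float, float]]:
    """Return the (time, survival probability) steps of the Kaplan-Meier estimator.

    times  -- follow-up time per patient (e.g. months after surgery)
    events -- 1 if the event (e.g. recurrence) was observed, 0 if censored
    """
    if len(times) != len(events):
        raise ValueError("times and events must have the same length")
    data = sorted(zip(times, events))
    at_risk = len(data)
    survival = 1.0
    curve: List[Tuple[float, float]] = []
    i = 0
    while i < len(data):
        t = data[i][0]
        n_events = 0
        n_leaving = 0
        # Process every patient whose follow-up ends at time t together.
        while i < len(data) and data[i][0] == t:
            n_events += data[i][1]
            n_leaving += 1
            i += 1
        if n_events:
            # Multiply by the conditional probability of remaining event-free past t.
            survival *= 1.0 - n_events / at_risk
            curve.append((t, survival))
        # Patients with events and censored patients both leave the risk set after t.
        at_risk -= n_leaving
    return curve


def survival_at(curve: Sequence[Tuple[float, float]], t: float) -> float:
    """Read the step function at time t (1.0 before the first event)."""
    prob = 1.0
    for time, s in curve:
        if time > t:
            break
        prob = s
    return prob
```

For a toy cohort with `times = [6, 12, 12, 24, 30, 60]` (months) and `events = [1, 0, 1, 1, 0, 0]`, the curve steps down to about 0.833, 0.667, and 0.444 at months 6, 12, and 24, so `survival_at(curve, 60)`, a 5-year estimate, is about 0.444.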

    A dynamic machine learning model for prediction of NAFLD in a health checkup population: A longitudinal study

    Background: Non-alcoholic fatty liver disease (NAFLD) is one of the most common liver diseases worldwide. Most current NAFLD prediction models are diagnostic models based on cross-sectional data, which cannot provide early identification or clarify causal relationships. We aimed to use time-series deep learning models with longitudinal health checkup records to predict the future onset of NAFLD, and to update the model stepwise by incorporating new checkup records to achieve dynamic prediction. Methods: A retrospective cohort study was conducted in 10,493 participants with over 6 health checkup records from Beijing MJ Health Screening Center, in which data from the first 5 checkups were incorporated stepwise to predict the risk of NAFLD at and after the sixth checkup. A total of 33 variables were considered, covering demographic characteristics, medical history, lifestyle, physical examinations, and laboratory tests. L1-penalized logistic regression (LR) was used for feature selection. The long short-term memory (LSTM) algorithm was used for model development, and five-fold cross-validation was used to tune and select optimal hyperparameters. Model performance was evaluated through both internal and external validation, using a randomly selected 20% holdout test dataset and previously unseen data from Shanghai MJ Health Screening Center, respectively. The evaluation metrics included area under the receiver operating characteristic curve (AUROC), sensitivity, specificity, Brier score, and decision curve analysis. Bootstrap sampling was used to generate 95% confidence intervals for all metrics. Finally, the Shapley additive explanations (SHAP) algorithm was applied to the holdout test dataset to obtain time-specific and sample-specific contributions of each feature for model interpretability.
Results: Among the 10,493 participants, 1,662 (15.84%) were diagnosed with NAFLD at or after their sixth health checkup. In the internal validation dataset, the predictive performance of the deep learning model improved as checkups were incorporated, with AUROC increasing from 0.729 (95% CI: 0.698, 0.760) at baseline to 0.818 (95% CI: 0.798, 0.844) when 5 consecutive checkups were included. The external validation dataset, containing 1,728 participants, confirmed these results: AUROC increased from 0.700 (95% CI: 0.657, 0.740) with only the first checkup to 0.792 (95% CI: 0.758, 0.825) with all five. Body fat percentage, alanine transaminase (ALT), and uric acid had the greatest impact on the outcome, and time-specific, individual-specific, and dynamic feature contributions were also produced for model interpretability. Conclusion: We established a dynamic prediction model whose predictive capability kept improving as the latest checkup records were added. In addition, we identified key features associated with the onset of NAFLD, which could help optimize strategies for preventing and controlling the disease in the general population.
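The Methods state that bootstrap sampling produced 95% CIs for all metrics but not which bootstrap variant was used. The sketch below assumes the simple percentile bootstrap for AUROC, computed here as the Mann-Whitney probability, and accepts any model's predicted probabilities, including an LSTM's. The function names, resample count, and seed are illustrative assumptions.

```python
import random
from typing import Sequence, Tuple


def auroc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """AUROC as the probability that a random positive case scores higher than a
    random negative case (ties count as one half), i.e. the Mann-Whitney statistic."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative case")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def bootstrap_auroc_ci(labels: Sequence[int], scores: Sequence[float],
                       n_boot: int = 1000, alpha: float = 0.05,
                       seed: int = 0) -> Tuple[float, float, float]:
    """Return (point estimate, lower, upper) using a percentile bootstrap."""
    point = auroc(labels, scores)  # also checks that both classes are present
    rng = random.Random(seed)
    n = len(labels)
    stats = []
    while len(stats) < n_boot:
        # Resample participants with replacement.
        idx = rng.choices(range(n), k=n)
        ys = [labels[i] for i in idx]
        if 0 not in ys or 1 not in ys:
            continue  # AUROC is undefined without both classes; redraw
        stats.append(auroc(ys, [scores[i] for i in idx]))
    stats.sort()
    lower = stats[int(n_boot * alpha / 2)]
    upper = stats[int(n_boot * (1 - alpha / 2)) - 1]
    return point, lower, upper
```

Resamples that contain only one class are redrawn because AUROC is undefined for them; with thousands of participants such draws are practically impossible. The pairwise AUROC loop is quadratic in cohort size, so a rank-based computation would be preferable for large datasets.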

    Combinatorial Use of Machine Learning and Logistic Regression for Predicting Carotid Plaque Risk Among 5.4 Million Adults With Fatty Liver Disease Receiving Health Check-Ups: Population-Based Cross-Sectional Study

    Background: Carotid plaque can progress to stroke, myocardial infarction, and other events that are major global causes of death. Evidence shows a significant increase in carotid plaque incidence among patients with fatty liver disease. However, whereas fatty liver disease has a high detection rate, screening for carotid plaque in the asymptomatic population is not yet common for cost-effectiveness reasons, leaving many patients, especially those with fatty liver disease, with undetected carotid plaques. Objective: This study aimed to combine the advantages of machine learning (ML) and logistic regression to develop a straightforward model for identifying individuals with fatty liver disease who are at risk of carotid plaque. Methods: Our study included 5,420,640 participants with fatty liver from Meinian Health Care Center. We used random forest, elastic net (EN), and extreme gradient boosting ML algorithms to select important features from potential predictors. Features selected by all 3 models were entered into a logistic regression analysis to develop a carotid plaque prediction model. Model performance was evaluated using the area under the receiver operating characteristic curve, calibration curve, Brier score, and decision curve analysis, in both a randomly split internal validation data set and an external validation data set of 32,682 participants from MJ Health Check-up Center. Risk cutoff points for carotid plaque were determined based on the Youden index, the predicted probability distribution, and the prevalence rate in the internal validation data set, and were used to classify participants into high-, intermediate-, and low-risk groups. This risk classification was further validated in the external validation data set. Results: Carotid plaque was diagnosed in 26.23% (1,421,970/5,420,640) of participants in the development data set and in 21.64% (7,074/32,682) of participants in the external validation data set.
Of 27 candidate predictors, 6 features were selected by all 3 ML models: age, systolic blood pressure, low-density lipoprotein cholesterol (LDL-C), total cholesterol, fasting blood glucose, and hepatic steatosis index (HSI). After collinearity between features was addressed, the logistic regression model built on the 5 remaining independent predictors reached an area under the curve of 0.831 in the internal validation data set and 0.801 in the external validation data set, and showed good calibration graphically. Its predictive performance was competitive overall with that of logistic regression or ML algorithms used alone. Optimal predicted probability cutoff points of 25% and 65% were determined for classifying individuals into low-, intermediate-, and high-risk categories for carotid plaque. Conclusions: Combining ML and logistic regression yielded a practical carotid plaque prediction model with substantial public health implications for the early identification and risk assessment of carotid plaque among individuals with fatty liver.
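The Methods derive the risk cutoff points partly from the Youden index. As a hedged illustration, the sketch below scans candidate thresholds for the maximum of Youden's J (sensitivity + specificity - 1) and then applies the 25% and 65% cutoff points reported in the Results. Because the study also weighed the predicted probability distribution and prevalence, Youden's J alone would not necessarily reproduce those cutoffs. The function names, and the choice to place a probability exactly at a cutoff in the higher group, are assumptions.

```python
from typing import Sequence, Tuple


def youden_cutoff(labels: Sequence[int], probs: Sequence[float]) -> Tuple[float, float]:
    """Return (threshold, J) maximizing Youden's J = sensitivity + specificity - 1.

    A participant is classed as positive when prob >= threshold; labels are 0/1.
    """
    n_pos = sum(1 for y in labels if y == 1)
    n_neg = sum(1 for y in labels if y == 0)
    if n_pos == 0 or n_neg == 0:
        raise ValueError("need both positive and negative participants")
    best_t, best_j = None, float("-inf")
    for t in sorted(set(probs)):
        tp = sum(1 for y, p in zip(labels, probs) if y == 1 and p >= t)
        tn = sum(1 for y, p in zip(labels, probs) if y == 0 and p < t)
        j = tp / n_pos + tn / n_neg - 1.0
        if j > best_j:  # strict comparison keeps the lowest threshold among ties
            best_t, best_j = t, j
    return best_t, best_j


def risk_group(prob: float, low_cut: float = 0.25, high_cut: float = 0.65) -> str:
    """Three-tier classification; defaults are the 25% and 65% cutoffs reported above."""
    if prob < low_cut:
        return "low"
    if prob < high_cut:
        return "intermediate"
    return "high"
```

For perfectly separated toy data such as labels `[0, 0, 0, 1, 1, 1]` with probabilities `[0.1, 0.2, 0.3, 0.6, 0.7, 0.9]`, `youden_cutoff` returns `(0.6, 1.0)`; `risk_group(0.5)` returns `"intermediate"`.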

    Prevalence of Liver Steatosis and Fibrosis in the General Population and Various High-Risk Populations: A Nationwide Study With 5.7 Million Adults in China

    BACKGROUND & AIMS: This study aimed to estimate the prevalence of liver steatosis and fibrosis in the general population and in populations with potential risk factors in China, to inform policies for the screening and management of fatty liver disease and liver fibrosis in general and high-risk populations. METHODS: This cross-sectional, population-based, nationwide study used the database of the largest health check-up chain in China. Adults from 30 provinces who underwent a check-up between 2017 and 2022 were included. Steatosis and fibrosis were assessed and graded by transient elastography. Overall and stratified prevalence was estimated for the general population and for subpopulations with demographic, cardiovascular, and chronic liver disease risk factors. A mixed-effects regression model was used to identify predictors independently associated with steatosis and fibrosis. RESULTS: Among 5,757,335 participants, the prevalence of steatosis, severe steatosis, advanced fibrosis, and cirrhosis was 44.39%, 10.57%, 2.85%, and 0.87%, respectively. Male participants and those with obesity, diabetes, hypertension, dyslipidemia, metabolic syndrome, or elevated alanine aminotransferase or aspartate aminotransferase had a significantly higher prevalence of all grades of steatosis and fibrosis, and participants with fatty liver, decreased albumin or platelet count, or hepatitis B virus infection also had a significantly higher prevalence of fibrosis than their healthy counterparts. Most cardiovascular and chronic liver disease risk factors were independent predictors of both steatosis and fibrosis; the exception was dyslipidemia, which was not an independent predictor of fibrosis. CONCLUSIONS: China has a substantial burden of liver steatosis and fibrosis. Our study provides evidence for shaping future pathways for screening and risk stratification of liver steatosis and fibrosis in the general population.
These findings indicate that fatty liver and liver fibrosis should be included in disease management programs as targets for screening and regular monitoring in high-risk populations, especially in people with diabetes.
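The Methods describe overall and stratified prevalence estimates. A minimal sketch of that computation follows: count cases and participants per subgroup, divide, and attach a 95% interval. The abstract does not say how intervals were computed or whether any weighting was applied, so the Wilson score interval, the unweighted counts, and the record field names here are all assumptions.

```python
import math
from collections import defaultdict
from typing import Dict, Iterable, Mapping, Tuple


def wilson_interval(cases: int, total: int, z: float = 1.96) -> Tuple[float, float]:
    """Wilson score interval for a proportion (about 95% coverage with z = 1.96)."""
    if total <= 0:
        raise ValueError("total must be positive")
    p = cases / total
    denom = 1.0 + z * z / total
    center = (p + z * z / (2 * total)) / denom
    half = z * math.sqrt(p * (1 - p) / total + z * z / (4 * total * total)) / denom
    return center - half, center + half


def stratified_prevalence(records: Iterable[Mapping[str, object]], group_key: str,
                          outcome_key: str) -> Dict[object, Tuple[float, float, float]]:
    """Return {group: (prevalence, lower, upper)} for each level of group_key."""
    counts = defaultdict(lambda: [0, 0])  # group -> [cases, participants]
    for r in records:
        tally = counts[r[group_key]]
        tally[0] += 1 if r[outcome_key] else 0
        tally[1] += 1
    result = {}
    for group, (cases, total) in counts.items():
        low, high = wilson_interval(cases, total)
        result[group] = (cases / total, low, high)
    return result
```

Overall prevalence is the same calculation with a constant group key. At the study's scale (millions of participants), the intervals would be extremely narrow, which is consistent with the abstract reporting point estimates only.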

    Association between Dietary Patterns and the Risk of Depressive Symptoms in the Older Adults in Rural China

    Geriatric depression, a chronic condition, has become a substantial burden in rural China. This study aimed to assess the association between dietary patterns and the risk of geriatric depression in rural China. Between March 2018 and June 2019, 3,304 participants were recruited for this cross-sectional study in rural Tianjin, China. Principal component analysis was used to identify the major dietary patterns, and the associations between dietary patterns and the risk of geriatric depression were assessed using a logistic regression model. Four dietary patterns were identified: vegetable-fruit, animal food, processed food, and milk-egg. The vegetable-fruit pattern (Q2 vs. Q1: OR = 0.62, 95% CI: 0.46–0.83; Q3 vs. Q1: OR = 0.54, 95% CI: 0.38–0.75; Q4 vs. Q1: OR = 0.39, 95% CI: 0.26–0.57) and the animal food pattern (Q3 vs. Q1: OR = 0.69, 95% CI: 0.50–0.95; Q4 vs. Q1: OR = 0.58, 95% CI: 0.41–0.82) were associated with a decreased risk of depression, whereas the inflammatory dietary pattern (Q2 vs. Q1: OR = 1.71, 95% CI: 1.23–2.38; Q3 vs. Q1: OR = 1.70, 95% CI: 1.22–2.36; Q4 vs. Q1: OR = 1.44, 95% CI: 1.03–2.03) was associated with an increased risk of depression. These findings reinforce the importance of an adequate diet consisting of vegetables, fruit, and animal foods, with limited intake of pro-inflammatory foods, to decrease the risk of depression.
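Each "Q vs. Q1" result above is an odds ratio (OR) with a 95% CI. The study obtained these from a logistic regression model, which may adjust for covariates the abstract does not list. The sketch below instead shows the simpler crude OR from a 2×2 table with a Woolf (log-scale) interval, only to illustrate what the reported numbers mean; it is not the study's method, and the function name and counts are hypothetical.

```python
import math
from typing import Tuple


def odds_ratio_woolf(a: int, b: int, c: int, d: int,
                     z: float = 1.96) -> Tuple[float, float, float]:
    """Crude odds ratio with a Woolf (log-scale) confidence interval.

    a: cases in the comparison quartile    b: non-cases in the comparison quartile
    c: cases in the reference quartile     d: non-cases in the reference quartile
    """
    if min(a, b, c, d) <= 0:
        raise ValueError("all four cells must be positive for the Woolf interval")
    odds_ratio = (a * d) / (b * c)
    # Standard error of log(OR) from the four cell counts.
    se_log_or = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    log_or = math.log(odds_ratio)
    return (odds_ratio,
            math.exp(log_or - z * se_log_or),
            math.exp(log_or + z * se_log_or))
```

With 20 of 100 participants depressed in a comparison quartile and 40 of 100 in Q1, `odds_ratio_woolf(20, 80, 40, 60)` gives OR = 0.375 with a 95% CI of roughly 0.20 to 0.71. An interval that excludes 1, as for the vegetable-fruit Q4 estimate, indicates a statistically significant association at the 5% level.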

    The Association between Leukocyte and Its Subtypes and Benign Breast Disease: The TCLSIH Cohort Study

    Inflammation plays a crucial role in the formation of benign breast disease. Given the limited research on the association between leukocytes, an indicator of the immune system, and benign breast disease, we used data from a large cross-sectional study to investigate the association between leukocytes and their subtypes and benign breast disease among women in the general population. The data were derived from the baseline of the Tianjin chronic low-grade systemic inflammation and health (TCLSIH) cohort study, collected between 2014 and 2016. Breast thickness and nodule status were assessed using ultrasonography. Leukocyte and subtype counts were measured with an automated hematology analyzer. Multiple logistic regression analysis was used to examine the association between leukocytes and their subtypes and the prevalence of benign breast disease. The prevalence of benign breast disease was 20.9%. After adjustment for potential confounders, the odds ratios (95% confidence intervals) for benign breast disease across lymphocyte quintiles were 1.00 (reference), 0.99 (0.82, 1.2), 0.85 (0.69, 1.04), 0.84 (0.68, 1.02), and 0.75 (0.61, 0.92) (P for trend = 0.002). Lymphocyte counts were inversely associated with benign breast disease, whereas total leukocyte and other subtype counts were not associated with it. Further prospective studies are needed to confirm these findings.
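The lymphocyte results are reported by quintile with a P for trend. The abstract does not describe the trend test; one common convention assigns each participant the median of their quintile and enters that score as a continuous term in the logistic model. The sketch below covers only the quintile assignment and median scoring steps under that assumption, using the sample's own cut points; the function names are hypothetical.

```python
from statistics import median, quantiles
from typing import Dict, List, Sequence


def quintile_labels(values: Sequence[float]) -> List[int]:
    """Assign each value to quintile 1..5 using the sample's own cut points.
    A value equal to a cut point stays in the lower quintile."""
    cuts = quantiles(values, n=5)  # four cut points
    labels = []
    for v in values:
        q = 1
        for c in cuts:
            if v > c:
                q += 1
        labels.append(q)
    return labels


def quintile_median_scores(values: Sequence[float],
                           labels: Sequence[int]) -> Dict[int, float]:
    """Median of each quintile; these scores would be the trend-test covariate."""
    groups: Dict[int, List[float]] = {}
    for v, q in zip(values, labels):
        groups.setdefault(q, []).append(v)
    return {q: median(vs) for q, vs in sorted(groups.items())}
```

For values 1 to 10, `quintile_labels` gives `[1, 1, 2, 2, 3, 3, 4, 4, 5, 5]` and the median scores are 1.5, 3.5, 5.5, 7.5, and 9.5. Replacing each participant's count with their quintile median keeps the trend test on the original measurement scale rather than on arbitrary 1 to 5 codes.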