    Global and Local Two-Sample Tests via Regression

    Two-sample testing is a fundamental problem in statistics. Despite its long history, there has been renewed interest in this problem with the advent of high-dimensional and complex data. Specifically, in the machine learning literature, there have been recent methodological developments such as classification accuracy tests. The goal of this work is to present a regression approach to comparing multivariate distributions of complex data. Depending on the chosen regression model, our framework can efficiently handle different types of variables and various structures in the data, with competitive power under many practical scenarios. Whereas previous work has been largely limited to global tests, which conceal much of the local information, our approach naturally leads to a local two-sample testing framework in which we identify local differences between multivariate distributions with statistical confidence. We demonstrate the efficacy of our approach both theoretically and empirically, under some well-known parametric and nonparametric regression methods. The proposed methods are applied to simulated data as well as a challenging astronomy data set to assess their practical usefulness.
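    The classifier-based flavour of this idea can be sketched in a few lines: pool the two samples, regress the group label on the features, and compare held-out predictive accuracy against a label-permutation null. The sketch below is only an illustration of that general recipe, not the authors' method; the choice of logistic regression, cross-validated accuracy as the test statistic, and the permutation scheme are all assumptions.

```python
# Illustrative sketch of a regression/classification-based global two-sample test.
# Assumptions (not from the paper): logistic regression as the regression model,
# 5-fold cross-validated accuracy as the statistic, and a label-permutation null.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

def regression_two_sample_test(X1, X2, n_permutations=500, seed=0):
    """Test H0: samples X1 and X2 come from the same distribution."""
    rng = np.random.default_rng(seed)
    X = np.vstack([X1, X2])
    y = np.concatenate([np.zeros(len(X1)), np.ones(len(X2))])

    def cv_accuracy(labels):
        clf = LogisticRegression(max_iter=1000)
        return cross_val_score(clf, X, labels, cv=5).mean()

    observed = cv_accuracy(y)
    # Under H0 the group labels are exchangeable, so shuffling them samples the null.
    null = np.array([cv_accuracy(rng.permutation(y)) for _ in range(n_permutations)])
    p_value = (1 + np.sum(null >= observed)) / (1 + n_permutations)
    return observed, p_value
```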

    Exploring the application and challenges of fNIRS technology in early detection of Parkinson’s disease

    Background: Parkinson’s disease (PD) is a prevalent neurodegenerative disorder that significantly benefits from early diagnosis for effective disease management and intervention. Despite advancements in medical technology, there remains a critical gap in the early, non-invasive detection of PD. Current diagnostic methods are often invasive, expensive, or identify the disease late, leading to missed opportunities for early intervention. Objective: The goal of this study is to explore the efficiency and accuracy of combining fNIRS technology with machine learning algorithms in diagnosing early-stage PD patients and to evaluate the feasibility of this approach in clinical practice. Methods: Using an ETG-4000 near-infrared brain function imaging instrument, data were collected from 120 PD patients and 60 healthy controls. This cross-sectional study employed a multi-channel mode to monitor cerebral blood oxygen changes. The collected data were processed using a general linear model and β values were extracted. Subsequently, four types of machine learning models were developed for analysis: support vector machine (SVM), K-nearest neighbors (K-NN), random forest (RF), and logistic regression (LR). Additionally, SHapley Additive exPlanations (SHAP) was applied to enhance model interpretability. Results: The SVM model demonstrated the highest accuracy of the four models in differentiating between PD patients and the control group (accuracy of 85%, F1 score of 0.85, and an area under the ROC curve of 0.95). SHAP analysis identified the four most contributory channels (CH) as CH01, CH04, CH05, and CH08. Conclusion: The SVM-based model exhibited good diagnostic performance in the early detection of PD. Future early diagnosis of PD should focus on the frontopolar cortex (FPC) region.
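    As a rough illustration of such a pipeline, the sketch below trains an SVM on a subjects-by-channels matrix of GLM β values and ranks channel contributions with KernelSHAP. It assumes the β matrix, binary labels, and channel names already exist; the train/test split, kernel, and hyperparameters are placeholders rather than the study's settings.

```python
# Minimal sketch, assuming `betas` is a (subjects x channels) array of GLM beta
# values, `y` holds binary labels (1 = PD, 0 = control), and `channel_names`
# lists the fNIRS channels. All modelling choices here are illustrative.
import numpy as np
import shap
from sklearn.svm import SVC
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score, f1_score, roc_auc_score

def evaluate_svm_with_shap(betas, y, channel_names):
    X_tr, X_te, y_tr, y_te = train_test_split(
        betas, y, test_size=0.3, stratify=y, random_state=0)
    clf = SVC(kernel="rbf", probability=True, random_state=0).fit(X_tr, y_tr)

    pred = clf.predict(X_te)
    proba = clf.predict_proba(X_te)[:, 1]
    print("accuracy:", accuracy_score(y_te, pred))
    print("F1      :", f1_score(y_te, pred))
    print("ROC AUC :", roc_auc_score(y_te, proba))

    # KernelSHAP on a small background sample to rank channel contributions.
    background = shap.sample(X_tr, 50)
    explainer = shap.KernelExplainer(lambda X: clf.predict_proba(X)[:, 1], background)
    shap_values = explainer.shap_values(X_te)
    importance = np.abs(shap_values).mean(axis=0)
    return sorted(zip(channel_names, importance), key=lambda t: -t[1])
```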

    Statistical independence for the evaluation of classifier-based diagnosis

    Machine learning techniques are increasingly adopted in computer-aided diagnosis. Evaluation methods for classification results that are based on the study of one or more metrics can be unable to distinguish between cases in which the classifier is discriminating the classes from cases in which it is not. In the binary setting, such circumstances can be encountered when data are unbalanced with respect to the diagnostic groups. Having more healthy controls than pathological subjects, datasets meant for diagnosis frequently show a certain degree of unbalancedness. In this work, we propose to recast the evaluation of classification results as a test of statistical independence between the predicted and the actual diagnostic groups. We address the problem within the Bayesian hypothesis testing framework. Different from the standard metrics, the proposed method is able to handle unbalanced data and takes into account the size of the available data. We show experimental evidence of the efficacy of the approach both on simulated data and on real data about the diagnosis of Attention Deficit Hyperactivity Disorder (ADHD).
    Olivetti, Emanuele; Greiner, Susanne; Avesani, Paolo
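    One way to make the idea concrete is to compare, on the 2x2 confusion matrix, the marginal likelihood of a model in which predictions depend on the true class against an independence model, under Dirichlet priors. The sketch below is a generic Bayes-factor computation of that kind, with flat Dirichlet(1) priors chosen for illustration; it is not necessarily the exact model or priors used by the authors.

```python
# Illustrative sketch: Bayes factor for "predictions depend on the true class"
# vs. "predictions are independent of the true class", on a 2x2 confusion matrix.
# Flat Dirichlet(1) priors are an assumption made for illustration only.
import numpy as np
from scipy.special import gammaln

def log_dirmult(counts, alpha):
    """Log Multinomial-Dirichlet marginal likelihood of `counts` under prior `alpha`,
    omitting the multinomial coefficient (it cancels in the Bayes factor)."""
    counts, alpha = np.asarray(counts, float), np.asarray(alpha, float)
    return (gammaln(alpha.sum()) - gammaln(alpha.sum() + counts.sum())
            + np.sum(gammaln(alpha + counts) - gammaln(alpha)))

def log_bayes_factor_dependence(confusion):
    """Log Bayes factor (dependence vs. independence) for a 2x2 confusion matrix."""
    confusion = np.asarray(confusion, float)
    cells = confusion.ravel()
    rows, cols = confusion.sum(axis=1), confusion.sum(axis=0)
    log_m_dep = log_dirmult(cells, np.ones_like(cells))        # joint cell model
    log_m_ind = (log_dirmult(rows, np.ones_like(rows))          # product of the
                 + log_dirmult(cols, np.ones_like(cols)))       # two margins
    return log_m_dep - log_m_ind

# Example: a clearly informative classifier yields a positive log Bayes factor.
# print(log_bayes_factor_dependence([[40, 10], [15, 35]]))
```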
