42 research outputs found

    Asymptotic Characterisation of Robust Empirical Risk Minimisation Performance in the Presence of Outliers

    Full text link
    We study robust linear regression in high-dimension, when both the dimension dd and the number of data points nn diverge with a fixed ratio α=n/d\alpha=n/d, and study a data model that includes outliers. We provide exact asymptotics for the performances of the empirical risk minimisation (ERM) using ℓ2\ell_2-regularised ℓ2\ell_2, ℓ1\ell_1, and Huber loss, which are the standard approach to such problems. We focus on two metrics for the performance: the generalisation error to similar datasets with outliers, and the estimation error of the original, unpolluted function. Our results are compared with the information theoretic Bayes-optimal estimation bound. For the generalization error, we find that optimally-regularised ERM is asymptotically consistent in the large sample complexity limit if one perform a simple calibration, and compute the rates of convergence. For the estimation error however, we show that due to a norm calibration mismatch, the consistency of the estimator requires an oracle estimate of the optimal norm, or the presence of a cross-validation set not corrupted by the outliers. We examine in detail how performance depends on the loss function and on the degree of outlier corruption in the training set and identify a region of parameters where the optimal performance of the Huber loss is identical to that of the ℓ2\ell_2 loss, offering insights into the use cases of different loss functions

    Imaging-based representation and stratification of intra-tumor heterogeneity via tree-edit distance

    Get PDF
    Personalized medicine is the future of medical practice. In oncology, tumor heterogeneity assessment represents a pivotal step for effective treatment planning and prognosis prediction. Despite new procedures for DNA sequencing and analysis, non-invasive methods for tumor characterization are needed to impact on daily routine. On purpose, imaging texture analysis is rapidly scaling, holding the promise to surrogate histopathological assessment of tumor lesions. In this work, we propose a tree-based representation strategy for describing intra-tumor heterogeneity of patients affected by metastatic cancer. We leverage radiomics information extracted from PET/CT imaging and we provide an exhaustive and easily readable summary of the disease spreading. We exploit this novel patient representation to perform cancer subtyping according to hierarchical clustering technique. To this purpose, a new heterogeneity-based distance between trees is defined and applied to a case study of prostate cancer. Clusters interpretation is explored in terms of concordance with severity status, tumor burden and biological characteristics. Results are promising, as the proposed method outperforms current literature approaches. Ultimately, the proposed method draws a general analysis framework that would allow to extract knowledge from daily acquired imaging data of patients and provide insights for effective treatment planning

    Impact of COVID-19 on cardiovascular testing in the United States versus the rest of the world

    Get PDF
    Objectives: This study sought to quantify and compare the decline in volumes of cardiovascular procedures between the United States and non-US institutions during the early phase of the coronavirus disease-2019 (COVID-19) pandemic. Background: The COVID-19 pandemic has disrupted the care of many non-COVID-19 illnesses. Reductions in diagnostic cardiovascular testing around the world have led to concerns over the implications of reduced testing for cardiovascular disease (CVD) morbidity and mortality. Methods: Data were submitted to the INCAPS-COVID (International Atomic Energy Agency Non-Invasive Cardiology Protocols Study of COVID-19), a multinational registry comprising 909 institutions in 108 countries (including 155 facilities in 40 U.S. states), assessing the impact of the COVID-19 pandemic on volumes of diagnostic cardiovascular procedures. Data were obtained for April 2020 and compared with volumes of baseline procedures from March 2019. We compared laboratory characteristics, practices, and procedure volumes between U.S. and non-U.S. facilities and between U.S. geographic regions and identified factors associated with volume reduction in the United States. Results: Reductions in the volumes of procedures in the United States were similar to those in non-U.S. facilities (68% vs. 63%, respectively; p = 0.237), although U.S. facilities reported greater reductions in invasive coronary angiography (69% vs. 53%, respectively; p < 0.001). Significantly more U.S. facilities reported increased use of telehealth and patient screening measures than non-U.S. facilities, such as temperature checks, symptom screenings, and COVID-19 testing. Reductions in volumes of procedures differed between U.S. regions, with larger declines observed in the Northeast (76%) and Midwest (74%) than in the South (62%) and West (44%). Prevalence of COVID-19, staff redeployments, outpatient centers, and urban centers were associated with greater reductions in volume in U.S. facilities in a multivariable analysis. Conclusions: We observed marked reductions in U.S. cardiovascular testing in the early phase of the pandemic and significant variability between U.S. regions. The association between reductions of volumes and COVID-19 prevalence in the United States highlighted the need for proactive efforts to maintain access to cardiovascular testing in areas most affected by outbreaks of COVID-19 infection

    The Dyck bound in the concave 1-dimensional random assignment model

    Get PDF
    International audienceWe consider models of assignment for random N blue points and N red points on an interval of length 2N , in which the cost for connecting a blue point in x to a red point in y is the concave function |x − y| p , for 0 1, where the optimal matching is trivially determined, here the optimization is non-trivial. The purpose of this paper is to introduce a special configuration, that we call the Dyck matching, and to study its statistical properties. We compute exactly the average cost, in the asymptotic limit of large N , together with the first subleading correction. The scaling is remarkable: it is of order N for p 1 2 , and it is universal for a wide class of models. We conjecture that the average cost of the Dyck matching has the same scaling in N as the cost of the optimal matching, and we produce numerical data in support of this conjecture. We hope to produce a proof of this claim in future work
    corecore