12 research outputs found

    Minimax Classifier with Box Constraint on the Priors

    Learning a classifier in safety-critical applications such as medicine raises several issues. First, the class proportions, also called priors, are generally imbalanced or uncertain. Experts can sometimes provide bounds on the priors, and taking this knowledge into account can improve the predictions. Second, the classification decision must be evaluated with an arbitrary loss function given by experts. Finally, the dataset may contain both categorical and numeric features. In this paper, we propose a box-constrained minimax classifier that addresses all of these issues. To deal with both categorical and numeric features, many works have shown that discretizing the numeric attributes can lead to interesting results; we therefore consider that numeric features are discretized. To address the class proportion issues, we compute the priors that maximize the empirical Bayes risk over a box-constrained probabilistic simplex. This constraint set is the intersection of the simplex with a box constraint provided by experts, which bounds each class proportion independently. Our approach finds a compromise between the empirical Bayes classifier and the standard minimax classifier, which may appear too pessimistic. The standard minimax classifier, which has not yet been studied for discrete features, remains accessible through our approach. When considering only discrete features, we show that, for any loss function, the empirical Bayes risk, considered as a function of the priors, is a concave, non-differentiable, multivariate piecewise affine function. To compute the box-constrained least favorable priors, we derive a projected subgradient algorithm and establish its convergence. The performance of our algorithm is illustrated with experiments on the Framingham study database to predict the risk of Coronary Heart Disease (CHD).
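
    As a rough illustration of the projected subgradient idea described in this abstract, the following Python sketch may help. It is not the authors' implementation. It assumes the discretized features are encoded as T profiles, that `cond_probs[k, t]` holds an estimate of P(profile t | class k), and that `loss[k, l]` is the cost of predicting class l when the true class is k. The function names, the 1/n step size, and the bisection-based projection are all assumptions made for this example; the paper derives its own projection and step rule.

```python
import numpy as np

def bayes_risk_and_supergradient(prior, cond_probs, loss):
    """Empirical Bayes risk V(prior) and one supergradient (hypothetical helper).

    cond_probs[k, t]: estimated P(profile t | class k); loss[k, l]: cost of
    predicting l when the truth is k.
    """
    # scores[l, t] = sum_k loss[k, l] * prior[k] * cond_probs[k, t]
    scores = loss.T @ (prior[:, None] * cond_probs)
    decisions = scores.argmin(axis=0)  # Bayes decision for each profile
    # Class-conditional risks of that Bayes rule; they form a supergradient
    # of the concave piecewise affine function V at `prior`.
    class_risks = (loss[:, decisions] * cond_probs).sum(axis=1)
    return float(prior @ class_risks), class_risks

def project_box_simplex(v, lower, upper, iters=100):
    """Euclidean projection onto {p : sum(p) = 1, lower <= p <= upper}.

    Uses bisection on the shift tau (an assumed, generic method). Requires
    sum(lower) <= 1 <= sum(upper) so that the set is non-empty.
    """
    lo, hi = np.min(v - upper), np.max(v - lower)
    for _ in range(iters):
        tau = 0.5 * (lo + hi)
        if np.clip(v - tau, lower, upper).sum() > 1.0:
            lo = tau  # total mass too large: shift further down
        else:
            hi = tau
    return np.clip(v - 0.5 * (lo + hi), lower, upper)

def box_minimax_prior(cond_probs, loss, lower, upper, n_iter=500, step=1.0):
    """Projected subgradient ascent on V over the box-constrained simplex."""
    k = len(lower)
    prior = project_box_simplex(np.full(k, 1.0 / k), lower, upper)
    best_prior, best_risk = prior, -np.inf
    for n in range(1, n_iter + 1):
        risk, grad = bayes_risk_and_supergradient(prior, cond_probs, loss)
        if risk > best_risk:  # subgradient steps are not monotone: keep the best
            best_prior, best_risk = prior, risk
        prior = project_box_simplex(prior + (step / n) * grad, lower, upper)
    return best_prior, best_risk
```

    Setting `lower` to zeros and `upper` to ones removes the box and targets the standard minimax prior, while a narrow box keeps the solution close to the expert-provided proportions.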

    Box-constrained optimization for minimax supervised learning

    In this paper, we present the optimization procedure for computing the discrete box-constrained minimax classifier introduced in [1, 2]. Our approach processes features that are discrete or have been discretized beforehand. A box-constrained region defines bounds for each class proportion independently. The box-constrained minimax classifier is obtained by computing the least favorable prior, which maximizes the minimum empirical risk of error over the box-constrained region. After studying the discrete empirical Bayes risk over the probabilistic simplex, we consider a projected subgradient algorithm that computes the prior maximizing this concave multivariate piecewise affine function over a polyhedral domain. The convergence of our algorithm is established.
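
    The optimization problem summarized in this abstract can be written compactly as follows. This is a hedged formulation, not taken from the paper: the symbols K (number of classes), a_k and b_k (expert bounds with 0 ≤ a_k ≤ b_k ≤ 1), \hat{R}_k (empirical class-conditional risk), \delta_\pi (the empirical Bayes rule for prior \pi), \gamma_n (step size), and P_U (Euclidean projection onto U) are notation assumed here for illustration.

```latex
\begin{aligned}
\mathbb{U} &= \Bigl\{ \pi \in \mathbb{R}^K : \textstyle\sum_{k=1}^K \pi_k = 1,\ a_k \le \pi_k \le b_k \ \text{for all } k \Bigr\}, \\
V(\pi) &= \min_{\delta} \sum_{k=1}^K \pi_k \, \hat{R}_k(\delta)
        = \sum_{k=1}^K \pi_k \, \hat{R}_k(\delta_{\pi}), \\
\bar{\pi} &\in \operatorname*{arg\,max}_{\pi \in \mathbb{U}} V(\pi), \\
\pi^{(n+1)} &= P_{\mathbb{U}}\bigl( \pi^{(n)} + \gamma_n \, g^{(n)} \bigr),
\qquad g^{(n)} = \bigl( \hat{R}_1(\delta_{\pi^{(n)}}), \dots, \hat{R}_K(\delta_{\pi^{(n)}}) \bigr).
\end{aligned}
```

    Because V is a minimum of functions that are linear in \pi, it is concave and piecewise affine, and the vector of class-conditional risks g^{(n)} is a subgradient of V at \pi^{(n)}, which is what makes the projected update a valid ascent scheme.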

    Clustering with feature selection using alternating minimization. Application to computational biology

    This paper deals with unsupervised clustering with feature selection. The problem is to estimate both the labels and a sparse projection matrix of weights. To address this combinatorial non-convex problem while maintaining strict control over the sparsity of the weight matrix, we propose an alternating minimization of the Frobenius norm criterion. We provide a new efficient algorithm named K-sparse, which alternates k-means with projection-gradient minimization. The projection-gradient step is a splitting-type method with exact projection onto the ℓ1 ball to promote sparsity. The convergence of the gradient-projection step is addressed, and a preliminary analysis of the alternating minimization is made. The Frobenius norm criterion converges as the number of iterations of K-sparse goes to infinity. Experiments on single-cell RNA sequencing datasets show that our method significantly improves on the results of PCA k-means, spectral clustering, SIMLR, and Sparcl. The complexity of K-sparse is linear in the number of samples (cells), so the method scales to large datasets. Finally, we extend K-sparse to supervised classification.
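
    To make the "exact projection onto the ℓ1 ball" step concrete, here is a minimal Python sketch of the classical sort-based projection. It is an assumption that this particular variant matches what K-sparse uses; the function name `project_l1_ball` and the `radius` parameter are hypothetical, and the sketch works on a vector (a weight matrix could be flattened first, projected, and reshaped).

```python
import numpy as np

def project_l1_ball(v, radius=1.0):
    """Euclidean projection of v onto {w : ||w||_1 <= radius} (sort-based)."""
    v = np.asarray(v, dtype=float)
    if np.abs(v).sum() <= radius:
        return v.copy()  # already inside the ball: nothing to do
    u = np.sort(np.abs(v))[::-1]           # magnitudes in decreasing order
    cssv = np.cumsum(u)                     # running sums of the sorted magnitudes
    ks = np.arange(1, len(u) + 1)
    # Largest k whose k-th magnitude stays positive after the shared shrinkage
    rho = ks[u - (cssv - radius) / ks > 0][-1]
    theta = (cssv[rho - 1] - radius) / rho  # soft-threshold level
    # Soft-thresholding sets small entries exactly to zero, hence the sparsity
    return np.sign(v) * np.maximum(np.abs(v) - theta, 0.0)
```

    A smaller `radius` zeroes out more entries, which is how the ℓ1 constraint gives direct control over the sparsity of the weights.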

    Discrete minimax classifier for personalized diagnosis in medicine

    Machine learning algorithms are becoming increasingly promising in precision medicine. From the clinical or biological profile of each patient, these methods can, for example, help experts in the application field diagnose a disease or predict the response to a specific treatment. To this end, supervised classifiers are fitted on a set of labeled training samples, generally by minimizing the global risk of classification error on this training set. Intuitively, these methods partition the feature space into regions, each assigned to a specific class, such that the global risk of error is minimized. The resulting decision rule, also called a classifier, then maps each new patient into this feature space and assigns the class associated with the region to which the patient belongs.

    Supervised learning algorithms are becoming increasingly efficient for classification tasks. However, most supervised classifiers do not deal well with imbalanced and uncertain class proportions, which commonly occur in medical datasets. For example, the number of individuals who have a disease is generally lower than the number of healthy individuals, so the classes of interest are rare and hard to diagnose. Moreover, most methods assume that new individuals will follow the same distribution as the labeled training samples, which may not be the case. Indeed, the proportion of individuals who have a disease may increase or decrease over time, for example because of sampling issues, and often without anyone knowing when, how, or why. The fitted classifier is then biased, which increases the classification errors for new individuals. Imbalanced datasets and uncertain class proportions are two major issues when applying machine learning methods in medicine, and robust classifiers that can deal with both are needed.

    During my thesis, we developed a new supervised classification method, a Discrete Minimax Classifier, to deal with these class proportion issues. Our approach can take into account the knowledge or the interests of experts in the application field, and it also handles other difficulties that often appear in medicine, such as mixed (discrete and continuous) attributes and dependencies between features. To process mixed attributes, the numeric features are discretized, which allows us to compute the empirical Bayes risk analytically over the simplex. Our Discrete Minimax Classifier is computed using a projected subgradient algorithm that searches for the priors maximizing this empirical Bayes risk over the simplex. By construction, the resulting classifier minimizes the maximum of the class-conditional risks, tends to equalize them, and is therefore robust to prior probability shifts. We further extend our approach to compute a Gamma-minimax classifier that takes into account independent bounds on each class proportion, for cases where experts in the application field can estimate the uncertainty of each class proportion independently. Finally, we show that our approach can easily be coupled with pre-trained decision trees or pre-trained convolutional neural networks to adjust these classifiers to imbalanced datasets and prior probability shifts.

    Our research was carried out in close collaboration with the Institute of Molecular and Cellular Pharmacology (IPMC) in Valbonne, the Nice University Hospital (CHU de Nice), and the private company HalioDX in Marseille. The goal of our project with the IPMC was to improve the diagnosis and treatment of psychiatric diseases such as schizophrenia using blood biomarkers. The goal of our project with the CHU was to design a machine learning classifier for diagnosing non-alcoholic fatty liver disease from the clinical profile of patients. As for HalioDX, we did not work directly with datasets from this company, but the methods we developed will be used by HalioDX to identify cancer patients at risk of not responding to immune checkpoint therapy. These collaborations allowed us to build our algorithm while taking into account, as far as possible, the difficulties that often arise in the medical field.

    Classifieur minimax discret pour l’aide au diagnostic médical dans la médecine personnalisée


    Baseline levels of C-reactive protein and proinflammatory cytokines are not associated with early response to amisulpride in patients with First Episode Psychosis: the OPTiMiSE cohort study

    Background: Patients with a first episode of psychosis (FEP) exhibit low-grade inflammation, as demonstrated by elevated levels of C-reactive protein (CRP) and pro-inflammatory cytokines. Aims: The primary goal of this study was to investigate the association between pro-inflammatory biomarkers and clinical outcomes in unmedicated FEP patients. Method: We used clinical data and biological samples from 289 FEP patients participating in the Optimization of Treatment and Management of Schizophrenia in Europe (OPTiMiSE) clinical trial. Patients were assessed at baseline and 4-5 weeks after treatment with amisulpride. Baseline serum levels of interleukin (IL)-6, IL-8, Tumor Necrosis Factor (TNF)-α and CRP were measured. We first used multivariable regression to investigate the association between each of the four tested biomarkers and the following clinical outcomes: Positive And Negative Syndrome Scale (PANSS), Calgary Depression Score for Schizophrenia (CDSS), remission according to Andreasen's criteria, and Serious Adverse Events (SAEs). As a complementary approach, we used an unsupervised clustering method to stratify patients into an "inflamed" or a "non-inflamed" biotype based on baseline levels of IL-6, IL-8 and TNF-α. We then used linear and logistic regressions to investigate the association between the patient biotype and clinical outcomes. Results: After adjusting for covariates and confounders, we did not find any association between IL-6, IL-8, TNF-α, CRP or the patient biotype and clinical outcomes. Implications: Our results do not support the existence of an association between baseline levels of CRP and pro-inflammatory cytokines and early response to amisulpride in unmedicated FEP patients.