53 research outputs found

    Individuals tell a fascinating story: using unsupervised text mining methods to cluster policyholders based on their medical history

    Background and objective: Classifying people according to their health profile is crucial for proposing appropriate treatment. However, the medical diagnosis is sometimes not available. This is the case in health insurance, for example, which makes it difficult to propose custom prevention plans. In such cases, an unsupervised clustering method is needed. This article compares three different methods by adapting text mining methods to the field of health insurance. A new clustering stability measure is also proposed in order to compare the stability of the tested processes. Methods: Nonnegative Matrix Factorization, the word2vec method, and marginalized Stacked Denoising Autoencoders are used and compared in order to create a high-quality input for a clustering method. A self-organizing map is then used to obtain the final clustering. A real health insurance database is used to test the methods. Results: The marginalized Stacked Denoising Autoencoder outperforms the other methods in both stability and result quality on our data. Conclusions: The use of text mining methods offers several possibilities for understanding the context of any medical act. On a medical database, the process could reveal unexpected correlations between treatments and thus between pathologies. Moreover, this kind of method could exploit the refund dates contained in the data, but the tested method that uses temporality, word2vec, still needs to be improved, since its results, even if satisfactory, are not as good as those offered by the other methods.

    Differentiation of Alzheimer's disease dementia, mild cognitive impairment and normal condition using PET-FDG and AV-45 imaging : a machine-learning approach

    We used PET imaging with the tracers F18-FDG and AV45 in conjunction with classification methods from the field of machine learning. PET images were acquired in dynamic mode, one image every 5 minutes. The images come from three different sources: the ADNI database (Alzheimer's Disease Neuroimaging Initiative, University of California Los Angeles) and two protocols performed at the PET center of the Purpan Hospital. Classification was applied after processing the dynamic images with Principal Component Analysis and Independent Component Analysis. The data were separated into a training set and a test set. To evaluate classification performance we used leave-one-out cross-validation (LOOCV). We compare the two most widely used classification methods, Support Vector Machines (SVM) and artificial neural networks (ANN), for both tracers. The combination giving the best classification rate seems to be SVM with the AV45 tracer. However, the most important confusion is between MCI patients and normal subjects. Alzheimer's patients are distinguished relatively better, since they are often identified at a rate above 90%. We evaluated the generalization of these methods by training on one data set and classifying another. We reached a specificity of 100% and a sensitivity of more than 81%. The SVM method showed better sensitivity than the artificial neural network method. The value of such work is to eventually help clinicians diagnose Alzheimer's disease.
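
The evaluation terms used in this abstract (leave-one-out cross-validation, sensitivity, specificity) can be illustrated with a minimal sketch. This is not the authors' pipeline: a nearest-centroid rule stands in for the SVM and ANN classifiers, the two-dimensional feature vectors are invented toy values rather than PET data, and all function names are hypothetical.

```python
# Leave-one-out cross-validation (LOOCV) with a stand-in classifier.
# Assumptions: nearest-centroid replaces the SVM/ANN named in the abstract,
# and the feature vectors are made-up toy numbers, not study data.

def nearest_centroid_predict(train_x, train_y, sample):
    """Return the label whose class centroid is closest to `sample`."""
    centroids = {}
    for label in set(train_y):
        rows = [x for x, y in zip(train_x, train_y) if y == label]
        centroids[label] = [sum(col) / len(rows) for col in zip(*rows)]

    def sq_dist(a, b):
        return sum((ai - bi) ** 2 for ai, bi in zip(a, b))

    return min(centroids, key=lambda label: sq_dist(centroids[label], sample))


def loocv_predictions(xs, ys):
    """Hold out each sample once, train on the rest, predict the held-out one."""
    preds = []
    for i in range(len(xs)):
        train_x = xs[:i] + xs[i + 1:]
        train_y = ys[:i] + ys[i + 1:]
        preds.append(nearest_centroid_predict(train_x, train_y, xs[i]))
    return preds


def sensitivity_specificity(y_true, y_pred, positive):
    """Sensitivity = TP / (TP + FN); specificity = TN / (TN + FP).

    Assumes both classes occur in y_true, so neither denominator is zero.
    """
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    tn = sum(1 for t, p in pairs if t != positive and p != positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    return tp / (tp + fn), tn / (tn + fp)


# Toy example: "AD" = Alzheimer's patient, "NC" = normal control.
features = [[0.9, 0.8], [1.0, 0.7], [0.8, 0.9],
            [0.2, 0.1], [0.1, 0.3], [0.3, 0.2]]
labels = ["AD", "AD", "AD", "NC", "NC", "NC"]

predictions = loocv_predictions(features, labels)
sens, spec = sensitivity_specificity(labels, predictions, positive="AD")
print(f"sensitivity={sens:.2f} specificity={spec:.2f}")
```

With one sample held out per round, every subject is used for testing exactly once, which is why LOOCV suits small cohorts such as imaging studies.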