7 research outputs found

    Data Mining Generating Decision Trees to Alert System Against Death and Losses in Egg Production

    Climate change and high temperatures have been affecting animal production and the well-being of laying hens, causing heat stress and high mortality rates and generating economic losses. Legacy databases can contain information that helps model thermal comfort at climatic extremes, and data mining can turn that information into decision trees that help prevent mortality and production losses. The objective of this study is therefore to develop decision trees, for use as an alert system, for the incidence of heat stress in layer production. We used a database from three aviaries located in the city of Bastos-SP, collected in 2013. The data were organized in Excel® spreadsheets and mined with the Weka® software using the J48 (C4.5) algorithm. The resulting decision trees classified the three chosen sheds with 99.73%, 99.61%, and 98.71% of instances correct, respectively, and with Kappa indexes of 0.9958, 0.9907, and 0.9663, which indicates that the three classifiers are excellent. The decision trees can thus serve as a basis for an alert system applied to the three sheds simultaneously.
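
The Kappa index quoted in the abstract above measures agreement beyond chance between predicted and actual classes. A minimal sketch of how it can be computed from a confusion matrix follows; the function name `cohen_kappa` and the example counts are hypothetical and are not taken from the study or from Weka.

```python
def cohen_kappa(confusion):
    """Cohen's kappa for a square confusion matrix (rows: actual, columns: predicted)."""
    k = len(confusion)
    total = sum(sum(row) for row in confusion)
    if total == 0:
        raise ValueError("confusion matrix is empty")
    # Observed agreement: share of instances on the diagonal.
    observed = sum(confusion[i][i] for i in range(k)) / total
    # Chance agreement: product of the row and column marginals, summed over classes.
    row_totals = [sum(row) for row in confusion]
    col_totals = [sum(confusion[i][j] for i in range(k)) for j in range(k)]
    expected = sum(r * c for r, c in zip(row_totals, col_totals)) / total ** 2
    if expected == 1.0:
        raise ValueError("kappa is undefined when chance agreement is 1")
    return (observed - expected) / (1.0 - expected)


# Hypothetical two-class counts (for illustration only, not the study's data).
example = [[45, 5],
           [10, 40]]
kappa = cohen_kappa(example)
```

Accuracy alone can look high on imbalanced classes, which is presumably why the abstract reports Kappa alongside the percentage of correct classifications.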

    Thirty years of artificial intelligence in medicine (AIME) conferences: A review of research themes

    Over the past 30 years, the international conference on Artificial Intelligence in MEdicine (AIME) has been organized at different venues across Europe every 2 years, establishing a forum for scientific exchange and creating an active research community. The Artificial Intelligence in Medicine journal has published theme issues with extended versions of selected AIME papers since 1998.

    Advanced Data Mining in Cardiology

    The aim of this master's thesis is to analyse a database of patients from the Internal Cardiology Clinic of the Faculty Hospital Brno Bohunice and to search it for unusual dependencies between attributes. Part of the work is a theoretical overview of data mining methods used in medicine; of these, decision trees, the naive Bayesian classifier, artificial neural networks, and association rules were selected for further work. The search for unusual dependencies itself used the naive Bayesian classifier and association rules. The output of this work is a complete system for knowledge discovery in databases that can be applied to any data set. The work was carried out in collaboration with the Internal Cardiology Clinic of the Faculty Hospital Brno Bohunice. All described applications were created in Matlab 7.0.1.

    Classification Of Clinically Different Subtypes Of Multiple Sclerosis

    Thesis (M.Sc.) -- İstanbul Technical University, Institute of Science and Technology, 2011. This study focuses on the classification of different clinical subtypes of multiple sclerosis (MS) using the TAU, GFAP, NFL, and MOG proteins together with clinical data. The protein data were obtained from patients' cerebrospinal fluid (CSF) samples, collected by lumbar puncture. Using different classification methods, the clinical subtypes of MS were classified according to their protein and clinical data patterns. To the best of our knowledge, no other study in the literature uses these patterns to predict the transition from Clinically Isolated Syndrome (CIS) to MS. MS patients, CIS patients, and the control group were classified with 71.43% ± 10.95 accuracy (AUC: 0.82 ± 0.12); CIS and the control group with 87.31% ± 12.02 accuracy (AUC: 0.93 ± 0.09); MS and CIS with 76.51% ± 11.15 accuracy (AUC: 0.83 ± 0.12); RRMS and PPMS with 95.77% ± 6.63 accuracy (AUC: 0.97 ± 0.08); and MS and the control group with 92.64% ± 7.15 accuracy (AUC: 0.97 ± 0.06). The transition from CIS to RRMS was predicted with 86.45% ± 12.6 accuracy (AUC: 0.89 ± 0.19). This is a novel study that uses computer-aided classification methods with protein and clinical data for diagnostic and prognostic purposes, predicting both the clinical subtypes of MS and the transition between subtypes.

    Novelty, distillation, and federation in machine learning for medical imaging

    The practical application of deep learning methods in the medical domain has many challenges. Pathologies are diverse and very few examples may be available for rare cases. Where data are collected, they may lie in multiple institutions and cannot be pooled for practical and ethical reasons. Deep learning is powerful for image segmentation problems, but ultimately its output must be interpretable at the patient level. Although clearly not an exhaustive list, these are the three problems tackled in this thesis. To address the rarity of pathology, I investigate novelty detection algorithms to find outliers from normal anatomy. The problem is structured as first finding a low-dimensional embedding and then detecting outliers in that embedding space. I evaluate several unsupervised embedding and outlier detection methods for speed and accuracy. The data consist of Magnetic Resonance Imaging (MRI) for interstitial lung disease, for which healthy and pathological patches are available; only the healthy patches are used in model training. I then explore the clinical interpretability of a model output. I take related work by the Canon team (a model providing voxel-level detection of acute ischemic stroke signs) and deliver the Alberta Stroke Programme Early CT Score (ASPECTS, a measure of stroke severity). The data are acute head computed tomography volumes of suspected stroke patients. I convert from the voxel level to the brain region level and then to the patient level through a series of rules. Due to the real-world clinical complexity of the problem, there are multiple sources of “truth” at each level (voxel, region, and patient); I evaluate my results appropriately against these truths. Finally, federated learning is used to train a model on data that are divided between multiple institutions. I introduce a novel evolution of this algorithm, dubbed “soft federated learning”, that avoids the central coordinating authority and takes into account domain shift (covariate shift) and dataset size. I first demonstrate the key properties of these two algorithms on a series of MNIST (handwritten digits) toy problems. Then I apply the methods to the BraTS medical dataset, which contains MRI brain glioma scans from multiple institutions, to compare these algorithms in a realistic setting.
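
For orientation, the centrally coordinated baseline that the abstract above contrasts with “soft federated learning” is commonly implemented as federated averaging: each institution trains locally and a coordinator averages the parameters, weighted by dataset size. The sketch below assumes that standard scheme; it is not the thesis's soft federated learning algorithm, and the function name and example values are hypothetical.

```python
def federated_average(client_weights, client_sizes):
    """Average client parameter vectors, weighting each client by its dataset size."""
    if not client_weights or len(client_weights) != len(client_sizes):
        raise ValueError("need one dataset size per client")
    total = sum(client_sizes)
    if total <= 0:
        raise ValueError("total dataset size must be positive")
    dim = len(client_weights[0])
    if any(len(w) != dim for w in client_weights):
        raise ValueError("all clients must share the same parameter shape")
    # Coordinate-wise weighted mean of the client parameters.
    return [
        sum(w[d] * n for w, n in zip(client_weights, client_sizes)) / total
        for d in range(dim)
    ]


# Hypothetical parameters from two institutions holding 100 and 300 samples.
weights = [[1.0, 2.0], [3.0, 4.0]]
sizes = [100, 300]
global_weights = federated_average(weights, sizes)
```

The single aggregation step is the point where a central coordinating authority is required, which is the dependency the abstract says its variant avoids.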

    Automatic risk evaluation in elderly patients based on Autonomic Nervous System assessment

    Dysfunction of the Autonomic Nervous System (ANS) is a typical feature of chronic heart failure and other cardiovascular diseases. As a simple non-invasive technology, heart rate variability (HRV) analysis provides reliable information on autonomic modulation of heart rate. The aim of this thesis was to research and develop automatic methods based on ANS assessment for evaluating risk in cardiac patients. Several feature selection and machine learning algorithms were combined to achieve these goals. Automatic assessment of disease severity in Congestive Heart Failure (CHF) patients: a completely automatic method based on long-term HRV was proposed to assess the severity of CHF, achieving a sensitivity of 93% and a specificity of 64% in discriminating severe from mild patients. Automatic identification of hypertensive patients at high risk of vascular events: a completely automatic system was proposed to identify hypertensive patients at higher risk of developing vascular events in the 12 months following the electrocardiographic recordings, achieving a sensitivity of 71% and a specificity of 86% in identifying high-risk subjects among hypertensive patients. Automatic identification of hypertensive patients with a history of falls: it was explored whether automatic identification of fallers among hypertensive patients based on HRV was feasible. The results obtained in this thesis could have implications both in clinical practice and in clinical research. The system has been designed and developed to be clinically feasible. Moreover, since a 5-minute ECG recording is inexpensive, easy to assess, and non-invasive, future research will focus on the clinical applicability of the system as a screening tool in non-specialized ambulatories, in order to identify high-risk patients to be shortlisted for more complex investigations.
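
The sensitivity and specificity figures in the abstract above follow from the counts of a binary confusion matrix. A minimal sketch is given below; the counts are hypothetical, chosen only so that they reproduce the 93% and 64% rates reported for the CHF severity task, and are not the thesis's data.

```python
def sensitivity_specificity(tp, fn, tn, fp):
    """Return (sensitivity, specificity) from binary confusion-matrix counts."""
    if tp + fn == 0 or tn + fp == 0:
        raise ValueError("need at least one positive and one negative case")
    sensitivity = tp / (tp + fn)  # share of true positives that are detected
    specificity = tn / (tn + fp)  # share of true negatives that are correctly rejected
    return sensitivity, specificity


# Hypothetical counts: 100 severe and 100 mild patients.
sens, spec = sensitivity_specificity(tp=93, fn=7, tn=64, fp=36)
```

A high sensitivity paired with a lower specificity, as in this example, suits a screening setting where missing a high-risk patient costs more than a follow-up investigation.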

    Imputation through Clustering of Time Series Data: a case study in air pollution

    Air pollution is a global problem, and air pollution concentration assessment plays an essential role in evaluating the associated risk to human health. Unfortunately, air pollution monitoring stations often have periods of missing data. In this thesis, we investigated the missing values problem in air quality data by looking at the hourly pollutant concentration Time Series (TS) of the four main pollutants included in air quality assessment: O3, NO2, PM2.5, and PM10. The research aims to reduce the uncertainty of air quality assessment by proposing methods for imputing values that are partially or completely missing. Our approach uses clustering of stations based on measured pollutants to inform the imputation. We started by testing univariate clustering and then developed a multivariate time series (MVTS) clustering method that considers all pollutants measured at a station by aggregating the similarity between those pollutants (through a fused distance), followed by imputation models for the whole TS. We developed various imputation models, including ensemble models that aggregate the temporal similarity obtained from clustering and the spatial similarity obtained from the geographical correlation between stations. Our experimental results show that MVTS clustering enables imputation of unmeasured pollutants at any station and produces plausible imputed values for all pollutants. The ensemble imputation models (Models 8 and 9) gave the lowest RMSE and the highest index of agreement (IOA) between imputed and real values, and met the minimum FAC2 criterion for air quality modelling. The imputation models reproduce high pollution episodes at stations within the clusters where these episodes possibly happened but were not measured, as some of them were captured by the cluster centroids. We also found two important pollutants associated with those episodes, PM2.5 and O3, which may require more measurements or should be imputed at different locations for more realistic air quality monitoring.
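
The three evaluation metrics named in the abstract above can be sketched as follows. This assumes the usual definitions (root mean square error, Willmott's original index of agreement, and FAC2 as the fraction of imputed values within a factor of two of the observations); the thesis may use variants, and the sample values are hypothetical.

```python
import math


def rmse(obs, pred):
    """Root mean square error between observed and imputed values."""
    return math.sqrt(sum((p - o) ** 2 for o, p in zip(obs, pred)) / len(obs))


def index_of_agreement(obs, pred):
    """Willmott's index of agreement (assumed variant): 1 is a perfect match."""
    mean_obs = sum(obs) / len(obs)
    num = sum((p - o) ** 2 for o, p in zip(obs, pred))
    den = sum((abs(p - mean_obs) + abs(o - mean_obs)) ** 2 for o, p in zip(obs, pred))
    if den == 0:
        raise ValueError("index of agreement is undefined for constant, identical series")
    return 1.0 - num / den


def fac2(obs, pred):
    """Fraction of imputed values within a factor of two of the observed values."""
    pairs = [(o, p) for o, p in zip(obs, pred) if o > 0]  # skip zero observations
    if not pairs:
        raise ValueError("no positive observations to compare against")
    return sum(1 for o, p in pairs if 0.5 <= p / o <= 2.0) / len(pairs)


# Hypothetical hourly concentrations (not data from the thesis).
observed = [10.0, 20.0, 30.0, 40.0]
imputed = [12.0, 18.0, 33.0, 90.0]
scores = (rmse(observed, imputed), index_of_agreement(observed, imputed), fac2(observed, imputed))
```

In this example one badly overestimated hour dominates the RMSE while FAC2 still counts three of the four values as acceptable, which illustrates why the metrics are reported together.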