22 research outputs found

    A lung cancer detection approach based on shape index and curvedness superpixel candidate selection

    Advisor: Lucas Ferrari de Oliveira. Dissertation (master's), Universidade Federal do Paraná, Setor de Tecnologia, Programa de Pós-Graduação em Engenharia Elétrica. Defense: Curitiba, 29/08/2016. Includes references: f. 72-76. Area of concentration: Electronic systems.
    Abstract: Cancer is one of the leading causes of death worldwide. Lung cancer is the most common type of cancer (excluding non-melanoma skin cancer). Its symptoms appear mostly in advanced stages, which makes treatment difficult. Computed tomography (CT) is used to diagnose the patient. A CT exam consists of many slices that map a 3D region of interest. Although it provides a great deal of detail, the number of slices makes the analysis exhausting, which may negatively influence the specialist's diagnosis. The objective of this work is to develop methods for lung segmentation and nodule detection in chest CT images. The images are first segmented to separate the lung from other structures, and nodule detection based on superpixel methods is then applied. In the segmentation step, the Axes' Labeling technique had a mean nodule preservation of 93.53%, and the Monotone Chain Convex Hull method gave better results, with a mean of 97.78%. For nodule detection, the Felzenszwalb and SLIC methods are used to group nodule regions into superpixels. A nodule candidate selection based on shape index and curvedness is applied to reduce the number of superpixels, and the remaining candidates are classified with a Random Forest. The LIDC database was divided into two data sets: a development set composed of the CT scans of patients 0001 to 0600, and an untouched validation set composed of patients 0601 to 1012. On the validation set, the Felzenszwalb method had a sensitivity of 60.61% and 7.2 FP/scan.
    Keywords: Lung cancer. Nodule detection. Superpixel. Shape index.
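The candidate selection mentioned in this abstract relies on two quantities derived from the principal curvatures of a surface patch. The sketch below shows one common way to compute them; it is not the dissertation's code. The sign convention (larger curvature first), the function names, and the thresholds in `keep_candidate` are assumptions made for illustration.

```python
import math


def shape_index(k1: float, k2: float) -> float:
    """Shape index in [-1, 1] from two principal curvatures.

    Assumed convention: S = (2/pi) * atan((hi + lo) / (hi - lo)) with hi >= lo.
    Other texts flip the sign, so check the convention before comparing values.
    """
    hi, lo = max(k1, k2), min(k1, k2)
    if hi == 0.0 and lo == 0.0:
        return float("nan")  # flat patch: the shape index is undefined
    # atan2 covers the umbilic case hi == lo without dividing by zero.
    return (2.0 / math.pi) * math.atan2(hi + lo, hi - lo)


def curvedness(k1: float, k2: float) -> float:
    """Curvedness: how strongly the patch bends, independent of its shape."""
    return math.sqrt((k1 * k1 + k2 * k2) / 2.0)


def keep_candidate(k1: float, k2: float,
                   si_min: float = 0.8, cv_min: float = 0.05) -> bool:
    """Keep blob-like, clearly curved patches (hypothetical thresholds)."""
    s = shape_index(k1, k2)
    if math.isnan(s):
        return False
    return abs(s) >= si_min and curvedness(k1, k2) >= cv_min
```

Under this convention a sphere-like patch (equal curvatures) sits at an extreme of the shape index and a symmetric saddle at zero, which is why thresholding on both values can discard vessel-like or flat superpixels before classification.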

    Improved imbalanced classification through convex space learning

    Imbalanced datasets for classification problems, characterised by an unequal distribution of samples across classes, are abundant in practical scenarios. Oversampling algorithms generate synthetic data to improve classification performance on such datasets. In this thesis, I discuss two algorithms, LoRAS and ProWRAS, that improve on the state of the art, as shown through rigorous benchmarking on publicly available datasets. A biological application, the detection of rare cell types from single-cell transcriptomics data, is also discussed. The thesis also provides a better theoretical understanding of oversampling.
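To make the idea of oversampling in a convex space concrete, here is a minimal sketch that draws synthetic minority samples as random convex combinations of nearby minority points. It is a generic illustration, not an implementation of LoRAS or ProWRAS; the function name, the neighbourhood size `k`, and the weighting scheme are assumptions.

```python
import random
from typing import List, Sequence, Tuple

Point = Tuple[float, ...]


def convex_oversample(minority: Sequence[Point], n_new: int,
                      k: int = 3, seed: int = 0) -> List[Point]:
    """Generate n_new synthetic points inside local convex hulls."""
    rng = random.Random(seed)
    synthetic: List[Point] = []
    for _ in range(n_new):
        anchor = rng.choice(list(minority))
        # The k minority points closest to the anchor (the anchor included).
        neighbours = sorted(
            minority,
            key=lambda p: sum((a - b) ** 2 for a, b in zip(p, anchor)),
        )[:k]
        # Normalised exponential draws give non-negative weights summing to 1.
        raw = [rng.expovariate(1.0) for _ in neighbours]
        total = sum(raw)
        weights = [r / total for r in raw]
        point = tuple(
            sum(w * p[d] for w, p in zip(weights, neighbours))
            for d in range(len(anchor))
        )
        synthetic.append(point)
    return synthetic
```

Because every weight is non-negative and the weights sum to one, each synthetic point lies inside the convex hull of its neighbourhood, so it cannot land outside the region the minority class already occupies.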

    A novel two-stage heart arrhythmia ensemble classifier

    Atrial fibrillation (AF) and ventricular arrhythmia (Arr) are among the most common and fatal cardiac arrhythmias in the world. Electrocardiogram (ECG) data, collected as part of the UK Biobank, represents an opportunity for analysis and classification of these two diseases in the UK. The main objective of our study is to investigate a two-stage model for the classification of individuals with AF and Arr in the UK Biobank dataset. The current literature addresses heart arrhythmia classification very extensively. However, the data used by most researchers lack enough instances of these common diseases. Moreover, by proposing the two-stage model and separating normal from abnormal cases, we have improved the performance of the classifiers in detecting each specific disease. Our approach consists of two stages of classification. In the first stage, the features of the ECG input are classified into two main classes: normal and abnormal. In the second stage, the ECGs categorised as abnormal are further classified into the two diseases, AF and Arr. A diverse set of ECG features such as the QRS duration, PR interval and RR interval, as well as covariates such as sex, BMI, age and other factors, are used in the modelling process. For both stages, we use the XGBoost Classifier algorithm. The healthy population has been undersampled to tackle the class imbalance present in the data. This technique has been applied and evaluated using the resting ECG dataset from the UK Biobank repository. The main results of our paper are as follows: The classification performance for the proposed approach has been measured using F1 score, Sensitivity (Recall) and Specificity (Precision). The results of the proposed system are 87.22%, 88.55% and 85.95%, for average F1 score, average sensitivity and average specificity, respectively.
    Contribution and significance: The performance level indicates that automatic detection of AF and Arr in UK Biobank participants is more precise and efficient if done in a two-stage manner. Automatic detection and classification of individuals with AF and Arr in this way would mean early diagnosis and prevention of more serious consequences later in their lives.
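The two-stage scheme described in this abstract can be sketched as a simple cascade, together with the evaluation metrics it names. This is a sketch under stated assumptions: the abstract uses XGBoost for both stages, whereas the two classifiers here are plain callables supplied by the caller, and all function names are hypothetical. In the usual definitions, specificity, TN/(TN+FP), and precision, TP/(TP+FP), are different quantities, so both are shown.

```python
from typing import Callable, Sequence

Features = Sequence[float]


def two_stage_predict(x: Features,
                      is_abnormal: Callable[[Features], bool],
                      which_disease: Callable[[Features], str]) -> str:
    """Stage 1 separates normal from abnormal; stage 2 names the disease."""
    if not is_abnormal(x):
        return "normal"
    return which_disease(x)  # expected to return "AF" or "Arr"


def sensitivity(tp: int, fn: int) -> float:
    """Recall: share of true cases that were detected."""
    return tp / (tp + fn)


def specificity(tn: int, fp: int) -> float:
    """Share of true negatives that were correctly rejected."""
    return tn / (tn + fp)


def precision(tp: int, fp: int) -> float:
    """Share of positive predictions that were correct."""
    return tp / (tp + fp)


def f1_score(tp: int, fp: int, fn: int) -> float:
    """Harmonic mean of precision and recall."""
    return 2 * tp / (2 * tp + fp + fn)
```

Splitting the decision this way lets the second classifier train only on abnormal cases, so it never has to compete with the much larger healthy class.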

    Review of feature selection techniques in Parkinson's disease using OCT-imaging data

    Several spectral-domain optical coherence tomography (OCT) studies have reported thinning of the macular region of the retina in Parkinson's disease. Yet the relationship between retinal thinning and visual disability is still unclear. Macular scans acquired from patients with Parkinson's disease (n = 100) and a control group (n = 248) were used to train several supervised classification models. The goal was to determine the most relevant retinal layers and regions for diagnosis, for which univariate and multivariate filter and wrapper feature selection methods were used. In addition, we evaluated the classification ability of the patient group to assess the applicability of OCT measurements as a biomarker of the disease.
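As an illustration of the univariate filter approach mentioned in this abstract, the sketch below scores each feature independently by how well it separates two groups and ranks the features by that score. The scoring rule (absolute standardized mean difference) and the function names are assumptions; the abstract does not say which filter criteria were used.

```python
import math
import statistics
from typing import Dict, List, Sequence


def filter_score(group_a: Sequence[float], group_b: Sequence[float]) -> float:
    """Absolute difference of group means, scaled by the pooled spread."""
    diff = abs(statistics.mean(group_a) - statistics.mean(group_b))
    pooled = math.sqrt(
        (statistics.variance(group_a) + statistics.variance(group_b)) / 2.0
    )
    if pooled == 0.0:
        # Constant feature in both groups: perfectly separating or useless.
        return float("inf") if diff > 0.0 else 0.0
    return diff / pooled


def rank_features(patients: Dict[str, Sequence[float]],
                  controls: Dict[str, Sequence[float]]) -> List[str]:
    """Return feature names ordered from most to least discriminative."""
    scores = {name: filter_score(patients[name], controls[name])
              for name in patients}
    return sorted(scores, key=scores.get, reverse=True)
```

A filter like this looks at one feature at a time and ignores the classifier; wrapper methods instead evaluate feature subsets by training a model on each, which costs more but can capture interactions between features.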

    Can machine learning methods contribute as a decision support system in sequential oligometastatic radioablation therapy?

    Dissertation presented as the partial requirement for obtaining a Master's degree in Data Science and Advanced Analytics.
    Cancer treatment is among the major medical challenges of this century. Sequential oligometastatic radio-ablation (SOMA) is a novel treatment method that aims at ablating reoccurring metastases in a single session with a targeted high dose of radiation. To know whether SOMA is the best possible treatment for a patient, the benefits of each available therapy need to be understood and evaluated. The ability to model complex systems, such as cancer treatment, is the strength of machine learning techniques. These techniques have already improved the understanding of numerous medical therapies. In some cases, they can serve as medical support systems if they deliver reliable results that doctors can trust and understand. The results obtained from applying numerous machine learning techniques to the data of SOMA-treated patients show that some techniques are favorable in certain cases. The Random Forest algorithm proved superior on several classification tasks. Regression problems, in contrast, posed a great challenge, as the amount of data is very limited. Finally, SHAP values, a novel machine learning interpretation technique, provided valuable insights into the rationale of each algorithm. They showed that the machine learning algorithms could learn patterns aligned with human intuition in the problems presented. SHAP values show great potential in bridging the gap between complex machine learning algorithms and their interpretability. They display how an algorithm learns from the data and derives results. This opens up exciting possibilities for applying machine learning algorithms in the real world.
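SHAP values are built on Shapley values from cooperative game theory. The sketch below computes exact Shapley values for a small model by enumerating every feature subset; it is meant only to show what the attribution measures. It is not the SHAP library's API and not the dissertation's code, and replacing "absent" features with a single fixed baseline is a simplifying assumption.

```python
from itertools import combinations
from math import factorial
from typing import Callable, List, Sequence


def shapley_values(model: Callable[[List[float]], float],
                   x: Sequence[float],
                   baseline: Sequence[float]) -> List[float]:
    """Exact Shapley attribution of model(x) - model(baseline) per feature."""
    n = len(x)

    def value(present: set) -> float:
        # Features outside `present` are replaced by their baseline value.
        mixed = [x[i] if i in present else baseline[i] for i in range(n)]
        return model(mixed)

    phi = []
    for i in range(n):
        others = [j for j in range(n) if j != i]
        total = 0.0
        for size in range(n):
            # Weight of a coalition of this size in the Shapley average.
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for subset in combinations(others, size):
                s = set(subset)
                total += weight * (value(s | {i}) - value(s))
        phi.append(total)
    return phi
```

The attributions always sum to the difference between the prediction at `x` and at the baseline, which is what makes them readable as a per-feature breakdown of one prediction. The enumeration grows exponentially with the number of features, so practical tools rely on approximations or model-specific shortcuts.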