44 research outputs found

    Computer-aided diagnosis of lung nodule using gradient tree boosting and Bayesian optimization

    We aimed to evaluate a computer-aided diagnosis (CADx) system for lung nodule classification, focusing on (i) the usefulness of gradient tree boosting (XGBoost) and (ii) the effectiveness of parameter optimization using Bayesian optimization (Tree-structured Parzen Estimator, TPE) and random search. 99 lung nodules (62 lung cancers and 37 benign lung nodules) were included from public databases of CT images. A variant of local binary pattern was used for calculating feature vectors. A support vector machine (SVM) or XGBoost was trained using the feature vectors and their labels. TPE or random search was used for parameter optimization of SVM and XGBoost. Leave-one-out cross-validation was used for optimizing and evaluating the performance of our CADx system. Performance was evaluated using the area under the curve (AUC) of receiver operating characteristic analysis. AUC was calculated 10 times, and its average was obtained. The best averaged AUCs of SVM and XGBoost were 0.850 and 0.896, respectively; both were obtained using TPE. XGBoost was generally superior to SVM. Optimal parameters for achieving high AUC were obtained with fewer trials when using TPE than with random search. In conclusion, XGBoost was better than SVM for classifying lung nodules. TPE was more efficient than random search for parameter optimization. Comment: 29 pages, 4 figures
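
The evaluation loop described in this abstract (leave-one-out scoring, AUC, and random search over a parameter) can be sketched in plain Python. This is only an illustration under stated assumptions: a nearest-centroid scorer with one hypothetical parameter `weight` stands in for SVM or XGBoost, the toy data is made up, and TPE itself is not implemented.

```python
import random

def auc(scores, labels):
    """AUC as the fraction of (positive, negative) pairs ranked correctly; ties count 0.5."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

def centroid(rows):
    """Component-wise mean of a list of feature vectors."""
    return [sum(col) / len(rows) for col in zip(*rows)]

def sq_dist(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))

def loo_scores(X, y, weight):
    """Leave-one-out: score each sample with centroids fit on all other samples."""
    scores = []
    for i in range(len(X)):
        train = [(x, t) for j, (x, t) in enumerate(zip(X, y)) if j != i]
        c_pos = centroid([x for x, t in train if t == 1])
        c_neg = centroid([x for x, t in train if t == 0])
        # Higher score means closer to the malignant centroid.
        # `weight` is a hypothetical tunable parameter of this stand-in model.
        scores.append(weight * sq_dist(X[i], c_neg) - sq_dist(X[i], c_pos))
    return scores

def random_search(X, y, n_trials, seed=0):
    """Try random parameter values and keep the one with the best leave-one-out AUC."""
    rng = random.Random(seed)
    best_auc, best_weight = -1.0, None
    for _ in range(n_trials):
        w = rng.uniform(0.1, 10.0)
        trial_auc = auc(loo_scores(X, y, w), y)
        if trial_auc > best_auc:
            best_auc, best_weight = trial_auc, w
    return best_auc, best_weight

# Toy one-dimensional features: 0 = benign, 1 = malignant (made-up data).
X = [[0.0], [1.0], [2.0], [8.0], [9.0], [10.0]]
y = [0, 0, 0, 1, 1, 1]
best_auc, best_weight = random_search(X, y, n_trials=5)
```

A model-based optimizer such as TPE would replace the uniform draw in `random_search` with proposals informed by earlier trials, which is where the reported saving in trials would come from.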

    Lung nodules identification in CT scans using multiple instance learning.

    Computer-aided diagnosis (CAD) systems for lung nodules aim to classify nodules as benign or malignant based on images obtained from imaging modalities such as computed tomography (CT). Automated CAD systems are important in medical applications because they assist radiologists in a time-consuming and labor-intensive diagnostic process. However, most available methods require a large collection of nodules that are segmented and annotated by radiologists. This process is labor-intensive and hard to scale to very large datasets. More recently, some CAD systems based on deep learning have emerged. These algorithms do not require the nodules to be segmented; radiologists need only provide the center of mass of each nodule. The training image patches are then extracted from fixed-size volumes centered at the provided nodule's center. However, since the size of nodules can vary significantly, one fixed-size volume may not represent all nodules effectively. This thesis proposes a Multiple Instance Learning (MIL) approach to address the above limitations. In MIL, each nodule is represented by a nested sequence of volumes centered at the identified center of the nodule. We extract one feature vector from each volume. The set of feature vectors for each nodule is combined and represented as a bag. Next, we investigate and adapt some existing algorithms and develop new ones for this application. We start by applying benchmark MIL algorithms to traditional Gray Level Co-occurrence Matrix (GLCM) engineered features. Then, we design and train simple Convolutional Neural Networks (CNNs) to learn and extract features that characterize lung nodules. These extracted features are then fed to a benchmark MIL algorithm to learn a classification model. Finally, we develop new algorithms (MIL-CNN) that combine feature learning and multiple instance classification in a single network. These algorithms generalize the CNN architecture to multiple instance data. We design and report the results of three experiments applied to both generative (GLCM) and learned (CNN) features using two datasets: the Lung Image Database Consortium and Image Database Resource Initiative (LIDC-IDRI) and the National Lung Screening Trial (NLST). Two of these experiments perform five-fold cross-validation within a single dataset (NLST or LIDC). The third experiment trains the algorithms on one collection (the NLST dataset) and tests them on the other (the LIDC dataset). We designed our experiments to compare the different features, to compare MIL with Single Instance Learning (SIL), where a single feature vector represents a nodule, and to compare our proposed end-to-end MIL approaches with existing benchmark MIL methods. We demonstrate that our proposed MIL-CNN frameworks are more accurate for the lung nodule diagnosis task. We also show that the MIL representation achieves better results than SIL applied to the ground-truth region of each nodule.
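
The bag construction described in this abstract (nested volumes around one center, one feature vector per volume) can be sketched as follows. This is a minimal illustration, not the thesis's implementation: the mean/variance features stand in for GLCM or CNN features, the half-sizes are hypothetical, and max-pooling over instance scores is the standard MIL assumption rather than the MIL-CNN method.

```python
def cube(volume, center, half):
    """Voxel values of the cube with side 2*half+1 around `center`, clipped to the volume."""
    cz, cy, cx = center
    vals = []
    for z in range(max(0, cz - half), min(len(volume), cz + half + 1)):
        for y in range(max(0, cy - half), min(len(volume[0]), cy + half + 1)):
            for x in range(max(0, cx - half), min(len(volume[0][0]), cx + half + 1)):
                vals.append(volume[z][y][x])
    return vals

def features(vals):
    """Stand-in feature vector (mean, variance) for one sub-volume."""
    mean = sum(vals) / len(vals)
    var = sum((v - mean) ** 2 for v in vals) / len(vals)
    return (mean, var)

def make_bag(volume, center, half_sizes=(1, 2, 3)):
    """One instance per nested cube; the list of instances is the nodule's bag."""
    return [features(cube(volume, center, h)) for h in half_sizes]

def bag_score(bag, instance_scorer):
    """Standard MIL assumption: a bag is as positive as its most positive instance."""
    return max(instance_scorer(f) for f in bag)

# Toy 7x7x7 volume with a single bright voxel at the center (made-up data).
volume = [[[0.0] * 7 for _ in range(7)] for _ in range(7)]
volume[3][3][3] = 27.0
bag = make_bag(volume, (3, 3, 3))
score = bag_score(bag, lambda f: f[0])  # hypothetical scorer: mean intensity
```

Because every cube shares the same center, a small nodule dominates the innermost instance while a large one also shows up in the outer instances, which is the motivation the abstract gives for not relying on a single fixed-size volume.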

    Machine learning approaches for lung cancer diagnosis.

    The scale of change and development in medical imaging technology is hard to fathom: it not only covers the techniques and processes for constructing visual representations of the interior of the body for medical analysis and for revealing the internal structure of organs under the skin, but also provides a noninvasive way to diagnose various diseases and suggests efficient ways to treat them. While data about every aspect of our lives are collected and stored for analysis by data scientists, medical images are a rich source of data that physicians and radiologists cannot easily read in full, and they hold valuable information that could be used in smart ways to discover new knowledge. Therefore, the design of a computer-aided diagnostic (CAD) system that can be approved for clinical practice and that aids radiologists in diagnosing and detecting potential abnormalities is of great importance. This dissertation deals with the development of a CAD system for diagnosing lung cancer, which is the second most common cancer in men after prostate cancer and in women after breast cancer. Moreover, lung cancer is considered the leading cause of cancer death among both genders in the USA. Recently, the number of lung cancer patients has increased dramatically worldwide, and early detection doubles a patient’s chance of survival. Histological examination through biopsies is considered the gold standard for final diagnosis of pulmonary nodules. Even though resection of pulmonary nodules is the ideal and most reliable route to diagnosis, many other methods are often used simply to avoid the risks associated with the surgical procedure. Lung nodules are approximately spherical regions of primarily high-density tissue that are visible in computed tomography (CT) images of the lung. A pulmonary nodule is the first indication for starting a lung cancer diagnosis. Lung nodules can be benign (normal subjects) or malignant (cancerous subjects). Large (generally defined as greater than 2 cm in diameter) malignant nodules can be easily detected with traditional CT scanning techniques. However, the diagnostic options for small indeterminate nodules are limited due to problems associated with accessing small tumors. Therefore, additional diagnostic and imaging techniques that depend on the nodules’ shape and appearance are needed. The ultimate goal of this dissertation is to develop a fast noninvasive diagnostic system that can improve the accuracy of early lung cancer diagnosis, based on the well-known hypothesis that malignant nodules differ in shape and appearance from benign nodules because of their high growth rate. The proposed methodologies introduce new shape and appearance features that can distinguish between benign and malignant nodules. To achieve this goal, a CAD system is implemented and validated using different datasets. This CAD system integrates two types of features, appearance features and shape features, to give a full description of the pulmonary nodule. For the appearance features, different texture appearance descriptors are developed, namely the 3D histogram of oriented gradients, 3D spherical sector isosurface histogram of oriented gradients, 3D adjusted local binary pattern, 3D resolved ambiguity local binary pattern, multi-view analytical local binary pattern, and Markov Gibbs random field. Each of these descriptors describes the nodule texture and the level of its signal homogeneity, which is a distinguishing feature between benign and malignant nodules. For the shape features, multi-view peripheral sum curvature scale space, spherical harmonics expansions, and different groups of fundamental geometric features are used to describe the nodule shape complexity. Finally, a two-stage fusion of different combinations of these features is introduced. The first stage generates a primary estimate for every descriptor. The second stage consists of a single-layer autoencoder augmented with a softmax classifier, which provides the final classification of the nodule. These different combinations of descriptors are combined into different frameworks that are evaluated using different datasets. The first dataset is the Lung Image Database Consortium, a publicly available benchmark dataset for lung nodule detection and diagnosis. The second dataset is our locally acquired computed tomography imaging data, collected from the University of Louisville hospital; the research protocol was approved by the Institutional Review Board at the University of Louisville (IRB number 10.0642). The accuracy of these frameworks was about 94%, which suggests that they are a promising tool for the detection of lung cancer.
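
The two-stage fusion described in this abstract can be illustrated with a heavily simplified sketch. It assumes that stage one has already produced one malignancy estimate per descriptor, and it swaps the dissertation's trained single-layer autoencoder for a fixed linear layer followed by softmax; the weights, biases, and input values below are hypothetical.

```python
import math

def softmax(z):
    """Numerically stable softmax over a list of logits."""
    m = max(z)
    e = [math.exp(v - m) for v in z]
    s = sum(e)
    return [v / s for v in e]

def fuse(stage1_probs, weights, bias):
    """Stage two: map per-descriptor estimates to (benign, malignant) probabilities.

    stage1_probs: one malignancy estimate per descriptor (stage-one output).
    weights: two rows of hypothetical weights, one for benign and one for malignant.
    """
    logits = [sum(w * p for w, p in zip(row, stage1_probs)) + b
              for row, b in zip(weights, bias)]
    return softmax(logits)

# Made-up stage-one estimates from three descriptors.
stage1 = [0.9, 0.8, 0.7]
p_benign, p_malignant = fuse(stage1, [[-1, -1, -1], [1, 1, 1]], [1.5, -1.5])
```

In the described system the second-stage parameters would be learned from labeled nodules rather than fixed by hand as they are here.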

    A proposed methodology for detecting the malignant potential of pulmonary nodules in sarcoma using computed tomographic imaging and artificial intelligence-based models

    The presence of lung metastases in patients with primary malignancies is an important criterion for treatment management and prognostication. Computed tomography (CT) of the chest is the preferred method to detect lung metastasis. However, CT has limited efficacy in differentiating metastatic nodules from benign nodules (e.g., granulomas due to tuberculosis), especially at early stages (<5 mm). There is also significant subjectivity associated with making this distinction, leading to frequent CT follow-ups and additional radiation exposure, along with financial and emotional burden for patients and their families. Even 18F-fluoro-deoxyglucose positron emission tomography-computed tomography (18F-FDG PET-CT) is not always confirmatory for this clinical problem. While pathological biopsy is the gold standard to demonstrate malignancy, invasive sampling of small lung nodules is often not clinically feasible. Currently, there is no non-invasive imaging technique that can reliably characterize lung metastases. The lung is one of the favored sites of metastasis in sarcomas. Hence, patients with sarcomas, especially from tuberculosis-prevalent developing countries, can provide an ideal platform to develop a model to differentiate lung metastases from benign nodules. To overcome the lack of optimal specificity of CT in detecting pulmonary metastasis, a novel artificial intelligence (AI)-based protocol is proposed that uses a combination of radiological and clinical biomarkers to identify lung nodules and characterize them as benign or metastatic. This protocol includes a retrospective cohort of approximately 2,000–2,250 sample nodules (from at least 450 patients) for training and testing and an ambispective cohort of approximately 500 nodules (from 100 patients; 50 patients each from the retrospective and prospective cohorts) for validation. Ground-truth annotation of lung nodules will be performed using an in-house-built segmentation tool. Ground-truth labeling of lung nodules (metastatic/benign) will be based on histopathological results or on baseline and/or follow-up radiological findings, along with the clinical outcome of the patient. Optimal methods for data handling and statistical analysis are included to develop a robust protocol for early detection and classification of pulmonary metastasis at baseline and at follow-up, and for identification of associated potential clinical and radiological markers.

    Implementing decision tree-based algorithms in medical diagnostic decision support systems

    As a branch of healthcare, medical diagnosis can be defined as identifying a disease from the signs and symptoms of the patient. To this end, the required information is gathered from different sources such as the physical examination, medical history, and general information of the patient. The development of smart classification models for medical diagnosis is of great interest among researchers, mainly because machine learning and data mining algorithms are capable of detecting hidden trends between the features of a database. Hence, classifying medical datasets using smart techniques paves the way to designing more efficient medical diagnostic decision support systems. Several databases have been provided in the literature to investigate different aspects of diseases. As an alternative to the available diagnosis tools and methods, this research uses the machine learning algorithms Classification and Regression Tree (CART), Random Forest (RF), and Extremely Randomized Trees or Extra Trees (ET) to develop classification models that can be implemented in computer-aided diagnosis systems. As a decision tree (DT), CART is fast to create, and it applies to both quantitative and qualitative data. For classification problems, RF and ET combine a number of weak learners like CART. We employed the Wisconsin Breast Cancer Database (WBCD), the Z-Alizadeh Sani dataset for coronary artery disease (CAD), and the databanks gathered in Ghaem Hospital’s dermatology clinic on the response of patients with common and/or plantar warts to cryotherapy and/or immunotherapy. To classify the breast cancer type based on the WBCD, the RF and ET methods were employed. It was found that the developed RF and ET models predict the WBCD type with 100% accuracy in all cases. To choose the proper treatment approach for warts, as well as for the CAD diagnosis, the CART methodology was employed. The findings of the error analysis revealed that the proposed CART models attain the highest precision for the applications of interest and that no model in the literature rivals them. The outcome of this study supports the idea that methods like CART, RF, and ET not only improve diagnostic precision but also reduce the time and expense needed to reach a diagnosis. However, since these strategies are highly sensitive to the quality and quantity of the input data, more extensive databases with a greater number of independent parameters might be required for further practical use of the developed models.
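
The core step that a CART classifier repeats at every node, choosing the threshold that minimizes weighted Gini impurity, can be sketched as follows. This is a generic illustration of that step under stated assumptions (one numeric feature, made-up data), not the study's models or datasets.

```python
def gini(labels):
    """Gini impurity: 1 minus the sum of squared class proportions."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((labels.count(c) / n) ** 2 for c in set(labels))

def best_split(xs, ys):
    """Return (threshold, weighted_gini) of the best binary split on one numeric feature.

    The threshold is None when no split improves on the unsplit impurity.
    """
    pairs = sorted(zip(xs, ys))
    best = (None, gini(ys))
    for i in range(1, len(pairs)):
        if pairs[i - 1][0] == pairs[i][0]:
            continue  # no threshold fits between equal feature values
        thr = (pairs[i - 1][0] + pairs[i][0]) / 2
        left = [y for x, y in pairs if x <= thr]
        right = [y for x, y in pairs if x > thr]
        score = (len(left) * gini(left) + len(right) * gini(right)) / len(pairs)
        if score < best[1]:
            best = (thr, score)
    return best

# Toy feature values and class labels (made-up data).
threshold, impurity = best_split([1, 2, 3, 4], [0, 0, 1, 1])
```

Random Forest and Extra Trees build many such trees on randomized views of the data. Extra Trees additionally draws candidate thresholds at random instead of searching every midpoint as `best_split` does.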

    Bioinformatics Applications Based On Machine Learning

    The great advances in information technology (IT) have implications for many sectors, such as bioinformatics, and have considerably increased their possibilities. This book presents a collection of 11 original research papers, all related to the application of IT techniques within the bioinformatics sector, ranging from new applications created by adapting and applying existing techniques to new methodologies for solving existing problems.