
    MCNN-LSTM: Combining CNN and LSTM to classify multi-class text in imbalanced news data

    Searching, retrieving, and arranging text in ever-larger document collections necessitate more efficient information processing algorithms. Document categorization is a crucial component of many supervised-learning-based information processing systems. As the quantity of documents grows, the performance of classic supervised classifiers deteriorates with the number of document categories. Assigning documents to a predetermined set of classes is called text classification, and it is used extensively in a wide range of data-intensive applications. However, real-world implementations of these models are plagued with shortcomings that call for further investigation; in particular, imbalanced datasets hinder the most prevalent high-performance algorithms. In this paper, we propose an approach named multi-class Convolutional Neural Network (MCNN)-Long Short-Term Memory (LSTM), which combines two deep learning techniques, the Convolutional Neural Network (CNN) and Long Short-Term Memory, for text classification of news data. CNNs are used as feature extractors for the LSTMs on text input data, capturing the spatial structure of words in a sentence, paragraph, or document. Because the dataset is imbalanced, we use the Tomek-Link algorithm to balance it before applying our model, which then shows better performance in terms of F1-score (98%) and accuracy (99.71%) than existing works. The combination of deep learning techniques used in our approach is well suited to classifying imbalanced datasets with underrepresented categories; hence, our method outperformed other machine learning algorithms in text classification by a large margin. We also compare our results with traditional machine learning algorithms on both the imbalanced and the balanced dataset.
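
    A minimal illustrative sketch (not the authors' exact MCNN-LSTM architecture) of how a CNN front-end can feed an LSTM for multi-class text classification, with Tomek Links undersampling from imbalanced-learn applied before training. The vocabulary size, sequence length, layer widths, and placeholder data are assumptions; in practice Tomek Links would typically be applied to a numeric feature representation of the documents.

```python
# Illustrative sketch: CNN feature extractor feeding an LSTM for multi-class text
# classification, with Tomek Links cleaning an imbalanced training set.
# Hyperparameters and the placeholder data are assumptions, not the paper's setup.
import numpy as np
from imblearn.under_sampling import TomekLinks
from tensorflow.keras import layers, models

VOCAB_SIZE, MAX_LEN, NUM_CLASSES = 20_000, 200, 5

def build_cnn_lstm() -> models.Sequential:
    model = models.Sequential([
        layers.Embedding(VOCAB_SIZE, 128),                    # word embeddings
        layers.Conv1D(64, kernel_size=5, activation="relu"),  # local n-gram features
        layers.MaxPooling1D(pool_size=2),                     # downsample feature maps
        layers.LSTM(64),                                      # sequence model over CNN features
        layers.Dense(NUM_CLASSES, activation="softmax"),
    ])
    model.compile(optimizer="adam",
                  loss="sparse_categorical_crossentropy",
                  metrics=["accuracy"])
    return model

# Placeholder data: integer-encoded, padded token sequences and class labels.
X_train = np.random.randint(1, VOCAB_SIZE, size=(1000, MAX_LEN))
y_train = np.random.randint(0, NUM_CLASSES, size=1000)

# Tomek Links removes majority-class samples that form a Tomek pair with a
# minority-class sample, reducing class overlap before training.
X_res, y_res = TomekLinks().fit_resample(X_train, y_train)

model = build_cnn_lstm()
model.fit(X_res, y_res, epochs=3, batch_size=32, validation_split=0.1)
```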

    BreastNet18: A High Accuracy Fine-Tuned VGG16 Model Evaluated Using Ablation Study for Diagnosing Breast Cancer from Enhanced Mammography Images

    Background: Identification and treatment of breast cancer at an early stage can reduce mortality. Currently, mammography is the most widely used effective imaging technique in breast cancer detection. However, an erroneous mammogram-based interpretation may increase the false diagnosis rate, as distinguishing cancerous masses from adjacent tissue is often complex and error-prone. Methods: Six pre-trained and fine-tuned deep CNN architectures, VGG16, VGG19, MobileNetV2, ResNet50, DenseNet201, and InceptionV3, are evaluated to determine which yields the best performance. We propose the BreastNet18 model, which uses VGG16 as its foundational base since VGG16 performs with the highest accuracy. An ablation study is performed on BreastNet18 to evaluate its robustness and achieve the highest possible accuracy. Various image processing techniques with suitable parameter values are employed to remove artefacts and increase image quality. A dataset of 1442 preprocessed mammograms was augmented using seven augmentation techniques, resulting in a dataset of 11,536 images. To investigate possible overfitting issues, k-fold cross-validation is carried out. The model was then tested on noisy mammograms to evaluate its robustness, and the results were compared with previous studies. Results: The proposed BreastNet18 model performed best with a training accuracy of 96.72%, a validation accuracy of 97.91%, and a test accuracy of 98.02%. In contrast, VGG19 yielded a test accuracy of 96.24%, MobileNetV2 77.84%, ResNet50 79.98%, DenseNet201 86.92%, and InceptionV3 76.87%. Conclusions: Our proposed approach based on image processing, transfer learning, fine-tuning, and an ablation study has demonstrated highly accurate breast cancer classification while dealing with a limited number of complex medical images.
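
    A hedged transfer-learning sketch along the lines described in the abstract: a VGG16 base with a small custom classification head, frozen first and then partially unfrozen for fine-tuning. The head layers, image size, learning rates, and two-class setup are assumptions, not the published BreastNet18 configuration.

```python
# Illustrative transfer-learning sketch: VGG16 backbone with a small custom head.
# Head layers, image size, learning rates, and class count are assumptions.
import tensorflow as tf
from tensorflow.keras import layers, models
from tensorflow.keras.applications import VGG16

IMG_SIZE, NUM_CLASSES = (224, 224), 2

base = VGG16(weights="imagenet", include_top=False, input_shape=IMG_SIZE + (3,))
base.trainable = False                              # freeze the convolutional base first

model = models.Sequential([
    base,
    layers.GlobalAveragePooling2D(),
    layers.Dense(256, activation="relu"),
    layers.Dropout(0.4),
    layers.Dense(NUM_CLASSES, activation="softmax"),
])
model.compile(optimizer=tf.keras.optimizers.Adam(1e-4),
              loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])
# model.fit(train_ds, validation_data=val_ds, epochs=10)

# Once the head has converged, unfreeze the last VGG16 block and fine-tune
# with a lower learning rate.
base.trainable = True
for layer in base.layers[:-4]:
    layer.trainable = False
model.compile(optimizer=tf.keras.optimizers.Adam(1e-5),
              loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])
# model.fit(train_ds, validation_data=val_ds, epochs=10)
```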

    A Robust Framework Combining Image Processing and Deep Learning Hybrid Model to Classify Cardiovascular Diseases Using a Limited Number of Paper-Based Complex ECG Images

    Heart disease can be life-threatening if not detected and treated at an early stage. The electrocardiogram (ECG) plays a vital role in classifying cardiovascular diseases, and physicians and medical researchers often examine paper-based ECG images for cardiac diagnosis. An automated heart disease prediction system might help to classify heart diseases accurately at an early stage. This study aims to classify cardiac diseases into five classes from paper-based ECG images using a deep learning approach with the highest possible accuracy and the lowest possible time complexity. The research consists of two approaches. In the first approach, five deep learning models, InceptionV3, ResNet50, MobileNetV2, VGG19, and DenseNet201, are employed. In the second approach, an integrated deep learning model (InRes-106) is introduced, combining InceptionV3 and ResNet50. This model is developed as a deep convolutional neural network capable of extracting hidden and high-level features from images. An ablation study is conducted on the proposed model, altering several components and hyperparameters to improve the performance even further. Before training the model, several image pre-processing techniques are employed to remove artifacts and enhance image quality. Our proposed hybrid InRes-106 model performed best with a testing accuracy of 98.34%, while the InceptionV3 model achieved a testing accuracy of 90.56%, ResNet50 89.63%, DenseNet201 88.94%, VGG19 87.87%, and MobileNetV2 80.56%. The model is trained with a k-fold cross-validation technique with different k values to further evaluate its robustness. Although the dataset contains a limited number of complex ECG images, our proposed approach, based on various image pre-processing techniques, model fine-tuning, and ablation studies, can effectively diagnose cardiac diseases.
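
    The abstract does not specify how InceptionV3 and ResNet50 are combined inside InRes-106, so the following is only a hedged sketch of one common fusion strategy: both backbones share the same input and their pooled features are concatenated before a classification head. The fusion choice and all layer sizes are assumptions.

```python
# Illustrative sketch: fusing InceptionV3 and ResNet50 features for 5-class ECG
# image classification. The fusion strategy and head sizes are assumptions.
from tensorflow.keras import Input, Model, layers
from tensorflow.keras.applications import InceptionV3, ResNet50

IMG_SHAPE, NUM_CLASSES = (299, 299, 3), 5

inputs = Input(shape=IMG_SHAPE)
inception = InceptionV3(weights="imagenet", include_top=False, input_tensor=inputs)
resnet = ResNet50(weights="imagenet", include_top=False, input_tensor=inputs)

# Pool each backbone's feature maps and concatenate them into a single vector.
f1 = layers.GlobalAveragePooling2D()(inception.output)
f2 = layers.GlobalAveragePooling2D()(resnet.output)
fused = layers.Concatenate()([f1, f2])

x = layers.Dense(512, activation="relu")(fused)
x = layers.Dropout(0.3)(x)
outputs = layers.Dense(NUM_CLASSES, activation="softmax")(x)

model = Model(inputs, outputs)
model.compile(optimizer="adam",
              loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])
```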

    Automated Detection of Broncho-Arterial Pairs Using CT Scans Employing Different Approaches to Classify Lung Diseases

    Current research indicates that for the identification of lung disorders, including pneumonia and COVID-19, structural distortions of bronchi and arteries (BA) should be taken into account. CT scans are an effective modality to detect lung anomalies; however, anomalies in bronchi and arteries can be difficult to detect. Therefore, in this study, alterations of bronchi and arteries are considered in the classification of lung diseases. Four approaches to highlight these alterations are introduced: (a) a Hessian-based approach, (b) a region-growing algorithm, (c) a clustering-based approach, and (d) a color-coding-based approach. Prior to this, the lungs are segmented, employing several image preprocessing algorithms. The COVID-19 Lung CT scan dataset used contains three classes, named Non-COVID, COVID, and community-acquired pneumonia, with 6983, 7593, and 2618 samples, respectively. To classify the CT scans into the three classes, two deep learning architectures, (a) a convolutional neural network (CNN) and (b) a CNN with long short-term memory (LSTM) and an attention mechanism, are considered. Both models are trained with the four datasets obtained from the four approaches. Results show that the CNN model achieved test accuracies of 88.52%, 87.14%, 92.36%, and 95.84% for the Hessian, region-growing, color-coding, and clustering-based approaches, respectively. The CNN with LSTM and an attention mechanism increases the overall accuracy for all approaches, with test accuracies of 89.61%, 88.28%, 94.61%, and 97.12% for the Hessian, region-growing, color-coding, and clustering-based approaches, respectively. To assess overfitting, accuracy and loss curves and a k-fold cross-validation technique are employed. The Hessian-based and region-growing approaches produced nearly equivalent outcomes. Our proposed method outperforms state-of-the-art studies, indicating that it may be worthwhile to pay more attention to BA features in lung disease classification based on CT images.
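
    As one example of the highlighting step, a hedged sketch of a clustering-based approach: K-means groups the intensities of a segmented lung slice so that airway- and vessel-like structures fall into separate intensity bands. The cluster count and the intensity remapping are assumptions, not the study's actual settings.

```python
# Illustrative sketch of a clustering-based highlighting step for a segmented
# lung CT slice. Cluster count and the intensity remapping are assumptions.
import numpy as np
from sklearn.cluster import KMeans

def highlight_by_clustering(lung_slice: np.ndarray, n_clusters: int = 4) -> np.ndarray:
    """lung_slice: 2-D grayscale slice with non-lung pixels already zeroed."""
    pixels = lung_slice.reshape(-1, 1).astype(np.float32)
    labels = KMeans(n_clusters=n_clusters, n_init=10, random_state=0).fit_predict(pixels)

    # Order clusters by mean intensity and spread them over the grayscale range,
    # visually separating dark (bronchus-like) from bright (artery-like) structures.
    order = np.argsort([pixels[labels == k].mean() for k in range(n_clusters)])
    remap = {int(k): int(255 * i / (n_clusters - 1)) for i, k in enumerate(order)}
    highlighted = np.vectorize(remap.get)(labels).reshape(lung_slice.shape)
    return highlighted.astype(np.uint8)

# Example usage with a random placeholder slice:
# highlighted = highlight_by_clustering(np.random.randint(0, 256, (256, 256)).astype(np.uint8))
```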

    An Effective Ensemble Machine Learning Approach to Classify Breast Cancer Based on Feature Selection and Lesion Segmentation Using Preprocessed Mammograms

    Background: Breast cancer, behind skin cancer, is the second most frequent malignancy among women, initiated by unregulated cell division in breast tissues. Although early mammogram screening and treatment reduce mortality, differentiating cancer cells from surrounding tissue is often fallible, resulting in fallacious diagnoses. Method: A mammography dataset is used to categorize breast cancer into four classes with low computational complexity, introducing a feature-extraction-based approach with machine learning (ML) algorithms. After artefact removal and preprocessing of the mammograms, the dataset is augmented with seven augmentation techniques. The region of interest (ROI) is extracted by employing several algorithms, including a dynamic thresholding method. Sixteen geometrical features are extracted from the ROI, and eleven ML algorithms are investigated with these features. Three ensemble models are generated from these ML models using the stacking method: the first ensemble model is built by stacking the ML models with an accuracy above 90%, and the accuracy thresholds for generating the remaining ensemble models are >95% and >96%. Five feature selection methods with fourteen configurations are applied to improve the performance. Results: The Random Forest Importance algorithm, with a threshold of 0.045, produces 10 features that achieve the highest performance, a 98.05% test accuracy, by stacking the Random Forest and XGB classifiers, each with an accuracy above 96%. Furthermore, with k-fold cross-validation, consistent performance is observed across all k values ranging from 3 to 30. Moreover, the proposed strategy combining image processing, feature extraction, and ML achieves a high accuracy in classifying breast cancer.
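
    A hedged sketch of the feature-selection-plus-stacking idea: Random Forest importances select features above the 0.045 threshold reported above, and a stacked ensemble of Random Forest and XGBoost classifiers is fit on the selected features. The meta-learner and all other settings are assumptions.

```python
# Illustrative sketch: Random-Forest-importance feature selection (threshold 0.045,
# as reported in the abstract) followed by a stacked Random Forest + XGBoost ensemble.
# The meta-learner and other hyperparameters are assumptions.
from sklearn.ensemble import RandomForestClassifier, StackingClassifier
from sklearn.feature_selection import SelectFromModel
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import Pipeline
from xgboost import XGBClassifier

selector = SelectFromModel(
    RandomForestClassifier(n_estimators=200, random_state=0),
    threshold=0.045,                 # keep features whose importance exceeds 0.045
)

stack = StackingClassifier(
    estimators=[
        ("rf", RandomForestClassifier(n_estimators=200, random_state=0)),
        ("xgb", XGBClassifier(eval_metric="mlogloss", random_state=0)),
    ],
    final_estimator=LogisticRegression(max_iter=1000),
    cv=5,
)

pipeline = Pipeline([("select", selector), ("stack", stack)])

# X: geometrical ROI features, y: one of the four breast cancer classes.
# pipeline.fit(X_train, y_train); accuracy = pipeline.score(X_test, y_test)
```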

    Train and test data distribution.

    Named Entity Recognition (NER) plays a significant role in enhancing the performance of all types of domain-specific applications in Natural Language Processing (NLP). Depending on the type of application, the goal of NER is to identify target entities based on the context of other entities in a sentence. Numerous architectures have demonstrated good performance for high-resource languages such as English and Chinese. However, currently existing NER models for Bengali cannot achieve reliable accuracy due to the morphological richness of Bengali and the limited availability of resources. This work integrates both Data-Centric and Model-Centric AI concepts to achieve state-of-the-art performance. A unique dataset was created for this study, demonstrating the impact of a good-quality dataset on accuracy. We propose a method for developing a high-quality NER dataset for any language and use our dataset to evaluate the performance of various deep learning models. A hybrid model achieved an exact-match F1 score of 87.50%, a partial-match F1 score of 92.31%, and a micro F1 score of 98.32%. Our proposed model reduces the need for feature engineering and utilizes minimal resources.
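
    The abstract does not describe the hybrid architecture itself, so the following is only a hedged baseline sketch of token-level NER with a BiLSTM in Keras; the tag set, vocabulary size, and layer widths are assumptions.

```python
# Illustrative baseline only (not the paper's hybrid model): a BiLSTM token
# classifier for NER. Vocabulary size, tag set, and layer widths are assumptions.
from tensorflow.keras import layers, models

VOCAB_SIZE, NUM_TAGS = 30_000, 9      # e.g. BIO tags for four entity types plus O

model = models.Sequential([
    layers.Embedding(VOCAB_SIZE, 128, mask_zero=True),            # index 0 = padding
    layers.Bidirectional(layers.LSTM(64, return_sequences=True)),
    layers.TimeDistributed(layers.Dense(NUM_TAGS, activation="softmax")),
])
model.compile(optimizer="adam",
              loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])

# X: (num_sentences, max_len) padded token ids; y: (num_sentences, max_len) tag ids.
# model.fit(X, y, epochs=5, batch_size=32)
```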

    Pixel-level image analysis to derive the broncho-artery (BA) ratio employing HRCT scans: A computer-aided approach

    Bronchiectasis in children is a major health issue that can be life-threatening if not diagnosed and effectively treated. In the diagnosis of bronchiectasis, an increased broncho-arterial (BA) ratio is considered a significant marker. The BA ratio is measured by evaluating BA pairs in high-resolution computed tomography (HRCT) scans. Detecting BA pairs automatically is challenging due to the complex characteristics of BA pairs and the ambiguous appearance of the bronchi. This study proposes an effective computerized approach to detect BA pairs and assess the BA ratio using HRCT scans of children, employing computer-aided techniques and novel custom-built algorithms. Attention is given to reconstructing broken bronchial walls and identifying discrete BA pairs using custom-built kernel-based and patch-based algorithms for pixel-level image analysis. To detect BA pairs, the lung region is segmented in the HRCT slices and image preprocessing techniques, including noise reduction, binarization, largest-contour detection, and a hole-filling algorithm, are applied. A histogram analysis method is introduced to clean the images. A kernel-based algorithm is proposed to reconstruct the pixel distribution where the bronchial wall is broken, so that the bronchi can be detected precisely. Potential arteries are detected using balanced histogram thresholding, morphological opening, and an approach based on four conditions related to the object area, circularity, rectangular bounding box ratio, and enclosing circle area ratio. Potential bronchi are detected through matching of object coordinates with potential arteries, hole-filling, and the four-condition-based approach. The potential BA pairs are detected by matching the coordinates of potential bronchi with those of potential arteries, as the artery and bronchus are adjacent to each other in a BA pair. Finally, from the potential BA pairs, actual BA pairs are identified using a custom-built patch algorithm. The study is conducted using 2471 HRCT slices of seven children, obtained from the Royal Darwin Hospital, Australia. The BA ratio is derived based on the ratios of diameters, major axis lengths, minor axis lengths, areas, convex hulls, and equivalent diameters, where the BA ratios are respectively 0.51–0.65, 0.49–0.59, 0.59–0.77, 0.25–0.42, 0.29–0.47, 1.5–2, and 0.50–0.65.
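
    A hedged sketch of the condition-based filtering described above, using OpenCV contour properties: object area, circularity, bounding-box ratio, and enclosing-circle area ratio. All thresholds are illustrative assumptions, not the study's values.

```python
# Illustrative sketch of condition-based artery candidate filtering on a binary
# HRCT slice. All thresholds below are assumptions, not the study's actual values.
import cv2
import numpy as np

def is_potential_artery(contour: np.ndarray) -> bool:
    area = cv2.contourArea(contour)
    perimeter = cv2.arcLength(contour, True)
    if area < 10 or perimeter == 0:                    # discard tiny/degenerate objects
        return False

    circularity = 4 * np.pi * area / (perimeter ** 2)  # 1.0 for a perfect circle

    x, y, w, h = cv2.boundingRect(contour)
    box_ratio = min(w, h) / max(w, h)                  # closer to 1 for round objects

    (_, _), radius = cv2.minEnclosingCircle(contour)
    circle_area_ratio = area / (np.pi * radius ** 2)   # fill of the enclosing circle

    return circularity > 0.7 and box_ratio > 0.6 and circle_area_ratio > 0.6

# Example:
# contours, _ = cv2.findContours(binary_slice, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE)
# artery_candidates = [c for c in contours if is_potential_artery(c)]
```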

    Sample evaluation.

    Named Entity Recognition (NER) plays a significant role in enhancing the performance of all types of domain-specific applications in Natural Language Processing (NLP). Depending on the type of application, the goal of NER is to identify target entities based on the context of other entities in a sentence. Numerous architectures have demonstrated good performance for high-resource languages such as English and Chinese. However, currently existing NER models for Bengali cannot achieve reliable accuracy due to the morphological richness of Bengali and the limited availability of resources. This work integrates both Data-Centric and Model-Centric AI concepts to achieve state-of-the-art performance. A unique dataset was created for this study, demonstrating the impact of a good-quality dataset on accuracy. We propose a method for developing a high-quality NER dataset for any language and use our dataset to evaluate the performance of various deep learning models. A hybrid model achieved an exact-match F1 score of 87.50%, a partial-match F1 score of 92.31%, and a micro F1 score of 98.32%. Our proposed model reduces the need for feature engineering and utilizes minimal resources.