
    Sparse multinomial kernel discriminant analysis (sMKDA)

    Dimensionality reduction via canonical variate analysis (CVA) is important for pattern recognition and has been extended in various ways to permit more flexibility, e.g. by "kernelizing" the formulation. This can lead to over-fitting, which is usually ameliorated by regularization. Here, a method for sparse, multinomial kernel discriminant analysis (sMKDA) is proposed, using a sparse basis to control complexity. It is based on the connection between CVA and least-squares, and uses forward selection via orthogonal least-squares to approximate a basis, generalizing a similar approach for binomial problems. Classification can be performed directly via minimum Mahalanobis distance in the canonical variates. sMKDA achieves state-of-the-art performance in terms of accuracy and sparseness on 11 benchmark datasets.
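
The final classification step mentioned above, assigning a sample to the class at minimum Mahalanobis distance, can be illustrated with a minimal sketch. It is not the sMKDA method itself: it assumes the samples have already been projected onto two canonical variates, uses a pooled within-class covariance, and restricts itself to 2-D points so the matrix inverse can be written by hand. All function names are hypothetical.

```python
def class_means(points, labels):
    """Mean 2-D vector of each class."""
    sums, counts = {}, {}
    for (x, y), label in zip(points, labels):
        sx, sy = sums.get(label, (0.0, 0.0))
        sums[label] = (sx + x, sy + y)
        counts[label] = counts.get(label, 0) + 1
    return {label: (sx / counts[label], sy / counts[label])
            for label, (sx, sy) in sums.items()}


def pooled_covariance(points, labels, means):
    """Pooled within-class covariance as a 2x2 nested tuple."""
    sxx = sxy = syy = 0.0
    for (x, y), label in zip(points, labels):
        dx, dy = x - means[label][0], y - means[label][1]
        sxx += dx * dx
        sxy += dx * dy
        syy += dy * dy
    dof = len(points) - len(means)  # samples minus number of classes
    return ((sxx / dof, sxy / dof), (sxy / dof, syy / dof))


def mahalanobis_sq(point, mean, cov):
    """Squared Mahalanobis distance, using the closed-form 2x2 inverse."""
    (a, b), (c, d) = cov
    det = a * d - b * c
    dx, dy = point[0] - mean[0], point[1] - mean[1]
    return (d * dx * dx - (b + c) * dx * dy + a * dy * dy) / det


def classify(point, means, cov):
    """Return the label of the class whose mean is nearest to the point."""
    return min(means, key=lambda label: mahalanobis_sq(point, means[label], cov))
```

With more than two canonical variates the same idea applies, but the inverse covariance would normally come from a linear algebra library rather than a hand-written formula.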

    Efficient Extraction and Automated Thyroid Prediction with an Optimized Gated Recurrent Unit in Recurrent Neural Networks

    Computer-aided tools are becoming increasingly important in medical diagnostics. This paper introduces the Efficient Feature Extraction Based Recurrent Neural Network (FERNN) for computer-aided thyroid disease prediction. The FERNN model uses a Gated Recurrent Unit Recurrent Neural Network (GRU-RNN) optimized with the COOT Optimization Algorithm. The study begins by gathering data from an open-source system and preprocessing it using min-max normalization to address missing values. The preprocessed data undergoes a two-level feature extraction (TLFE) procedure. In the first level, a ranked filter feature set technique is used to prioritize features based on medical expert recommendations. In the second level, a variety of metrics, including information gain, gain ratio, chi-square, and relief, are used to rank and select features. A composite measure guided by fuzzy logic is then used to select a judicious subset of features. The FERNN model uses the GRU-RNN to classify thyroid diseases in the databases. The COOT optimization method is employed to optimize the model's weights. The FERNN model was implemented in MATLAB and assessed with a variety of statistical metrics, including kappa, accuracy, precision, recall, sensitivity, specificity, and the F-measure. The proposed methodology was benchmarked against traditional techniques, including the deep belief neural network (DBN), artificial neural network (ANN), and support vector machine (SVM).
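
Two of the building blocks named above, min-max normalization and ranking features by information gain, can be sketched in a few lines. This is an illustrative sketch only (the paper's implementation is in MATLAB and is not shown in the abstract); the function names are hypothetical, and the information-gain code assumes discrete feature values.

```python
from collections import Counter
from math import log2


def min_max_normalize(values):
    """Rescale a list of numbers to the range [0, 1]."""
    lo, hi = min(values), max(values)
    if hi == lo:
        # A constant column carries no spread; map it to zeros.
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    return -sum((c / total) * log2(c / total) for c in Counter(labels).values())


def information_gain(feature, labels):
    """Reduction in label entropy after splitting on a discrete feature."""
    total = len(labels)
    groups = {}
    for value, label in zip(feature, labels):
        groups.setdefault(value, []).append(label)
    conditional = sum(len(g) / total * entropy(g) for g in groups.values())
    return entropy(labels) - conditional


def rank_features(columns, labels):
    """Return feature names ordered from highest to lowest information gain."""
    gains = {name: information_gain(values, labels) for name, values in columns.items()}
    return sorted(gains, key=gains.get, reverse=True)
```

Gain ratio, chi-square, and relief would each give their own ranking; how the fuzzy-logic composite measure combines them is not specified in the abstract and is not attempted here.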

    Thyroid disease prediction using selective features and machine learning techniques

    Simple Summary: The study presents a thyroid disease prediction approach that uses random forest-based features to obtain high accuracy. The approach achieves 0.99 accuracy in predicting ten thyroid diseases. Thyroid disease prediction has recently emerged as an important task. Among existing diagnostic approaches, the target is often binary classification, the datasets used are small, and the results are not validated. Existing approaches predominantly focus on model optimization, and the feature engineering part is less investigated. To overcome these limitations, this study presents an approach that investigates feature engineering for machine learning and deep learning models. Forward feature selection, backward feature elimination, bidirectional feature elimination, and machine learning-based feature selection using extra tree classifiers are adopted. The proposed approach can predict Hashimoto’s thyroiditis (primary hypothyroid), binding protein (increased binding protein), autoimmune thyroiditis (compensated hypothyroid), and non-thyroidal syndrome (NTIS) (concurrent non-thyroidal illness). Extensive experiments show that the extra tree classifier-based selected feature yields the best results with 0.99 accuracy and an F1 score when used with the random forest classifier. Results suggest that the machine learning models are a better choice for thyroid disease detection regarding the provided accuracy and the computational complexity. K-fold cross-validation and performance comparison with existing studies corroborate the superior performance of the proposed approach.
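
Forward feature selection, the first of the strategies listed above, is a greedy loop: start from an empty set and repeatedly add the feature that most improves a score, stopping when no addition helps. The sketch below assumes a caller-supplied `score` function (in practice this might be cross-validated accuracy of a classifier); `forward_select` and the toy score are hypothetical and not taken from the paper.

```python
def forward_select(features, score, max_features=None):
    """Greedy forward selection.

    features: iterable of feature names.
    score: callable taking a list of feature names and returning a number
           (higher is better).
    Returns the selected feature list and its score.
    """
    selected = []
    best_score = float("-inf")
    remaining = list(features)
    limit = max_features if max_features is not None else len(remaining)
    while remaining and len(selected) < limit:
        # Score every one-feature extension of the current subset.
        candidates = [(score(selected + [f]), f) for f in remaining]
        top_score, top_feature = max(candidates, key=lambda pair: pair[0])
        if top_score <= best_score:
            break  # no candidate improves the score
        selected.append(top_feature)
        remaining.remove(top_feature)
        best_score = top_score
    return selected, best_score


# Toy score for illustration: each feature has a made-up usefulness,
# and every selected feature costs a small complexity penalty.
USEFULNESS = {"f1": 0.5, "f2": 0.3, "f3": 0.01}


def toy_score(subset):
    return sum(USEFULNESS[f] for f in subset) - 0.05 * len(subset)
```

Backward elimination is the mirror image: start from all features and greedily drop the one whose removal improves the score the most.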

    A Machine Learning Framework for Identifying Molecular Biomarkers from Transcriptomic Cancer Data

    Cancer is a complex molecular process due to abnormal changes in the genome, such as mutation and copy number variation, and epigenetic aberrations such as dysregulations of long non-coding RNA (lncRNA). These abnormal changes are reflected in the transcriptome by turning oncogenes on and tumor suppressor genes off, which are considered cancer biomarkers. However, transcriptomic data is high dimensional, and finding the best subset of genes (features) related to causing cancer is computationally challenging and expensive. Thus, developing a feature selection framework to discover molecular biomarkers for cancer is critical. Traditional approaches for biomarker discovery calculate the fold change for each gene, comparing expression profiles between tumor and healthy samples, thus failing to capture the combined effect of the whole gene set. Also, these approaches do not always investigate cancer-type prediction capabilities using discovered biomarkers. In this work, we proposed a machine learning-based framework to address all of the above challenges in discovering lncRNA biomarkers. First, we developed a machine learning pipeline that takes lncRNA expression profiles of cancer samples as input and outputs a small set of key lncRNAs that can accurately predict multiple cancer types. A significant innovation of our work is its ability to identify biomarkers without using healthy samples. However, this initial framework cannot identify cancer-specific lncRNAs. Second, we extended our framework to identify cancer type and subtype-specific lncRNAs. Third, we proposed to use a state-of-the-art deep learning algorithm, the concrete autoencoder (CAE), in an unsupervised setting, which efficiently identifies a subset of the most informative features. However, CAE does not identify reproducible features in different runs due to its stochastic nature. Thus, we proposed a multi-run CAE (mrCAE) to identify a stable set of features to address this issue. Our deep learning-based pipeline significantly extended the previous state-of-the-art feature selection techniques. Finally, we showed that discovered biomarkers are biologically relevant using literature review and prognostically significant using survival analyses. The discovered novel biomarkers could be used as a screening tool for different cancer diagnoses and as therapeutic targets.
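
For context, the traditional per-gene fold-change calculation that the abstract contrasts itself with can be sketched as below. It scores each gene independently, which is why it cannot capture combined effects across a gene set. The function name and the pseudocount of 1.0 (added to avoid division by zero) are assumptions for illustration, not details from the paper.

```python
from math import log2
from statistics import mean


def log2_fold_change(tumor_expr, healthy_expr, pseudocount=1.0):
    """Log2 ratio of mean tumor expression to mean healthy expression.

    Positive values indicate higher expression in tumor samples,
    negative values indicate lower expression.
    """
    tumor_mean = mean(tumor_expr) + pseudocount
    healthy_mean = mean(healthy_expr) + pseudocount
    return log2(tumor_mean / healthy_mean)
```

Note that this calculation needs healthy samples as a reference, whereas the framework described above is reported to identify biomarkers without them.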

    Classification of histological images of thyroid nodules based on a combination of Deep Features and Machine Learning

    Background: Thyroid nodules are a prevalent worldwide disease with complex pathological types. They can be classified as either benign or malignant. This paper presents a tool for automatically classifying histological images of thyroid nodules, with a focus on papillary carcinoma and follicular adenoma. Methods: In this work, two pre-trained Convolutional Neural Network (CNN) architectures, VGG16 and VGG19, are used to extract deep features. Principal component analysis is then used to reduce the dimensionality of the feature vectors, and three machine learning algorithms (Support Vector Machine, K-Nearest Neighbor, and Random Forest) are used for classification. Results: The proposed investigations have been applied to our private database collection with a total of 112 histological images. The highest results were obtained by the VGG16 transfer deep feature and the SVM classifier with an accuracy rate equal to 100%.
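
Of the three classifiers named above, K-Nearest Neighbor is the simplest to sketch. The example below assumes the feature vectors have already been extracted and reduced (the CNN and PCA stages are not shown); `knn_predict` and the toy vectors and class labels are hypothetical.

```python
from collections import Counter
from math import dist


def knn_predict(train_vectors, train_labels, query, k=3):
    """Predict the label of `query` by majority vote among its k nearest
    training vectors, using Euclidean distance."""
    neighbors = sorted(
        zip(train_vectors, train_labels),
        key=lambda pair: dist(pair[0], query),
    )[:k]
    votes = Counter(label for _, label in neighbors)
    return votes.most_common(1)[0][0]


# Toy 2-D "reduced features" for two made-up classes.
train_vectors = [(0, 0), (0, 1), (1, 0), (5, 5), (5, 6), (6, 5)]
train_labels = ["class_a"] * 3 + ["class_b"] * 3
```

With only 112 images, as reported above, the choice of `k` and the way training and test images are split can noticeably affect the measured accuracy.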