681 research outputs found

    Knowledge Mining from Clinical Datasets Using Rough Sets and Backpropagation Neural Network

    The availability of clinical datasets and knowledge mining methodologies encourages researchers to extract knowledge from clinical datasets. Different data mining techniques have been used for mining rules, and mathematical models have been developed to assist clinicians in decision making. The objective of this research is to build a classifier that predicts the presence or absence of a disease by learning from a minimal set of attributes extracted from the clinical dataset. In this work, the rough set indiscernibility relation method is combined with a backpropagation neural network (RS-BPNN). The work has two stages. The first stage handles missing values to obtain a smooth dataset and selects appropriate attributes from the clinical dataset by the indiscernibility relation method. The second stage performs classification using a backpropagation neural network on the selected reducts of the dataset. The classifier has been tested with the hepatitis, Wisconsin breast cancer, and Statlog heart disease datasets obtained from the University of California at Irvine (UCI) machine learning repository. The accuracy obtained with the proposed method is 97.3%, 98.6%, and 90.4% for hepatitis, breast cancer, and heart disease, respectively. The proposed system provides an effective classification model for clinical datasets.
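As a rough illustration of the attribute-selection stage described in this abstract, the sketch below groups records into indiscernibility classes and searches for the smallest attribute subsets that induce the same partition as the full attribute set. It is a minimal sketch, not the authors' RS-BPNN implementation: the toy records, the attribute names (`fever`, `cough`, `fatigue`) and the brute-force search are assumptions made for illustration, and missing-value handling and the neural network stage are omitted.

```python
from itertools import combinations

# Hypothetical toy records; "fatigue" duplicates "fever" on purpose.
ROWS = [
    {"fever": 1, "cough": 0, "fatigue": 1},
    {"fever": 1, "cough": 1, "fatigue": 1},
    {"fever": 0, "cough": 0, "fatigue": 0},
    {"fever": 0, "cough": 1, "fatigue": 0},
    {"fever": 1, "cough": 0, "fatigue": 1},
]
ATTRS = ["fever", "cough", "fatigue"]


def partition(rows, attrs):
    """Group row indices into indiscernibility classes over attrs.

    Two rows fall into the same class when they agree on every
    attribute in attrs.
    """
    classes = {}
    for i, row in enumerate(rows):
        key = tuple(row[a] for a in attrs)
        classes.setdefault(key, []).append(i)
    return sorted(classes.values())


def minimal_reducts(rows, attrs):
    """Return the smallest attribute subsets that preserve the partition
    induced by the full attribute set (exhaustive search, so only
    practical for a handful of attributes)."""
    full = partition(rows, attrs)
    for size in range(1, len(attrs) + 1):
        found = [subset for subset in combinations(attrs, size)
                 if partition(rows, subset) == full]
        if found:
            return found
    return []


if __name__ == "__main__":
    print(partition(ROWS, ["fever"]))
    print(minimal_reducts(ROWS, ATTRS))
```

On this toy table either `fever` or `fatigue` can be dropped without merging any indiscernibility classes, so a downstream classifier could be trained on two attributes instead of three.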

    Named Entity Recognition in Electronic Health Records Using Transfer Learning Bootstrapped Neural Networks

    Neural networks (NNs) have become the state of the art in many machine learning applications, especially in image and sound processing [1]. The same, although to a lesser extent [2,3], could be said of natural language processing (NLP) tasks, such as named entity recognition. However, the success of NNs remains dependent on the availability of large labelled datasets, which is a significant hurdle in many important applications. One such case is electronic health records (EHRs), which are arguably the largest source of medical data, most of which lies hidden in natural text [4,5]. Data access is difficult due to data privacy concerns, and therefore annotated datasets are scarce. With scarce data, NNs will likely not be able to extract this hidden information with practical accuracy. In our study, we develop an approach that solves these problems for named entity recognition, obtaining a 94.6 F1 score in the I2B2 2009 Medical Extraction Challenge [6], 4.3 above the architecture that won the competition. Beyond the official I2B2 challenge, we further achieve 82.4 F1 on extracting relationships between medical terms. To reach this state-of-the-art accuracy, our approach applies transfer learning to leverage datasets annotated for other I2B2 tasks, and designs and trains embeddings that specially benefit from such transfer. Comment: 11 pages, 4 figures, 8 tables

    Autoencoder for clinical data analysis and classification : data imputation, dimensional reduction, and pattern recognition

    Over the last decade, research has focused on machine learning and data mining to develop frameworks that can improve data analysis and output performance, and to build accurate decision support systems that benefit from real-life datasets. This leads to the field of clinical data analysis, which has attracted a significant amount of interest in the computing, information systems, and medical fields. Machine learning algorithms need data of a particular form in order to build an efficient model. Clinical datasets pose several issues that can affect classification: missing values, high dimensionality, and class imbalance. To build a framework for mining the data, it is first necessary to preprocess the data by eliminating patient records that have too many missing values, imputing missing values, and addressing high dimensionality, before classifying the data for decision support. This thesis investigates a real clinical dataset to address these challenges. An autoencoder is employed as a tool that can condense the data mining methodology by extracting features and classifying data in one model. The first step in the data mining methodology is to impute missing values, so several imputation methods are analysed and employed. High dimensionality is then examined, and irrelevant and redundant features are discarded in order to improve prediction accuracy and reduce computational complexity. Class imbalance is manipulated to investigate its effect on feature selection algorithms and classification algorithms. The first stage of analysis investigates the role of the missing values. Results show that techniques based on class separation outperform other techniques in predictive ability. The next stage investigates high dimensionality and class imbalance. It was found that a small set of features can improve classification performance, and that class balancing does not affect performance as much as class imbalance.

    Artificial Neural Network Parameter Tuning Framework For Heart Disease Classification

    Heart disease is among the leading causes of death worldwide. Artificial neural networks have been applied as a decision support tool for heart disease detection. However, an artificial neural network requires many parameter settings to be explored in order to find the setting that produces the best performance. This paper proposes a parameter tuning framework for artificial neural networks. The Statlog and Cleveland heart disease datasets are used to evaluate the performance of the proposed framework. The results show that the proposed framework is able to produce high classification accuracy: the overall classification accuracy is 90.9% for the Cleveland dataset and 90% for the Statlog dataset.

    Breast Tumor Classification Using an Ensemble Machine Learning Method

    Breast cancer is the most common cause of death for women worldwide. Thus, the ability of artificial intelligence systems to detect possible breast cancer is very important. In this paper, an ensemble classification mechanism based on majority voting is proposed. First, the performance of different state-of-the-art machine learning classification algorithms was evaluated on the Wisconsin Breast Cancer Dataset (WBCD). The three best classifiers were then selected based on their F3 score, which is used to emphasize the importance of false negatives (recall) in breast cancer classification. These three classifiers (simple logistic regression learning, support vector machine learning with stochastic gradient descent optimization, and a multilayer perceptron network) are then used for ensemble classification using a voting mechanism. We also evaluated the performance of hard and soft voting mechanisms. For hard voting, a majority-based voting mechanism was used, and for soft voting we used voting methods based on the average, product, maximum, and minimum of probabilities. The hard voting (majority-based voting) mechanism shows better performance, with 99.42%, as compared to the state-of-the-art algorithm for WBCD.
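The selection metric and the voting rules named in this abstract can be sketched in a few lines. This is an illustrative sketch under stated assumptions, not the paper's implementation: the function names, the confusion-matrix counts and the probability vectors are hypothetical, and the three base classifiers are replaced by hard-coded outputs.

```python
import math
from collections import Counter


def fbeta(tp, fp, fn, beta=3.0):
    """F-beta score from confusion-matrix counts.

    With beta = 3, recall (and therefore false negatives) is weighted
    far more heavily than precision.
    """
    b2 = beta * beta
    denom = (1 + b2) * tp + b2 * fn + fp
    return (1 + b2) * tp / denom if denom else 0.0


def hard_vote(labels):
    """Majority vote over the class labels predicted by each classifier."""
    return Counter(labels).most_common(1)[0][0]


def soft_vote(probas, rule="average"):
    """Combine per-classifier class probabilities and return the winning
    class index. probas[k][c] is classifier k's probability for class c."""
    combine = {
        "average": lambda v: sum(v) / len(v),
        "product": math.prod,
        "max": max,
        "min": min,
    }[rule]
    n_classes = len(probas[0])
    scores = [combine([p[c] for p in probas]) for c in range(n_classes)]
    return scores.index(max(scores))


if __name__ == "__main__":
    # Hypothetical outputs of three classifiers for one sample
    # (class 0 = benign, class 1 = malignant).
    probas = [[0.9, 0.1], [0.4, 0.6], [0.45, 0.55]]
    labels = [p.index(max(p)) for p in probas]
    print("hard vote:", hard_vote(labels))
    for rule in ("average", "product", "max", "min"):
        print(rule, "vote:", soft_vote(probas, rule))
    print("F1:", fbeta(9, 9, 1, beta=1.0), "F3:", fbeta(9, 9, 1, beta=3.0))
```

In the sample above, two of the three classifiers lean weakly towards class 1 while one is strongly confident in class 0, so hard voting and the soft voting rules disagree. Which behaviour is preferable depends on how well calibrated the base classifiers are.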

    A New Approach of Rough Set Theory for Feature Selection and Bayes Net Classifier Applied on Heart Disease Dataset

    In this paper, a new rough set approach to feature selection is proposed. Feature selection is used for several reasons: (a) to decrease prediction time, (b) because a feature may not be available, and (c) because the presence of a feature may cause bad predictions. The rough set is used to select the most significant features. The proposed rough set approach has been applied to heart disease data. The main problem is how to predict, from the given features, whether or not a patient has heart disease. The problem is challenging because the decision cannot be determined directly. The proposed method relies on encoding the original data, and the rough set has been modified to obtain attributes for prediction by ignoring unnecessary and bad features. A Bayes net is used as the classification method, and 10-fold cross-validation is used for evaluation. The correctly classified instances were 82.17, 83.49, and 74.58 when using the full set, 12, and 7 attributes, respectively. When the traditional rough set was applied, the correctly classified instances were 58.41 and 81.51 when using 2 and 12 attributes, respectively.

    Artificial Intelligence Techniques for Cancer Detection and Classification: Review Study

    Cancer is the general name for a group of more than 100 diseases. Although cancer includes different types of diseases, they all start because abnormal cells grow out of control. Without treatment, cancer can cause serious health problems and even loss of life. Early detection of cancer may reduce mortality and morbidity. This paper presents a review of the detection methods for lung, breast, and brain cancers. The methods used for diagnosis include artificial intelligence techniques, such as support vector machine neural network, artificial neural network, fuzzy logic, and adaptive neuro-fuzzy inference system, applied to medical imaging such as X-ray, ultrasound, magnetic resonance imaging, and computed tomography scan images. Imaging techniques are the most important approach for precise diagnosis of human cancer. We investigated all these techniques to identify a method that can provide superior accuracy and to determine the best medical images for use in each type of cancer.

    A voting-based machine learning approach for classifying biological and clinical datasets.

    BACKGROUND: Different machine learning techniques have been proposed to classify a wide range of biological/clinical data. Given the practicability of these approaches, various software packages have also been designed and developed. However, the existing methods suffer from several limitations, such as overfitting on a specific dataset, ignoring feature selection in the preprocessing step, and losing performance on large datasets. To tackle these limitations, in this study we introduced a machine learning framework consisting of two main steps. First, our previously suggested optimization algorithm (Trader) was extended to select a near-optimal subset of features/genes. Second, a voting-based framework was proposed to classify the biological/clinical data with high accuracy. To evaluate the efficiency of the proposed method, it was applied to 13 biological/clinical datasets, and the outcomes were comprehensively compared with the prior methods. RESULTS: The results demonstrated that the Trader algorithm could select a near-optimal subset of features with a significance level of p-value < 0.01 relative to the compared algorithms. Additionally, on the large datasets, the proposed machine learning framework improved on prior studies by ~10% in terms of the mean values of accuracy, precision, recall, specificity, and F-measure under fivefold cross-validation. CONCLUSION: Based on the obtained results, it can be concluded that a proper configuration of efficient algorithms and methods can increase the prediction power of machine learning approaches and help researchers in designing practical diagnostic health care systems and offering effective treatment plans.