141 research outputs found
Breast Cancer Diagnosis from Perspective of Class Imbalance
Introduction: Breast cancer is the second leading cause of mortality among women, and early detection is the only way to reduce the risk of breast cancer mortality. Traditional methods cannot diagnose tumors effectively because they assume a well-balanced dataset. A hybrid method, however, can help alleviate the two-class imbalance problem in breast cancer diagnosis and establish a more accurate diagnosis. Material and Methods: The proposed hybrid approach, called LS-KNN, was based on an improved Laplacian score (LS) algorithm and the K-nearest neighbor (KNN) algorithm. The improved LS algorithm was used to obtain the optimal feature subset. KNN with automatic K was used to classify the data, which reduced the computational effort and made classification faster. The effectiveness of LS-KNN was examined on two biased-representative breast cancer datasets using classification accuracy, sensitivity, specificity, G-mean, and Matthews correlation coefficient. Results: Applying the proposed algorithm to two breast cancer datasets indicated that the new method was more efficient than previously introduced methods. The obtained values of accuracy, sensitivity, specificity, G-mean, and Matthews correlation coefficient were 99.27%, 99.12%, 99.51%, 99.42%, respectively. Conclusion: Experimental results showed that the proposed approach worked well with breast cancer datasets and could be a good alternative to well-known machine learning methods.
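The evaluation measures named in this abstract can all be derived from a binary confusion matrix. The following is a minimal sketch using the standard textbook definitions, not the authors' code; the function name `imbalance_metrics` and the example counts are hypothetical.

```python
import math


def imbalance_metrics(tp, fn, tn, fp):
    """Compute the five measures from confusion-matrix counts.

    tp/fn count the positive (e.g. malignant) cases, tn/fp the negative ones.
    """
    sensitivity = tp / (tp + fn)          # true positive rate
    specificity = tn / (tn + fp)          # true negative rate
    accuracy = (tp + tn) / (tp + fn + tn + fp)
    g_mean = math.sqrt(sensitivity * specificity)
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    # By convention MCC is set to 0 when any marginal count is zero.
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0
    return {
        "accuracy": accuracy,
        "sensitivity": sensitivity,
        "specificity": specificity,
        "g_mean": g_mean,
        "mcc": mcc,
    }


# Hypothetical imbalanced test set: 90 positives, 910 negatives.
# The classifier misses half of the positives but never raises a false alarm.
m = imbalance_metrics(tp=45, fn=45, tn=910, fp=0)
print(m)
```

In this made-up example accuracy stays above 95% while sensitivity is only 50%, which is why G-mean and the Matthews correlation coefficient are commonly reported alongside accuracy on imbalanced data.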
Implementing decision tree-based algorithms in medical diagnostic decision support systems
As a branch of healthcare, medical diagnosis can be defined as finding the disease based on the signs and symptoms of the patient. To this end, the required information is gathered from different sources like physical examination, medical history and general information of the patient. Development of smart classification models for medical diagnosis is of great interest amongst the researchers. This is mainly owing to the fact that the machine learning and data mining algorithms are capable of detecting the hidden trends between features of a database. Hence, classifying the medical datasets using smart techniques paves the way to design more efficient medical diagnostic decision support systems.
Several databases have been provided in the literature to investigate different aspects of diseases. As an alternative to the available diagnosis tools/methods, this research involves machine learning algorithms called Classification and Regression Tree (CART), Random Forest (RF) and Extremely Randomized Trees or Extra Trees (ET) for the development of classification models that can be implemented in computer-aided diagnosis systems. As a decision tree (DT), CART is fast to create, and it applies to both the quantitative and qualitative data. For classification problems, RF and ET employ a number of weak learners like CART to develop models for classification tasks.
We employed the Wisconsin Breast Cancer Database (WBCD), the Z-Alizadeh Sani dataset for coronary artery disease (CAD), and the databanks gathered in Ghaem Hospital’s dermatology clinic on the response of patients with common and/or plantar warts to cryotherapy and/or immunotherapy. To classify the breast cancer type based on the WBCD, the RF and ET methods were employed. The developed RF and ET models predicted the WBCD type with 100% accuracy in all cases. To choose the proper treatment approach for warts, as well as for CAD diagnosis, the CART methodology was employed. The error analysis revealed that the proposed CART models attain the highest precision for the applications of interest, and no model in the literature can rival them. The outcome of this study supports the idea that methods like CART, RF and ET not only improve diagnostic precision but also reduce the time and expense needed to reach a diagnosis. However, since these strategies are highly sensitive to the quality and quantity of the input data, more extensive databases with a greater number of independent parameters might be required before the developed models can be applied in practice.
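To illustrate how a CART-style tree picks a split, the sketch below searches one numeric feature for the threshold that minimises weighted Gini impurity. The abstract does not state which split criterion the authors used, so Gini is an assumption here; the helper names and the toy data are hypothetical.

```python
def gini(labels):
    """Gini impurity of a list of class labels (0.0 means a pure node)."""
    n = len(labels)
    if n == 0:
        return 0.0
    counts = {}
    for y in labels:
        counts[y] = counts.get(y, 0) + 1
    return 1.0 - sum((c / n) ** 2 for c in counts.values())


def best_split(values, labels):
    """Return (threshold, weighted_gini) of the best binary split on one feature.

    Returns (None, parent_impurity) if no split improves on the parent node.
    """
    pairs = sorted(zip(values, labels))
    n = len(pairs)
    best = (None, gini(labels))
    for i in range(1, n):
        if pairs[i][0] == pairs[i - 1][0]:
            continue  # identical values cannot be separated by a threshold
        threshold = (pairs[i][0] + pairs[i - 1][0]) / 2
        left = [y for _, y in pairs[:i]]
        right = [y for _, y in pairs[i:]]
        score = (len(left) * gini(left) + len(right) * gini(right)) / n
        if score < best[1]:
            best = (threshold, score)
    return best


# Toy, made-up feature values (not taken from the WBCD).
sizes = [1, 2, 3, 7, 8, 9]
labels = ["benign"] * 3 + ["malignant"] * 3
threshold, impurity = best_split(sizes, labels)
print(threshold, impurity)
```

A full tree repeats this search over every feature and recurses on the two child nodes. Ensembles such as RF and ET build many such trees, with ET typically drawing candidate thresholds at random instead of searching them exhaustively.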
Self-adaptive parameter and strategy based particle swarm optimization for large-scale feature selection problems with multiple classifiers
This work was partially supported by the National Natural Science Foundation of China (61403206, 61876089, 61876185), the Natural Science Foundation of Jiangsu Province (BK20141005), the Natural Science Foundation of the Jiangsu Higher Education Institutions of China (14KJB520025), the Engineering Research Center of Digital Forensics, Ministry of Education, and the Priority Academic Program Development of Jiangsu Higher Education Institutions. Peer reviewed. Postprint.
Computational models and approaches for lung cancer diagnosis
The success of treatment of patients with cancer depends on establishing an accurate diagnosis. To this end, the aim of this study is to develop novel lung cancer diagnostic models. New algorithms are proposed to analyse the biological data and extract knowledge that assists in achieving accurate diagnosis results.
Storage Capacity Estimation of Commercial Scale Injection and Storage of CO2 in the Jacksonburg-Stringtown Oil Field, West Virginia
Geological capture, utilization and storage (CCUS) of carbon dioxide (CO2) in depleted oil and gas reservoirs is one method to reduce greenhouse gas emissions while providing enhanced oil recovery (EOR) and extending the life of the field. CCUS coupled with EOR is therefore considered an economic approach to demonstrating commercial-scale injection and storage of anthropogenic CO2. Several critical issues should be taken into account prior to injecting large volumes of CO2, such as storage capacity, project duration and long-term containment. Reservoir characterization and 3D geological modeling are the best way to estimate the theoretical CO2 storage capacity in mature oil fields. The Jacksonburg-Stringtown field, located in northwestern West Virginia, has produced over 22 million barrels of oil (MMBO) since 1895. The sandstone of the Late Devonian Gordon Stray is the primary reservoir. The Upper Devonian fluvial sandstone reservoirs in the Jacksonburg-Stringtown oil field, which has produced over 22 million barrels of oil since 1895, are an ideal candidate for CO2 sequestration coupled with EOR. Supercritical depth (>2500 ft.), minimum miscibility pressure (941 psi), favorable API gravity (46.5°) and good water flood response are indicators that facilitate CO2-EOR operations. Moreover, the Jacksonburg-Stringtown oil field is adjacent to a large concentration of CO2 sources located along the Ohio River that could potentially supply enough CO2 for sequestration and EOR without constructing new pipeline facilities. Permeability is a critical parameter for understanding subsurface fluid flow and for reservoir management in primary and enhanced hydrocarbon recovery and efficient carbon storage. In this study, a rapid, robust and cost-effective artificial neural network (ANN) model is constructed to predict permeability using the model's strong ability to recognize the possible interrelationships between input and output variables.
Two commonly available conventional well logs, gamma ray and bulk density, and three log-derived variables, the slope of GR, the slope of bulk density and Vsh, were selected as input parameters, and permeability was selected as the desired output parameter to train and test an artificial neural network. The results indicate that the ANN model can be applied effectively in permeability prediction. Porosity is another fundamental property that characterizes the storage capability of fluid- and gas-bearing formations in a reservoir. In this study, a support vector machine (SVM) with a mixed kernel function (MKF) is used to construct the relationship between limited conventional well log suites and sparse core data. The input parameters for the SVM model consist of core porosity values and the same log suite as the ANN's input parameters, and porosity is the desired output. Compared with results from the SVM model with a single kernel function, the mixed kernel function based SVM model provides more accurate porosity predictions. Based on the well log analysis, four reservoir subunits within a marine-dominated estuarine depositional system are defined: barrier sand, central bay shale, tidal channels and fluvial channel subunits. A 3-D geological model, which is used to estimate theoretical CO2 sequestration capacity, is constructed by integrating core data, wireline log data and geological background knowledge. According to the proposed 3-D geological model, the best regions for coupled CCUS-EOR are located in the southern portions of the field, and the estimated theoretical CO2 storage capacity of the Jacksonburg-Stringtown oil field varies between 24 and 383 million metric tons. The estimation results of CO2 sequestration and EOR potential indicate that the Jacksonburg-Stringtown oil field has significant potential for CO2 storage and value-added EOR.
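A mixed kernel function is often built as a weighted sum of a local kernel and a global kernel. The abstract does not say which kernels or weights were combined, so the sketch below assumes an RBF kernel and a polynomial kernel with hypothetical parameter values; it illustrates the general idea only.

```python
import math


def rbf_kernel(x, z, gamma=0.5):
    """Local kernel: exp(-gamma * ||x - z||^2)."""
    return math.exp(-gamma * sum((a - b) ** 2 for a, b in zip(x, z)))


def poly_kernel(x, z, degree=2, coef0=1.0):
    """Global kernel: (x . z + coef0) ** degree."""
    return (sum(a * b for a, b in zip(x, z)) + coef0) ** degree


def mixed_kernel(x, z, weight=0.7, gamma=0.5, degree=2, coef0=1.0):
    """Convex combination weight * RBF + (1 - weight) * polynomial.

    All parameter values are illustrative assumptions, not from the source.
    """
    if not 0.0 <= weight <= 1.0:
        raise ValueError("weight must lie in [0, 1]")
    return (weight * rbf_kernel(x, z, gamma)
            + (1.0 - weight) * poly_kernel(x, z, degree, coef0))


# Two made-up, already normalised log readings (e.g. gamma ray, bulk density).
a = (1.0, 0.0)
b = (0.0, 1.0)
print(mixed_kernel(a, a), mixed_kernel(a, b))
```

Because a non-negative weighted sum of valid kernels is itself a valid kernel, a function like this can be supplied to any SVM implementation that accepts a user-defined kernel or a precomputed Gram matrix.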
Evolutionary computation-based feature selection for finding a stable set of features in high-dimensional data
Evolutionary Computation (EC) algorithms have proved to work well for feature selection because they are powerful search techniques and can produce multiple good solutions. However, they suffer from some limitations in real-world applications. Firstly, ECs require high computation time because they evaluate many solutions at each iteration. Secondly, a classifier is usually used as their fitness function, which causes the selected subset to perform well only with that classifier (i.e., classifier bias). Lastly, ECs, as stochastic search methods, return a different final subset in different runs, which poses a problem for finding a stable set of features (i.e., the stability issue). To address the computation time and classifier-bias limitations, this thesis proposes a new two-stage selection approach called filter/filter, in which two filter feature selection algorithms are combined. In the first stage, a ranking algorithm forms a reduced dataset by selecting the most informative features from the original dataset. In the second stage, the reduced dataset is fed to a novel EC algorithm to select the final feature subset. This new EC algorithm, called TAGA, is a Tabu search hybridised with an Asexual Genetic Algorithm. TAGA benefits from new search components and a solution representation that effectively reduce computation time. To select a classifier-unbiased final subset, a statistical criterion is used as the fitness function, which evaluates the subset independently of any classifier. Experiments show that the proposed filter/filter approach requires an acceptable computation time and selects more classifier-unbiased features than state-of-the-art methods. To find a stable set of features, a novel Generalisation Power Index (GPI) is proposed to analyse the generalisation power of the final subsets of an EC over several runs. Generalisation power refers to the performance capability of a subset over a wide range of classifiers.
Computational results confirm that GPI is able to find a stable set of features that achieves near-optimal accuracy when used to train various classifiers. To examine the suitability of the proposed methods for real-world applications, the filter/filter approach and GPI are integrated to select a stable set of features for the METABRIC breast cancer subtype classification problem. Experimental results show that this integration not only addresses the limitations of ECs for a real-world biomedical feature selection problem but also performs better than alternative methods.
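The stability issue described in this abstract can be quantified by comparing the subsets returned by repeated runs. The abstract does not define GPI, so the sketch below deliberately uses a different, generic measure, mean pairwise Jaccard similarity, purely to illustrate what stability across runs means; the feature names and run results are made up.

```python
from itertools import combinations


def jaccard(a, b):
    """Jaccard similarity of two feature subsets (1.0 means identical)."""
    a, b = set(a), set(b)
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def mean_pairwise_stability(subsets):
    """Average Jaccard similarity over all pairs of runs."""
    pairs = list(combinations(subsets, 2))
    if not pairs:
        raise ValueError("need at least two runs to measure stability")
    return sum(jaccard(a, b) for a, b in pairs) / len(pairs)


# Hypothetical final subsets from three runs of a stochastic selector.
runs = [
    {"g1", "g2", "g3"},
    {"g1", "g2", "g4"},
    {"g1", "g2", "g3"},
]
stability = mean_pairwise_stability(runs)
print(round(stability, 3))
```

A value near 1 means the selector returns almost the same features on every run, while a value near 0 means the runs barely overlap.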
Applications
Volume 3 describes how resource-aware machine learning methods and techniques are used to successfully solve real-world problems. The book provides numerous specific application examples: in health and medicine, for risk modelling, diagnosis, and treatment selection for diseases; in electronics, steel production and milling, for quality control during manufacturing processes; in traffic and logistics, for smart cities; and for mobile communications.