7 research outputs found

    Feature Selection for Text and Image Data Using Differential Evolution with SVM and Naïve Bayes Classifiers

    Classification problems arise in a growing number of important applications, such as text categorization, image analysis, medical imaging diagnosis and biomolecular analysis, where the attribute sets are large. For large datasets, feature extraction methods play an important role in removing irrelevant features and thereby improving classifier performance. Various machine-learning methods exist for text and image classification. These approaches are used for dimensionality reduction, which aims to filter out less informative and outlier data; they therefore provide a compact representation and make accurate classification computationally more tractable. At the same time, these methods can be challenging when the search space doubles repeatedly. To address this challenge, this paper proposes a hybrid approach. The proposed approach uses differential evolution (DE) for feature selection with naïve Bayes (NB) and support vector machine (SVM) classifiers to enhance the performance of the selected classifier. The results are verified on text and image data and show improved accuracy compared with other conventional techniques. Twenty-five benchmark datasets (UCI) from different domains are used to test the proposed algorithms. A comparative study of the proposed hybrid classification algorithms is presented. Finally, the experimental results show that differential evolution with the NB classifier performs better and produces better estimates of the probability terms. The proposed technique is also feasible in terms of computational time.

    FEASIBILITY OF B2C CUSTOMER RELATIONSHIP ANALYTICS IN THE B2B INDUSTRIAL CONTEXT

    The purpose of the paper is to evaluate the feasibility of business-to-consumer (B2C) customer relationship analytics in the industrial business-to-business (B2B) context, in particular spare part sales. The contribution of the paper is twofold: the article identifies analytics approaches with value potential for B2B decision-making, and illustrates their value in use. The identified analytics approaches (customer segmentation, market basket analysis and target customer selection) are common in B2C marketing and e-commerce. However, in industrial B2B marketing, the application of these approaches is not yet common. The different kinds of analytics under examination in this paper use machine learning (ML) techniques. The examination takes into account the applicability and usefulness of the techniques as well as implementation challenges. The research suggests that the identified analytics may serve different business purposes and may be relatively straightforward to implement. This requires careful examination of the desired purposes of use in a particular business context. However, the continuous and real-time use of such analyses remains a challenge for further examination, also in information systems research. Keywords: Business analytics, B2B decision-making, Machine learning, Data mining, Artificial intelligence, CR

    Data mining for heart failure : an investigation into the challenges in real life clinical datasets

    Clinical data presents a number of challenges, including missing data, class imbalance, high dimensionality and non-normal distribution. A motivation for this research is to investigate and analyse how these challenges affect the performance of algorithms. The challenges were explored with the help of a real-life heart failure clinical dataset known as Hull LifeLab, obtained from a live cardiology clinic at the Hull Royal Infirmary Hospital. A Clinical Data Mining Workflow (CDMW) was designed with three intuitive stages, namely descriptive, predictive and prescriptive. The naming of these stages reflects the nature of the analysis that is possible within each stage; therefore a number of different algorithms are employed. Most algorithms require the data to be normally distributed. However, the distribution is not explicitly used within the algorithms. Approaches based on Bayes use the properties of the distributions very explicitly, and thus provide valuable insight into the nature of the data. The first stage of the analysis is to investigate whether the assumptions made for Bayes hold, e.g. the strong independence assumption and the assumption of a Gaussian distribution. The next stage is to investigate the role of missing values. The results show that imputation affects performance less than the records that are initially complete do. These records are often not outliers, but contain problem variables. A method was developed to identify these. The effect of skews in the data was also investigated within the CDMW. However, it was found that methods based on Bayes were able to handle these, albeit with a small variability in performance. The thesis provides an insight into the reasons why clinical data often causes problems. Even class imbalance is not a problem, since Bayes is independent of it.

    Developing a high-performance soil fertility status prediction voting ensemble using brute exhaustive optimization in automated multiprecision weights of hybrid classifiers

    A Dissertation Submitted in Partial Fulfillment of the Requirements for the Degree of Doctor of Philosophy in Information and Communication Science and Engineering of the Nelson Mandela African Institution of Science and Technology. With the advent of machine learning (ML) techniques, various algorithms have been applied in previous studies to develop models for predicting soil fertility status. However, these models use varying fertility target classes, and variations have been reported in their predictive performance. As a result, practical application of these models to obtain the most accurate predictions may be hindered. While the weighted voting ensemble (WVE) ML technique can be used to improve soil fertility status prediction by aggregating the individual models' predictions, guaranteeing that an optimal assignment of WVE weights is found is challenging. Although a brute exhaustive search procedure can be applied to this task, the use of automated, precise combinations of classifier weights as search spaces for successful optimization has not been explored. This research aims to develop a high-performance soil fertility status prediction voting ensemble using brute exhaustive optimization in automated 1EXP(-)Z+ multi-precision weights of hybrid classifiers. Soil chemical properties and ML modeling algorithms for modeling soil fertility status were identified. Base hybrid ML classification models for predicting soil fertility status were evaluated using Tanzania as a case study. Finally, the WVE models built from the base ML hybrids were optimized using a brute exhaustive search procedure with a newly developed search-space generation algorithm that guarantees finding the optimal solution. 
The research was designed using design science research methodology. The unsupervised K-means algorithm, combined with a knee detection method, was applied to find the optimal number of soil fertility status target classes, and supervised learning algorithms were applied to model classifiers for those optimal classes. Three soil fertility target classes were identified by the clustering technique. On test data, the model achieved a predictive accuracy of 98.93%, with respective AUCs of 82%, 83%, and 87% for the low, medium, and high soil fertility target classes. While these results are higher than those of models in previous studies, 92% correct classifications were obtained in validation against external, unseen laboratory-tested soil results. Therefore, soil testing laboratories and farmers should consider using the model to manage soil fertility smartly, which may lead to improved crop growth and productivity. The government could set agriculture-related policies that require farmers to use the model, together with the provision of agricultural input subsidies. Future work could develop an integrated real-time web and mobile application for providing farmers with soil fertility status information.
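
The idea of exhaustively searching a grid of voting weights, as described in the abstract above, can be illustrated with a minimal sketch. It is not the dissertation's algorithm: the probability tables, the labels, the function names, and the reading of "precision" as the number of decimal places in each weight are all assumptions made for illustration.

```python
from itertools import product

# Made-up class probabilities from three hypothetical base classifiers on four
# validation samples (classes: 0 = low, 1 = medium, 2 = high fertility).
PROBS = [
    [[0.7, 0.2, 0.1], [0.4, 0.5, 0.1], [0.2, 0.3, 0.5], [0.5, 0.4, 0.1]],  # A
    [[0.6, 0.3, 0.1], [0.2, 0.6, 0.2], [0.3, 0.4, 0.3], [0.3, 0.6, 0.1]],  # B
    [[0.3, 0.4, 0.3], [0.3, 0.3, 0.4], [0.1, 0.2, 0.7], [0.2, 0.5, 0.3]],  # C
]
LABELS = [0, 1, 2, 1]  # made-up true classes


def weighted_vote(probs, weights):
    """Return the class with the largest weighted probability sum per sample."""
    n_samples = len(probs[0])
    n_classes = len(probs[0][0])
    predictions = []
    for i in range(n_samples):
        scores = [
            sum(w * clf[i][c] for w, clf in zip(weights, probs))
            for c in range(n_classes)
        ]
        predictions.append(max(range(n_classes), key=scores.__getitem__))
    return predictions


def accuracy(predictions, labels):
    """Fraction of predictions that match the labels."""
    return sum(p == y for p, y in zip(predictions, labels)) / len(labels)


def exhaustive_weight_search(probs, labels, precision=1):
    """Try every weight vector on a grid with `precision` decimal places.

    Each weight takes the values 0, 1/10**precision, ..., 1. The all-zero
    vector is skipped. The first vector reaching the best accuracy is kept.
    """
    steps = 10 ** precision
    best_weights, best_acc = None, -1.0
    for combo in product(range(steps + 1), repeat=len(probs)):
        if not any(combo):
            continue  # all-zero weights carry no vote
        weights = [c / steps for c in combo]
        acc = accuracy(weighted_vote(probs, weights), labels)
        if acc > best_acc:
            best_weights, best_acc = weights, acc
    return best_weights, best_acc
```

Because the grid is enumerated in full, the returned accuracy is the best achievable at that precision on the data supplied, but the number of candidates grows as (10^precision + 1)^n for n classifiers, so the approach is only practical for a handful of base models.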

    Elasticity mapping for breast cancer diagnosis using tactile imaging and auxiliary sensor fusion

    Tactile Imaging (TI) is a technology utilising capacitive pressure sensors to image elasticity distributions within soft tissues such as the breast for cancer screening. TI aims to solve critical problems in the cancer screening pathway, particularly: low sensitivity of manual palpation, patient discomfort during X-ray mammography, and the poor quality of breast cancer referral forms between primary and secondary care facilities. TI is effective in identifying ‘non-palpable’, early-stage tumours, with basic differential ability that reduced unnecessary biopsies by 21% in repeated clinical studies. TI has its limitations, particularly: the measured hardness of a lesion is relative to the background hardness, and lesion location estimates are subjective and prone to operator error. TI can achieve more than simple visualisation of lesions and can act as an accurate differentiator and material analysis tool with further metric development and acknowledgement of error sensitivities when transferring from phantom to clinical trials. This thesis explores and develops two methods, specifically inertial measurement and IR vein imaging, for determining the breast background elasticity, and registering tactile maps for lesion localisation, based on fusion of tactile and auxiliary sensors. These sensors enhance the capabilities of TI, with background tissue elasticity determined with MAE < 4% over tissues in the range 9 kPa – 90 kPa and probe trajectory across the breast measured with an error ratio < 0.3%, independent of applied load, validated on silicone phantoms. A basic TI error model is also proposed, maintaining tactile sensor stability and accuracy with 1% settling times < 1.5s over a range of realistic operating conditions. These developments are designed to be easily implemented into commercial systems, through appropriate design, to maximise impact, providing a stable platform for accurate tissue measurements. 
This will allow clinical TI to further reduce benign referral rates in a cost-effective manner, by elasticity differentiation and lesion classification in future works.

    Dynamic risk assessment of process operations

    Process engineering systems have become increasingly complex and more vulnerable to potential accidents. The risks posed by these systems are alarming and worrisome. The operation of these complex process engineering systems requires a high level of understanding from both the operational and the safety perspective. This study focuses on dynamic risk assessment and management of complex process engineering systems’ operations. To reduce the risk posed by process systems, there is a need to develop process accident models capable of capturing system dynamics in real time. This thesis presents a set of predictive process accident models developed over four years. It is prepared in manuscript style and consists of nine chapters, five of which are published in peer-reviewed journals. A dynamic operational risk management tool for process systems is developed, considering evolving process conditions. The obvious advantage of the developed methodologies is that they dynamically capture the real-time changes occurring in the process operations. The real-time risk profile provided by the developed methodologies serves as a performance indicator for operational decision making. The research has made contributions on the following topics: (a) process accident model considering dependency among contributory factors, (b) dynamic safety analysis of process systems using a nonlinear and non-sequential accident model, (c) dynamic failure analysis of process systems using principal component analysis and a Bayesian network, (d) dynamic failure analysis of process systems using a neural network and (e) an integrated approach for dynamic economic risk assessment of process systems.