
    Data-based fault detection in chemical processes: Managing records with operator intervention and uncertain labels

    Developing data-driven fault detection systems for chemical plants requires managing uncertain data labels and dynamic attributes arising from operator-process interactions. Mislabeled data is a well-known problem in computer science that has received little attention from the process systems community. This work introduces and examines the effects of operator actions on records and labels, and their consequences for the development of detection models. Using a state-space model, it proposes an iterative relabeling scheme for retraining classifiers that continuously refines dynamic attributes and labels. Three case studies are presented: a reactor as a motivating example, flooding in a simulated de-Butanizer column as a complex case, and foaming in an absorber as an industrial challenge. For the first case, detection accuracy is shown to increase by 14% while operating costs are reduced by 20%. For the de-Butanizer column, the proposed strategy performs 10% better than the filtering strategy. Promising results are reported regarding efficient strategies for dealing with the presented problem.
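
    To make the idea concrete, here is a minimal sketch of one possible iterative relabeling loop, assuming a binary fault label and a logistic-regression base learner; the confidence threshold is an illustrative choice, not the paper's state-space formulation.

        # Hypothetical iterative relabeling loop (illustration only).
        import numpy as np
        from sklearn.linear_model import LogisticRegression

        def iterative_relabel(X, y, confidence=0.95, max_iter=10):
            """Refine labels y by flipping points the model strongly contradicts."""
            y = y.copy()
            for _ in range(max_iter):
                clf = LogisticRegression(max_iter=1000).fit(X, y)
                p_fault = clf.predict_proba(X)[:, 1]            # P(fault | x)
                flip = ((p_fault > confidence) & (y == 0)) | \
                       ((p_fault < 1 - confidence) & (y == 1))
                if not flip.any():                              # labels have stabilised
                    break
                y[flip] = 1 - y[flip]                           # relabel contradicted points
            return y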

    Unlocking biomarker discovery: Large scale application of aptamer proteomic technology for early detection of lung cancer

    Lung cancer is the leading cause of cancer deaths because ~84% of cases are diagnosed at an advanced stage. Worldwide in 2008, ~1.5 million people were diagnosed and ~1.3 million died, a survival rate essentially unchanged since 1960. However, patients who are diagnosed at an early stage and undergo surgery experience an 86% overall 5-year survival rate. New diagnostics are therefore needed to identify lung cancer at this stage. Here we present the first large-scale clinical use of aptamers to discover blood protein biomarkers in disease with our breakthrough proteomic technology. This multi-center case-control study was conducted on archived samples from 1,326 subjects from four independent studies of non-small cell lung cancer (NSCLC) in long-term tobacco-exposed populations. We measured >800 proteins in 15 µL of serum, identified 44 candidate biomarkers, and developed a 12-protein panel that distinguished NSCLC from controls with 91% sensitivity and 84% specificity in a training set, and 89% sensitivity and 83% specificity in a blinded, independent verification set. Performance was similar for early- and late-stage NSCLC. This is a significant advance in proteomics in an area of high clinical need.
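
    For reference, the sensitivity and specificity figures above reduce to simple confusion-matrix ratios; the sketch below restates the definitions, with placeholder counts rather than the study's data.

        # Definitions behind the reported panel performance (placeholder counts).
        def sensitivity(tp, fn):
            return tp / (tp + fn)      # fraction of NSCLC cases correctly flagged

        def specificity(tn, fp):
            return tn / (tn + fp)      # fraction of controls correctly cleared

        print(sensitivity(tp=91, fn=9), specificity(tn=84, fp=16))   # 0.91, 0.84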

    Cost-sensitive Bayesian network learning using sampling

    A significant advance in recent years has been the development of cost-sensitive decision tree learners, recognising that real-world classification problems need to take account of the costs of misclassification rather than focus on accuracy alone. The literature contains well over 50 cost-sensitive decision tree induction algorithms, each with a different performance profile. Obtaining good Bayesian networks can be challenging, and several algorithms have therefore been proposed for learning their structure and parameters from data. However, most of these algorithms focus on learning Bayesian networks that aim to maximise classification accuracy. An obvious question thus arises: is it possible to develop cost-sensitive Bayesian networks, and would they outperform cost-sensitive decision trees in minimising classification cost? This paper explores this question by developing a new Bayesian network learning algorithm based on changing the data distribution to reflect the costs of misclassification. The proposed method is evaluated in experiments on over 20 data sets. The results show that this approach compares favourably with more complex cost-sensitive decision tree algorithms.
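
    One way to "change the data distribution to reflect the costs of misclassification" is cost-proportionate rejection sampling; the sketch below assumes a binary task with per-class costs c_fp and c_fn, and is an illustration rather than the paper's exact scheme. Any accuracy-driven learner, a Bayesian network included, trained on the resampled data then implicitly minimises expected cost rather than raw error.

        # Cost-proportionate rejection sampling (a sketch; cost values are assumed).
        import numpy as np

        def cost_proportionate_sample(X, y, c_fp=1.0, c_fn=5.0, seed=0):
            rng = np.random.default_rng(seed)
            cost = np.where(y == 1, c_fn, c_fp)             # misclassification cost per example
            keep = rng.random(len(y)) < cost / cost.max()   # accept with prob. proportional to cost
            return X[keep], y[keep]                         # train any learner on this sample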

    Evaluating Classifiers' Optimal Performances Over a Range of Misclassification Costs by Using Cost-Sensitive Classification

    We believe that classification accuracy alone is not enough to evaluate the performance of classification algorithms: it can be misleading because it overlooks an important element, namely the cost incurred when a classification is inaccurate. Furthermore, the Receiver Operating Characteristic (ROC) curve is one of the most popular graphs used to evaluate classifier performance, but one of its biggest shortcomings is the assumption of equal costs for all misclassified data. Our goal is therefore to reduce the total cost of decision making by selecting the classifier that has the least total misclassification cost. The exact misclassification cost, however, is usually unknown and hard to determine. To overcome this hurdle, we classify the data against a range of error costs, using the cost range and the operating classification threshold range to expose any performance differences among the classifiers.
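
    The comparison can be sketched as follows: for each candidate cost ratio, sweep the operating threshold and record the minimum total misclassification cost each classifier can achieve; the threshold grid and variable names below are illustrative assumptions.

        # Minimum total cost of a scored classifier over an operating-threshold sweep.
        import numpy as np

        def min_total_cost(y_true, scores, c_fp, c_fn):
            costs = []
            for t in np.linspace(0.0, 1.0, 101):       # operating threshold range
                pred = scores >= t
                fp = np.sum(pred & (y_true == 0))      # false positives at threshold t
                fn = np.sum(~pred & (y_true == 1))     # false negatives at threshold t
                costs.append(c_fp * fp + c_fn * fn)
            return min(costs)

        # Compare classifiers A and B across a range of cost ratios, e.g.:
        # for r in (1, 2, 5, 10): print(r, min_total_cost(y, s_a, 1, r),
        #                                  min_total_cost(y, s_b, 1, r))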

    Empirical Assessment of Machine Learning Techniques for Software Requirements Risk Prediction

    Software risk prediction is the most sensitive and crucial activity of the Software Development Life Cycle (SDLC); it can lead to the success or failure of a project, so risk should be predicted early to make a software project successful. A model is proposed for the prediction of software requirement risks using a requirement-risk dataset and machine learning techniques. A comparison is also made between multiple classifiers, namely K-Nearest Neighbour (KNN), Average One Dependency Estimator (A1DE), Naïve Bayes (NB), Composite Hypercube on Iterated Random Projection (CHIRP), Decision Table (DT), Decision Table/Naïve Bayes Hybrid Classifier (DTNB), Credal Decision Trees (CDT), Cost-Sensitive Decision Forest (CS-Forest), J48 Decision Tree (J48), and Random Forest (RF), to find the technique best suited to the model given the nature of the dataset. These techniques are evaluated using various metrics, including Correctly Classified Instances (CCI), Mean Absolute Error (MAE), Root Mean Square Error (RMSE), Relative Absolute Error (RAE), Root Relative Squared Error (RRSE), precision, recall, F-measure, Matthews Correlation Coefficient (MCC), Receiver Operating Characteristic area (ROC area), Precision-Recall Curve area (PRC area), and accuracy. The overall outcome of this study shows that, in terms of reducing error rates, CDT outperforms the other techniques, achieving 0.013 for MAE, 0.089 for RMSE, 4.498% for RAE, and 23.741% for RRSE; in terms of accuracy, DT, DTNB, and CDT achieve the better results. This work was supported by Generalitat Valenciana, Conselleria de Innovacion, Universidades, Ciencia y Sociedad Digital (project AICO/019/224). Naseem, R.; Shaukat, Z.; Irfan, M.; Shah, M. A.; Ahmad, A.; Muhammad, F.; Glowacz, A., et al. (2021). Empirical Assessment of Machine Learning Techniques for Software Requirements Risk Prediction. Electronics, 10(2), 1-19. https://doi.org/10.3390/electronics10020168
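
    For readers unfamiliar with the relative error metrics quoted above, the sketch below restates the standard (Weka-style) definitions of RAE and RRSE; the study's dataset and models are not reproduced here.

        # Prediction error normalised by the error of always predicting the mean.
        import numpy as np

        def rae(y_true, y_pred):
            """Relative Absolute Error."""
            return np.sum(np.abs(y_true - y_pred)) / np.sum(np.abs(y_true - y_true.mean()))

        def rrse(y_true, y_pred):
            """Root Relative Squared Error."""
            num = np.sum((y_true - y_pred) ** 2)
            den = np.sum((y_true - y_true.mean()) ** 2)
            return np.sqrt(num / den)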

    A novel Big Data analytics and intelligent technique to predict driver's intent

    The modern age offers great potential for automatically predicting a driver's intent through the increasing miniaturization of computing technologies, rapid advances in communication technologies, and the continuous connectivity of heterogeneous smart objects. Inside the cabin and engine of modern cars, dedicated computer systems need the ability to exploit the wealth of information generated by heterogeneous data sources with different contextual and conceptual representations. Processing and utilizing this diverse and voluminous data involves many challenges concerning the design of the computational technique used to perform the task. In this paper, we investigate the various data sources available in the car and its surrounding environment that can be used as inputs to predict a driver's intent and behavior. As part of investigating these potential data sources, we conducted experiments on the e-calendars of a large number of employees and reviewed a number of available geo-referencing systems. Through the results of a statistical analysis and by computing location-recognition accuracy, we explored in detail the potential of calendar location data for detecting a driver's intentions. To exploit the numerous diverse data inputs available in modern vehicles, we investigate the suitability of different Computational Intelligence (CI) techniques and propose a novel fuzzy computational modelling methodology. Finally, we outline the impact of applying advanced CI and Big Data analytics techniques in modern vehicles on the driver and on society in general, and discuss the ethical and legal issues arising from the deployment of intelligent self-learning cars.
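
    As a toy illustration of the fuzzy modelling direction, a single Mamdani-style rule might look like the sketch below; the rule, membership functions, and numbers are invented for illustration, not the paper's methodology.

        # Toy fuzzy rule: IF speed is low AND the calendar venue is near,
        # THEN the driver likely intends to stop (all values assumed).
        def tri(x, a, b, c):
            """Triangular membership function rising on [a, b], falling on [b, c]."""
            if x <= a or x >= c:
                return 0.0
            return (x - a) / (b - a) if x < b else (c - x) / (c - b)

        speed_low = tri(20.0, 0.0, 10.0, 40.0)         # 20 km/h -> degree ~0.67
        venue_near = tri(0.5, 0.0, 0.2, 2.0)           # 0.5 km  -> degree ~0.83
        stop_intent = min(speed_low, venue_near)       # Mamdani AND (min)
        print(f"degree of 'intends to stop': {stop_intent:.2f}")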

    Fairness and Interpretability in Machine Learning Models

    Machine Learning has become more and more prominent in our daily lives as the Information Age and the Fourth Industrial Revolution progress. Many machine learning systems are evaluated by how accurately they predict the correct outcomes recorded in existing historical datasets. In recent years we have observed how evaluating machine learning systems in this way has allowed decision-making systems to treat certain groups unfairly. Several authors have proposed methods to overcome this, including new metrics that incorporate measures of unfair treatment of individuals based on group affiliation, probabilistic graphical models that assume dataset labels are inherently unfair and use the dataset to infer the true, fair labels, and tree-based methods that introduce new splitting criteria for fairness. We evaluated these methods on datasets used in fairness research and examined whether the results claimed by the authors are reproducible. Additionally, we implemented new interpretability methods on top of the proposed methods to explain their behaviour more explicitly. We found that some of the models do not achieve their claimed results and do not learn behaviour that achieves fairness, while other models do achieve fairer predictions through affirmative action. This thesis shows that machine learning interpretability, together with new machine learning models and approaches, is necessary to achieve fairer decision-making systems.
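
    One group-fairness metric this line of work builds on can be stated in a few lines; the sketch below computes the demographic parity difference, assuming a binary protected attribute (the encoding is an assumption).

        # Demographic parity difference: gap in positive-prediction rates between
        # a protected group (group == 1) and the remaining population (group == 0).
        import numpy as np

        def demographic_parity_diff(y_pred, group):
            y_pred, group = np.asarray(y_pred), np.asarray(group)
            rate_1 = y_pred[group == 1].mean()         # positive rate, protected group
            rate_0 = y_pred[group == 0].mean()         # positive rate, other group
            return rate_1 - rate_0                     # 0.0 indicates parity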

    Review of the machine learning methods in the classification of phishing attack

    Computer networks have developed rapidly, as can be seen in the worldwide trend of users connecting their computers to the Internet, whether for work or for access to social media accounts. This widespread use of networks, however, endangers the privacy of users, especially those who do not install security software on their computers, as it allows hackers to mount network attacks and steal confidential information such as bank or social media login credentials. Phishing is one such attack. The goal of this study is to review the types of phishing attacks and the current methods used to prevent them. Based on the literature, machine learning is widely used to prevent phishing attacks, and several algorithms can be applied within this approach. This study focuses on the algorithms that have been developed most thoroughly, and the methods for implementing them are discussed in detail.
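
    As a small illustration of how such classifiers are typically fed, the sketch below extracts a few lexical URL features; this feature set is a common choice in the literature, not one prescribed by this review.

        # A few lexical URL features often used in phishing classification.
        from urllib.parse import urlparse

        def url_features(url):
            host = urlparse(url).netloc
            return {
                "url_length": len(url),                         # very long URLs are suspicious
                "num_dots": host.count("."),                    # many nested subdomains
                "has_at": "@" in url,                           # '@' can disguise the real host
                "is_ip_host": host.replace(".", "").isdigit(),  # raw IP instead of a domain
            }

        print(url_features("http://192.168.0.1/login@bank.example"))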