
1,988 research outputs found

EEG sleep stages identification based on weighted undirected complex networks

Author: Abdulla Shahab
Diykh Mohammed
Li Yan
Publication venue: 'Elsevier BV'
Publication date: 01/02/2020

Sleep scoring is important in sleep research because any errors in the scoring of a patient's sleep electroencephalography (EEG) recordings can cause serious problems such as incorrect diagnosis, medication errors, and misinterpretation of the recordings. The aim of this research is to develop a new automatic method for EEG sleep stage classification based on a statistical model and weighted brain networks. Methods: Each EEG segment is partitioned into a number of blocks using a sliding window technique. A set of statistical features is extracted from each block. As a result, a vector of features is obtained to represent each EEG segment. Then, the vector of features is mapped into a weighted undirected network. Different structural and spectral attributes of the networks are extracted and forwarded to a least squares support vector machine (LS-SVM) classifier. At the same time, the networks' attributes are also thoroughly investigated. It is found that the networks' characteristics vary with sleep stage, and that each sleep stage is best represented by the key features of its network. Results: The proposed method is evaluated using two datasets acquired from different EEG channels (Pz-Oz and C3-A2) and scored according to the R&K and AASM standards, without pre-processing the original EEG data. The results obtained by the LS-SVM are compared with those of Naïve Bayes, k-nearest neighbour and multi-class SVM classifiers. The proposed method is also compared with other benchmark sleep stage classification methods. The comparison results demonstrate that the proposed method has an advantage in scoring sleep stages based on single-channel EEG signals. Conclusions: An average accuracy of 96.74% is obtained with the C3-A2 channel according to the AASM standard, and 96% with the Pz-Oz channel based on the R&K standard.
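The pipeline described in this abstract (sliding-window blocks, per-block statistics, a feature vector mapped to a weighted undirected network) can be sketched roughly as follows. The abstract does not say which statistics are used or how the vector is mapped to a network, so the four statistics, the absolute-difference edge weights, the node-strength attribute and every function name below are assumptions made purely for illustration, and a synthetic sine wave stands in for an EEG segment.

```python
import math
import statistics


def sliding_blocks(segment, width, step):
    """Split a segment into (possibly overlapping) blocks of equal width."""
    return [segment[i:i + width] for i in range(0, len(segment) - width + 1, step)]


def block_features(block):
    """Illustrative statistics per block (assumed, not the paper's feature set)."""
    return [statistics.mean(block), statistics.pstdev(block), min(block), max(block)]


def to_weighted_network(vector):
    """Assumed mapping: one node per feature value, edge weight = |v_i - v_j|.

    Returns a symmetric weight matrix, i.e. a weighted undirected network.
    """
    n = len(vector)
    return [[abs(vector[i] - vector[j]) for j in range(n)] for i in range(n)]


def node_strengths(weights):
    """Weighted degree of each node, one simple structural attribute."""
    return [sum(row) for row in weights]


# Synthetic stand-in for one EEG segment (40 samples).
segment = [math.sin(0.3 * k) for k in range(40)]

blocks = sliding_blocks(segment, width=10, step=5)
vector = [value for block in blocks for value in block_features(block)]
network = to_weighted_network(vector)
strengths = node_strengths(network)
```

In a full system, attributes such as `strengths` would be passed to a classifier; the LS-SVM stage is omitted here because the abstract gives no details of its configuration.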

University of Southern Queensland ePrints

An analysis of feature relevance in the classification of astronomical transients with machine learning methods

Author: Brescia Massimo
Cavuoti Stefano
D'Isanto Antonio
Djorgovski Stanislav G.
Donalek Ciro
Longo Giuseppe
Riccio Giuseppe
Publication venue: 'Oxford University Press (OUP)'
Publication date: 01/01/2016

The exploitation of present and future synoptic (multi-band and multi-epoch) surveys requires extensive use of automatic methods for data processing and data interpretation. In this work, using data extracted from the Catalina Real Time Transient Survey (CRTS), we investigate the classification performance of some well-tested methods: Random Forest, MLPQNA (Multi Layer Perceptron with Quasi Newton Algorithm) and K-Nearest Neighbors, paying special attention to the feature selection phase. To do so, we performed several classification experiments, namely: identification of cataclysmic variables, separation between galactic and extra-galactic objects, and identification of supernovae. Comment: Accepted by MNRAS, 11 figures, 18 pages

arXiv.org e-Print Archive

Archivio della ricerca - Università degli studi di Napoli Federico II

OA@INAF - Istituto Nazionale di Astrofisica

Caltech Authors

Threshold Choice Methods: the Missing Link

Author: Ferri Cèsar
Flach Peter
Hernández-Orallo José
Publication date: 12/12/2011

Many performance metrics have been introduced for the evaluation of classification performance, with different origins and niches of application: accuracy, macro-accuracy, area under the ROC curve, the ROC convex hull, the absolute error, and the Brier score (with its decomposition into refinement and calibration). One way of understanding the relation among some of these metrics is the use of variable operating conditions (either in the form of misclassification costs or class proportions). Thus, a metric may correspond to some expected loss over a range of operating conditions. One dimension for the analysis has been precisely the distribution we take for this range of operating conditions, leading to some important connections in the area of proper scoring rules. However, we show that there is another dimension which has not received attention in the analysis of performance metrics. This new dimension is given by the decision rule, which is typically implemented as a threshold choice method when using scoring models. In this paper, we explore many old and new threshold choice methods: fixed, score-uniform, score-driven, rate-driven and optimal, among others. By calculating the loss of these methods for a uniform range of operating conditions we get the 0-1 loss, the absolute error, the Brier score (mean squared error), the AUC and the refinement loss, respectively. This provides a comprehensive view of performance metrics as well as a systematic approach to loss minimisation, namely: take a model, apply several threshold choice methods consistent with the information which is (and will be) available about the operating condition, and compare their expected losses. To assist in this procedure we also derive several connections between the aforementioned performance metrics, and we highlight the role of calibration in choosing the threshold choice method.
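Two of the correspondences listed in this abstract (a fixed threshold with the 0-1 loss, a score-driven threshold with the Brier score) can be checked numerically with a small sketch. It assumes that scores are estimated probabilities of class 1, that class 1 is predicted when the score exceeds the threshold, and that misclassifying a class-0 example costs 2c while misclassifying a class-1 example costs 2(1 - c) for a cost proportion c. This normalisation, the toy data and the name `expected_loss` are assumptions for illustration and are not stated in the abstract.

```python
def expected_loss(scores, labels, choose_threshold, steps=10_000):
    """Average cost over a uniform grid of cost proportions c in (0, 1)."""
    total = 0.0
    for k in range(steps):
        c = (k + 0.5) / steps              # midpoint of the k-th sub-interval
        t = choose_threshold(c)            # the threshold choice method
        loss = 0.0
        for s, y in zip(scores, labels):
            predicted = 1 if s > t else 0
            if predicted != y:
                # Assumed cost model: 2c for class 0, 2(1 - c) for class 1.
                loss += 2 * c if y == 0 else 2 * (1 - c)
        total += loss / len(scores)
    return total / steps


# Toy scores (estimated probability of class 1) and true labels.
scores = [0.1, 0.4, 0.35, 0.8, 0.7, 0.2]
labels = [0, 0, 1, 1, 1, 0]

fixed = expected_loss(scores, labels, lambda c: 0.5)    # threshold ignores c
score_driven = expected_loss(scores, labels, lambda c: c)  # threshold equals c

error_rate = sum((s > 0.5) != (y == 1) for s, y in zip(scores, labels)) / len(scores)
brier = sum((s - y) ** 2 for s, y in zip(scores, labels)) / len(scores)

print(f"fixed threshold: {fixed:.4f}  vs error rate:  {error_rate:.4f}")
print(f"score-driven:    {score_driven:.4f}  vs Brier score: {brier:.4f}")
```

Under these assumptions the two pairs of numbers should agree up to the discretisation error of the grid. The other methods named in the abstract (score-uniform, rate-driven, optimal) are not covered by this sketch.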

arXiv.org e-Print Archive

CiteSeerX

Explore Bristol Research

A machine learning approach to pedestrian detection for autonomous vehicles using High-Definition 3D Range Data

Author: Alonso Cáceres Diego
Borraz Morón Raúl
Fernández Andrés José Carlos
Navarro Lorente Pedro Javier
Publication venue: 'MDPI AG'
Publication date: 01/01/2016

This article describes an automated sensor-based system to detect pedestrians in an autonomous vehicle application. Although the vehicle is equipped with a broad set of sensors, the article focuses on the processing of the information generated by a Velodyne HDL-64E LIDAR sensor. The cloud of points generated by the sensor (more than 1 million points per revolution) is processed to detect pedestrians, by selecting cubic shapes and applying machine vision and machine learning algorithms to the XY, XZ, and YZ projections of the points contained in the cube. The work presents an exhaustive analysis of the performance of three different machine learning algorithms: k-Nearest Neighbours (kNN), Naïve Bayes classifier (NBC), and Support Vector Machine (SVM). These algorithms have been trained with 1931 samples. The final performance of the method, measured in a real traffic scene containing 16 pedestrians and 469 samples of non-pedestrians, shows sensitivity (81.2%), accuracy (96.2%) and specificity (96.8%). This work was partially supported by ViSelTR (ref. TIN2012-39279) and cDrone (ref. TIN2013-45920-R) projects of the Spanish Government, and the “Research Programme for Groups of Scientific Excellence at Region of Murcia” of the Seneca Foundation (Agency for Science and Technology of the Region of Murcia, 19895/GERM/15). 3D LIDAR has been funded by UPCA13-3E-1929 infrastructure projects of the Spanish Government. Diego Alonso wishes to thank the Spanish Ministerio de Educación, Cultura y Deporte, Subprograma Estatal de Movilidad, Plan Estatal de Investigación Científica y Técnica y de Innovación 2013–2016 for grant CAS14/00238.
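The three figures reported for the traffic scene can be related to confusion-matrix counts with a short calculation. The abstract gives only the class sizes (16 pedestrians, 469 non-pedestrians); the counts of 13 detected pedestrians and 454 correctly rejected non-pedestrians used below are inferred so that the percentages roughly match, and should be read as an assumption rather than reported data.

```python
def binary_metrics(tp, fn, tn, fp):
    """Sensitivity, specificity and accuracy from confusion-matrix counts."""
    sensitivity = tp / (tp + fn)                   # true positive rate
    specificity = tn / (tn + fp)                   # true negative rate
    accuracy = (tp + tn) / (tp + fn + tn + fp)     # overall fraction correct
    return sensitivity, specificity, accuracy


# Class sizes from the abstract; TP and TN are assumed (inferred) counts.
pedestrians, non_pedestrians = 16, 469
tp, tn = 13, 454

sensitivity, specificity, accuracy = binary_metrics(
    tp=tp, fn=pedestrians - tp, tn=tn, fp=non_pedestrians - tn
)
print(f"sensitivity={sensitivity:.4f} specificity={specificity:.4f} accuracy={accuracy:.4f}")
```

With so few positive samples, one additional missed pedestrian would move the sensitivity by more than six percentage points while barely changing the accuracy, which is why the three figures are best read together.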

LAReferencia - Red Federada de Repositorios Institucionales de Publicaciones Científicas Latinoamericanas

Repositorio Digital de la Universidad Politécnica de Cartagena

Setting decision thresholds when operating conditions are uncertain

Author: Ferri Ramírez César
Flach Peter
Hernández-Orallo José
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 01/07/2019

The quality of the decisions made by a machine learning model depends on the data and the operating conditions during deployment. Often, operating conditions such as class distribution and misclassification costs have changed during the time since the model was trained and evaluated. When deploying a binary classifier that outputs scores, once we know the new class distribution and the new cost ratio between false positives and false negatives, there are several methods in the literature to help us choose an appropriate threshold for the classifier's scores. However, on many occasions, the information that we have about this operating condition is uncertain. Previous work has considered ranges or distributions of operating conditions during deployment, with expected costs being calculated for ranges or intervals, but still the decision for each point is made as if the operating condition were certain. The implications of this assumption have received limited attention: a threshold choice that is best suited without uncertainty may be suboptimal under uncertainty. In this paper we analyse the effect of operating condition uncertainty on the expected loss for different threshold choice methods, both theoretically and experimentally. We model uncertainty as a second conditional distribution over the actual operation condition and study it theoretically in such a way that minimum and maximum uncertainty are both seen as special cases of this general formulation. This is complemented by a thorough experimental analysis investigating how different learning algorithms behave for a range of datasets according to the threshold choice method and the uncertainty level. We thank the anonymous reviewers for their comments, which have helped to improve this paper significantly. This work has been partially supported by the EU (FEDER) and the Spanish MINECO under Grant TIN 2015-69175-C4-1-R and by Generalitat Valenciana under Grant PROMETEOII/2015/013.
Jose Hernandez-Orallo was supported by a Salvador de Madariaga Grant (PRX17/00467) from the Spanish MECD for a research stay at the Leverhulme Centre for the Future of Intelligence (CFI), Cambridge, a BEST Grant (BEST/2017/045) from Generalitat Valenciana for another research stay also at the CFI and an FLI Grant RFP2-152. Ferri Ramírez, C.; Hernández-Orallo, J.; Flach, P. (2019). Setting decision thresholds when operating conditions are uncertain. Data Mining and Knowledge Discovery. 33(4):805-847. https://doi.org/10.1007/s10618-019-00613-7
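The abstract's point that a threshold chosen as if the operating condition were certain can be suboptimal when the condition is actually different can be illustrated with a small sketch. It picks the cost-minimising threshold on toy data for an assumed cost proportion and then evaluates it under a different true one. The cost model (2c for a misclassified class-0 example, 2(1 - c) for class 1), the data, and the function names are all assumptions for illustration; the paper itself models uncertainty as a distribution over operating conditions, which this sketch does not attempt.

```python
def average_cost(scores, labels, t, c):
    """Average cost at threshold t when the cost proportion is c (assumed model)."""
    total = 0.0
    for s, y in zip(scores, labels):
        predicted = 1 if s > t else 0
        if predicted != y:
            total += 2 * c if y == 0 else 2 * (1 - c)
    return total / len(scores)


def best_threshold(scores, labels, c):
    """Threshold among the observed scores (plus 0.0) with the lowest cost at c."""
    candidates = [0.0] + sorted(scores)
    return min(candidates, key=lambda t: average_cost(scores, labels, t, c))


# Toy scores (estimated probability of class 1) and true labels.
scores = [0.05, 0.2, 0.3, 0.45, 0.55, 0.6, 0.8, 0.9]
labels = [0, 0, 1, 0, 1, 0, 1, 1]

c_assumed, c_true = 0.5, 0.9   # hypothetical assumed and actual cost proportions

t_assumed = best_threshold(scores, labels, c_assumed)
t_true = best_threshold(scores, labels, c_true)

cost_with_assumed = average_cost(scores, labels, t_assumed, c_true)
cost_with_true = average_cost(scores, labels, t_true, c_true)
print(f"threshold {t_assumed} costs {cost_with_assumed:.3f} under c={c_true}; "
      f"threshold {t_true} costs {cost_with_true:.3f}")
```

The gap between the two costs is the price of acting on the wrong operating condition in this toy setting. How large it is in practice depends on the model's scores and on how far the assumed condition is from the true one.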

RiuNet

Explore Bristol Research

Time series classification for the prediction of dialysis in critically ill patients using echo state networks

Author: Benoit Dominique
De Turck Filip
Decruyenaere Johan
Dhaene Tom
Ongenae Femke
Schrauwen Benjamin
Van Looy Stijn
Verplancke Thierry
Verstraeten David
Publication venue: 'Elsevier BV'
Publication date: 01/01/2013

Ghent University Academic Bibliography


Predicting business failure using artificial intelligence system

Author: Allozi Yaser
Publication venue: Brunel University London
Publication date: 01/01/2021

This thesis was submitted for the award of Doctor of Philosophy and was awarded by Brunel University London. Predicting business insolvency is considered one of the main supportive sources of information for decision making for financial institutions, investors, creditors, and other participants in the business market. Financial reporting systems provide relevant information that can be used to assess the financial position of firms. It is crucial to have classification and prediction models that can analyse this financial information and provide accurate assurance for users about business health. Recent studies have explored the use of machine learning tools as a substitute for traditional statistical methods to develop classification models that classify firm insolvency according to financial statement information. However, none of these models is an ideal classifier, since each produces a certain percentage of wrong outputs, which is a crucial consideration: every percentage point of wrong responses can mean massive financial losses for stakeholders. Therefore, this study proposes new insolvency classification and prediction models based on machine learning modelling techniques to develop an improved classifier. Individual modelling techniques using statistical methods and machine learning were used to develop the classification model of business insolvency. The results showed that machine learning methods outperformed statistical methods. Deep Learning (DPL) achieved the highest performance on all performance measurements used in the study, and it was the best individual classifier, with an average accuracy of 97.2% using the all-years dataset. The Ensemble Boosted Decision Tree classifier ranked second, followed by the Decision Tree classifier. Thus, the results indicate that the DPL modelling approach is useful for business insolvency classification.
A key contribution in enhancing individual classifier outputs is the use of traditional combining methods together with two aggregation methods that are new to business insolvency (Fuzzy Logic and the Consensus Approach). The Consensus Approach showed the best improvement in the results of all individual classifiers, with an average accuracy of 97.7%, and it is considered the best classification method not only in comparison with individual classifiers, but also with traditional combiners. This study pioneers the development of a time series business insolvency prediction model with Big Data for UK businesses. The aim of the model is to provide early prediction of a business's health. Three prediction models were developed based on Nonlinear Autoregressive with Exogenous Input models (NARX), Nonlinear Autoregressive Neural Network (NAR), and Deep Learning Time-series model (DPL-SA), and achieved average accuracy rates of 83.6%, 89.5%, and 91.35%, respectively. The results show relatively high performance in comparison with the best individual classifier (deep learning).

Brunel University Research Archive

Automatic Identification of Assumptions from the Hibernate Developer Mailing List

Author: Chatzigeorgiou Alexander
Digkas Georgios
Li Ruiyin
Liang Peng
Xiong Zhuang
Yang Chen
Publication venue: IEEE Computer Society
Publication date: 01/12/2019

During the software development life cycle, assumptions are an important type of software development knowledge that can be extracted from textual artifacts. Analyzing assumptions can help to, for example, comprehend software design and further facilitate software maintenance. Manual identification of assumptions by stakeholders is rather time-consuming, especially when analyzing a large dataset of textual artifacts. To address this problem, one promising way is to use automatic techniques for assumption identification. In this study, we conducted an experiment to evaluate the performance of existing machine learning classification algorithms for automatic assumption identification, through a dataset extracted from the Hibernate developer mailing list. The dataset is composed of 400 'Assumption' sentences and 400 'Non-Assumption' sentences. Seven classifiers using different machine learning algorithms were selected and evaluated. The experiment results show that the SVM algorithm achieved the best performance (with a precision of 0.829, a recall of 0.812, and an F1-score of 0.819). Additionally, according to the ROC curves and related AUC values, the SVM-based classifier comparatively performed better than other classifiers for the binary classification of assumptions.

Crossref

Proceedings - University of Groningen

ARTS repository - University of Groningen

Dissertations of the University of Groningen