74 research outputs found
A two-tiered 2D visual tool for assessing classifier performance
In this article, a new kind of 2D tool is proposed, namely ⟨φ, δ⟩ diagrams, able to highlight most of the information deemed relevant for classifier building and assessment. In particular, accuracy, bias, and break-even points are immediately evident in them. These diagrams come in two forms: the first represents the phenomenon under investigation in a space where the imbalance between negative and positive samples is not taken into account; the second, a generalization of the first, visualizes the relevant information in a space that also accounts for the imbalance. By a specific design choice, all properties found in the first space also hold in the second. The combined use of φ and δ can give important information to researchers building intelligent systems, in particular for classifier performance assessment and feature ranking/selection.
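The abstract does not give the formulas for φ and δ, so the sketch below only illustrates the kind of confusion-matrix quantities such a diagram makes evident. The function name is hypothetical, and "bias" is taken here as the fraction of samples predicted positive, which is one common definition; the paper's exact formulation may differ.

```python
def confusion_metrics(tp, fp, tn, fn):
    # Quantities a 2D performance diagram can expose at a glance.
    # accuracy: fraction of all samples classified correctly.
    # bias: fraction of all samples predicted positive (assumed definition).
    total = tp + fp + tn + fn
    accuracy = (tp + tn) / total
    bias = (tp + fp) / total
    return accuracy, bias
```

A balanced classifier with bias near the true positive prevalence and high accuracy would sit near the "good" region of such a diagram; the imbalance-aware variant described in the abstract would reweight these counts by class prevalence.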
Separation of pulsar signals from noise with supervised machine learning algorithms
We evaluate the performance of four different machine learning (ML) algorithms: an Artificial Neural Network Multi-Layer Perceptron (ANN-MLP), AdaBoost, Gradient Boosting Classifier (GBC), and XGBoost, for the separation of pulsars from radio frequency interference (RFI) and other sources of noise, using a dataset obtained from the post-processing of a pulsar search pipeline. This dataset was previously used for cross-validation of the SPINN-based machine learning engine, used for the reprocessing of HTRU-S survey data (arXiv:1406.3627). We have used the Synthetic Minority Over-sampling Technique (SMOTE) to deal with the high class imbalance in the dataset. We report a variety of quality scores from all four of these algorithms on both the non-SMOTE and SMOTE datasets. For all the above ML methods, we report high accuracy and G-mean in both the non-SMOTE and SMOTE cases. We study feature importances using AdaBoost, GBC, and XGBoost, and also use the minimum Redundancy Maximum Relevance (mRMR) approach to report an algorithm-agnostic feature ranking. From these methods, we find that the signal-to-noise ratio of the folded profile is the best feature. We find that all the ML algorithms report false positive rates (FPRs) about an order of magnitude lower than the corresponding FPRs obtained in arXiv:1406.3627, for the same recall value. Comment: 14 pages, 2 figures. Accepted for publication in Astronomy and Computing.
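Two techniques named in this abstract, SMOTE oversampling and the G-mean score, can be sketched in a few lines of pure Python. This is a simplified SMOTE-style interpolation (the real algorithm works on k-nearest-neighbour graphs over feature vectors; the data and function names here are hypothetical), and the standard G-mean as the geometric mean of the per-class recalls.

```python
import math
import random

def smote_like(minority, n_new, k=2, rng=None):
    # Simplified SMOTE-style oversampling: for each synthetic point, pick a
    # minority sample, find one of its k nearest minority neighbours, and
    # interpolate a new point on the segment between them.
    rng = rng or random.Random(0)
    synthetic = []
    for _ in range(n_new):
        x = rng.choice(minority)
        neighbours = sorted(
            (p for p in minority if p is not x),
            key=lambda p: sum((a - b) ** 2 for a, b in zip(x, p)),
        )[:k]
        nb = rng.choice(neighbours)
        gap = rng.random()  # interpolation factor in [0, 1)
        synthetic.append(tuple(a + gap * (b - a) for a, b in zip(x, nb)))
    return synthetic

def g_mean(tp, fn, tn, fp):
    # Geometric mean of sensitivity (recall on positives) and
    # specificity (recall on negatives); robust to class imbalance.
    return math.sqrt((tp / (tp + fn)) * (tn / (tn + fp)))
```

Because G-mean multiplies the two per-class recalls, a classifier that ignores the minority class scores near zero even if its raw accuracy is high, which is why it is reported alongside accuracy here.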
A Detailed Investigation into Low-Level Feature Detection in Spectrogram Images
Being the first stage of analysis within an image, low-level feature detection is a crucial step in the image analysis process and, as such, deserves suitable attention. This paper presents a systematic investigation into low-level feature detection in spectrogram images, the result of which is the identification of frequency tracks. Analysis of the literature identifies different strategies for accomplishing low-level feature detection; nevertheless, the advantages and disadvantages of each are not explicitly investigated. Three model-based detection strategies are outlined, each extracting an increasing amount of information from the spectrogram, and, through ROC analysis, it is shown that the detection rate increases with the level of extraction. Nevertheless, further investigation suggests that model-based detection has a limitation: it is not computationally feasible to fully evaluate the model of even a simple sinusoidal track. Therefore, alternative approaches, such as dimensionality reduction, are investigated to reduce the complex search space. It is shown that, if carefully selected, these techniques can approach the detection rates of model-based strategies that perform the same level of information extraction. The implementations used to derive the results presented within this paper are available online from http://stdetect.googlecode.com
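The ROC analysis used to compare the detection strategies above can be reproduced with a short pure-Python sketch: sweep the decision threshold over the detector scores and accumulate (false-positive rate, true-positive rate) points, then integrate with the trapezoid rule. This simplified version assumes untied scores; the data shown is hypothetical.

```python
def roc_points(scores, labels):
    # Sweep the threshold down through the sorted scores; each step
    # converts one sample to "detected" and moves the ROC point.
    pairs = sorted(zip(scores, labels), key=lambda p: -p[0])
    pos = sum(labels)
    neg = len(labels) - pos
    tp = fp = 0
    pts = [(0.0, 0.0)]
    for _, y in pairs:
        if y:
            tp += 1
        else:
            fp += 1
        pts.append((fp / neg, tp / pos))
    return pts

def auc(pts):
    # Trapezoidal area under the ROC curve; 1.0 = perfect detector.
    return sum((x2 - x1) * (y1 + y2) / 2
               for (x1, y1), (x2, y2) in zip(pts, pts[1:]))
```

A detector whose scores perfectly separate tracks from background reaches an AUC of 1.0; random scoring sits near 0.5, which gives a scale for comparing the three model-based strategies.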
Tracking fish abundance by underwater image recognition
Marine cabled video-observatories allow the non-destructive sampling of species at frequencies and durations that have never been attained before. Nevertheless, the lack of appropriate methods to automatically process video imagery limits this technology for the purposes of ecosystem monitoring. Automation is a prerequisite to deal with the huge quantities of video footage captured by cameras, which can then transform these devices into true autonomous sensors. In this study, we have developed a novel methodology that is based on genetic programming for content-based image analysis. Our aim was to capture the temporal dynamics of fish abundance. We processed more than 20,000 images that were acquired in a challenging real-world coastal scenario at the OBSEA-EMSO testing-site. The images were collected at 30-min intervals, continuously for two years, day and night. The highly variable environmental conditions allowed us to test the effectiveness of our approach under changing light radiation, water turbidity, background confusion, and bio-fouling growth on the camera housing. The automated recognition results were highly correlated with the manual counts, and they were highly reliable when used to track fish variations at different hourly, daily, and monthly time scales. In addition, our methodology could be easily transferred to other cabled video-observatories.
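The validation reported above rests on the correlation between automated and manual counts. As a minimal sketch of that check (the count data here is hypothetical), the Pearson correlation coefficient can be computed directly:

```python
def pearson_r(xs, ys):
    # Pearson correlation between two paired series, e.g. automated
    # fish counts vs. manual counts on the same images.
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sum((x - mx) ** 2 for x in xs) ** 0.5
    sy = sum((y - my) ** 2 for y in ys) ** 0.5
    return cov / (sx * sy)
```

Values near +1 indicate that the automated counts rise and fall with the manual ones, which is what allows them to stand in for human annotation when tracking hourly, daily, and monthly abundance cycles.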
Multi-objective optimisation for receiver operating characteristic analysis
Copyright © 2006 Springer-Verlag Berlin Heidelberg. The final publication is available at link.springer.com. Book title: Multi-Objective Machine Learning. Summary:
Receiver operating characteristic (ROC) analysis is now a standard tool for the comparison of binary classifiers and for the selection of operating parameters when the costs of misclassification are unknown.
This chapter outlines the use of evolutionary multi-objective optimisation techniques for ROC analysis, in both its traditional binary classification setting, and in the novel multi-class ROC situation.
Methods for comparing classifier performance in the multi-class case, based on an analogue of the Gini coefficient, are described; these lead to a natural method of selecting the classifier operating point. Illustrations are given using synthetic data and an application to Short Term Conflict Alert.
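In the multi-objective view of ROC analysis, each candidate classifier (or operating point) is a point trading off false-positive rate against true-positive rate, and the output of the optimisation is the set of non-dominated points, the Pareto front. A minimal sketch of that dominance filter for the binary case (the operating points shown are hypothetical; the multi-class setting generalises this to a higher-dimensional trade-off surface):

```python
def pareto_front(points):
    # Keep the non-dominated (fpr, tpr) operating points: a point is
    # dominated if some other point has fpr no higher AND tpr no lower.
    front = []
    for fpr, tpr in points:
        dominated = any(
            f <= fpr and t >= tpr and (f, t) != (fpr, tpr)
            for f, t in points
        )
        if not dominated:
            front.append((fpr, tpr))
    return sorted(front)
```

For binary classifiers the Pareto front traces the ROC convex-hull-like trade-off curve, and the Gini coefficient mentioned above summarises it as a scalar (for a binary ROC curve, Gini = 2·AUC − 1).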
Development of a lesion localisation tool to improve outcome prediction in Traumatic Brain Injury patients
Integrated master's thesis, Biomedical Engineering and Biophysics (Clinical Engineering and Medical Instrumentation), Universidade de Lisboa, Faculdade de Ciências, 2022. Traumatic brain injury (TBI) is a highly heterogeneous pathology that poses severe health and socioeconomic problems on a global scale. Neuroimaging research and development has advanced its clinical care in numerous ways, as injured brains are being imaged and studied in greater detail. The size and location of TBI lesions are often necessary to accurately determine a prognosis, which is key in defining a patient-specific rehabilitation program. This dissertation aims to investigate the impact of lesion characteristics, such as volume and location, on outcome prediction in TBI patients. Lesion localisation was achieved by comparing annotated TBI lesions to a brain atlas. Furthermore, other lesion characteristics were examined across different Magnetic Resonance Imaging (MRI) sequences and scanners, with results suggesting that the use of different scanners or MRI contrasts introduced biases in said lesion characteristics. Patient outcome was predicted using four generalised linear models. Besides clinical variables, these models included lesion volume, group, and location as predictors. Model comparison indicated that lesion volume could be beneficial for outcome prediction, but may be dependent on both lesion group and location. Overall, this methodology showed potential in uncovering the effect that certain lesion groups and/or locations have on patient outcome after TBI.
An alternative confusion matrix implementation for PreCall
In this work, we examine the literature on creating visualizations of the performance of machine learning classifiers, with our target group being users with limited machine learning experience. The underlying data is taken from Wikipedia, and more specifically from ORES, Wikimedia's service that employs a machine learning model to score edits and articles. The interface also expands on PreCall's implementation and features multiple interactive components that allow the user to dynamically adjust parameters and see the immediate change in the classifier's performance. After providing a summary of the relevant literature, we go over the ORES API and its relevant endpoints and parameters. Then, we outline the most popular ways to visualize a machine learning classifier's performance. Following that is a thorough description of our target group, goals, and requirements, as well as the reasoning behind each design decision. Finally, there is an overview of the design and development process; we conclude with a feedback session with a machine learning expert with a background in ORES, whose feedback was mostly positive, with some suggestions for improvement.
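The core computation behind an interactive confusion-matrix view of this kind is re-deriving precision and recall as the user drags the decision threshold. A minimal sketch (scores and labels hypothetical; PreCall's actual implementation is not reproduced here):

```python
def precision_recall_at(scores, labels, threshold):
    # Confusion-matrix counts at one threshold, and the precision/recall
    # pair an interactive view would redraw as the threshold moves.
    tp = sum(1 for s, y in zip(scores, labels) if s >= threshold and y)
    fp = sum(1 for s, y in zip(scores, labels) if s >= threshold and not y)
    fn = sum(1 for s, y in zip(scores, labels) if s < threshold and y)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall
```

Lowering the threshold flags more edits as damaging, which typically raises recall at the cost of precision; surfacing that trade-off interactively is exactly what makes such a tool useful to users with limited machine learning experience.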