7,617 research outputs found
Using Feature Selection with Machine Learning for Generation of Insurance Insights
Insurance is a data-rich sector, hosting large volumes of customer data that are analysed to evaluate risk. Machine learning techniques are increasingly used in the effective management of insurance risk. Insurance datasets by their nature, however, are often of poor quality, with noisy subsets of data (or features). Choosing the right features is a significant pre-processing step in the creation of machine learning models, and the inclusion of irrelevant and redundant features has been shown to degrade the performance of learning models. In this article, we propose a framework for improving predictive machine learning techniques in the insurance sector via the selection of relevant features. The experimental results, based on five publicly available real insurance datasets, show the importance of applying feature selection to remove noisy features before applying machine learning techniques, allowing the algorithms to focus on influential features. An additional business benefit is the identification of the most and least important features in the datasets. These insights can prove useful for decision making and strategy development in business problems beyond the direct target of the downstream algorithms. In our experiments, machine learning techniques based on the features suggested by feature selection algorithms outperformed the same techniques applied to the full feature set. Specifically, subsets containing 20% and 50% of the features in our five datasets improved downstream clustering and classification performance compared to the full datasets. This indicates the potential of feature selection in the insurance sector both to improve model performance and to highlight influential features for business insights.
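As a rough illustration of the pipeline described above, the sketch below ranks features by mutual information and compares a classifier trained on the top 20% and 50% of features against one trained on the full set. It assumes scikit-learn and uses a synthetic stand-in dataset; the estimator, scoring function and thresholds are illustrative choices, not the framework from the article.

# Hedged sketch: does a classifier trained on the top 20% / 50% of features
# (ranked by mutual information) match or beat one trained on all features?
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_selection import SelectKBest, mutual_info_classif
from sklearn.model_selection import cross_val_score

# Synthetic stand-in for an insurance dataset with noisy/redundant features.
X, y = make_classification(n_samples=2000, n_features=40, n_informative=10,
                           n_redundant=20, random_state=0)

def mean_accuracy(features):
    clf = RandomForestClassifier(n_estimators=200, random_state=0)
    return cross_val_score(clf, features, y, cv=5).mean()

print("all features  :", round(mean_accuracy(X), 3))
for fraction in (0.5, 0.2):
    k = max(1, int(fraction * X.shape[1]))
    selected = SelectKBest(mutual_info_classif, k=k).fit_transform(X, y)
    print(f"top {fraction:.0%} features:", round(mean_accuracy(selected), 3))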
Automated Classification for Electrophysiological Data: Machine Learning Approaches for Disease Detection and Emotion Recognition
Smart healthcare is a health service system that uses technologies such as artificial intelligence and big data to alleviate pressure on healthcare systems. Much recent research has focused on automatic disease diagnosis and recognition; our research concentrates on the automatic classification of electrophysiological signals, which are measurements of the body's electrical activity. Specifically, for electrocardiogram (ECG) and electroencephalogram (EEG) data, we develop a series of algorithms for automatic cardiovascular disease (CVD) classification, emotion recognition and seizure detection.
Using ECG signals obtained from wearable devices, the candidate developed novel signal processing and machine learning methods for continuous monitoring of heart conditions. Compared with traditional methods that rely on devices in clinical settings, the methods developed in this thesis are much more convenient to use. To identify arrhythmia patterns in the noisy ECG signals obtained from wearable devices, CNN and LSTM models are used, and a wavelet-based CNN is proposed to enhance performance, as sketched below.
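A minimal sketch of how such a wavelet-based CNN can be wired, assuming PyWavelets and PyTorch: a continuous wavelet transform turns a 1-D ECG segment into a 2-D scalogram, which a small CNN then classifies. The wavelet, scales, network layout and class count are placeholders, not the architecture developed in the thesis.

# Hedged sketch: CWT scalogram of an ECG segment fed to a small 2-D CNN.
import numpy as np
import pywt
import torch
import torch.nn as nn

def ecg_to_scalogram(segment, scales=np.arange(1, 65), wavelet="morl"):
    coeffs, _ = pywt.cwt(segment, scales, wavelet)          # (64, n_samples)
    return torch.tensor(coeffs, dtype=torch.float32).unsqueeze(0)  # add channel dim

class WaveletCNN(nn.Module):
    def __init__(self, n_classes=5):                        # e.g. AAMI beat classes
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(1, 16, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
            nn.Conv2d(16, 32, 3, padding=1), nn.ReLU(), nn.AdaptiveAvgPool2d(4),
        )
        self.classifier = nn.Linear(32 * 4 * 4, n_classes)

    def forward(self, x):
        return self.classifier(self.features(x).flatten(1))

segment = np.random.randn(360)                  # placeholder one-second beat at 360 Hz
logits = WaveletCNN()(ecg_to_scalogram(segment).unsqueeze(0))   # batch of one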
An emotion recognition method based on a single-channel ECG is developed, in which a novel exploitative and explorative GWO-SVM algorithm is proposed to achieve high-performance emotion classification. Its attraction is that it learns the SVM hyperparameters automatically while preventing the search from falling into local solutions, thereby achieving better performance than existing algorithms.
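For illustration only, a simplified grey wolf optimizer (the plain GWO update rule, not the exploitative and explorative variant proposed in the thesis) searching SVM hyperparameters C and gamma by cross-validated accuracy; the scikit-learn breast cancer data stands in for ECG-derived emotion features.

# Hedged sketch: plain GWO tuning SVM hyperparameters in log10 space.
import numpy as np
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import cross_val_score
from sklearn.svm import SVC

X, y = load_breast_cancer(return_X_y=True)      # placeholder feature matrix
rng = np.random.default_rng(0)
low, high = np.log10([1e-2, 1e-4]), np.log10([1e3, 1e1])   # bounds for log10(C), log10(gamma)

def fitness(pos):
    C, gamma = 10.0 ** pos
    return cross_val_score(SVC(C=C, gamma=gamma), X, y, cv=3).mean()

wolves = rng.uniform(low, high, size=(8, 2))    # the pack of candidate solutions
scores = np.array([fitness(w) for w in wolves])
best_pos, best_score = wolves[np.argmax(scores)].copy(), scores.max()
for t in range(15):
    a = 2 * (1 - t / 15)                        # exploration factor decays to 0
    leaders = wolves[np.argsort(scores)[::-1][:3]]          # alpha, beta, delta wolves
    for i in range(len(wolves)):
        steps = []
        for leader in leaders:
            A = a * (2 * rng.random(2) - 1)     # A = 2*a*r1 - a
            C_vec = 2 * rng.random(2)           # C = 2*r2
            steps.append(leader - A * np.abs(C_vec * leader - wolves[i]))
        wolves[i] = np.clip(np.mean(steps, axis=0), low, high)
        scores[i] = fitness(wolves[i])
        if scores[i] > best_score:
            best_pos, best_score = wolves[i].copy(), scores[i]
print("best C=%.3g, gamma=%.3g, accuracy=%.3f" % (10 ** best_pos[0], 10 ** best_pos[1], best_score))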
A novel EEG-based seizure detector is developed, in which the EEG signals are transformed into the spectral-temporal domain so that the dimensionality of the input features to the CNN is significantly reduced, while the detector still achieves superior detection performance.
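The dimensionality-reduction argument can be sketched with a short-time Fourier transform: a long raw EEG window becomes a compact spectral-temporal map before it reaches the CNN. The sampling rate, window length and frequency band below are assumptions for illustration, not the settings used in the thesis.

# Hedged sketch: spectral-temporal transform shrinking the CNN's input.
import numpy as np
from scipy.signal import spectrogram

fs = 256                                        # assumed sampling rate in Hz
eeg = np.random.randn(30 * fs)                  # placeholder 30 s single-channel window
freqs, times, Sxx = spectrogram(eeg, fs=fs, nperseg=fs, noverlap=fs // 2)
band = freqs <= 30.0                            # keep a clinically relevant low band (assumption)
feature_map = np.log1p(Sxx[band])               # spectral-temporal image fed to the CNN
print(eeg.shape, "->", feature_map.shape)       # (7680,) -> (31, 59), roughly a 4x reduction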
Dense semantic labeling of sub-decimeter resolution images with convolutional neural networks
Semantic labeling (or pixel-level land-cover classification) in ultra-high
resolution imagery (< 10cm) requires statistical models able to learn high
level concepts from spatial data, with large appearance variations.
Convolutional Neural Networks (CNNs) achieve this goal by learning
discriminatively a hierarchy of representations of increasing abstraction.
In this paper we present a CNN-based system relying on a
downsample-then-upsample architecture. Specifically, it first learns a rough
spatial map of high-level representations by means of convolutions and then
learns to upsample them back to the original resolution by deconvolutions. By
doing so, the CNN learns to densely label every pixel at the original
resolution of the image. This results in many advantages, including i)
state-of-the-art numerical accuracy, ii) improved geometric accuracy of
predictions and iii) high efficiency at inference time.
We test the proposed system on the Vaihingen and Potsdam sub-decimeter
resolution datasets, involving semantic labeling of aerial images of 9cm and
5cm resolution, respectively. These datasets are composed of many large and
fully annotated tiles allowing an unbiased evaluation of models making use of
spatial information. We do so by comparing two standard CNN architectures, standard patch classification and prediction of local label patches employing only convolutions, with the proposed full patch labeling employing deconvolutions. All the systems compare favorably with or outperform a
state-of-the-art baseline relying on superpixels and powerful appearance
descriptors. The proposed full patch labeling CNN outperforms these models by a
large margin, also showing a very appealing inference time.
Comment: Accepted in IEEE Transactions on Geoscience and Remote Sensing, 2017
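A minimal sketch of the downsample-then-upsample idea in PyTorch: strided convolutions learn a coarse map of high-level features, and transposed convolutions (deconvolutions) learn to upsample it back to a per-pixel label map at the input resolution. The layer sizes and the six-class output are illustrative and far smaller than the networks evaluated in the paper.

# Hedged sketch: tiny conv-deconv network for dense (per-pixel) labeling.
import torch
import torch.nn as nn

class ConvDeconv(nn.Module):
    def __init__(self, n_classes=6):                   # e.g. the ISPRS label set
        super().__init__()
        self.encoder = nn.Sequential(                  # 4x spatial downsampling
            nn.Conv2d(3, 32, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(32, 64, 3, stride=2, padding=1), nn.ReLU(),
        )
        self.decoder = nn.Sequential(                  # learned upsampling back to input size
            nn.ConvTranspose2d(64, 32, 4, stride=2, padding=1), nn.ReLU(),
            nn.ConvTranspose2d(32, n_classes, 4, stride=2, padding=1),
        )

    def forward(self, x):
        return self.decoder(self.encoder(x))           # per-pixel class scores

patch = torch.randn(1, 3, 256, 256)                    # one 3-band image patch
print(ConvDeconv()(patch).shape)                       # torch.Size([1, 6, 256, 256])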
A Contribution to land cover and land use mapping in Portugal with multi-temporal Sentinel-2 data and supervised classification
Dissertation presented as the partial requirement for obtaining a Master's degree in Geographic Information Systems and Science.
Remote sensing techniques have been widely employed to map and monitor land cover and land use, important elements for the description of the environment. The current land cover and land use mapping paradigm takes advantage of a variety of data options with appropriate spatial, spectral and temporal resolutions, along with advances in technology. This has enabled the creation of automated data processing workflows, integrated with classification algorithms, to accurately map large areas with multi-temporal data. In Portugal, the General Directorate for Territory (DGT) is developing an operational Land Cover Monitoring System (SMOS), which includes an annual land cover cartography product (COSsim) based on an automatic process using supervised classification of multi-temporal Sentinel-2 data. In this context, a range of experiments is being conducted to improve map accuracy and classification efficiency. This study provides a contribution to DGT's work. A classification of the biogeographic region of Trás-os-Montes in the north of Portugal was performed for the agricultural year of 2018 using Random Forest and an intra-annual multi-temporal Sentinel-2 dataset, with stratification of the study area and a combination of manually and automatically extracted training samples, the latter based on existing reference datasets. This classification was compared to a benchmark classification conducted without stratification and with training data collected automatically only. In addition, an assessment of the influence of training sample size on classification accuracy was conducted. The main focus of this study was to investigate whether the use of
classification uncertainty to create an improved training dataset could increase classification accuracy. A process of extracting additional training samples from areas of high classification uncertainty was conducted, then a new classification was performed and the results were compared. Classification accuracy assessment for all proposed experiments was conducted using overall accuracy, precision, recall and F1-score. The use of stratification and the combination of training strategies resulted in a classification accuracy of 66.7%, in contrast to 60.2% for the benchmark classification. Although the difference was not statistically significant, visual inspection of both maps indicated that stratification and the introduction of manual training contributed to mapping land cover more accurately in some areas. Regarding the influence of sample size on classification accuracy, the results indicated a small difference, not statistically significant, even after a reduction of over 90% in the sample size. This supports the findings of other studies which suggested that Random Forest has low sensitivity to variations in training sample size. However, the results might have been influenced by the training strategy employed, which uses spectral subclasses, thus creating spectral diversity in the samples independently of their size. With respect to the use of classification uncertainty to improve the training sample, a slight increase in accuracy of approximately 1% was observed, which was not statistically significant. This result could have been affected by limitations in the process of collecting additional sampling units, which resulted in a lack of additional training for some classes (e.g. agriculture) and an overall imbalanced training dataset. Additionally, some classes had their additional training sampling units collected from a limited number of polygons, which could limit the spectral diversity of the new samples. Nevertheless, visual inspection of the map suggested that the new training contributed to reducing confusion between some classes, improving map agreement with ground truth. Further investigation could explore more deeply the potential of classification uncertainty, especially focusing on problems related to the collection of the additional samples.
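A compact sketch of the uncertainty idea investigated above, assuming scikit-learn: a Random Forest is trained on multi-temporal Sentinel-2 features, per-pixel classification uncertainty is taken as one minus the margin between the two highest class probabilities, and the most uncertain pixels are flagged as candidates for additional training samples. The arrays, class count and thresholds are placeholders, not DGT's operational workflow.

# Hedged sketch: margin-based classification uncertainty for sample augmentation.
import numpy as np
from sklearn.ensemble import RandomForestClassifier

n_pixels, n_features = 10_000, 40              # e.g. 10 dates x 4 bands (illustrative)
X = np.random.rand(n_pixels, n_features)       # stand-in for the Sentinel-2 time series
y = np.random.randint(0, 8, n_pixels)          # stand-in for the land cover labels

initial = np.random.rand(n_pixels) < 0.05      # mask of initially labelled pixels
rf = RandomForestClassifier(n_estimators=300, n_jobs=-1, random_state=0)
rf.fit(X[initial], y[initial])

proba = rf.predict_proba(X)                    # per-pixel class probabilities
top2 = np.sort(proba, axis=1)[:, -2:]          # second-best and best probability
uncertainty = 1.0 - (top2[:, 1] - top2[:, 0])  # small margin -> high uncertainty
candidates = np.argsort(uncertainty)[::-1][:500]   # pixels to photo-interpret and add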
Spectral Feature Selection for Data Mining
This timely introduction to spectral feature selection illustrates the potential of this powerful dimensionality reduction technique in high-dimensional data processing. It presents the theoretical foundations of spectral feature selection, its connections to other algorithms, and its use in handling both large-scale data sets and small sample problems. Readers learn how to use spectral feature selection to solve challenging problems in real-life applications and discover how general feature selection and extraction are connected to spectral feature selection. Source code for the algorithms is available online.
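As a concrete taste of the spectral approach, a minimal Laplacian Score computation (one of the classic spectral feature selection criteria): features that best preserve the local structure of an RBF affinity graph over the samples receive the lowest scores. This is a generic textbook formulation, not the book's reference implementation.

# Hedged sketch: Laplacian Score feature ranking on an RBF affinity graph.
import numpy as np
from sklearn.metrics.pairwise import rbf_kernel

def laplacian_scores(X, gamma=1.0):
    S = rbf_kernel(X, gamma=gamma)             # sample-by-sample affinity graph
    d = S.sum(axis=1)                          # node degrees
    L = np.diag(d) - S                         # graph Laplacian
    scores = []
    for f in X.T:                              # one score per feature
        f_c = f - (f @ d) / d.sum()            # remove the degree-weighted mean
        scores.append((f_c @ L @ f_c) / (f_c @ (d * f_c)))
    return np.array(scores)                    # lower score = better feature

X = np.random.rand(200, 10)                    # placeholder data matrix
ranking = np.argsort(laplacian_scores(X))      # feature indices, best first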