
Leveraging Metalearning for Bagging Classifiers

Author: Fábio Hernâni dos Santos Costa Pinto
Publication date: 14/06/2018

Repositório Aberto da Universidade do Porto

Fair Pricing in the Telecommunications Sector

Author: Ana Beatriz Simões Pereira Querido
Publication date: 14/11/2022

Repositório Aberto da Universidade do Porto

Evolutionary computation-based feature selection for finding a stable set of features in high-dimensional data

Author: Salesi Mousaabadi S
Publication date: 01/09/2019

Evolutionary Computation (EC) algorithms have proved to work well for feature selection because they are powerful search techniques and can produce multiple good solutions. However, they suffer from some limitations in real-world applications. Firstly, ECs require high computation time as they evaluate many solutions at each iteration. Secondly, a classifier is usually used as their fitness function, which causes the selected subset to perform well only on the utilised classifier (i.e. classifier bias). Lastly, ECs, as stochastic search methods, return a different final subset in different runs, which poses a problem for finding a stable set of features (i.e. the stability issue). To address the computation time and classifier-bias limitations, this thesis proposes a new two-stage selection approach called filter/filter, in which two filter feature selection algorithms are combined. In the first stage, a ranking algorithm forms a reduced dataset by selecting the most informative features from the original dataset. In the second stage, the reduced dataset is fed to a novel EC algorithm to select the final feature subset. This new EC algorithm is a Tabu search hybridised with an Asexual Genetic Algorithm, called TAGA. TAGA benefits from new search components and a solution representation that can effectively reduce computation time. To select a classifier-unbiased final subset, a statistical criterion is used as the fitness function, which evaluates the subset independently of any classifier. Experiments show that the proposed filter/filter requires an acceptable computation time and selects more classifier-unbiased features compared to the state of the art. To find a stable set of features, a novel Generalisation Power Index (GPI) is proposed to analyse the generalisation power of the final subsets of an EC over several runs. Generalisation power refers to the performance capability of a subset over a wide range of classifiers.
Computation results confirm that GPI is able to find a stable set of features which achieves near-optimal accuracy when used to train various classifiers. To examine the suitability of the proposed methods for real-world applications, the filter/filter approach and GPI are integrated to select a stable set of features for the METABRIC breast cancer subtype classification problem. Experimental results show that this integration not only addresses the limitations of ECs for a real-world biomedical feature selection problem but also performs better than alternative methods.
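
The first stage described in this abstract (a ranking filter that keeps the most informative features) can be illustrated with a minimal sketch. The abstract does not name the ranking algorithm, so the absolute Pearson correlation used here is an assumption, as are the function names `pearson` and `rank_features` and the toy data. The second stage (TAGA) is not sketched.

```python
from statistics import mean


def pearson(xs, ys):
    """Pearson correlation of two equal-length sequences (0.0 if either is constant)."""
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    if vx == 0 or vy == 0:
        return 0.0
    return cov / (vx * vy) ** 0.5


def rank_features(rows, labels, keep):
    """Return the indices of the `keep` features most correlated with the labels.

    Assumption: relevance is scored by |Pearson correlation|, a stand-in for
    the unspecified ranking filter; the score does not depend on any classifier.
    """
    n_features = len(rows[0])
    scores = []
    for j in range(n_features):
        column = [row[j] for row in rows]
        scores.append((abs(pearson(column, labels)), j))
    # Highest score first; ties broken by the lower feature index.
    scores.sort(key=lambda s: (-s[0], s[1]))
    return [j for _, j in scores[:keep]]


# Toy data (hypothetical): feature 0 tracks the label, feature 1 is constant,
# feature 2 is only weakly related to the label.
rows = [
    [1.0, 5.0, 0.3],
    [2.0, 5.0, 0.9],
    [3.0, 5.0, 0.1],
    [4.0, 5.0, 0.7],
]
labels = [0, 0, 1, 1]
reduced = rank_features(rows, labels, keep=2)
```

In a two-stage setup of this kind, only the columns listed in `reduced` would be passed on to the more expensive search stage.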

Nottingham Trent Institutional Repository (IRep)

Recommender systems in industrial contexts

Author: Meyer Frank
Publication date: 25/01/2012

This thesis consists of four parts:
- An analysis of the core functions and the prerequisites for recommender systems in an industrial context: we identify four core functions for recommendation systems: Help to Decide, Help to Compare, Help to Explore, Help to Discover. The implementation of these functions has implications for the choices at the heart of algorithmic recommender systems.
- A state of the art, which deals with the main techniques used in automated recommendation systems: the two most commonly used algorithmic methods, the K-Nearest-Neighbor (KNN) methods and the fast factorization methods, are detailed. The state of the art also presents purely content-based methods, hybridization techniques, and the classical performance metrics used to evaluate recommender systems. It then gives an overview of several systems, both from academia and industry (Amazon, Google ...).
- An analysis of the performance and implications of a recommendation system developed during this thesis: this system, Reperio, is a hybrid recommender engine using KNN methods. We study the performance of the KNN methods, including the impact of the similarity functions used. Then we study the performance of the KNN method in critical use cases in cold-start situations.
- A methodology for analyzing the performance of recommender systems in an industrial context: this methodology assesses the added value of algorithmic strategies and recommendation systems according to their core functions.
Comment: version 3.30, May 201

arXiv.org e-Print Archive

Theses.fr

Maximum relevancy maximum complementary based ordered aggregation for ensemble pruning

Author: A Idris
A Tsymbal
AHR Ko
AS Britto
B Bakker
B Krawczyk
B Sun
C Shen
CEA Shannon
EK Tang
F Yang
G Martinez-Muoz
G Tsch
GDC Cavalcanti
H Peng
H Ykhlef
H Zhang
H Zhou
I Mukherjee
I Partalas
IH Laradji
J Kittler
JJ Rodriguez
JL Hodges
L Guo
L Li
L Wang
LI Kuncheva
M Hall
MS Haghighi
Q Dai
Q Dai
QL Zhao
S Chernbumroong
S Holm
S Özögür-Akyüz
XC Yin
Y Zhang
Z Xie
Publication venue: 'Springer Science and Business Media LLC'

Crossref

Training Datasets for Machine Reading Comprehension and Their Limitations

Author: Welbl Johannes
Publication venue: UCL (University College London)
Publication date: 28/09/2020

Neural networks are a powerful model class to learn machine Reading Comprehension (RC), yet they crucially depend on the availability of suitable training datasets. In this thesis we describe methods for data collection, evaluate the performance of established models, and examine a number of model behaviours and dataset limitations. We first describe the creation of a data resource for the science exam QA domain, and compare existing models on the resulting dataset. The collected questions are plausible – non-experts can distinguish them from real exam questions with 55% accuracy – and using them as additional training data leads to improved model scores on real science exam questions. Second, we describe and apply a distant supervision dataset construction method for multi-hop RC across documents. We identify and mitigate several dataset assembly pitfalls – a lack of unanswerable candidates, label imbalance, and spurious correlations between documents and particular candidates – which often leave shallow predictive cues for the answer. Furthermore we demonstrate that selecting relevant document combinations is a critical performance bottleneck on the datasets created. We thus investigate Pseudo-Relevance Feedback, which leads to improvements compared to TF-IDF-based document combination selection both in retrieval metrics and answer accuracy. Third, we investigate model undersensitivity: model predictions do not change when given adversarially altered questions in SQUAD2.0 and NEWSQA, even though they should. We characterise affected samples, and show that the phenomenon is related to a lack of structurally similar but unanswerable samples during training: data augmentation reduces the adversarial error rate, e.g. from 51.7% to 20.7% for a BERT model on SQUAD2.0, and improves robustness also in other settings.
Finally we explore efficient formal model verification via Interval Bound Propagation (IBP) to measure and address model undersensitivity, and show that using an IBP-derived auxiliary loss can improve verification rates, e.g. from 2.8% to 18.4% on the SNLI test set.

UCL Discovery

Temporospatial Context-Aware Vehicular Crash Risk Prediction

Author: Mehrannia Pouya
Publication venue: 'University of Waterloo'
Publication date: 28/05/2020

With the demand for more vehicles increasing, road safety is becoming a growing concern. Traffic collisions take many lives and cost billions of dollars in losses. This explains the growing interest of governments, academic institutions and companies in road safety. The vastness and availability of road accident data has provided new opportunities for gaining a better understanding of accident risk factors and for developing more effective accident prediction and prevention regimes. Much of the empirical research on road safety and accident analysis utilizes statistical models which capture limited aspects of crashes. On the other hand, data mining has recently gained interest as a reliable approach for investigating road-accident data and for providing predictive insights. While some risk factors contribute more frequently to the occurrence of a road accident, the importance of driver behavior, temporospatial factors, and real-time traffic dynamics has been underestimated. This study proposes a framework for predicting crash risk based on historical accident data. The proposed framework incorporates machine learning and data analytics techniques to identify driving patterns and other risk factors associated with potential vehicle crashes. These techniques include clustering, association rule mining, information fusion, and Bayesian networks. Swarm intelligence based association rule mining is employed to uncover the underlying relationships and dependencies in collision databases. Data segmentation methods are employed to eliminate the effect of dependent variables. Extracted rules can be used along with real-time mobility to predict crashes and their severity in real-time. The national collision database of Canada (NCDB) is used in this research to generate association rules with crash risk oriented subsequents, and to compare the performance of the swarm intelligence based approach with that of other association rule miners.
Many industry-demanding datasets, including road-accident datasets, are deficient in descriptive factors. This is a significant barrier to uncovering meaningful risk factor relationships. To resolve this issue, this study proposes a knowledgebase approximation framework to enhance the crash risk analysis by integrating pieces of evidence discovered from disparate datasets capturing different aspects of mobility. Dempster-Shafer theory is utilized as a key element of this knowledgebase approximation. This method can integrate association rules with acceptable accuracy under certain circumstances that are discussed in this thesis. The proposed framework is tested on the lymphography dataset and the road-accident database of Great Britain. The derived insights are then used as the basis for constructing a Bayesian network that can estimate crash likelihood and risk levels so as to warn drivers and prevent accidents in real-time. This Bayesian network approach offers a way to implement a naturalistic driving analysis process for predicting traffic collision risk based on the findings from the data-driven model. A traffic incident detection and localization method is also proposed as a component of the risk analysis model. Detecting and localizing traffic incidents enables timely response to accidents and facilitates effective and efficient traffic flow management. The results obtained from the experimental work conducted on this component are indicative of the capability of our Dempster-Shafer data-fusion-based incident detection method in overcoming the challenges arising from erroneous and noisy sensor readings.
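
The evidence-integration step mentioned in this abstract relies on Dempster-Shafer theory. As a minimal sketch, the standard Dempster rule of combination for two mass functions is shown below. The thesis's own fusion procedure is not described in the abstract, so the function name `combine`, the two-hypothesis frame, and the mass values are illustrative assumptions.

```python
from itertools import product


def combine(m1, m2):
    """Combine two mass functions with Dempster's rule.

    Each mass function maps a frozenset of hypotheses to a mass, and the
    masses of each function sum to 1. Mass that falls on an empty
    intersection is treated as conflict and normalised away.
    """
    combined = {}
    conflict = 0.0
    for (a, mass_a), (b, mass_b) in product(m1.items(), m2.items()):
        intersection = a & b
        if intersection:
            combined[intersection] = combined.get(intersection, 0.0) + mass_a * mass_b
        else:
            conflict += mass_a * mass_b
    if conflict >= 1.0:
        raise ValueError("sources are in total conflict; the rule is undefined")
    return {h: v / (1.0 - conflict) for h, v in combined.items()}


# Hypothetical frame of discernment with two outcomes.
CRASH = frozenset({"crash"})
SAFE = frozenset({"safe"})
EITHER = CRASH | SAFE  # mass placed here expresses ignorance

# Hypothetical evidence from two independent sources.
source_a = {CRASH: 0.6, EITHER: 0.4}
source_b = {CRASH: 0.5, SAFE: 0.2, EITHER: 0.3}

fused = combine(source_a, source_b)
```

With these illustrative inputs, 0.12 of the joint mass is conflicting (source A says crash while source B says safe), so the remaining masses are divided by 0.88.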

University of Waterloo's Institutional Repository

Gaining Insight into Determinants of Physical Activity using Bayesian Network Learning

Author: Bemelmans R.
Bolman C.
Cao L.
Hommersom A.J.
Lechner L.
Tummers S.
Publication venue: 'Leiden University Library - OAPEN'
Publication date: 01/01/2020

Contains fulltext: 228326pre.pdf (preprint version) (Open Access)
Contains fulltext: 228326pub.pdf (publisher's version) (Open Access)
BNAIC/BeneLearn 202

Open University of the Netherlands Research Portal

Radboud Repository

NEW, MULTI-SCALE APPROACHES TO CHARACTERIZE PATTERNS IN VEGETATION, FUELS, AND WILDFIRE

Author: Moran Christopher Jacob
Publication venue: University of Montana, Maureen and Mike Mansfield Library
Publication date: 01/01/2019

Pattern and scale are key to understanding ecological processes. My dissertation research aims for novel quantification of vegetation, fuel, and wildfire patterns at multiple scales and to leverage these data for insights into fire processes. Core to this motivation is the 3-dimensional (3-D) characterization of forest properties from light detection and ranging (LiDAR) and structure-from-motion (SfM) photogrammetry. Analytical methods for extracting usable information currently lag the ability to collect such 3-D data. The chapters that follow focus on this limitation, blending interests in machine learning and data science, remote sensing, wildland fuels (vegetation), and wildfire. In Chapter 2, forest canopy structure is characterized from multiple landscapes using LiDAR data and a novel data-driven framework to identify and compare structural classes. Motivations for this chapter include the desire to systematically assess forest structure from landscape to global scales and increase the utility of data collected by government agencies for landscape restoration planning. Chapter 3 endeavors to link 3-D canopy fuel attributes to conventional optical remote sensing data with the goal of extending the reach of laser measurements to the entire western US while exploring geographic differences in LiDAR-Landsat relationships. The predictive models developed and the resulting datasets increase accuracy and spatial variation over currently used canopy fuel datasets. Chapters 4 and 5 characterize fire and fuel variability using unmanned aerial systems (UAS) and quantify trends in the influence of fuel patterns on fire processes.

University of Montana