
    On pruning and feature engineering in Random Forests.

    Random Forest (RF) is an ensemble classification technique developed by Leo Breiman over a decade ago. Compared with other ensemble techniques, it has demonstrated superior accuracy. Many researchers, however, believe there is still room to optimize RF further by improving its predictive performance. This explains the many extensions of RF, each employing different techniques and strategies to improve certain aspect(s) of RF. The main focus of this dissertation is to develop new extensions of RF using optimization techniques that, to the best of our knowledge, have never been used before to optimize RF: clustering, the local outlier factor, diversified weighted subspaces, and replicator dynamics. Applying these techniques to RF produced four extensions, which we have termed CLUB-DRF, LOFB-DRF, DSB-RF, and RDB-DRF respectively. Experimental studies on 15 real datasets showed favorable results, demonstrating the potential of the proposed methods. Performance-wise, CLUB-DRF ranked first in accuracy and classification speed, making it ideal for real-time applications and for machines/devices with limited memory and processing power.
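The clustering-based pruning idea behind CLUB-DRF can be illustrated with a small stand-alone sketch (this is not the authors' implementation): each tree is represented by its vector of predictions on a validation set, similar trees are grouped, and one representative per group is kept. All names, numbers, and the greedy grouping rule below are invented for illustration.

```python
import random

random.seed(0)

# Hypothetical setup: each "tree" is represented by its vector of class
# predictions on a held-out validation set (here, random stand-ins).
n_trees, n_val = 20, 30
preds = [[random.randint(0, 1) for _ in range(n_val)] for _ in range(n_trees)]

def disagreement(a, b):
    """Fraction of validation points on which two trees disagree."""
    return sum(x != y for x, y in zip(a, b)) / len(a)

# Greedy clustering: a tree joins an existing cluster when it is close
# (low disagreement) to that cluster's representative, else starts a new one.
threshold = 0.35
representatives = []
for p in preds:
    if all(disagreement(p, r) > threshold for r in representatives):
        representatives.append(p)

# The pruned ensemble keeps one representative per cluster.
print(f"pruned {n_trees} trees down to {len(representatives)}")
```

The pruned ensemble is both smaller (less memory, faster voting) and potentially more diverse, which is the intuition behind clustering-based pruning.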

    A Primer on Bayesian Neural Networks: Review and Debates

    Neural networks have achieved remarkable performance across various problem domains, but their widespread applicability is hindered by inherent limitations such as overconfidence in predictions, lack of interpretability, and vulnerability to adversarial attacks. To address these challenges, Bayesian neural networks (BNNs) have emerged as a compelling extension of conventional neural networks, integrating uncertainty estimation into their predictive capabilities. This comprehensive primer presents a systematic introduction to the fundamental concepts of neural networks and Bayesian inference, elucidating their synergistic integration for the development of BNNs. The target audience comprises statisticians with a potential background in Bayesian methods but lacking deep learning expertise, as well as machine learners proficient in deep neural networks but with limited exposure to Bayesian statistics. We provide an overview of commonly employed priors, examining their impact on model behavior and performance. Additionally, we delve into the practical considerations associated with training and inference in BNNs. Furthermore, we explore advanced topics within the realm of BNN research, acknowledging the existence of ongoing debates and controversies. By offering insights into cutting-edge developments, this primer not only equips researchers and practitioners with a solid foundation in BNNs, but also illuminates the potential applications of this dynamic field. As a valuable resource, it fosters an understanding of BNNs and their promising prospects, facilitating further advancements in the pursuit of knowledge and innovation. (Comment: 65 pages)
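The core BNN idea the primer describes, predicting by averaging over a posterior distribution of weights rather than using a single point estimate, can be sketched in miniature. The one-weight model and the posterior parameters below are invented for illustration and are not taken from the paper.

```python
import random
import statistics

random.seed(1)

# Hypothetical example: a one-weight "network" y = w * x with a Gaussian
# posterior over w (mean 2.0, std 0.3), standing in for the weight
# posterior a BNN would infer from data.
post_mean, post_std = 2.0, 0.3
x = 5.0

# Monte Carlo prediction: draw weights from the posterior and propagate
# each sample through the model, yielding a predictive distribution.
samples = [random.gauss(post_mean, post_std) * x for _ in range(10_000)]
pred_mean = statistics.fmean(samples)
pred_std = statistics.stdev(samples)

print(f"predictive mean ~ {pred_mean:.2f}, predictive std ~ {pred_std:.2f}")
```

The predictive standard deviation (here roughly 0.3 × 5 = 1.5) is the uncertainty estimate a point-estimate network cannot provide; it grows with the posterior spread and with how strongly the input amplifies the weight.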

    Ensemble missing data techniques for software effort prediction

    Constructing an accurate effort prediction model is a challenge in software engineering. The development and validation of models used for prediction tasks require good-quality data. Unfortunately, software engineering datasets tend to suffer from incompleteness, which can result in inaccurate decision making, project management, and implementation. Recently, machine learning algorithms have proven to be of great practical value in solving a variety of software engineering problems, including software prediction, through techniques such as ensemble (combining) classifiers. Research indicates that ensembles of individual classifiers lead to a significant improvement in classification performance by having the members vote for the most popular class. This paper proposes a method for improving the accuracy of software effort predictions produced by a decision tree learning algorithm, by generating an ensemble that uses two imputation methods as its elements. Benchmarking results on ten industrial datasets show that the proposed ensemble strategy has the potential to improve prediction accuracy compared to an individual imputation method, especially if multiple imputation is a component of the ensemble.
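The ensemble-of-imputations idea can be sketched as follows: impute the missing values in two different ways, train one learner per imputed dataset, and combine the learners' predictions by majority vote. The toy dataset and the threshold learner below are invented stand-ins for the decision-tree learner and the imputation methods used in the paper.

```python
import statistics

# Hypothetical data: one feature with missing entries (None) and a binary label.
rows = [(10.0, 0), (12.0, 0), (None, 0), (30.0, 1), (None, 1), (34.0, 1)]
observed = [x for x, _ in rows if x is not None]

# Two simple imputation methods serve as the ensemble's elements.
def impute(rows, fill):
    return [(fill if x is None else x, y) for x, y in rows]

variants = [impute(rows, statistics.fmean(observed)),   # mean imputation
            impute(rows, statistics.median(observed))]  # median imputation

# A trivial threshold "learner" trained on each imputed dataset.
def fit_threshold(data):
    xs0 = [x for x, y in data if y == 0]
    xs1 = [x for x, y in data if y == 1]
    return (max(xs0) + min(xs1)) / 2  # midpoint split between the classes

def predict(thresholds, x):
    votes = [int(x > t) for t in thresholds]  # each ensemble member votes
    return int(sum(votes) > len(votes) / 2)   # majority vote wins

thresholds = [fit_threshold(v) for v in variants]
print(predict(thresholds, 11.0), predict(thresholds, 33.0))
```

Each imputation method induces a slightly different training set and hence a slightly different learner; voting over them hedges against any single imputation choice being wrong.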

    Machine learning in nondestructive estimation of neutron-induced reactor pressure vessel embrittlement

    Several nuclear power plants in the European Union are approaching the ends of their originally planned lifetimes. Lifetime extensions are made to secure the supply of nuclear power in the coming decades. To ensure the safe long-term operation of a nuclear power plant, the neutron-induced embrittlement of the reactor pressure vessel (RPV) must be assessed periodically. The embrittlement of RPV steel alloys is determined by measuring the ductile-to-brittle transition temperature (DBTT) and upper-shelf energy (USE) of the material. Traditionally, a destructive Charpy impact test is used to determine the DBTT and USE. This thesis contributes to the NOMAD project, whose goal is to develop a tool that uses nondestructively measured parameters to estimate the DBTT and USE of RPV steel alloys. The NOMAD Database combines data measured using six nondestructive methods with destructively measured DBTT and USE data; several non-irradiated and irradiated samples made of four different steel alloys have been measured. As nondestructively measured parameters do not directly describe material embrittlement, their relationship with the DBTT and USE needs to be determined. A machine learning regression algorithm can be used to build a model that describes this relationship. In this thesis, six models are built using six different algorithms, and their use is studied in predicting the DBTT and USE from the nondestructively measured parameters in the NOMAD Database. The models estimate the embrittlement with sufficient accuracy: all models predict the DBTT and USE on unseen input data with mean absolute errors of approximately 20 °C and 10 J, respectively. Two of the models can also be used to evaluate the importance of the nondestructively measured parameters. In the future, machine learning algorithms could be used to build a tool that estimates the neutron-induced embrittlement of RPVs on site from nondestructively measured parameters.
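As a rough illustration of the regression-plus-MAE evaluation described above (not the NOMAD models or data), the sketch below fits a 1-nearest-neighbour regressor from a single hypothetical nondestructive parameter to DBTT and reports its mean absolute error; every number is invented.

```python
# Hypothetical (parameter, DBTT in degrees C) pairs; all values are invented
# stand-ins for one nondestructively measured parameter and its target.
train = [(0.10, -60.0), (0.25, -35.0), (0.40, -10.0), (0.55, 15.0)]
test = [(0.20, -42.0), (0.50, 8.0)]

def predict(x):
    # 1-nearest-neighbour regression: return the DBTT of the closest
    # training point along the measured-parameter axis.
    return min(train, key=lambda p: abs(p[0] - x))[1]

# Mean absolute error, the accuracy metric quoted in the abstract.
errors = [abs(predict(x) - y) for x, y in test]
mae = sum(errors) / len(errors)
print(f"MAE = {mae:.1f} degrees C")
```

In the thesis the inputs are six nondestructive measurement methods rather than one parameter, and the regressors are more capable than nearest-neighbour lookup, but the evaluation loop (predict on unseen samples, average the absolute errors) has this shape.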

    THE INTERPLAY BETWEEN PRIVACY AND FAIRNESS IN LEARNING AND DECISION MAKING PROBLEMS

    The availability of large datasets and computational resources has driven significant progress in Artificial Intelligence (AI) and, especially, Machine Learning (ML). These advances have rendered AI systems instrumental in many decision-making and policy operations involving individuals: they include assistance in legal decisions, lending, and hiring, as well as determinations of resources and benefits, all of which have profound social and economic impacts. While data-driven systems have been successful in an increasing number of tasks, the use of rich datasets, combined with the adoption of black-box algorithms, has sparked concerns about how these systems operate. How much information these systems leak about the individuals whose data is used as input, and how they handle biases and fairness issues, are two of these critical concerns. While some argue that privacy and fairness are in alignment, the majority instead believe they are two contrasting goals. This thesis first studies the interaction between privacy and fairness in machine learning and decision problems. It focuses on the scenario where fairness and privacy are at odds and investigates the factors that can explain such behavior. It then proposes effective and efficient mitigation solutions to improve fairness under privacy constraints. In the second part, it analyzes the connection between fairness and other machine learning concepts such as model compression and adversarial robustness. Finally, it introduces a novel privacy notion, along with an initial implementation, to protect users' privacy at inference time.
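One concrete way privacy and fairness can be at odds, often discussed in this literature, is that differentially private noise degrades statistics for small groups disproportionately. The sketch below is a generic Laplace-mechanism illustration, not the thesis's method; the group sizes and privacy parameters are invented.

```python
import math
import random

random.seed(2)

def laplace_noise(scale):
    """Sample Laplace(0, scale) via the inverse-CDF transform."""
    u = random.random() - 0.5
    return -scale * math.copysign(1.0, u) * math.log(1 - 2 * abs(u))

# The Laplace mechanism adds noise with the same absolute scale
# (sensitivity / epsilon) to every released count, so the *relative*
# error is far larger for a small group than for a large one.
epsilon, sensitivity = 0.5, 1.0                # invented privacy parameters
scale = sensitivity / epsilon
counts = {"majority": 10_000, "minority": 50}  # invented group sizes

rel = {}
for group, true_count in counts.items():
    # Average relative error of the noisy count over many trials.
    trials = [abs(laplace_noise(scale)) / true_count for _ in range(5_000)]
    rel[group] = sum(trials) / len(trials)
    print(f"{group}: mean relative error ~ {rel[group]:.4f}")
```

The same absolute noise that is negligible for the majority group dominates the minority group's statistic, so downstream decisions based on the noisy counts can become systematically less accurate for the smaller group.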