
    Knowledge Extraction from Work Instructions through Text Processing and Analysis

    The objective of this thesis is to design, develop and implement an automated approach to support the processing of historical assembly data, to extract useful knowledge about assembly instructions and time studies and to facilitate the development of decision support systems for a large automotive original equipment manufacturer (OEM). At a conceptual level, this research establishes a sustainable and scalable framework for extracting knowledge from big data using techniques from Natural Language Processing (NLP) and Machine Learning (ML). Process sheets are text documents that contain detailed instructions to assemble a portion of the vehicle, the specification of parts and tools to be used, and a time study. To maintain consistency in the authorship process, assembly process sheets are required to be written in a standardized structure using controlled language. To realize this goal, 567 work instructions from 236 process sheets are parsed with the Stanford parser, using the Natural Language Toolkit (NLTK) as a platform, and a standard vocabulary consisting of 31 verbs is formed. Time study is the process of estimating assembly times from a predetermined motion time system, known as MTM, based on factors such as the activity performed by the associate, the difficulty of assembly, the parts and tools used, and the distance covered. The MTM comprises a set of tables, constructed through statistical analysis and best suited for batch production. These MTM tables are suggested based on the activity described in the work instruction text. The process of performing time studies for the process sheets is time-consuming, labor-intensive and error-prone. A set of IF-THEN rules is developed, by analyzing 1019 time study steps from 236 process sheets, that guides the user to an appropriate MTM table. These rules are computationally generated by a decision tree algorithm, J48, in WEKA, a machine learning software package.
A decision support tool is developed to enable testing of the MTM mapping rules. The tool demonstrates how NLP techniques can be used to read work instructions authored in free-form text and provides MTM table suggestions to the planner. The accuracy of the MTM mapping rules is found to be 84.6%.
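As a toy illustration of the kind of IF-THEN mapping described above: a rule checks the action verb (controlled language puts it first) and whether a tool is mentioned, then suggests a table. The verb vocabulary, tool keywords and table names here are invented, not taken from the thesis, which derives its rules with J48 in WEKA.

```python
# Hypothetical sketch of rule-based MTM table suggestion from a work
# instruction. All rule conditions and table names are invented.

# IF-THEN rules of the kind a decision tree such as J48 might produce:
# IF the action verb is X AND a tool is mentioned THEN suggest table Y.
MTM_RULES = [
    (lambda verb, tool: verb == "install" and tool, "MTM-Tool-Assisted-Assembly"),
    (lambda verb, tool: verb == "install", "MTM-Manual-Assembly"),
    (lambda verb, tool: verb == "inspect", "MTM-Visual-Inspection"),
]

def suggest_mtm_table(instruction: str) -> str:
    """Suggest an MTM table for a free-form work instruction."""
    tokens = instruction.lower().split()
    verb = tokens[0] if tokens else ""  # controlled language: verb comes first
    uses_tool = any(t in tokens for t in ("wrench", "gun", "driver"))
    for condition, table in MTM_RULES:
        if condition(verb, uses_tool):
            return table
    return "MTM-General"

print(suggest_mtm_table("Install bracket using torque wrench"))
# prints MTM-Tool-Assisted-Assembly
```

Rule order matters: the more specific tool-assisted rule is tried before the general manual-assembly rule, mirroring how a decision tree branches on the most informative attribute first.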

    Meta-Learning and the Full Model Selection Problem

    When working as a data analyst, one of my daily tasks is to select appropriate tools from a set of existing data analysis techniques in my toolbox, including data preprocessing, outlier detection, feature selection, learning algorithms and evaluation techniques, for a given data project. This was indeed an enjoyable job at the beginning, because to me finding patterns and valuable information in data is always fun. Things become tricky when several projects need to be done in a relatively short time. Naturally, as a computer science graduate, I started to ask myself, "What can be automated here?", because, intuitively, part of my work is more or less a loop that can be programmed. Literally, the loop is "choose, run, test and choose again... until some criterion/goal is met". In other words, I use my experience and knowledge of machine learning and data mining to guide and speed up the process of selecting and applying techniques in order to build a relatively good predictive model for a given dataset for some purpose. So the following questions arise: "Is it possible to design and implement a system that helps a data analyst to choose from a set of data mining tools? Or at least one that provides useful recommendations about tools and potentially saves some time for a human analyst?" To answer these questions, I decided to undertake a long-term study on this topic: to think, define, research and simulate this problem before coding my dream system. This thesis presents research results, including new methods, algorithms, and theoretical and empirical analysis, from two directions, both of which propose systematic and efficient solutions to the questions above with different resource requirements: the meta-learning-based algorithm/parameter ranking approach and the meta-heuristic search-based full model selection approach.
Some of the results have been published in research papers; thus, this thesis also serves as a coherent collection of results in a single volume.
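The "choose, run, test and choose again" loop described above can be sketched in a few lines. The candidate models and the toy dataset below are invented stand-ins, not the thesis's actual search space of preprocessing steps, learners and parameters.

```python
# Toy sketch of the model-selection loop: try each candidate, score it
# on held data, keep the best. Candidates here are simple stand-ins.
from statistics import mean

data = [(x, 2 * x + 1) for x in range(10)]  # toy dataset: y = 2x + 1

candidates = {
    "constant": lambda x: 10.0,
    "linear": lambda x: 2 * x,
    "affine": lambda x: 2 * x + 1,
}

def score(model):
    """Mean absolute error over the toy dataset; lower is better."""
    return mean(abs(model(x) - y) for x, y in data)

# choose, run, test ... until every candidate has been tried
best = min(candidates, key=lambda name: score(candidates[name]))
print(best)  # affine
```

A real full-model-selection system searches a far larger combinatorial space (preprocessing × features × algorithm × parameters), which is why the thesis turns to meta-learning and meta-heuristic search rather than exhaustive enumeration.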

    Predicting Success of University Applicants Based on Subjects’ Preferences as an Extra Tool for Admission Considerations: A Predictive Analytics Approach

    Project Work presented as the partial requirement for obtaining a Master's degree in Information Management, specialization in Knowledge Management and Business Intelligence. This study uses a dataset of student performance indicators and psychological patterns associated with each individual to examine the predictive efficiency of psychological traits on academic results, more specifically grade point average (GPA). We propose building a classification machine learning model that predicts GPA performance, dividing the students into top and bottom performers. Several features were used in the modelling, namely the student's previous performance, such as GPA; course progression (how closely the student's master's programme relates to their previous academic courses); and personality traits obtained by surveying 319 students and recent graduates with a quiz developed by Association Better Future based on the RIASEC model for type theory of personality. It is widely accepted that psychological characteristics can impact student churn and performance (Costa and McCrae, 1992). Furthermore, numerous papers have found that GPA can be predicted by multiple factors, including past performance, intelligence quotient (IQ), demographic background and previous area of studies, but, to increase model accuracy, psychological factors are recommended for future work (Abele and Spurk, 2009). Whilst past performance and, to a lesser extent, IQ are currently evaluated in university admissions, psychological traits are yet to have a place in selecting the best candidates. In this study we propose that, although IQ and past performance are good indicators of student performance, the predictive power of psychological traits, when combined with these classical indicators, increases the predictive accuracy of the machine learning model.
With this in mind, we used the performance of past and current university students, measured in GPA, analysed it against the collected psychological indicators and developed multiple machine learning models to predict student GPA based on those indicators. These were divided into three groups: psychological traits only, GPA and age only, and a combination of both. Four types of models were used: neural networks, Support Vector Machines (SVM), decision forests and decision trees. Decision forests, for the problem at hand, consistently outperformed neural networks, SVM and decision trees in both accuracy and Area Under the Curve (AUC), the curve being the Receiver Operating Characteristic (ROC). From the database with 176 entries, comparing the models created with the GPA- and age-based dataset against the ones based on the full dataset that includes psychological variables, decision forests were the model with the best fit to the training data and the highest AUC against the validation set, with values of 0.717 and 0.790, respectively. The models based on the full dataset, including psychological variables, consistently outperformed the models based solely on the classical GPA-predicting metrics. We further propose and discuss that the model can be used as an extra indicator in the admission process.
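The AUC figures reported above can be read through the Mann-Whitney formulation of AUC: the probability that a randomly chosen positive example is scored above a randomly chosen negative one. A minimal sketch, with invented labels and scores (not the study's data), comparing a "classical features" scorer against a "full features" scorer:

```python
# AUC via the Mann-Whitney formulation. Labels/scores are invented.
def roc_auc(labels, scores):
    """Fraction of (positive, negative) pairs ranked correctly, ties at 0.5."""
    pos = [s for l, s in zip(labels, scores) if l == 1]
    neg = [s for l, s in zip(labels, scores) if l == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

labels = [1, 1, 1, 0, 0, 0]                     # 1 = top performer
scores_classic = [0.9, 0.4, 0.6, 0.5, 0.3, 0.7]  # e.g. GPA/age features only
scores_full = [0.9, 0.7, 0.8, 0.4, 0.3, 0.75]    # e.g. plus psychological traits
print(round(roc_auc(labels, scores_classic), 3))  # 0.667
print(round(roc_auc(labels, scores_full), 3))     # 0.889
```

An AUC of 0.5 corresponds to random ranking and 1.0 to perfect separation, which is what makes the study's 0.717 vs. 0.790 comparison between feature sets meaningful.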

    Modelling atmospheric ozone concentration using machine learning algorithms

    Air quality monitoring is one of several important tasks carried out in the area of environmental science and engineering. Accordingly, the development of air quality predictive models can be very useful, as such models can provide early warnings when pollution levels rise to unsatisfactory levels. The literature review conducted within the research context of this thesis revealed that only a limited number of widely used machine learning algorithms have been employed for modelling the concentrations of atmospheric gases such as ozone, nitrogen oxides etc. Despite this observation, the research and technology area of machine learning has recently advanced significantly with the introduction of ensemble learning techniques, convolutional and deep neural networks etc. Given these observations, the research presented in this thesis aims to investigate the effective use of ensemble learning algorithms, with optimised algorithmic settings and the appropriate choice of base layer algorithms, to create effective and efficient models for the prediction and forecasting of, specifically, ground-level ozone (O3). Three main research contributions have been made by this thesis in the application area of modelling O3 concentrations. As the first contribution, the performance of several ensemble learning (homogeneous and heterogeneous) algorithms was investigated and compared with popular and widely used single base learning algorithms. The results showed the impressive improvement in prediction performance obtainable by using meta-learning (Bagging, Stacking and Voting) algorithms. The performances of the three investigated meta-learning algorithms were similar in nature, giving an average correlation coefficient of 0.91 in prediction accuracy.
Thus, as a second contribution, feature selection and parameter-based optimisation were carried out in conjunction with the application of Multilayer Perceptron, Support Vector Machine, Random Forest and Bagging based learning techniques, providing significant improvements in prediction accuracy. The third contribution of the research presented in this thesis comprises the univariate and multivariate forecasting of ozone concentrations based on optimised ensemble learning algorithms. The results reported surpass the accuracy levels reported for forecasting ozone concentration variations with widely used single base learning algorithms. In summary, the research conducted within this thesis bridges an existing research gap in big data analytics related to environmental pollution modelling, prediction and forecasting, where present research is largely limited to standard learning algorithms such as Artificial Neural Networks and Support Vector Machines, often available within popular commercial software packages.
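As a rough illustration of the Voting idea referred to above: for a regression target such as an O3 concentration, a "vote" simply averages the base learners' outputs. The base models below are toy linear functions standing in for the thesis's tuned Multilayer Perceptron, SVM and Random Forest learners.

```python
# Regression "voting": average the predictions of several base learners.
# The three base models are invented stand-ins for trained learners.
def mlp_like(x):    return 2 * x + 1   # stand-in for a Multilayer Perceptron
def svm_like(x):    return 2 * x - 1   # stand-in for an SVM regressor
def forest_like(x): return 2 * x       # stand-in for a Random Forest

def voting_predict(x):
    """Ensemble prediction as the mean of the base learners' outputs."""
    preds = [mlp_like(x), svm_like(x), forest_like(x)]
    return sum(preds) / len(preds)

print(voting_predict(10))  # (21 + 19 + 20) / 3 = 20.0
```

Averaging cancels the individual learners' opposite biases here, which is the intuition behind why heterogeneous ensembles can beat any single base learner.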

    Modelling affect for horror soundscapes

    The feeling of horror within movies or games relies on the audience’s perception of a tense atmosphere — often achieved through sound accompanying the on-screen drama — guiding their emotional experience throughout the scene or game-play sequence. These progressions are often crafted through a priori knowledge of how a scene or game-play sequence will play out, and the intended emotional patterns a game director wants to transmit. The appropriate design of sound becomes even more challenging once the scenery and the general context are autonomously generated by an algorithm. Towards realizing sound-based affective interaction in games, this paper explores the creation of computational models capable of ranking short audio pieces based on crowdsourced annotations of tension, arousal and valence. Affect models are trained via preference learning on over a thousand annotations with the use of support vector machines, whose inputs are low-level features extracted from the audio assets of a comprehensive sound library. The models constructed in this work are able to predict the tension, arousal and valence elicited by sound with accuracies of approximately 65%, 66% and 72%, respectively.
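The preference-learning setup described above, learning to rank items from pairwise "a is tenser than b" comparisons, can be sketched with a simple perceptron on feature differences. The paper uses support vector machines on real audio features; the clips, feature values and annotations below are invented.

```python
# Preference learning from pairwise comparisons via a perceptron on
# feature differences. All items, features and preferences are invented.
items = {"clip1": [1.0, 0.2], "clip2": [0.4, 0.9], "clip3": [0.1, 0.1]}
prefs = [("clip1", "clip3"), ("clip2", "clip3"), ("clip1", "clip2")]  # a > b

# Learn a weight vector w such that w . (x_a - x_b) > 0 for each preference.
w = [0.0, 0.0]
for _ in range(20):
    for a, b in prefs:
        diff = [xa - xb for xa, xb in zip(items[a], items[b])]
        if sum(wi * di for wi, di in zip(w, diff)) <= 0:
            w = [wi + di for wi, di in zip(w, diff)]  # perceptron update

def utility(name):
    """Learned scalar 'tension' score; higher ranks first."""
    return sum(wi * xi for wi, xi in zip(w, items[name]))

ranking = sorted(items, key=utility, reverse=True)
print(ranking)  # ['clip1', 'clip2', 'clip3']
```

Reducing ranking to classifying feature *differences* is the same trick that turns an ordinary SVM into a preference learner (RankSVM), so the sketch mirrors the structure, if not the learner, of the paper's approach.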

    Classification in sparse, high dimensional environments applied to distributed systems failure prediction

    Network failures are still one of the main causes of distributed systems’ lack of reliability. To overcome this problem we present an improvement over a failure prediction system based on Elastic Net Logistic Regression and the application of rare-event prediction techniques, able to work with sparse, high-dimensional datasets. Specifically, we prove its stability, fine-tune its hyperparameters and improve its industrial utility by showing that, with a slight change in dataset creation, it can also predict the location of a failure, a key asset when taking a proactive approach to failure management.