
    Knowledge Extraction from Work Instructions through Text Processing and Analysis

    The objective of this thesis is to design, develop and implement an automated approach to support the processing of historical assembly data, to extract useful knowledge about assembly instructions and time studies and to facilitate the development of decision support systems for a large automotive original equipment manufacturer (OEM). At a conceptual level, this research establishes a sustainable and scalable framework for extracting knowledge from big data using techniques from Natural Language Processing (NLP) and Machine Learning (ML). Process sheets are text documents that contain detailed instructions to assemble a portion of the vehicle, the specification of parts and tools to be used, and a time study. To maintain consistency in the authorship process, assembly process sheets are required to be written in a standardized structure using controlled language. To realize this goal, 567 work instructions from 236 process sheets are parsed with the Stanford parser, using the Natural Language Toolkit (NLTK) as a platform, and a standard vocabulary consisting of 31 verbs is formed. Time study is the process of estimating assembly times from a predetermined motion time system, known as MTM, based on factors such as the activity performed by the associate, the difficulty of assembly, the parts and tools used, and the distance covered. The MTM comprises a set of tables, constructed through statistical analysis and best suited for batch production. These MTM tables are suggested based on the activity described in the work instruction text. The process of performing time studies for the process sheets is time-consuming, labor-intensive and error-prone. A set of IF-THEN rules is developed, by analyzing 1019 time study steps from 236 process sheets, that guides the user to an appropriate MTM table. These rules are computationally generated by a decision tree algorithm, J48, in WEKA, a machine learning software package.
A decision support tool is developed to enable testing of the MTM mapping rules. The tool demonstrates how NLP techniques can be used to read work instructions authored in free-form text and provides MTM table suggestions to the planner. The accuracy of the MTM mapping rules is found to be 84.6%.
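As a toy illustration of the kind of IF-THEN mapping described above: a rule checks the action verb (controlled language puts it first) and whether a tool is mentioned, then suggests a table. The verb vocabulary, tool keywords and table names here are invented, not taken from the thesis, which derives its rules with J48 in WEKA.

```python
# Hypothetical sketch of rule-based MTM table suggestion from a work
# instruction. All rule conditions and table names are invented.

# IF-THEN rules of the kind a decision tree such as J48 might produce:
# IF the action verb is X AND a tool is mentioned THEN suggest table Y.
MTM_RULES = [
    (lambda verb, tool: verb == "install" and tool, "MTM-Tool-Assisted-Assembly"),
    (lambda verb, tool: verb == "install", "MTM-Manual-Assembly"),
    (lambda verb, tool: verb == "inspect", "MTM-Visual-Inspection"),
]

def suggest_mtm_table(instruction: str) -> str:
    """Suggest an MTM table for a free-form work instruction."""
    tokens = instruction.lower().split()
    verb = tokens[0] if tokens else ""  # controlled language: verb comes first
    uses_tool = any(t in tokens for t in ("wrench", "gun", "driver"))
    for condition, table in MTM_RULES:
        if condition(verb, uses_tool):
            return table
    return "MTM-General"

print(suggest_mtm_table("Install bracket using torque wrench"))
# prints MTM-Tool-Assisted-Assembly
```

Rule order matters: the more specific tool-assisted rule is tried before the general manual-assembly rule, mirroring how a decision tree branches on the most informative attribute first.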

    Meta-Learning and the Full Model Selection Problem

    When working as a data analyst, one of my daily tasks is to select appropriate tools from a set of existing data analysis techniques in my toolbox, including data preprocessing, outlier detection, feature selection, learning algorithms and evaluation techniques, for a given data project. This was indeed an enjoyable job at the beginning, because to me finding patterns and valuable information in data is always fun. Things become tricky when several projects need to be done in a relatively short time. Naturally, as a computer science graduate, I started to ask myself, "What can be automated here?", because, intuitively, part of my work is more or less a loop that can be programmed. Literally, the loop is "choose, run, test and choose again... until some criterion/goal is met". In other words, I use my experience and knowledge of machine learning and data mining to guide and speed up the process of selecting and applying techniques in order to build a relatively good predictive model for a given dataset for some purpose. So the following questions arise: "Is it possible to design and implement a system that helps a data analyst to choose from a set of data mining tools? Or at least one that provides useful recommendations about tools and potentially saves some time for a human analyst?" To answer these questions, I decided to undertake a long-term study on this topic: to think, define, research and simulate this problem before coding my dream system. This thesis presents research results, including new methods, algorithms, and theoretical and empirical analysis, from two directions, both of which propose systematic and efficient solutions to the questions above with different resource requirements: the meta-learning-based algorithm/parameter ranking approach and the meta-heuristic search-based full model selection approach.
Some of the results have been published in research papers; thus, this thesis also serves as a coherent collection of results in a single volume.
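The "choose, run, test and choose again" loop described above can be sketched in a few lines. The candidate models and the toy dataset below are invented stand-ins, not the thesis's actual search space of preprocessing steps, learners and parameters.

```python
# Toy sketch of the model-selection loop: try each candidate, score it
# on held data, keep the best. Candidates here are simple stand-ins.
from statistics import mean

data = [(x, 2 * x + 1) for x in range(10)]  # toy dataset: y = 2x + 1

candidates = {
    "constant": lambda x: 10.0,
    "linear": lambda x: 2 * x,
    "affine": lambda x: 2 * x + 1,
}

def score(model):
    """Mean absolute error over the toy dataset; lower is better."""
    return mean(abs(model(x) - y) for x, y in data)

# choose, run, test ... until every candidate has been tried
best = min(candidates, key=lambda name: score(candidates[name]))
print(best)  # affine
```

A real full-model-selection system searches a far larger combinatorial space (preprocessing × features × algorithm × parameters), which is why the thesis turns to meta-learning and meta-heuristic search rather than exhaustive enumeration.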

    Predicting Success of University Applicants Based on Subjects’ Preferences as an Extra Tool for Admission Considerations: A Predictive Analytics Approach

    Project Work presented as the partial requirement for obtaining a Master's degree in Information Management, specialization in Knowledge Management and Business Intelligence. This study uses a dataset of student performance indicators and psychological patterns associated with each individual to examine the predictive efficiency of psychological traits on academic results, more specifically grade point average (GPA). We propose building a classification machine learning model that predicts GPA performance, dividing the students into top and bottom performers. Several features were used in the modelling, namely the student's previous performance, such as GPA; course progression (how closely the student's master's programme relates to their previous academic courses); and personality traits obtained by surveying 319 students and recent graduates with a quiz developed by Association Better Future based on the RIASEC model for type theory of personality. It is widely accepted that psychological characteristics can impact student churn and performance (Costa and McCrae, 1992). Furthermore, numerous papers have found that GPA can be predicted by multiple factors, including past performance, intelligence quotient (IQ), demographic background and previous area of studies, but, to increase model accuracy, psychological factors are recommended for future work (Abele and Spurk, 2009). Whilst past performance and, to a lesser extent, IQ are currently evaluated in university admissions, psychological traits are yet to have a place in selecting the best candidates. In this study we propose that, although IQ and past performance are good indicators of student performance, the predictive power of psychological traits, when combined with these classical indicators, increases the predictive accuracy of the machine learning model.
With this in mind, we used the performance of past and current university students, measured in GPA, analysed it against the collected psychological indicators and developed multiple machine learning models to predict student GPA based on those indicators. These were divided into three groups: psychological traits only, GPA and age only, and a combination of both. Four types of models were used: neural networks, Support Vector Machines (SVM), decision forests and decision trees. Decision forests, for the problem at hand, consistently outperformed neural networks, SVM and decision trees in both accuracy and Area Under the Curve (AUC), the curve being the Receiver Operating Characteristic (ROC). From the database with 176 entries, comparing the models created with the GPA- and age-based dataset against the ones based on the full dataset that includes psychological variables, decision forests were the model with the best fit to the training data and the highest AUC against the validation set, with values of 0.717 and 0.790, respectively. The models based on the full dataset, including psychological variables, consistently outperformed the models based solely on the classical GPA-predicting metrics. We further propose and discuss that the model can be used as an extra indicator in the admission process.
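The AUC figures reported above can be read through the Mann-Whitney formulation of AUC: the probability that a randomly chosen positive example is scored above a randomly chosen negative one. A minimal sketch, with invented labels and scores (not the study's data), comparing a "classical features" scorer against a "full features" scorer:

```python
# AUC via the Mann-Whitney formulation. Labels/scores are invented.
def roc_auc(labels, scores):
    """Fraction of (positive, negative) pairs ranked correctly, ties at 0.5."""
    pos = [s for l, s in zip(labels, scores) if l == 1]
    neg = [s for l, s in zip(labels, scores) if l == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

labels = [1, 1, 1, 0, 0, 0]                     # 1 = top performer
scores_classic = [0.9, 0.4, 0.6, 0.5, 0.3, 0.7]  # e.g. GPA/age features only
scores_full = [0.9, 0.7, 0.8, 0.4, 0.3, 0.75]    # e.g. plus psychological traits
print(round(roc_auc(labels, scores_classic), 3))  # 0.667
print(round(roc_auc(labels, scores_full), 3))     # 0.889
```

An AUC of 0.5 corresponds to random ranking and 1.0 to perfect separation, which is what makes the study's 0.717 vs. 0.790 comparison between feature sets meaningful.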

    Modelling atmospheric ozone concentration using machine learning algorithms

    Air quality monitoring is one of several important tasks carried out in the area of environmental science and engineering. Accordingly, the development of air quality predictive models can be very useful, as such models can provide early warnings when pollution levels rise to unsatisfactory levels. The literature review conducted within the research context of this thesis revealed that only a limited number of widely used machine learning algorithms have been employed for modelling the concentrations of atmospheric gases such as ozone, nitrogen oxides etc. Despite this observation, the research and technology area of machine learning has recently advanced significantly with the introduction of ensemble learning techniques, convolutional and deep neural networks etc. Given these observations, the research presented in this thesis aims to investigate the effective use of ensemble learning algorithms, with optimised algorithmic settings and the appropriate choice of base layer algorithms, to create effective and efficient models for the prediction and forecasting of, specifically, ground-level ozone (O3). Three main research contributions have been made by this thesis in the application area of modelling O3 concentrations. As the first contribution, the performance of several ensemble learning (homogeneous and heterogeneous) algorithms was investigated and compared with popular and widely used single base learning algorithms. The results showed the impressive improvement in prediction performance obtainable by using meta-learning (Bagging, Stacking and Voting) algorithms. The performances of the three investigated meta-learning algorithms were similar in nature, giving an average correlation coefficient of 0.91 in prediction accuracy.
Thus, as a second contribution, feature selection and parameter-based optimisation were carried out in conjunction with the application of Multilayer Perceptron, Support Vector Machine, Random Forest and Bagging based learning techniques, providing significant improvements in prediction accuracy. The third contribution of the research presented in this thesis comprises the univariate and multivariate forecasting of ozone concentrations based on optimised ensemble learning algorithms. The results reported surpass the accuracy levels reported for forecasting ozone concentration variations with widely used single base learning algorithms. In summary, the research conducted within this thesis bridges an existing research gap in big data analytics related to environmental pollution modelling, prediction and forecasting, where present research is largely limited to standard learning algorithms such as Artificial Neural Networks and Support Vector Machines, often available within popular commercial software packages.
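As a rough illustration of the Voting idea referred to above: for a regression target such as an O3 concentration, a "vote" simply averages the base learners' outputs. The base models below are toy linear functions standing in for the thesis's tuned Multilayer Perceptron, SVM and Random Forest learners.

```python
# Regression "voting": average the predictions of several base learners.
# The three base models are invented stand-ins for trained learners.
def mlp_like(x):    return 2 * x + 1   # stand-in for a Multilayer Perceptron
def svm_like(x):    return 2 * x - 1   # stand-in for an SVM regressor
def forest_like(x): return 2 * x       # stand-in for a Random Forest

def voting_predict(x):
    """Ensemble prediction as the mean of the base learners' outputs."""
    preds = [mlp_like(x), svm_like(x), forest_like(x)]
    return sum(preds) / len(preds)

print(voting_predict(10))  # (21 + 19 + 20) / 3 = 20.0
```

Averaging cancels the individual learners' opposite biases here, which is the intuition behind why heterogeneous ensembles can beat any single base learner.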

    Modelling affect for horror soundscapes

    The feeling of horror within movies or games relies on the audience’s perception of a tense atmosphere — often achieved through sound accompanying the on-screen drama — guiding their emotional experience throughout the scene or game-play sequence. These progressions are often crafted through a priori knowledge of how a scene or game-play sequence will play out, and the intended emotional patterns a game director wants to transmit. The appropriate design of sound becomes even more challenging once the scenery and the general context are autonomously generated by an algorithm. Towards realizing sound-based affective interaction in games, this paper explores the creation of computational models capable of ranking short audio pieces based on crowdsourced annotations of tension, arousal and valence. Affect models are trained via preference learning on over a thousand annotations with the use of support vector machines, whose inputs are low-level features extracted from the audio assets of a comprehensive sound library. The models constructed in this work are able to predict the tension, arousal and valence elicited by sound with accuracies of approximately 65%, 66% and 72%, respectively.
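The preference-learning setup described above, learning to rank items from pairwise "a is tenser than b" comparisons, can be sketched with a simple perceptron on feature differences. The paper uses support vector machines on real audio features; the clips, feature values and annotations below are invented.

```python
# Preference learning from pairwise comparisons via a perceptron on
# feature differences. All items, features and preferences are invented.
items = {"clip1": [1.0, 0.2], "clip2": [0.4, 0.9], "clip3": [0.1, 0.1]}
prefs = [("clip1", "clip3"), ("clip2", "clip3"), ("clip1", "clip2")]  # a > b

# Learn a weight vector w such that w . (x_a - x_b) > 0 for each preference.
w = [0.0, 0.0]
for _ in range(20):
    for a, b in prefs:
        diff = [xa - xb for xa, xb in zip(items[a], items[b])]
        if sum(wi * di for wi, di in zip(w, diff)) <= 0:
            w = [wi + di for wi, di in zip(w, diff)]  # perceptron update

def utility(name):
    """Learned scalar 'tension' score; higher ranks first."""
    return sum(wi * xi for wi, xi in zip(w, items[name]))

ranking = sorted(items, key=utility, reverse=True)
print(ranking)  # ['clip1', 'clip2', 'clip3']
```

Reducing ranking to classifying feature *differences* is the same trick that turns an ordinary SVM into a preference learner (RankSVM), so the sketch mirrors the structure, if not the learner, of the paper's approach.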

    Classification in sparse, high dimensional environments applied to distributed systems failure prediction

    Network failures are still one of the main causes of distributed systems’ lack of reliability. To overcome this problem we present an improvement over a failure prediction system based on Elastic Net Logistic Regression and the application of rare-event prediction techniques, able to work with sparse, high-dimensional datasets. Specifically, we prove its stability, fine-tune its hyperparameters and improve its industrial utility by showing that, with a slight change in dataset creation, it can also predict the location of a failure, a key asset when taking a proactive approach to failure management.