
    Chemical laboratories 4.0: A two-stage machine learning system for predicting the arrival of samples

    This paper presents a two-stage Machine Learning (ML) model to predict the arrival time of In-Process Control (IPC) samples at the quality testing laboratories of a chemical company. The model was developed using three iterations of the CRoss-Industry Standard Process for Data Mining (CRISP-DM) methodology, each focusing on a different regression approach. To reduce the ML analyst effort, an Automated Machine Learning (AutoML) approach was adopted during the modeling stage of CRISP-DM, configured to select the best among six distinct state-of-the-art regression algorithms. Using recent real-world data, the three main regression approaches were compared, showing that the proposed two-stage ML model is competitive and provides useful predictions to support laboratory management decisions (e.g., preparation of testing instruments). In particular, the proposed method accurately predicts 70% of the examples under a tolerance of 4 time units. This work has been supported by FCT – Fundação para a Ciência e a Tecnologia within the R&D Units Project Scope: UIDB/00319/2020. The authors also wish to thank the chemical company staff involved with this project for providing the data and the valuable domain feedback.
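
    As a concrete illustration of the reported metric, the following minimal Python sketch computes tolerance-based accuracy, i.e., the fraction of samples whose predicted arrival time falls within a fixed tolerance of the true one. The function name and example values are hypothetical, not taken from the paper.

    ```python
    import numpy as np

    def tolerance_accuracy(y_true, y_pred, tol=4.0):
        """Fraction of predictions within +/- tol time units of the true value
        (the kind of tolerance-based accuracy the abstract reports)."""
        y_true = np.asarray(y_true, dtype=float)
        y_pred = np.asarray(y_pred, dtype=float)
        return float(np.mean(np.abs(y_true - y_pred) <= tol))

    # Illustrative values: 4 of 5 predictions fall within the 4-unit tolerance.
    print(tolerance_accuracy([10, 12, 20, 8, 30], [11, 15, 18, 9, 37]))  # 0.8
    ```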

    Rule-Based Forecasting: Using Judgment in Time-Series Extrapolation

    Rule-Based Forecasting (RBF) is an expert system that uses judgment to develop and apply rules for combining extrapolations. The judgment comes from two sources: forecasting expertise and domain knowledge. Forecasting expertise is based on more than a half century of research. Domain knowledge is obtained in a structured way; one example of domain knowledge is managers' expectations about trends, which we call “causal forces.” Time series are described in terms of 28 conditions, which are used to assign weights to extrapolations. Empirical results on multiple sets of time series show that RBF produces more accurate forecasts than those from traditional extrapolation methods or equal-weights combined extrapolations. RBF is most useful when it is based on good domain knowledge, the domain knowledge is important, the series is well behaved (such that patterns can be identified), there is a strong trend in the data, and the forecast horizon is long. Under ideal conditions, the errors for RBF's forecasts were one-third less than those for equal-weights combining. When these conditions are absent, RBF neither improves nor harms forecast accuracy. Some of RBF's rules can be used with traditional extrapolation procedures. In a series of studies, rules based on causal forces improved the selection of forecasting methods, the structuring of time series, and the assessment of prediction intervals.
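
    The abstract describes adjusting the weights of competing extrapolations according to series conditions and causal forces. The Python sketch below illustrates that idea with two made-up rules; the condition names, weight adjustments, and method names are hypothetical and do not reproduce RBF's actual 28 conditions.

    ```python
    def combine_extrapolations(forecasts, conditions):
        """forecasts: method name -> point forecast.
        conditions: observed features of the series (illustrative names)."""
        weights = {name: 1.0 for name in forecasts}  # start from equal weights
        # Hypothetical rule: causal forces supporting a strong trend shift
        # weight toward trend-following extrapolations.
        if conditions.get("strong_trend") and conditions.get("causal_forces_support_trend"):
            if "linear_trend" in weights:
                weights["linear_trend"] *= 2.0
        # Hypothetical rule: a noisy series shifts weight toward the naive forecast.
        if conditions.get("high_noise") and "random_walk" in weights:
            weights["random_walk"] *= 1.5
        total = sum(weights.values())
        return sum(w * forecasts[name] for name, w in weights.items()) / total

    print(combine_extrapolations(
        {"random_walk": 100.0, "linear_trend": 112.0, "exp_smoothing": 105.0},
        {"strong_trend": True, "causal_forces_support_trend": True},
    ))  # 107.25, pulled above the 105.67 equal-weights mean by the trend rule
    ```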

    ACL injuries identifiable for pre-participation imagiological analysis: Risk factors

    Identification of pre-participation risk factors for noncontact anterior cruciate ligament (ACL) injuries has been attracting a great deal of interest in the sports medicine and traumatology communities. Appropriate methods that enable predicting which patients could benefit from preventive strategies are most welcome. This would enable athlete-specific training and conditioning or tailored equipment in order to develop appropriate strategies to reduce the incidence of injury. To accomplish these goals, the ideal system should be able to assess both anatomic and functional features. Complementarily, the screening method must be cost-effective and suited for widespread application. An anatomic study protocol requiring only standard X-rays could meet some of these demands. Dynamic MRI/CT evaluation and electronically assisted pivot-shift evaluation can be powerful tools providing complementary information. These upcoming insights, when validated and properly combined, promise to change pre-participation knee examination in the near future. Herein, different methods (validated or under research) that aim to improve the capacity to identify persons/athletes at higher risk for ACL injury are overviewed.

    Standards and Practices for Forecasting

    One hundred and thirty-nine principles are used to summarize knowledge about forecasting. They cover formulating a problem, obtaining information about it, selecting and applying methods, evaluating methods, and using forecasts. Each principle is described along with its purpose, the conditions under which it is relevant, and the strength and sources of evidence. A checklist of principles is provided to assist in auditing the forecasting process. An audit can help one find ways to improve the forecasting process and to avoid legal liability for poor forecasting.

    Stakeholder Salience for Small Businesses: A Social Proximity Perspective

    This paper advances stakeholder salience theory from the viewpoint of small businesses. It is argued that the stakeholder salience process for small businesses is influenced by their local embeddedness, captured by the idea of social proximity, and characterised by multiple relationships that the owner-manager and stakeholders share beyond the business context. It is further stated that the ethics of care is a valuable ethical lens through which to understand social proximity in small businesses. The study contributes by conceptualising how the perceived social proximity between local stakeholders and small business owner-managers influences managerial considerations of the legitimacy, power and urgency of stakeholders and their claims. Specifically, the paradoxical nature of close relationships in the salience process is acknowledged and discussed.

    Predictive performance of international COVID-19 mortality forecasting models


    How to evaluate sentiment classifiers for Twitter time-ordered data?

    Social media are becoming an increasingly important source of information about the public mood regarding issues such as elections, Brexit, or the stock market. In this paper we focus on sentiment classification of Twitter data. Construction of sentiment classifiers is a standard text mining task, but here we address the question of how to properly evaluate them, as there is no settled way to do so. Sentiment classes are ordered and unbalanced, and Twitter produces a stream of time-ordered data. The problem we address concerns the procedures used to obtain reliable estimates of performance measures, and whether the temporal ordering of the training and test data matters. We collected a large set of 1.5 million tweets in 13 European languages. We created 138 sentiment models and out-of-sample datasets, which are used as a gold standard for evaluations. The corresponding 138 in-sample datasets are used to empirically compare six different estimation procedures: three variants of cross-validation, and three variants of sequential validation (where the test set always follows the training set). We find no significant difference between the best cross-validation and sequential validation. However, we observe that all cross-validation variants tend to overestimate the performance, while the sequential methods tend to underestimate it. Standard cross-validation with random selection of examples is significantly worse than blocked cross-validation, and should not be used to evaluate classifiers in time-ordered data scenarios.
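
    To make the compared estimation procedures concrete, the sketch below generates the three kinds of train/test splits on a time-ordered index: standard (random) cross-validation, blocked cross-validation, and sequential validation. The fold layouts are a minimal illustration under assumed parameters, not the paper's exact protocol.

    ```python
    import numpy as np

    def standard_cv_folds(n, k, seed=0):
        """Random k-fold CV: test examples are scattered in time, so training
        data can postdate test data (the overestimation risk discussed above)."""
        idx = np.random.default_rng(seed).permutation(n)
        return [(np.setdiff1d(idx, fold), fold) for fold in np.array_split(idx, k)]

    def blocked_cv_folds(n, k):
        """Blocked CV: each test fold is one contiguous block of time."""
        idx = np.arange(n)
        return [(np.setdiff1d(idx, fold), fold) for fold in np.array_split(idx, k)]

    def sequential_folds(n, k):
        """Sequential validation: the test block always follows the training data."""
        blocks = np.array_split(np.arange(n), k)
        return [(np.concatenate(blocks[:i]), blocks[i]) for i in range(1, k)]

    for train, test in sequential_folds(12, 4):
        print(train, "->", test)  # training data always precedes the test block
    ```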

    A Multivariate and Multi-step Ahead Machine Learning Approach to Traditional and Cryptocurrencies Volatility Forecasting

    Multivariate time series forecasting involves learning from historical multivariate information in order to predict the future values of several quantities of interest, accounting for the interdependencies among them. In finance, several of these quantities of interest (stock valuations, returns, volatility) have been shown to be mutually influencing, making their prediction a difficult task, especially when dealing with a high number of variables and multiple horizons in the future. Here we propose a machine learning framework, the DFML, based on the Dynamic Factor Model, to first perform dimensionality reduction and then a multi-step-ahead forecast of a reduced number of components. Finally, the components are transformed back into the high-dimensional space, providing the desired forecast. Our results, comparing the DFML with several state-of-the-art techniques from different domains (PLS, RNN, LSTM, DFM) on both traditional stock markets and the cryptocurrency market, and for different families of volatility proxies, show that the DFML outperforms the competing methods, especially for longer horizons. We conclude by explaining how we wish to further improve the performance of the framework, both in terms of accuracy and computational efficiency.
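
    The reduce-forecast-reconstruct pipeline the abstract describes can be sketched as follows, with PCA standing in for the Dynamic Factor Model and a per-factor linear autoregression standing in for the ML forecaster; all model choices and names here are illustrative assumptions, not the DFML implementation.

    ```python
    import numpy as np
    from sklearn.decomposition import PCA
    from sklearn.linear_model import LinearRegression

    def dfml_style_forecast(X, n_components=3, lags=5, horizon=10):
        """X: (T, N) multivariate series. Returns (horizon, N) forecasts."""
        pca = PCA(n_components=n_components)
        Z = pca.fit_transform(X)                # (T, k) reduced factor series
        Zf = np.empty((horizon, n_components))
        for j in range(n_components):           # forecast each factor with an AR(lags)
            z = Z[:, j]
            lagged = np.column_stack([z[i:len(z) - lags + i] for i in range(lags)])
            model = LinearRegression().fit(lagged, z[lags:])
            hist = list(z[-lags:])
            for h in range(horizon):            # recursive multi-step-ahead forecast
                Zf[h, j] = model.predict(np.array(hist[-lags:])[None, :])[0]
                hist.append(Zf[h, j])
        return pca.inverse_transform(Zf)        # map back to the original N series

    X = np.cumsum(np.random.default_rng(0).normal(size=(200, 8)), axis=0)
    print(dfml_style_forecast(X).shape)  # (10, 8): 10 steps ahead for 8 series
    ```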