    Rails Quality Data Modelling via Machine Learning-Based Paradigms

    A Novel Business Process Prediction Model Using a Deep Learning Method

    The ability to proactively monitor business processes is a main competitive differentiator for firms. Process execution logs generated by process-aware information systems help to make process-specific predictions for enabling a proactive situational awareness. The goal of the proposed approach is to predict the next process event from the completed activities of the running process instance, based on the execution log data from previously completed process instances. By predicting process events, companies can initiate timely interventions to address undesired deviations from the desired workflow. The paper proposes a multi-stage deep learning approach that formulates the next event prediction problem as a classification problem. Following a feature pre-processing stage with n-grams and feature hashing, a deep learning model consisting of an unsupervised pre-training component with stacked autoencoders and a supervised fine-tuning component is applied. Experiments on a variety of business process log datasets show that the multi-stage deep learning approach provides promising results. The study also compared the results to existing deep recurrent neural networks and conventional classification approaches. Furthermore, the paper addresses the identification of suitable hyperparameters for the proposed approach, and the handling of the imbalanced nature of business process event datasets.
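The feature pre-processing stage named in this abstract (n-grams followed by feature hashing) can be illustrated with a minimal sketch. This is not the paper's implementation: the event names, the n-gram order `n=2`, the vector size `dim=16`, and the use of MD5 as the hash function are all assumptions chosen for illustration.

```python
import hashlib


def ngrams(events, n):
    """Return all contiguous n-grams of an event sequence as tuples."""
    return [tuple(events[i:i + n]) for i in range(len(events) - n + 1)]


def hash_features(events, n=2, dim=16):
    """Map the n-grams of a trace prefix to a fixed-length count vector.

    Each n-gram is hashed to one of `dim` buckets (the "hashing trick"),
    so the vector length does not depend on the size of the event vocabulary.
    """
    vec = [0] * dim
    for gram in ngrams(events, n):
        digest = hashlib.md5("|".join(gram).encode("utf-8")).digest()
        index = int.from_bytes(digest[:4], "big") % dim
        vec[index] += 1
    return vec


# Hypothetical prefix of a running process instance.
prefix = ["register", "check", "approve", "check"]
features = hash_features(prefix)
```

The resulting fixed-length vector is the kind of input a downstream classifier could use to predict the next event; different n-grams may collide in the same bucket, which is the usual trade-off of feature hashing.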

    The Challenge of Machine Learning in Space Weather Nowcasting and Forecasting

    The numerous recent breakthroughs in machine learning (ML) make it imperative to carefully ponder how the scientific community can benefit from a technology that, although not necessarily new, is today living its golden age. This Grand Challenge review paper is focused on the present and future role of machine learning in space weather. The purpose is twofold. On one hand, we will discuss previous works that use ML for space weather forecasting, focusing in particular on the few areas that have seen most activity: the forecasting of geomagnetic indices, of relativistic electrons at geosynchronous orbits, of solar flare occurrence, of coronal mass ejection propagation time, and of solar wind speed. On the other hand, this paper serves as a gentle introduction to the field of machine learning tailored to the space weather community and as a pointer to a number of open challenges that we believe the community should undertake in the next decade. The recurring themes throughout the review are the need to shift our forecasting paradigm to a probabilistic approach focused on the reliable assessment of uncertainties, and the combination of physics-based and machine learning approaches, known as gray-box. Comment: under review

    One-Class Classification: Taxonomy of Study and Review of Techniques

    One-class classification (OCC) algorithms aim to build classification models when the negative class is either absent, poorly sampled or not well defined. This unique situation constrains the learning of efficient classifiers by defining the class boundary just with the knowledge of the positive class. The OCC problem has been considered and applied under many research themes, such as outlier/novelty detection and concept learning. In this paper we present a unified view of the general problem of OCC by presenting a taxonomy of study for OCC problems, which is based on the availability of training data, algorithms used and the application domains applied. We further delve into each of the categories of the proposed taxonomy and present a comprehensive literature review of the OCC algorithms, techniques and methodologies with a focus on their significance, limitations and applications. We conclude our paper by discussing some open research problems in the field of OCC and present our vision for future research. Comment: 24 pages + 11 pages of references, 8 figures

    Few-Shot Learning for Post-Earthquake Urban Damage Detection

    Koukouraki, E., Vanneschi, L., & Painho, M. (2022). Few-Shot Learning for Post-Earthquake Urban Damage Detection. Remote Sensing, 14(1), 1-20. [40]. https://doi.org/10.3390/rs14010040
    Funding: This study was partially supported by FCT, Portugal, through funding of projects BINDER (PTDC/CCI-INF/29168/2017) and AICE (DSAIPA/DS/0113/2019). E.K. would like to acknowledge the Erasmus Mundus scholarship program, for providing the context and financial support to carry out this study, through the admission to the Master of Science in Geospatial Technologies.
    Among natural disasters, earthquakes are recorded to have the highest rates of human loss in the past 20 years. Their unexpected nature has severe consequences on both human lives and material infrastructure, demanding urgent action to be taken. For effective emergency relief, it is necessary to gain awareness about the level of damage in the affected areas. The use of remotely sensed imagery is popular in damage assessment applications; however, it requires a considerable amount of labeled data, which are not always easy to obtain. Taking into consideration the recent developments in the fields of Machine Learning and Computer Vision, this study investigates and employs several Few-Shot Learning (FSL) strategies in order to address data insufficiency and imbalance in post-earthquake urban damage classification. While small datasets have been tested against binary classification problems, which usually divide the urban structures into collapsed and non-collapsed, the potential of limited training data in multi-class classification has not been fully explored. To tackle this gap, four models were created, following different data balancing methods, namely cost-sensitive learning, oversampling, undersampling and Prototypical Networks. After a quantitative comparison among them, the best performing model was found to be the one based on Prototypical Networks, and it was used for the creation of damage assessment maps. The contribution of this work is twofold: we show that oversampling is the most suitable data balancing method for training Deep Convolutional Neural Networks (CNN) when compared to cost-sensitive learning and undersampling, and we demonstrate the appropriateness of Prototypical Networks in the damage classification context.
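The core classification step of Prototypical Networks, the best-performing approach in this abstract, can be sketched as follows. The sketch assumes embeddings have already been produced by some encoder (not shown); the two-dimensional vectors and the class labels `intact` and `collapsed` are hypothetical and are not taken from the study.

```python
import math
from collections import defaultdict


def prototypes(support):
    """Average the embeddings of each class in the support set."""
    grouped = defaultdict(list)
    for embedding, label in support:
        grouped[label].append(embedding)
    return {
        label: [sum(column) / len(vectors) for column in zip(*vectors)]
        for label, vectors in grouped.items()
    }


def classify(query, protos):
    """Assign the query to the class whose prototype is nearest (Euclidean)."""
    return min(protos, key=lambda label: math.dist(query, protos[label]))


# Hypothetical support set: (embedding, label) pairs, two examples per class.
support = [
    ([0.0, 0.0], "intact"),
    ([0.2, 0.0], "intact"),
    ([1.0, 1.0], "collapsed"),
    ([0.8, 1.2], "collapsed"),
]
protos = prototypes(support)
predicted = classify([0.9, 0.9], protos)
```

Because each class is summarised by a single mean vector, only a handful of labeled examples per class are needed at inference time, which is why the approach suits settings with scarce labeled data.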

    Learning Invariant Representations with a Nonparametric Nadaraya-Watson Head

    Machine learning models will often fail when deployed in an environment with a data distribution that is different from the training distribution. When multiple environments are available during training, many methods exist that learn representations which are invariant across the different distributions, with the hope that these representations will be transportable to unseen domains. In this work, we present a nonparametric strategy for learning invariant representations based on the recently-proposed Nadaraya-Watson (NW) head. The NW head makes a prediction by comparing the learned representations of the query to the elements of a support set that consists of labeled data. We demonstrate that by manipulating the support set, one can encode different causal assumptions. In particular, restricting the support set to a single environment encourages the model to learn invariant features that do not depend on the environment. We present a causally-motivated setup for our modeling and training strategy and validate on three challenging real-world domain generalization tasks in computer vision. Comment: Accepted to NeurIPS 202
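The prediction rule described in this abstract, comparing a query representation to a labeled support set, follows the general shape of Nadaraya-Watson kernel regression. Below is a minimal sketch under stated assumptions: a Gaussian kernel with a hypothetical `bandwidth` parameter, precomputed embeddings, and made-up labels `a` and `b`. The paper's actual kernel and training setup may differ.

```python
import math


def nw_predict(query, support, bandwidth=1.0):
    """Return class probabilities as a kernel-weighted vote over the support set.

    Each support element contributes its label with a weight that decays
    with the squared distance between its embedding and the query.
    """
    weights = [
        math.exp(-math.dist(query, embedding) ** 2 / (2 * bandwidth ** 2))
        for embedding, _ in support
    ]
    total = sum(weights)
    probs = {}
    for weight, (_, label) in zip(weights, support):
        probs[label] = probs.get(label, 0.0) + weight / total
    return probs


# Hypothetical support set; restricting it to one environment would mean
# passing only the (embedding, label) pairs collected in that environment.
support = [([0.0, 0.0], "a"), ([0.1, 0.0], "a"), ([2.0, 2.0], "b")]
probs = nw_predict([0.0, 0.1], support)
```

Since the prediction depends only on which elements are placed in `support`, changing the support set changes the prediction without retraining, which is the property the abstract exploits to encode different assumptions.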

    Cost-Sensitive Learning-based Methods for Imbalanced Classification Problems with Applications

    Analysis and predictive modeling of massive datasets is an extremely significant problem that arises in many practical applications. The task of predictive modeling becomes even more challenging when data are imperfect or uncertain. Real data are frequently affected by outliers, uncertain labels, and uneven distribution of classes (imbalanced data). Such uncertainties create bias and make predictive modeling an even more difficult task. In the present work, we introduce a cost-sensitive learning method (CSL) to deal with the classification of imperfect data. Typically, most traditional approaches for classification demonstrate poor performance in an environment with imperfect data. We propose the use of CSL with Support Vector Machine, which is a well-known data mining algorithm. The results reveal that the proposed algorithm produces more accurate classifiers and is more robust with respect to imperfect data. Furthermore, we explore the best performance measures to tackle imperfect data along with addressing real problems in quality control and business analytics.
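The abstract does not specify how cost-sensitive learning is combined with the Support Vector Machine. One common formulation, shown here only as an illustrative sketch and not as the author's method, scales the hinge loss of each example by a per-class cost. The inverse-frequency weighting and the toy scores and labels below are assumptions.

```python
from collections import Counter


def class_weights(labels):
    """Weight each class inversely to its frequency: n / (k * count)."""
    counts = Counter(labels)
    n, k = len(labels), len(counts)
    return {label: n / (k * count) for label, count in counts.items()}


def weighted_hinge_loss(scores, labels, weights):
    """Mean hinge loss with per-class misclassification costs.

    Labels are +1 or -1; `scores` are raw decision values of a linear
    classifier. Errors on a heavily weighted class cost more.
    """
    total = sum(
        weights[y] * max(0.0, 1.0 - y * s) for s, y in zip(scores, labels)
    )
    return total / len(labels)


# Toy imbalanced data: one positive example, three negatives.
labels = [1, -1, -1, -1]
scores = [-0.5, -2.0, -1.5, 0.5]
weights = class_weights(labels)
loss = weighted_hinge_loss(scores, labels, weights)
```

With these weights the single minority-class example carries three times the cost of each majority-class example, so a classifier minimising this loss is penalised more for misclassifying the rare class.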