
    Statistical post-processing of visibility ensemble forecasts

    The ability to produce accurate and reliable visibility predictions is of crucial importance in aviation meteorology, as well as in water and road transportation. Nowadays, several meteorological services provide ensemble forecasts of visibility; however, the skill and reliability of visibility predictions are far lower than those of other variables, such as temperature or wind speed. Hence, some form of calibration is strongly advised, which usually means estimating the predictive distribution of the weather quantity at hand with parametric or non-parametric approaches, including machine learning-based techniques. As visibility observations are usually reported in discrete values, following the recommendation of the World Meteorological Organization, the predictive distribution for this variable is a discrete probability law, and calibration can therefore be reduced to a classification problem. Based on visibility ensemble forecasts of the European Centre for Medium-Range Weather Forecasts covering two slightly overlapping domains in Central and Western Europe and two different time periods, we investigate the predictive performance of locally, semi-locally and regionally trained proportional odds logistic regression (POLR) and multilayer perceptron (MLP) neural network classifiers. We show that while climatological forecasts outperform the raw ensemble by a wide margin, post-processing results in further substantial improvement in forecast skill, and in general, POLR models are superior to their MLP counterparts.
    Comment: 23 pages, 13 figures, 4 tables
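    A POLR classifier maps a linear predictor to ordered class probabilities through cumulative logits, which is how a discrete visibility distribution can be produced. The following is a minimal sketch, not the authors' implementation: it assumes the parametrization P(Y <= k | x) = sigmoid(theta_k - x·beta), the same convention used by R's `MASS::polr`, and the function name `polr_class_probs`, the features (for example ensemble mean and spread of visibility), the coefficients and the thresholds are all hypothetical.

```python
import math


def sigmoid(x: float) -> float:
    """Logistic function."""
    return 1.0 / (1.0 + math.exp(-x))


def polr_class_probs(features, coefs, thresholds):
    """Class probabilities of a proportional odds logistic regression (hypothetical helper).

    Assumed convention: P(Y <= k | x) = sigmoid(theta_k - x . beta) for k = 1..K-1,
    with strictly increasing thresholds theta_1 < ... < theta_{K-1}.
    Returns K probabilities, one per ordered class.
    """
    if any(b <= a for a, b in zip(thresholds, thresholds[1:])):
        raise ValueError("thresholds must be strictly increasing")
    eta = sum(x * b for x, b in zip(features, coefs))  # linear predictor x . beta
    cumulative = [sigmoid(t - eta) for t in thresholds] + [1.0]  # P(Y <= k)
    probs, prev = [], 0.0
    for c in cumulative:
        probs.append(c - prev)  # P(Y = k) = P(Y <= k) - P(Y <= k - 1)
        prev = c
    return probs


# Hypothetical example: two features, four ordered visibility classes.
p = polr_class_probs([2.0, 0.5], [0.8, -0.3], [-1.0, 0.5, 2.0])
print([round(q, 3) for q in p])
```

    A larger linear predictor shifts probability mass towards the higher classes; estimating theta and beta (for example by maximum likelihood on past forecast-observation pairs) is not shown.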

    Accident prediction using machine learning: analyzing weather conditions and model performance

    The primary focus of this study was to investigate the impact of weather and road conditions on the severity of accidents and to determine whether machine learning models can accurately predict the likelihood of such incidents. The research centered on two key research questions. First, the study examined the influence of weather and road conditions on accident severity and identified the factors most strongly associated with accidents. We used an open-source accident dataset, which was preprocessed using techniques such as variable selection, removal of missing data, and class balancing with the Synthetic Minority Over-sampling Technique (SMOTE). A chi-square statistical analysis suggested that all weather-related variables are associated, to varying degrees, with accident severity. Visibility and temperature were found to be the most critical factors affecting the severity of road accidents. Hence, appropriate measures such as effective fog dispersal systems, heatwave alerts, or improved road maintenance during extreme temperatures could help reduce accident severity. Second, the research evaluated the ability of machine learning models, including decision trees, random forests, naive Bayes, extreme gradient boosting, and neural networks, to predict accident likelihood. Model performance was assessed using metrics such as accuracy, precision, recall, and F1 score. The Random Forest model emerged as the most reliable and accurate model for predicting accidents, with an overall accuracy of 98.53%. The Decision Tree model also showed high overall accuracy (95.33%), indicating its reliability. However, the Naive Bayes model showed the lowest accuracy (63.31%) and was deemed less reliable in this context. We conclude that machine learning models can be used effectively to predict the likelihood of accidents, with Random Forest and Decision Tree models proving the most effective. However, the effectiveness of each model may vary depending on the dataset and context, so further testing and validation are necessary before real-world implementation. These findings not only provide insight into the factors affecting accident severity but also open a promising avenue for using machine learning techniques in proactive accident prediction and mitigation. Future studies could refine the models further and potentially integrate them into traffic management systems to enhance road safety.
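    The four evaluation metrics named in this abstract all derive from a confusion matrix. The sketch below assumes a binary target (for example accident vs. no accident, or severe vs. non-severe); the abstract does not say how multi-class severity labels were averaged, and the helper name `binary_metrics` and the sample labels are hypothetical.

```python
def binary_metrics(y_true, y_pred, positive=1):
    """Accuracy, precision, recall and F1 for a binary classification task."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("inputs must be non-empty and of equal length")
    tp = fp = fn = tn = 0
    for truth, pred in zip(y_true, y_pred):
        if pred == positive and truth == positive:
            tp += 1  # correctly predicted positive
        elif pred == positive:
            fp += 1  # predicted positive, actually negative
        elif truth == positive:
            fn += 1  # missed positive
        else:
            tn += 1  # correctly predicted negative
    accuracy = (tp + tn) / len(y_true)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}


# Made-up labels giving 2 true positives, 1 false positive, 1 false negative, 2 true negatives.
scores = binary_metrics([1, 0, 1, 1, 0, 0], [1, 0, 0, 1, 1, 0])
print(scores)
```

    Accuracy alone can be misleading on imbalanced data, which is one reason precision, recall and F1 are usually reported alongside it.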

    Advances in Data Mining Knowledge Discovery and Applications

    Advances in Data Mining Knowledge Discovery and Applications aims to help data miners, researchers, scholars, and PhD students who wish to apply data mining techniques. The primary contribution of this book is to highlight frontier fields and implementations of knowledge discovery and data mining. It may seem that the same things are being repeated, but in general, the same approaches and techniques can help in different fields and areas of expertise. This book presents knowledge discovery and data mining applications in two different sections. As is well known, data mining covers statistics, machine learning, data management and databases, pattern recognition, artificial intelligence, and other areas. In this book, most of these areas are covered through different data mining applications. The eighteen chapters are organized into two parts: "Knowledge Discovery" and "Data Mining Applications".

    Prediction of Airport Arrival Rates Using Data Mining Methods

    This research sought to establish and use relationships between environmental input variables and airport efficiency estimates by data mining archived weather and airport performance data at ten geographically and climatologically different airports. Several meaningful relationships were discovered using various statistical modeling methods within an overarching data mining protocol, and the developed models were tested using historical data. Additionally, a selected model was deployed using real-time predictive weather information to estimate airport efficiency as a demonstration of potential operational usefulness. This work employed SAS® Enterprise Miner™ data mining and modeling software to train and validate decision tree, neural network, and linear regression models to estimate the importance of weather input variables in predicting Airport Arrival Rates (AAR) using the FAA's Aviation System Performance Metrics (ASPM) database. The ASPM database contains airport performance statistics and limited weather variables archived at 15-minute and hourly intervals, and these data formed the foundation of this study. To bring more weather parameters into the data mining environment, National Oceanic and Atmospheric Administration (NOAA) National Centers for Environmental Information (NCEI) hourly meteorological station data were merged with the ASPM data, adding environmental variables (e.g., precipitation type and amount) to the analyses. Using SAS® Enterprise Miner™, three different types of models were created, compared, and scored at the following ten airports: a) Hartsfield-Jackson Atlanta International Airport (ATL), b) Los Angeles International Airport (LAX), c) O'Hare International Airport (ORD), d) Dallas/Fort Worth International Airport (DFW), e) John F. Kennedy International Airport (JFK), f) Denver International Airport (DEN), g) San Francisco International Airport (SFO), h) Charlotte-Douglas International Airport (CLT), i) LaGuardia Airport (LGA), and j) Newark Liberty International Airport (EWR). At each location, weather inputs were used to estimate AARs as a metric of efficiency easily interpreted by FAA airspace managers. To estimate Airport Arrival Rates, three data sets were used: a) 15-minute and b) hourly ASPM data, along with c) a merged ASPM and hourly meteorological station data set. For all three data sets, the models were trained and validated using data from 2014 and 2015, and then tested using 2016 data. Additionally, a selected airport model was deployed as a test over a 24-hour period, using National Weather Service (NWS) Localized Aviation MOS (Model Output Statistics) Program (LAMP) weather guidance as the input variables. The resulting AAR predictions were then compared with the observed real-world AARs. Based on model scoring using 2016 data, LAX, ATL, and EWR demonstrated useful predictive performance that could potentially be applied to estimate real-world AARs. Marginal, but perhaps useful, AAR prediction might be gleaned operationally at LGA, SFO, and DFW, as the number of successfully scored cases falls loosely within one standard deviation of acceptable model performance, arbitrarily set at ten percent of the airport's maximum AAR. The remaining models studied, for DEN, CLT, ORD, and JFK, appeared to have little useful operational application based on the 2016 model scoring results.
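    The acceptance criterion mentioned in this abstract (ten percent of the airport's maximum AAR) can be expressed as a simple hit rate. This sketch assumes that a case counts as "successfully scored" when the absolute difference between predicted and observed AAR is at most that threshold; the study may define scoring differently, and the function name `fraction_within_tolerance` and the sample values are hypothetical.

```python
def fraction_within_tolerance(predicted, observed, max_aar, tolerance=0.10):
    """Share of cases whose absolute AAR error is at most tolerance * max_aar.

    Assumption (not confirmed by the study): a case is 'successfully scored'
    when |predicted - observed| <= tolerance * max_aar.
    """
    if len(predicted) != len(observed) or not predicted:
        raise ValueError("inputs must be non-empty and of equal length")
    limit = tolerance * max_aar
    hits = sum(1 for p, o in zip(predicted, observed) if abs(p - o) <= limit)
    return hits / len(predicted)


# Hypothetical hourly AARs for an airport whose maximum AAR is 100 arrivals per hour.
share = fraction_within_tolerance([50, 62, 80, 95], [55, 50, 80, 100], max_aar=100)
print(f"{share:.0%} of cases within tolerance")
```

    Tying the tolerance to each airport's maximum AAR makes scores comparable between large and small airports, at the cost of being lenient for low-demand hours.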

    Statistical Postprocessing of Numerical Weather Prediction Forecasts using Machine Learning

    Nowadays, weather prediction is based on numerical models of the physics of the atmosphere. These models are usually run multiple times based on randomly perturbed initial conditions. The resulting so-called ensemble forecasts represent distinct scenarios of the future and provide probabilistic projections. However, these forecasts are subject to systematic errors such as biases, and they are often unable to quantify the forecast uncertainty adequately. Statistical postprocessing methods aim to exploit structure in past pairs of forecasts and observations to correct these errors when applied to future forecasts. In this thesis, we develop statistical postprocessing methods based on the central paradigm of probabilistic forecasting, that is, to maximize the sharpness subject to calibration. A wide range of statistical and machine learning methods is presented, with a focus on novel neural network-based postprocessing techniques. In particular, we analyze the aggregation of distributional forecasts from neural network ensembles and develop statistical postprocessing methods for ensemble forecasts of wind gusts, with a focus on European winter storms.
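    The thesis abstract mentions aggregating distributional forecasts from neural network ensembles without naming the methods. Two standard options are the linear pool (averaging CDFs) and Vincentization (averaging quantiles). The sketch below contrasts them under the assumption that each network outputs a Gaussian predictive distribution, represented with Python's `statistics.NormalDist`; it is illustrative only and not the thesis's code, and the member parameters are hypothetical.

```python
from statistics import NormalDist, fmean


def linear_pool_cdf(members, x):
    """Linear pool: average the members' CDFs at x (equal-weight mixture)."""
    return fmean(m.cdf(x) for m in members)


def vincentized_quantile(members, p):
    """Vincentization: average the members' quantiles at probability level p."""
    return fmean(m.inv_cdf(p) for m in members)


# Two hypothetical ensemble members that disagree only in location.
members = [NormalDist(mu=-1.0, sigma=1.0), NormalDist(mu=1.0, sigma=1.0)]
print(linear_pool_cdf(members, 1.0))       # mixture CDF at x = 1
print(vincentized_quantile(members, 0.9))  # averaged 90% quantile
```

    For these two members, quantile averaging returns a standard normal, whereas the linear pool is a mixture with variance 2 (mean member variance 1 plus variance of the means 1) and is therefore wider. This difference in dispersion is why the choice of aggregation method matters when the goal is sharpness subject to calibration.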

    The 8th International Conference on Time Series and Forecasting

    The aim of ITISE 2022 is to create a friendly environment that could lead to the establishment or strengthening of scientific collaborations and exchanges among attendees. Therefore, ITISE 2022 solicits high-quality original research papers (including significant works in progress) on any aspect of time series analysis and forecasting, in order to motivate the generation and use of new knowledge, computational techniques, and methods for forecasting in a wide range of fields.

    Exploring Machine Learning Models for Wind Speed Prediction

    This work presents a comprehensive exploration of machine learning models for wind speed prediction and compares their performance. The predictions use variables from atmospheric reanalysis data for a specific wind farm in Spain as inputs to the system.

    Machine learning-based weather impact forecasts (Koneoppimiseen perustuvat sään vaikutusennustukset)

    Defence is held on 2.11.2021, 15:00–19:00. Remote connection link: https://aalto.zoom.us/j/69735940472
    Natural disasters affected over 4 billion people, claimed 1.23 million lives, and caused almost US$3 trillion in economic losses between 2000 and 2019. The picture becomes even bleaker when hazards, that is, smaller-scale severe weather events that do not cause casualties, are considered. For example, 78 percent of power outages in Finland in 2017 were caused by extreme weather, and train delays, often caused by adverse weather, have been estimated to have cost 1 billion pounds in the UK during 2006 and 2007. To mitigate the effects of adverse weather and increase the resilience of societies, the World Meteorological Organisation (WMO) has raised awareness of impact-based warnings along with impact forecasts. Such warnings and predictions can be used in various domains to prepare for, alleviate, and recover from adverse weather conditions. This thesis studies how to preprocess data and use machine learning to create valuable impact forecasts for power grid and rail traffic operators. The thesis introduces a novel object-oriented method to predict power outages caused by convective storms. The method combines state-of-the-art storm identification, tracking, and nowcasting algorithms with modern machine learning methods. The proposed object-oriented method is also adapted to predict power outages caused by large-scale extratropical storms days ahead. In addition, the thesis studies the task of predicting weather-induced train delays. The method presented in the thesis links weather parameters to train delays to anticipate the delays days ahead. The thesis shows that the object-oriented approach is a viable method to predict power outages caused by convective storms and that a similar approach is also feasible for extratropical storms. The introduced methods provide power grid operators with increasingly accurate outage predictions. The thesis also demonstrates that weather-related train delays can be predicted given good-quality training data. Such predictions offer crucial information to rail traffic operators preparing for challenging conditions. Presumably, similar approaches can be applied to any other domain where identifiable weather events produce quantifiable impacts, provided sufficient impact data are available. Several advanced machine learning methods were evaluated on these tasks. The results corroborate existing research: random forests provided robust performance in all tasks, but gradient boosting trees, Gaussian processes, and support vector machines also proved useful.
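    The object-oriented outage method starts from identified storm objects. As an illustration of only the identification step, the sketch below thresholds a gridded field (for example radar reflectivity) and labels 4-connected regions as objects, then derives simple per-object features that a classifier could consume. The threshold, the grid values and the helper names `label_storm_objects` and `object_features` are hypothetical, and the thesis's actual identification, tracking and nowcasting algorithms are not reproduced here.

```python
from collections import deque


def label_storm_objects(field, threshold):
    """Label 4-connected grid cells whose value is >= threshold.

    Returns (labels, n_objects); labels[r][c] is 0 for background,
    otherwise an object id starting at 1.
    """
    rows = len(field)
    cols = len(field[0]) if rows else 0
    labels = [[0] * cols for _ in range(rows)]
    n_objects = 0
    for r in range(rows):
        for c in range(cols):
            if field[r][c] < threshold or labels[r][c]:
                continue  # background cell or already part of an object
            n_objects += 1
            labels[r][c] = n_objects
            queue = deque([(r, c)])
            while queue:  # breadth-first flood fill of one object
                i, j = queue.popleft()
                for di, dj in ((1, 0), (-1, 0), (0, 1), (0, -1)):
                    ni, nj = i + di, j + dj
                    if (0 <= ni < rows and 0 <= nj < cols
                            and field[ni][nj] >= threshold and not labels[ni][nj]):
                        labels[ni][nj] = n_objects
                        queue.append((ni, nj))
    return labels, n_objects


def object_features(field, labels, n_objects):
    """Per-object area (cell count) and maximum field value."""
    feats = [{"area": 0, "max_value": float("-inf")} for _ in range(n_objects)]
    for r, row in enumerate(labels):
        for c, obj in enumerate(row):
            if obj:
                feats[obj - 1]["area"] += 1
                feats[obj - 1]["max_value"] = max(feats[obj - 1]["max_value"], field[r][c])
    return feats


# Hypothetical reflectivity-like grid and threshold.
field = [
    [40, 45, 10, 0],
    [42, 10, 10, 50],
    [5, 0, 48, 52],
]
labels, n = label_storm_objects(field, threshold=35)
print(n, object_features(field, labels, n))
```

    Tracking would then match objects across consecutive time steps, for example by spatial overlap; that step is omitted from this sketch.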

    Data Mining

    Data mining is a branch of computer science that is used to automatically extract meaningful, useful knowledge and previously unknown, hidden, interesting patterns from large amounts of data to support the decision-making process. This book presents recent theoretical and practical advances in the field of data mining. It discusses a number of data mining methods, including classification, clustering, and association rule mining. This book brings together many different successful data mining studies in various areas such as health, banking, education, software engineering, animal science, and the environment.