    A Machine Learning Tutorial for Operational Meteorology, Part II: Neural Networks and Deep Learning

    Over the past decade, the use of machine learning in meteorology has grown rapidly, and neural networks and deep learning in particular have been adopted at an unprecedented rate. To fill the dearth of resources covering neural networks from a meteorological perspective, this paper discusses machine learning methods in plain language, targeting the operational meteorological community. It is the second paper in a pair that aims to serve as a machine learning resource for meteorologists. While the first paper focused on traditional machine learning methods (e.g., random forests), this one discusses a broad spectrum of neural network and deep learning methods: perceptrons, artificial neural networks, convolutional neural networks and U-networks. Like Part I, this manuscript first discusses the terms associated with neural networks and their training. It then provides some intuition behind each method and concludes by applying each one to a meteorological example: diagnosing thunderstorms (e.g., lightning flashes) from satellite images. The paper is accompanied by an open-source code repository that lets readers explore neural networks either with the dataset provided (the one used in the paper) or by using the code as a template for alternate datasets.
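As a concrete illustration of the simplest method the paper covers, the sketch below trains a single perceptron with the classic perceptron learning rule: a step activation, with the weights nudged toward each misclassified sample. It is not taken from the paper's accompanying repository; the toy logical-AND data and the names `train_perceptron` and `predict` are hypothetical stand-ins for the satellite-based thunderstorm dataset.

```python
import numpy as np

def train_perceptron(X, y, lr=1.0, max_epochs=100):
    """Classic perceptron learning rule for labels in {0, 1}; returns (weights, bias)."""
    w = np.zeros(X.shape[1])
    b = 0.0
    for _ in range(max_epochs):
        errors = 0
        for xi, yi in zip(X, y):
            pred = 1 if xi @ w + b > 0 else 0  # step activation
            err = yi - pred
            if err != 0:
                # Shift the decision boundary toward the misclassified sample
                w += lr * err * xi
                b += lr * err
                errors += 1
        if errors == 0:  # a full pass with no mistakes: training has converged
            break
    return w, b

def predict(X, w, b):
    """Apply the learned linear decision rule to every row of X."""
    return (X @ w + b > 0).astype(int)

# Hypothetical toy data (logical AND), not the paper's satellite dataset
X = np.array([[0.0, 0.0], [0.0, 1.0], [1.0, 0.0], [1.0, 1.0]])
y = np.array([0, 0, 0, 1])
w, b = train_perceptron(X, y)
print(predict(X, w, b))
```

Because the toy data are linearly separable, the loop stops as soon as an epoch passes with no errors; for data that are not linearly separable, the `max_epochs` cap ends training instead.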

    Computational Optimizations for Machine Learning

    This book contains the 10 articles accepted for publication in the Special Issue “Computational Optimizations for Machine Learning” of the MDPI journal Mathematics, which cover a wide range of topics connected to the theory and applications of machine learning, neural networks and artificial intelligence. These topics include, among others, various classes of machine learning (supervised, unsupervised and reinforcement learning), deep neural networks, convolutional neural networks, GANs, decision trees, linear regression, SVM, K-means clustering, Q-learning, temporal difference learning, deep adversarial networks and more. It is hoped that the book will be interesting and useful to those developing mathematical algorithms and applications in artificial intelligence and machine learning, as well as to readers with the appropriate mathematical background who wish to become familiar with recent advances in the mathematics of computational optimization for machine learning, a field that now permeates almost every sector of human life and activity.

    Contributions to ionospheric modeling with GNSS in mapping function, tomography and polar electron

    This dissertation focuses on determining the vertical distribution of electron content at low and high vertical resolution from ground-based and LEO-onboard GNSS data, and on improving knowledge of ionospheric climatology in the northern mid-latitude and polar regions. Its novelty can be summarized in four contributions. The first contribution is a new ionospheric mapping function concept, the Barcelona Ionospheric Mapping Function (BIMF), which improves the accuracy of converting any given VTEC (Vertical Total Electron Content) model into STEC (Slant Total Electron Content). BIMF is based on climatological modeling of the fraction of VTEC in the second layer, µ2, a byproduct of the UQRG global ionospheric maps (GIMs) generated by UPC. The first implementation of BIMF is BIMF-nml for the northern mid-latitudes, where the latitudinal variation of µ2 is neglected and µ2 is modeled as a function of date and local time. From the user's perspective, BIMF is a linear combination of µ2 and the standard ionospheric mapping function and needs only 41 constant coefficients, which makes it simple to apply. Its good performance was demonstrated in a dSTEC assessment of different IGS GIMs: UQRG, CODG and JPLG. The second contribution confirms the capability of UQRG GIMs to detect representative ionospheric features in polar regions through six case studies: the tongue of ionization (TOI), the ionospheric trough, flux transfer events, theta-aurora, ionospheric convection patterns and storm-enhanced density. The long-term VTEC and µ2 data provide valuable databases for studying the morphology and climatology of polar ionospheric phenomena. Unsupervised clustering of the normalized VTEC distribution shows that TOI and polar cap patches exhibit an annual dependence, with most TOI and patches occurring in the Northern Hemisphere winter and the Southern Hemisphere summer.
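The abstract states only that BIMF linearly combines µ2 with the standard mapping function; the exact formula and its 41 coefficients are not given here. The following sketch therefore assumes a simple two-shell form, M = (1 − µ2)·M(h1) + µ2·M(h2), where M(h) is the standard single-layer (thin-shell) mapping function, and it takes µ2 as an input rather than computing it from the BIMF-nml climatology. The shell heights of 450 km and 1130 km and all function names are illustrative assumptions, not the published BIMF definition.

```python
import math

R_EARTH_KM = 6371.0  # mean Earth radius

def thin_shell_mf(elev_deg, h_km):
    """Standard single-layer mapping function for a thin shell at height h_km."""
    e = math.radians(elev_deg)
    s = R_EARTH_KM * math.cos(e) / (R_EARTH_KM + h_km)  # sine of zenith angle at the shell
    return 1.0 / math.sqrt(1.0 - s * s)

def two_layer_mf(elev_deg, mu2, h1_km=450.0, h2_km=1130.0):
    """ASSUMED BIMF-like form: shells weighted by (1 - mu2) and mu2; heights are illustrative."""
    return (1.0 - mu2) * thin_shell_mf(elev_deg, h1_km) + mu2 * thin_shell_mf(elev_deg, h2_km)

def stec_from_vtec(vtec_tecu, elev_deg, mu2):
    """Convert a VTEC value (TECU) to STEC along a ray at the given elevation."""
    return two_layer_mf(elev_deg, mu2) * vtec_tecu

# Hypothetical example: 20 TECU, 20 deg elevation, 20 % of VTEC in the upper layer
print(stec_from_vtec(20.0, 20.0, 0.2))
```

A higher shell yields a smaller mapping factor at low elevations, so a larger µ2 lowers the predicted STEC for the same VTEC; at zenith all variants reduce to STEC = VTEC.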
The third contribution is a hybrid method, AVHIRO (Abel-VaryChap Hybrid modeling from topside Incomplete RO data), that solves an ill-posed, rank-deficient problem in Abel electron density retrieval. This work is motivated by the future EUMETSAT Polar System Second Generation, which will provide ionospheric RO data only below impact heights of 500 km in order to guarantee full data collection in the neutral atmosphere. AVHIRO adopts a linear Vary-Chap model, in which the scale height increases linearly with altitude above the F2-layer peak, and uses Powell's method to solve simultaneously for the full electron density profile, the ambiguity term and the four parameters of the Vary-Chap model, taking into account the nonlinear interactions among the unknown parameters. The fourth contribution exploits the geometry provided by combining DORIS data, ground-based Galileo data, and ground-based, LEO-POD and vessel-based GPS data, ingesting these multi-source dual-frequency carrier-phase measurements into the tomographic model to improve the precision of GIM VTEC estimation. The impact on the GIM product of adding each type of measurement (Galileo, vessel-based GPS, DORIS and LEO-POD GPS data) to ground-based GPS data is examined with two complementary evaluation criteria: comparison with JASON-3 VTEC and the GPS dSTEC test. The study confirms the expected improvement in GIM performance from ingesting the new data into the tomographic model, a successful step from conception to initial experimental validation.
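To illustrate the derivative-free Powell search mentioned for AVHIRO, the sketch below fits only the four-parameter Vary-Chap part to a synthetic profile; it does not reproduce the Abel inversion, the ambiguity term or the truncated RO geometry. It assumes the usual Vary-Chap form N(h) = Nm·sqrt(Hm/H(h))·exp(½(1 − z − e^(−z))), with z = ∫ dh/H(h) from hm to h and a linear scale height H(h) = Hm + k(h − hm). The parameter choice (Nm, hm, Hm, k), the scaling and the synthetic values are assumptions. It uses SciPy's `scipy.optimize.minimize` with `method="Powell"`.

```python
import numpy as np
from scipy.optimize import minimize

# Hypothetical scaling so each parameter is O(1) for Powell's line searches:
# Nm [1e12 el/m^3], hm [100 km], Hm [10 km], k [0.1 km/km]
SCALE = np.array([1e12, 100.0, 10.0, 0.1])

def vary_chap(h_km, nm, hm, Hm, k):
    """Chapman-type profile with linear scale height H(h) = Hm + k*(h - hm); None if H <= 0."""
    H = Hm + k * (h_km - hm)
    if Hm <= 0 or np.any(H <= 0):
        return None
    if abs(k) < 1e-9:
        z = (h_km - hm) / Hm  # constant-scale-height (alpha-Chapman) limit
    else:
        z = np.log(H / Hm) / k  # closed form of the integral of dh / H(h) from hm to h
    with np.errstate(over="ignore"):
        return nm * np.sqrt(Hm / H) * np.exp(0.5 * (1.0 - z - np.exp(-z)))

def loss(p_scaled, h_km, n_obs):
    """Sum of squared residuals in units of 1e12 el/m^3."""
    nm, hm, Hm, k = p_scaled * SCALE
    model = vary_chap(h_km, nm, hm, Hm, k)
    if model is None:
        return 1e6  # penalty for non-physical (non-positive) scale heights
    return float(np.sum(((model - n_obs) / 1e12) ** 2))

# Synthetic noiseless profile (hypothetical values, not EPS-SG data)
h = np.linspace(200.0, 800.0, 61)
n_obs = vary_chap(h, 1.0e12, 300.0, 50.0, 0.1)

x0 = np.array([0.8, 2.8, 6.0, 0.5])  # deliberately offset first guess (scaled units)
res = minimize(loss, x0, args=(h, n_obs), method="Powell")
print("fitted Nm, hm, Hm, k:", res.x * SCALE)
```

Powell's method needs only function values, which suits a misfit that mixes a closed-form profile with penalties for non-physical parameters, at the cost of more function evaluations than a gradient-based solver.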

    Machine Learning Techniques for Short-Range Weather Forecasting

    Doctoral dissertation, Seoul National University Graduate School, College of Engineering, Department of Electrical and Computer Engineering, February 2020. Advisor: Byung-Ro Moon.
Machine learning is a branch of artificial intelligence that automatically generates programs from data. It differs from conventional programming, which requires writing a series of specific instructions to perform a specific task, and it is preferred when an effective algorithm is difficult to develop, as in natural language processing or computer vision. Traditionally, numerical weather prediction (NWP) has been the prevailing method of weather forecasting. NWP predicts future weather through simulations with mathematical models initialized from current weather conditions. However, NWP has several problems: errors in the current observations are amplified as the simulation proceeds; its spatial and temporal resolutions are limited; and it suffers from a spin-up problem, in which initial forecasts are unreliable while the model stabilizes. An alternative approach is therefore needed to complement NWP at small spatial and temporal scales, and we propose short-range weather forecast models that employ machine learning techniques appropriate to each forecasting problem. First, we introduce dimensionality reduction techniques for building effective forecasting models from high-dimensional input data. As the dimension of the input data increases, the time or memory required by machine learning techniques can grow dramatically. This phenomenon, known as the curse of dimensionality, can be alleviated by dimensionality reduction, which comprises feature selection and feature extraction. Feature selection chooses a subset of the input variables, while feature extraction projects high-dimensional features onto a lower-dimensional space. We detail correlation-based feature selection and principal component analysis (PCA), a representative feature-extraction technique.
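A compact sketch of the two dimensionality reduction families named above, using NumPy only. For correlation-based feature selection it assumes Hall's CFS merit, k·r̄_cf / sqrt(k + k(k − 1)·r̄_ff), where r̄_cf is the mean absolute feature-target correlation and r̄_ff the mean absolute feature-feature correlation, combined with a greedy forward search; the thesis's exact variant, search strategy and stopping rule are not stated in the abstract. PCA is computed from the SVD of the centered data matrix. The synthetic data and function names are hypothetical.

```python
import numpy as np

def cfs_merit(X, y, subset):
    """CFS merit: rewards correlation with y, penalizes redundancy within the subset."""
    k = len(subset)
    r_cf = np.mean([abs(np.corrcoef(X[:, j], y)[0, 1]) for j in subset])
    if k == 1:
        r_ff = 0.0
    else:
        r_ff = np.mean([abs(np.corrcoef(X[:, a], X[:, b])[0, 1])
                        for i, a in enumerate(subset) for b in subset[i + 1:]])
    return k * r_cf / np.sqrt(k + k * (k - 1) * r_ff)

def cfs_forward(X, y):
    """Greedy forward search: add the feature that most improves merit; stop when none does."""
    selected, best = [], 0.0
    remaining = list(range(X.shape[1]))
    while remaining:
        score, j = max((cfs_merit(X, y, selected + [j]), j) for j in remaining)
        if score <= best:
            break
        selected.append(j)
        remaining.remove(j)
        best = score
    return selected

def pca(X, n_components):
    """PCA via SVD of the centered data: returns scores, components, explained-variance ratios."""
    Xc = X - X.mean(axis=0)
    _, S, Vt = np.linalg.svd(Xc, full_matrices=False)
    components = Vt[:n_components]
    ratios = S**2 / np.sum(S**2)
    return Xc @ components.T, components, ratios[:n_components]

# Hypothetical data: x0 drives y, x1 is a noisy copy of x0, x2 is irrelevant
rng = np.random.default_rng(0)
n = 500
x0 = rng.normal(size=n)
x1 = x0 + 0.1 * rng.normal(size=n)
x2 = rng.normal(size=n)
X = np.column_stack([x0, x1, x2])
y = x0 + 0.1 * rng.normal(size=n)

selected = cfs_forward(X, y)
scores, components, explained = pca(X, 2)
print("CFS subset:", selected, "PCA explained ratios:", explained)
```

The redundancy term is what distinguishes CFS from ranking features one at a time: a near-duplicate of an already selected feature adds little merit even though it is strongly correlated with the target.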
We then propose a scheme for precipitation type forecasting as an example of meteorological forecasting with dimensionality reduction. The scheme takes 93 meteorological variables as input and uses feature selection to assemble an effective subset of them. Multinomial logistic regression then classifies precipitation as rain, snow or sleet. The scheme's predictions were 13% more accurate than the original forecasts, and feature selection improved accuracy to a statistically significant degree. Second, we present sampling techniques that help predict rare meteorological events. Machine learning algorithms tend to sacrifice performance on rare instances in favor of overall performance, which is known as the class imbalance problem. Undersampling addresses it by reducing the number of common instances. As an example of meteorological forecasting with undersampling, we propose a lightning forecast scheme. Meteorological variables from the European Centre for Medium-Range Weather Forecasts (ECMWF) provide the input; undersampling alleviates the class imbalance problem, and support vector machines (SVMs) forecast lightning activity within a particular location and time interval. When trained on the original input data, the scheme could not predict any lightning; after undersampling, it successfully detected about 38% of the lightning strikes. Finally, we propose a selective discretization technique that automatically selects suitable variables and discretizes them. Discretization is a preprocessing technique that converts continuous variables into categorical ones. Conventional discretization techniques discretize all variables, which may cause significant information loss. Selective discretization minimizes this loss by discretizing only variables that have a nonlinear relationship with the dependent variable.
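Random undersampling, the simplest form of the undersampling described above, can be sketched as follows: keep every minority-class sample (e.g. lightning) and a random subset of the majority class. The 1:1 target ratio, the function name and the toy labels are assumptions; the abstract does not say which undersampling scheme or ratio the lightning model used.

```python
import numpy as np

def random_undersample(X, y, majority_label=0, ratio=1.0, seed=0):
    """Keep all minority rows and randomly keep ratio * n_minority majority rows."""
    rng = np.random.default_rng(seed)
    maj = np.flatnonzero(y == majority_label)
    mino = np.flatnonzero(y != majority_label)
    n_keep = min(len(maj), int(round(ratio * len(mino))))
    keep_maj = rng.choice(maj, size=n_keep, replace=False)  # sample without replacement
    idx = np.concatenate([mino, keep_maj])
    rng.shuffle(idx)  # avoid ordering the training set by class
    return X[idx], y[idx]

# Hypothetical imbalanced data: 20 "lightning" cases among 1000 samples
rng = np.random.default_rng(42)
X = rng.normal(size=(1000, 3))
y = np.zeros(1000, dtype=int)
y[:20] = 1
Xb, yb = random_undersample(X, y)
print(np.bincount(yb))
```

The resampling would normally be applied only to the training split, so that evaluation still reflects the real, imbalanced class frequencies.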
We propose a heavy rainfall forecast scheme as an example of meteorological forecasting with selective discretization. The scheme takes input from automatic weather stations and predicts whether the heavy rain criterion will be met within the next three hours. The input variables are preprocessed into a compressed yet informative representation through selective discretization and PCA, and logistic regression uses the preprocessed data to predict whether the heavy rain condition will be satisfied. Selective discretization was applied to continuous variables such as date and temperature, improving predictive performance to a statistically significant degree. In summary, we present effective machine learning techniques for short-range weather forecasting, together with case studies applying them to precipitation type, lightning and heavy rainfall forecasts. For each problem we combine the appropriate techniques, and the resulting prediction models performed well enough to be used in operational forecasting.
Contents:
1 Introduction
  1.1 Machine Learning
    1.1.1 Data Preprocessing
    1.1.2 Classification
  1.2 Meteorological Forecasts
    1.2.1 Precipitation Types
    1.2.2 Lightning
    1.2.3 Heavy Rainfall
  1.3 Main Contributions
  1.4 Organization
2 Dimensional Reduction Techniques
  2.1 Correlation-based Feature Selection
  2.2 Principal Component Analysis
  2.3 Case Study: Precipitation Type Forecast
    2.3.1 Introduction
    2.3.2 Forecast Model
    2.3.3 Experiments
    2.3.4 Discussions
3 Sampling Techniques
  3.1 Undersampling
  3.2 Oversampling
  3.3 Case Study: Lightning Forecast
    3.3.1 Introduction
    3.3.2 Forecast Model
    3.3.3 Experiments
    3.3.4 Discussions
4 Discretization Techniques
  4.1 Selective Discretization
  4.2 Minimum Description Length Discretization
  4.3 Case Study: Heavy Rainfall Forecast
    4.3.1 Introduction
    4.3.2 Early Warning System
    4.3.3 Experiments
    4.3.4 Discussions
5 Conclusions
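The heavy rainfall abstract says selective discretization discretizes only variables with a nonlinear relationship to the dependent variable, but not how that relationship is detected. The sketch below uses a hypothetical criterion: a variable is discretized into equal-frequency bins (one-hot encoded) when a piecewise-constant fit on those bins explains at least `min_gain` more variance (R²) than a straight-line fit; otherwise it stays continuous. It does not implement the Minimum Description Length discretization listed in the contents, and all names, the bin count and the threshold are assumptions.

```python
import numpy as np

def r2(y, y_hat):
    """Coefficient of determination."""
    return 1.0 - np.sum((y - y_hat) ** 2) / np.sum((y - y.mean()) ** 2)

def equal_freq_bins(x, n_bins):
    """Bin indices 0..n_bins-1 using quantile (equal-frequency) edges."""
    edges = np.quantile(x, np.linspace(0.0, 1.0, n_bins + 1)[1:-1])
    return np.digitize(x, edges)

def linear_r2(x, y):
    """R^2 of an ordinary least-squares line y ~ a + b*x."""
    A = np.column_stack([np.ones_like(x), x])
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    return r2(y, A @ coef)

def binned_r2(bins, y):
    """R^2 of a piecewise-constant fit (bin means)."""
    y_hat = np.empty_like(y, dtype=float)
    for b in np.unique(bins):
        y_hat[bins == b] = y[bins == b].mean()
    return r2(y, y_hat)

def selective_discretize(X, y, n_bins=5, min_gain=0.05):
    """HYPOTHETICAL rule: one-hot bin a column only if binning beats a linear fit by min_gain."""
    blocks, discretized = [], []
    for j in range(X.shape[1]):
        x = X[:, j]
        bins = equal_freq_bins(x, n_bins)
        if binned_r2(bins, y) - linear_r2(x, y) > min_gain:
            blocks.append(np.eye(n_bins)[bins])  # one-hot encoding of the bin index
            discretized.append(j)
        else:
            blocks.append(x[:, None])  # keep the variable continuous
    return np.hstack(blocks), discretized

# Hypothetical data: y is linear in column 0 and U-shaped in column 1
rng = np.random.default_rng(1)
n = 1000
x_lin = rng.uniform(-1.0, 1.0, n)
x_u = rng.uniform(-1.0, 1.0, n)
y = x_lin + x_u**2 + 0.05 * rng.normal(size=n)
Z, discretized = selective_discretize(np.column_stack([x_lin, x_u]), y)
print("discretized columns:", discretized, "output shape:", Z.shape)
```

The resulting matrix could then be passed to PCA and logistic regression, as in the heavy rainfall scheme; those steps are omitted here.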

    Data Mining

    Data mining is a branch of computer science that automatically extracts meaningful, useful knowledge and previously unknown, hidden and interesting patterns from large amounts of data to support decision-making. This book presents recent theoretical and practical advances in the field, discussing a number of data mining methods, including classification, clustering and association rule mining. It brings together many successful data mining studies from various areas such as health, banking, education, software engineering, animal science and the environment.

    Coastal high-frequency radars in the Mediterranean – Part 2: Applications in support of science priorities and societal needs

    The Mediterranean Sea is a prominent climate-change hot spot, with many socioeconomically vital coastal areas being the most vulnerable targets in terms of maritime safety, diverse met-ocean hazards and marine pollution. Providing unprecedented spatial and temporal resolution over wide coastal areas, high-frequency radars (HFRs) have steadily gained recognition as an effective land-based remote sensing technology for continuous monitoring of surface circulation, increasingly of waves, and occasionally of winds. HFR measurements have deepened scientific knowledge of coastal processes and fostered a broad range of applications, which has promoted their integration into coastal ocean observing systems worldwide; more than half of the European sites are located in Mediterranean coastal areas. In this work, we review existing multidisciplinary, science-based applications of HFR data in the Mediterranean Sea, focused primarily on meeting end-user and science-driven requirements and addressing regional challenges in three main topics: (i) maritime safety, (ii) extreme hazards and (iii) environmental transport processes. We also analyze the regional HFR observing and monitoring capabilities in Mediterranean coastal areas that are required to underpin the underlying science and the further development of applications. Based on this assessment, we provide a set of recommendations for future improvements that would maximize the extension of science-based HFR products into societally relevant downstream services supporting blue growth in Mediterranean coastal areas, helping to meet the goals of the UN Decade of Ocean Science for Sustainable Development and the EU Green Deal.