MEMPSEP III. A machine learning-oriented multivariate data set for forecasting the Occurrence and Properties of Solar Energetic Particle Events using a Multivariate Ensemble Approach
We introduce a new multivariate data set that utilizes multiple spacecraft
collecting in-situ and remote sensing heliospheric measurements shown to be
linked to physical processes responsible for generating solar energetic
particles (SEPs). Using the Geostationary Operational Environmental Satellites
(GOES) flare event list from Solar Cycle (SC) 23 and part of SC 24 (1998-2013),
we identify 252 solar events (flares) that produce SEPs and 17,542 events that
do not. For each identified event, we acquire the local plasma properties at 1
au, such as energetic proton and electron data, upstream solar wind conditions,
and the interplanetary magnetic field vector quantities using various
instruments onboard GOES and the Advanced Composition Explorer (ACE)
spacecraft. We also collect remote sensing data from instruments onboard the
Solar Dynamics Observatory (SDO), Solar and Heliospheric Observatory (SoHO), and
the Wind solar radio instrument WAVES. The data set is designed to allow for
variations of the inputs and feature sets for machine learning (ML) in
heliophysics and is specifically intended for forecasting the occurrence of SEP
events and their subsequent properties. This paper describes a data set created
from multiple publicly available observation sources that is validated,
cleaned, and carefully curated for our machine-learning pipeline. The data set
has been used to drive the newly developed Multivariate Ensemble of Models for
Probabilistic Forecast of Solar Energetic Particles (MEMPSEP; see MEMPSEP I
(Chatterjee et al., 2023) and MEMPSEP II (Dayeh et al., 2023) for associated
papers).
Detection of malicious content in JSON structured data using multiple concurrent anomaly detection methods
Web applications and Web services often use a data format known as JavaScript Object Notation (JSON) to exchange information. An attacker can tamper with these exchanges to cause the Web service or application to malfunction in a way that harms the interests of its owners. Many such applications or services are involved in safety-critical processes or are vital to business interests. Unfortunately, such critical applications cannot always be relied upon to validate the data sent to them. This creates a need for protection external to the applications themselves. This need has been addressed by researchers in other contexts, but there has been little specific focus on JSON and the use of multiple concurrent anomaly detection methods. Some previously proposed solutions involved the detection of known attack signatures, but this reduces the chance that new attacks will be recognized. To increase the ability to detect newly created attacks, this research focuses on anomaly detection using general characteristics, rather than the recognition of specific attacks. The detection method this research employs is the Random Forest ensemble algorithm. Metrics such as Shannon entropy, n-gram analysis, JSON structure similarity, character string length, and JSON attribute values are utilized. A goal of this research was to detect attacks at a rate better than chance. This goal was met and exceeded: experimental results using simulated attacks showed considerably better performance. Furthermore, a mathematical model of the interaction of classifier configuration parameters was developed.
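To make some of the metrics named above concrete, the following is a minimal sketch of how Shannon entropy, character n-gram counts, and string length might be computed from a JSON message. It is not the implementation used in the research: the function names (`shannon_entropy`, `char_ngrams`, `iter_strings`, `extract_features`), the choice of bigrams, and the decision to pool all string values into one text are assumptions made purely for illustration.

```python
import json
import math
from collections import Counter


def shannon_entropy(text: str) -> float:
    """Shannon entropy of the character distribution, in bits per character."""
    if not text:
        return 0.0
    total = len(text)
    counts = Counter(text)
    # H = sum over symbols of p * log2(1 / p), with p = count / total
    return sum((c / total) * math.log2(total / c) for c in counts.values())


def char_ngrams(text: str, n: int = 2) -> Counter:
    """Counts of overlapping character n-grams (bigrams by default)."""
    return Counter(text[i:i + n] for i in range(len(text) - n + 1))


def iter_strings(node):
    """Yield every string value found in a parsed JSON document."""
    if isinstance(node, str):
        yield node
    elif isinstance(node, dict):
        for value in node.values():
            yield from iter_strings(value)
    elif isinstance(node, list):
        for item in node:
            yield from iter_strings(item)


def extract_features(raw_json: str) -> dict:
    """Hypothetical feature vector for one JSON message (illustrative only)."""
    doc = json.loads(raw_json)
    strings = list(iter_strings(doc))
    joined = "".join(strings)  # assumption: pool all string values together
    return {
        "max_string_length": max((len(s) for s in strings), default=0),
        "entropy_bits": shannon_entropy(joined),
        "distinct_bigrams": len(char_ngrams(joined, 2)),
    }


if __name__ == "__main__":
    print(extract_features('{"user": "alice", "tags": ["admin", "ops"]}'))
```

Feature vectors of this kind could then be passed to a classifier such as the Random Forest ensemble the abstract names; that step, and the JSON structure similarity metric, are left out of this sketch.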