
    Parallel wavelet transform for spatio-temporal outlier detection in large meteorological data

    Abstract. This paper describes a state-of-the-art parallel data mining solution that employs wavelet analysis for scalable outlier detection in large, complex spatio-temporal data. The algorithm has been implemented on a multiprocessor architecture and evaluated on real-world meteorological data. Our solution on a high-performance architecture can process massive and complex spatial data in reasonable time and yields improved prediction

    Predicting large scale fine grain energy consumption

    Today, a large volume of energy-related data is continuously collected. Extracting actionable knowledge from such data is a multi-step process that opens up a variety of interesting and novel research issues across two domains: energy and computer science. The computer science aim is to provide energy scientists with cutting-edge, scalable engines that effectively support their daily research activities. This paper presents SPEC, a scalable and distributed predictor of fine grain energy consumption in buildings. SPEC applies a data stream analysis methodology over a sliding time window to train a prediction model tailored to each building. The building model is then used to predict the upcoming energy consumption at a time instant in the near future. SPEC currently integrates the artificial neural network technique and the random forest regression algorithm. The SPEC methodology exploits the computational advantages of distributed computing frameworks, as the current implementation runs on Spark. As a case study, real thermal energy consumption data collected in a major city were used for a preliminary assessment of SPEC's accuracy. The initial results are promising and represent a first step towards predicting fine grain energy consumption over a sliding time window.

    A taxonomy framework for unsupervised outlier detection techniques for multi-type data sets

    The term "outlier" can generally be defined as an observation that is significantly different from the other values in a data set. Outliers may be instances of error or may indicate events. The task of outlier detection aims at identifying such outliers in order to improve the analysis of data and to discover interesting and useful knowledge about unusual events within numerous application domains. In this paper, we report on contemporary unsupervised outlier detection techniques for multiple types of data sets and provide a comprehensive taxonomy framework and two decision trees to select the most suitable technique based on the data set. Furthermore, we highlight the advantages, disadvantages and performance issues of each class of outlier detection techniques under this taxonomy framework.

    Self-organizing map algorithm for assessing spatial and temporal patterns of pollutants in environmental compartments: A review

    The evaluation of the spatial and temporal distribution of pollutants is crucial for assessing the anthropogenic burden on the environment. Numerous chemometric approaches are available for data exploration, and they have been applied for environmental health assessment purposes. Among the unsupervised methods, the Self-Organizing Map (SOM) is an artificial neural network able to handle non-linear problems that can be used for exploratory data analysis, pattern recognition, and variable relationship assessment. Much more interpretive power is gained when the SOM-based model is combined with clustering algorithms. This review comprises: (i) a description of the algorithm operation principle with a focus on the key parameters used for the SOM initialization; (ii) a description of the SOM output features and how they can be used for data mining; (iii) a list of available software tools for performing calculations; (iv) an overview of the SOM application for obtaining spatial and temporal pollution patterns in the environmental compartments with focus on model training and result visualization; (v) advice on reporting SOM model details in a paper.

    Traffic Prediction using Artificial Intelligence: Review of Recent Advances and Emerging Opportunities

    Traffic prediction plays a crucial role in alleviating traffic congestion which represents a critical problem globally, resulting in negative consequences such as lost hours of additional travel time and increased fuel consumption. Integrating emerging technologies into transportation systems provides opportunities for improving traffic prediction significantly and brings about new research problems. In order to lay the foundation for understanding the open research challenges in traffic prediction, this survey aims to provide a comprehensive overview of traffic prediction methodologies. Specifically, we focus on the recent advances and emerging research opportunities in Artificial Intelligence (AI)-based traffic prediction methods, due to their recent success and potential in traffic prediction, with an emphasis on multivariate traffic time series modeling. We first provide a list and explanation of the various data types and resources used in the literature. Next, the essential data preprocessing methods within the traffic prediction context are categorized, and the prediction methods and applications are subsequently summarized. Lastly, we present primary research challenges in traffic prediction and discuss some directions for future research. Comment: Published in Transportation Research Part C: Emerging Technologies (TR_C), Volume 145, 202

    ENVIRONMENTAL MODEL ACCURACY IMPROVEMENT FRAMEWORK USING STATISTICAL TECHNIQUES AND A NOVEL TRAINING APPROACH

    It is challenging to predict environmental behaviors because of extreme events, such as heatwaves, typhoons, droughts, tsunamis, torrential downpours, wind ramps, or hurricanes. In this thesis, we propose a framework to improve environmental model accuracy with a novel training approach. Extreme event detection algorithms are surveyed, selected, and applied in our proposed framework. The application of statistics in extreme event detection is quite diverse and leads to diverse formulations, each of which needs to be tailored to the specific problem and to the data available in the given situation. This diversity is one of the driving forces of this research towards identifying the most common mixture of components utilized in the analysis of extreme event detection. Besides the extreme event detection algorithm, we also integrated the sliding window approach to see how well our models predict future events. To test the proposed framework, we collected coastal data from various sources; with our approach, the predictive accuracy of various machine learning models improved, with a 20% to 25% increase in R² value. In addition, we organized the discussion by extreme event detection type, presented a few outlier definitions, and briefly introduced their techniques. We also summarized the statistical methods involved in the detection of environmental extremes, such as wind ramps and climatic events.

    Spatial-Temporal Data Mining for Ocean Science: Data, Methodologies, and Opportunities

    With the increasing amount of spatial-temporal (ST) ocean data, numerous spatial-temporal data mining (STDM) studies have been conducted to address various oceanic issues, e.g., climate forecasting and disaster warning. Compared with typical ST data (e.g., traffic data), ST ocean data is more complicated and has some unique characteristics, e.g., diverse regionality and high sparsity. These characteristics make it difficult to design and train STDM models. Unfortunately, an overview of these studies is still missing, hindering computer scientists from identifying the research issues in ocean science while discouraging researchers in ocean science from applying advanced STDM techniques. To remedy this situation, we provide a comprehensive survey to summarize existing STDM studies in ocean science. Concretely, we first summarize the widely-used ST ocean datasets and identify their unique characteristics. Then, typical ST ocean data quality enhancement techniques are discussed. Next, we classify existing STDM studies for the ocean into four types of tasks, i.e., prediction, event detection, pattern mining, and anomaly detection, and elaborate on the techniques for these tasks. Finally, promising research opportunities are highlighted. This survey will help scientists from the fields of both computer science and ocean science gain a better understanding of the fundamental concepts, key techniques, and open challenges of STDM in ocean science.