
    A taxonomy framework for unsupervised outlier detection techniques for multi-type data sets

    The term "outlier" can generally be defined as an observation that is significantly different from the other values in a data set. Outliers may be instances of error or may indicate events. The task of outlier detection aims to identify such outliers in order to improve the analysis of data and to discover interesting and useful knowledge about unusual events in numerous application domains. In this paper, we report on contemporary unsupervised outlier detection techniques for multiple types of data sets and provide a comprehensive taxonomy framework and two decision trees for selecting the most suitable technique based on the data set. Furthermore, we highlight the advantages, disadvantages, and performance issues of each class of outlier detection techniques under this taxonomy framework.
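The abstract does not single out one algorithm. As an illustration of the kind of technique such a taxonomy covers, here is a minimal sketch of a distance-based unsupervised detector, assuming numeric points and Euclidean distance: each point is scored by its distance to its k-th nearest neighbor. The names `knn_outlier_scores` and `top_n_outliers` are hypothetical and not taken from the paper.

```python
import math
from typing import List, Sequence

Point = Sequence[float]


def knn_outlier_scores(points: Sequence[Point], k: int) -> List[float]:
    """Score each point by its Euclidean distance to its k-th nearest neighbor.

    Larger scores mean the point lies farther from the rest of the data.
    """
    if not 1 <= k < len(points):
        raise ValueError("k must satisfy 1 <= k < len(points)")
    scores = []
    for i, p in enumerate(points):
        # Distances from p to every other point, nearest first.
        dists = sorted(math.dist(p, q) for j, q in enumerate(points) if j != i)
        scores.append(dists[k - 1])
    return scores


def top_n_outliers(points: Sequence[Point], k: int, n: int) -> List[int]:
    """Indices of the n points with the largest k-NN distance, most outlying first."""
    scores = knn_outlier_scores(points, k)
    return sorted(range(len(points)), key=lambda i: scores[i], reverse=True)[:n]


if __name__ == "__main__":
    data = [(1.0, 1.0), (1.1, 0.9), (0.9, 1.2), (1.2, 1.1), (8.0, 8.0)]
    print(top_n_outliers(data, k=2, n=1))  # -> [4]
```

Using the k-th rather than the first neighbor keeps two anomalies that sit close to each other from masking one another.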

    Engineering Project Management Modeling Using Artificial Neural Networks

    Evaluating the overall management level of engineering projects is a worthwhile case study. Drawing on the adaptable structure of artificial neural networks (ANNs) and on their self-learning, self-adjustment, and nonlinear mapping (activation function) from inputs to outputs, a performance evaluation model for engineering project management was established. Compared with conventional methods, the influence of human factors is eliminated, which increases the accuracy of the results. Different model structures with different ANN parameters were examined, and the satisfactory results obtained offer a new approach to evaluating engineering project management. Keywords: ANN structure, training rate, training time, activation function, performance evaluation
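The abstract does not give the network architecture, input indicators, or training settings. A minimal sketch, assuming a single hidden layer with sigmoid activations trained by per-sample backpropagation, shows what the "nonlinear mapping" and "training rate" (learning rate) refer to. The four indicator inputs and the weighted-average target are hypothetical stand-ins, not data from the paper.

```python
import math
import random
from typing import List, Sequence, Tuple

Sample = Tuple[List[float], float]


def sigmoid(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))


class TinyMLP:
    """One sigmoid hidden layer and a sigmoid output score in (0, 1)."""

    def __init__(self, n_in: int, n_hidden: int, seed: int = 0) -> None:
        rng = random.Random(seed)
        self.w1 = [[rng.uniform(-0.5, 0.5) for _ in range(n_in)] for _ in range(n_hidden)]
        self.b1 = [0.0] * n_hidden
        self.w2 = [rng.uniform(-0.5, 0.5) for _ in range(n_hidden)]
        self.b2 = 0.0

    def forward(self, x: Sequence[float]) -> Tuple[List[float], float]:
        h = [sigmoid(sum(w * xi for w, xi in zip(row, x)) + b)
             for row, b in zip(self.w1, self.b1)]
        y = sigmoid(sum(w * hj for w, hj in zip(self.w2, h)) + self.b2)
        return h, y

    def train_step(self, x: Sequence[float], target: float, lr: float) -> None:
        """One backpropagation step on the squared error 0.5 * (y - target)**2."""
        h, y = self.forward(x)
        delta_out = (y - target) * y * (1.0 - y)
        # Hidden-layer deltas must use the output weights before they change.
        delta_hidden = [delta_out * w * hj * (1.0 - hj) for w, hj in zip(self.w2, h)]
        self.w2 = [w - lr * delta_out * hj for w, hj in zip(self.w2, h)]
        self.b2 -= lr * delta_out
        for j, dj in enumerate(delta_hidden):
            self.w1[j] = [w - lr * dj * xi for w, xi in zip(self.w1[j], x)]
            self.b1[j] -= lr * dj


def mse(model: TinyMLP, data: Sequence[Sample]) -> float:
    return sum((model.forward(x)[1] - t) ** 2 for x, t in data) / len(data)


def make_data(n: int, seed: int = 1) -> List[Sample]:
    """Hypothetical indicator scores in [0, 1] and a weighted-average target."""
    rng = random.Random(seed)
    weights = [0.4, 0.3, 0.2, 0.1]
    data = []
    for _ in range(n):
        x = [rng.random() for _ in weights]
        data.append((x, sum(w * xi for w, xi in zip(weights, x))))
    return data


if __name__ == "__main__":
    data = make_data(50)
    model = TinyMLP(n_in=4, n_hidden=6)
    before = mse(model, data)
    for _ in range(500):  # "training time" expressed as a number of epochs
        for x, t in data:
            model.train_step(x, t, lr=0.5)
    print(f"MSE before: {before:.4f}, after: {mse(model, data):.4f}")
```

Varying `n_hidden`, `lr`, and the epoch count is the small-scale analogue of the structure and parameter comparisons the abstract mentions.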

    Spatial-Temporal Data Mining for Ocean Science: Data, Methodologies, and Opportunities

    With the increasing amount of spatial-temporal (ST) ocean data, numerous spatial-temporal data mining (STDM) studies have been conducted to address various oceanic issues, e.g., climate forecasting and disaster warning. Compared with typical ST data (e.g., traffic data), ST ocean data is more complicated, with unique characteristics such as diverse regionality and high sparsity. These characteristics make it difficult to design and train STDM models. Unfortunately, an overview of these studies is still missing, which hinders computer scientists from identifying research issues in ocean science and discourages ocean scientists from applying advanced STDM techniques. To remedy this situation, we provide a comprehensive survey that summarizes existing STDM studies for the ocean. Concretely, we first summarize widely used ST ocean datasets and identify their unique characteristics. Then, typical ST ocean data quality enhancement techniques are discussed. Next, we classify existing STDM studies for the ocean into four types of tasks, i.e., prediction, event detection, pattern mining, and anomaly detection, and elaborate on the techniques for these tasks. Finally, promising research opportunities are highlighted. This survey will help scientists in both computer science and ocean science gain a better understanding of the fundamental concepts, key techniques, and open challenges of STDM for the ocean.

    Energy Analytics for Infrastructure: An Application to Institutional Buildings

    Commercial buildings in the United States account for 19% of total annual energy consumption. The Commercial Building Energy Consumption Survey (CBECS), which serves as the benchmark for all commercial buildings, provides critical input for EnergyStar models. Smart energy management technologies, sensors, innovative demand response programs, and updated versions of certification programs increase the opportunity to mitigate energy-related problems (blackouts and overproduction) and guide energy managers in optimizing consumption characteristics. With increasing advancements in technologies that rely on 'Big Data,' codes and certification programs such as those of the American Society of Heating, Refrigerating and Air-Conditioning Engineers (ASHRAE) and Leadership in Energy and Environmental Design (LEED) evaluate buildings during the pre-construction phase. These evaluations are mostly carried out with assumed quantitative and qualitative values calculated from energy models such as EnergyPlus and eQUEST. However, energy consumption analysis through Knowledge Discovery in Databases (KDD) is not commonly used by energy managers for complete implementation, creating the need for a better energy analytics framework. The dissertation uses Interval Data (ID) and establishes three frameworks to identify electricity losses, predict electricity consumption, and detect anomalies using data mining, deep learning, and mathematical models. The energy analytics process integrates with computational science and addresses the following objectives: 1. Develop a framework to identify both technical and non-technical losses using clustering and semi-supervised learning techniques. 2. Develop an integrated framework to predict electricity consumption using a wavelet-based data transformation model and deep learning algorithms. 3. Develop a framework to detect anomalies using ensemble empirical mode decomposition and isolation forest algorithms.
Building on a thorough research background, the first phase performs data analytics on the demand-supply database to determine the potential for reducing energy losses. The second phase, covering data preprocessing and electricity prediction, integrates mathematical models and deep learning algorithms to accurately predict consumption. The third phase employs a data decomposition model and data mining techniques to detect anomalies in institutional buildings. Dissertation/Thesis. Doctoral Dissertation, Civil, Environmental and Sustainable Engineering, 201
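Objective 3 names isolation forest as the anomaly detector. The following is a minimal from-scratch sketch of the standard isolation forest idea (random axis-aligned splits; points isolated after fewer splits score closer to 1), assuming the readings are already numeric feature tuples. The ensemble empirical mode decomposition step from the dissertation is not reproduced, the function and parameter names are hypothetical, and in practice a library implementation such as scikit-learn's `IsolationForest` would normally be used instead.

```python
import math
import random
from dataclasses import dataclass
from typing import List, Optional, Sequence

Point = Sequence[float]
EULER_GAMMA = 0.5772156649


def avg_path_length(n: int) -> float:
    """c(n): average path length of an unsuccessful search in a random binary tree."""
    if n <= 1:
        return 0.0
    if n == 2:
        return 1.0
    return 2.0 * (math.log(n - 1) + EULER_GAMMA) - 2.0 * (n - 1) / n


@dataclass
class Node:
    size: int                      # number of training points that reached this node
    feature: Optional[int] = None  # None marks a leaf
    threshold: float = 0.0
    left: Optional["Node"] = None
    right: Optional["Node"] = None


def build_tree(points: List[Point], depth: int, max_depth: int, rng: random.Random) -> Node:
    if depth >= max_depth or len(points) <= 1:
        return Node(size=len(points))
    feature = rng.randrange(len(points[0]))
    lo = min(p[feature] for p in points)
    hi = max(p[feature] for p in points)
    if lo == hi:  # cannot split on a constant feature
        return Node(size=len(points))
    threshold = rng.uniform(lo, hi)
    left = [p for p in points if p[feature] < threshold]
    right = [p for p in points if p[feature] >= threshold]
    return Node(len(points), feature, threshold,
                build_tree(left, depth + 1, max_depth, rng),
                build_tree(right, depth + 1, max_depth, rng))


def path_length(node: Node, x: Point) -> float:
    depth = 0
    while node.feature is not None:
        node = node.left if x[node.feature] < node.threshold else node.right
        depth += 1
    # Unsplit leaves contribute the expected remaining depth c(size).
    return depth + avg_path_length(node.size)


def isolation_forest_scores(points: List[Point], n_trees: int = 100,
                            sample_size: int = 256, seed: int = 0) -> List[float]:
    """Anomaly score 2 ** (-E[h(x)] / c(psi)) per point; near 1 means anomalous."""
    if len(points) < 2:
        raise ValueError("need at least two points")
    rng = random.Random(seed)
    psi = min(sample_size, len(points))
    max_depth = math.ceil(math.log2(psi))
    trees = [build_tree(rng.sample(points, psi), 0, max_depth, rng) for _ in range(n_trees)]
    norm = avg_path_length(psi)
    return [2.0 ** (-sum(path_length(t, x) for t in trees) / n_trees / norm) for x in points]


if __name__ == "__main__":
    # Hypothetical daily consumption readings (kWh) with one abnormal day.
    readings = [(10.0 + 0.1 * i,) for i in range(20)] + [(25.0,)]
    scores = isolation_forest_scores(readings, n_trees=100, seed=42)
    print(max(range(len(scores)), key=scores.__getitem__))  # most anomalous day
```

Isolation forest needs no labeled anomalies, which matches the unsupervised setting of interval meter data.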

    Automatic detection of boundary layer height from Doppler lidar using K-means algorithm.

    Atmospheric boundary layer height is a parameter of primary interest for both research and operational purposes. The High Resolution Doppler Lidar from NOAA can measure both backscatter and wind profiles. From these data, this work proposes deriving a single estimate of boundary layer height that remains available when one type of data is unavailable. This is done using the K-means algorithm. The method has been compared with two other methods and is shown to have high success and availability rates.
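The abstract does not detail how the clusters are turned into a height. A minimal sketch, assuming each lidar range gate is described by a feature vector (for example, standardized backscatter and vertical-velocity variance, or only one of them when the other data type is unavailable), clusters the gates of one profile into two groups with plain k-means and takes the top of the contiguous cluster containing the lowest gate as the boundary layer height. The feature choice, the two-cluster setup, and all names are assumptions, not the published method; real features should be standardized so neither dominates the Euclidean distance.

```python
import math
from typing import List, Sequence

Point = Sequence[float]


def kmeans(points: Sequence[Point], k: int, max_iter: int = 100) -> List[int]:
    """Lloyd's k-means; centroids start at k evenly spaced input points."""
    n = len(points)
    init = [round(i * (n - 1) / max(k - 1, 1)) for i in range(k)]
    centroids = [list(points[i]) for i in init]
    labels: List[int] = []
    for _ in range(max_iter):
        # Assignment step: nearest centroid by Euclidean distance.
        new_labels = [min(range(k), key=lambda c: math.dist(p, centroids[c])) for p in points]
        if new_labels == labels:
            break
        labels = new_labels
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # keep the previous centroid if a cluster empties
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels


def boundary_layer_height(altitudes: Sequence[float], features: Sequence[Point]) -> float:
    """Top altitude of the contiguous block of gates sharing the lowest gate's cluster."""
    labels = kmeans(features, k=2)
    mixed_label = labels[0]  # assumption: the lowest gate lies inside the boundary layer
    top = altitudes[0]
    for alt, lab in zip(altitudes, labels):
        if lab != mixed_label:
            break
        top = alt
    return top


if __name__ == "__main__":
    # Hypothetical profile: aerosol-rich, turbulent gates up to 800 m, clean air above.
    altitudes = [100.0 * i for i in range(1, 16)]
    both = [(5.0, 0.8) if a <= 800 else (1.0, 0.1) for a in altitudes]
    backscatter_only = [(f[0],) for f in both]  # wind data unavailable
    print(boundary_layer_height(altitudes, both))              # -> 800.0
    print(boundary_layer_height(altitudes, backscatter_only))  # -> 800.0
```

Because the clustering accepts feature vectors of any length, the same code still yields an estimate when only one data type is present, which mirrors the availability goal stated in the abstract.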

    Solar Power System Planning & Design

    Photovoltaic (PV) and concentrated solar power (CSP) systems for the conversion of solar energy into electricity are technologically robust, scalable, and geographically dispersed, and they possess enormous potential as sustainable energy sources. Systematic planning and design considering various factors and constraints are necessary for the successful deployment of PV and CSP systems. This book on solar power system planning and design includes 14 publications from esteemed research groups worldwide. The research and review papers in this Special Issue fall within the following broad categories: resource assessments, site evaluations, system design, performance assessments, and feasibility studies.

    Multi-signal Anomaly Detection for Real-Time Embedded Systems

    This thesis presents MuSADET, an anomaly detection framework targeting timing anomalies found in event traces from real-time embedded systems. The method leverages stationary event generators, signal processing, and distance metrics to classify inter-arrival time sequences as normal or anomalous. Experimental evaluation on traces collected from two real-time embedded systems provides empirical evidence of MuSADET’s anomaly detection performance. MuSADET is appropriate for embedded systems, where many event generators are intrinsically recurrent and generate stationary sequences of timestamps. To find timing anomalies, MuSADET compares the frequency-domain features of an unknown trace to a normal model trained from well-behaved executions of the system. Each signal in the analysis trace receives a normal/anomalous score, which can help engineers isolate the source of the anomaly. Anomaly detection on traces collected from an industry-grade hexacopter and from the Controller Area Network (CAN) bus of a real vehicle demonstrates the feasibility of the proposed method. In all case studies, anomaly detection did not require an anomaly model while achieving high detection rates. In some of the studied scenarios, the true-positive rate exceeds 99%, with false-positive rates below 1%. Visualization of the classification scores shows that some timing anomalies can propagate to multiple signals within the system. Comparison with a similar method, Signal Processing for Trace Analysis (SiPTA), indicates that MuSADET achieves better detection performance and provides complementary information that can help link anomalies to the process where they occurred.
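MuSADET's exact features and distance metric are not given in the abstract. The following minimal sketch only illustrates the general idea it describes, under stated assumptions: a signal's inter-arrival times are mapped to DFT magnitude features, the normal model is the mean spectrum of well-behaved traces, and a trace's score is its Euclidean distance from that model. All names are hypothetical, traces are assumed to have equal length, and any decision threshold would have to be calibrated on normal data.

```python
import cmath
import math
from itertools import accumulate
from typing import List, Sequence


def inter_arrival_times(timestamps: Sequence[float]) -> List[float]:
    return [b - a for a, b in zip(timestamps, timestamps[1:])]


def spectrum(signal: Sequence[float], n_bins: int) -> List[float]:
    """Magnitudes of the first n_bins DFT coefficients of the mean-removed signal."""
    n = len(signal)
    mean = sum(signal) / n
    centered = [x - mean for x in signal]
    return [
        abs(sum(x * cmath.exp(-2j * math.pi * k * t / n) for t, x in enumerate(centered))) / n
        for k in range(n_bins)
    ]


def train_normal_model(traces: Sequence[Sequence[float]], n_bins: int) -> List[float]:
    """Average spectrum of inter-arrival times over well-behaved traces."""
    spectra = [spectrum(inter_arrival_times(tr), n_bins) for tr in traces]
    return [sum(col) / len(spectra) for col in zip(*spectra)]


def anomaly_score(trace: Sequence[float], model: Sequence[float], n_bins: int) -> float:
    """Euclidean distance between a trace's spectrum and the normal model."""
    return math.dist(spectrum(inter_arrival_times(trace), n_bins), model)


def timestamps_from_gaps(gaps: Sequence[float]) -> List[float]:
    """Rebuild event timestamps (starting at 0) from inter-arrival gaps."""
    return list(accumulate([0.0, *gaps]))


if __name__ == "__main__":
    normal_gaps = [9.0, 11.0, 10.0, 10.0] * 16  # hypothetical periodic task, 64 gaps
    delayed_gaps = list(normal_gaps)
    delayed_gaps[20] = 30.0                      # one unusually long gap
    model = train_normal_model([timestamps_from_gaps(normal_gaps)], n_bins=33)
    print(anomaly_score(timestamps_from_gaps(normal_gaps), model, 33))
    print(anomaly_score(timestamps_from_gaps(delayed_gaps), model, 33))
```

Scoring each signal separately in this way is what allows a per-signal normal/anomalous verdict, the property the abstract credits with helping engineers locate the source of an anomaly.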

    Using spatial outliers detection to assess balancing mechanisms in bike sharing systems

    Spatial outliers are objects whose behavior differs significantly from that of their spatial neighbors, in a context where neighbors are heavily correlated. The Moran scatterplot is a well-known method that exploits the similarity between neighbors to detect spatial outliers. In this paper, we first propose an improved version of the Moran scatterplot using a robust distance metric called Gower's similarity. We used the new version of the Moran scatterplot to study the homogeneity of the Parisian bike sharing system (Velib). We carried out different experiments on a real dataset from the Velib system. We identified many spatial outlier stations that are very different from their neighboring stations (often with many more available bikes or many more empty docks during the day). Then, we designed and tested a new method that globally improves the distribution of resources (bikes and docks) among bike stations. This method is motivated by the existence of spatial outlier stations. It relies on a small local change in user behavior: users adapt their trips to resource availability around their departure and arrival stations. Results show that, even with partial user collaboration, the proposed method significantly enhances the global homogeneity of the bike sharing system and therefore user satisfaction.
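The classic Moran scatterplot that the paper starts from can be sketched as follows, assuming values are standardized to z-scores and the spatial lag is the plain average of each station's neighbors (row-standardized binary weights). Locations in the high-low (HL) and low-high (LH) quadrants are flagged as spatial outliers. The paper's improved variant based on Gower's similarity is not reproduced here, and the station data and function names are hypothetical.

```python
import math
from typing import List, Sequence


def moran_quadrants(values: Sequence[float], neighbors: Sequence[Sequence[int]]) -> List[str]:
    """Quadrant of each location in the Moran scatterplot (z-score vs. spatial lag)."""
    n = len(values)
    mean = sum(values) / n
    std = math.sqrt(sum((v - mean) ** 2 for v in values) / n)
    if std == 0:
        raise ValueError("values are constant; z-scores are undefined")
    z = [(v - mean) / std for v in values]
    quadrants = []
    for i in range(n):
        # Row-standardized weights: the spatial lag is the mean z-score of the neighbors.
        lag = sum(z[j] for j in neighbors[i]) / len(neighbors[i])
        if z[i] >= 0:
            quadrants.append("HH" if lag >= 0 else "HL")
        else:
            quadrants.append("LH" if lag >= 0 else "LL")
    return quadrants


def spatial_outliers(values: Sequence[float], neighbors: Sequence[Sequence[int]]) -> List[int]:
    """Indices in the HL or LH quadrants, i.e. unlike their neighborhood."""
    quads = moran_quadrants(values, neighbors)
    return [i for i, q in enumerate(quads) if q in ("HL", "LH")]


if __name__ == "__main__":
    # Hypothetical available bikes at six stations along a street.
    bikes = [2, 3, 2, 15, 3, 2]
    nbrs = [[1], [0, 2], [1, 3], [2, 4], [3, 5], [4]]
    print(moran_quadrants(bikes, nbrs))   # -> ['LL', 'LL', 'LH', 'HL', 'LH', 'LL']
    print(spatial_outliers(bikes, nbrs))  # -> [2, 3, 4]
```

Stations 2 and 4 are flagged LH only because they border the very full station 3, which shows how a single extreme value can also mark its neighbors in the plain scatterplot.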