A taxonomy framework for unsupervised outlier detection techniques for multi-type data sets
The term "outlier" can generally be defined as an observation that is significantly different from the other values in a data set. Outliers may be instances of error, or they may indicate events of interest. The task of outlier detection is to identify such outliers in order to improve the analysis of data and to discover interesting and useful knowledge about unusual events in numerous application domains. In this paper, we report on contemporary unsupervised outlier detection techniques for multiple types of data sets and provide a comprehensive taxonomy framework and two decision trees for selecting the most suitable technique for a given data set. Furthermore, we highlight the advantages, disadvantages, and performance issues of each class of outlier detection techniques under this taxonomy framework.
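The abstract does not name individual techniques. As one illustration of the distance-based class that outlier taxonomies commonly include, here is a minimal sketch that scores each point by its mean Euclidean distance to its k nearest neighbours, so that isolated points receive the largest scores. The function names and the toy data are hypothetical and are not taken from the paper.

```python
import math


def knn_outlier_scores(points, k=2):
    """Mean Euclidean distance from each point to its k nearest neighbours.

    Larger scores mean the point lies farther from the rest of the data.
    """
    if not 1 <= k < len(points):
        raise ValueError("k must be between 1 and len(points) - 1")
    scores = []
    for i, p in enumerate(points):
        # Distances from p to every other point, excluding p itself.
        dists = sorted(math.dist(p, q) for j, q in enumerate(points) if j != i)
        scores.append(sum(dists[:k]) / k)
    return scores


def top_outliers(points, k=2, n=1):
    """Indices of the n points with the highest k-NN outlier scores."""
    scores = knn_outlier_scores(points, k)
    ranked = sorted(range(len(points)), key=lambda i: scores[i], reverse=True)
    return ranked[:n]


# Hypothetical 2-D data: four clustered points and one distant point.
data = [(1.0, 1.0), (1.1, 0.9), (0.9, 1.2), (1.2, 1.1), (8.0, 8.0)]
print(top_outliers(data, k=2, n=1))  # -> [4]
```

Because it needs no labels, this scoring is unsupervised; its cost grows quadratically with the number of points, which is one of the performance issues such surveys weigh against other classes.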
Engineering Project Management Modeling Using Artificial Neural Networks
Performance evaluation of the comprehensive management level of engineering projects is a worthwhile subject of study. Benefiting from the constructive and adaptive nature of artificial neural networks (ANN), and based on their self-learning, self-adjustment, and nonlinear mapping (through the activation function) from inputs to outputs, a performance evaluation model of engineering project management was established. Compared with conventional methods, the model eliminates the influence of human factors and thus increases the accuracy of the results. Different model structures with different ANN parameters were discussed, and the satisfactory results obtained provide a new approach to evaluating engineering project management. Keywords: ANN structure, training rate, training time, activation function, performance evaluation
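The abstract describes an ANN that learns a nonlinear mapping from management indicators to a performance score, with the training rate and activation function as tunable parameters. The sketch below shows that idea with a hypothetical one-hidden-layer network, sigmoid activations, and plain gradient descent; the indicator names, data, layer sizes, and training rate are all assumptions, not the paper's model.

```python
import math
import random


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


class TinyMLP:
    """Hypothetical one-hidden-layer network: sigmoid units, squared-error loss, SGD."""

    def __init__(self, n_in, n_hidden, seed=0):
        rng = random.Random(seed)
        self.w1 = [[rng.uniform(-1.0, 1.0) for _ in range(n_in)] for _ in range(n_hidden)]
        self.b1 = [0.0] * n_hidden
        self.w2 = [rng.uniform(-1.0, 1.0) for _ in range(n_hidden)]
        self.b2 = 0.0

    def forward(self, x):
        # Nonlinear mapping: inputs -> hidden layer -> output, through the sigmoid.
        h = [sigmoid(sum(w * xi for w, xi in zip(row, x)) + b)
             for row, b in zip(self.w1, self.b1)]
        y = sigmoid(sum(w * hj for w, hj in zip(self.w2, h)) + self.b2)
        return h, y

    def train_step(self, x, target, lr):
        h, y = self.forward(x)
        d_out = (y - target) * y * (1.0 - y)  # dL/dz_out for L = 0.5 * (y - target)**2
        for j, hj in enumerate(h):
            d_hid = d_out * self.w2[j] * hj * (1.0 - hj)  # uses w2[j] before its update
            self.w2[j] -= lr * d_out * hj
            for i, xi in enumerate(x):
                self.w1[j][i] -= lr * d_hid * xi
            self.b1[j] -= lr * d_hid
        self.b2 -= lr * d_out


def mean_loss(model, data):
    return sum(0.5 * (model.forward(x)[1] - t) ** 2 for x, t in data) / len(data)


# Hypothetical indicator scores (e.g. cost, schedule, quality) with an overall rating.
data = [((0.9, 0.8, 0.7), 0.80), ((0.2, 0.3, 0.4), 0.30), ((0.6, 0.5, 0.9), 0.67),
        ((0.1, 0.9, 0.5), 0.50), ((0.4, 0.4, 0.4), 0.40)]

model = TinyMLP(n_in=3, n_hidden=4, seed=1)
before = mean_loss(model, data)
for epoch in range(2000):
    for x, t in data:
        model.train_step(x, t, lr=0.5)  # lr plays the role of the "training rate"
after = mean_loss(model, data)
print(f"loss before {before:.4f}, after {after:.4f}")
```

Changing `n_hidden`, `lr`, or the number of epochs corresponds loosely to the structure, training-rate, and training-time comparisons listed in the keywords.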
Spatial-Temporal Data Mining for Ocean Science: Data, Methodologies, and Opportunities
With the increasing amount of spatial-temporal (ST) ocean data, numerous spatial-temporal data mining (STDM) studies have been conducted to address various oceanic issues, e.g., climate forecasting and disaster warning. Compared with typical ST data (e.g., traffic data), ST ocean data is more complicated, with unique characteristics such as diverse regionality and high sparsity. These characteristics make it difficult to design and train STDM models. Unfortunately, an overview of these studies is still missing, which hinders computer scientists from identifying research issues in ocean science and discourages ocean scientists from applying advanced STDM techniques. To remedy this situation, we provide a comprehensive survey of existing STDM studies in ocean science. Concretely, we first summarize the widely used ST ocean datasets and identify their unique characteristics. Then, typical ST ocean data quality enhancement techniques are discussed. Next, we classify existing STDM studies for ocean science into four types of tasks, i.e., prediction, event detection, pattern mining, and anomaly detection, and elaborate on the techniques for these tasks. Finally, promising research opportunities are highlighted. This survey will help scientists in both computer science and ocean science gain a better understanding of the fundamental concepts, key techniques, and open challenges of STDM in ocean science.
Energy Analytics for Infrastructure: An Application to Institutional Buildings
Commercial buildings in the United States account for 19% of total annual energy consumption. The Commercial Buildings Energy Consumption Survey (CBECS), which serves as the benchmark for all commercial buildings, provides critical input for ENERGY STAR models. Smart energy management technologies, sensors, innovative demand response programs, and updated versions of certification programs create opportunities to mitigate energy-related problems (blackouts and overproduction) and help energy managers optimize consumption characteristics. With increasing advancements in technologies relying on 'Big Data', codes and certification programs such as those of the American Society of Heating, Refrigerating and Air-Conditioning Engineers (ASHRAE) and Leadership in Energy and Environmental Design (LEED) perform their evaluations during the pre-construction phase. These evaluations are mostly carried out with assumed quantitative and qualitative values calculated from energy models such as EnergyPlus and eQUEST. However, energy consumption analysis through Knowledge Discovery in Databases (KDD) is not commonly used by energy managers for complete implementation, which creates the need for a better energy analytics framework.
The dissertation uses Interval Data (ID) and establishes three frameworks to identify electricity losses, predict electricity consumption, and detect anomalies using data mining, deep learning, and mathematical models. The energy analytics process is integrated with computational science and pursues the following objectives:
1. Develop a framework to identify both technical and non-technical losses using clustering and semi-supervised learning techniques.
2. Develop an integrated framework to predict electricity consumption using a wavelet-based data transformation model and deep learning algorithms.
3. Develop a framework to detect anomalies using ensemble empirical mode decomposition and isolation forest algorithms.
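Objective 3 pairs ensemble empirical mode decomposition (EEMD) with the isolation forest algorithm. The sketch below covers only the isolation-forest step, simplified to one-dimensional readings in pure Python: random axis splits isolate each value, and values isolated at shallow depth score close to 1. The EEMD stage is not reproduced, and the function names, parameters, and readings are hypothetical.

```python
import math
import random

EULER_GAMMA = 0.5772156649


def c(n):
    """Average path length of an unsuccessful BST search; normalises isolation depths."""
    if n <= 1:
        return 0.0
    if n == 2:
        return 1.0
    return 2.0 * (math.log(n - 1) + EULER_GAMMA) - 2.0 * (n - 1) / n


def build_tree(values, depth, max_depth, rng):
    # Stop at the depth limit or when the node can no longer be split.
    if depth >= max_depth or len(values) <= 1 or min(values) == max(values):
        return ("leaf", len(values))
    split = rng.uniform(min(values), max(values))
    left = [v for v in values if v < split]
    right = [v for v in values if v >= split]
    return ("node", split,
            build_tree(left, depth + 1, max_depth, rng),
            build_tree(right, depth + 1, max_depth, rng))


def path_length(x, tree, depth=0):
    if tree[0] == "leaf":
        # Unresolved leaves add the expected remaining depth c(size).
        return depth + c(tree[1])
    _, split, left, right = tree
    return path_length(x, left if x < split else right, depth + 1)


def isolation_forest_scores(values, n_trees=100, sample_size=256, seed=0):
    """Anomaly score in (0, 1] per value; scores near 1 suggest an anomaly."""
    if len(values) < 2:
        raise ValueError("need at least two observations")
    rng = random.Random(seed)
    psi = min(sample_size, len(values))
    max_depth = math.ceil(math.log2(psi))
    trees = [build_tree(rng.sample(values, psi), 0, max_depth, rng) for _ in range(n_trees)]
    scores = []
    for x in values:
        mean_depth = sum(path_length(x, t) for t in trees) / n_trees
        scores.append(2.0 ** (-mean_depth / c(psi)))
    return scores


# Hypothetical hourly electricity readings (kWh) with one spike.
readings = [50.0, 52.0, 49.0, 51.0, 50.0, 48.0, 53.0, 51.0, 50.0, 49.0, 120.0]
scores = isolation_forest_scores(readings)
print(max(range(len(readings)), key=scores.__getitem__))  # -> 10
```

In a pipeline like the one described, the decomposition would first split the load signal into components, and a detector of this kind would then run on the residual or on selected components.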
Building on a thorough research background, the first phase performs data analytics on the demand-supply database to determine the potential for reducing energy losses. In the second phase, the data preprocessing and electricity prediction framework integrates mathematical models and deep learning algorithms to accurately predict consumption. The third phase employs a data decomposition model and data mining techniques to detect anomalies in institutional buildings.
Dissertation/Thesis: Doctoral Dissertation, Civil, Environmental and Sustainable Engineering, 201
Automatic detection of boundary layer height from Doppler lidar using K-means algorithm.
Atmospheric boundary layer height is a parameter of primary interest for both research and operational applications. The High Resolution Doppler Lidar from NOAA can measure both backscatter and wind profiles. From these data, this work proposes to derive a single estimate of boundary layer height that can cope with the unavailability of one type of data. This is done using the K-means algorithm. The method was compared with two other methods and shown to have high success and availability rates.
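To illustrate how K-means can turn a lidar profile into a height estimate, here is a minimal sketch under a simplifying assumption that is not from the paper: only a backscatter profile is used, gates are clustered into two groups (aerosol-laden boundary layer versus cleaner free troposphere), and the height is taken as the top of the lowest contiguous run of gates in the high-backscatter cluster. The function names, decision rule, and profile values are hypothetical.

```python
def kmeans_1d(values, k=2, iters=100):
    """Plain 1-D k-means (Lloyd's algorithm) with deterministic spread-out initial centroids."""
    if not 2 <= k <= len(values):
        raise ValueError("need 2 <= k <= len(values)")
    s = sorted(values)
    centroids = [s[round(i * (len(s) - 1) / (k - 1))] for i in range(k)]

    def assign(cents):
        # Label each value with the index of its nearest centroid.
        return [min(range(k), key=lambda j: abs(v - cents[j])) for v in values]

    for _ in range(iters):
        labels = assign(centroids)
        new = []
        for j in range(k):
            members = [v for v, lab in zip(values, labels) if lab == j]
            new.append(sum(members) / len(members) if members else centroids[j])
        if new == centroids:
            break
        centroids = new
    return assign(centroids), centroids


def boundary_layer_height(heights, backscatter):
    """Hypothetical rule: top of the lowest contiguous run of high-backscatter gates.

    Returns None if the lowest gate is not in the high-backscatter cluster.
    """
    labels, centroids = kmeans_1d(backscatter, k=2)
    bl_cluster = max(range(2), key=lambda j: centroids[j])
    top = None
    for h, lab in zip(heights, labels):
        if lab != bl_cluster:
            break
        top = h
    return top


heights = [100, 200, 300, 400, 500, 600, 700, 800, 900, 1000]    # m above ground
backscatter = [9.0, 8.8, 9.1, 8.7, 8.9, 8.5, 2.1, 1.9, 2.0, 1.8]  # arbitrary units
print(boundary_layer_height(heights, backscatter))  # -> 600
```

The paper's point that the estimate survives the loss of one data type suggests clustering on several features at once (for example backscatter and wind-derived quantities); this one-feature version only shows the clustering-to-height step.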
Solar Power System Planning & Design
Photovoltaic (PV) and concentrated solar power (CSP) systems for the conversion of solar energy into electricity are technologically robust, scalable, and geographically dispersed, and they possess enormous potential as sustainable energy sources. Systematic planning and design that considers various factors and constraints is necessary for the successful deployment of PV and CSP systems. This book on solar power system planning and design includes 14 publications from esteemed research groups worldwide. The research and review papers in this Special Issue fall within the following broad categories: resource assessments, site evaluations, system design, performance assessments, and feasibility studies.
Multi-signal Anomaly Detection for Real-Time Embedded Systems
This thesis presents MuSADET, an anomaly detection framework targeting timing anomalies found in event traces from real-time embedded systems. The method leverages stationary event generators, signal processing, and distance metrics to classify inter-arrival time sequences as normal/anomalous. Experimental evaluation of traces collected from two real-time embedded systems provides empirical evidence of MuSADET’s anomaly detection performance.
MuSADET is appropriate for embedded systems, where many event generators are intrinsically recurrent and generate stationary sequences of timestamps. To find timing anomalies, MuSADET compares the frequency-domain features of an unknown trace to a normal model trained from well-behaved executions of the system. Each signal in the analysis trace receives a normal/anomalous score, which can help engineers isolate the source of the anomaly.
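The paragraph above describes the core idea: turn the inter-arrival times of a recurrent event into frequency-domain features, build a normal model from well-behaved runs, and flag traces that lie too far from it. The sketch below is a simplified stand-in, not MuSADET itself: it uses DFT magnitudes as the features, the mean spectrum of normal traces as the model, Euclidean distance as the metric, and a hypothetical threshold margin of 1.5. All names and timestamps are illustrative assumptions.

```python
import cmath
import math


def inter_arrival(timestamps):
    """Differences between consecutive event timestamps."""
    return [b - a for a, b in zip(timestamps, timestamps[1:])]


def magnitude_spectrum(seq):
    """Naive O(n^2) DFT magnitudes of the mean-removed sequence (bins 0..n//2)."""
    n = len(seq)
    mean = sum(seq) / n
    centred = [x - mean for x in seq]
    return [abs(sum(centred[t] * cmath.exp(-2j * math.pi * f * t / n) for t in range(n))) / n
            for f in range(n // 2 + 1)]


def train_normal_model(normal_traces, margin=1.5):
    """Mean spectrum of well-behaved traces plus a hypothetical distance threshold."""
    spectra = [magnitude_spectrum(inter_arrival(tr)) for tr in normal_traces]
    model = [sum(col) / len(col) for col in zip(*spectra)]
    threshold = margin * max(math.dist(s, model) for s in spectra)
    return model, threshold


def classify(trace, model, threshold):
    """Label a trace (same length as the training traces) and return its distance score."""
    score = math.dist(magnitude_spectrum(inter_arrival(trace)), model)
    return ("anomalous" if score > threshold else "normal"), score


# Hypothetical 10 ms periodic task with small jitter; every trace has 33 events.
normal = [[10.0 * k + 0.05 * ((k + s) % 3) for k in range(33)] for s in range(3)]
model, threshold = train_normal_model(normal)
delayed = [10.0 * k + (5.0 if k % 4 == 2 else 0.0) for k in range(33)]  # every 4th event late
print(classify(normal[0], model, threshold)[0])  # -> normal
print(classify(delayed, model, threshold)[0])    # -> anomalous
```

Running one such model per signal (event generator) yields the per-signal normal/anomalous scores the abstract mentions, which is what lets an engineer see which signals an anomaly has reached.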
Empirical evidence from anomaly detection performed on traces collected from an industry-grade hexacopter and from the Controller Area Network (CAN) bus deployed in a real vehicle demonstrates the feasibility of the proposed method. In all case studies, anomaly detection did not require an anomaly model while achieving high detection rates. For some of the studied scenarios, the true-positive detection rate exceeds 99%, with false-positive rates below 1%. The visualization of classification scores shows that some timing anomalies can propagate to multiple signals within the system. Comparison with a similar method, Signal Processing for Trace Analysis (SiPTA), indicates that MuSADET achieves better detection performance and provides complementary information that can help link anomalies to the process where they occurred.
Using spatial outliers detection to assess balancing mechanisms in bike sharing systems
Spatial outliers are objects whose behavior differs significantly from that of their spatial neighbors, in a context where neighbors are heavily correlated. The Moran scatterplot is a well-known method that exploits the similarity between neighbors in order to detect spatial outliers. In this paper, we first proposed an improved version of the Moran scatterplot that uses a robust distance metric called Gower's similarity. We used the new version of the Moran scatterplot to study the homogeneity of the Parisian bike sharing system (Velib). We carried out different experiments on a real dataset from the Velib system. We identified many spatial outlier stations that are very different from their neighboring stations (often with many more available bikes or many more empty docks during the day). Then, we designed and tested a new method that globally improves the distribution of resources (bikes and docks) among bike stations. This method is motivated by the existence of spatial outlier stations. It relies on a small local change in users' behavior: users adapt their trips to resource availability around their departure and arrival stations. Results show that, even with only partial user collaboration, the proposed method significantly enhances the global homogeneity of the bike sharing system and therefore users' satisfaction.
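For readers unfamiliar with the Moran scatterplot, here is a minimal sketch of the standard version the paper starts from: each station's value is standardized (z), plotted against the average z of its neighbours (the spatial lag), and stations in the high-low (HL) or low-high (LH) quadrants disagree with their neighbourhood, making them spatial outlier candidates. The paper's Gower-similarity variant is not reproduced here; the station values, neighbour lists, and function names are hypothetical.

```python
import statistics


def moran_scatterplot(values, neighbours):
    """Standard Moran scatterplot: z-scores vs. row-standardised spatial lag.

    neighbours[i] lists the indices of station i's neighbours (must be non-empty).
    """
    mean = statistics.fmean(values)
    sd = statistics.pstdev(values)
    z = [(v - mean) / sd for v in values]
    # Equal weights 1/|N(i)|: the lag is the average z-score of the neighbours.
    lag = [statistics.fmean([z[j] for j in nbrs]) for nbrs in neighbours]
    # First letter: own value vs. mean; second letter: neighbourhood vs. mean.
    quadrants = [("H" if zi >= 0 else "L") + ("H" if li >= 0 else "L")
                 for zi, li in zip(z, lag)]
    return z, lag, quadrants


def spatial_outlier_candidates(quadrants):
    """Stations in the HL or LH quadrants disagree with their neighbourhood."""
    return [i for i, q in enumerate(quadrants) if q in ("HL", "LH")]


# Hypothetical average available bikes at six stations and their neighbour lists.
bikes = [3, 4, 20, 2, 5, 3]
nbrs = [[1, 2], [0, 2], [0, 1, 3], [2, 4], [3, 5], [4]]
z, lag, quad = moran_scatterplot(bikes, nbrs)
print(quad)  # -> ['LH', 'LH', 'HL', 'LH', 'LL', 'LL']
print(spatial_outlier_candidates(quad))  # -> [0, 1, 2, 3]
```

Only station 2 is a genuine outlier here, yet its neighbours 0, 1, and 3 are also pulled into the LH quadrant by its high value. Candidates from the plain scatterplot therefore still need inspection, which is the kind of weakness a more robust similarity measure aims to reduce.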
- …