Temporal Feature Selection with Symbolic Regression
Building and discovering useful features when constructing machine learning models is the central task for the machine learning practitioner. Good features are useful not only for increasing the predictive power of a model but also for illuminating the underlying drivers of a target variable. In this research we propose a novel feature learning technique in which symbolic regression is endowed with a "Range Terminal" that allows it to explore functions of the aggregate of variables over time. We test the Range Terminal on a synthetic data set and a real-world data set in which we predict seasonal greenness using satellite-derived temperature and snow data over a portion of the Arctic. On the synthetic data set we find symbolic regression with the Range Terminal outperforms standard symbolic regression and Lasso regression. On the Arctic data set we find it outperforms standard symbolic regression and fails to beat Lasso regression, but finds useful features describing the interaction between Land Surface Temperature, snow, and seasonal vegetative growth in the Arctic.
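To illustrate the idea, a range terminal can be thought of as a primitive that aggregates one variable over a time window, which the evolved expressions can then combine. A minimal sketch in Python — the function name, window bounds, and toy temperature series are hypothetical illustrations, not taken from the paper:

```python
import numpy as np

def range_terminal(series, start, end, agg=np.mean):
    """Aggregate a variable over the half-open time window [start, end).

    In a symbolic regression tree, a node like this lets the search
    explore features such as "mean temperature over months 4-7".
    """
    return agg(series[start:end])

# Hypothetical monthly land-surface temperatures (12 time steps)
temps = np.array([-20., -18., -10., -2., 5., 12., 15., 13., 6., -3., -12., -17.])

# A candidate feature: mean temperature over the summer window
summer_mean = range_terminal(temps, 4, 8, np.mean)
```

The same primitive with `np.max` or `np.sum` yields other aggregate features, which is what lets the search explore functions of aggregates over time rather than single time steps.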
Modifying the Symbolic Aggregate Approximation Method to Capture Segment Trend Information
The Symbolic Aggregate approXimation (SAX) is a very popular symbolic
dimensionality reduction technique of time series data, as it has several
advantages over other dimensionality reduction techniques. One of its major
advantages is its efficiency, as it uses precomputed distances. The other main
advantage is that in SAX the distance measure defined on the reduced space
lower bounds the distance measure defined on the original space. This enables
SAX to return exact results in query-by-content tasks. Yet SAX has an inherent
drawback, which is its inability to capture segment trend information. Several
researchers have attempted to enhance SAX by proposing modifications to include
trend information. However, this comes at the expense of giving up on one or
more of the advantages of SAX. In this paper we investigate three modifications
of SAX to add trend capturing ability to it. These modifications retain the
same features of SAX in terms of simplicity, efficiency, as well as the exact
results it returns. They are simple procedures based on a different
segmentation of the time series than that used in classic-SAX. We test the
performance of these three modifications on 45 time series datasets of
different sizes, dimensions, and nature, on a classification task and we
compare it to that of classic-SAX. The results we obtained show that one of
these modifications manages to outperform classic-SAX and that another one
gives slightly better results than classic-SAX.

Comment: International Conference on Modeling Decisions for Artificial Intelligence (MDAI 2020), pp 230-23
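For reference, classic SAX z-normalizes the series, reduces it with piecewise aggregate approximation (PAA), and maps each segment mean to a symbol using breakpoints that divide the standard normal into equiprobable regions. A minimal sketch for a four-letter alphabet — the function name and input series are illustrative, and the trend-capturing modifications studied in the paper are not shown:

```python
import numpy as np

def sax(series, n_segments, alphabet="abcd"):
    """Classic SAX sketch: z-normalize, PAA, then symbolize segment means."""
    x = np.asarray(series, dtype=float)
    x = (x - x.mean()) / x.std()              # z-normalization
    segments = np.array_split(x, n_segments)  # equal-width segmentation
    paa = np.array([s.mean() for s in segments])
    # Breakpoints splitting N(0,1) into 4 equiprobable regions
    breakpoints = [-0.6745, 0.0, 0.6745]
    idx = np.searchsorted(breakpoints, paa)
    return "".join(alphabet[i] for i in idx)
```

Because each symbol is only a segment *mean*, a rising segment and a falling segment with the same mean map to the same letter — precisely the lost trend information the paper's modified segmentations aim to recover.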
A data analytics-based energy information system (EIS) tool to perform meter-level anomaly detection and diagnosis in buildings
Recently, the spread of smart metering infrastructures has enabled easier collection of building-related data. It has been proven that proper analysis of such data can bring significant benefits for characterizing building performance and spotting valuable saving opportunities. More and more researchers worldwide are focused on developing more robust frameworks of analysis capable of extracting useful information from meter-level data to enhance the process of energy management in buildings, for instance by detecting inefficiencies or anomalous energy behavior during operation. This paper proposes an innovative anomaly detection and diagnosis (ADD) methodology to automatically detect anomalous energy consumption at the whole-building meter level and then perform a diagnosis on the sub-loads responsible for the anomalous patterns. The process consists of multiple steps combining data analytics techniques. A set of evolutionary classification trees is developed to discover frequent and infrequent aggregated energy patterns, properly transformed through an adaptive symbolic aggregate approximation (aSAX) process. Then a post-mining analysis based on association rule mining (ARM) is performed to discover the main sub-loads which most affect the anomaly detected at the whole-building level. The methodology is developed and tested on monitored data from a medium voltage/low voltage (MV/LV) transformation cabin of a university campus.
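The post-mining step rests on standard association-rule metrics. A minimal sketch of support and confidence for a rule over hypothetical sub-load "transactions" — the sub-load names and data below are invented for illustration and are not from the case study:

```python
# Hypothetical transactions: each is the set of sub-loads flagged in a
# time window where the whole-building meter showed an anomalous pattern.
transactions = [
    {"HVAC", "lighting"},
    {"HVAC"},
    {"HVAC", "plug_loads"},
    {"lighting"},
]

def rule_metrics(antecedent, consequent, transactions):
    """Support and confidence for the rule antecedent -> consequent."""
    n = len(transactions)
    both = sum(1 for t in transactions if antecedent <= t and consequent <= t)
    ante = sum(1 for t in transactions if antecedent <= t)
    support = both / n
    confidence = both / ante if ante else 0.0
    return support, confidence

s, c = rule_metrics({"HVAC"}, {"lighting"}, transactions)
```

In a diagnosis setting, rules with high confidence linking a sub-load to the whole-building anomaly point at the sub-loads most likely responsible.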
DRSP: Dimension Reduction For Similarity Matching And Pruning Of Time Series Data Streams
Similarity matching and joining of time series data streams have gained a lot of relevance in today's world of large streaming data. This process finds wide-scale application in areas such as location tracking, sensor networks, and object positioning and monitoring. However, as the size of the data stream increases, the cost involved in retaining all the data to aid the process of similarity matching also increases. We develop a novel framework to address the following objectives. Firstly, dimension reduction is performed in the preprocessing stage, where large stream data is segmented and reduced into a compact representation that retains all the crucial information, by a technique called Multi-level Segment Means (MSM). This reduces the space complexity associated with the storage of large time-series data streams. Secondly, the framework incorporates an effective similarity matching technique to analyze whether new data objects are similar to the existing data stream. Finally, a pruning technique filters out the pseudo data object pairs and joins only the relevant pairs. The computational cost for MSM is O(l*ni) and the cost for pruning is O(DRF*wsize*d), where DRF is the Dimension Reduction Factor. We have performed exhaustive experimental trials to show that the proposed framework is both efficient and competent in comparison with earlier works.

Comment: 20 pages, 8 figures, 6 tables
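The abstract does not specify MSM in detail, but a multi-level segment-means representation can be sketched as repeatedly splitting the series into finer segments and storing each segment's mean, giving a coarse-to-fine compact summary. This scheme is an assumption for illustration, not the paper's exact algorithm:

```python
import numpy as np

def segment_means(stream, levels):
    """Hypothetical multi-level segment-means sketch.

    Level 0 keeps the global mean; each subsequent level doubles the
    number of segments, so coarse levels can prune candidate pairs
    cheaply before finer levels are compared.
    """
    x = np.asarray(stream, dtype=float)
    representation = []
    n_segments = 1
    for _ in range(levels):
        representation.append([s.mean() for s in np.array_split(x, n_segments)])
        n_segments *= 2
    return representation

rep = segment_means([1, 2, 3, 4, 5, 6, 7, 8], levels=3)
# rep[0]: global mean; rep[1]: two halves; rep[2]: four quarters
```

A pruning step would compare candidates level by level, discarding pairs whose coarse-level distance already exceeds a threshold, which matches the framework's goal of filtering pseudo pairs before the full join.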
A New Time Series Similarity Measurement Method Based on Fluctuation Features
Time series similarity measurement is one of the fundamental tasks in time series data mining, and there are many studies on time series similarity measurement methods. However, the majority of them only calculate the distance between equal-length time series and cannot adequately reflect the fluctuation features of time series. To solve this problem, a new time series similarity measurement method based on fluctuation features is proposed in this paper. Firstly, the fluctuation feature extraction method for time series is introduced: by defining and identifying fluctuation points, a fluctuation-point sequence is obtained to represent the original time series for subsequent analysis. Then, a new similarity measurement (D_SM) is put forward to calculate the distance between different fluctuation-point sequences. This method can calculate the distance between unequal-length time series, and it includes two main steps: similarity matching and distance calculation based on that matching. Finally, experiments are performed on several public time series data sets using agglomerative hierarchical clustering based on D_SM. Compared to some traditional time series similarity measurements, the clustering results show that the proposed method can effectively distinguish time series with similar shapes from different classes and achieves a visible improvement in clustering accuracy in terms of F-Measure.
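One common way to define fluctuation points is as turning points where the series changes direction, kept together with the endpoints. A minimal sketch under that assumption — the paper's exact definition of a fluctuation point may differ:

```python
def fluctuation_points(series):
    """Sketch: endpoints plus local extrema (direction changes).

    The resulting sequence is typically much shorter than the series,
    and two sequences of different lengths can then be compared with a
    matching-based distance such as the paper's D_SM.
    """
    pts = [series[0]]
    for i in range(1, len(series) - 1):
        prev, cur, nxt = series[i - 1], series[i], series[i + 1]
        if (cur - prev) * (nxt - cur) < 0:  # sign change in the slope
            pts.append(cur)
    pts.append(series[-1])
    return pts
```

Because the representation keeps only the turning structure, two series of different lengths with the same up-down pattern yield comparable fluctuation-point sequences, which is what makes an unequal-length distance feasible.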