7 research outputs found

    Self-organizing map algorithm as a tool for analysis, visualization and interpretation of electronic nose high dimensional raw data

    Electronic noses used for outdoor ambient air characterization, to assess odor impacts on the population, can produce large datasets, since sampling is usually conducted at high frequency (e.g. one reading per minute) over periods that can reach several months, with a number of sensors typically ranging from four to six up to more than thirty. The environmental analyst therefore has to deal with large datasets (millions of data points) that must be properly processed to obtain a meaningful interpretation of the instrumental signals. A recent review questioned the capability of some classic statistical tools when applied to e-noses, highlighting that very few in-field applications are reported in the scientific literature. In the present work we describe: (i) the use of the Self-Organizing Map (SOM) algorithm as a tool for analysis and visualization of e-nose raw data collected at a receptor site near a bio-waste composting facility; (ii) a second-level clustering step, using the k-means algorithm, to identify "air types" that can be detected at the receptor; and (iii) the use of e-nose data related to the plant odour sources, together with odour measurements of ambient air collected at the receptor site, to classify the air types. Finally, we evaluate the frequency and duration of the air type(s) identified as malodorous.
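
    The two-level approach described above (a SOM over the raw sensor readings, then k-means over the SOM codebook to obtain a handful of "air types") can be illustrated with a small, self-contained sketch. Everything below is hypothetical: the data is random, the 10x10 grid, the five clusters and the training schedule are illustrative choices, and the SOM is a minimal batch implementation rather than the authors' code.

```python
import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(0)

# Placeholder for the e-nose time series: rows = one-minute samples, columns = sensors.
X = rng.random((10_000, 8))

# Minimal batch SOM (numpy only): the codebook W lives on a 10x10 grid.
grid_h, grid_w, dim = 10, 10, X.shape[1]
W = rng.random((grid_h * grid_w, dim))
coords = np.array([(i, j) for i in range(grid_h) for j in range(grid_w)], dtype=float)
unit_d2 = ((coords[:, None, :] - coords[None, :, :]) ** 2).sum(-1)

for sigma in np.linspace(3.0, 0.5, 20):           # shrinking neighbourhood radius
    bmu = np.argmin(((X[:, None, :] - W[None, :, :]) ** 2).sum(-1), axis=1)
    h = np.exp(-unit_d2 / (2 * sigma ** 2))       # neighbourhood kernel (units x units)
    num = h[bmu].T @ X                            # weighted sum of samples per unit
    den = h[bmu].sum(axis=0)[:, None]
    W = num / np.maximum(den, 1e-12)

# Second-level clustering of the SOM codebook into a handful of "air types".
air_types = KMeans(n_clusters=5, n_init=10, random_state=0).fit_predict(W)

# Map every raw observation to an air type via its best-matching unit, then
# count how often each air type occurs at the receptor site.
obs_bmu = np.argmin(((X[:, None, :] - W[None, :, :]) ** 2).sum(-1), axis=1)
obs_air_type = air_types[obs_bmu]
print(np.bincount(obs_air_type))
```

    Mapping each raw observation back through its best-matching unit is what makes it possible, afterwards, to count the frequency and duration of the air type(s) labelled as malodorous.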

    Identifying health status of wind turbines by using self organizing maps and interpretation-oriented post-processing tools

    Identifying the health status of wind turbines is critical to reduce the impact of failures on generation costs (between 25% and 35%). This is a time-consuming task, since a human expert has to examine turbines individually. Methods: To optimize this process, we present a strategy based on Self-Organizing Maps, clustering and a further grouping of turbines based on the centroids of their SOM clusters, generating groups of turbines that behave similarly with respect to subsystem failure. The human expert can then diagnose the health of the wind farm by analysing a small sample from each group. By introducing post-processing tools such as class panel graphs and traffic-light panels, the conceptualization of the clusters is enhanced, providing additional information about the real-world scenarios each cluster represents and contributing to a better diagnosis. Results: The proposed approach has been tested on real wind farms with different characteristics (number of wind turbines, manufacturers, power, type of sensors, ...) and compared with classical clustering. Conclusions: Experimental results show that healthy, unhealthy and intermediate states are detected. Moreover, the operational modes identified for each wind turbine improve on those obtained with classical clustering techniques by capturing the intrinsic stationarity of the data.
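
    A rough sketch of the grouping strategy follows: one SOM per turbine, k-means over its codebook, and a second clustering of the per-turbine centroid signatures. The placeholder data, grid size, cluster counts and the use of the third-party minisom package are assumptions for illustration, not the authors' implementation; the post-processing tools (class panel graphs, traffic-light panels) are not reproduced here.

```python
import numpy as np
from minisom import MiniSom            # assumed third-party SOM implementation
from sklearn.cluster import KMeans

rng = np.random.default_rng(1)
n_turbines, n_samples, n_signals = 30, 2000, 12
farm = rng.random((n_turbines, n_samples, n_signals))   # placeholder SCADA data

def turbine_signature(data, grid=6, k=4):
    """Train a small SOM on one turbine and return the k-means centroids
    of its codebook, sorted so signatures are comparable across turbines."""
    som = MiniSom(grid, grid, data.shape[1], sigma=1.5, learning_rate=0.5,
                  random_seed=0)
    som.train_random(data, 5000)
    codebook = som.get_weights().reshape(-1, data.shape[1])
    centroids = KMeans(n_clusters=k, n_init=10,
                       random_state=0).fit(codebook).cluster_centers_
    order = np.argsort(centroids[:, 0])          # crude canonical ordering
    return centroids[order].ravel()

signatures = np.array([turbine_signature(t) for t in farm])

# Turbines with similar behaviour land in the same group; the expert then only
# needs to inspect a small sample from each group.
groups = KMeans(n_clusters=3, n_init=10, random_state=0).fit_predict(signatures)
print(groups)
```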

    Novelty Detection And Cluster Analysis In Time Series Data Using Variational Autoencoder Feature Maps

    The identification of atypical events and anomalies in complex data systems is an essential yet challenging task. The dynamic nature of these systems produces huge volumes of data that are often heterogeneous, and failure to account for this impedes the detection of anomalies. Time series data encompass these issues, and their high-dimensional nature intensifies the challenges. This research presents a framework for the identification of anomalies in temporal data. A comparative analysis of centroid-, density- and neural-network-based clustering techniques was performed and their scalability was assessed. This facilitated the development of a new algorithm called the Variational Autoencoder Feature Map (VAEFM), an ensemble method based on Kohonen's Self-Organizing Maps (SOM) and Variational Autoencoders. The VAEFM is an unsupervised learning algorithm that models the distribution of temporal data without making a priori assumptions. It incorporates principles of novelty detection to enhance the representational capacity of SOM neurons, which improves their ability to generalize to novel data. The VAEFM technique was demonstrated on a dataset of accumulated aircraft sensor recordings to detect atypical events that transpired in the approach phase of flight. This is a proactive means of accident prevention and is therefore advantageous to the aviation industry. Furthermore, accumulated aircraft data presents big-data challenges, which require scalable analytical solutions. The results indicated that VAEFM successfully identified temporal dependencies in the flight data and produced several clusters and outliers. It analyzed over 2500 flights in under 5 minutes and identified 12 clusters, two of which contained stabilized approaches. The remaining clusters comprised aborted approaches, excessively high/fast descent patterns and other contributory factors for unstabilized approaches. Outliers were detected that revealed oscillations in aircraft trajectories, some of which would have a lower detection rate using traditional flight-safety analysis techniques. The results further indicated that VAEFM facilitates large-scale analysis; its scaling efficiency was demonstrated on a High Performance Computing system, where using an increased number of processors it achieved an average speedup of 70%.
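
    The combination of a variational autoencoder and a SOM behind VAEFM can be sketched roughly as below. This is not the published algorithm: the placeholder data, network sizes, training schedule and the simple winner-only SOM update are all illustrative assumptions, and novelty is flagged here by a percentile cut on quantization error rather than by the thesis's exact criterion.

```python
import numpy as np
import torch
import torch.nn as nn

torch.manual_seed(0)
X = torch.rand(5000, 40)                  # placeholder: flattened sensor windows

class VAE(nn.Module):
    def __init__(self, d_in=40, d_lat=8):
        super().__init__()
        self.enc = nn.Sequential(nn.Linear(d_in, 64), nn.ReLU())
        self.mu, self.logvar = nn.Linear(64, d_lat), nn.Linear(64, d_lat)
        self.dec = nn.Sequential(nn.Linear(d_lat, 64), nn.ReLU(),
                                 nn.Linear(64, d_in))

    def forward(self, x):
        h = self.enc(x)
        mu, logvar = self.mu(h), self.logvar(h)
        z = mu + torch.randn_like(mu) * torch.exp(0.5 * logvar)
        return self.dec(z), mu, logvar

vae = VAE()
opt = torch.optim.Adam(vae.parameters(), lr=1e-3)
for _ in range(20):                       # a few full-batch epochs for a sketch
    recon, mu, logvar = vae(X)
    kl = -0.5 * torch.mean(1 + logvar - mu.pow(2) - logvar.exp())
    loss = nn.functional.mse_loss(recon, X) + 1e-3 * kl
    opt.zero_grad()
    loss.backward()
    opt.step()

with torch.no_grad():
    Z = vae(X)[1].numpy()                 # latent means as features

# Minimal online SOM on the latent features; novelty = large quantization error.
rng = np.random.default_rng(0)
W = rng.random((8 * 8, Z.shape[1]))
for t, z in enumerate(Z[rng.permutation(len(Z))]):
    lr = 0.5 * np.exp(-t / len(Z))
    bmu = np.argmin(((W - z) ** 2).sum(1))
    W[bmu] += lr * (z - W[bmu])           # simplification: winner-only update

qerr = np.sqrt(((Z[:, None, :] - W[None, :, :]) ** 2).sum(-1)).min(1)
novel = qerr > np.percentile(qerr, 99)    # top 1% treated as atypical events
print(novel.sum(), "flagged records")
```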

    Development of self-organizing methods for radio spectrum sensing

    The dissertation addresses the problem of wide-band radio spectrum analysis in real time. The goal of the work was to develop a spectrum sensing method for detecting primary-user emissions in the radio spectrum by investigating new signal feature extraction and intelligent decision-making techniques. Solving this problem is important for cognitive radio systems, in which the radio spectrum is analyzed in real time. The thesis reviews currently proposed spectrum analysis methods used in cognitive radio. The main purpose of these methods is to optimize the estimation of spectrum-describing features in real-time systems and to select a suitable classification threshold. To describe the signal spectrum, the reviewed methods rely on signal energy estimation and analyze statistical changes of the energy in time and frequency. The review also showed that the wavelet transform can be used for signal pre-processing in spectrum sensors. For classification threshold selection, the most common methods in the literature are based on statistical noise estimates and on the analysis of statistical changes in energy. However, no efficient methods have been proposed that allow the classification threshold to adapt when the RF environment changes. Modifications to the signal feature estimation are proposed that increase the efficiency of the algorithm's implementation in embedded systems by reducing the amount of required computation while preserving the accuracy of the spectrum analysis algorithms. For primary signal processing, wavelet-transform-based feature extraction is proposed for the spectrum sensors, which increases the detection accuracy for noisy signals: all primary-user signal emissions were detected with a false alarm ratio below 1%. The dissertation also proposes artificial neural network based methods that adaptively select the classification threshold for the spectrum sensors; in experimental tests, full detection of signal emissions was achieved with a false alarm ratio below 1%. A modification of the self-organizing map structure is proposed that increases the network's self-training speed by up to 32 times. This speed-up is achieved through additional inner weights added to the self-organizing map structure. During the self-training stage the network structure changes especially quickly, and once a topology suited to the given task is reached, further self-training iterations can disorder it. To avoid this over-training, algorithms that monitor the self-training process must be used. Original methods for controlling the self-training process are proposed that avoid network over-training and reduce the number of self-training iterations.
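
    A toy illustration of the kind of pipeline described above (wavelet-based feature extraction followed by a self-organizing map that adapts the classification decision to the observed environment) is sketched below. The synthetic tone, frame length, wavelet choice, SOM size and the median-based unit labelling are all assumptions made for the example; the dissertation's actual feature modifications and self-training control methods are not reproduced.

```python
import numpy as np
import pywt                                       # PyWavelets, assumed available

rng = np.random.default_rng(0)
fs, frame_len = 1_000_000, 1024
tone = np.sin(2 * np.pi * 30_000 * np.arange(frame_len) / fs)

free_frames = rng.normal(0, 1, (200, frame_len))          # noise only
busy_frames = rng.normal(0, 1, (200, frame_len)) + tone    # primary-user emission
frames = np.vstack([free_frames, busy_frames])
labels = np.r_[np.zeros(200), np.ones(200)]                # ground truth, checking only

def wavelet_energies(x, wavelet="db4", level=4):
    """Normalized energy of each wavelet sub-band of one frame."""
    coeffs = pywt.wavedec(x, wavelet, level=level)
    e = np.array([np.sum(c ** 2) for c in coeffs])
    return e / e.sum()

F = np.array([wavelet_energies(f) for f in frames])

# Tiny online SOM over the energy features; after training, each unit is
# labelled "occupied" or "free" from the energy in the low-frequency bands,
# so the effective decision boundary adapts to the observed RF environment.
W = rng.random((4, F.shape[1]))
for t, x in enumerate(F[rng.permutation(len(F))]):
    lr = 0.3 * np.exp(-t / len(F))
    bmu = np.argmin(((W - x) ** 2).sum(1))
    W[bmu] += lr * (x - W[bmu])

occupied_unit = W[:, :2].sum(1) > np.median(W[:, :2].sum(1))
pred = occupied_unit[np.argmin(((F[:, None, :] - W[None, :, :]) ** 2).sum(-1), axis=1)]
print("detection rate:", pred[labels == 1].mean(),
      "false alarm ratio:", pred[labels == 0].mean())
```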

    Vector Quantization Techniques for Approximate Nearest Neighbor Search on Large-Scale Datasets

    The technological developments of the last twenty years are leading the world into a new era. The invention of the internet, mobile phones and smart devices is resulting in an exponential increase in data. As the data grows every day, finding similar patterns or matching samples to a query is no longer a simple task, because of the computational cost and storage requirements involved. Special signal processing techniques are required to handle this growth, as simply adding more and more computers cannot keep up. Nearest neighbor search, also known as similarity search, proximity search or near-item search, is the problem of finding the item that is nearest or most similar to a query according to a distance or similarity measure. When the reference set is very large, or the distance or similarity calculation is complex, performing the nearest neighbor search can be computationally demanding. Considering today's ever-growing datasets, where the number of samples keeps increasing, a growing interest in approximate methods has emerged in the research community. Vector Quantization for Approximate Nearest Neighbor Search (VQ for ANN) has proven to be one of the most efficient and successful methods targeting this problem. It compresses vectors into binary strings and approximates the distances between vectors using look-up tables. With this approach, the approximation of distances is very fast, while the storage space required for the dataset is minimized thanks to the extreme compression levels. The distance approximation performance of VQ for ANN has been shown to be sufficiently good for retrieval and classification tasks, demonstrating that VQ for ANN techniques can be a good replacement for exact distance calculation methods. This thesis contributes to the VQ for ANN literature by proposing five advanced techniques, which aim to provide fast and efficient approximate nearest neighbor search on very large-scale datasets. The proposed methods can be divided into two groups. The first group consists of two techniques that introduce subspace clustering to VQ for ANN; they are shown to give state-of-the-art performance in tests on widely used large-scale benchmarks. The second group consists of three methods that propose improvements on residual vector quantization; these are also shown to outperform their predecessors. Apart from these, a sixth contribution of this thesis is a demonstration of VQ for ANN in an image classification application on large-scale datasets. It is shown that a k-NN classifier based on VQ for ANN performs on par with conventional k-NN classifiers, while requiring much less storage space and computation.
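
    The core VQ for ANN mechanism mentioned above (compressing vectors to short codes and approximating distances through look-up tables) can be shown with a minimal product-quantization sketch. The dimensions, number of sub-spaces and codebook sizes below are illustrative, and this plain PQ baseline stands in for, rather than reproduces, the thesis's five proposed techniques.

```python
import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(0)
d, m, k = 64, 8, 256                     # dimension, sub-spaces, centroids each
sub = d // m
base = rng.random((20_000, d)).astype(np.float32)
query = rng.random(d).astype(np.float32)

# Train one codebook per sub-space and encode the database as uint8 codes:
# 8 bytes per vector instead of 256 bytes of raw float32 values.
codebooks, codes = [], np.empty((len(base), m), dtype=np.uint8)
for j in range(m):
    km = KMeans(n_clusters=k, n_init=1,
                random_state=0).fit(base[:, j * sub:(j + 1) * sub])
    codebooks.append(km.cluster_centers_)
    codes[:, j] = km.labels_

# Asymmetric distance computation: one small look-up table per sub-space,
# then distances are sums of table entries selected by the stored codes.
luts = np.stack([((codebooks[j] - query[j * sub:(j + 1) * sub]) ** 2).sum(1)
                 for j in range(m)])     # shape (m, k)
approx_dist = luts[np.arange(m), codes].sum(1)

print("approximate nearest neighbour:", np.argmin(approx_dist))
```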