3 research outputs found

    Qualitative Data Clustering to Detect Outliers

    Detecting outliers is a widely studied problem in many disciplines, including statistics, data mining, and machine learning. All anomaly detection activities aim to identify cases of unusual behavior compared to most observations. Many methods address this issue, and their applicability depends on the size of the data set, the way it is stored, and the type of attributes and their values. Most of them focus on traditional datasets with a large number of quantitative attributes. While solutions for detecting outliers in quantitative sets are numerous, detecting outliers in data containing only qualitative variables remains a problem with few research solutions. This article compares three different categorical data clustering algorithms: the K-modes algorithm, derived from MacQueen’s K-means algorithm, and the STIRR and ROCK algorithms. The comparison concerns the way each algorithm divides the set into clusters and, in particular, the outliers it detects. The authors analyzed the clusters detected by these algorithms on several datasets that differ in the number of objects and variables, and conducted experiments on the parameters of the algorithms. The study made it possible to check whether the algorithms detect outliers in the data similarly and how much they depend on individual algorithm parameters and on parameters of the set, such as the number of variables, tuples, and categories of a qualitative variable.
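The K-modes idea mentioned in the abstract above can be illustrated with a minimal sketch. It is not the authors' implementation: the function names, the initialisation (first k distinct records), the toy data, and the outlier rule (an object is flagged when it differs from its cluster's mode on at least `threshold` attributes) are all assumptions made for illustration.

```python
from collections import Counter

def mismatch(a, b):
    """Simple matching dissimilarity: number of attributes on which a and b differ."""
    return sum(x != y for x, y in zip(a, b))

def column_modes(rows):
    """Most frequent category in each column of a non-empty list of tuples."""
    return tuple(Counter(col).most_common(1)[0][0] for col in zip(*rows))

def k_modes(rows, k, max_iter=20):
    """Cluster categorical tuples; returns (labels, modes)."""
    # Assumption: initialise the modes with the first k distinct records.
    modes = list(dict.fromkeys(rows))[:k]
    labels = []
    for _ in range(max_iter):
        # Assign every object to the nearest mode (ties go to the lower index).
        new_labels = [
            min(range(len(modes)), key=lambda j: mismatch(r, modes[j]))
            for r in rows
        ]
        if new_labels == labels:
            break  # assignments are stable, so the modes will not change either
        labels = new_labels
        # Recompute each mode column by column from its current members.
        for j in range(len(modes)):
            members = [r for r, lab in zip(rows, labels) if lab == j]
            if members:
                modes[j] = column_modes(members)
    return labels, modes

def flag_outliers(rows, labels, modes, threshold):
    """Hypothetical rule: indices of objects far from their own cluster mode."""
    return [
        i for i, (r, lab) in enumerate(zip(rows, labels))
        if mismatch(r, modes[lab]) >= threshold
    ]

# Toy data, invented for this example.
rows = [
    ("red", "small", "round"),
    ("blue", "large", "square"),
    ("red", "small", "round"),
    ("red", "small", "oval"),
    ("blue", "large", "square"),
    ("blue", "medium", "square"),
    ("green", "tiny", "star"),
]
labels, modes = k_modes(rows, k=2)
outliers = flag_outliers(rows, labels, modes, threshold=3)
print(labels, modes, outliers)
```

In this toy set the last record shares no category with either mode, so it is the only one flagged. Replacing the mean with the mode and Euclidean distance with attribute mismatches is what lets the K-means scheme run on purely qualitative variables.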

    Detecting clusters in spatially correlated waveforms

    Seismic networks often record signals characterized by similar shapes that provide important information according to their geographic positions. We propose an approach to identify homogeneous clusters of seismic waves, combining analysis of waveforms with metadata and spectrogram information. In waveform clustering, cross-correlation measures between signals may present some limitations, so we refer to more recent contributions on data-depth-based clustering analysis. The mechanism for alignment is also an important topic of the analysis: warping (or aligning) procedures identify nuisance effects in phase variation that, if ignored, may result in a loss of information, with the immediate consequence that the underlying pattern is not retained. The effectiveness of the approach is investigated by means of real data. The data consist of a set of recordings of 21 earthquakes in the Centre of Italy with magnitude greater than 5.5 Mw, provided by the seismic network RAN (Rete Accelerometrica Nazionale) managed by the Italian Department of Civil Protection and obtained from the ESM/ITACA database (esm.mi.ing.it; itaca.mi.ingv.it). The signals were recorded by stations whose distances from the epicenter range from 50 to 100 km. The goal is to divide the spatial domain into homogeneous clusters and to extract information from the shapes of the underlying curves. This work is supported by National grant MIUR, PRIN-2015 program, Prot.20157PRZC4: Complex space-time modeling and functional analysis for probabilistic forecast of seismic events.
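For context, the cross-correlation measure that the abstract above describes as limited can be sketched as follows. This shows only the baseline similarity measure, not the depth-based clustering or the warping procedure the authors propose. The function name `best_lag`, the normalisation by full-signal norms, and the synthetic signals are assumptions made for illustration.

```python
import math

def best_lag(x, y, max_lag):
    """Return (lag, score) maximising the normalised cross-correlation.

    A positive lag means y is delayed by that many samples relative to x.
    Assumption: scores are normalised by the norms of the full centred
    signals, not by the norms of the overlapping parts only.
    """
    def centred(s):
        m = sum(s) / len(s)
        return [v - m for v in s]

    xc, yc = centred(x), centred(y)
    norm = math.sqrt(sum(v * v for v in xc) * sum(v * v for v in yc))
    best = (0, float("-inf"))
    for lag in range(-max_lag, max_lag + 1):
        # Sum products over the samples where the shifted signals overlap.
        s = sum(
            xc[i] * yc[i + lag]
            for i in range(len(xc))
            if 0 <= i + lag < len(yc)
        )
        score = s / norm if norm else 0.0
        if score > best[1]:
            best = (lag, score)
    return best

# Synthetic pulse and a copy delayed by two samples (invented data).
x = [0, 0, 1, 3, 1, 0, 0, 0]
y = [0, 0, 0, 0, 1, 3, 1, 0]
lag, score = best_lag(x, y, max_lag=3)
print(lag, round(score, 3))
```

A single rigid shift like this can only correct a constant time offset between two recordings, which is one reason warping procedures that model phase variation are of interest.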