
    Cluster analysis of financial time series

    Mestrado em Mathematical Finance (Master's in Mathematical Finance). This thesis applies the Signature method as a measure of similarity between two time-series objects, using the Signature properties of order 2, and applies it to Asymmetric Spectral Clustering. The method is compared with a more traditional clustering approach in which similarity is measured using Dynamic Time Warping, a technique developed for time-series data. The intention is to treat the traditional approach as a benchmark and compare it to the Signature method in terms of computation time, performance, and applications. These methods are applied to a financial time-series data set of mutual exchange funds from Luxembourg.
After the literature review, we introduce the Dynamic Time Warping method and the Signature method. We continue with an explanation of traditional clustering approaches, namely k-means, and asymmetric clustering techniques, namely the k-Axes algorithm developed by Atev (2011). The last chapter is dedicated to practical research, where the preceding methods are applied to the data set. The results confirm that the Signature method indeed has potential for machine learning and prediction, as suggested by Levin, Lyons, and Ni (2013).
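The order-2 signature the thesis uses as a time-series feature has a closed form for piecewise-linear paths. The following is a minimal numpy sketch (the function name and array shapes are our own illustration, not the thesis's code), assuming each series is treated as a piecewise-linear path:

```python
import numpy as np

def signature_order2(path):
    """Truncated signature (levels 1 and 2) of a piecewise-linear path.

    path: array of shape (T, d).  Returns (level1, level2) where
    level1[i]   is the total increment in coordinate i and
    level2[i,j] is the iterated integral over 0 < s < t < T of
    dX^i_s dX^j_t, computed in closed form for linear segments.
    """
    increments = np.diff(path, axis=0)               # (T-1, d)
    level1 = increments.sum(axis=0)
    # Sum over ordered pairs of segments, plus half the within-segment term.
    cum_before = np.vstack([np.zeros(path.shape[1]),
                            np.cumsum(increments, axis=0)[:-1]])
    level2 = cum_before.T @ increments + 0.5 * increments.T @ increments
    return level1, level2
```

Chen's identity `level2 + level2.T == outer(level1, level1)` is a quick sanity check; a clustering pipeline would flatten `(level1, level2)` into a fixed-length feature vector before computing similarities.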

    Some contributions to k-means clustering problems

    k-means clustering is the most common clustering technique for homogeneous data sets. In this thesis we introduce several contributions to problems related to k-means. First, we develop a modification of the k-means algorithm to efficiently partition massive data sets in a semi-supervised framework, i.e., when partial label information is available. Our algorithms are designed to work even when not all of the groups have representatives in the supervised part of the data set, and when the total number of groups is not known in advance. We provide strategies for initializing the algorithm and for determining the number of clusters. Second, we develop a methodology to model the distribution function of the difference in residuals between a K-groups model and a K′-groups model (K′ > K), in order to assess whether more groups fit the model better. This leads us to estimate the distribution of a sum of random variables; we provide two approaches, the first relying on the theory of non-parametric kernel estimation and a second, approximate approach that uses the normal approximation for the tail probability. Finally, we introduce a new merging tool that does not require any distributional assumption. To achieve this we compute the normed residuals for each cluster realization; these residuals form a sample from a non-negative distribution, and using asymmetric kernel estimation we estimate the misclassification probability. We further extend this non-parametric estimation to merge clusters.
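The semi-supervised setting described above can be illustrated with a seeded k-means variant: centers are initialized from the labeled group means and labeled points keep their group during the iterations. This is a minimal sketch of one common formulation, not necessarily the author's algorithm:

```python
import numpy as np

def seeded_kmeans(X, labels, n_iter=20):
    """Semi-supervised k-means sketch: labels[i] >= 0 marks a known group
    (assumed coded 0..K-1); labels[i] == -1 marks an unlabeled point.
    Centers are seeded from the labeled means and labeled points keep
    their group during the iterations."""
    known = labels >= 0
    K = labels[known].max() + 1
    centers = np.array([X[labels == k].mean(axis=0) for k in range(K)])
    assign = labels.copy()
    for _ in range(n_iter):
        # Squared Euclidean distance of every point to every center.
        d = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(-1)
        assign[~known] = d[~known].argmin(axis=1)   # only unlabeled move
        centers = np.array([X[assign == k].mean(axis=0) for k in range(K)])
    return assign, centers
```

The thesis also handles groups with no labeled representatives; extending the sketch would mean spawning extra centers for poorly-fit unlabeled points, which is omitted here.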

    Clustering EU countries based on death probabilities

    Our research identifies groupings of 24 European countries based on their death probabilities. Gathering 2014 data from the Human Mortality Database, our research objective was twofold. First, we wanted to find homogeneous groups of countries where mortality is similar, so that a financial institution could treat them as risk communities. Second, we wanted to identify the optimal number of groups as a basis for strategy making. Two clustering methods were used in our research, k-means and k-median clustering. We applied an asymmetric measure (QDEV) in the k-median method to handle the differences in country sizes and age groups. Our results are stable but differ at k = 3 clusters: k-means clustering produced a large Western European cluster and two small-to-medium Eastern groups, whereas k-median clustering gave a homogeneous Eastern group and, besides a bigger Western cluster, Spain, Italy, and France formed a separate group of countries.
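For reference, plain k-medians alternates L1-nearest assignment with coordinate-wise medians. The sketch below uses ordinary L1 distance; the paper's asymmetric QDEV measure is not defined in the abstract and is not reproduced:

```python
import numpy as np

def k_medians(X, k, n_iter=30):
    """Plain k-medians sketch: L1 distance, coordinate-wise medians.
    Naive init from the first k points -- use a spread-out init such as
    k-means++ in practice.  The paper's asymmetric QDEV measure is not
    reproduced here."""
    centers = X[:k].astype(float).copy()
    for _ in range(n_iter):
        # L1 (Manhattan) distance of each point to each center.
        d = np.abs(X[:, None, :] - centers[None, :, :]).sum(-1)
        assign = d.argmin(axis=1)
        # Coordinate-wise median minimizes within-cluster L1 cost;
        # keep the old center if a cluster empties out.
        centers = np.array([np.median(X[assign == j], axis=0)
                            if (assign == j).any() else centers[j]
                            for j in range(k)])
    return assign, centers
```

The median update is what makes k-medians more robust than k-means to the size differences the abstract mentions.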

    Feature engineering vs. extraction: clustering Brazilian municipalities through spatial panel agricultural data via autoencoders.

    This article compares the clustering of Brazilian municipalities according to their agricultural diversity using two approaches: one based on feature engineering and the other based on feature extraction with Deep Learning, using autoencoders together with cluster analysis based on k-means and Self-Organizing Maps. The analyses were conducted on panel data from IBGE's annual estimates of Brazilian agricultural production between 1999 and 2018. Different structures of simple stacked undercomplete autoencoders were analyzed, varying the number of layers and the number of neurons in each of them, including the latent layer. An asymmetric exponential linear loss function was also evaluated to cope with the sparse data. The results show that, in comparison with the adopted ground truth, the autoencoder model combined with k-means outperformed k-means clustering of the raw data, demonstrating the ability of simple autoencoders to capture important features of the data in their latent layer. Although the overall accuracy is low, the results are promising, considering that we evaluated the simplest strategy for Deep Clustering.
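The autoencoder-then-k-means pipeline can be sketched end to end. The toy below uses an undercomplete *linear* autoencoder trained by plain gradient descent, as a heavily simplified stand-in for the article's stacked networks (all names and hyperparameters are illustrative):

```python
import numpy as np

def train_linear_autoencoder(X, latent_dim, lr=0.01, epochs=500, seed=0):
    """Undercomplete linear autoencoder trained by gradient descent on the
    squared reconstruction error -- a simplified stand-in for the stacked
    autoencoders in the article."""
    rng = np.random.default_rng(seed)
    n, d = X.shape
    W_enc = rng.normal(0.0, 0.1, (d, latent_dim))
    W_dec = rng.normal(0.0, 0.1, (latent_dim, d))
    for _ in range(epochs):
        Z = X @ W_enc              # latent codes
        R = Z @ W_dec - X          # reconstruction residual
        W_dec -= lr * (Z.T @ R) / n
        W_enc -= lr * (X.T @ (R @ W_dec.T)) / n
    return W_enc

def kmeans(Z, k, init, n_iter=30):
    """Bare-bones k-means on the latent codes, with explicit initial centers."""
    centers = init.astype(float).copy()
    for _ in range(n_iter):
        assign = ((Z[:, None] - centers[None]) ** 2).sum(-1).argmin(1)
        centers = np.array([Z[assign == j].mean(0) if (assign == j).any()
                            else centers[j] for j in range(k)])
    return assign
```

Clustering then runs on `X @ W_enc` rather than on the raw features, which is the dimensionality-reduction step the article evaluates.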

    Unsupervised machine learning algorithms as support tools in molecular dynamics simulations

    Unsupervised Machine Learning algorithms such as clustering offer convenient features for data analysis tasks. When combined with other tools, such as visualization software, the possibilities for automated analysis may be greatly enhanced. In the context of Molecular Dynamics simulations, in particular asymmetric granular collisions that typically involve thousands of particles, it is key to distinguish the fragments into which the system divides after a collision for classification purposes. In this work we explore the unsupervised Machine Learning algorithms k-means and AGNES to distinguish groups of particles in molecular dynamics simulations, with encouraging results according to performance metrics such as accuracy and precision. We also report computation times for each algorithm, with k-means running faster than AGNES. Finally, we outline the integration of these types of algorithms with a well-known analysis and visualization tool widely used in the physics community.
    Sociedad Argentina de Informática e Investigación Operativ
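AGNES (AGglomerative NESting) starts from singleton clusters and repeatedly merges the closest pair. A minimal single-linkage sketch, adequate for fragment detection on well-separated particle groups (the paper does not specify its linkage, so single linkage is an assumption here):

```python
import numpy as np

def agnes_single_linkage(X, k):
    """Minimal AGNES sketch: start with singleton clusters and repeatedly
    merge the pair with the smallest single-linkage (closest-pair)
    distance until k clusters remain.  O(n^3) -- fine for illustration,
    too slow for thousands of particles without a heap/nearest-neighbor
    structure."""
    clusters = [[i] for i in range(len(X))]
    D = np.linalg.norm(X[:, None] - X[None, :], axis=-1)
    while len(clusters) > k:
        best, pair = np.inf, None
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                # Single linkage: minimum pairwise distance between members.
                dist = D[np.ix_(clusters[a], clusters[b])].min()
                if dist < best:
                    best, pair = dist, (a, b)
        a, b = pair
        clusters[a] += clusters.pop(b)
    labels = np.empty(len(X), dtype=int)
    for j, members in enumerate(clusters):
        labels[members] = j
    return labels
```

The cubic inner loop is exactly why the paper finds k-means faster than AGNES on large particle systems.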

    Feature extraction of spatial panel data with autoencoders for clustering the Brazilian agricultural diversity.

    ABSTRACT - Brazilian agricultural production presents a high degree of spatial diversity, which challenges the design of territorial public policies to promote sustainable development. This article proposes a new approach to clustering Brazilian municipalities according to their agricultural production. It combines a feature extraction mechanism using Deep Learning based on autoencoders with clustering based on k-means and Self-Organizing Maps. We used panel data from IBGE's annual estimates of Brazilian agricultural production between 1999 and 2018. Different structures of simple stacked undercomplete autoencoders were analyzed, varying the number of layers and the number of neurons in each of them, including the latent layer. We evaluated an asymmetric exponential linear loss function to cope with the sparse data. The results show that, in comparison with the adopted ground truth, the autoencoder model combined with the Self-Organizing Maps and the k-means algorithm produced a better result than k-means clustering of the raw data, demonstrating the ability of simple stacked autoencoders to reduce dimensionality and create a new feature space in their latent layer where the data can be analyzed and clustered. Although the overall accuracy is low, the results are promising, considering that further improvements can be added to the Deep Clustering process.
    GEOINFO 2022
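The Self-Organizing Map used on the latent features can be sketched in a few lines. This is a minimal 1-D SOM with a shrinking Gaussian neighborhood (the grid shape, decay schedules, and initialization are our own simplifications, not the article's setup):

```python
import numpy as np

def train_som(X, n_units, epochs=100, lr0=0.3, sigma0=0.5, init=None):
    """Minimal 1-D Self-Organizing Map sketch.  `init` gives the initial
    unit weights (defaults to the first n_units samples; a spread-out
    init works better in practice)."""
    W = (X[:n_units] if init is None else init).astype(float).copy()
    grid = np.arange(n_units)
    rng = np.random.default_rng(0)
    for t in range(epochs):
        lr = lr0 * (1 - t / epochs)                 # decaying learning rate
        sigma = sigma0 * (1 - t / epochs) + 1e-3    # shrinking neighborhood
        for x in X[rng.permutation(len(X))]:
            bmu = ((W - x) ** 2).sum(1).argmin()    # best-matching unit
            h = np.exp(-((grid - bmu) ** 2) / (2 * sigma ** 2))
            W += lr * h[:, None] * (x - W)          # pull units toward x
    return W

def som_assign(X, W):
    """Map each sample to its best-matching unit."""
    return ((X[:, None] - W[None]) ** 2).sum(-1).argmin(1)
```

After training, `som_assign` gives the cluster labels; the neighborhood term `h` is what distinguishes a SOM from plain online k-means, since it also drags grid neighbors of the winning unit.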

    Subtractive Approach to fuzzy c-means clustering method

    Data clustering is an important part of cluster analysis. Numerous clustering algorithms based on various theories have been developed, and new algorithms continue to appear in the literature. The aim of clustering is to obtain the most distinct groups in a dataset: a clustering method finds similar data points and puts them into groups, and once the groups in a dataset are found, the dataset can be represented by fewer symbols. Researchers have proposed many solutions to this problem based on different theories, but some difficulties remain, such as finding the optimum cluster centers and the dependence on the initial condition.
The best-known and earliest clustering method is the K-means algorithm. Its main advantage is fast convergence. Despite many successful applications in several fields, it has drawbacks: membership values of only 0 or 1 may not always reflect the practical relationship between a data point and a cluster. To cope with this drawback, the fuzzy c-means method employs fuzzy partitioning so that each data point can belong to several clusters, with membership values between 0 and 1. Both techniques try to group the data into a given number of clusters. Another method, subtractive clustering, finds the largest cluster using a density function, then the second largest, and so on. Subtractive clustering uses the locations of the data points to calculate the density function. K-means tends to produce homogeneous distributions, fuzzy c-means makes clusters with soft edges, and subtractive clustering usually tries to find discreteness. The locations of the cluster centers found by K-means and fuzzy c-means may differ between runs because they depend on the initial condition; therefore, these methods should be run several times for each dataset. Subtractive clustering has only one solution, independent of the initial condition, so it is enough to run it once. Its main problem, however, is that the cluster centers are selected from among the data points, and centers selected from among the data points may not represent the clusters of the dataset.
In this paper, we offer a new approach that combines the fuzzy c-means and subtractive clustering methods. The novel approach takes account of both discreteness and soft-edged distributions, so the result resembles an average of the other methods. Its main contributions can be summarized as follows: it becomes a more sophisticated technique by taking advantage of both fuzzy c-means and subtractive clustering; it removes the dependence on the initial condition, having only one solution as in subtractive clustering; and it removes the requirement that cluster centers be selected from among the data points. The novel algorithm consists of the following steps:
Step 1. Normalize the data points.
Step 2. Calculate the density value of each data point by Equation (10).
Step 3. Select the point having the highest density value as a cluster center.
Step 4. Update the density of each data point by Equation (11). If the number of detected cluster centers is less than the desired number, go to Step 3.
Step 5. Compute the membership matrix by Equation (7).
Step 6. Update the cluster centers by Equation (6).
Step 7. Calculate the cost function by Equation (5). If it is bigger than the selected threshold value, go to Step 5.
Clustering methods are usually evaluated and tested on artificial datasets, and must be able to analyze datasets with different features and sampling sizes. Artificial datasets used in the literature have properties such as symmetric, discrete, and identical forms; therefore, we used several special datasets in the numerical examples. The novel approach is successful for both symmetric-identical and asymmetric-non-identical datasets. It also removes the dependence on the initial condition, in contrast to the common KM and FCM clustering methods. Keywords: K-means; fuzzy c-means; subtractive clustering
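The steps above can be sketched directly. Since the abstract does not reproduce Equations (5)-(11), the sketch substitutes the standard subtractive-clustering and fuzzy c-means formulas, on the assumption that they correspond; the radii and fuzzifier values are conventional defaults, not the paper's:

```python
import numpy as np

def subtractive_fcm(X, n_clusters, r_a=0.5, m=2.0, tol=1e-4, max_iter=100):
    """Hybrid sketch: subtractive clustering picks initial centers from
    the densest regions (Steps 1-4), then fuzzy c-means refines them
    (Steps 5-7)."""
    # Step 1: normalize each feature to [0, 1].
    Xn = (X - X.min(0)) / (X.max(0) - X.min(0) + 1e-12)
    # Step 2: density of each point as a sum of Gaussian kernels.
    sq = ((Xn[:, None] - Xn[None]) ** 2).sum(-1)
    dens = np.exp(-sq / (r_a / 2) ** 2).sum(1)
    # Steps 3-4: repeatedly take the densest point as a center and
    # suppress density around it (r_b > r_a keeps centers apart).
    r_b = 1.5 * r_a
    centers = []
    for _ in range(n_clusters):
        c = int(dens.argmax())
        centers.append(Xn[c])
        dens = dens - dens[c] * np.exp(-sq[c] / (r_b / 2) ** 2)
    centers = np.array(centers)
    # Steps 5-7: fuzzy c-means refinement of the detected centers.
    for _ in range(max_iter):
        d = np.linalg.norm(Xn[:, None] - centers[None], axis=-1) + 1e-12
        u = d ** (-2.0 / (m - 1))
        u /= u.sum(1, keepdims=True)       # fuzzy memberships per point
        um = u.T ** m                      # (n_clusters, n)
        new_centers = um @ Xn / um.sum(1, keepdims=True)
        done = np.abs(new_centers - centers).max() < tol
        centers = new_centers
        if done:
            break
    return centers, u
```

Because Steps 2-4 depend only on the data, the initialization is deterministic, which is the key property the paper claims over randomly initialized FCM; Step 6's weighted mean is also what frees the final centers from coinciding with data points.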