3,414 research outputs found

    Relation between Financial Market Structure and the Real Economy: Comparison between Clustering Methods

    We quantify the amount of information filtered by different hierarchical clustering methods on correlations between stock returns, comparing it with the underlying industrial activity structure. Specifically, we apply, for the first time to financial data, a novel hierarchical clustering approach, the Directed Bubble Hierarchical Tree, and we compare it with other methods including the Linkage and k-medoids. In particular, by taking the industrial sector classification of stocks as a benchmark partition, we evaluate how the different methods retrieve this classification. The results show that the Directed Bubble Hierarchical Tree can outperform other methods, being able to retrieve more information with fewer clusters. Moreover, we show that the economic information is hidden at different levels of the hierarchical structures depending on the clustering method. The dynamical analysis on a rolling window also reveals that the different methods show different degrees of sensitivity to events affecting financial markets, like crises. These results can be of interest for all the applications of clustering methods to portfolio optimization and risk hedging. Comment: 31 pages, 17 figures

    Cluster validity in clustering methods

    Exploratory analysis of textual data streams

    In this paper, we address exploratory analysis of textual data streams and we propose a bootstrapping process based on a combination of keyword similarity and clustering techniques to: (i) classify documents into fine-grained similarity clusters, based on keyword commonalities; (ii) aggregate similar clusters into larger document collections sharing a richer, more user-prominent keyword set that we call a topic; (iii) assimilate newly extracted topics of the current bootstrapping cycle with existing topics resulting from previous bootstrapping cycles, by linking similar topics of different time periods, if any, to highlight topic trends and evolution. An analysis framework is also defined, enabling the topic-based exploration of the underlying textual data stream according to a thematic perspective and a temporal perspective. The bootstrapping process is evaluated on a real data stream of about 330,000 newspaper articles about politics published by the New York Times from Jan 1st 1900 to Dec 31st 2015

    GraphClust: alignment-free structural clustering of local RNA secondary structures

    Motivation: Clustering according to sequence–structure similarity has now become a generally accepted scheme for ncRNA annotation. Its application to complete genomic sequences as well as whole transcriptomes is therefore desirable but hindered by extremely high computational costs

    Clustering and hierarchy of financial markets data: advantages of the DBHT

    We present a set of analyses aiming at quantifying the amount of information filtered by different hierarchical clustering methods on correlations between stock returns. In particular we apply, for the first time to financial data, a novel hierarchical clustering approach, the Directed Bubble Hierarchical Tree (DBHT), and we compare it with other methods including the Linkage and k-medoids. In particular, by taking the industrial sector classification of stocks as a benchmark partition, we evaluate how the different methods retrieve this classification. The results show that the Directed Bubble Hierarchical Tree outperforms the other methods, being able to retrieve more information with fewer clusters. Moreover, we show that the economic information is hidden at different levels of the hierarchical structures depending on the clustering method. The dynamical analysis also reveals that the different methods show different degrees of sensitivity to financial events, like crises. These results can be of interest for all the applications of clustering methods to portfolio optimization and risk hedging

    Network module detection: Affinity search technique with the multi-node topological overlap measure

    Background: Many clustering procedures only allow the user to input a pairwise dissimilarity or distance measure between objects. We propose a clustering method that can input a multi-point dissimilarity measure d(i1, i2, ..., iP) where the number of points P can be larger than 2. The work is motivated by gene network analysis where clusters correspond to modules of highly interconnected nodes. Here, we define modules as clusters of network nodes with high multi-node topological overlap. The topological overlap measure is a robust measure of interconnectedness which is based on shared network neighbors. In previous work, we have shown that the multi-node topological overlap measure yields biologically meaningful results when used as input of network neighborhood analysis. Findings: We adapt network neighborhood analysis for the use of module detection. We propose the Module Affinity Search Technique (MAST), which is a generalized version of the Cluster Affinity Search Technique (CAST). MAST can accommodate a multi-node dissimilarity measure. Clusters grow around user-defined or automatically chosen seeds (e.g. hub nodes). We propose both local and global cluster growth stopping rules. We use several simulations and a gene co-expression network application to argue that the MAST approach leads to biologically meaningful results. We compare MAST with hierarchical clustering and partitioning around medoid clustering. Conclusion: Our flexible module detection method is implemented in the MTOM software which can be downloaded from the following webpage: http://www.genetics.ucla.edu/labs/horvath/MTOM/
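
    The abstract above describes topological overlap as a measure of interconnectedness based on shared network neighbors. As a rough illustration only, the following sketch computes the standard pairwise topological overlap for an unweighted, undirected network. It is not the multi-node measure and not the MTOM software; the function name `topological_overlap` and the toy adjacency matrix are assumptions made for this example.

```python
def topological_overlap(adj):
    """Pairwise topological overlap for a binary, symmetric adjacency matrix.

    Assumes adj[i][j] is 0 or 1, adj is symmetric, and the diagonal is 0.
    TOM(i, j) = (shared_neighbors + a_ij) / (min(k_i, k_j) + 1 - a_ij)
    """
    n = len(adj)
    degree = [sum(row) for row in adj]
    tom = [[0.0] * n for _ in range(n)]
    for i in range(n):
        tom[i][i] = 1.0  # by convention a node fully overlaps with itself
        for j in range(i + 1, n):
            # Count neighbors that i and j have in common (excluding i and j).
            shared = sum(
                adj[i][u] * adj[u][j] for u in range(n) if u != i and u != j
            )
            denom = min(degree[i], degree[j]) + 1 - adj[i][j]
            tom[i][j] = tom[j][i] = (shared + adj[i][j]) / denom
    return tom


# Hypothetical toy network: a triangle (0, 1, 2) with node 3 attached to node 2.
toy_adj = [
    [0, 1, 1, 0],
    [1, 0, 1, 0],
    [1, 1, 0, 1],
    [0, 0, 1, 0],
]
toy_tom = topological_overlap(toy_adj)
```

    A dissimilarity such as 1 - TOM(i, j) could then be passed to a clustering procedure that accepts pairwise input.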

    Communities in temporal networks: from theoretical underpinnings to real-life applications

    Static aggregations of network activity can unravel attributes of the complex systems they represent. However, they fall short when the structure of the systems changes over time. In some cases, changes are sluggish, such as in power grids, where lines enjoy a lengthy temporal permanence. In others, a high frequency of change is observed, such as on a network of online messages, social contacts, pathogen transmission or ball passing in a soccer game. In these cases, reducing what is inherently a temporal network to a static one necessarily leads to a loss of information, such as causal relationships, precedence or reachability rules. Temporal networks are thus the main subject of this thesis, centered on the study of network evolution from the point of view of its clusters as significant meso-structures. The thesis has two interrelated parts. In the first, theoretical challenges are addressed and original algorithms, methods and tools are developed that can further the study of network theory. In the second, these developments are applied to the analysis of team invasion sports. A measurement of game dynamics was created based on a temporal network representation of a match, with nodes clustered by spatial proximity. These measurements were found to correlate with match events of known dynamics. Moreover, they reveal unique, multi-level aspects of the game, from the individual players' contributions, to the clusters of interacting players, to their teams and their matches, which is useful for game analysis, training and strategy development.

    Comparing hard and overlapping clusterings

    Similarity measures for comparing clusterings are an important component, e.g., of evaluating clustering algorithms, for consensus clustering, and for clustering stability assessment. These measures have been studied for over 40 years in the domain of exclusive hard clusterings (exhaustive and mutually exclusive object sets). In the past years, the literature has proposed measures to handle more general clusterings (e.g., fuzzy/probabilistic clusterings). This paper provides an overview of these new measures and discusses their drawbacks. We ultimately develop a corrected-for-chance measure (13AGRI) capable of comparing exclusive hard, fuzzy/probabilistic, non-exclusive hard, and possibilistic clusterings. We prove that 13AGRI and the adjusted Rand index (ARI, by Hubert and Arabie) are equivalent in the exclusive hard domain. The reported experiments show that only 13AGRI could provide both a fine-grained evaluation across clusterings with different numbers of clusters and a constant evaluation between random clusterings, showing all the four desirable properties considered here. We identified a high correlation between 13AGRI applied to fuzzy clusterings and ARI applied to hard exclusive clusterings over 14 real data sets from the UCI repository, which corroborates the validity of 13AGRI fuzzy clustering evaluation. 13AGRI also showed good results as a clustering stability statistic for solutions produced by the expectation maximization algorithm for Gaussian mixture
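
    For readers unfamiliar with the adjusted Rand index mentioned above, the following is a minimal sketch of the Hubert and Arabie ARI for exclusive hard clusterings, computed from the contingency counts of two label vectors. It does not implement 13AGRI; the function name `adjusted_rand_index` and the example labels are assumptions made for this illustration.

```python
from collections import Counter
from math import comb


def adjusted_rand_index(labels_a, labels_b):
    """Adjusted Rand index between two hard, exclusive clusterings.

    labels_a and labels_b assign one cluster label to each object,
    in the same object order.
    """
    if len(labels_a) != len(labels_b):
        raise ValueError("label vectors must have the same length")
    n = len(labels_a)
    if n < 2:
        return 1.0  # no object pairs to disagree on

    # Pairs of objects placed together in both, in A, and in B.
    sum_cells = sum(comb(c, 2) for c in Counter(zip(labels_a, labels_b)).values())
    sum_rows = sum(comb(c, 2) for c in Counter(labels_a).values())
    sum_cols = sum(comb(c, 2) for c in Counter(labels_b).values())

    expected = sum_rows * sum_cols / comb(n, 2)  # chance-level agreement
    max_index = (sum_rows + sum_cols) / 2
    if max_index == expected:
        # Only happens when both clusterings are the same trivial partition
        # (all objects together, or all objects apart).
        return 1.0
    return (sum_cells - expected) / (max_index - expected)


# Hypothetical example: the same partition under different label names.
same_partition = adjusted_rand_index([0, 0, 1, 1], [1, 1, 0, 0])
# Hypothetical example: two partitions that cross each other completely.
crossed_partition = adjusted_rand_index([0, 0, 1, 1], [0, 1, 0, 1])
```

    Because the index is corrected for chance, relabelling clusters leaves the score at 1, while partitions that agree less than expected at random receive a negative score.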