259 research outputs found

    Ultrametric and Generalized Ultrametric in Computational Logic and in Data Analysis

    Following a review of metric, ultrametric and generalized ultrametric, we review their application in data analysis. We show how they allow us to explore both geometry and topology of information, starting with measured data. Some themes are then developed based on the use of metric, ultrametric and generalized ultrametric in logic. In particular we study approximation chains in an ultrametric or generalized ultrametric context. Our aim in this work is to extend the scope of data analysis by facilitating reasoning based on the data analysis; and to show how quantitative and qualitative data analysis can be incorporated into logic programming. Comment: 19 pp., 5 figures, 3 tables

    Fast, Linear Time, m-Adic Hierarchical Clustering for Search and Retrieval using the Baire Metric, with linkages to Generalized Ultrametrics, Hashing, Formal Concept Analysis, and Precision of Data Measurement

    We describe many vantage points on the Baire metric and its use in clustering data, or its use in preprocessing and structuring data in order to support search and retrieval operations. In some cases, we proceed directly to clusters and do not directly determine the distances. We show how a hierarchical clustering can be read directly from one pass through the data. We also offer insights on the practical implications of precision of data measurement. As a mechanism for treating multidimensional data, including very high dimensional data, we use random projections. Comment: 17 pages, 45 citations, 2 figures
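
    The idea of reading a hierarchy from one pass through the data can be sketched as follows. This is a minimal illustration, not the authors' implementation: it assumes the values lie in [0, 1) and are given as decimal strings, and the function name `prefix_hierarchy` and the sample values are hypothetical. Each value is placed in one bucket per digit prefix, so no pairwise distances are ever computed.

```python
from collections import defaultdict


def prefix_hierarchy(values, depth=3):
    """Bucket decimal strings in [0, 1) by their leading digits.

    Assumption for illustration: each value is a string such as "0.1234".
    A prefix of length k identifies one cluster at level k of the hierarchy.
    """
    clusters = defaultdict(list)
    for s in values:  # single pass over the data
        # Digits after the decimal point, padded/truncated to `depth`.
        digits = s.split(".")[1].ljust(depth, "0")[:depth]
        for k in range(1, depth + 1):
            clusters[digits[:k]].append(s)
    return dict(clusters)


# Hypothetical sample data.
data = ["0.1234", "0.1278", "0.1950", "0.7301"]
tree = prefix_hierarchy(data, depth=2)
for prefix in sorted(tree):
    print(prefix, tree[prefix])
```

    The cost is proportional to the number of values times the chosen depth, which is why a scheme of this kind scales linearly in the size of the dataset.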

    Ultrametric Distance in Syntax

    Phrase structure trees have a hierarchical structure. In many subjects, most notably in taxonomy, such tree structures have been studied using ultrametrics. Here syntactical hierarchical phrase trees are subject to a similar analysis, which is much simpler as the branching structure is more readily discernible and switched. The occurrence of hierarchical structure elsewhere in linguistics is mentioned. The phrase tree can be represented by a matrix and the elements of the matrix can be represented by triangles. The height at which branching occurs is not prescribed in previous syntactic models, but it is by using the ultrametric matrix. In other words, the ultrametric approach gives a complete description of phrase trees, unlike previous approaches. The ambiguity of which branching height to choose is resolved by postulating that branching occurs at the lowest height available. An ultrametric produces a measure of the complexity of sentences: presumably the complexity of sentences increases as a language is acquired, so that this can be tested. All ultrametric triangles are equilateral or isosceles; here it is shown that X-bar structure implies that there are no equilateral triangles. Restricting attention to simple syntax, a minimum ultrametric distance between lexical categories is calculated. This ultrametric distance is shown to be different than the matrix obtained from features. It is shown that the definition of c-command can be replaced by an equivalent ultrametric definition. The new definition invokes a minimum distance between nodes, and this is more aesthetically satisfying than previous varieties of definitions. From the new definition of c-command follows a new definition of government.
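
    The matrix representation of a phrase tree can be illustrated with a small sketch. The conventions here are assumptions made for illustration, not taken from the paper: the tree is a nested tuple of unique words, a leaf has height 0, and every branching node sits at the lowest available height, one above its tallest child. The distance between two words is the height of the lowest node that dominates both. The function name and the example sentence are hypothetical.

```python
from itertools import combinations


def ultrametric_from_tree(tree):
    """Return (leaves, dist) for a nested-tuple tree with unique string leaves."""
    dist = {}

    def walk(node):
        if isinstance(node, str):
            return [node], 0  # a leaf has height 0
        parts = [walk(child) for child in node]
        height = 1 + max(h for _, h in parts)  # lowest height available
        # Leaves from different children first meet at this node.
        for idx, (left, _) in enumerate(parts):
            for right, _ in parts[idx + 1:]:
                for a in left:
                    for b in right:
                        dist[a, b] = dist[b, a] = height
        return [leaf for leaves, _ in parts for leaf in leaves], height

    leaves, _ = walk(tree)
    return leaves, dist


# Hypothetical binary phrase tree: [[the dog] [chased [a cat]]]
sentence = (("the", "dog"), ("chased", ("a", "cat")))
leaves, dist = ultrametric_from_tree(sentence)

# Every triangle has its two longest sides equal (isosceles or equilateral).
for a, b, c in combinations(leaves, 3):
    sides = sorted([dist[a, b], dist[a, c], dist[b, c]])
    assert sides[1] == sides[2]

print(dist["the", "dog"], dist["chased", "cat"], dist["dog", "cat"])
```

    Under these assumptions a strictly binary tree never yields three mutually equidistant leaves, which is consistent with the abstract's remark about the absence of equilateral triangles.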

    THE POLITICAL ROBUSTNESS IN INDONESIA

    The result of the 2004 Indonesian legislative election is analyzed in comparison with the previous one (1999). The analysis uses graph theory: we find the Euclidean distances among political parties and then treat these distances in ultrametric space by using the minimum spanning tree algorithm. From the resulting hierarchical taxonomy model of Indonesian political parties we show some emerging patterns, which agree with the classical anthropological analysis of the socio-political system in Indonesia. This fact accentuates a character of robustness in Indonesian political society as a self-organized system that evolves to a critical state. Under small perturbations, i.e., different voting processes, statistically the same pattern emerges from the social structure based upon political streams: Islamic, secular, traditional, and some complements of all.
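
    One standard way to pass from Euclidean distances to an ultrametric via a minimum spanning tree is the subdominant ultrametric, in which the distance between two items is the largest edge weight on the tree path joining them. The sketch below is an illustration under that assumption and is not claimed to be the paper's exact procedure; the function name and the sample coordinates are hypothetical. It processes edges in increasing order (Kruskal's algorithm) and records the merge weight for every pair that becomes connected.

```python
import math
from itertools import combinations


def subdominant_ultrametric(points):
    """Return an n x n matrix of MST-based ultrametric distances."""
    n = len(points)
    edges = sorted(
        (math.dist(points[i], points[j]), i, j)
        for i, j in combinations(range(n), 2)
    )
    root = list(range(n))                 # component label of each point
    members = {i: [i] for i in range(n)}  # component label -> its points
    u = [[0.0] * n for _ in range(n)]
    for w, i, j in edges:
        ri, rj = root[i], root[j]
        if ri == rj:
            continue  # edge would close a cycle; it is not in the MST
        # w is the heaviest edge on the MST path between the two components.
        for a in members[ri]:
            for b in members[rj]:
                u[a][b] = u[b][a] = w
        for b in members[rj]:
            root[b] = ri
        members[ri].extend(members.pop(rj))
    return u


# Hypothetical positions of three items on a line.
pts = [(0.0, 0.0), (1.0, 0.0), (5.0, 0.0)]
u = subdominant_ultrametric(pts)
print(u[0][1], u[0][2], u[1][2])
```

    In this toy case the Euclidean distance between the first and third points is 5, while the ultrametric distance is 4, the weight of the edge at which their clusters merge.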

    The Haar Wavelet Transform of a Dendrogram: Additional Notes

    We consider the wavelet transform of a finite, rooted, node-ranked, p-way tree, focusing on the case of binary (p = 2) trees. We study a Haar wavelet transform on this tree. Wavelet transforms allow for multiresolution analysis through translation and dilation of a wavelet function. We explore how this works in our tree context. Comment: 37 pp, 1 fig. Supplementary material to "The Haar Wavelet Transform of a Dendrogram", http://arxiv.org/abs/cs.IR/060810

    Fast, Linear Time Hierarchical Clustering using the Baire Metric

    The Baire metric induces an ultrametric on a dataset and is of linear computational complexity, contrasted with the standard quadratic time agglomerative hierarchical clustering algorithm. In this work we evaluate empirically this new approach to hierarchical clustering. We compare hierarchical clustering based on the Baire metric with (i) agglomerative hierarchical clustering, in terms of algorithm properties; (ii) generalized ultrametrics, in terms of definition; and (iii) fast clustering through k-means partitioning, in terms of quality of results. For the latter, we carry out an in-depth astronomical study. We apply the Baire distance to spectrometric and photometric redshifts from the Sloan Digital Sky Survey using, in this work, about half a million astronomical objects. We want to know how well the (more costly to determine) spectrometric redshifts can predict the (more easily obtained) photometric redshifts, i.e. we seek to regress the spectrometric on the photometric redshifts, and we use clusterwise regression for this. Comment: 27 pages, 6 tables, 10 figures
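
    A longest-common-prefix distance of the Baire type can be sketched in a few lines. The formulation below is one common convention and is an assumption here, as is the choice of base 10: two digit strings that first differ after k shared leading digits are at distance base^(-k), and identical strings are at distance 0. The function names and the sample digit strings are hypothetical.

```python
def baire_distance(x: str, y: str, base: float = 10.0) -> float:
    """Distance base**(-k), where k is the length of the common prefix.

    Assumption for illustration: x and y are digit strings, e.g. "1234"
    standing for the value 0.1234.
    """
    if x == y:
        return 0.0
    k = 0
    for a, b in zip(x, y):
        if a != b:
            break
        k += 1
    return base ** (-k)


def satisfies_strong_triangle(x: str, y: str, z: str) -> bool:
    """Check the ultrametric inequality: the two largest sides are equal."""
    sides = sorted([baire_distance(x, y),
                    baire_distance(x, z),
                    baire_distance(y, z)])
    return sides[2] <= sides[1]


# Hypothetical digit strings.
a, b, c = "1234", "1278", "1950"
print(baire_distance(a, b), baire_distance(a, c), baire_distance(b, c))
print(satisfies_strong_triangle(a, b, c))
```

    Because the distance depends only on shared prefixes, items can be grouped by bucketing on leading digits rather than by comparing all pairs, which is the source of the linear cost mentioned in the abstract.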

    On Folding and Twisting (and whatknot): towards a characterization of workspaces in syntax

    Syntactic theory has traditionally adopted a constructivist approach, in which a set of atomic elements are manipulated by combinatory operations to yield derived, complex elements. Syntactic structure is thus seen as the result of discrete recursive combinatorics over lexical items which get assembled into phrases, which are themselves combined to form sentences. This view is common to European and American structuralism (e.g., Benveniste, 1971; Hockett, 1958) and different incarnations of generative grammar, transformational and non-transformational (Chomsky, 1956, 1995; and Kaplan & Bresnan, 1982; Gazdar, 1982). Since at least Uriagereka (2002), there has been some attention paid to the fact that syntactic operations must apply somewhere, particularly when copying and movement operations are considered. Contemporary syntactic theory has thus somewhat acknowledged the importance of formalizing aspects of the spaces in which elements are manipulated, but it is still a vastly underexplored area. In this paper we explore the consequences of conceptualizing syntax as a set of topological operations applying over spaces rather than over discrete elements. We argue that there are empirical advantages in such a view for the treatment of long-distance dependencies and cross-derivational dependencies: constraints on possible configurations emerge from the dynamics of the system. Comment: Manuscript. Do not cite without permission. Comments welcome

    Quantitative Coding and Complexity Theory of Compact Metric Spaces

    Specifying a computational problem requires fixing encodings for input and output: encoding graphs as adjacency matrices, characters as integers, integers as bit strings, and vice versa. For such discrete data, the actual encoding is usually straightforward and/or complexity-theoretically inessential (up to polynomial time, say); but concerning continuous data, already real numbers naturally suggest various encodings with very different computational properties. With respect to qualitative computability, Kreitz and Weihrauch (1985) had identified ADMISSIBILITY as a crucial property for 'reasonable' encodings over the Cantor space of infinite binary sequences, so-called representations [doi:10.1007/11780342_48]: For (precisely) these does the sometimes so-called MAIN THEOREM apply, characterizing continuity of functions in terms of continuous realizers. We rephrase qualitative admissibility as continuity of both the representation and its multivalued inverse, adopting from [doi:10.4115/jla.2013.5.7] a notion of sequential continuity for multifunctions. This suggests its quantitative refinement as a criterion for representations suitable for complexity investigations. Higher-type complexity is captured by replacing Cantor's as ground space with Baire or any other (compact) ULTRAmetric space: a quantitative counterpart to equilogical spaces in computability [doi:10.1016/j.tcs.2003.11.012].