Ultrametric and Generalized Ultrametric in Computational Logic and in Data Analysis
Following a review of metric, ultrametric and generalized ultrametric, we
review their application in data analysis. We show how they allow us to explore
both geometry and topology of information, starting with measured data. Some
themes are then developed based on the use of metric, ultrametric and
generalized ultrametric in logic. In particular we study approximation chains
in an ultrametric or generalized ultrametric context. Our aim in this work is
to extend the scope of data analysis by facilitating reasoning based on the
data analysis; and to show how quantitative and qualitative data analysis can
be incorporated into logic programming. Comment: 19 pp., 5 figures, 3 tables
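To illustrate how an ultrametric enters logic programming, here is a minimal sketch of a standard construction from the semantics of logic programs. It is not necessarily the construction used in the paper. It assumes a level mapping that assigns each ground atom a natural number. Two interpretations are at distance 2^-n, where n is the lowest level at which they differ. Iterating the immediate-consequence operator T_P from the empty interpretation then gives an approximation chain. All function names and the example program are hypothetical.

```python
from fractions import Fraction

# Assumption: a program is a list of (head, [body atoms]) rules over ground atoms,
# and `level` is a hypothetical level mapping from atoms to natural numbers.


def distance(i, j, level):
    """Ultrametric on interpretations: 2^-n, where n is the lowest level at which i and j differ."""
    diff = i ^ j  # atoms true in exactly one of the two interpretations
    if not diff:
        return Fraction(0)
    n = min(level[atom] for atom in diff)
    return Fraction(1, 2 ** n)


def immediate_consequences(program, interp):
    """One application of the immediate-consequence operator T_P for a definite program."""
    return frozenset(head for head, body in program if all(b in interp for b in body))


def approximation_chain(program, steps):
    """Iterate T_P from the empty interpretation and return every iterate."""
    chain = [frozenset()]
    for _ in range(steps):
        chain.append(immediate_consequences(program, chain[-1]))
    return chain
```

For the program `p.`, `q :- p.`, `r :- q.` with levels p = 0, q = 1, r = 2, the successive distances along the chain are 1, 1/2, 1/4, and then 0 once the least fixed point {p, q, r} is reached.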
Fast, Linear Time, m-Adic Hierarchical Clustering for Search and Retrieval using the Baire Metric, with linkages to Generalized Ultrametrics, Hashing, Formal Concept Analysis, and Precision of Data Measurement
We describe many vantage points on the Baire metric and its use in clustering
data, or its use in preprocessing and structuring data in order to support
search and retrieval operations. In some cases, we proceed directly to clusters
and do not directly determine the distances. We show how a hierarchical
clustering can be read directly from one pass through the data. We also offer
insights into the practical implications of the precision of data measurement. As a
mechanism for treating multidimensional data, including very high-dimensional
data, we use random projections. Comment: 17 pages, 45 citations, 2 figures
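The one-pass idea can be sketched as follows. Each multidimensional record is reduced to one value by a single random Gaussian projection. That value is rescaled to [0, 1) and truncated to a fixed number of decimal digits. Items sharing their first k digits form the clusters at level k. The function names, the seed, and the use of a single projection are illustrative assumptions, not the paper's exact procedure.

```python
import random
from collections import defaultdict


def baire_distance(x, y, base=10):
    """Baire distance between equal-length digit strings: base^-k, k = longest common prefix."""
    if x == y:
        return 0.0
    k = 0
    for a, b in zip(x, y):
        if a != b:
            break
        k += 1
    return float(base) ** (-k)


def random_projection_codes(vectors, precision, seed=0):
    """Project onto one random Gaussian direction, rescale to [0, 1),
    and keep `precision` decimal digits of each projected value as a string."""
    rng = random.Random(seed)
    direction = [rng.gauss(0.0, 1.0) for _ in vectors[0]]
    proj = [sum(v * r for v, r in zip(vec, direction)) for vec in vectors]
    lo, hi = min(proj), max(proj)
    span = (hi - lo) or 1.0  # guard against all projections being equal
    scale = 10 ** precision
    codes = []
    for p in proj:
        digits = min(int((p - lo) / span * scale), scale - 1)  # keep the maximum below 1
        codes.append(str(digits).zfill(precision))
    return codes


def baire_hierarchy(codes):
    """Single pass over the data: level k maps each k-digit prefix to the items sharing it.
    Each level is a partition that refines the level above it."""
    depth = len(codes[0])
    levels = [defaultdict(list) for _ in range(depth + 1)]
    for idx, code in enumerate(codes):
        for k in range(depth + 1):
            levels[k][code[:k]].append(idx)
    return [dict(level) for level in levels]
```

Because every item is touched once per digit, the cost is linear in the number of items for a fixed precision. The chosen precision also caps the depth of the hierarchy.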
Ultrametric Distance in Syntax
Phrase structure trees have a hierarchical structure. In many subjects, most notably in taxonomy, such tree structures have been studied using ultrametrics. Here syntactic hierarchical phrase trees are subjected to a similar analysis, which is much simpler as the branching structure is more readily discernible and switched. The occurrence of hierarchical structure elsewhere in linguistics is mentioned. The phrase tree can be represented by a matrix, and the elements of the matrix can be represented by triangles. The height at which branching occurs is not prescribed in previous syntactic models, but it is prescribed by the ultrametric matrix. In other words, the ultrametric approach gives a complete description of phrase trees, unlike previous approaches. The ambiguity of which branching height to choose is resolved by postulating that branching occurs at the lowest height available. An ultrametric produces a measure of the complexity of sentences: presumably the complexity of sentences increases as a language is acquired, so this can be tested. All ultrametric triangles are equilateral or isosceles; here it is shown that X-bar structure implies that there are no equilateral triangles. Restricting attention to simple syntax, a minimum ultrametric distance between lexical categories is calculated. This ultrametric distance is shown to be different from the matrix obtained from features. It is shown that the definition of c-command can be replaced by an equivalent ultrametric definition. The new definition invokes a minimum distance between nodes, and this is more aesthetically satisfying than previous definitions. From the new definition of c-command follows a new definition of government.
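Here is a minimal sketch of reading an ultrametric off a phrase tree. It assumes trees are written as nested tuples of distinct words. It also assumes a node's height is one more than its tallest child, which is our reading of "branching occurs at the lowest height available"; the paper's own height convention may differ. The distance between two words is the height of their lowest common ancestor. The helper names are hypothetical.

```python
from itertools import combinations


def phrase_tree_ultrametric(tree):
    """Return (words, dist) for a tree of nested tuples of distinct words.
    Assumption: words have height 0 and a node has height 1 + its tallest child;
    dist[{x, y}] is the height of the lowest node dominating both x and y."""
    dist = {}

    def walk(node):
        if isinstance(node, str):
            return 0, [node]
        results = [walk(child) for child in node]
        height = 1 + max(h for h, _ in results)
        # Words under different children of this node first meet here.
        for (_, left), (_, right) in combinations(results, 2):
            for x in left:
                for y in right:
                    dist[frozenset((x, y))] = height
        return height, [w for _, words in results for w in words]

    _, words = walk(tree)
    return words, dist


def triangle_shapes(words, dist):
    """Classify every triangle of words; in an ultrametric the two longest sides are equal."""
    shapes = []
    for x, y, z in combinations(words, 3):
        a, b, c = sorted((dist[frozenset((x, y))], dist[frozenset((y, z))], dist[frozenset((x, z))]))
        if b != c:
            shapes.append("not ultrametric")
        elif a == c:
            shapes.append("equilateral")
        else:
            shapes.append("isosceles")
    return shapes
```

With strictly binary branching, as in X-bar structure, two of any three words always meet strictly below the third, so every triangle is isosceles. A flat ternary node such as `("a", "b", "c")` produces an equilateral triangle.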
The Political Robustness in Indonesia
The result of the 2004 Indonesian legislative election is analyzed and compared with that of the previous election (1999). The analysis uses graph-theoretical methods, starting from the Euclidean distances among political parties. These distances are then treated in an ultrametric space by means of the minimum spanning tree algorithm. From the resulting hierarchical taxonomy of Indonesian political parties, we show that certain patterns emerge, and that these patterns agree with the classical anthropological analysis of the socio-political system in Indonesia. This accentuates the robustness of Indonesian political society as a self-organized system evolving to a critical state. Small perturbations, i.e. different voting processes, statistically result in the same pattern, which emerges from a social structure based upon political streams: Islamic, secular, traditional, and some complements of all of these.
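The pipeline described here, Euclidean distances followed by a minimum spanning tree, corresponds to the subdominant ultrametric. This is the same ultrametric produced by single-linkage clustering. The ultrametric distance between two parties is the largest edge on the tree path joining them. The sketch below uses Kruskal's algorithm. The abstract does not specify the features, so the party profile vectors the code expects (for instance, vote shares per region) are an assumption. The function names are hypothetical.

```python
import math
from itertools import combinations


def euclidean_matrix(vectors):
    """Pairwise Euclidean distances between (hypothetical) party profile vectors."""
    return [[math.dist(a, b) for b in vectors] for a in vectors]


def minimum_spanning_tree(d):
    """Kruskal's algorithm with union-find; returns the MST edges as (weight, i, j)."""
    parent = list(range(len(d)))

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    tree = []
    for w, i, j in sorted((d[i][j], i, j) for i, j in combinations(range(len(d)), 2)):
        ri, rj = find(i), find(j)
        if ri != rj:  # the edge joins two different components
            parent[ri] = rj
            tree.append((w, i, j))
    return tree


def subdominant_ultrametric(d):
    """u[i][j] = largest edge weight on the MST path between i and j."""
    n = len(d)
    adjacency = {i: [] for i in range(n)}
    for w, i, j in minimum_spanning_tree(d):
        adjacency[i].append((j, w))
        adjacency[j].append((i, w))
    u = [[0.0] * n for _ in range(n)]
    for source in range(n):
        # Depth-first walk over the tree, carrying the largest edge seen so far.
        stack, seen = [(source, 0.0)], {source}
        while stack:
            node, largest = stack.pop()
            u[source][node] = largest
            for neighbour, w in adjacency[node]:
                if neighbour not in seen:
                    seen.add(neighbour)
                    stack.append((neighbour, max(largest, w)))
    return u
```

The resulting matrix never exceeds the Euclidean distances. Among all ultrametrics with that property, it is the largest, which is why it is called subdominant.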
The Haar Wavelet Transform of a Dendrogram: Additional Notes
We consider the wavelet transform of a finite, rooted, node-ranked, $p$-way
tree, focusing on the case of binary ($p = 2$) trees. We study a Haar wavelet
transform on this tree. Wavelet transforms allow for multiresolution analysis
through translation and dilation of a wavelet function. We explore how this
works in our tree context. Comment: 37 pp., 1 fig. Supplementary material to "The Haar Wavelet Transform
of a Dendrogram", http://arxiv.org/abs/cs.IR/060810
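For the binary case, here is a minimal sketch of one common Haar convention on a dendrogram. At each internal node, the smooth is the average of the two child smooths and the detail is half their difference. Each child is then recovered as smooth plus or minus detail. The tree encoding (nested pairs of leaf indices), the function names, and the exact normalization are assumptions. Node ranking is ignored here.

```python
def haar_dendrogram(tree, data):
    """Forward transform on a binary tree of nested pairs of leaf indices into `data`.
    Returns (root smooth, details), with one detail vector per internal node."""
    details = {}

    def walk(node):
        if isinstance(node, int):
            return list(data[node])  # a leaf's smooth is its own data vector
        left, right = node
        sl, sr = walk(left), walk(right)
        details[node] = [(a - b) / 2 for a, b in zip(sl, sr)]
        return [(a + b) / 2 for a, b in zip(sl, sr)]

    return walk(tree), details


def inverse_haar_dendrogram(tree, smooth, details):
    """Reconstruct leaf vectors: left child = smooth + detail, right child = smooth - detail."""
    leaves = {}

    def walk(node, s):
        if isinstance(node, int):
            leaves[node] = s
            return
        d = details[node]
        walk(node[0], [a + b for a, b in zip(s, d)])
        walk(node[1], [a - b for a, b in zip(s, d)])

    walk(tree, smooth)
    return [leaves[i] for i in sorted(leaves)]
```

The transform is invertible: the root smooth, together with one detail vector per internal node, reconstructs every leaf vector exactly.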
Fast, Linear Time Hierarchical Clustering using the Baire Metric
The Baire metric induces an ultrametric on a dataset and is of linear
computational complexity, contrasted with the standard quadratic time
agglomerative hierarchical clustering algorithm. In this work we empirically
evaluate this new approach to hierarchical clustering. We compare
hierarchical clustering based on the Baire metric with (i) agglomerative
hierarchical clustering, in terms of algorithm properties; (ii) generalized
ultrametrics, in terms of definition; and (iii) fast clustering through k-means
partitioning, in terms of quality of results. For the latter, we carry out an
in-depth astronomical study. We apply the Baire distance to spectrometric and
photometric redshifts from the Sloan Digital Sky Survey using, in this work,
about half a million astronomical objects. We want to know how well the (more
easily obtained) photometric redshifts can predict the (more costly to
determine) spectrometric redshifts, i.e. we seek to regress the spectrometric on
the photometric redshifts, and we use clusterwise regression for this. Comment: 27 pages, 6 tables, 10 figures
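Clusterwise regression over Baire clusters can be sketched as below. The sketch assumes the photometric values are kept as recorded decimal strings, so their digits can be read directly. A cluster is taken to be the set of objects sharing the first k digits after the decimal point. One least-squares line is fitted per cluster. The names, and the choice to cluster on the photometric value alone, are illustrative and not necessarily the paper's procedure.

```python
from collections import defaultdict


def baire_key(value, k):
    """First k digits after the decimal point of a recorded value such as "0.1234"."""
    return value.split(".")[1][:k]


def fit_line(xs, ys):
    """Ordinary least-squares fit y = slope * x + intercept."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    if sxx == 0:  # only one distinct x in the cluster: predict the mean
        return 0.0, my
    slope = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sxx
    return slope, my - slope * mx


def clusterwise_regression(z_phot, z_spec, k):
    """Group objects by the Baire prefix of the photometric value, then fit one line per group."""
    clusters = defaultdict(list)
    for i, value in enumerate(z_phot):
        clusters[baire_key(value, k)].append(i)
    models = {}
    for key, idx in clusters.items():
        xs = [float(z_phot[i]) for i in idx]
        ys = [z_spec[i] for i in idx]
        models[key] = fit_line(xs, ys)
    return models


def predict(models, value, k):
    """Predict a spectrometric value using the line of the query's Baire cluster."""
    slope, intercept = models[baire_key(value, k)]
    return slope * float(value) + intercept
```

A query whose prefix was never seen raises a `KeyError`. In practice, one would fall back to a shorter prefix, i.e. a coarser level of the hierarchy.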
On Folding and Twisting (and whatknot): towards a characterization of workspaces in syntax
Syntactic theory has traditionally adopted a constructivist approach, in
which a set of atomic elements is manipulated by combinatory operations to
yield derived, complex elements. Syntactic structure is thus seen as the result
of discrete recursive combinatorics over lexical items which get assembled into
phrases, which are themselves combined to form sentences. This view is common
to European and American structuralism (e.g., Benveniste, 1971; Hockett, 1958)
and different incarnations of generative grammar, transformational and
non-transformational (Chomsky, 1956, 1995; and Kaplan & Bresnan, 1982; Gazdar,
1982). Since at least Uriagereka (2002), there has been some attention paid to
the fact that syntactic operations must apply somewhere, particularly when
copying and movement operations are considered. Contemporary syntactic theory
has thus somewhat acknowledged the importance of formalizing aspects of the
spaces in which elements are manipulated, but it is still a vastly
underexplored area. In this paper we explore the consequences of
conceptualizing syntax as a set of topological operations applying over spaces
rather than over discrete elements. We argue that there are empirical
advantages in such a view for the treatment of long-distance dependencies and
cross-derivational dependencies: constraints on possible configurations emerge
from the dynamics of the system. Comment: Manuscript. Do not cite without permission. Comments welcome
Quantitative Coding and Complexity Theory of Compact Metric Spaces
Specifying a computational problem requires fixing encodings for input and
output: encoding graphs as adjacency matrices, characters as integers, integers
as bit strings, and vice versa. For such discrete data, the actual encoding is
usually straightforward and/or complexity-theoretically inessential (up to
polynomial time, say); but concerning continuous data, already real numbers
naturally suggest various encodings with very different computational
properties. With respect to qualitative computability, Kreitz and Weihrauch
(1985) identified ADMISSIBILITY as the crucial property for 'reasonable'
encodings over the Cantor space of infinite binary sequences, so-called
representations [doi:10.1007/11780342_48]: it is precisely for these that the
so-called MAIN THEOREM applies, characterizing continuity of functions
in terms of continuous realizers.
We rephrase qualitative admissibility as continuity of both the
representation and its multivalued inverse, adopting from
[doi:10.4115/jla.2013.5.7] a notion of sequential continuity for
multifunctions. This suggests its quantitative refinement as a criterion for
representations suitable for complexity investigations. Higher-type complexity
is captured by replacing Cantor space as the ground space with Baire space or any other
(compact) ULTRAmetric space: a quantitative counterpart to equilogical spaces
in computability [doi:10.1016/j.tcs.2003.11.012].
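The following example makes concrete how encodings of real numbers can differ quantitatively. It is a standard textbook case, not taken from this paper: the signed-digit representation of [-1, 1]. This representation uses digits in {-1, 0, 1} and is admissible, unlike plain binary expansions. Every prefix of n digits locates x within 2^-n, so precision grows linearly with the number of digits read. The sketch uses exact rational arithmetic via `fractions.Fraction`. The function names are hypothetical.

```python
from fractions import Fraction


def signed_digits(x, n):
    """First n signed binary digits d_i in {-1, 0, 1} of x in [-1, 1],
    so that x = sum(d_i * 2**-i) + r * 2**-n with |r| <= 1."""
    r = Fraction(x)
    if abs(r) > 1:
        raise ValueError("x must lie in [-1, 1]")
    digits = []
    for _ in range(n):
        r *= 2  # now r lies in [-2, 2]
        d = 1 if r > Fraction(1, 2) else -1 if r < Fraction(-1, 2) else 0
        digits.append(d)
        r -= d  # the remainder is back in [-1, 1]
    return digits


def prefix_value(digits):
    """Value of a finite signed-digit prefix: the centre of an interval of radius 2**-len(digits)."""
    return sum(Fraction(d, 2 ** (i + 1)) for i, d in enumerate(digits))
```

For example, `signed_digits(Fraction(1, 2), 3)` returns `[1, 0, 0]`.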