Exploratory data analysis using self-organising maps defined in up to three dimensions
The SOM is an artificial neural network based on an unsupervised learning process that performs a nonlinear mapping of high dimensional input data onto an ordered and structured array of nodes, designated as the SOM output space. Being simultaneously a quantization algorithm and a projection algorithm, the SOM is able to summarize and map the data, allowing its visualization. Because it is very difficult or even impossible to visualize a SOM defined with more than two dimensions using the most common visualization methods, the SOM output space is generally a regular two dimensional grid of nodes. However, there are no theoretical problems in generating SOMs with higher dimensional output spaces. In this thesis we present evidence that the SOM output space defined in up to three dimensions can be used successfully for the exploratory analysis of spatial data, two-way data and three-way data. Despite the differences between the methods proposed to visualize each group of data, the approach adopted is commonly based on the projection of colour codes, obtained from the output space of 3D SOMs, onto some specific two-dimensional surface where data can be represented according to its own characteristics. This approach is, in some cases, also complemented with the simultaneous use of SOMs defined in one and two dimensions, so that patterns in data can be properly revealed. The results obtained by using this visualization strategy indicate not only the benefits of using the SOM defined in up to three dimensions but also the relevance of the combined and simultaneous use of different models of the SOM in exploratory data analysis.
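The idea of reading colour codes off a 3D SOM output space can be illustrated with a minimal sketch. It is not the thesis's implementation: the grid size, learning-rate and neighbourhood schedules, the toy data, and the function names `train_som`, `best_matching_unit` and `node_colour` are all assumptions chosen for illustration. Each node of a 3 x 3 x 3 output grid is given an RGB colour from its grid coordinates, and each input vector inherits the colour of its best-matching node.

```python
import math
import random


def best_matching_unit(weights, x):
    """Index of the node whose weight vector is closest to x (squared Euclidean)."""
    return min(
        range(len(weights)),
        key=lambda n: sum((w - xi) ** 2 for w, xi in zip(weights[n], x)),
    )


def train_som(data, grid=(3, 3, 3), epochs=20, lr0=0.5, sigma0=1.5, seed=0):
    """Train a tiny SOM whose output space is a 3D grid (illustrative settings)."""
    rng = random.Random(seed)
    dim = len(data[0])
    coords = [(i, j, k)
              for i in range(grid[0])
              for j in range(grid[1])
              for k in range(grid[2])]
    weights = [[rng.random() for _ in range(dim)] for _ in coords]
    total_steps = epochs * len(data)
    step = 0
    for _ in range(epochs):
        for x in rng.sample(data, len(data)):  # shuffled pass over the data
            frac = step / total_steps
            lr = lr0 * (1.0 - frac)                # assumed linear decay
            sigma = sigma0 * (1.0 - frac) + 0.3    # assumed shrinking neighbourhood
            bmu = coords[best_matching_unit(weights, x)]
            for n, c in enumerate(coords):
                d2 = sum((a - b) ** 2 for a, b in zip(c, bmu))
                h = math.exp(-d2 / (2.0 * sigma ** 2))  # Gaussian neighbourhood
                weights[n] = [w + lr * h * (xi - w) for w, xi in zip(weights[n], x)]
            step += 1
    return coords, weights


def node_colour(coord, grid):
    """Map a 3D grid coordinate to an RGB triple with components in 0..255."""
    return tuple(round(255 * c / (g - 1)) if g > 1 else 0
                 for c, g in zip(coord, grid))


# Toy input: two well-separated groups of 4-dimensional vectors (made up).
low = [[0.0, 0.1, 0.0, 0.1], [0.1, 0.0, 0.1, 0.0], [0.05, 0.05, 0.05, 0.05]]
high = [[1.0 - v for v in row] for row in low]
grid = (3, 3, 3)
coords, weights = train_som(low + high, grid=grid)
colours = [node_colour(coords[best_matching_unit(weights, x)], grid)
           for x in low + high]
```

In this sketch the colour list could then be painted onto whatever two-dimensional surface suits the data (for spatial data, for instance, a map), which is the general strategy the abstract describes.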
Classifying the suras by their lexical semantics :an exploratory multivariate analysis approach to understanding the Qur'an
PhD Thesis
The Qur'an is at the heart of Islamic culture. Careful, well-informed interpretation of
it is fundamental both to the faith of millions of Muslims throughout the world, and
also to the non-Islamic world's understanding of their religion. There is a long and
venerable tradition of Qur'anic interpretation, and it has necessarily been based on
literary-historical methods for exegesis of hand-written and printed text.
Developments in electronic text representation and analysis since the second half of
the twentieth century now offer the opportunity to supplement traditional techniques
by applying the newly-emergent computational technology of exploratory
multivariate analysis to interpretation of the Qur'an. The general aim of the present
discussion is to take up that opportunity.
Specifically, the discussion develops and applies a methodology for discovering the
thematic structure of the Qur'an based on a fundamental idea in a range of
computationally oriented disciplines: that, with respect to some collection of texts, the
lexical frequency profiles of the individual texts are a good indicator of their semantic
content, and thus provide a reliable criterion for their conceptual categorization
relative to one another. This idea is applied to the discovery of thematic
interrelationships among the suras that constitute the Qur'an by abstracting lexical
frequency data from them and then analyzing that data using exploratory multivariate
methods in the hope that this will generate hypotheses about the thematic structure of
the Qur'an.
The discussion is in eight main parts. The first part introduces the discussion. The
second gives an overview of the structure and thematic content of the Qur'an and of
the tradition of Qur'anic scholarship devoted to its interpretation. The third part
defines the research question to be addressed together with a methodology for doing
so. The fourth reviews the existing literature on the research question. The fifth
outlines general principles of data creation and applies them to creation of the data on
which the analysis of the Qur'an in this study is based. The sixth outlines general
principles of exploratory multivariate analysis, describes in detail the analytical
methods selected for use, and applies them to the data created in part five. The
seventh part interprets the results of the analyses conducted in part six with reference
to the existing results in Qur'anic interpretation described in part two. And, finally, the
eighth part draws conclusions relative to the research question and identifies
directions along which the work presented in this study can be developed.
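The central idea of the abstract above, that lexical frequency profiles indicate semantic content and so support categorization of texts relative to one another, can be sketched as follows. This is only an illustration under stated assumptions: the three toy texts, the whitespace tokenization, the relative-frequency normalization and the use of cosine similarity are all choices made here, not the data or methods of the thesis.

```python
import math
from collections import Counter


def frequency_profile(text, vocabulary):
    """Relative frequency of each vocabulary word in the text."""
    tokens = text.lower().split()
    counts = Counter(tokens)
    total = len(tokens) or 1
    return [counts[word] / total for word in vocabulary]


def cosine_similarity(u, v):
    """Cosine of the angle between two profile vectors (0.0 if either is zero)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


# Hypothetical toy texts standing in for individual documents.
texts = {
    "text_a": "mercy guidance mercy prayer",
    "text_b": "mercy prayer guidance guidance",
    "text_c": "law trade law inheritance",
}
vocabulary = sorted({w for t in texts.values() for w in t.lower().split()})
profiles = {name: frequency_profile(t, vocabulary) for name, t in texts.items()}

sim_ab = cosine_similarity(profiles["text_a"], profiles["text_b"])
sim_ac = cosine_similarity(profiles["text_a"], profiles["text_c"])
```

Here `text_a` and `text_b` share vocabulary and come out as similar, while `text_a` and `text_c` share none and have similarity zero. A full analysis would feed such profiles into exploratory multivariate methods rather than compare pairs by hand.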
The anonymous 1821 translation of Goethe's Faust :a cluster analytic approach
PhD Thesis
This study tests the hypothesis proposed by Frederick Burwick and James McKusick in
2007 that Samuel Taylor Coleridge was the author of the anonymous translation of
Goethe's Faust published by Thomas Boosey in 1821. The approach to hypothesis testing
is stylometric. Specifically, function word usage is selected as the stylometric criterion,
and 80 function words are used to define an 80-dimensional function word frequency
profile vector for each text in the corpus of Coleridge's literary works and for a selection
of works by a range of contemporary English authors. Each profile vector is a point in 80-
dimensional vector space, and cluster analytic methods are used to determine the
distribution of profile vectors in the space. If the hypothesis being tested is valid, then the
profile for the 1821 translation should be closer in the space to works known to be by
Coleridge than to works by the other authors. The cluster analytic results show, however,
that this is not the case, and the conclusion is that the Burwick and McKusick hypothesis
is falsified relative to the stylometric criterion and analytic methodology used.
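The logic of the test described above (build a function word frequency profile per text, then ask whether the disputed text lies closer to one author's works than to another's) can be sketched in a few lines. The sketch swaps in a simple nearest-centroid comparison with Euclidean distance in place of the cluster analytic methods used in the thesis. The five-word list, the author labels and the toy texts are hypothetical and carry no evidential weight.

```python
import math
from collections import Counter

# Hypothetical short list; the study itself uses 80 function words.
FUNCTION_WORDS = ["the", "and", "of", "to", "in"]


def profile(text):
    """Relative frequency of each function word in the text."""
    tokens = text.lower().split()
    counts = Counter(tokens)
    total = len(tokens) or 1
    return [counts[word] / total for word in FUNCTION_WORDS]


def centroid(vectors):
    """Component-wise mean of a list of equal-length vectors."""
    return [sum(column) / len(vectors) for column in zip(*vectors)]


def euclidean(u, v):
    """Euclidean distance between two profile vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def nearest_author(disputed_text, known_texts):
    """Return (closest author, distance to every author's centroid)."""
    target = profile(disputed_text)
    distances = {
        author: euclidean(target, centroid([profile(t) for t in works]))
        for author, works in known_texts.items()
    }
    return min(distances, key=distances.get), distances


# Toy corpus (made up): two authors with two short "works" each.
known = {
    "author_a": ["the cat and the dog", "the sea and the sky"],
    "author_b": ["to go in to town", "to be in to it"],
}
closest, distances = nearest_author("the sun and the moon", known)
```

In the study itself the comparison came out the other way for the 1821 translation: its profile was not closer to Coleridge's known works than to those of the other authors.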