30,420 research outputs found
Relational visual cluster validity
The assessment of cluster validity plays a very important role in cluster analysis. Most commonly used cluster validity methods are based on statistical hypothesis testing or finding the best clustering scheme by computing a number of different cluster validity indices. A number of visual methods of cluster validity have been produced to display directly the validity of clusters by mapping data into two- or three-dimensional space. However, these methods may lose too much information to correctly estimate the results of clustering algorithms. Although the visual cluster validity (VCV) method of Hathaway and Bezdek can successfully solve this problem, it can only be applied for object data, i.e. feature measurements. There are very few validity methods that can be used to analyze the validity of data where only a similarity or dissimilarity relation exists – relational data. To tackle this problem, this paper presents a relational visual cluster validity (RVCV) method to assess the validity of clustering relational data. This is done by combining the results of the non-Euclidean relational fuzzy c-means (NERFCM) algorithm with a modification of the VCV method to produce a visual representation of cluster validity. RVCV can cluster complete and incomplete relational data and adds to the visual cluster validity theory. Numeric examples using synthetic and real data are presente
Clustering in relational data and ontologies
Title from PDF of title page (University of Missouri--Columbia, viewed on August 20, 2010).The entire thesis text is included in the research.pdf file; the official abstract appears in the short.pdf file; a non-technical public abstract appears in the public.pdf file.Dissertation advisor: Dr. James M. Keller.Vita.Ph. D. University of Missouri--Columbia 2010.This dissertation studies the problem of clustering objects represented by relational data. This is a pertinent problem as many real-world data sets can only be represented by relational data for which object-based clustering algorithms are not designed. Relational data are encountered in many fields including biology, management, industrial engineering, and social sciences. Unlike numerical object data, which are represented by a set of feature values (e.g. height, weight, shoe size) of an object, relational object data are the numerical values of (dis) similarity between objects. For this reason, conventional cluster analysis methods such as k-means and fuzzy c-means cannot be used directly with relational data. I focus on three main problems of cluster analysis of relational data: (i) tendency prior to clustering -- how many clusters are there?; (ii) partitioning of objects -- which objects belong to which cluster?; and (iii) validity of the resultant clusters -- are the partitions \good"?Analyses are included in this dissertation that prove that the Visual Assessment of cluster Tendency (VAT) algorithm has a direct relation to single-linkage hierarchical clustering and Dunn's cluster validity index. These analyses are important to the development of two novel clustering algorithms, CLODD-CLustering in Ordered Dissimilarity Data and ReSL-Rectangular Single-Linkage clustering. Last, this dissertation addresses clustering in ontologies; examples include the Gene Ontology, the MeSH ontology, patient medical records, and web documents. I apply an extension to the Self-Organizing Map (SOM) to produce a new algorithm, the OSOM-Ontological Self-Organizing Map. OSOM provides visualization and linguistic summarization of ontology-based data.Includes bibliographical references
Partitioning Relational Matrices of Similarities or Dissimilarities using the Value of Information
In this paper, we provide an approach to clustering relational matrices whose
entries correspond to either similarities or dissimilarities between objects.
Our approach is based on the value of information, a parameterized,
information-theoretic criterion that measures the change in costs associated
with changes in information. Optimizing the value of information yields a
deterministic annealing style of clustering with many benefits. For instance,
investigators avoid needing to a priori specify the number of clusters, as the
partitions naturally undergo phase changes, during the annealing process,
whereby the number of clusters changes in a data-driven fashion. The
global-best partition can also often be identified.Comment: Submitted to the IEEE International Conference on Acoustics, Speech,
and Signal Processing (ICASSP
Numeric Input Relations for Relational Learning with Applications to Community Structure Analysis
Most work in the area of statistical relational learning (SRL) is focussed on
discrete data, even though a few approaches for hybrid SRL models have been
proposed that combine numerical and discrete variables. In this paper we
distinguish numerical random variables for which a probability distribution is
defined by the model from numerical input variables that are only used for
conditioning the distribution of discrete response variables. We show how
numerical input relations can very easily be used in the Relational Bayesian
Network framework, and that existing inference and learning methods need only
minor adjustments to be applied in this generalized setting. The resulting
framework provides natural relational extensions of classical probabilistic
models for categorical data. We demonstrate the usefulness of RBN models with
numeric input relations by several examples.
In particular, we use the augmented RBN framework to define probabilistic
models for multi-relational (social) networks in which the probability of a
link between two nodes depends on numeric latent feature vectors associated
with the nodes. A generic learning procedure can be used to obtain a
maximum-likelihood fit of model parameters and latent feature values for a
variety of models that can be expressed in the high-level RBN representation.
Specifically, we propose a model that allows us to interpret learned latent
feature values as community centrality degrees by which we can identify nodes
that are central for one community, that are hubs between communities, or that
are isolated nodes. In a multi-relational setting, the model also provides a
characterization of how different relations are associated with each community
EXPERIENTIAL VALUE: A HIERARCHICAL MODEL, THE IMPACT ON E-LOYALTY AND A CUSTOMER TYPOLOGY
The main objective of this study is to empirically test a fourth-order hierarchical model of experiential value in an online book and CD setting. In addition, we provide empirical evidence for the role of hedonic and utilitarian value components in creating attitudinal and behavioral loyalty. Finally, we develop an online customer typology, based on the underlying value sources. Based on a sample of 190 visitors of online book and CD retailers, we used PLS to test a third and fourth order hierarchical model of experiential value, emphasizing a hedonic (intrinsic) and utilitarian (extrinsic) value component and the existence of the holistic concept of experiential value. Our results demonstrate that experiential value consists of the third order components hedonic (intrinsic) and utilitarian (extrinsic) value. Both value aspects impact attitudinal loyalty ultimately leading to behavioral loyalty which is also directly affected by utilitarian value. Finally, a nonhierarchical (k-means) cluster analysis identified four segments of online visitors: hedonists, utilitarians, active negativists, and reactive positivists.marketing ;
Using Fuzzy Linguistic Representations to Provide Explanatory Semantics for Data Warehouses
A data warehouse integrates large amounts of extracted and summarized data from multiple sources for direct querying and analysis. While it provides decision makers with easy access to such historical and aggregate data, the real meaning of the data has been ignored. For example, "whether a total sales amount 1,000 items indicates a good or bad sales performance" is still unclear. From the decision makers' point of view, the semantics rather than raw numbers which convey the meaning of the data is very important. In this paper, we explore the use of fuzzy technology to provide this semantics for the summarizations and aggregates developed in data warehousing systems. A three layered data warehouse semantic model, consisting of quantitative (numerical) summarization, qualitative (categorical) summarization, and quantifier summarization, is proposed for capturing and explicating the semantics of warehoused data. Based on the model, several algebraic operators are defined. We also extend the SQL language to allow for flexible queries against such enhanced data warehouses
- …