Cluster Evaluation of Density Based Subspace Clustering
Clustering real-world data often runs into the curse of dimensionality, because
such data typically consist of many dimensions. The clustering of
multidimensional data can be evaluated through a density-based approach.
Density-based approaches follow the paradigm introduced by DBSCAN clustering. In
this approach, the density of each object's neighbourhood is calculated with
respect to MinPoints. Clusters change in accordance with changes in the density
of each object's neighbourhood. The neighbours of each object are typically
determined using a distance function, for example the Euclidean distance. In
this paper, the SUBCLU, FIRES and INSCY methods are applied to clustering
synthetic datasets of dimension 6x1595. IO entropy, F1 measure, coverage,
accuracy and time consumption are used as performance evaluation parameters.
The evaluation results show that the SUBCLU method requires considerable time
to process subspace clustering; however, its coverage value is better.
Meanwhile, the INSCY method is more accurate than the other two methods,
although its computation time is consequently longer.
Comment: 6 pages, 15 figures
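The density test described in this abstract can be illustrated with a minimal sketch. It is not SUBCLU, FIRES or INSCY; it only shows a DBSCAN-style core-point check restricted to a chosen subset of dimensions. The function names, the `eps` radius, and the toy data are assumptions made for illustration.

```python
import math

def euclidean(p, q):
    """Euclidean distance between two equal-length points."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))

def neighbours(points, i, eps, dims):
    """Indices of points within eps of points[i], measured only on dims."""
    pi = [points[i][d] for d in dims]
    result = []
    for j, q in enumerate(points):
        qj = [q[d] for d in dims]
        if euclidean(pi, qj) <= eps:
            result.append(j)
    return result

def core_points(points, eps, min_points, dims):
    """Points whose eps-neighbourhood (itself included) holds >= min_points objects."""
    return [i for i in range(len(points))
            if len(neighbours(points, i, eps, dims)) >= min_points]

# Hypothetical toy data: three points are close on dimension 0 only.
data = [(0.0, 0.0), (0.1, 5.0), (0.2, 9.0), (5.0, 5.1)]
print(core_points(data, eps=0.5, min_points=3, dims=[0]))     # [0, 1, 2]
print(core_points(data, eps=0.5, min_points=3, dims=[0, 1]))  # []
```

The same points are dense in the one-dimensional subspace but not in the full space, which is the situation subspace clustering methods are designed to detect.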
A Novel Statistical Approach for Clustering Positive Data Based on Finite Inverted Beta-Liouville Mixture Models
Nowadays, a great deal of positive data arises naturally in many applications; however, such data have not been adequately analyzed. In this article, we propose a novel statistical approach for clustering multivariate positive data. Our approach is based on a finite mixture model of inverted Beta-Liouville (IBL) distributions, which is a proper choice for modeling and analyzing positive vector data. We develop two different approaches to learn the proposed mixture model. First, maximum likelihood (ML) is used to estimate the parameters of the finite inverted Beta-Liouville mixture model, with the number of mixture components determined according to the minimum message length (MML) criterion. Second, variational Bayes (VB) is adopted to learn our model, so that the parameters and the number of mixture components can be determined simultaneously in a unified framework, without requiring information criteria. We investigate the effectiveness of our model by conducting a series of experiments on both synthetic and real data sets.
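For orientation, the general form of a finite mixture density and its log-likelihood can be written as follows. The symbols (M components, mixing weights \pi_j, component parameters \theta_j, N observations \vec{X}_i) are notation assumed here for illustration. The abstract does not state the IBL density, so the component density is left abstract.

```latex
p(\vec{X} \mid \Theta) = \sum_{j=1}^{M} \pi_j \, p(\vec{X} \mid \theta_j),
\qquad \pi_j \ge 0, \quad \sum_{j=1}^{M} \pi_j = 1

\log L(\Theta) = \sum_{i=1}^{N} \log \sum_{j=1}^{M} \pi_j \, p(\vec{X}_i \mid \theta_j)
```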
Recursive Parameter Estimation of Non-Gaussian Hidden Markov Models for Occupancy Estimation in Smart Buildings
A significant volume of data has been produced in this era. Therefore, accurately modeling these
data for further analysis and extraction of meaningful patterns is becoming a major concern in a
wide variety of real-life applications. Smart buildings are one of these areas urgently demanding
analysis of data. Managing the intelligent systems in smart homes will reduce energy consumption
as well as enhance users’ comfort. In this context, the Hidden Markov Model (HMM), as a learnable
finite stochastic model, has consistently been a powerful tool for data modeling. Thus, we have been
motivated to propose occupancy estimation frameworks for smart buildings through HMM due to
the importance of indoor occupancy estimations in automating environmental settings. One of the
key factors in modeling data with HMM is the choice of the emission probability. In this thesis, we
have proposed novel HMM extensions through Generalized Dirichlet (GD), Beta-Liouville (BL),
Inverted Dirichlet (ID), Generalized Inverted Dirichlet (GID), and Inverted Beta-Liouville (IBL)
distributions as emission probability distributions. These distributions have been investigated due
to their capabilities in modeling a variety of non-Gaussian data, overcoming the limited covariance
structures of other distributions such as the Dirichlet distribution. The next step after determining
the emission probability is estimating the optimized parameters of the distribution. Therefore, we
have developed a recursive parameter estimation approach based on maximum likelihood estimation
(MLE). Owing to the linear complexity of the proposed recursive algorithm, the developed models can
successfully model real-time data, which allows them to be used in an extensive range of
practical applications.
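To make the idea of a recursive, linear-time HMM computation concrete, here is a minimal sketch of forward filtering for a two-state occupancy model. It is not the thesis's recursive parameter estimation. It assumes the parameters are already known and swaps the non-Gaussian emission distributions for a hypothetical discrete table. All names and numbers are illustrative.

```python
def forward_filter(init, trans, emit_prob, observations):
    """Recursively update P(state_t | obs_1..t); one O(K^2) step per observation."""
    k = len(init)
    belief = None
    history = []
    for t, obs in enumerate(observations):
        if t == 0:
            predicted = list(init)
        else:
            # Propagate the previous belief through the transition matrix.
            predicted = [sum(belief[i] * trans[i][j] for i in range(k))
                         for j in range(k)]
        # Weight by the emission likelihood and renormalise.
        unnorm = [predicted[j] * emit_prob(j, obs) for j in range(k)]
        total = sum(unnorm)
        belief = [u / total for u in unnorm]
        history.append(belief)
    return history

# Hypothetical model: state 0 = empty room, state 1 = occupied room.
init = [0.5, 0.5]
trans = [[0.9, 0.1], [0.2, 0.8]]
table = {0: {"low": 0.8, "high": 0.2}, 1: {"low": 0.3, "high": 0.7}}

def emit(state, obs):
    """Discrete stand-in for the emission density of a sensor reading."""
    return table[state][obs]

beliefs = forward_filter(init, trans, emit, ["high", "high", "low"])
for b in beliefs:
    print([round(x, 3) for x in b])
```

Each new observation costs a fixed amount of work, so the total cost grows linearly with the length of the sequence, which is what makes this kind of recursion usable on streaming sensor data.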
Two-Level Text Classification Using Hybrid Machine Learning Techniques
Nowadays, documents are increasingly being associated with multi-level
category hierarchies rather than a flat category scheme. To access these
documents in real time, we need fast automatic methods to navigate these
hierarchies. Today’s vast data repositories such as the web also contain many
broad domains of data which are quite distinct from each other e.g. medicine,
education, sports and politics. Each domain constitutes a subspace of the data
within which the documents are similar to each other but quite distinct from the
documents in another subspace. The data within these domains is frequently
further divided into many subcategories.
Subspace Learning is a technique popular with non-text domains such as
image recognition to increase speed and accuracy. Subspace analysis lends
itself naturally to the idea of hybrid classifiers. Each subspace can be
processed by a classifier best suited to the characteristics of that particular
subspace. Instead of using the complete set of full space feature dimensions,
classifier performances can be boosted by using only a subset of the
dimensions.
This thesis presents a novel hybrid parallel architecture using separate
classifiers trained on separate subspaces to improve two-level text
classification. The classifier to be used on a particular input and the relevant
feature subset to be extracted are determined dynamically by using a novel
method based on the maximum significance value. A novel vector
representation which enhances the distinction between classes within the
subspace is also developed. This novel system, the Hybrid Parallel Classifier,
was compared against the baselines of several single classifiers such as the
Multilayer Perceptron and was found to be faster and have higher two-level
classification accuracies. The improvement in performance achieved was even
higher when dealing with more complex category hierarchies.
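The routing idea in this abstract (pick a subspace, keep only its features, hand the input to that subspace's classifier) can be sketched as below. The abstract does not define the significance value, so the per-word weights, the max-over-words scoring rule, and the toy classifiers are all hypothetical stand-ins, not the Hybrid Parallel Classifier itself.

```python
def route_and_classify(doc_words, significance, classifiers):
    """Choose the subspace with the maximum significance score, then classify within it."""
    # Score each subspace by the largest weight any document word has in it.
    scores = {
        subspace: max((weights.get(w, 0.0) for w in doc_words), default=0.0)
        for subspace, weights in significance.items()
    }
    best = max(scores, key=scores.get)
    # Keep only the features (words) that belong to the chosen subspace.
    subset = [w for w in doc_words if w in significance[best]]
    return best, classifiers[best](subset)

# Hypothetical significance weights for two top-level categories.
significance = {
    "medicine": {"heart": 0.9, "tumour": 0.8, "match": 0.1},
    "sports": {"match": 0.9, "goal": 0.8, "heart": 0.2},
}

# Hypothetical second-level classifiers, one per subspace.
classifiers = {
    "medicine": lambda words: "cardiology" if "heart" in words else "oncology",
    "sports": lambda words: "football" if "goal" in words else "tennis",
}

print(route_and_classify(["heart", "surgery"], significance, classifiers))
print(route_and_classify(["goal", "match"], significance, classifiers))
```

In a real system each entry in `classifiers` would be a trained model, and different model types could be used for different subspaces.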