6,888 research outputs found
Application of k Means Clustering algorithm for prediction of Students Academic Performance
The ability to monitor the progress of students academic performance is a
critical issue to the academic community of higher learning. A system for
analyzing students results based on cluster analysis and uses standard
statistical algorithms to arrange their scores data according to the level of
their performance is described. In this paper, we also implemented k mean
clustering algorithm for analyzing students result data. The model was combined
with the deterministic model to analyze the students results of a private
Institution in Nigeria which is a good benchmark to monitor the progression of
academic performance of students in higher Institution for the purpose of
making an effective decision by the academic planners.Comment: IEEE format, International Journal of Computer Science and
Information Security, IJCSIS January 2010, ISSN 1947 5500,
http://sites.google.com/site/ijcsis
Clustering Time Series from Mixture Polynomial Models with Discretised Data
Clustering time series is an active research area with applications in many fields. One common feature of time series is the likely presence of outliers. These uncharacteristic data can significantly effect the quality of clusters formed. This paper evaluates a method of over-coming the detrimental effects of outliers. We describe some of the alternative approaches to clustering time series, then specify a particular class of model for experimentation with k-means clustering and a correlation based distance metric. For data derived from this class of model we demonstrate that discretising the data into a binary series of above and below the median improves the clustering when the data has outliers. More specifically, we show that firstly discretisation does not significantly effect the accuracy of the clusters when there are no outliers and secondly it significantly increases the accuracy in the presence of outliers, even when the probability of outlier is very low
Identifying Overlapping and Hierarchical Thematic Structures in Networks of Scholarly Papers: A Comparison of Three Approaches
We implemented three recently proposed approaches to the identification of
overlapping and hierarchical substructures in graphs and applied the
corresponding algorithms to a network of 492 information-science papers coupled
via their cited sources. The thematic substructures obtained and overlaps
produced by the three hierarchical cluster algorithms were compared to a
content-based categorisation, which we based on the interpretation of titles
and keywords. We defined sets of papers dealing with three topics located on
different levels of aggregation: h-index, webometrics, and bibliometrics. We
identified these topics with branches in the dendrograms produced by the three
cluster algorithms and compared the overlapping topics they detected with one
another and with the three pre-defined paper sets. We discuss the advantages
and drawbacks of applying the three approaches to paper networks in research
fields.Comment: 18 pages, 9 figure
Techniques for clustering gene expression data
Many clustering techniques have been proposed for the analysis of gene expression data obtained from microarray experiments. However, choice of suitable method(s) for a given experimental dataset is not straightforward. Common approaches do not translate well and fail to take account of the data profile. This review paper surveys state of the art applications which recognises these limitations and implements procedures to overcome them. It provides a framework for the evaluation of clustering in gene expression analyses. The nature of microarray data is discussed briefly. Selected examples are presented for the clustering methods considered
Model fusion using fuzzy aggregation: Special applications to metal properties
To improve the modelling performance, one should either propose a new modelling methodology or make the best of existing models. In this paper, the study is concentrated on the latter solution, where a structure-free modelling paradigm is proposed. It does not rely on a fixed structure and can combine various modelling techniques in ‘symbiosis’ using a ‘master fuzzy system’. This approach is shown to be able to include the advantages of different modelling techniques altogether by requiring less training and by minimising the efforts relating optimisation of the final structure. The proposed approach is then successfully applied to the industrial problems of predicting machining induced residual stresses for aerospace alloy components as well as modelling the mechanical properties of heat-treated alloy steels, both representing complex, non-linear and multi-dimensional environments
Classifying sequences by the optimized dissimilarity space embedding approach: a case study on the solubility analysis of the E. coli proteome
We evaluate a version of the recently-proposed classification system named
Optimized Dissimilarity Space Embedding (ODSE) that operates in the input space
of sequences of generic objects. The ODSE system has been originally presented
as a classification system for patterns represented as labeled graphs. However,
since ODSE is founded on the dissimilarity space representation of the input
data, the classifier can be easily adapted to any input domain where it is
possible to define a meaningful dissimilarity measure. Here we demonstrate the
effectiveness of the ODSE classifier for sequences by considering an
application dealing with the recognition of the solubility degree of the
Escherichia coli proteome. Solubility, or analogously aggregation propensity,
is an important property of protein molecules, which is intimately related to
the mechanisms underlying the chemico-physical process of folding. Each protein
of our dataset is initially associated with a solubility degree and it is
represented as a sequence of symbols, denoting the 20 amino acid residues. The
herein obtained computational results, which we stress that have been achieved
with no context-dependent tuning of the ODSE system, confirm the validity and
generality of the ODSE-based approach for structured data classification.Comment: 10 pages, 49 reference
- …