An Efficient Visual Analysis Method for Cluster Tendency Evaluation, Data Partitioning and Internal Cluster Validation
Visual methods have been extensively studied and applied in cluster data analysis. Given a pairwise dissimilarity matrix $D$ of a set of $n$ objects, visual methods such as the Enhanced-Visual Assessment Tendency (E-VAT) algorithm generally represent $D$ as an $n \times n$ image $I(\overline{D})$, where the objects are reordered to expose the hidden cluster structure as dark blocks along the diagonal of the image. A major limitation of such methods is their inability to highlight cluster structure when $D$ is computed from composite-shaped datasets. This paper addresses this limitation by proposing an enhanced visual analysis method for cluster tendency assessment, where $D$ is mapped to $D'$ by graph-based analysis and then reordered to $\overline{D}'$ using E-VAT, resulting in the graph-based Enhanced Visual Assessment Tendency (GE-VAT). An Enhanced Dark Block Extraction (E-DBE) method for automatic determination of the number of clusters in $I(\overline{D}')$ is then proposed, as well as a visual data partitioning method for cluster formation from $I(\overline{D}')$ based on the disparity between diagonal and off-diagonal blocks using the permuted indices of GE-VAT. Cluster validation measures are also computed to evaluate the cluster formation. Extensive experimental results on several complex synthetic, UCI and large real-world data sets are analyzed to validate our algorithm.
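The reordering idea behind this family of methods can be illustrated with a minimal sketch of the classic VAT ordering, which is not the E-VAT or GE-VAT procedure of the paper. It assumes a symmetric dissimilarity matrix given as a list of lists; the function names `vat_order` and `reorder` and the toy data are hypothetical and chosen only for illustration.

```python
def vat_order(D):
    """Return a VAT-style permutation of object indices for a square,
    symmetric dissimilarity matrix D (list of lists)."""
    n = len(D)
    # Start from an object that takes part in the largest dissimilarity.
    start = max(range(n), key=lambda i: max(D[i]))
    order = [start]
    remaining = set(range(n)) - {start}
    # best[j] = smallest dissimilarity from j to any already ordered object.
    best = {j: D[start][j] for j in remaining}
    while remaining:
        # Append the unordered object closest to the ordered set
        # (ties broken by index so the result is deterministic).
        nxt = min(remaining, key=lambda j: (best[j], j))
        order.append(nxt)
        remaining.remove(nxt)
        for j in remaining:
            best[j] = min(best[j], D[nxt][j])
    return order


def reorder(D, order):
    """Apply the permutation to both rows and columns of D."""
    return [[D[a][b] for b in order] for a in order]


# Toy data (assumed): two interleaved 1-D clusters near 0 and near 10.
points = [0.0, 10.0, 0.5, 10.5, 1.0, 11.0]
D = [[abs(p - q) for q in points] for p in points]
order = vat_order(D)
D_reordered = reorder(D, order)
```

After reordering, the small dissimilarities of each cluster sit in contiguous blocks along the diagonal, which is what would appear as dark blocks when the matrix is shown as an image.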
Bond-Order Time Series Analysis for Detecting Reaction Events in Ab Initio Molecular Dynamics Simulations.
Ab initio molecular dynamics is able to predict novel reaction mechanisms by directly observing the individual reaction events that occur in simulation trajectories. In this article, we describe an approach for detecting reaction events from simulation trajectories using a physically motivated model based on time series analysis of ab initio bond orders. We found that applying a threshold to the bond order was insufficient for accurate detection, whereas peak finding on the first time derivative resulted in significantly improved accuracy. The model is trained on a reference set of reaction events representing the ideal result given unlimited computing resources. Our study includes two model systems: a heptanylium carbocation that undergoes hydride shifts and an unsaturated iron carbonyl cluster that features CO ligand migration and bridging behavior. The results indicate a high level of promise for this analysis approach to be used in mechanistic analysis of reactive AIMD simulations more generally.
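The contrast between thresholding and derivative peak finding can be sketched as follows. This is only an illustration of the general idea, not the trained model described in the abstract: the bond-order values, the time step, the height cutoff `min_height`, and the function names are all assumptions.

```python
def first_derivative(series, dt=1.0):
    """Finite-difference time derivative: central differences in the
    interior, one-sided at the two ends. Needs at least two samples."""
    n = len(series)
    deriv = []
    for k in range(n):
        lo = max(k - 1, 0)
        hi = min(k + 1, n - 1)
        deriv.append((series[hi] - series[lo]) / ((hi - lo) * dt))
    return deriv


def find_peaks(values, min_height):
    """Indices where |values| has a local maximum of at least min_height.
    On a flat top, only the first sample of the plateau is reported."""
    mag = [abs(v) for v in values]
    return [
        k
        for k in range(1, len(mag) - 1)
        if mag[k] >= min_height and mag[k] > mag[k - 1] and mag[k] >= mag[k + 1]
    ]


# Hypothetical bond-order trace: a bond that breaks around frame 4.
bond_order = [1.0, 1.0, 1.0, 0.9, 0.5, 0.1, 0.0, 0.0, 0.0]
rate = first_derivative(bond_order)
events = find_peaks(rate, min_height=0.2)
```

In this toy trace a single event is reported at the frame where the bond order changes fastest, whereas a plain threshold on the bond order would depend on where the cutoff happens to be placed along the transition.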
Applying psychometric modeling to aid feature engineering in predictive log-data analytics. The NAEP EDM Competition
The NAEP EDM Competition required participants to predict efficient test-taking behavior based on log data. This paper describes our top-down approach for engineering features by means of psychometric modeling, aiming at machine learning for the predictive classification task. For feature engineering, we employed, among others, the Log-Normal Response Time Model for estimating latent person speed, and the Generalized Partial Credit Model for estimating latent person ability. Additionally, we adopted an n-gram feature approach for event sequences. Furthermore, instead of using the provided binary target label, we distinguished inefficient test takers who were going too fast and those who were going too slow for training a multi-label classifier. Our best-performing ensemble classifier comprised three sets of low-dimensional classifiers, dominated by test-taker speed. While our classifier reached moderate performance, relative to the competition leaderboard, our approach makes two important contributions. First, we show how classifiers that contain features engineered through literature-derived domain knowledge can provide meaningful predictions if results can be contextualized to test administrators who wish to intervene or take action. Second, our re-engineering of test scores enabled us to incorporate person ability into the models. However, ability was hardly predictive of efficient behavior, leading to the conclusion that the target label's validity needs to be questioned. Beyond competition-related findings, we furthermore report a state sequence analysis for demonstrating the viability of the employed tools. The latter yielded four different test-taking types that described distinctive differences between test takers, providing relevant implications for assessment practice. (DIPF/Orig.)
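As a rough illustration of what n-gram features over event sequences can look like, the sketch below counts consecutive event pairs in one log. The event names and the function `ngram_counts` are hypothetical; the abstract does not specify how the authors built or encoded their n-grams.

```python
from collections import Counter


def ngram_counts(events, n=2):
    """Count every run of n consecutive events in a sequence."""
    return Counter(tuple(events[k:k + n]) for k in range(len(events) - n + 1))


# Hypothetical log of one test taker (event names are made up).
log = ["enter_item", "click_choice", "next", "enter_item", "click_choice", "next"]
bigrams = ngram_counts(log, n=2)
```

Each distinct n-gram can then serve as one column of a feature matrix, with the count as its value for that test taker.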
Quality of life in the regions: An exploratory spatial data analysis for West German labor markets
Which of Germany's regions is the most attractive? Where is it best to live and work, on objective grounds? These questions are summed up in the concept of quality of life. This paper uses recent research projects that determine this parameter to examine the spatial distribution of quality of life in Germany. For this purpose, an Exploratory Spatial Data Analysis (ESDA) is conducted which focuses on identifying statistically significant (dis-)similarities in space. An initial result of this research is that it is important to choose the aggregation level of administrative units carefully when considering a spatial analysis. The level plays a crucial role in the strength and impact of spatial effects. In concentrating on various labor market areas, this paper identifies a significant spatial autocorrelation in the quality of life, which seems to be characterized by a North-Mid-South divide. In addition, the ESDA results are used to augment the regression specifications, which helps to avoid the occurrence of spatial dependencies in the residuals.
Keywords: Quality of Life, Exploratory Spatial Data Analysis, Functional Economic Areas, Spatial Econometrics, LISA Dummies
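Spatial autocorrelation of this kind is commonly summarized with global Moran's I; the abstract does not state which statistic the authors computed, so the sketch below is an assumption made for illustration. The region values, the binary neighbor matrix, and the function name `morans_i` are hypothetical.

```python
def morans_i(values, weights):
    """Global Moran's I for one value per region and a square spatial
    weight matrix (weights[i][j] > 0 when regions i and j are neighbors)."""
    n = len(values)
    mean = sum(values) / n
    dev = [v - mean for v in values]
    total_weight = sum(sum(row) for row in weights)
    # Weighted cross-products of deviations between neighboring regions.
    cross = sum(
        weights[i][j] * dev[i] * dev[j] for i in range(n) for j in range(n)
    )
    squared = sum(d * d for d in dev)
    return (n / total_weight) * (cross / squared)


# Four hypothetical regions in a row; each borders only its direct neighbors.
values = [1.0, 2.0, 3.0, 4.0]
weights = [
    [0, 1, 0, 0],
    [1, 0, 1, 0],
    [0, 1, 0, 1],
    [0, 0, 1, 0],
]
moran = morans_i(values, weights)
```

A positive value, as in this toy gradient, indicates that neighboring regions tend to have similar values; changing the aggregation level changes both `values` and `weights`, which is one way to see why the choice of spatial units matters.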
Cognitive network structure: an experimental study
In this paper we present first experimental results about a small group of
people exchanging private and public messages in a virtual community. Our goal
is the study of the cognitive network that emerges during a chat session. We
used the Derrida coefficient and the triangle structure under the working
assumption that moods and perceived mutual affinity can produce results
complementary to a full semantic analysis. The most outstanding outcome is the
difference between the network obtained considering publicly exchanged messages
and the one considering only privately exchanged messages: in the former case,
the network is very homogeneous, in the sense that each individual interacts in
the same way with all the participants, whilst in the latter the interactions
among different agents are very heterogeneous, and are based on a "the enemy of
my enemy is my friend" strategy. Finally, a recent characterization of the
triangular cliques has been considered in order to describe the intimate
structure of the network. Experimental results confirm recent theoretical
studies indicating that certain 3-vertex structures can be used as indicators
for network aging and some relevant dynamical features.
Comment: 15 pages, 5 figures, 3 tables