    An Efficient Visual Analysis Method for Cluster Tendency Evaluation, Data Partitioning and Internal Cluster Validation

    Visual methods have been studied and applied extensively in cluster analysis. Given a pairwise dissimilarity matrix $D$ of a set of $n$ objects, visual methods such as the Enhanced Visual Assessment Tendency (E-VAT) algorithm generally represent $D$ as an $n \times n$ image $I(\overline{D})$, in which the objects are reordered to expose hidden cluster structure as dark blocks along the diagonal of the image. A major limitation of such methods is their inability to highlight cluster structure when $D$ is derived from composite-shaped data sets. This paper addresses this limitation by proposing an enhanced visual analysis method for cluster tendency assessment, in which $D$ is mapped to $D'$ by graph-based analysis and then reordered to $\overline{D}'$ using E-VAT, resulting in the graph-based Enhanced Visual Assessment Tendency (GE-VAT). An Enhanced Dark Block Extraction (E-DBE) method for automatically determining the number of clusters in $I(\overline{D}')$ is then proposed, together with a visual data partitioning method that forms clusters from $I(\overline{D}')$ based on the disparity between diagonal and off-diagonal blocks, using the permuted indices of GE-VAT. Cluster validation measures are also computed to evaluate the resulting clusters. Extensive experimental results on several complex synthetic, UCI, and large real-world data sets are analyzed to validate the algorithm.
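The reordering idea in this abstract can be illustrated with the classic VAT ordering, on which the E-VAT family builds. The sketch below is not the paper's E-VAT or GE-VAT; it assumes a symmetric dissimilarity matrix given as a list of lists, and the function names `vat_order` and `reorder` are hypothetical.

```python
def vat_order(D):
    """Return a VAT-style ordering of object indices (classic VAT, not E-VAT)."""
    n = len(D)
    # Start at a row that attains the largest dissimilarity in D.
    start = max(range(n), key=lambda i: max(D[i]))
    order = [start]
    remaining = [k for k in range(n) if k != start]
    while remaining:
        # Prim-like step: take the unvisited object closest to any visited one.
        nxt = min(remaining, key=lambda j: min(D[i][j] for i in order))
        order.append(nxt)
        remaining.remove(nxt)
    return order


def reorder(D, order):
    """Permute rows and columns of D by the same ordering."""
    return [[D[i][j] for j in order] for i in order]


# Toy data (an assumption for illustration): two groups of points on a line.
points = [0.0, 10.0, 0.5, 10.5, 1.0, 11.0]
D = [[abs(a - b) for b in points] for a in points]
order = vat_order(D)
D_reordered = reorder(D, order)
```

In the reordered matrix, objects from the same group occupy adjacent rows and columns, so small dissimilarities gather in blocks along the diagonal; rendered as a grayscale image, these are the dark blocks the abstract refers to.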

    Applying psychometric modeling to aid feature engineering in predictive log-data analytics. The NAEP EDM Competition

    The NAEP EDM Competition required participants to predict efficient test-taking behavior based on log data. This paper describes our top-down approach for engineering features by means of psychometric modeling, aiming at machine learning for the predictive classification task. For feature engineering, we employed, among others, the Log-Normal Response Time Model for estimating latent person speed, and the Generalized Partial Credit Model for estimating latent person ability. Additionally, we adopted an n-gram feature approach for event sequences. Furthermore, instead of using the provided binary target label, we distinguished inefficient test takers who were going too fast and those who were going too slow for training a multi-label classifier. Our best-performing ensemble classifier comprised three sets of low-dimensional classifiers, dominated by test-taker speed. While our classifier reached moderate performance, relative to the competition leaderboard, our approach makes two important contributions. First, we show how classifiers that contain features engineered through literature-derived domain knowledge can provide meaningful predictions if results can be contextualized to test administrators who wish to intervene or take action. Second, our re-engineering of test scores enabled us to incorporate person ability into the models. However, ability was hardly predictive of efficient behavior, leading to the conclusion that the target label's validity needs to be questioned. Beyond competition-related findings, we furthermore report a state sequence analysis for demonstrating the viability of the employed tools. The latter yielded four different test-taking types that described distinctive differences between test takers, providing relevant implications for assessment practice. (DIPF/Orig.)
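As a rough illustration of the n-gram feature approach for event sequences mentioned in this abstract, the sketch below counts consecutive event n-grams in one test taker's log. The event names and the function `ngram_counts` are hypothetical; the abstract does not describe the actual log schema or feature pipeline.

```python
from collections import Counter


def ngram_counts(events, n=2):
    """Count each run of n consecutive events in a logged sequence."""
    return Counter(tuple(events[i:i + n]) for i in range(len(events) - n + 1))


# Hypothetical log of one test taker (event names are assumptions).
log = ["enter_item", "click_choice", "click_choice", "next_item"]
bigrams = ngram_counts(log, n=2)
```

Each distinct n-gram can then serve as one column of a feature matrix, with the count (or its presence) as the value for that test taker.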

    Quality of life in the regions: An exploratory spatial data analysis for West German labor markets

    Which of Germany's regions is the most attractive? Where is it best to live and work, on objective grounds? These questions are summed up in the concept of quality of life. This paper uses recent research projects that determine this parameter to examine the spatial distribution of quality of life in Germany. For this purpose, an Exploratory Spatial Data Analysis (ESDA) is conducted which focuses on identifying statistically significant (dis-)similarities in space. An initial result of this research is that it is important to choose the aggregation level of administrative units carefully when considering a spatial analysis. The level plays a crucial role in the strength and impact of spatial effects. In concentrating on various labor market areas, this paper identifies a significant spatial autocorrelation in the quality of life, which seems to be characterized by a North-Mid-South divide. In addition, the ESDA results are used to augment the regression specifications, which helps to avoid the occurrence of spatial dependencies in the residuals. Keywords: Quality of Life, Exploratory Spatial Data Analysis, Functional Economic Areas, Spatial Econometrics, LISA Dummies
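Spatial autocorrelation of the kind discussed in this abstract is commonly summarized with global Moran's I. The abstract does not say which statistic the paper reports, so the sketch below is only an illustration under that assumption; the weight matrix, the toy values, and the function name `morans_i` are hypothetical.

```python
def morans_i(x, W):
    """Global Moran's I for values x and a spatial weight matrix W (list of lists)."""
    n = len(x)
    mean = sum(x) / n
    dev = [v - mean for v in x]
    s0 = sum(sum(row) for row in W)  # total weight
    cross = sum(W[i][j] * dev[i] * dev[j] for i in range(n) for j in range(n))
    return (n / s0) * cross / sum(d * d for d in dev)


# Four hypothetical regions in a row; neighbors share a border (weight 1).
W = [
    [0, 1, 0, 0],
    [1, 0, 1, 0],
    [0, 1, 0, 1],
    [0, 0, 1, 0],
]
smooth = morans_i([1.0, 2.0, 3.0, 4.0], W)       # similar neighbors
alternating = morans_i([1.0, 4.0, 1.0, 4.0], W)  # dissimilar neighbors
```

Positive values indicate that neighboring regions tend to have similar scores, negative values that they tend to differ. The aggregation level the abstract warns about enters through `W`: redrawing the regions changes both the values and the neighbor structure.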

    Cognitive network structure: an experimental study

    In this paper we present first experimental results on a small group of people exchanging private and public messages in a virtual community. Our goal is the study of the cognitive network that emerges during a chat session. We used the Derrida coefficient and the triangle structure under the working assumption that moods and perceived mutual affinity can produce results complementary to a full semantic analysis. The most striking outcome is the difference between the network obtained from publicly exchanged messages and the one obtained from privately exchanged messages only: in the former case, the network is very homogeneous, in the sense that each individual interacts in the same way with all the participants, whilst in the latter the interactions among different agents are very heterogeneous and are based on "the enemy of my enemy is my friend" strategy. Finally, a recent characterization of the triangular cliques has been considered in order to describe the intimate structure of the network. Experimental results confirm recent theoretical studies indicating that certain 3-vertex structures can be used as indicators for the network aging and some relevant dynamical features. Comment: 15 pages, 5 figures, 3 tables