140,494 research outputs found
Hierarchical and Non-Hierarchical Linear and Non-Linear Clustering Methods to Shakespeare-De Vere Authorship Question
In my previous article, entitled "Hierarchical and Non-Hierarchical Linear and Non-Linear Clustering Methods to Shakespeare Authorship Question", I used Mean Proximity as a linear hierarchical clustering method, Principal Components Analysis as a non-hierarchical linear clustering method, and Self-Organizing Map U-matrix and Voronoi Map as non-linear clustering methods to examine various works and plays assumed to have been written by Shakespeare, Sir Francis Bacon, Christopher Marlowe, John Fletcher, and Thomas Kyd, in order to determine which of them wrote some of Shakespeare's disputed plays, based on similarities in the use of function words, word bi-grams, and character tri-grams. The article showed that, according to the validated cluster analytic results and the stylistic criteria used, Shakespeare is not the author of all the disputed plays traditionally attributed to him. The article also indicated that the author did not consider it fair to include Edward de Vere, the strongest candidate in the Shakespeare authorship debate, and compare his poems to Shakespeare's disputed plays, because poetry tends to have a particular style and a different structure than plays; an additional test was promised. The present article provides that test. In this article I examined the 154 sonnets traditionally attributed to Shakespeare and 38 surviving poems attributed to Edward de Vere. The purpose is to offer a hypothesis on whether de Vere has an identifiable self-similarity and a measure of how dissimilar he is from Shakespeare, based on the use of function words, word bi-grams, character bi-grams, and character tri-grams, applying four different clustering methods: four hierarchical linear methods using Euclidean distance (Single, Average, Complete, and Ward), non-hierarchical linear Multidimensional Scaling (MDS), and Kernel K-means clustering and Voronoi map as non-linear methods. The cophenetic correlation coefficient is used to select the best result obtained from a set o
Hierarchical spectral clustering reveals brain size and shape changes in asymptomatic carriers of C9orf72
Traditional methods for detecting asymptomatic brain changes in neurodegenerative diseases such as Alzheimer's disease or frontotemporal degeneration typically evaluate changes in volume at a predefined level of granularity, e.g. voxel-wise or in a priori defined cortical volumes of interest. Here, we apply a method based on hierarchical spectral clustering, a graph-based partitioning technique. Our method uses multiple levels of segmentation for detecting changes in a data-driven, unbiased, comprehensive manner within a standard statistical framework. Furthermore, spectral clustering allows for detection of changes in shape along with changes in size. We performed tensor-based morphometry to detect changes in the Genetic Frontotemporal dementia Initiative asymptomatic and symptomatic frontotemporal degeneration mutation carriers using hierarchical spectral clustering and compared the outcome to that obtained with a more conventional voxel-wise tensor- and voxel-based morphometric analysis. In the symptomatic groups, the hierarchical spectral clustering-based method yielded results that were largely in line with those obtained with the voxel-wise approach. In asymptomatic C9orf72 expansion carriers, spectral clustering detected changes in size in medial temporal cortex that voxel-wise methods could only detect in the symptomatic phase. Furthermore, in the asymptomatic and the symptomatic phases, the spectral clustering approach detected changes in shape in the premotor cortex in C9orf72. In summary, the present study shows the merit of hierarchical spectral clustering for data-driven segmentation and detection of structural changes in the symptomatic and asymptomatic stages of monogenic frontotemporal degeneration. © The Author(s) 2022. Published by Oxford University Press on behalf of the Guarantors of Brain
Finding hens in a haystack: Consistency of movement patterns within and across individual laying hens maintained in large groups
© 2018, The Author(s). We sought to objectively quantify and compare the recorded movement and location patterns of laying hens within a commercial system. Using a custom tracking system, we monitored the location within five zones of a commercial aviary for 13 hens within a flock of 225 animals for a contiguous period of 11 days. Most hens manifested a hen-specific pattern that was (visually) highly consistent across days, though, within that consistency, manifested stark differences between hens. Three different methods were used to classify individual daily datasets into groups based on their similarity: (i) Linear Discriminant Analysis based on six summary variables (transitions into each zone) and total transitions; (ii) Hierarchical Clustering, a naïve clustering analysis technique, applied to summary variables; and (iii) Hierarchical Clustering applied to dissimilarity matrices produced by Dynamic Time Warping. The three methods correctly classified more than 85% of the hen days and provided a unique means to assess behaviour of a system indicating a considerable degree of complexity and structure. We believe the current effort is the first to document these location and movement patterns within a large, complex commercial system with a large potential to influence the assessment of animal welfare, health, and productivity
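The third method named in this abstract, hierarchical clustering on a Dynamic Time Warping dissimilarity matrix, can be illustrated with a minimal sketch. Everything concrete here is an assumption made for illustration: the abstract does not give the tracking data, the DTW local cost, or the linkage rule, so the sketch uses invented zone sequences (zones coded as the integers 1 to 5), absolute difference as the local cost, and average linkage. The names `dtw_distance`, `average_linkage`, and `days` are hypothetical.

```python
from itertools import combinations


def dtw_distance(a, b):
    """Dynamic-programming DTW with absolute difference as the local cost."""
    inf = float("inf")
    n, m = len(a), len(b)
    # cost[i][j] = minimal cumulative cost of aligning a[:i] with b[:j]
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            step = abs(a[i - 1] - b[j - 1])
            cost[i][j] = step + min(
                cost[i - 1][j],      # a advances, b[j-1] is reused
                cost[i][j - 1],      # b advances, a[i-1] is reused
                cost[i - 1][j - 1],  # both advance
            )
    return cost[n][m]


def average_linkage(dist, n_clusters):
    """Agglomerative clustering on a precomputed dissimilarity matrix."""
    clusters = [[i] for i in range(len(dist))]
    while len(clusters) > n_clusters:

        def linkage(pair):
            x, y = pair
            total = sum(dist[i][j] for i in clusters[x] for j in clusters[y])
            return total / (len(clusters[x]) * len(clusters[y]))

        # Merge the two clusters with the smallest mean pairwise dissimilarity.
        x, y = min(combinations(range(len(clusters)), 2), key=linkage)
        clusters[x] = clusters[x] + clusters[y]
        del clusters[y]  # y > x, so index x is unaffected
    return [sorted(c) for c in clusters]


# Invented daily zone sequences (not data from the study).
days = [
    [1, 1, 2, 3, 3, 2, 1],     # hypothetical hen A, day 1
    [1, 2, 2, 3, 3, 3, 2, 1],  # hypothetical hen A, day 2 (same shape, shifted)
    [5, 5, 4, 4, 5, 5],        # hypothetical hen B, day 1
    [5, 4, 4, 4, 5, 5, 5],     # hypothetical hen B, day 2
]

# Pairwise DTW dissimilarity matrix between hen days.
dist = [[dtw_distance(p, q) for q in days] for p in days]
clusters = average_linkage(dist, n_clusters=2)
print(clusters)  # [[0, 1], [2, 3]]
```

Because DTW lets one sequence stretch against the other, two days with the same zone order but different timing have a dissimilarity of zero here, which is why the days of each hypothetical hen group together.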
Selection of informative clusters from hierarchical cluster tree with gene classes
BACKGROUND: A common clustering method in the analysis of gene expression data has been hierarchical clustering. Usually the analysis involves selection of clusters by cutting the tree at a suitable level and/or analysis of a sorted gene list that is obtained with the tree. Cutting of the hierarchical tree requires the selection of a suitable level and it results in the loss of information on the other levels. Sorted gene lists depend on the sorting method of the joined clusters. The author proposes that the clusters should be selected using the gene classifications. RESULTS: This article presents a simple method for searching for clusters with the strongest enrichment of gene classes from a cluster tree. The clusters found are presented in the estimated order of importance. The method is demonstrated with a yeast gene expression data set and with two database classifications. The obtained clusters demonstrated a very strong enrichment of functional classes. The obtained clusters are also able to present similar gene groups to those that were observed from the data set in the original analysis and also many gene groups that were not reported in the original analysis. Visualization of the results on top of a cluster tree shows that the method finds informative clusters from several levels of the cluster tree and indicates that the clusters found could not have been obtained by simply cutting the cluster tree. Results were also used in the comparison of cluster trees from different clustering methods. CONCLUSION: The presented method should facilitate the exploratory analysis of big data sets when the associated categorical data is available
The role of human factors in stereotyping behavior and perception of digital library users: A robust clustering approach
To deliver effective personalization for digital library users, it is necessary to identify which human factors are most relevant in determining the behavior and perception of these users. This paper examines three key human factors: cognitive styles, levels of expertise and gender differences, and utilizes three individual clustering techniques: k-means, hierarchical clustering and fuzzy clustering to understand user behavior and perception. Moreover, robust clustering, capable of correcting the bias of individual clustering techniques, is used to obtain a deeper understanding. The robust clustering approach produced results that highlighted the relevance of cognitive style for user behavior, i.e., cognitive style dominates and justifies each of the robust clusters created. We also found that perception was mainly determined by the level of expertise of a user. We conclude that robust clustering is an effective technique to analyze user behavior and perception
Clustering Scatter Plots Using Data Depth Measures.
Clustering is rapidly becoming a powerful data mining technique, and has been broadly applied to many domains such as bioinformatics and text mining. However, the existing methods can only deal with a data matrix of scalars. In this paper, we introduce a hierarchical clustering procedure that can handle a data matrix of scatter plots. To more accurately reflect the nature of data, we introduce a dissimilarity statistic based on "data depth" to measure the discrepancy between two bivariate distributions without oversimplifying the nature of the underlying pattern. We then combine hypothesis testing with hierarchical clustering to simultaneously cluster the rows and columns of the data matrix of scatter plots. We also propose novel painting metrics and construct heat maps to allow visualization of the clusters. We demonstrate the utility and power of our new clustering method through simulation studies and application to a microbe-host-interaction study
Bibliographic Analysis on Research Publications using Authors, Categorical Labels and the Citation Network
Bibliographic analysis considers the author's research areas, the citation
network and the paper content among other things. In this paper, we combine
these three in a topic model that produces a bibliographic model of authors,
topics and documents, using a nonparametric extension of a combination of the
Poisson mixed-topic link model and the author-topic model. This gives rise to
the Citation Network Topic Model (CNTM). We propose a novel and efficient
inference algorithm for the CNTM to explore subsets of research publications
from CiteSeerX. The publication datasets are organised into three corpora,
totalling about 168k publications with about 62k authors. The queried
datasets are made available online. In three publicly available corpora in
addition to the queried datasets, our proposed model demonstrates an improved
performance in both model fitting and document clustering, compared to several
baselines. Moreover, our model allows extraction of additional useful knowledge
from the corpora, such as the visualisation of the author-topics network.
Additionally, we propose a simple method to incorporate supervision into topic
modelling to achieve further improvement on the clustering task. Comment: Preprint for Journal Machine Learnin