140,494 research outputs found

    Hierarchical and Non-Hierarchical Linear and Non-Linear Clustering Methods to Shakespeare-De Vere Authorship Question

    In my previous article, entitled "Hierarchical and Non-Hierarchical Linear and Non-Linear Clustering Methods to Shakespeare Authorship Question", I used Mean Proximity as a linear hierarchical clustering method, Principal Components Analysis as a non-hierarchical linear clustering method, and Self-Organizing Map U-matrix and Voronoi Map as non-linear clustering methods to examine various works and plays assumed to have been written by Shakespeare, Sir Francis Bacon, Christopher Marlowe, John Fletcher, and Thomas Kyd, in order to determine which of them wrote some of Shakespeare's disputed plays, based on similarities in the use of function words, word bi-grams, and character tri-grams. The article showed that, according to the validated cluster analytic results and the stylistic criteria used, Shakespeare is not the author of all the disputed plays traditionally attributed to him. The article also indicated that the author did not consider it fair to include Edward de Vere, the strongest candidate in the Shakespeare authorship debate, and compare his poems to Shakespeare's disputed plays, because poetry tends to have a particular style and a different structure than plays, and an additional test was promised. The present article provides that test. In this article I examined the 154 sonnets traditionally attributed to Shakespeare and 38 surviving poems attributed to Edward de Vere. The purpose is to form a hypothesis as to whether de Vere has an identifiable self-similarity, and to measure how far his style is from Shakespeare's, based on the use of function words, word bi-grams, character bi-grams, and character tri-grams, applying four different clustering methods: four hierarchical linear methods using Euclidean distance (Single, Average, Complete, and Ward), non-hierarchical linear Multidimensional Scaling (MDS), and Kernel K-means clustering and Voronoi map as non-linear methods. The cophenetic correlation coefficient is used to select the best result obtained from a set o

    Hierarchical spectral clustering reveals brain size and shape changes in asymptomatic carriers of C9orf72

    Traditional methods for detecting asymptomatic brain changes in neurodegenerative diseases such as Alzheimer's disease or frontotemporal degeneration typically evaluate changes in volume at a predefined level of granularity, e.g. voxel-wise or in a priori defined cortical volumes of interest. Here, we apply a method based on hierarchical spectral clustering, a graph-based partitioning technique. Our method uses multiple levels of segmentation for detecting changes in a data-driven, unbiased, comprehensive manner within a standard statistical framework. Furthermore, spectral clustering allows for detection of changes in shape along with changes in size. We performed tensor-based morphometry to detect changes in the Genetic Frontotemporal dementia Initiative asymptomatic and symptomatic frontotemporal degeneration mutation carriers using hierarchical spectral clustering and compared the outcome to that obtained with a more conventional voxel-wise tensor- and voxel-based morphometric analysis. In the symptomatic groups, the hierarchical spectral clustering-based method yielded results that were largely in line with those obtained with the voxel-wise approach. In asymptomatic C9orf72 expansion carriers, spectral clustering detected changes in size in medial temporal cortex that voxel-wise methods could only detect in the symptomatic phase. Furthermore, in the asymptomatic and the symptomatic phases, the spectral clustering approach detected changes in shape in the premotor cortex in C9orf72. In summary, the present study shows the merit of hierarchical spectral clustering for data-driven segmentation and detection of structural changes in the symptomatic and asymptomatic stages of monogenic frontotemporal degeneration. © The Author(s) 2022. Published by Oxford University Press on behalf of the Guarantors of Brain.

    Finding hens in a haystack: Consistency of movement patterns within and across individual laying hens maintained in large groups

    © 2018, The Author(s). We sought to objectively quantify and compare the recorded movement and location patterns of laying hens within a commercial system. Using a custom tracking system, we monitored the location within five zones of a commercial aviary for 13 hens within a flock of 225 animals for a contiguous period of 11 days. Most hens manifested a hen-specific pattern that was (visually) highly consistent across days, though, within that consistency, manifested stark differences between hens. Three different methods were used to classify individual daily datasets into groups based on their similarity: (i) Linear Discriminant Analysis based on six summary variables (transitions into each zone) and total transitions; (ii) Hierarchical Clustering, a naïve clustering analysis technique, applied to summary variables; and (iii) Hierarchical Clustering applied to dissimilarity matrices produced by Dynamic Time Warping. The three methods correctly classified more than 85% of the hen days and provided a unique means to assess behaviour of a system indicating a considerable degree of complexity and structure. We believe the current effort is the first to document these location and movement patterns within a large, complex commercial system with a large potential to influence the assessment of animal welfare, health, and productivity.
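
    As a rough illustration of method (iii) in the abstract above, the sketch below builds a Dynamic Time Warping dissimilarity matrix over daily zone sequences and then clusters it agglomeratively. Everything concrete here is an assumption made for illustration and is not taken from the paper: the integer zone encoding, the toy `days` data, the absolute-difference step cost, the choice of average linkage, and the function names `dtw_distance` and `average_linkage`.

```python
from itertools import combinations


def dtw_distance(a, b):
    """Classic dynamic-programming DTW with an absolute-difference step cost."""
    inf = float("inf")
    n, m = len(a), len(b)
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            step = abs(a[i - 1] - b[j - 1])
            # Extend the cheapest of the three admissible warping moves.
            cost[i][j] = step + min(cost[i - 1][j], cost[i][j - 1], cost[i - 1][j - 1])
    return cost[n][m]


def average_linkage(dist, n_clusters):
    """Naive agglomerative clustering on a precomputed dissimilarity matrix."""
    clusters = [[i] for i in range(len(dist))]
    while len(clusters) > n_clusters:
        best = None
        for x, y in combinations(range(len(clusters)), 2):
            pairs = [(p, q) for p in clusters[x] for q in clusters[y]]
            d = sum(dist[p][q] for p, q in pairs) / len(pairs)
            if best is None or d < best[0]:
                best = (d, x, y)
        _, x, y = best
        clusters[x] = clusters[x] + clusters[y]  # merge the closest pair
        del clusters[y]
    return [sorted(c) for c in clusters]


# Hypothetical data: each list is one hen-day, each value the zone occupied
# at successive time steps (zones numbered 1 to 5).
days = [
    [1, 1, 2, 3, 3, 2, 1],
    [1, 2, 2, 3, 3, 2, 1, 1],
    [1, 1, 1, 2, 3, 2, 1],
    [4, 4, 5, 5, 4, 4, 5],
    [4, 5, 5, 5, 4, 4, 5, 5],
]

dist = [[dtw_distance(a, b) for b in days] for a in days]
groups = average_linkage(dist, n_clusters=2)
print(groups)
```

    Because DTW tolerates stretching along the time axis, days that visit the same zones in the same order but with different dwell times end up close together, which is the property that makes it a plausible dissimilarity for this kind of data.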

    Selection of informative clusters from hierarchical cluster tree with gene classes

    BACKGROUND: A common clustering method in the analysis of gene expression data has been hierarchical clustering. Usually the analysis involves selection of clusters by cutting the tree at a suitable level and/or analysis of a sorted gene list that is obtained with the tree. Cutting the hierarchical tree requires the selection of a suitable level, and it results in the loss of information on the other levels. Sorted gene lists depend on the sorting method of the joined clusters. The author proposes that the clusters should be selected using the gene classifications. RESULTS: This article presents a simple method for searching for clusters with the strongest enrichment of gene classes from a cluster tree. The clusters found are presented in the estimated order of importance. The method is demonstrated with a yeast gene expression data set and with two database classifications. The obtained clusters demonstrated a very strong enrichment of functional classes. The obtained clusters are also able to present similar gene groups to those that were observed from the data set in the original analysis, and also many gene groups that were not reported in the original analysis. Visualization of the results on top of a cluster tree shows that the method finds informative clusters from several levels of the cluster tree and indicates that the clusters found could not have been obtained by simply cutting the cluster tree. Results were also used in the comparison of cluster trees from different clustering methods. CONCLUSION: The presented method should facilitate the exploratory analysis of big data sets when the associated categorical data is available.
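
    The idea of ranking every node of a cluster tree by class enrichment, instead of cutting the tree at one level, can be sketched as follows. The abstract does not name the enrichment statistic, so the hypergeometric tail probability used here is an assumption, as are the toy gene names, the `tree_nodes` layout, and the function names `hypergeom_tail` and `rank_clusters`.

```python
from math import comb


def hypergeom_tail(k, n, K, N):
    """P(X >= k) when n genes are drawn from N, of which K belong to the class."""
    hits = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, min(n, K) + 1))
    return hits / comb(N, n)


def rank_clusters(tree_nodes, gene_class, all_genes):
    """Score every tree node (not only one cut level) and sort by p-value."""
    N, K = len(all_genes), len(gene_class)
    scored = []
    for name, members in tree_nodes.items():
        k = len(members & gene_class)  # class members falling in this cluster
        scored.append((hypergeom_tail(k, len(members), K, N), name))
    return sorted(scored)


# Hypothetical toy tree: each node is listed with the genes beneath it.
all_genes = {"g1", "g2", "g3", "g4", "g5", "g6", "g7", "g8"}
gene_class = {"g1", "g2", "g3"}
tree_nodes = {
    "root": set(all_genes),
    "left": {"g1", "g2", "g3", "g4"},
    "left.a": {"g1", "g2", "g3"},
    "right": {"g5", "g6", "g7", "g8"},
}

ranked = rank_clusters(tree_nodes, gene_class, all_genes)
for p_value, name in ranked:
    print(f"{name}: p = {p_value:.4f}")
```

    In this toy tree the most enriched node sits one level below the split between `left` and `right`, so a single cut at that split would not have isolated it.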

    Bibliographic Analysis on Research Publications using Authors, Categorical Labels and the Citation Network

    Bibliographic analysis considers the author's research areas, the citation network and the paper content, among other things. In this paper, we combine these three in a topic model that produces a bibliographic model of authors, topics and documents, using a nonparametric extension of a combination of the Poisson mixed-topic link model and the author-topic model. This gives rise to the Citation Network Topic Model (CNTM). We propose a novel and efficient inference algorithm for the CNTM to explore subsets of research publications from CiteSeerX. The publication datasets are organised into three corpora, totalling about 168k publications with about 62k authors. The queried datasets are made available online. In three publicly available corpora in addition to the queried datasets, our proposed model demonstrates an improved performance in both model fitting and document clustering, compared to several baselines. Moreover, our model allows extraction of additional useful knowledge from the corpora, such as the visualisation of the author-topics network. Additionally, we propose a simple method to incorporate supervision into topic modelling to achieve further improvement on the clustering task. Comment: Preprint for Journal Machine Learnin