2,344 research outputs found

    Dimensionality reduction for clustering with deep neural networks

    Bachelor's final thesis in Statistics, UB-UPC, Facultat d'Economia i Empresa (UB) and Facultat de Matemàtiques i Estadística (UPC), Academic year: 2019-2020. Supervisors: Ferran Reverter; Esteban Vegas. Nowadays, high-dimensional data is ubiquitous: think, for example, of images, videos or texts. Unfortunately, high dimensionality can seriously harm the performance of some algorithms. In this project, I analyse how dimensionality reduction can help clustering improve its performance. To do so, I distinguish three different clustering strategies: traditional, two-stage and deep clustering. In the first, clustering is applied to the raw data, while in the other two it is applied to a low-dimensional representation. I focus especially on the last of these, deep clustering, which has shown promising performance in recent years. The differences between these approaches are illustrated through a series of experiments and visualisations, and by comparing the results.
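As a rough illustration of the two-stage strategy described in this abstract (reduce the dimension first, then cluster in the reduced space), here is a minimal sketch. It is not the thesis's method: PCA via power iteration and a two-cluster 1D k-means are stand-ins chosen for brevity, and the function names and the synthetic data generator are hypothetical.

```python
import random


def project_on_top_component(rows, iters=100):
    """Stage 1: project mean-centred rows onto their top principal direction."""
    n, d = len(rows), len(rows[0])
    mean = [sum(r[j] for r in rows) / n for j in range(d)]
    centred = [[r[j] - mean[j] for j in range(d)] for r in rows]
    # Sample covariance matrix (d x d).
    cov = [[sum(c[a] * c[b] for c in centred) / n for b in range(d)]
           for a in range(d)]
    # Power iteration: repeatedly apply cov and renormalise.
    v = [1.0] * d
    for _ in range(iters):
        w = [sum(cov[a][b] * v[b] for b in range(d)) for a in range(d)]
        norm = sum(x * x for x in w) ** 0.5
        if norm == 0:  # degenerate start vector or zero covariance
            break
        v = [x / norm for x in w]
    return [sum(c[j] * v[j] for j in range(d)) for c in centred]


def kmeans_1d(values, iters=20):
    """Stage 2: two-cluster k-means on scalars, seeded at the min and max."""
    centres = [min(values), max(values)]
    labels = [0] * len(values)
    for _ in range(iters):
        labels = [0 if abs(x - centres[0]) <= abs(x - centres[1]) else 1
                  for x in values]
        for k in (0, 1):
            members = [x for x, lab in zip(values, labels) if lab == k]
            if members:
                centres[k] = sum(members) / len(members)
    return labels


def two_stage_cluster(rows):
    """Dimensionality reduction followed by clustering in the reduced space."""
    return kmeans_1d(project_on_top_component(rows))


def make_demo_data(seed=0):
    """Hypothetical data: two well-separated Gaussian blobs in 3D."""
    rng = random.Random(seed)
    a = [[rng.gauss(0, 0.5) for _ in range(3)] for _ in range(20)]
    b = [[5 + rng.gauss(0, 0.5), 5 + rng.gauss(0, 0.5), rng.gauss(0, 0.5)]
         for _ in range(20)]
    return a + b
```

Calling `two_stage_cluster(make_demo_data())` returns one label per point. A deep clustering approach would instead learn the low-dimensional representation and the cluster assignment jointly, which this sketch does not attempt.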

    Investigating human-perceptual properties of "shapes" using 3D shapes and 2D fonts

    Shapes are generally used to convey meaning. They are used in video games, films and other multimedia in diverse ways. 3D shapes may be destined for virtual scenes or represent objects to be constructed in the real world. Fonts add character to an otherwise plain block of text, allowing the writer to make important points more visually prominent or distinct from other text. They can indicate the structure of a document at a glance. Rather than studying shapes through traditional geometric shape descriptors, we provide alternative methods to describe and analyse shapes through the lens of human perception. This is done via the concepts of Schelling Points and Image Specificity. Schelling Points are choices people make when they aim to match what they expect others to choose but cannot communicate with others to determine an answer. We study whole mesh selections in this setting, where Schelling Meshes are the most frequently selected shapes. The key idea behind Image Specificity is that different images evoke different descriptions, but 'Specific' images yield more consistent descriptions than others. We apply Specificity to 2D fonts. We show that each concept can be learned, and we predict Specificity for fonts and Schelling Meshes for 3D shapes using a depth image-based convolutional neural network. Results are shown for a range of fonts and 3D shapes, and we demonstrate that font Specificity and the Schelling Meshes concept are useful for visualisation, clustering, and search applications. Overall, we find that each concept represents similarities within its respective type of shape, even when there are discontinuities between the shape geometries themselves. The 'context' of these similarities lies in some kind of abstract or subjective meaning that is consistent among different people.

    From Free Text to Clusters of Content in Health Records: An Unsupervised Graph Partitioning Approach

    Electronic healthcare records contain large volumes of unstructured data in different forms. Free text constitutes a large portion of such data, yet this source of richly detailed information often remains under-used in practice because of a lack of suitable methodologies to extract interpretable content in a timely manner. Here we apply network-theoretical tools to the analysis of free text in Hospital Patient Incident reports in the English National Health Service, to find clusters of reports in an unsupervised manner and at different levels of resolution, based directly on the free text descriptions contained within them. To do so, we combine recently developed deep neural network text-embedding methodologies based on paragraph vectors with multi-scale Markov Stability community detection applied to a similarity graph of documents obtained from sparsified text vector similarities. We showcase the approach with the analysis of incident reports submitted in Imperial College Healthcare NHS Trust, London. The multiscale community structure reveals levels of meaning with different resolution in the topics of the dataset, as shown by relevant descriptive terms extracted from the groups of records, as well as by comparing a posteriori against hand-coded categories assigned by healthcare personnel. Our content communities exhibit good correspondence with well-defined hand-coded categories, yet our results also provide further medical detail in certain areas and reveal complementary descriptors of incidents beyond the external classification. We also discuss how the method can be used to monitor reports over time and across different healthcare providers, and to detect emerging trends that fall outside of pre-existing categories. Comment: 25 pages, 2 tables, 8 figures and 5 supplementary figures
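The step "similarity graph of documents obtained from sparsified text vector similarities" can be pictured with a small sketch. The abstract does not say how the similarities are sparsified, so the k-nearest-neighbour rule used here is an assumption, as are the function names and the toy vectors. The embedding step (paragraph vectors) and the Markov Stability community detection are not shown.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0


def knn_similarity_graph(vectors, k):
    """Keep, for each document, only the edges to its k most similar documents.

    Returns an undirected graph as {(i, j): similarity} with i < j.
    The kNN rule is an assumed sparsification scheme, not taken from the paper.
    """
    n = len(vectors)
    edges = {}
    for i in range(n):
        sims = [(cosine(vectors[i], vectors[j]), j) for j in range(n) if j != i]
        sims.sort(reverse=True)  # most similar first
        for s, j in sims[:k]:
            edges[(min(i, j), max(i, j))] = s
    return edges


# Toy stand-ins for document embeddings: two pairs of near-parallel vectors.
toy_vectors = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9]]
toy_graph = knn_similarity_graph(toy_vectors, k=1)
```

With `k=1` the toy graph keeps only the two within-pair edges, which is the kind of sparse structure a community detection method would then partition at several resolutions.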

    Guidelines for the use of machine learning to predict student project group academic performance

    Education plays a crucial role in the growth and development of a country. However, in South Africa, capacity is limited while demand from students seeking an education is increasing. To address this demand, universities are pressured into accepting more students to increase their throughput. This pressure leaves educators with less time to give students individual attention. This study aims to address this problem by demonstrating how machine learning can be used to predict student group academic performance so that educators may allocate more resources and attention to students and groups at risk. The study focused on data obtained from the third-year capstone project for the diploma in Information Technology at the Nelson Mandela University. Learning analytics and educational data mining and their processes were discussed, with an in-depth look at the machine learning techniques involved. Artificial neural networks, decision trees and naïve Bayes classifiers were proposed and motivated for prediction modelling. An experiment was performed, resulting in proposed guidelines that give insight and recommendations for the use of machine learning to predict student group academic performance.