64 research outputs found

    Learning Rich Geographical Representations: Predicting Colorectal Cancer Survival in the State of Iowa

    Full text link
    Neural networks are capable of learning rich, nonlinear feature representations shown to be beneficial in many predictive tasks. In this work, we use these models to explore the use of geographical features in predicting colorectal cancer survival curves for patients in the state of Iowa, spanning the years 1989 to 2012. Specifically, we compare model performance using a newly defined metric -- area between the curves (ABC) -- to assess (a) whether survival curves can be reasonably predicted for colorectal cancer patients in the state of Iowa, (b) whether geographical features improve predictive performance, and (c) whether a simple binary representation or richer, spectral clustering-based representation perform better. Our findings suggest that survival curves can be reasonably estimated on average, with predictive performance deviating at the five-year survival mark. We also find that geographical features improve predictive performance, and that the best performance is obtained using richer, spectral analysis-elicited features.Comment: 8 page

    Magnitude Sensitive Competitive Neural Networks

    Get PDF
    En esta Tesis se presentan un conjunto de redes neuronales llamadas Magnitude Sensitive Competitive Neural Networks (MSCNNs). Se trata de un conjunto de algoritmos de Competitive Learning que incluyen un término de magnitud como un factor de modulación de la distancia usada en la competición. Al igual que otros métodos competitivos, MSCNNs realizan la cuantización vectorial de los datos, pero el término de magnitud guía el entrenamiento de los centroides de modo que se representan con alto detalle las zonas deseadas, definidas por la magnitud. Estas redes se han comparado con otros algoritmos de cuantización vectorial en diversos ejemplos de interpolación, reducción de color, modelado de superficies, clasificación, y varios ejemplos sencillos de demostración. Además se introduce un nuevo algoritmo de compresión de imágenes, MSIC (Magnitude Sensitive Image Compression), que hace uso de los algoritmos mencionados previamente, y que consigue una compresión de la imagen variable según una magnitud definida por el usuario. Los resultados muestran que las nuevas redes neuronales MSCNNs son más versátiles que otros algoritmos de aprendizaje competitivo, y presentan una clara mejora en cuantización vectorial sobre ellos cuando el dato está sopesado por una magnitud que indica el ¿interés¿ de cada muestra

    A Novel Hybrid Approach Using Kmeans Clustering and Threshold filter for Brain Tumor Detection

    Get PDF
    Abstract-Medical imaging makes use of the technology to disclose the internal structure of the human body. By means of medical imaging modalities patient's life can be better through a accurate and quick treatment without any side effects. The foremost purpose of this paper is to develop an automated framework that can accurately classify a tumor from abnormal tissues. In this paper, we put forward a hybrid framework that uses the K-means clustering followed by Threshold filter to track down the tumor objects in magnetic resonance (MR) brain images. The main concept in this hybrid framework is to separate the position of tumor objects from other items of an MR image by using Kmeans clustering and Threshold filter. Experiments reveal that the method can successfully achieve segmentation for MR brain images to help pathologists distinguish exactly lesion size and region

    Reconocimiento de notación matemática escrita a mano fuera de línea

    Get PDF
    El reconocimiento automático de expresiones matemáticas es uno de los problemas de reconocimiento de patrones, debido a que las matemáticas representan una fuente valiosa de información en muchos a ́reas de investigación. La escritura de expresiones matemáticas a mano es un medio de comunicación utilizado para la transmisión de información y conocimiento, con la cual se pueden generar de una manera sencilla escritos que contienen notación matemática. Este proceso puede volverse tedioso al ser escrito en lenguaje de composición tipográfica que pueda ser procesada por una computadora, tales como LATEX, MathML, entre otros. En los sistemas de reconocimiento de expresiones matem ́aticas existen dos m ́etodos diferentes a saber: fuera de l ́ınea y en l ́ınea. En esta tesis, se estudia el desempen ̃o de un sistema fuera de l ́ınea en donde se describen los pasos b ́asicos para lograr una mejor precisio ́n en el reconocimiento, las cuales esta ́n divididas en dos pasos principales: recono- cimiento de los s ́ımbolos de las ecuaciones matema ́ticas y el ana ́lisis de la estructura en que est ́an compuestos. Con el fin de convertir una expresi ́on matema ́tica escrita a mano en una expresio ́n equivalente en un sistema de procesador de texto, tal como TEX

    Oversampling for Imbalanced Learning Based on K-Means and SMOTE

    Full text link
    Learning from class-imbalanced data continues to be a common and challenging problem in supervised learning as standard classification algorithms are designed to handle balanced class distributions. While different strategies exist to tackle this problem, methods which generate artificial data to achieve a balanced class distribution are more versatile than modifications to the classification algorithm. Such techniques, called oversamplers, modify the training data, allowing any classifier to be used with class-imbalanced datasets. Many algorithms have been proposed for this task, but most are complex and tend to generate unnecessary noise. This work presents a simple and effective oversampling method based on k-means clustering and SMOTE oversampling, which avoids the generation of noise and effectively overcomes imbalances between and within classes. Empirical results of extensive experiments with 71 datasets show that training data oversampled with the proposed method improves classification results. Moreover, k-means SMOTE consistently outperforms other popular oversampling methods. An implementation is made available in the python programming language.Comment: 19 pages, 8 figure

    Diverse personalized recommendations with uncertainty from implicit preference data with the Bayesian Mallows Model

    Full text link
    Clicking data, which exists in abundance and contains objective user preference information, is widely used to produce personalized recommendations in web-based applications. Current popular recommendation algorithms, typically based on matrix factorizations, often have high accuracy and achieve good clickthrough rates. However, diversity of the recommended items, which can greatly enhance user experiences, is often overlooked. Moreover, most algorithms do not produce interpretable uncertainty quantifications of the recommendations. In this work, we propose the Bayesian Mallows for Clicking Data (BMCD) method, which augments clicking data into compatible full ranking vectors by enforcing all the clicked items to be top-ranked. User preferences are learned using a Mallows ranking model. Bayesian inference leads to interpretable uncertainties of each individual recommendation, and we also propose a method to make personalized recommendations based on such uncertainties. With a simulation study and a real life data example, we demonstrate that compared to state-of-the-art matrix factorization, BMCD makes personalized recommendations with similar accuracy, while achieving much higher level of diversity, and producing interpretable and actionable uncertainty estimation.Comment: 27 page
    corecore