23,933 research outputs found

    Taming Wild High Dimensional Text Data with a Fuzzy Lash

    The bag of words (BOW) model represents a corpus as a matrix whose elements are word frequencies. However, each row of the matrix is a very high-dimensional, sparse vector. Dimension reduction (DR) is a popular way to address sparsity and high dimensionality. Among the strategies for developing DR methods, Unsupervised Feature Transformation (UFT) is a popular one that maps all words onto a new basis to represent the BOW. The recent growth of text data and its challenges imply that the DR area still needs new perspectives. Although a wide range of methods based on the UFT strategy has been developed, the fuzzy approach has not been considered for DR based on this strategy. This research investigates the application of fuzzy clustering as a UFT-based DR method that collapses the BOW matrix to provide a lower-dimensional representation of documents instead of the words in a corpus. The quantitative evaluation shows that fuzzy clustering produces superior performance and features compared with Principal Components Analysis (PCA) and Singular Value Decomposition (SVD), two popular DR methods based on the UFT strategy.
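    As a rough illustration of the idea in this abstract, the sketch below runs a plain fuzzy c-means loop over the rows of a toy BOW matrix and uses each document's cluster memberships as its reduced representation. This is one possible reading of the approach, not the authors' implementation: the function name `fuzzy_c_means`, the fuzzifier `m=2.0`, the iteration count, and the toy counts are all assumptions made for the example.

```python
import math
import random


def fuzzy_c_means(rows, n_clusters, m=2.0, n_iter=100, seed=0):
    """Hypothetical helper: textbook fuzzy c-means on a list of numeric rows.

    Returns (memberships, centers). memberships[i][k] is the degree to which
    row i belongs to cluster k; each membership row sums to 1.
    """
    rng = random.Random(seed)
    n, d = len(rows), len(rows[0])

    # Random initial memberships, normalised per row.
    u = []
    for _ in range(n):
        w = [rng.random() + 1e-9 for _ in range(n_clusters)]
        total = sum(w)
        u.append([x / total for x in w])

    centers = []
    for _ in range(n_iter):
        # Centers are means of the rows weighted by membership ** m.
        centers = []
        for k in range(n_clusters):
            weights = [u[i][k] ** m for i in range(n)]
            total = sum(weights)
            centers.append(
                [sum(weights[i] * rows[i][j] for i in range(n)) / total
                 for j in range(d)]
            )
        # Memberships are proportional to distance ** (-2 / (m - 1)).
        for i in range(n):
            dist = [max(math.dist(rows[i], c), 1e-12) for c in centers]
            inv = [x ** (-2.0 / (m - 1.0)) for x in dist]
            total = sum(inv)
            u[i] = [x / total for x in inv]
    return u, centers


# Toy BOW matrix (assumed data): 4 documents x 4 vocabulary terms.
bow = [
    [5, 4, 0, 0],
    [4, 5, 0, 0],
    [0, 0, 5, 4],
    [0, 0, 4, 5],
]

# Each document is now described by 2 membership values instead of 4 counts.
reduced, _ = fuzzy_c_means(bow, n_clusters=2)
for doc in reduced:
    print([round(x, 3) for x in doc])
```

    Here the number of clusters plays the role of the target dimensionality, much as the number of retained components does in PCA or SVD.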

    Gray Image extraction using Fuzzy Logic

    Fuzzy systems provide a fundamental methodology for representing and processing uncertainty and imprecision in linguistic information. Fuzzy systems that use fuzzy rules to represent the domain knowledge of a problem are known as Fuzzy Rule Base Systems (FRBS). Image segmentation and subsequent extraction from a noise-affected background with the help of soft computing methods, on the other hand, is relatively new and quite popular. These methods include various Artificial Neural Network (ANN) models (primarily supervised in nature), Genetic Algorithm (GA) based techniques, intensity-histogram-based methods, etc. Providing an extraction solution that works in unsupervised mode is an even more interesting problem, and the literature suggests that efforts in this direction are still quite rudimentary. In the present article, we propose a novel fuzzy-rule-guided technique that functions without any external intervention during execution. Experimental results suggest that this approach is efficient in comparison with other techniques extensively addressed in the literature. To support the performance advantage of the proposed technique over its competitors, we use metrics such as Mean Squared Error (MSE), Mean Absolute Error (MAE), and Peak Signal to Noise Ratio (PSNR).
    Comment: 8 pages, 5 figures, Fuzzy Rule Base, Image Extraction, Fuzzy Inference System (FIS), Membership Functions, Membership values, Image coding and Processing, Soft Computing, Computer Vision. Accepted and published in IEEE. arXiv admin note: text overlap with arXiv:1206.363
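    The three metrics named in this abstract have standard definitions, sketched below for grayscale images stored as nested lists. The function names, the 8-bit peak value of 255, and the toy pixel values are assumptions for illustration; the abstract does not state how the authors computed them.

```python
import math


def _pairs(a, b):
    """Yield corresponding pixel pairs from two equally sized 2-D images."""
    for row_a, row_b in zip(a, b):
        for x, y in zip(row_a, row_b):
            yield x, y


def mse(a, b):
    """Mean Squared Error between two images."""
    errors = [(x - y) ** 2 for x, y in _pairs(a, b)]
    return sum(errors) / len(errors)


def mae(a, b):
    """Mean Absolute Error between two images."""
    errors = [abs(x - y) for x, y in _pairs(a, b)]
    return sum(errors) / len(errors)


def psnr(a, b, max_value=255.0):
    """Peak Signal to Noise Ratio in dB; infinite for identical images."""
    err = mse(a, b)
    if err == 0:
        return math.inf
    return 10.0 * math.log10(max_value ** 2 / err)


# Toy 2x2 images (assumed data): a reference and an imperfect extraction.
reference = [[0, 255], [255, 0]]
extracted = [[0, 250], [255, 10]]

print(mse(reference, extracted))   # 31.25
print(mae(reference, extracted))   # 3.75
print(psnr(reference, extracted))
```

    Lower MSE and MAE and higher PSNR indicate an extraction closer to the reference image.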

    Fuzzy Clustering of Histopathological Images Using Deep Learning Embeddings

    Metric learning is a machine learning approach that aims to learn a new distance metric by increasing the similarity of examples belonging to the same class and reducing the similarity of examples belonging to different classes. The outputs of these approaches are embeddings, onto which the input data are mapped to improve a crisp or fuzzy classification process. Deep metric learning approaches implement metric learning with deep neural networks. Such models have the advantage of discovering highly representative nonlinear embeddings. In this work, we propose a triplet-network deep metric learning approach, based on ResNet50, to find a representative embedding for the unsupervised fuzzy classification of benign and malignant histopathological images of breast cancer tissue. Experiments on the BreakHis benchmark dataset, using Fuzzy C-Means clustering, show the benefit of using the very low-dimensional embeddings found by the deep metric learning approach.
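    Triplet networks are commonly trained with a margin-based triplet loss, which pulls an anchor toward a same-class example and pushes it away from a different-class example. The sketch below shows that common form on plain embedding vectors. The abstract does not give the exact loss, distance, or margin used, so the Euclidean distance, the `margin=1.0` default, and the function name are assumptions.

```python
import math


def triplet_loss(anchor, positive, negative, margin=1.0):
    """Margin-based triplet loss on embedding vectors (assumed form).

    The loss is zero once the negative is at least `margin` farther from
    the anchor than the positive is.
    """
    d_pos = math.dist(anchor, positive)  # distance to same-class example
    d_neg = math.dist(anchor, negative)  # distance to other-class example
    return max(0.0, d_pos - d_neg + margin)


# Well-separated triplet: distances 1 and 5, so the loss is 0.
print(triplet_loss([0.0, 0.0], [0.0, 1.0], [3.0, 4.0]))  # 0.0

# Violating triplet: the "positive" is farther away than the negative.
print(triplet_loss([0.0, 0.0], [3.0, 4.0], [0.0, 1.0]))  # 5.0
```

    In a full pipeline, the vectors would be the network's embeddings of three images, and the trained embeddings would then be passed to a clustering step such as Fuzzy C-Means.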

    A Fully Unsupervised Texture Segmentation Algorithm

    This paper presents a fully unsupervised texture segmentation algorithm that uses a modified discrete wavelet frames decomposition and a mean shift algorithm. By fully unsupervised, we mean that the algorithm requires no knowledge of either the type of texture present or the number of textures in the image to be segmented. The basic idea of the proposed method is to use the modified discrete wavelet frames to extract useful information from the image. Then, starting from the lowest level, the mean shift algorithm is used together with fuzzy c-means clustering to divide the data into an appropriate number of clusters. The data clustering process is then refined at every level by taking into account the data at that particular level. The final crisp segmentation is obtained at the root level. This approach is applied to segment a variety of composite texture images into homogeneous texture areas, and very good segmentation results are reported.

    A systematic review of data quality issues in knowledge discovery tasks

    The volume of data is growing rapidly because organizations continuously capture data to support better decision-making. The most fundamental challenge is to explore these large volumes of data and extract useful knowledge for future actions through knowledge discovery tasks; however, much of the data is of poor quality. We present a systematic review of data quality issues in knowledge discovery tasks and a case study applied to the agricultural disease known as coffee rust.