
    Comparison of clustering algorithms for thyroid database

    This paper proposes a methodology for analyzing, visualizing and clustering data of patients with different symptoms from a thyroid database. In previous work, the thyroid data were analyzed using the WITT algorithm. That clustering method correctly formed the clusters of the control group and the hypothyroid patients but failed to cluster the hyperthyroid patients. In this paper we analyze the data using several algorithms: K-means, hierarchical clustering, the EM algorithm, DBSCAN and the Cobweb algorithm. The aim is to measure the degree of matching between the clusters produced and the class labels in order to determine which algorithms give better results. Classification-oriented measures are used to validate the clustering results. We propose several preprocessing steps to overcome the problems caused by the large amount of noise and the unbalanced classes in the given data set.
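The matching between clusters and class labels described in this abstract can be illustrated with purity, one common classification-oriented measure. The abstract does not say which measures the authors used, so the choice of purity, the function name `purity`, and the toy labels below are all assumptions made for illustration.

```python
from collections import Counter, defaultdict


def purity(cluster_ids, class_labels):
    """Fraction of points whose class is the majority class of their cluster."""
    if len(cluster_ids) != len(class_labels):
        raise ValueError("cluster_ids and class_labels must have the same length")
    if not class_labels:
        raise ValueError("at least one point is required")

    # Count how often each class label occurs inside each cluster.
    by_cluster = defaultdict(Counter)
    for cluster, label in zip(cluster_ids, class_labels):
        by_cluster[cluster][label] += 1

    # Each cluster contributes the size of its largest class.
    matched = sum(counts.most_common(1)[0][1] for counts in by_cluster.values())
    return matched / len(class_labels)


# Made-up example: three clusters over eight hypothetical patients.
clusters = [0, 0, 0, 1, 1, 1, 2, 2]
classes = ["control", "control", "hypo", "hypo", "hypo", "hypo", "hyper", "control"]
score = purity(clusters, classes)
```

A purity of 1.0 means every cluster contains a single class. Purity rewards many small clusters, so in practice it is usually read alongside other measures.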

    The structure and formation of natural categories

    Categorization and concept formation are critical activities of intelligence. These processes and the conceptual structures that support them raise important issues at the interface of cognitive psychology and artificial intelligence. The work presumes that advances in these and other areas are best facilitated by research methodologies that reward interdisciplinary interaction. In particular, it describes a computational model of concept formation and categorization that exploits a rational analysis of basic level effects by Gluck and Corter. Their work provides a clean prescription of human category preferences, which is adapted here to the task of concept learning. Their analysis is also extended to account for typicality and fan effects, and the work speculates on how the concept formation strategies might be extended to other facets of intelligence, such as problem solving.
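The Gluck and Corter analysis mentioned in this abstract is usually associated with the category utility measure. The abstract gives no formula, so the sketch below assumes the commonly cited form for nominal attributes: the average over clusters of P(C) times the gain in the sum of squared attribute-value probabilities relative to the whole data set. The function name and the toy objects are hypothetical.

```python
from collections import Counter


def category_utility(partition):
    """Category utility of a partition of objects with nominal attributes.

    partition: list of clusters; each cluster is a list of dicts that map
    an attribute name to its value.
    """
    clusters = [cluster for cluster in partition if cluster]
    if not clusters:
        raise ValueError("partition must contain at least one object")
    all_items = [item for cluster in clusters for item in cluster]
    n = len(all_items)

    def expected_correct(items):
        # Sum over attributes and values of P(attribute = value)^2.
        counts = Counter((attr, val) for item in items for attr, val in item.items())
        return sum((c / len(items)) ** 2 for c in counts.values())

    baseline = expected_correct(all_items)
    gain = sum(
        (len(cluster) / n) * (expected_correct(cluster) - baseline)
        for cluster in clusters
    )
    return gain / len(clusters)


red_round = {"color": "red", "shape": "round"}
blue_square = {"color": "blue", "shape": "square"}

cu_clean = category_utility([[red_round, red_round], [blue_square, blue_square]])
cu_mixed = category_utility([[red_round, blue_square], [red_round, blue_square]])
```

Under this form, a partition that separates the two kinds of object scores higher than one that mixes them, which is the preference the measure is meant to capture.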

    Role detection in process management systems

    Process management systems have become popular in many domains because they allow end-to-end management of processes within organizations. Despite the advantages they offer, their use has run into many problems. Process mining has emerged to help mitigate these problems. Its goal is to extract information from execution logs in order to generate additional knowledge that helps improve the design and execution of processes. This work presents a process mining approach applied to the detection of organizational roles using the EM algorithm for clustering, which proved suitable for this purpose after being compared with other techniques. Sociedad Argentina de Informática e Investigación Operativa

    A review of clustering techniques and developments

    © 2017 Elsevier B.V. This paper presents a comprehensive study of clustering: existing methods and the developments made at various times. Clustering is defined as unsupervised learning in which objects are grouped on the basis of some similarity inherent among them. There are different methods for clustering objects, such as hierarchical, partitional, grid, density based and model based. The approaches used in these methods are discussed along with their respective state of the art and applicability. The measures of similarity and the evaluation criteria, which are the central components of clustering, are also presented in the paper. Applications of clustering in fields such as image segmentation, object and character recognition and data mining are highlighted.

    Morphological variability of phenotypic traits in oregano samples

    The purpose of the research was to study the morphological variability of collection samples of oregano from the Crimean Peninsula. The experiments were carried out in 2016–2018 in the Foothill Zone of Crimea. The plant material consisted of 41 oregano samples collected on the Crimean Peninsula. The reliability with which the collection samples can be identified by morphological traits was checked. Relationship dendrograms were constructed by Ward’s method based on Manhattan distances. Qualitative traits (coloration of corolla, leaf, bract and stalk, and male fertility) were expressed more consistently than quantitative ones. It was recommended to use the most polymorphic traits (entropy, H > 1.50 bits) for reliable identification of oregano samples from the Crimean Peninsula: coloration of bract, stem, leaf and corolla, as well as the number of shoots and the mass fraction of essential oil. The dendrogram structure differed between the years of study (r = 0.58). Nevertheless, a fairly clear correspondence between the clusters of different years was established (78% of the samples). This correspondence indicates that the grouping of genotypes into separate clusters is reliable and that genotypes within a cluster react similarly to environmental conditions. The most interesting combinations of samples for further breeding work were identified: clusters 2 and 5 (according to the 2018 data). In 42.7% of genotypes from the second cluster, the mass fraction of essential oil was 0.25–0.55% of the absolute dry mass (4–6 points). Samples from the second cluster could be used as high-oil sources, whereas samples from the fifth cluster could serve as sources of high productivity of ‘green’ raw material (up to 1,200 g per plant). It is advisable to select parental forms from these two clusters for hybridization.
    The grouping of oregano samples used in the work separates the samples quite accurately, not only by qualitative traits but also by economically valuable ones.
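The polymorphism criterion in this abstract (entropy, H > 1.50 bits) can be illustrated with a short calculation. The abstract does not give the formula, so the sketch assumes Shannon entropy over the observed frequencies of a trait's states. The function name and the trait observations are invented for illustration; only the 1.50-bit threshold comes from the abstract.

```python
import math
from collections import Counter


def shannon_entropy_bits(observations):
    """Shannon entropy, in bits, of the observed states of one trait."""
    n = len(observations)
    if n == 0:
        raise ValueError("at least one observation is required")
    counts = Counter(observations)
    return -sum((c / n) * math.log2(c / n) for c in counts.values())


THRESHOLD_BITS = 1.50  # threshold quoted in the abstract

# Invented observations: bract coloration with four equally frequent states,
# and a second trait dominated by a single state.
bract_color = ["green", "purple", "violet", "pink"] * 5
leaf_shape = ["ovate"] * 18 + ["elliptic"] * 2

h_bract = shannon_entropy_bits(bract_color)
h_leaf = shannon_entropy_bits(leaf_shape)
is_polymorphic = {"bract_color": h_bract > THRESHOLD_BITS,
                  "leaf_shape": h_leaf > THRESHOLD_BITS}
```

A trait with four equally frequent states reaches 2 bits and passes the threshold, while a trait dominated by one state falls well below it.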

    Computerized cancer malignancy grading of fine needle aspirates

    According to the World Health Organization, breast cancer is a leading cause of death among middle-aged women. Precise diagnosis and correct treatment significantly reduce the high number of deaths caused by breast cancer. Successful treatment relies strictly on the diagnosis, specifically on its accuracy and on the stage at which the cancer was diagnosed. Precise and early diagnosis has a major impact on the survival rate, which indicates how many patients will live after the treatment. For many years researchers in the medical and computer science fields have been working together to find an approach to precise diagnosis. In this thesis, precise diagnosis means finding a cancer at as early a stage as possible by developing new computer aided diagnostic tools. These tools differ depending on the type of cancer and the type of examination used for diagnosis. This work concentrates on cytological images of breast cancer that are produced during fine needle aspiration biopsy examination. This kind of examination allows pathologists to estimate the malignancy of the cancer with very high accuracy. Malignancy estimation is very important when assessing a patient's survival rate and the type of treatment. To achieve precise malignancy estimation, a classification framework is presented. This framework is able to classify breast cancer malignancy into two malignancy classes and is based on features calculated according to the Bloom-Richardson grading scheme. This scheme is commonly used by pathologists when grading breast cancer tissue. In the Bloom-Richardson scheme, two types of features are assessed depending on the magnification. Low magnification images are used for examining the dispersion of the cells in the image, while high magnification images are used for precise analysis of the cells' nuclear features.
    In this thesis, different types of segmentation algorithms were compared to identify the algorithm that allows for relatively fast and accurate nuclear segmentation. Based on that segmentation, a set of 34 features was extracted for further malignancy classification. For classification purposes, 6 different classifiers were compared. From all of the tests, a set of the best performing features was chosen. The presented system is able to classify images of fine needle aspiration biopsy slides with high accuracy.