1,281 research outputs found

    Clustering Algorithms: Their Application to Gene Expression Data

    Gene expression data hide vital information required to understand the biological processes that take place in a particular organism in relation to its environment. Deciphering the hidden patterns in gene expression data offers a major opportunity to strengthen the understanding of functional genomics. The complexity of biological networks and the number of genes involved make it harder to comprehend and interpret the resulting mass of data, which consists of millions of measurements; these data also exhibit vagueness, imprecision, and noise. Therefore, the use of clustering techniques is a first step toward addressing these challenges and is essential in the data mining process to reveal natural structures and identify interesting patterns in the underlying data. The clustering of gene expression data has proven useful in revealing the natural structure inherent in the data, understanding gene functions, cellular processes, and subtypes of cells, mining useful information from noisy data, and understanding gene regulation. Another benefit of clustering gene expression data is the identification of homology, which is very important in vaccine design. This review examines the various clustering algorithms applicable to gene expression data in order to identify the appropriate clustering technique that will guarantee stability and a high degree of accuracy in the analysis.

    Spatial analysis for the distribution of cells in tissue sections

    Spatial analysis plays an essential role in data mining and is applied in a considerable number of fields. Because of this broad applicability, interdisciplinary problems are becoming more prevalent. Spatial analysis aims to explore the underlying patterns in data. In this project, we apply spatial-analysis methodology to investigate how cells are distributed in infected tissue sections and whether clusters exist among the cells. The cells neighboring the viruses are of particular interest. The data were provided by the company Medetect in the form of 2-dimensional point data. We first adopted two common families of spatial analysis methods, clustering methods and proximity methods. In addition, a method for constructing a 2-dimensional hull was developed in order to delineate the compartments in tissue sections. A binomial test was conducted to evaluate the results. The results indicate that clusters do exist among cells and that immune cells accumulate around the viruses. We also found different patterns near to and far from the viruses. This study implies that the cells interact with each other and thus form spatial patterns. However, our analyses are restricted to the plane rather than 3-dimensional space. In further study, the spatial analysis could be carried out in three dimensions.

    It has become popular to use heuristic or existing methods to make new findings and explain poorly understood phenomena in other subjects. It is also known that everything in nature is related. In this sense, we can assume that the overall distribution of objects depends on the locations of individuals. The idea of this work is to explore this spatial relationship among cells. In this project, the relationships between individual cells or groups of cells are of interest. Our data take the form of a point cloud. The questions are whether any groups exist among these points and whether the viruses have neighbors. The methods fall into three categories. The first groups similar objects together, where similar objects may be those that are close to each other. The second analyzes the degree of closeness between objects and looks for the neighbors of viruses. The last draws the border of a point cloud, much like constructing the boundary of a district. For each method, we carried out corresponding case studies. Since similar objects can be grouped together, it is interesting to look into the details of each group to learn which objects in the same group are similar. In particular, different types of cells in the same group can be checked and studied. In the closeness analysis, we found that some cells are indeed closer to each other. The constructed border helps reveal the shape of a point cloud. It can be concluded that a spatial relationship does exist among the cells. Groups of cells can be identified to a large extent, and at a local level one type of cell may be more strongly attracted to certain other cells. However, this study was carried out in 2D space and neglects the real shape of cells, which have height. This could be a more interesting topic in the future.
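
The abstract above mentions drawing the border of a 2-dimensional point cloud but does not say how the hull was constructed. As a rough illustration only, the sketch below swaps in a standard convex hull (Andrew's monotone chain); it is not the authors' method, and a convex hull cannot follow concave tissue compartments. The function names and the sample points are hypothetical.

```python
def cross(o, a, b):
    """Z-component of the cross product of vectors OA and OB."""
    return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])


def convex_hull(points):
    """Return the convex hull of 2D points in counter-clockwise order.

    Uses Andrew's monotone chain; collinear boundary points are dropped.
    """
    pts = sorted(set(points))  # sort by x, then y, and remove duplicates
    if len(pts) <= 2:
        return pts

    lower = []
    for p in pts:
        # Pop while the last turn is clockwise or collinear.
        while len(lower) >= 2 and cross(lower[-2], lower[-1], p) <= 0:
            lower.pop()
        lower.append(p)

    upper = []
    for p in reversed(pts):
        while len(upper) >= 2 and cross(upper[-2], upper[-1], p) <= 0:
            upper.pop()
        upper.append(p)

    # The last point of each chain is the first point of the other chain.
    return lower[:-1] + upper[:-1]


# Hypothetical cell coordinates: four corners and one interior point.
cells = [(0, 0), (2, 0), (2, 2), (0, 2), (1, 1)]
hull = convex_hull(cells)
```

For the sample points, the interior point (1, 1) is excluded and the four corners are returned as the border.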

    The usefulness of robust multivariate methods: A case study with the menu items of a fast food restaurant chain

    Multivariate statistical methods have played an important role in statistics and data analysis for a very long time. Nowadays, with the increase in the amount of data collected every day in many disciplines, and with the rise of data science, machine learning and applied statistics, that role is even more important. Two of the most widely used multivariate statistical methods are cluster analysis and principal component analysis. These, like many other models and algorithms, are adequate when the data satisfy certain assumptions. However, when the distribution of the data is not normal and/or it shows heavy tails and outlying observations, the classic models and algorithms might produce erroneous conclusions. Robust statistical methods, such as algorithms for robust cluster analysis and robust principal component analysis, are very useful when analyzing contaminated data with outlying observations. In this paper we consider a data set containing the products available in a fast food restaurant chain together with their respective nutritional information, and discuss the usefulness of robust statistical methods for classification, clustering and data visualization.

    Data Patterns Discovery Using Unsupervised Learning

    Self-care activities classification poses significant challenges in identifying children’s unique functional abilities and needs within the exceptional children healthcare system. The accuracy of diagnosing a child's self-care problem, such as toileting or dressing, is highly influenced by an occupational therapist’s experience and time constraints. Thus, there is a need for objective means to detect and predict in advance the self-care problems of children with physical and motor disabilities. We use clustering to discover interesting information from self-care problems, perform automatic classification of binary data, and discover outliers. The advantages are twofold: the advancement of knowledge on identifying self-care problems in children and comprehensive experimental results on clustering binary healthcare data. By using various distances and linkage methods, resampling techniques for imbalanced data, and feature selection preprocessing in a clustering framework, we find associations among patients and an Adjusted Rand Index (ARI) of 76.26%
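
The abstract above reports clustering quality as an Adjusted Rand Index. As a minimal sketch of how that score is computed from two labelings using the standard pair-counting formula, the code below uses only the Python standard library. The function name and the toy labels are hypothetical and are not taken from the paper.

```python
from collections import Counter
from math import comb


def adjusted_rand_index(labels_a, labels_b):
    """Adjusted Rand Index between two labelings of the same items.

    Returns 1.0 for identical partitions (up to relabeling) and values
    near 0.0 for agreement at chance level.
    """
    if len(labels_a) != len(labels_b):
        raise ValueError("labelings must have the same length")

    total_pairs = comb(len(labels_a), 2)
    if total_pairs == 0:
        return 1.0  # fewer than two items: partitions trivially agree

    # Pairs of items placed together in both, in A only, and in B only.
    sum_cells = sum(comb(c, 2) for c in Counter(zip(labels_a, labels_b)).values())
    sum_a = sum(comb(c, 2) for c in Counter(labels_a).values())
    sum_b = sum(comb(c, 2) for c in Counter(labels_b).values())

    expected = sum_a * sum_b / total_pairs  # agreement expected by chance
    max_index = (sum_a + sum_b) / 2
    if max_index == expected:
        return 1.0  # degenerate case, e.g. both labelings are one cluster
    return (sum_cells - expected) / (max_index - expected)


# Hypothetical reference classes and cluster assignments for four patients.
reference = [0, 0, 1, 1]
clusters = [0, 0, 1, 2]
score = adjusted_rand_index(reference, clusters)
```

The function returns a fraction; multiplying by 100 gives a percentage of the kind quoted in the abstract.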

    An overview of clustering methods with guidelines for application in mental health research

    Cluster analyses have been widely used in mental health research to decompose inter-individual heterogeneity by identifying more homogeneous subgroups of individuals. However, despite advances in new algorithms and increasing popularity, there is little guidance on model choice, analytical framework and reporting requirements. In this paper, we aim to address this gap by introducing the philosophy, design, advantages/disadvantages and implementation of major algorithms that are particularly relevant in mental health research. Extensions of basic models, such as kernel methods, deep learning, semi-supervised clustering, and clustering ensembles, are subsequently introduced. How to choose algorithms to address common issues, as well as methods for pre-clustering data processing, clustering evaluation and validation, are then discussed. Importantly, we also provide general guidance on clustering workflow and reporting requirements. To facilitate the implementation of different algorithms, we provide information on R functions and libraries

    Finding groups in data: Cluster analysis with ants

    We present in this paper a modification of Lumer and Faieta’s algorithm for data clustering. This approach mimics the clustering behavior observed in real ant colonies. The algorithm automatically discovers clusters in numerical data without prior knowledge of the possible number of clusters. In this paper we focus on ant-based clustering algorithms, a particular kind of swarm intelligence system, and on how the final clustering is affected by the dissimilarity metric used during classification: Euclidean, Cosine, and Gower measures. Clustering with swarm-based algorithms is emerging as an alternative to more conventional clustering methods such as k-means. Among the many bio-inspired techniques, ant clustering algorithms have received special attention, especially because they still require much investigation to improve performance, stability and other key features that would make such algorithms mature tools for data mining. As a case study, this paper focuses on the behavior of clustering procedures in those new approaches. The proposed algorithm and its modifications are evaluated on a number of well-known benchmark datasets. Empirical results clearly show that ant-based clustering algorithms perform well when compared to other techniques.

    Development of a R package to facilitate the learning of clustering techniques

    This project explores the development of a tool, in the form of an R package, to ease the process of learning clustering techniques, how they work and what their pros and cons are. This tool should provide implementations of several different clustering techniques with explanations, so that students can become familiar with the characteristics of each algorithm by testing it against several different datasets while deepening their understanding through the explanations. Additionally, these explanations should adapt to the input data, making the tool suitable not only for self-regulated learning but for teaching too. Bachelor's Degree in Computer Engineering (Grado en Ingeniería Informática)

    A review of clustering techniques and developments

    © 2017 Elsevier B.V. This paper presents a comprehensive study on clustering: existing methods and developments made at various times. Clustering is defined as unsupervised learning in which objects are grouped on the basis of some similarity inherent among them. There are different methods for clustering objects, such as hierarchical, partitional, grid, density-based and model-based methods. The approaches used in these methods are discussed together with their respective state of the art and applicability. The measures of similarity and the evaluation criteria, which are the central components of clustering, are also presented in the paper. The applications of clustering in fields such as image segmentation, object and character recognition, and data mining are highlighted.

    Performance Assessment of The Extended Gower Coefficient on Mixed Data with Varying Types of Functional Data.

    Clustering is a widely used technique in data mining applications to source, manage, analyze and extract vital information from large amounts of data. Most clustering procedures are limited in their performance when it comes to data with mixed attributes. In recent times, mixed data have evolved to include directional and functional data. In this study, we will give an introduction to clustering with an eye towards the application of the extended Gower coefficient by Hendrickson (2014). We will conduct a simulation study to assess the performance of this coefficient on mixed data whose functional component has strictly-decreasing signal curves and also those whose functional component has a mixture of strictly-decreasing signal curves and periodic tendencies. We will assess how four different hierarchical clustering algorithms perform on mixed data simulated under varying conditions with and without weights. The comparison of the various clustering solutions will be done using the Rand Index
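
To make the idea of a dissimilarity for mixed attributes concrete, here is a minimal sketch of the basic Gower distance for records that mix numeric and categorical features, with equal feature weights. It is not the extended coefficient by Hendrickson (2014) discussed in the abstract above and does not handle functional or directional components. The helper names and toy records are hypothetical.

```python
def feature_ranges(records, is_numeric):
    """Range (max - min) of each numeric feature; None for categorical ones."""
    ranges = []
    for k, numeric in enumerate(is_numeric):
        if numeric:
            column = [r[k] for r in records]
            ranges.append(max(column) - min(column))
        else:
            ranges.append(None)
    return ranges


def gower_distance(x, y, ranges):
    """Basic Gower distance between two records, with equal weights.

    Numeric features contribute |x - y| / range; categorical features
    contribute 0 on a match and 1 on a mismatch.
    """
    total = 0.0
    for xk, yk, rng in zip(x, y, ranges):
        if rng is None:
            total += 0.0 if xk == yk else 1.0
        elif rng > 0:
            total += abs(xk - yk) / rng
        # A numeric feature with zero range cannot separate records: adds 0.
    return total / len(ranges)


# Hypothetical records: one numeric and one categorical attribute.
records = [(1.0, "a"), (3.0, "a"), (5.0, "b")]
ranges = feature_ranges(records, is_numeric=[True, False])
d01 = gower_distance(records[0], records[1], ranges)
d02 = gower_distance(records[0], records[2], ranges)
d12 = gower_distance(records[1], records[2], ranges)
```

A matrix of such pairwise distances is the kind of input a hierarchical clustering algorithm takes; per-feature weights, where used, would replace the simple average.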