9 research outputs found

    Automatic Clustering with Single Optimal Solution

    Determining the optimal number of clusters in a dataset is a challenging task. Although some methods are available, no algorithm produces a unique clustering solution. The paper proposes Automatic Merging for Single Optimal Solution (AMSOS), which aims to generate unique and nearly optimal clusters for a given dataset automatically. AMSOS iteratively merges the closest clusters, validating with a cluster validity measure, to find a single and nearly optimal set of clusters for the given dataset. Experiments on both synthetic and real data show that the proposed algorithm finds a single and nearly optimal clustering structure in terms of number of clusters, compactness and separation. Comment: 13 pages, 4 tables, 3 figures
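
    The merge-and-validate loop described in this abstract can be sketched roughly as follows. This is a minimal illustration, not the AMSOS algorithm itself: the abstract does not say which validity measure is used, so the `validity` index below (mean distance to centroid divided by the smallest centroid separation, lower is better) is an assumption, as are the function names and the toy data. The sketch starts from an over-segmented partition rather than from single points.

```python
import math
from itertools import combinations

def centroid(cluster):
    """Mean point of a cluster given as a list of coordinate tuples."""
    n = len(cluster)
    return tuple(sum(p[i] for p in cluster) / n for i in range(len(cluster[0])))

def validity(clusters):
    """Hypothetical validity index (an assumption, not taken from the paper):
    mean point-to-centroid distance / minimum centroid separation."""
    cents = [centroid(c) for c in clusters]
    n_points = sum(len(c) for c in clusters)
    compact = sum(math.dist(p, m) for c, m in zip(clusters, cents) for p in c) / n_points
    separation = min(math.dist(a, b) for a, b in combinations(cents, 2))
    return compact / separation

def merge_until_best(initial_clusters):
    """Repeatedly merge the two closest clusters; keep the best-scoring partition."""
    clusters = [list(c) for c in initial_clusters]
    best, best_score = [list(c) for c in clusters], validity(clusters)
    while len(clusters) > 2:
        cents = [centroid(c) for c in clusters]
        # Pick the pair of clusters whose centroids are closest (i < j always).
        i, j = min(combinations(range(len(clusters)), 2),
                   key=lambda ij: math.dist(cents[ij[0]], cents[ij[1]]))
        clusters[i] = clusters[i] + clusters[j]
        del clusters[j]
        score = validity(clusters)
        if score < best_score:
            best, best_score = [list(c) for c in clusters], score
    return best, best_score

# Toy data (hypothetical): two well-separated groups, each split in two.
initial = [
    [(0, 0), (0, 1)], [(1, 0), (1, 1)],
    [(10, 10), (10, 11)], [(11, 10), (11, 11)],
]
best, best_score = merge_until_best(initial)
print(len(best), best_score)
```

    On this toy input the loop ends with the two original groups, because that partition gives the lowest value of the assumed index.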

    Relational visual cluster validity

    The assessment of cluster validity plays a very important role in cluster analysis. The most commonly used cluster validity methods are based on statistical hypothesis testing or on finding the best clustering scheme by computing a number of different cluster validity indices. A number of visual methods have been developed to display the validity of clusters directly by mapping data into two- or three-dimensional space. However, these methods may lose too much information to correctly estimate the results of clustering algorithms. Although the visual cluster validity (VCV) method of Hathaway and Bezdek can successfully solve this problem, it can only be applied to object data, i.e. feature measurements. There are very few validity methods that can be used to analyze relational data, where only a similarity or dissimilarity relation exists. To tackle this problem, this paper presents a relational visual cluster validity (RVCV) method to assess the validity of clustering relational data. This is done by combining the results of the non-Euclidean relational fuzzy c-means (NERFCM) algorithm with a modification of the VCV method to produce a visual representation of cluster validity. RVCV can be applied to both complete and incomplete relational data and adds to visual cluster validity theory. Numeric examples using synthetic and real data are presented.

    Using iterative cluster merging with improved gap statistics to perform online phenotype discovery in the context of high-throughput RNAi screens

    Background: The recent emergence of high-throughput automated image acquisition technologies has forever changed how cell biologists collect and analyze data. Historically, the interpretation of cellular phenotypes in different experimental conditions has been dependent upon the expert opinions of well-trained biologists. Such qualitative analysis is particularly effective in detecting subtle, but important, deviations in phenotypes. However, while the rapid and continuing development of automated microscope-based technologies now facilitates the acquisition of trillions of cells in thousands of diverse experimental conditions, such as in the context of RNA interference (RNAi) or small-molecule screens, the massive size of these datasets precludes human analysis. Thus, the development of automated methods which aim to identify novel and biologically relevant phenotypes online is one of the major challenges in high-throughput image-based screening. Ideally, phenotype discovery methods should be designed to utilize prior/existing information and tackle three challenging tasks: restoring pre-defined biologically meaningful phenotypes, differentiating novel phenotypes from known ones, and distinguishing novel phenotypes from each other. Arbitrarily extracted information causes biased analysis, while combining the complete existing datasets with each new image is intractable in high-throughput screens.
    Results: Here we present the design and implementation of a novel and robust online phenotype discovery method with broad applicability that can be used in diverse experimental contexts, especially high-throughput RNAi screens. This method features phenotype modelling and iterative cluster merging using improved gap statistics. A Gaussian Mixture Model (GMM) is employed to estimate the distribution of each existing phenotype, and is then used as the reference distribution in gap statistics. This method is broadly applicable to a number of different types of image-based datasets derived from a wide spectrum of experimental conditions and is suitable for adaptively processing new images which are continuously added to existing datasets. Validations were carried out on different datasets, including a published RNAi screen using Drosophila embryos [Additional files 1, 2], a dataset for cell cycle phase identification using HeLa cells [Additional files 1, 3, 4] and a synthetic dataset using polygons. Our methods tackled the three aforementioned tasks effectively with an accuracy range of 85%–90%. When our method is implemented in the context of a Drosophila genome-scale RNAi image-based screen of cultured cells aimed at identifying the contribution of individual genes towards the regulation of cell shape, it efficiently discovers meaningful new phenotypes and provides novel biological insight. We also propose a two-step procedure to modify the novelty detection method based on one-class SVM, so that it can be used for online phenotype discovery. Under different conditions, we compared the SVM-based method with our method using various datasets, and our methods consistently outperformed the SVM-based method in at least two of the three tasks by 2% to 5%. These results demonstrate that our methods can be used to better identify novel phenotypes in image-based datasets from a wide range of conditions and organisms.
    Conclusion: We demonstrate that our method can detect various novel phenotypes effectively in complex datasets. Experimental results also validate that our method performs consistently under different orders of image input, variation of starting conditions including the number and composition of existing phenotypes, and datasets from different screens. Based on our findings, the proposed method is suitable for online phenotype discovery in diverse high-throughput image-based genetic and chemical screens.
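
    The role of a reference distribution in a gap statistic can be illustrated with a small one-dimensional sketch. It is only an approximation of the idea in this abstract: the paper fits a GMM to each existing phenotype, whereas the code below assumes a single Gaussian fitted to the data as the reference (in effect a one-component mixture), handles only k = 1 and k = 2, and omits the standard-error term of the usual gap-statistic decision rule. All names and data are hypothetical.

```python
import math
import random

def within_ss(values):
    """Sum of squared deviations from the mean."""
    m = sum(values) / len(values)
    return sum((v - m) ** 2 for v in values)

def best_w(values, k):
    """Within-cluster sum of squares for k = 1 or k = 2.
    In one dimension the optimal 2-cluster split is a cut of the sorted
    values, so trying every cut point is exact."""
    xs = sorted(values)
    if k == 1:
        return within_ss(xs)
    if k == 2:
        return min(within_ss(xs[:i]) + within_ss(xs[i:]) for i in range(1, len(xs)))
    raise ValueError("this sketch only supports k = 1 or k = 2")

def gap(values, k, n_ref=50, seed=0):
    """Gap(k) = mean over reference samples of log W_k(ref) minus log W_k(data).
    Reference samples come from one Gaussian fitted to the data (an assumption)."""
    rng = random.Random(seed)
    mu = sum(values) / len(values)
    sigma = math.sqrt(within_ss(values) / len(values))
    ref_logs = []
    for _ in range(n_ref):
        ref = [rng.gauss(mu, sigma) for _ in values]
        ref_logs.append(math.log(best_w(ref, k)))
    return sum(ref_logs) / n_ref - math.log(best_w(values, k))

# Hypothetical feature values with two clearly separated groups.
data = [0.0, 0.2, 0.4, 0.1, 0.3, 5.0, 5.2, 5.4, 5.1, 5.3]
gaps = {k: gap(data, k) for k in (1, 2)}
print(gaps)
```

    A larger gap at k = 2 than at k = 1 suggests that the data are split into two groups more sharply than the unimodal reference would be, which is the kind of evidence a merging procedure could use to keep two clusters apart.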

    PREDICTION OF RESPIRATORY MOTION

    Radiation therapy is a cancer treatment method that employs high-energy radiation beams to destroy cancer cells by damaging their ability to reproduce. Thoracic and abdominal tumors may shift by as much as three centimeters with respiration during radiation treatment. The prediction of respiratory motion has become an important research area because respiratory motion severely affects precise radiation dose delivery. This study describes recent radiotherapy technologies, including tools for measuring target position during radiotherapy and tracking-based delivery systems. In the first part of our study we review three approaches to the prediction of respiratory motion: model-based methods, model-free heuristic learning algorithms, and hybrid methods. In the second part we propose respiratory motion estimation with a hybrid implementation of the extended Kalman filter. The proposed method uses a recurrent neural network as the predictor and the extended Kalman filter as the corrector. In the third part we extend this work to present customized prediction of respiratory motion with clustering from multiple patient interactions. For the customized prediction we cluster the breathing patterns of multiple patients using feature selection metrics composed of a variety of breathing features. In the fourth part we retrospectively categorize breathing data into several classes and propose a new approach to detect irregular breathing patterns using neural networks. We have evaluated the proposed algorithm by comparing the prediction overshoot and the tracking estimation value. The experimental results on the breathing patterns of 448 patients validated the proposed irregular breathing classifier.
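
    The predictor/corrector split mentioned in this abstract can be sketched for a scalar state as below. This is a rough illustration under strong assumptions: a plain function stands in for the recurrent neural network predictor, the measurement is taken to be the state itself plus noise, and the noise variances `q` and `r`, the function names and the input signal are all hypothetical.

```python
def numerical_slope(f, x, eps=1e-6):
    """Central-difference derivative, used as the scalar Jacobian of the predictor."""
    return (f(x + eps) - f(x - eps)) / (2 * eps)

def predict_correct(measurements, predictor, x0=0.0, p0=1.0, q=0.01, r=0.1):
    """Scalar extended-Kalman-style loop.
    predictor: stand-in for the learned model (an assumption, not an RNN).
    q, r: assumed process and measurement noise variances."""
    x, p = x0, p0
    estimates = []
    for z in measurements:
        # Prediction step: propagate the state and its variance.
        slope = numerical_slope(predictor, x)
        x_pred = predictor(x)
        p_pred = slope * p * slope + q
        # Correction step: blend the prediction with the new measurement.
        gain = p_pred / (p_pred + r)
        x = x_pred + gain * (z - x_pred)
        p = (1 - gain) * p_pred
        estimates.append(x)
    return estimates

def hold_last(x):
    """Trivial placeholder predictor: assume the position does not change."""
    return x

# Hypothetical input: a constant target position of 2.0 observed 20 times.
estimates = predict_correct([2.0] * 20, hold_last)
print(estimates[-1])
```

    Swapping `hold_last` for a trained model is where a learned predictor would plug in. The correction step stays the same as long as the measurement model holds.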

    Particle flux transformation in the mesopelagic water column: process analysis and global balance

    Marine aggregates are an important means of transferring carbon downwards to the deep ocean, as well as an important nutritional source for the benthic communities that are the ultimate recipients of the flux. During the last 10 years, data on the size distribution of particulate matter have been collected in different oceanic provinces using an Underwater Video Profiler. The cruise data include simultaneous analyses of particle size distributions as well as additional physical and biological measurements of water properties through the water column. First, size distributions of large aggregates were compared to simultaneous measurements of particle flux observed in sediment traps. We related sediment trap compositional data to particle size (d) distributions to estimate their vertical fluxes (F) using simple power relationships (F = Ad^b). The spatial resolution of sedimentation processes allowed by the use of in situ particle sizing instruments led to a more detailed study of the role of physical processes in vertical flux. Second, the evolution of the aggregate size distributions with depth was related to overlying primary production and phytoplankton size distributions on a global scale. A new clustering technique was developed to partition the profiles of aggregate size distributions. Six clusters were isolated. Profiles with a high proportion of large aggregates were found in high-productivity waters, while profiles with a high proportion of small aggregates were located in low-productivity waters. The aggregate size and mass flux in the mesopelagic layer were correlated to the nature of primary producers (micro-, nano-, picophytoplankton fractions) and to the amount of integrated chlorophyll a in the euphotic layer using a multiple regression technique on principal components.
Finally, a mesoscale area in the North Atlantic Ocean was studied to emphasize the importance of the physical structure of the water column for the horizontal and vertical distribution of particulate matter. The seasonal change in the abundance of aggregates in the upper 1000 m was consistent with changes in the composition and intensity of the particulate flux recorded in sediment traps. In an area dominated by eddies, surface accumulation of aggregates and export down to 1000 m occurred at mesoscale distances (<100 km).
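
    The power relationship F = Ad^b quoted in this abstract becomes a straight line after taking logarithms, log F = log A + b log d, so A and b can be estimated by ordinary least squares on log-transformed values. The sketch below shows that general technique only. The abstract does not describe how the fit was actually done, and the sizes, fluxes and function name here are made up for illustration.

```python
import math

def fit_power_law(sizes, fluxes):
    """Least-squares fit of F = A * d**b on log-log axes. Returns (A, b)."""
    xs = [math.log(d) for d in sizes]
    ys = [math.log(f) for f in fluxes]
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    b = (sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
         / sum((x - mean_x) ** 2 for x in xs))
    A = math.exp(mean_y - b * mean_x)
    return A, b

# Synthetic, noise-free example generated from A = 2.0 and b = 1.5 (hypothetical values).
sizes = [0.5, 1.0, 2.0, 4.0]
fluxes = [2.0 * d ** 1.5 for d in sizes]
A_hat, b_hat = fit_power_law(sizes, fluxes)
print(A_hat, b_hat)
```

    With real measurements the fit would not be exact. Least squares in log space also weights relative errors rather than absolute ones, which may or may not suit a given dataset.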