51,444 research outputs found
Shape Modeling with Spline Partitions
Shape modelling (with methods that output shapes) is a new and important task
in Bayesian nonparametrics and bioinformatics. In this work, we focus on
Bayesian nonparametric methods for capturing shapes by partitioning a space
using curves. In related work, the classical Mondrian process is used to
partition spaces recursively with axis-aligned cuts, and is widely applied in
multi-dimensional and relational data. The Mondrian process outputs
hyper-rectangles. Recently, the random tessellation process was introduced as a
generalization of the Mondrian process, partitioning a domain with non-axis
aligned cuts in an arbitrary dimensional space, and outputting polytopes.
Motivated by these processes, in this work, we propose a novel parallelized
Bayesian nonparametric approach to partition a domain with curves, enabling
complex data-shapes to be acquired. We apply our method to HIV-1-infected human
macrophage image dataset, and also simulated datasets sets to illustrate our
approach. We compare to support vector machines, random forests and
state-of-the-art computer vision methods such as simple linear iterative
clustering super pixel image segmentation. We develop an R package that is
available at
\url{https://github.com/ShufeiGe/Shape-Modeling-with-Spline-Partitions}
Fuzzy Clustering for Image Segmentation Using Generic Shape Information
The performance of clustering algorithms for image segmentation are highly sensitive to the features used and types of objects in the image, which ultimately limits their generalization capability. This provides strong motivation to investigate integrating shape information into the clustering framework to improve the generality of these algorithms. Existing shape-based clustering techniques mainly focus on circular and elliptical clusters and so are unable to segment arbitrarily-shaped objects. To address this limitation, this paper presents a new shape-based algorithm called fuzzy clustering for image segmentation using generic shape information (FCGS), which exploits the B-spline representation of an object's shape in combination with the Gustafson-Kessel clustering algorithm. Qualitative and quantitative results for FCGS confirm its superior segmentation performance consistently compared to well-established shape-based clustering techniques, for a wide range of test images comprising various regular and arbitrary-shaped objects
Multiorder neurons for evolutionary higher-order clustering and growth
This letter proposes to use multiorder neurons for clustering irregularly shaped data arrangements. Multiorder neurons are an evolutionary extension of the use of higher-order neurons in clustering. Higher-order neurons parametrically model complex neuron shapes by replacing the classic synaptic weight by higher-order tensors. The multiorder neuron goes one step further and eliminates two problems associated with higher-order neurons. First, it uses evolutionary algorithms to select the best neuron order for a given problem. Second, it obtains more information about the underlying data distribution by identifying the correct order for a given cluster of patterns. Empirically we observed that when the correlation of clusters found with ground truth information is used in measuring clustering accuracy, the proposed evolutionary multiorder neurons method can be shown to outperform other related clustering methods. The simulation results from the Iris, Wine, and Glass data sets show significant improvement when compared to the results obtained using self-organizing maps and higher-order neurons. The letter also proposes an intuitive model by which multiorder neurons can be grown, thereby determining the number of clusters in data
Recommended from our members
A hybrid methodology for data clustering
This thesis introduces and evaluates a new hybrid method for the searching for groups in data - a process referred to as cluster analysis. The Agglomerative - Partitional Clustering methodology (APC) proposed in this work is a novel solution to the clustering problem intended for use with large, noisy data sets and capable of recovering clusters of arbitrary shape.
Large sample size, noise and nonhyperellipsoidal cluster shapes can create difficulties for many clustering algorithms. Many commonly used clustering techniques are too inefficient to handle large data sets found in many data analysis problems or are limited by the fact that they implicitly or explicitly define clusters as being hyperellipsoidal (i.e. “globular” in shape) and can therefore fail to recover other types of cluster structure. Moreover, the presence of noise can also make detection of cluster structures problematic, particularly for clustering techniques that are explicitly designed to handle nonhyperellipsoidal cluster structures.
APC is able to circumvent these difficulties by hybridising a number of diverse approaches to clustering. Large data sets are dealt with by hybridising fast pattern partitioning techniques with hierarchical and density search methods. Arbitrary cluster shapes are handled by a unique linked line segment representation of cluster shape. In short, rather than representing clusters with their centroids, the clusters are represented via a piecewise linear approximation of the cluster structure. This enables APC to represent any cluster shape that is piecewise linearly approximatable.
The purpose of this thesis, therefore, is to introduce APC and to evaluate the ability of APC to recover cluster structure under the conditions described above. First, it is argued that there is a dearth of clustering techniques that can process large, noisy data sets where there exists arbitrarily shaped clusters. Next, the APC approach to clustering is described in detail. Here it is discussed how APC is able to handle voluminous and noisy data without being constrained to any particular cluster shapes. Moreover, as APC represents a hybridisation of clustering strategies and techniques, different ways of implementing APC are also evaluated.
The remainder of this thesis is concerned with the evaluation of APC. First, APC is empirically compared to other clustering methods via Monte Carlo simulation on a number of complex data sets. A wide variety of experimental conditions examining cluster shape, dispersion, noise and dimensionality are covered. The use of APC as a data reduction method is also examined. This final experiment also highlights the utility of the linked line segment representation of cluster shape proposed in this thesis.
Finally, the concluding chapter summarises the results and limitations of this thesis and discusses some future directions this research could take
A computational framework to emulate the human perspective in flow cytometric data analysis
Background: In recent years, intense research efforts have focused on developing methods for automated flow cytometric data analysis. However, while designing such applications, little or no attention has been paid to the human perspective that is absolutely central to the manual gating process of identifying and characterizing cell populations. In particular, the assumption of many common techniques that cell populations could be modeled reliably with pre-specified distributions may not hold true in real-life samples, which can have populations of arbitrary shapes and considerable inter-sample variation.
<p/>Results: To address this, we developed a new framework flowScape for emulating certain key aspects of the human perspective in analyzing flow data, which we implemented in multiple steps. First, flowScape begins with creating a mathematically rigorous map of the high-dimensional flow data landscape based on dense and sparse regions defined by relative concentrations of events around modes. In the second step, these modal clusters are connected with a global hierarchical structure. This representation allows flowScape to perform ridgeline analysis for both traversing the landscape and isolating cell populations at different levels of resolution. Finally, we extended manual gating with a new capacity for constructing templates that can identify target populations in terms of their relative parameters, as opposed to the more commonly used absolute or physical parameters. This allows flowScape to apply such templates in batch mode for detecting the corresponding populations in a flexible, sample-specific manner. We also demonstrated different applications of our framework to flow data analysis and show its superiority over other analytical methods.
<p/>Conclusions: The human perspective, built on top of intuition and experience, is a very important component of flow cytometric data analysis. By emulating some of its approaches and extending these with automation and rigor, flowScape provides a flexible and robust framework for computational cytomics
- …