A Clustering System for Dynamic Data Streams Based on Metaheuristic Optimisation
This article presents the Optimised Stream clustering algorithm (OpStream), a novel approach to clustering dynamic data streams. The proposed system has desirable features, such as a small number of parameters and good scalability to both high-dimensional data and large numbers of clusters, and it is based on a hybrid structure that combines deterministic clustering methods with stochastic optimisation approaches to optimally centre the clusters. Like other state-of-the-art methods in the literature, it uses “microclusters” and other established techniques, such as density-based clustering. Unlike other methods, it uses metaheuristic optimisation to maximise performance during the initialisation phase, which precedes the classic online phase. Experimental results show that OpStream outperforms state-of-the-art methods in several cases and is always competitive against the other algorithms compared, regardless of the chosen optimisation method. Three variants of OpStream, each using a different optimisation algorithm, are presented in this study. A thorough sensitivity analysis, performed with the best variant, shows OpStream’s robustness to noise and resilience to parameter changes.
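The abstract does not say which optimisation algorithms the three OpStream variants use, nor how the initialisation objective is defined. As a hedged illustration of the general idea (metaheuristic optimisation of cluster centres before the online phase), the sketch below assumes the objective is the within-cluster sum of squared errors (SSE) on an initial buffer of points and uses classic differential evolution (DE/rand/1/bin) as the optimiser. The names `sse` and `de_initialise_centres`, and all parameter defaults, are hypothetical and not taken from OpStream.

```python
import random


def sse(centres, points):
    """Sum of squared Euclidean distances from each point to its nearest centre."""
    return sum(
        min(sum((a - b) ** 2 for a, b in zip(p, c)) for c in centres)
        for p in points
    )


def de_initialise_centres(points, k, pop_size=20, generations=100,
                          f=0.5, cr=0.9, seed=0):
    """Optimise k initial centres with DE/rand/1/bin, minimising SSE.

    Each candidate solution is the k centres flattened into one vector.
    Hypothetical helper: an illustration, not OpStream's actual routine.
    """
    rng = random.Random(seed)
    dim = len(points[0])
    n = k * dim
    # Per-coordinate bounds of the buffer, used only to seed the population.
    lo = [min(p[d] for p in points) for d in range(dim)]
    hi = [max(p[d] for p in points) for d in range(dim)]

    def decode(vec):
        return [vec[i * dim:(i + 1) * dim] for i in range(k)]

    def fitness(vec):
        return sse(decode(vec), points)

    pop = [[rng.uniform(lo[j % dim], hi[j % dim]) for j in range(n)]
           for _ in range(pop_size)]
    fit = [fitness(v) for v in pop]

    for _ in range(generations):
        for i in range(pop_size):
            # Three distinct donor vectors, all different from the target i.
            a, b, c = rng.sample([j for j in range(pop_size) if j != i], 3)
            j_rand = rng.randrange(n)  # guarantees at least one mutated gene
            trial = [
                pop[a][j] + f * (pop[b][j] - pop[c][j])
                if (rng.random() < cr or j == j_rand) else pop[i][j]
                for j in range(n)
            ]
            trial_fit = fitness(trial)
            if trial_fit <= fit[i]:  # greedy one-to-one selection
                pop[i], fit[i] = trial, trial_fit

    best = min(range(pop_size), key=fit.__getitem__)
    return decode(pop[best]), fit[best]


# Usage on a toy buffer with two well-separated groups.
buffer = [(0, 0), (0, 1), (1, 0), (1, 1), (10, 10), (10, 11), (11, 10), (11, 11)]
centres, cost = de_initialise_centres(buffer, k=2)
```

On this toy buffer the best achievable SSE is 4 (centres at (0.5, 0.5) and (10.5, 10.5)), which gives a simple check on how close the optimiser gets. In a full stream-clustering pipeline the returned centres would presumably seed the microclusters maintained during the online phase; that step is not shown.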
On the non-local geometry of turbulence
A multi-scale methodology for studying the non-local geometry of eddy structures in turbulence is developed. Starting from a given three-dimensional field, it consists of three main steps: extraction, characterization and classification of structures. The extraction step is done in two stages. First, a multi-scale decomposition based on the curvelet transform is applied to the full three-dimensional field, resulting in a finite set of component three-dimensional fields, one per scale. Second, iso-contouring each component field at one or more iso-contour levels yields a set of closed iso-surfaces that represents the structures at that scale. The characterization stage is based on the joint probability density function (p.d.f.), in terms of area coverage on each individual iso-surface, of two differential-geometry properties, the shape index and the curvedness, together with the stretching parameter, a dimensionless global invariant of the surface. Taken together, these define the geometrical signature of the iso-surface. The classification step is based on the construction of a finite set of parameters, obtained from algebraic functions of moments of the joint p.d.f. of each structure, that specify its location as a point in a multi-dimensional ‘feature space’. At each scale, the set of points in feature space represents all structures at that scale for the specified iso-contour value. This then allows clustering techniques to be applied to the set to search for groups of structures with a common geometry. Results are presented from a first application of this technique to a passive scalar field obtained from a 512³ direct numerical simulation of scalar mixing by forced, isotropic turbulence (Re_λ = 265).
These show a transition, with decreasing scale, from blob-like structures at the larger scales to blob- and tube-like structures with small or moderate stretching in the inertial range of scales, and then toward tube-like and, predominantly, sheet-like structures with a high level of stretching in the dissipation range of scales. Implications of these results for the dynamical behaviour of passive scalar stirring and mixing by turbulence are discussed.
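The abstract names the shape index and curvedness but does not give their formulas. The sketch below assumes the widely used Koenderink definitions, S = (2/π) arctan((κ₁ + κ₂)/(κ₁ − κ₂)) with κ₁ ≥ κ₂ and C = √((κ₁² + κ₂²)/2); sign conventions for S vary between sources, and the paper's may differ. It also assumes each iso-surface is already available as per-element principal curvatures `k1`, `k2` with element areas `areas` (hypothetical inputs), builds the area-weighted joint p.d.f. of (S, C) with NumPy, and reduces it to two first moments as a deliberately simple stand-in for the paper's feature-space parameters. The curvelet decomposition, the stretching parameter and the clustering step are not shown.

```python
import numpy as np


def shape_index(k1, k2):
    """Shape index in [-1, 1] from principal curvatures (Koenderink-style).

    Convention assumed here: a convex sphere (k1 = k2 > 0) maps to +1,
    a cylinder to +0.5 and a symmetric saddle to 0.
    """
    k_max = np.maximum(k1, k2)
    k_min = np.minimum(k1, k2)
    # arctan2 avoids division by zero at umbilic points (k_max == k_min);
    # at planar points (both zero) it returns 0, where S is strictly undefined.
    return (2.0 / np.pi) * np.arctan2(k_max + k_min, k_max - k_min)


def curvedness(k1, k2):
    """Curvedness: root-mean-square of the two principal curvatures."""
    return np.sqrt((np.asarray(k1) ** 2 + np.asarray(k2) ** 2) / 2.0)


def area_weighted_joint_pdf(k1, k2, areas, bins=20):
    """Joint p.d.f. of (S, C) in which each surface element counts by its area."""
    s = shape_index(k1, k2)
    c = curvedness(k1, k2)
    hist, s_edges, c_edges = np.histogram2d(
        s, c, bins=bins, range=[[-1.0, 1.0], [0.0, c.max()]],
        weights=areas, density=True)
    return hist, s_edges, c_edges


def first_moment_features(k1, k2, areas):
    """Area-weighted means of S and C: a crude two-dimensional feature point."""
    w = np.asarray(areas, dtype=float)
    w = w / w.sum()
    return np.array([np.sum(w * shape_index(k1, k2)),
                     np.sum(w * curvedness(k1, k2))])
```

Weighting by element area rather than by element count mirrors the abstract's "in terms of area coverage", so a finely meshed patch does not dominate the p.d.f. merely because it has more elements.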
A taxonomy framework for unsupervised outlier detection techniques for multi-type data sets
The term "outlier" can generally be defined as an observation that is significantly different from the other values in a data set. Outliers may be errors, or they may indicate events of interest. The task of outlier detection aims to identify such outliers in order to improve the analysis of data and to discover interesting and useful knowledge about unusual events in numerous application domains. In this paper, we report on contemporary unsupervised outlier detection techniques for multiple types of data sets and provide a comprehensive taxonomy framework and two decision trees for selecting the most suitable technique for a given data set. Furthermore, we highlight the advantages, disadvantages and performance issues of each class of outlier detection techniques under this taxonomy framework.
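As a concrete instance of one class of unsupervised techniques that such taxonomies typically cover, distance-based detection, the sketch below scores each point by the distance to its k-th nearest neighbour and reports the highest-scoring points as outliers. It is a brute-force O(n²) illustration, not a technique taken from the paper; the function names and the default `k` are hypothetical.

```python
import math


def knn_outlier_scores(points, k=2):
    """Score each point by the distance to its k-th nearest neighbour.

    A larger score means the point sits in a sparser neighbourhood.
    """
    if not 1 <= k < len(points):
        raise ValueError("k must satisfy 1 <= k < number of points")
    scores = []
    for i, p in enumerate(points):
        dists = sorted(math.dist(p, q) for j, q in enumerate(points) if j != i)
        scores.append(dists[k - 1])
    return scores


def top_n_outliers(points, n=1, k=2):
    """Indices of the n points with the largest k-NN distance."""
    scores = knn_outlier_scores(points, k)
    return sorted(range(len(points)), key=lambda i: scores[i], reverse=True)[:n]


data = [(0, 0), (0, 1), (1, 0), (1, 1), (10, 10)]
outliers = top_n_outliers(data, n=1, k=2)  # -> [4], the isolated point (10, 10)
```

Using the k-th neighbour rather than the single nearest one makes the score less sensitive to small groups of outliers that sit next to each other; the choice of `k` is a tuning decision that depends on the data set.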
A computational framework to emulate the human perspective in flow cytometric data analysis
Background: In recent years, intense research efforts have focused on developing methods for automated flow cytometric data analysis. However, while designing such applications, little or no attention has been paid to the human perspective that is absolutely central to the manual gating process of identifying and characterizing cell populations. In particular, the assumption of many common techniques that cell populations could be modeled reliably with pre-specified distributions may not hold true in real-life samples, which can have populations of arbitrary shapes and considerable inter-sample variation.
Results: To address this, we developed a new framework, flowScape, for emulating certain key aspects of the human perspective in analyzing flow data, which we implemented in multiple steps. First, flowScape creates a mathematically rigorous map of the high-dimensional flow data landscape based on dense and sparse regions defined by relative concentrations of events around modes. In the second step, these modal clusters are connected with a global hierarchical structure. This representation allows flowScape to perform ridgeline analysis both for traversing the landscape and for isolating cell populations at different levels of resolution. Finally, we extended manual gating with a new capacity for constructing templates that can identify target populations in terms of their relative parameters, as opposed to the more commonly used absolute or physical parameters. This allows flowScape to apply such templates in batch mode for detecting the corresponding populations in a flexible, sample-specific manner. We also demonstrated different applications of our framework to flow data analysis and showed its superiority over other analytical methods.
Conclusions: The human perspective, built on intuition and experience, is a very important component of flow cytometric data analysis. By emulating some of its approaches and extending these with automation and rigor, flowScape provides a flexible and robust framework for computational cytomics.
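The abstract does not specify how flowScape locates modes or measures the concentration of events around them. As a stand-in for the first step only, the sketch below uses Gaussian mean shift, a standard mode-seeking method: every event is moved uphill on a kernel density estimate until it settles on a mode, and events that settle together form one modal cluster. The function name, `bandwidth`, `merge_tol` and the other defaults are hypothetical, and the hierarchical structure and ridgeline analysis are not shown.

```python
import numpy as np


def mean_shift_modes(points, bandwidth=1.0, max_iter=200, tol=1e-7,
                     merge_tol=None):
    """Move every event uphill on a Gaussian KDE and group events by mode.

    Returns (modes, labels): one row per distinct mode, and for each event
    the index of the mode it converged to.
    """
    X = np.asarray(points, dtype=float)
    Y = X.copy()  # current positions; each starts at its own event
    for _ in range(max_iter):
        # Squared distances between every current position and every event.
        d2 = ((Y[:, None, :] - X[None, :, :]) ** 2).sum(axis=2)
        w = np.exp(-d2 / (2.0 * bandwidth ** 2))
        # Mean-shift step: move to the kernel-weighted mean of the events.
        Y_new = (w @ X) / w.sum(axis=1, keepdims=True)
        converged = np.abs(Y_new - Y).max() < tol
        Y = Y_new
        if converged:
            break

    if merge_tol is None:
        merge_tol = bandwidth / 2.0
    modes, labels = [], np.empty(len(X), dtype=int)
    for i, y in enumerate(Y):
        # Attach to the first existing mode within merge_tol, else start a new one.
        for m, mode in enumerate(modes):
            if np.linalg.norm(y - mode) < merge_tol:
                labels[i] = m
                break
        else:
            modes.append(y)
            labels[i] = len(modes) - 1
    return np.array(modes), labels


events = np.array([[0.0], [0.5], [1.0], [10.0], [10.5], [11.0]])
modes, labels = mean_shift_modes(events, bandwidth=1.0)
```

Mean shift makes no assumption about the shape of each population, which matches the abstract's concern that real populations can have arbitrary shapes. Repeating the call with a larger bandwidth tends to merge nearby modes, giving a rough sense of analysis at coarser resolution, but that is not the ridgeline analysis described above. The pairwise distance array uses O(n²) memory, so this version suits only small samples.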
Image mining: trends and developments
Advances in image acquisition and storage technology have led to tremendous growth in very large and detailed image databases. These images, if analyzed, can reveal useful information to human users. Image mining deals with the extraction of implicit knowledge, image data relationships, or other patterns not explicitly stored in the images. Image mining is more than just an extension of data mining to the image domain. It is an interdisciplinary endeavor that draws upon expertise in computer vision, image processing, image retrieval, data mining, machine learning, databases, and artificial intelligence. In this paper, we examine the research issues in image mining and current developments in the field, particularly image mining frameworks and state-of-the-art techniques and systems. We also identify some future research directions for image mining.