Image patch analysis and clustering of sunspots: a dimensionality reduction approach
Sunspots, as seen in white light or continuum images, are associated with
regions of high magnetic activity on the Sun, visible on magnetogram images.
Their complexity is correlated with explosive solar activity and so classifying
these active regions is useful for predicting future solar activity. Current
classification of sunspot groups is visually based and suffers from bias.
Supervised learning methods can reduce human bias but fail to optimally
capitalize on the information present in sunspot images. This paper uses two
image modalities (continuum and magnetogram) to characterize the spatial and
modal interactions of sunspot and magnetic active region images and presents a
new approach to cluster the images. Specifically, in the framework of image
patch analysis, we estimate the number of intrinsic parameters required to
describe the spatial and modal dependencies, the correlation between the two
modalities and the corresponding spatial patterns, and examine the phenomena at
different scales within the images. To do this, we use linear and nonlinear
intrinsic dimension estimators, canonical correlation analysis, and
multiresolution analysis of intrinsic dimension. Comment: 5 pages, 7 figures, accepted to ICIP 201
The intrinsic value of HFO features as a biomarker of epileptic activity
High frequency oscillations (HFOs) are a promising biomarker of epileptic
brain tissue and activity. HFOs additionally serve as a prototypical example of
challenges in the analysis of discrete events in high-temporal resolution,
intracranial EEG data. Two primary challenges are 1) dimensionality reduction,
and 2) assessing feasibility of classification. Dimensionality reduction
assumes that the data lie on a manifold with dimension less than that of the
feature space. However, previous HFO analyses have assumed a linear manifold,
global across time, space (i.e. recording electrode/channel), and individual
patients. Instead, we assess both a) whether linear methods are appropriate and
b) the consistency of the manifold across time, space, and patients. We also
estimate bounds on the Bayes classification error to quantify the distinction
between two classes of HFOs (those occurring during seizures and those
occurring due to other processes). This analysis provides the foundation for
future clinical use of HFO features and guides the analysis for other discrete
events, such as individual action potentials or multi-unit activity. Comment: 5 pages, 5 figures
Coarse Graining of Data via Inhomogeneous Diffusion Condensation
Big data often has emergent structure that exists at multiple levels of
abstraction, which are useful for characterizing complex interactions and
dynamics of the observations. Here, we consider multiple levels of abstraction
via a multiresolution geometry of data points at different granularities. To
construct this geometry we define a time-inhomogeneous diffusion process that
effectively condenses data points together to uncover nested groupings at
larger and larger granularities. This inhomogeneous process creates a deep
cascade of intrinsic low pass filters on the data affinity graph that are
applied in sequence to gradually eliminate local variability while adjusting
the learned data geometry to increasingly coarser resolutions. We provide
visualizations to exhibit our method as a continuously-hierarchical clustering
with directions of eliminated variation highlighted at each step. The utility
of our algorithm is demonstrated via neuronal data condensation, where the
constructed multiresolution data geometry uncovers the organization, grouping,
and connectivity between neurons. Comment: 14 pages, 7 figures
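The condensation idea in the abstract above can be illustrated with a minimal sketch. It assumes a Gaussian affinity kernel, a fixed bandwidth `epsilon`, and a simple merge tolerance `tol`; the function names and parameter values are hypothetical and are not taken from the paper. The kernel is rebuilt from the current point positions at every step, which is what makes the diffusion time-inhomogeneous.

```python
import math

def diffusion_step(points, epsilon):
    """Move every point to the kernel-weighted mean of all current points.

    This applies one row-normalized (Markov) affinity matrix, i.e. one
    low-pass filter on the data affinity graph.
    """
    new_points = []
    for p in points:
        weights = [math.exp(-math.dist(p, q) ** 2 / epsilon) for q in points]
        total = sum(weights)
        new_points.append(tuple(
            sum(w * q[d] for w, q in zip(weights, points)) / total
            for d in range(len(p))
        ))
    return new_points

def count_groups(points, tol):
    """Greedily count groups of points lying within `tol` of a representative."""
    representatives = []
    for p in points:
        if not any(math.dist(p, r) < tol for r in representatives):
            representatives.append(p)
    return len(representatives)

def condense(points, epsilon=0.5, tol=1e-3, max_iter=50):
    """Repeat diffusion steps and record the number of groups after each one."""
    history = [count_groups(points, tol)]
    for _ in range(max_iter):
        # The kernel is recomputed from the *current* positions each time,
        # so the diffusion operator changes from step to step.
        points = diffusion_step(points, epsilon)
        history.append(count_groups(points, tol))
        if history[-1] == 1:
            break
    return points, history

# Hypothetical toy data: two well-separated pairs of points.
toy = [(0.0, 0.0), (0.1, 0.0), (5.0, 5.0), (5.1, 5.0)]
final_points, history = condense(toy)
```

Reading `history` from start to end gives a coarse view of the nested groupings: nearby points collapse first, and distant groups merge only once the kernel bandwidth is large enough to connect them. With the fixed bandwidth used here, distant groups may never merge.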
An Agent-Based Algorithm exploiting Multiple Local Dissimilarities for Clusters Mining and Knowledge Discovery
We propose a multi-agent algorithm able to automatically discover relevant
regularities in a given dataset, determining at the same time the set of
configurations of the adopted parametric dissimilarity measure yielding compact
and separated clusters. Each agent operates independently by performing a
Markovian random walk on a suitable weighted graph representation of the input
dataset. Such a weighted graph representation is induced by the specific
parameter configuration of the dissimilarity measure adopted by the agent,
which searches and takes decisions autonomously for one cluster at a time.
Results show that the algorithm is able to discover parameter configurations
that yield a consistent and interpretable collection of clusters. Moreover, we
demonstrate that our algorithm performs comparably to other similar
state-of-the-art algorithms on specific clustering problems.
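The following sketch illustrates one agent's view of the data as described in the abstract above: a parametric dissimilarity induces a weighted graph, and the agent performs a Markovian random walk on it. The weighted Euclidean dissimilarity, the exponential edge weighting, and all names and values are assumptions made for illustration; the abstract does not specify them.

```python
import math
import random

def weighted_dissimilarity(x, y, feature_weights):
    """Parametric dissimilarity: Euclidean distance with per-feature weights."""
    return math.sqrt(sum(w * (a - b) ** 2
                         for w, a, b in zip(feature_weights, x, y)))

def transition_matrix(data, feature_weights, scale=1.0):
    """Row-stochastic matrix of a random walk on the induced weighted graph.

    Edge weights decay exponentially with dissimilarity; self-loops are excluded.
    """
    n = len(data)
    matrix = []
    for i in range(n):
        row = [
            0.0 if i == j else
            math.exp(-weighted_dissimilarity(data[i], data[j], feature_weights) / scale)
            for j in range(n)
        ]
        total = sum(row)
        matrix.append([value / total for value in row])
    return matrix

def random_walk(matrix, start, steps, rng):
    """Markovian walk: the next node depends only on the current node."""
    path = [start]
    for _ in range(steps):
        current = path[-1]
        path.append(rng.choices(range(len(matrix)), weights=matrix[current])[0])
    return path

# Hypothetical data: the first feature separates two groups, the second does not.
data = [(0.0, 0.0), (0.1, 3.0), (4.0, 0.1), (4.1, 3.1)]
P_first = transition_matrix(data, feature_weights=(1.0, 0.0))   # agent using feature 1
P_second = transition_matrix(data, feature_weights=(0.0, 1.0))  # agent using feature 2
path = random_walk(P_first, start=0, steps=20, rng=random.Random(0))
```

Two agents with different parameter configurations see different graphs: under `P_first` node 0 mostly steps to node 1, while under `P_second` it mostly steps to node 2.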
Image patch analysis of sunspots and active regions. II. Clustering via matrix factorization
Separating active regions that are quiet from potentially eruptive ones is a
key issue in Space Weather applications. Traditional classification schemes
such as Mount Wilson and McIntosh have been effective in relating an active
region large scale magnetic configuration to its ability to produce eruptive
events. However, their qualitative nature prevents systematic studies of an
active region's evolution, for example. We introduce a new clustering of active
regions that is based on the local geometry observed in Line of Sight
magnetogram and continuum images. We use a reduced-dimension representation of
an active region that is obtained by factoring the corresponding data matrix
comprised of local image patches. Two factorizations can be compared via the
definition of appropriate metrics on the resulting factors. The distances
obtained from these metrics are then used to cluster the active regions. We
find that these metrics result in natural clusterings of active regions. The
clusterings are related to large scale descriptors of an active region such as
its size, its local magnetic field distribution, and its complexity as measured
by the Mount Wilson classification scheme. We also find that including data
focused on the neutral line of an active region can result in an increased
correspondence between our clustering results and other active region
descriptors such as the Mount Wilson classifications and the value. We
provide some recommendations for which metrics, matrix factorization
techniques, and regions of interest to use to study active regions. Comment: Accepted for publication in the Journal of Space Weather and Space
Climate (SWSC). 33 pages, 12 figures
Applying an automated method of classifying lip morphological traits
Objective:
To apply an automated computerised method to categorise and determine the prevalence of different types of lip traits, and to explore associations between lip traits and sex differences.
Design:
Observational descriptive study utilising an automated method of facial assessment.
Setting and participants:
A total of 4747 children from the Avon Longitudinal Study of Parents and Children (ALSPAC) who each had 3D facial scans carried out at 15 years of age.
Methods:
Each of the participants was automatically categorised regarding predetermined lip morphological traits. Descriptive statistics were applied to report the prevalence of the different types of each trait, and chi-square tests were used to investigate sex differences and associations between traits.
Results:
A total of 4730 individuals were assessed (47% male, 53% female). Eight predetermined lip traits have been reported previously. There were differences in prevalence for all lip traits in male and female patients (all P ⩽ 0.0002), with differences between the sexes described for each trait. For example, a deeply grooved philtrum of average width was more prevalent in boys, and an indentation near the upper vermilion border was more prevalent in girls. Each of the traits was significantly associated with the other traits (all P < 0.0001), with particularly strong associations seen between traits in the same region (e.g. upper lip). Individual associations between traits are reported; for example, a straight lip contour was found to be associated with no true vermilion border in both the upper and lower lip regions.
Conclusion:
The automated computerised method described is an invaluable tool for the categorisation of lip morphological traits. The prevalence of various types of traits has been described. Sexual dimorphism exists for all the lip traits assessed. Generally, each of the traits is associated with all other traits, with individual associations reported.
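The chi-square tests mentioned in the Methods of the abstract above can be sketched as follows for a contingency table of trait category by sex. The counts are hypothetical and are not taken from the study; the p-value helper is valid only for one degree of freedom (a 2x2 table).

```python
import math

def chi_square_statistic(table):
    """Pearson chi-square statistic and degrees of freedom for a contingency table."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    statistic = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            # Expected count under independence of rows and columns.
            expected = row_totals[i] * col_totals[j] / n
            statistic += (observed - expected) ** 2 / expected
    dof = (len(table) - 1) * (len(table[0]) - 1)
    return statistic, dof

def p_value_one_dof(statistic):
    """Upper-tail p-value of a chi-square variable with exactly 1 degree of freedom."""
    return math.erfc(math.sqrt(statistic / 2.0))

# Hypothetical counts: rows = sex, columns = trait present / absent.
counts = [[10, 20],
          [20, 10]]
statistic, dof = chi_square_statistic(counts)
p_value = p_value_one_dof(statistic)
```

For tables larger than 2x2, the p-value would need the general chi-square distribution, which this sketch does not implement.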
Clustering Algorithms: Their Application to Gene Expression Data
Gene expression data hide vital information required to understand the biological processes that take place in a particular organism in relation to its environment. Deciphering the hidden patterns in gene expression data offers a major opportunity to strengthen the understanding of functional genomics. The complexity of biological networks and the number of genes involved make it harder to comprehend and interpret the resulting mass of data, which consists of millions of measurements; these data also exhibit vagueness, imprecision, and noise. Clustering techniques are therefore a first step toward addressing these challenges and are essential in the data mining process for revealing natural structures and identifying interesting patterns in the underlying data. The clustering of gene expression data has proven useful for revealing the natural structure inherent in the data, understanding gene functions, cellular processes, and subtypes of cells, mining useful information from noisy data, and understanding gene regulation. Another benefit of clustering gene expression data is the identification of homology, which is very important in vaccine design. This review examines the various clustering algorithms applicable to gene expression data in order to provide useful knowledge of the appropriate clustering technique that will guarantee stability and a high degree of accuracy in the analysis procedure.
An automatic approach for classification and categorisation of lip morphological traits
Classification of facial traits (e.g., lip shape) is an important area of medical research, for example, in determining associations between lip traits and genetic variants which may lead to a cleft lip. In clinical situations, classification of facial traits is usually performed subjectively, either directly on the individual or later from a recorded three-dimensional image, which is time consuming and prone to operator errors. The present study proposes, for the first time, an automatic approach for the classification and categorisation of lip area traits. Our approach uses novel three-dimensional geometric features based on surface curvatures measured along geodesic paths between anthropometric landmarks. Different combinations of geodesic features are analysed and compared. The effect of automatically identified categories on the face is visualised using a partial least squares method. The method was applied to the classification and categorisation of six lip shape traits (philtrum, Cupid’s bow, lip contours, lip-chin, and lower lip tone) in a large sample of 4747 faces of normal British Western European descent. The proposed method demonstrates a correct automatic classification rate of up to 90%.
Clustering with a new distance measure based on a dual-rooted tree
39 pages. This paper introduces a novel distance measure for clustering high dimensional data based on the hitting time of two Minimal Spanning Trees (MST) grown sequentially from a pair of points by Prim's algorithm. When the proposed measure is used in conjunction with spectral clustering, we obtain a powerful clustering algorithm that is able to separate neighboring non-convex shaped clusters and to account for local as well as global geometric features of the data set. Remarkably, the new distance measure is a true metric even if the Prim algorithm uses a non-metric dissimilarity measure to compute the edges of the MST. This metric property brings added flexibility to the proposed method. In particular, the method is applied to clustering non-Euclidean quantities, such as probability distributions or spectra, using the Kullback-Leibler divergence as a base measure. We reduce computational complexity by applying consensus clustering to a small ensemble of dual rooted MSTs. We show that the resultant consensus spectral clustering with dual rooted MST is competitive with other clustering methods, both in terms of clustering performance and computational complexity. We illustrate the proposed clustering algorithm on public domain benchmark data for which the ground truth is known, on the one hand, and on real-world astrophysical data on the other.
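The dual-rooted construction in the abstract above can be illustrated with a rough sketch. It grows two Prim trees from a pair of root points and counts the growth steps until the trees meet. The growth order (always adding the cheapest frontier edge of either tree), the tie-breaking, and the stopping rule are assumptions made for illustration; the exact definition of the distance measure in the paper may differ, and this simplified version should not be taken as the authors' metric.

```python
import math

def dual_rooted_hitting_time(points, root_a, root_b, dissimilarity=math.dist):
    """Count Prim growth steps until trees rooted at root_a and root_b meet.

    Assumption: at each step the cheapest edge leaving either tree is added
    (ties go to the first tree examined). The walk stops when that edge
    reaches a vertex already owned by the other tree.
    """
    if root_a == root_b:
        return 0
    trees = [{root_a}, {root_b}]
    steps = 0
    while True:
        best = None  # (edge weight, index of growing tree, new vertex)
        for t, tree in enumerate(trees):
            for u in tree:
                for v in range(len(points)):
                    if v in tree:
                        continue
                    weight = dissimilarity(points[u], points[v])
                    if best is None or weight < best[0]:
                        best = (weight, t, v)
        _, t, v = best
        steps += 1
        if v in trees[1 - t]:
            return steps  # the two trees have collided
        trees[t].add(v)

# Hypothetical one-dimensional data with a gap between two groups.
points = [(0.0,), (1.0,), (2.0,), (10.0,), (11.0,)]
near = dual_rooted_hitting_time(points, 0, 1)
mid = dual_rooted_hitting_time(points, 0, 2)
far = dual_rooted_hitting_time(points, 0, 4)
```

In this toy example, roots in the same dense group collide quickly, while roots on opposite sides of the gap must absorb their own neighborhoods before the trees can bridge it. Any dissimilarity function can be passed in place of `math.dist`, including a non-metric one.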