10 research outputs found

    Image patch analysis and clustering of sunspots: a dimensionality reduction approach

    Sunspots, as seen in white light or continuum images, are associated with regions of high magnetic activity on the Sun, visible on magnetogram images. Their complexity is correlated with explosive solar activity and so classifying these active regions is useful for predicting future solar activity. Current classification of sunspot groups is visually based and suffers from bias. Supervised learning methods can reduce human bias but fail to optimally capitalize on the information present in sunspot images. This paper uses two image modalities (continuum and magnetogram) to characterize the spatial and modal interactions of sunspot and magnetic active region images and presents a new approach to cluster the images. Specifically, in the framework of image patch analysis, we estimate the number of intrinsic parameters required to describe the spatial and modal dependencies, the correlation between the two modalities and the corresponding spatial patterns, and examine the phenomena at different scales within the images. To do this, we use linear and nonlinear intrinsic dimension estimators, canonical correlation analysis, and multiresolution analysis of intrinsic dimension. Comment: 5 pages, 7 figures, accepted to ICIP 201

    The intrinsic value of HFO features as a biomarker of epileptic activity

    High frequency oscillations (HFOs) are a promising biomarker of epileptic brain tissue and activity. HFOs additionally serve as a prototypical example of challenges in the analysis of discrete events in high-temporal resolution, intracranial EEG data. Two primary challenges are 1) dimensionality reduction, and 2) assessing feasibility of classification. Dimensionality reduction assumes that the data lie on a manifold with dimension less than that of the feature space. However, previous HFO analyses have assumed a linear manifold, global across time, space (i.e. recording electrode/channel), and individual patients. Instead, we assess both a) whether linear methods are appropriate and b) the consistency of the manifold across time, space, and patients. We also estimate bounds on the Bayes classification error to quantify the distinction between two classes of HFOs (those occurring during seizures and those occurring due to other processes). This analysis provides the foundation for future clinical use of HFO features and guides the analysis for other discrete events, such as individual action potentials or multi-unit activity. Comment: 5 pages, 5 figures

    Coarse Graining of Data via Inhomogeneous Diffusion Condensation

    Big data often has emergent structure that exists at multiple levels of abstraction, which are useful for characterizing complex interactions and dynamics of the observations. Here, we consider multiple levels of abstraction via a multiresolution geometry of data points at different granularities. To construct this geometry we define a time-inhomogeneous diffusion process that effectively condenses data points together to uncover nested groupings at larger and larger granularities. This inhomogeneous process creates a deep cascade of intrinsic low pass filters on the data affinity graph that are applied in sequence to gradually eliminate local variability while adjusting the learned data geometry to increasingly coarser resolutions. We provide visualizations to exhibit our method as a continuously-hierarchical clustering with directions of eliminated variation highlighted at each step. The utility of our algorithm is demonstrated via neuronal data condensation, where the constructed multiresolution data geometry uncovers the organization, grouping, and connectivity between neurons. Comment: 14 pages, 7 figures

    An Agent-Based Algorithm exploiting Multiple Local Dissimilarities for Clusters Mining and Knowledge Discovery

    We propose a multi-agent algorithm able to automatically discover relevant regularities in a given dataset, determining at the same time the set of configurations of the adopted parametric dissimilarity measure yielding compact and separated clusters. Each agent operates independently by performing a Markovian random walk on a suitable weighted graph representation of the input dataset. Such a weighted graph representation is induced by the specific parameter configuration of the dissimilarity measure adopted by the agent, which searches and takes decisions autonomously for one cluster at a time. Results show that the algorithm is able to discover parameter configurations that yield a consistent and interpretable collection of clusters. Moreover, we demonstrate that our algorithm performs comparably to similar state-of-the-art algorithms on specific clustering problems.

    Image patch analysis of sunspots and active regions. II. Clustering via matrix factorization

    Separating active regions that are quiet from potentially eruptive ones is a key issue in Space Weather applications. Traditional classification schemes such as Mount Wilson and McIntosh have been effective in relating an active region's large scale magnetic configuration to its ability to produce eruptive events. However, their qualitative nature prevents systematic studies of, for example, an active region's evolution. We introduce a new clustering of active regions that is based on the local geometry observed in Line of Sight magnetogram and continuum images. We use a reduced-dimension representation of an active region that is obtained by factoring the corresponding data matrix comprised of local image patches. Two factorizations can be compared via the definition of appropriate metrics on the resulting factors. The distances obtained from these metrics are then used to cluster the active regions. We find that these metrics result in natural clusterings of active regions. The clusterings are related to large scale descriptors of an active region such as its size, its local magnetic field distribution, and its complexity as measured by the Mount Wilson classification scheme. We also find that including data focused on the neutral line of an active region can result in an increased correspondence between our clustering results and other active region descriptors such as the Mount Wilson classifications and the R value. We provide some recommendations for which metrics, matrix factorization techniques, and regions of interest to use to study active regions. Comment: Accepted for publication in the Journal of Space Weather and Space Climate (SWSC). 33 pages, 12 figures

    Applying an automated method of classifying lip morphological traits

    Objective: To apply an automated computerised method to categorise and determine the prevalence of different types of lip traits, and to explore associations between lip traits and sex differences. Design: Observational descriptive study utilising an automated method of facial assessment. Setting and participants: A total of 4747 children from the Avon Longitudinal Study of Parents and Children (ALSPAC) who each had 3D facial scans carried out at 15 years of age. Methods: Each of the participants was automatically categorised regarding predetermined lip morphological traits. Descriptive statistics were applied to report the prevalence of the different types of each trait, and chi-square tests were used to investigate sex differences and associations between traits. Results: A total of 4730 individuals were assessed (47% male, 53% female). Eight predetermined lip traits have been reported previously. There were differences in prevalence for all lip traits in male and female patients (all P ⩽ 0.0002), with differences between the sexes described for each trait. For example, a deeply grooved philtrum of average width was more prevalent in boys, and an indentation near the upper vermilion border was more prevalent in girls. Each of the traits was significantly associated with the other traits (all P < 0.0001), with particularly strong associations seen between traits in the same region (e.g. upper lip). Individual associations between traits are reported; for example, a straight lip contour was found to be associated with no true vermilion border in both the upper and lower lip regions. Conclusion: The automated computerised method described is an invaluable tool for the categorisation of lip morphological traits. The prevalence of various types of traits has been described. Sexual dimorphism exists for all the lip traits assessed. Generally, each of the traits is associated with all other traits, with individual associations reported.

    Clustering Algorithms: Their Application to Gene Expression Data

    Gene expression data hide vital information required to understand the biological processes that take place in a particular organism in relation to its environment. Deciphering the hidden patterns in gene expression data offers a major opportunity to strengthen the understanding of functional genomics. The complexity of biological networks and the volume of genes present increase the challenges of comprehending and interpreting the resulting mass of data, which consists of millions of measurements; these data also exhibit vagueness, imprecision, and noise. Therefore, the use of clustering techniques is a first step toward addressing these challenges, and is essential in the data mining process to reveal natural structures and identify interesting patterns in the underlying data. The clustering of gene expression data has proven useful in revealing the natural structure inherent in gene expression data, understanding gene functions, cellular processes, and subtypes of cells, mining useful information from noisy data, and understanding gene regulation. The other benefit of clustering gene expression data is the identification of homology, which is very important in vaccine design. This review examines the various clustering algorithms applicable to gene expression data in order to discover and provide useful knowledge of the appropriate clustering technique that will guarantee stability and a high degree of accuracy in its analysis procedure.

    An automatic approach for classification and categorisation of lip morphological traits

    Classification of facial traits (e.g., lip shape) is an important area of medical research, for example, in determining associations between lip traits and genetic variants which may lead to a cleft lip. In clinical situations, classification of facial traits is usually performed subjectively directly on the individual or recorded later from a three-dimensional image, which is time consuming and prone to operator errors. The present study proposes, for the first time, an automatic approach for the classification and categorisation of lip area traits. Our approach uses novel three-dimensional geometric features based on surface curvatures measured along geodesic paths between anthropometric landmarks. Different combinations of geodesic features are analysed and compared. The effect of automatically identified categories on the face is visualised using a partial least squares method. The method was applied to the classification and categorisation of six lip shape traits (philtrum, Cupid’s bow, lip contours, lip-chin, and lower lip tone) in a large sample of 4747 faces of normal British Western European descent. The proposed method demonstrates a correct automatic classification rate of up to 90%.

    Clustering with a new distance measure based on a dual-rooted tree

    39 pages. International audience. This paper introduces a novel distance measure for clustering high dimensional data based on the hitting time of two Minimal Spanning Trees (MST) grown sequentially from a pair of points by Prim's algorithm. When the proposed measure is used in conjunction with spectral clustering, we obtain a powerful clustering algorithm that is able to separate neighboring non-convex shaped clusters and to account for local as well as global geometric features of the data set. Remarkably, the new distance measure is a true metric even if Prim's algorithm uses a non-metric dissimilarity measure to compute the edges of the MST. This metric property brings added flexibility to the proposed method. In particular, the method is applied to clustering non-Euclidean quantities, such as probability distributions or spectra, using the Kullback-Leibler divergence as a base measure. We reduce computational complexity by applying consensus clustering to a small ensemble of dual-rooted MSTs. We show that the resultant consensus spectral clustering with dual-rooted MSTs is competitive with other clustering methods, both in terms of clustering performance and computational complexity. We illustrate the proposed clustering algorithm on public domain benchmark data for which the ground truth is known, on the one hand, and on real-world astrophysical data on the other hand.