
    The Influence of the Zonation Effect on a System of Hierarchical Functional Regions

    Background: Hierarchical functional regions (FRs) can be calculated using data on interactions between basic spatial units (BSUs) and a hierarchical aggregation procedure. However, the results depend on the selected system of initial BSUs. In spatial sciences, this is known as the zonation effect, one of the effects of the Modifiable Areal Unit Problem (MAUP). Objectives: In this paper, we analyse the influence of the zonation effect on a system of hierarchical functional regions. Methods/Approach: We compared two systems of hierarchical functional regions of Slovenia, modelled by the Intramax aggregation procedure using inter-municipal labour commuting flows for the same year but for two different initial sets of municipalities. In addition, we introduce a new measure for comparing systems of hierarchical FRs. Results: The results show that the zonation effect influences hierarchical functional regions. The clustering comparison measure suggested here is a metric and is appropriate for comparing hierarchical FRs. Conclusions: The zonation effect influences hierarchical FRs. The clustering comparison measure suggested in this paper is easy to interpret, but it should be adjusted for the number of clusterings.
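The abstract names the Intramax procedure without restating it. As a hedged illustration, the sketch below assumes the common textbook formulation of Intramax, in which the pair of units i, j merged at each step maximises T_ij/(O_i D_j) + T_ji/(O_j D_i), where T is the flow (commuting) matrix and O, D are its row and column totals. Including intrazonal flows in those totals is an assumption. The function name `intramax` and the toy flow matrix are hypothetical and do not reproduce the paper's Slovenian data.

```python
import numpy as np


def intramax(flows, labels):
    """Hierarchical Intramax-style aggregation of a square flow matrix.

    Returns the merge history as (group_a, group_b, score) tuples.
    """
    T = np.array(flows, dtype=float)
    groups = [[label] for label in labels]
    history = []
    while len(groups) > 1:
        O = T.sum(axis=1)  # outflows (row totals, intrazonal flows included by assumption)
        D = T.sum(axis=0)  # inflows (column totals, intrazonal flows included by assumption)
        best, pair = -np.inf, None
        for i in range(len(groups)):
            for j in range(i + 1, len(groups)):
                score = 0.0
                if O[i] > 0 and D[j] > 0:
                    score += T[i, j] / (O[i] * D[j])
                if O[j] > 0 and D[i] > 0:
                    score += T[j, i] / (O[j] * D[i])
                if score > best:  # ties keep the first pair found
                    best, pair = score, (i, j)
        i, j = pair
        # Merge unit j into unit i: add row j to row i, then column j to column i,
        # so the new intrazonal cell holds T_ii + T_ij + T_ji + T_jj.
        T[i, :] += T[j, :]
        T[:, i] += T[:, j]
        T = np.delete(np.delete(T, j, axis=0), j, axis=1)
        history.append((groups[i], groups[j], best))
        groups[i] = groups[i] + groups[j]
        del groups[j]
    return history
```

Running the same routine on two different initial sets of units and comparing the resulting hierarchies is, in spirit, what the zonation experiment does; the comparison measure itself is the paper's own and is not reproduced here.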

    Implementation of novel statistical procedures and other advanced approaches to improve analysis of CASA data

    pp. 860-866. Computer-aided sperm analysis (CASA) produces a wealth of data that is frequently ignored. The use of multiparametric statistical methods can help explore these datasets, unveiling the subpopulation structure of sperm samples. In this review we analyse the internal heterogeneity of sperm samples and its relevance. We also briefly describe the statistical tools used for extracting sperm subpopulations from the datasets, namely unsupervised clustering (with non-hierarchical, hierarchical and two-step methods) and the most advanced supervised methods, based on machine learning. The former have allowed exploration of subpopulation patterns in many species, whereas the latter offer further possibilities, especially for functional studies and the practical use of subpopulation analysis. We also consider novel approaches, such as geometric morphometrics and imaging flow cytometry. Finally, although the data provided by CASA systems yield valuable information on sperm samples when clustering analyses are applied, there are several caveats. Protocols for capturing and analysing motility or morphometry should be standardised and adapted to each experiment, and the algorithms should be open to allow comparison of results between laboratories. Moreover, we must be aware of new technology that could change the paradigm for studying sperm motility and morphology.
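As a rough illustration of the two-step idea mentioned above, the sketch below assumes one common variant: Ward hierarchical clustering fixes the number of subpopulations and supplies initial centroids, which k-means then refines. The columns would be per-spermatozoon motility descriptors such as VCL, VSL and ALH. The function name `two_step_subpopulations` and the synthetic data are hypothetical; the review does not prescribe this exact pipeline.

```python
import numpy as np
from scipy.cluster.hierarchy import fcluster, linkage
from scipy.cluster.vq import kmeans2


def two_step_subpopulations(X, n_clusters):
    """Ward clustering seeds k-means; returns (centroids, labels) in standardized units."""
    X = np.asarray(X, dtype=float)
    # Standardize each motility variable so no single unit dominates the distances.
    Z = (X - X.mean(axis=0)) / X.std(axis=0)
    # Step 1: Ward hierarchical clustering, cut into at most n_clusters groups.
    tree = linkage(Z, method="ward")
    hier_labels = fcluster(tree, t=n_clusters, criterion="maxclust")
    seeds = np.array([Z[hier_labels == k].mean(axis=0) for k in np.unique(hier_labels)])
    # Step 2: k-means refinement started from the hierarchical centroids.
    centroids, labels = kmeans2(Z, seeds, minit="matrix")
    return centroids, labels
```

Seeding k-means from the hierarchical solution removes its sensitivity to random starts, which is one reason two-step schemes are popular for subpopulation analysis.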

    Selection of informative clusters from hierarchical cluster tree with gene classes

    BACKGROUND: Hierarchical clustering is a common method in the analysis of gene expression data. The analysis usually involves selecting clusters by cutting the tree at a suitable level and/or analysing a sorted gene list obtained from the tree. Cutting the hierarchical tree requires choosing a suitable level, and it loses the information at the other levels. Sorted gene lists depend on the method used to sort the joined clusters. The author proposes that clusters should instead be selected using gene classifications. RESULTS: This article presents a simple method for searching a cluster tree for the clusters with the strongest enrichment of gene classes. The clusters found are presented in estimated order of importance. The method is demonstrated with a yeast gene expression data set and two database classifications. The obtained clusters showed very strong enrichment of functional classes. They also contain gene groups similar to those observed in the original analysis of the data set, as well as many gene groups that were not reported there. Visualizing the results on top of a cluster tree shows that the method finds informative clusters at several levels of the tree and indicates that these clusters could not have been obtained simply by cutting it. The results were also used to compare cluster trees from different clustering methods. CONCLUSION: The presented method should facilitate exploratory analysis of big data sets when associated categorical data are available.
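The abstract does not give the scoring rule, so the sketch below assumes a hypergeometric enrichment p-value as a stand-in. Every internal node of a SciPy linkage tree is treated as a candidate cluster, scored against one gene class, and the nodes are returned from most to least enriched. The function name `rank_tree_clusters` and the toy data are hypothetical.

```python
import numpy as np
from scipy.cluster.hierarchy import linkage, to_tree
from scipy.stats import hypergeom


def rank_tree_clusters(X, in_class, method="average"):
    """Score every internal node of the cluster tree for enrichment of one gene class."""
    in_class = np.asarray(in_class, dtype=bool)
    M, n = len(in_class), int(in_class.sum())  # population size, class size
    root = to_tree(linkage(X, method=method))
    results, stack = [], [root]
    while stack:
        node = stack.pop()
        if node.is_leaf():
            continue
        members = node.pre_order()  # leaf (gene) indices under this node
        N = len(members)
        k = int(in_class[members].sum())  # class members inside this cluster
        # P(at least k class genes) when drawing N genes without replacement.
        p = hypergeom.sf(k - 1, M, n, N)
        results.append((p, sorted(members)))
        stack.extend([node.get_left(), node.get_right()])
    results.sort(key=lambda r: r[0])
    return results
```

Because every node is scored, the top-ranked clusters can come from different depths of the tree, which is exactly what a single horizontal cut cannot deliver.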

    A tiny glimpse into the human brain using model-free analysis for resting-state fMRI data

    Resting-state functional Magnetic Resonance Imaging (fMRI) acquires four-dimensional data that indirectly depict human brain activity. Within these four-dimensional datasets reside resting-state functional connectivity networks (RFNs), which reflect how the human brain is functionally organized. This series of studies delves into the use of data-driven analysis methods for resting-state fMRI data. Their strengths were explored and their weaknesses tackled, in both methodology and application, in the hope of gaining a better understanding of the data and thereby of how the brain functions. The journey begins with one of the most common data-driven analysis methods in use today: Independent Component Analysis (ICA). ICA requires no user input parameter apart from the input dataset and the number of output Independent Components (NIC). Requiring the NIC a priori is troubling, because the inherent number of Independent Components (ICs) in non-simulated datasets is unknown, owing to various noise and artefact sources present to differing degrees. Furthermore, comparing datasets using ICA is problematic because different datasets have inherently different dimensionality. To investigate the effects of NIC on the ICA results, a classification framework based on Support Vector Machines (SVM) was implemented to automatically classify ICs as either potential RFNs or noise/artefact signals. This feature-optimized classification of ICs with SVM (FOCIS) framework uses features derived from the verbal instructions for manual visual inspection of ICs. With only a few significant features selected through iterative feature selection and a small training set, the framework performed well, with over 98% overall accuracy on group ICA results.
Analysis of different resting-state fMRI datasets using FOCIS indicated that the specification of NIC can critically affect ICA results on resting-state fMRI data. These changes are complex and differ from one component to another, irrespective of whether the IC is a potential RFN or an artefact/noise signal. Applying this knowledge to group comparison studies, ICA was used to study migraine patients undergoing kinetic oscillation stimulation treatment. The immediate effects of the treatment allow direct correlation of a patient’s pain levels with changes in their RFNs. Differences in RFNs that include areas in the midbrain and limbic system regulating the central nervous system were found in migraine patients compared with a healthy control group. Overlapping areas were also shown to be affected by the treatment. These results support the hypothesis that the treatment affects and regulates the parasympathetic autonomic reflex, alleviating migraine symptoms. Hierarchical clustering is another data-driven analysis method that is almost devoid of user-input parameters. The algorithm naturally stratifies data into a hierarchical structure. Because brain function is believed to be hierarchically organized, an algorithm that reflects this aspect seems an excellent choice for analyzing resting-state fMRI data. A hierarchical clustering analysis framework was developed to extract RFNs from resting-state fMRI data with full brain coverage at voxel level. The RFNs identified using hierarchical clustering conform to those identified previously with other data processing techniques, such as ICA. An innate ability of the clustering algorithm is to organize data into a hierarchical tree (dendrogram), and this was fully exploited through extensions to the framework for cluster evaluation.
Extending the hierarchical clustering framework with the cluster evaluation pipeline allowed extraction of functional subdivisions of known RFNs. This demonstrated that hierarchical clustering can be used not only to extract the modular organization of entire RFNs at the scale of large systems, but also to derive the functional subdivisions of RFNs, providing a consistent method of analysis at different levels of detail. The subnetworks extracted using hierarchical clustering reveal the intrinsic functional connectivity among the subnetworks within RFNs and provide clues for further exploring the potential for currently unknown functional junctions within RFNs.
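A minimal sketch of voxel-level hierarchical clustering, assuming correlation distance (1 - Pearson r) between time series and average linkage. The framework's actual distance, linkage and cluster-evaluation pipeline are not specified above, and the function names and synthetic signals are hypothetical. Cutting one tree at increasing numbers of clusters mimics the "different levels of detail" idea.

```python
import numpy as np
from scipy.cluster.hierarchy import fcluster, linkage


def cluster_timeseries(ts, n_networks):
    """ts: (n_voxels, n_timepoints). Returns the linkage tree and flat network labels."""
    # SciPy's "correlation" metric is 1 - Pearson r between voxel time series.
    tree = linkage(ts, method="average", metric="correlation")
    labels = fcluster(tree, t=n_networks, criterion="maxclust")
    return tree, labels


def labels_at_levels(tree, levels):
    """Cut the same dendrogram at several sizes to obtain networks and their subdivisions."""
    return {k: fcluster(tree, t=k, criterion="maxclust") for k in levels}
```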

    Element-centric clustering comparison unifies overlaps and hierarchy

    Clustering is one of the most universal approaches for understanding complex data. A pivotal aspect of clustering analysis is quantitatively comparing clusterings; clustering comparison is the basis for many tasks such as clustering evaluation, consensus clustering, and tracking the temporal evolution of clusters. In particular, the extrinsic evaluation of clustering methods requires comparing the uncovered clusterings to planted clusterings or known metadata. Yet, as we demonstrate, existing clustering comparison measures have critical biases that undermine their usefulness, and no measure accommodates both overlapping and hierarchical clusterings. Here we unify the comparison of disjoint, overlapping, and hierarchically structured clusterings by proposing a new element-centric framework: elements are compared based on the relationships induced by the cluster structure, as opposed to the traditional cluster-centric philosophy. We demonstrate that, in contrast to standard clustering similarity measures, our framework does not suffer from critical biases and naturally provides unique insights into how the clusterings differ. We illustrate the strengths of our framework by revealing new insights into the organization of clusters in two applications: the improved classification of schizophrenia based on the overlapping and hierarchical community structure of fMRI brain networks, and the disentanglement of various social homophily factors in Facebook social networks. The universality of clustering suggests a far-reaching impact of our framework throughout all areas of science.

    Resolving structural variability in network models and the brain

    Large-scale white matter pathways crisscrossing the cortex create a complex pattern of connectivity that underlies human cognitive function. Generative mechanisms for this architecture have been difficult to identify in part because little is known about mechanistic drivers of structured networks. Here we contrast network properties derived from diffusion spectrum imaging data of the human brain with 13 synthetic network models chosen to probe the roles of physical network embedding and temporal network growth. We characterize both the empirical and synthetic networks using familiar diagnostics presented in statistical form, as scatter plots and distributions, to reveal the full range of variability of each measure across scales in the network. We focus on the degree distribution, degree assortativity, hierarchy, topological Rentian scaling, and topological fractal scaling, in addition to several summary statistics, including the mean clustering coefficient, shortest path length, and network diameter. The models are investigated in a progressive, branching sequence, aimed at capturing different elements thought to be important in the brain, and range from simple random and regular networks to models that incorporate specific growth rules and constraints. We find that synthetic models that constrain the network nodes to be embedded in anatomical brain regions tend to produce distributions that are similar to those extracted from the brain. We also find that network models hardcoded to display one network property do not in general also display a second, suggesting that multiple neurobiological mechanisms might be at play in the development of human brain network architecture. Together, the network models that we develop and employ provide a potentially useful starting point for the statistical inference of brain network structure from neuroimaging data.
    Comment: 24 pages, 11 figures, 1 table, supplementary material
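Two of the diagnostics named above can be computed directly from an adjacency matrix. The sketch below assumes an undirected, unweighted network without self-loops and uses standard definitions: a node's local clustering coefficient is the fraction of its neighbour pairs that are themselves linked (nodes of degree below 2 count as 0, a convention choice), and degree assortativity is the Pearson correlation of the degrees at the two ends of each edge. The function names are hypothetical, not taken from the paper's code.

```python
import numpy as np


def mean_clustering(A):
    """Mean local clustering coefficient of a binary, symmetric adjacency matrix."""
    A = np.asarray(A, dtype=float)
    deg = A.sum(axis=1)
    triangles = np.diag(A @ A @ A) / 2.0  # closed triangles through each node
    pairs = deg * (deg - 1) / 2.0  # possible neighbour pairs per node
    local = np.divide(triangles, pairs, out=np.zeros_like(deg), where=pairs > 0)
    return local.mean()


def degree_assortativity(A):
    """Pearson correlation of degrees at either end of every edge."""
    A = np.asarray(A, dtype=float)
    deg = A.sum(axis=1)
    i, j = np.nonzero(np.triu(A, k=1))  # each undirected edge once
    # Count each edge in both directions so the measure is symmetric.
    x = np.concatenate([deg[i], deg[j]])
    y = np.concatenate([deg[j], deg[i]])
    return np.corrcoef(x, y)[0, 1]
```

Applying the same functions to an empirical network and to each synthetic model gives directly comparable summary statistics, which is the kind of contrast the abstract describes.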

    Vertical wind profile characterization and identification of patterns based on a shape clustering algorithm

    Wind power plants are becoming a generally accepted resource in the generation mix of many utilities. At the same time, the size and power rating of individual wind turbines have increased considerably. Under these circumstances, the sector increasingly demands an accurate characterization of vertical wind speed profiles to properly estimate the incoming wind speed over the rotor swept area and, consequently, assess the potential of a wind power plant site. The present paper describes a shape-based clustering characterization and visualization of real vertical wind speed data. The proposed solution allows us to identify the most likely vertical wind speed patterns for a specific location based on real wind speed measurements. This clustering approach also provides characterization and classification of such vertical wind profiles. The solution is highly suitable for large amounts of data collected by remote sensing equipment, where wind speed values at different heights within the rotor swept area are available for subsequent analysis. The methodology is based on z-normalization, a shape-based distance metric and the Ward hierarchical clustering method. Real vertical wind speed profile data from a Spanish wind power plant, collected with commercial Windcube equipment over several months and comprising more than 100,000 wind speed values, are used to assess the proposed characterization and clustering process. All analyses have been implemented using the open-source R software. The results identify at least four different vertical wind speed patterns that properly characterize over 90% of the collected wind speed data over the course of the day.
Therefore, alternative analytical function criteria should subsequently be proposed for vertical wind speed characterization purposes. The authors are grateful for the financial support from the Spanish Ministry of the Economy and Competitiveness and the European Union (ENE2016-78214-C2-2-R) and from the Spanish Education, Culture and Sport Ministry, FPU16/042
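The pipeline named above (z-normalization, a shape-based distance, then Ward clustering) can be sketched as follows. We assume the shape-based distance from the k-Shape literature, SBD(x, y) = 1 - max_w NCC_w(x, y), computed from the full cross-correlation of two z-normalized profiles, and we assume no profile is constant (which would make z-normalization divide by zero). The authors worked in R, so this Python version, its function names and the toy profiles are illustrative only. Ward's criterion is formally defined for Euclidean distances, so feeding it SBD values is a heuristic.

```python
import numpy as np
from scipy.cluster.hierarchy import fcluster, linkage
from scipy.spatial.distance import squareform


def z_normalize(profile):
    """Remove mean level and scale so only the profile's shape remains."""
    p = np.asarray(profile, dtype=float)
    return (p - p.mean()) / p.std()


def sbd(x, y):
    """Shape-based distance: 1 minus the max normalized cross-correlation over all lags."""
    cc = np.correlate(x, y, mode="full")
    value = 1.0 - cc.max() / (np.linalg.norm(x) * np.linalg.norm(y))
    return max(value, 0.0)  # clip tiny negative round-off for identical shapes


def cluster_profiles(profiles, n_patterns):
    """Ward clustering of vertical profiles on pairwise SBD values."""
    Z = [z_normalize(p) for p in profiles]
    n = len(Z)
    D = np.zeros((n, n))
    for i in range(n):
        for j in range(i + 1, n):
            D[i, j] = D[j, i] = sbd(Z[i], Z[j])
    tree = linkage(squareform(D, checks=False), method="ward")
    return fcluster(tree, t=n_patterns, criterion="maxclust")
```

Because z-normalization discards mean level and scale, profiles that differ only in overall wind speed fall into the same shape pattern.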

    Comparison of Clustering Methods for Time Course Genomic Data: Applications to Aging Effects

    Time course microarray data provide insight into dynamic biological processes. While several clustering methods have been proposed for the analysis of these data structures, comparison and selection of appropriate clustering methods are seldom discussed. We compared three probabilistic-based clustering methods and three distance-based clustering methods for time course microarray data. Among probabilistic methods, we considered smoothing spline clustering (SSC), also known as model-based functional data analysis (MFDA), functional clustering models for sparsely sampled data (FCM), and model-based clustering (MCLUST). Among distance-based methods, we considered weighted gene co-expression network analysis (WGCNA), clustering with dynamic time warping distance (DTW), and clustering with autocorrelation-based distance (ACF). We studied these algorithms in both simulated settings and case study data. Our investigations showed that FCM performed very well when gene curves were short and sparse. DTW and WGCNA performed well when gene curves were medium or long (≥10 observations). SSC performed very well when there were clusters of gene curves similar to one another. Overall, ACF performed poorly in these applications. In terms of computation time, FCM, SSC and DTW were considerably slower than MCLUST and WGCNA. WGCNA outperformed MCLUST by generating more accurate and biologically meaningful clustering results. WGCNA and MCLUST are the best of the six methods compared when performance and computation time are both taken into account. WGCNA outperforms MCLUST, but MCLUST provides model-based inference and an uncertainty measure for clustering results.
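Of the distance-based methods compared, DTW is the easiest to state compactly. The sketch below is the classic O(nm) dynamic-programming recurrence with an absolute-difference local cost and no warping window. The implementations evaluated in the paper may use other step patterns or window constraints, so treat this as an illustration of the distance rather than of those packages; `dtw_distance` is a hypothetical name.

```python
import numpy as np


def dtw_distance(x, y):
    """Classic DTW: minimal cumulative |x_i - y_j| over monotone alignments."""
    n, m = len(x), len(y)
    cost = np.full((n + 1, m + 1), np.inf)
    cost[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = abs(x[i - 1] - y[j - 1])
            # Extend the cheapest of: diagonal match, step in x only, step in y only.
            cost[i, j] = d + min(cost[i - 1, j - 1], cost[i - 1, j], cost[i, j - 1])
    return cost[n, m]
```

A pairwise matrix of these distances between gene curves can then be passed to any hierarchical clustering routine that accepts precomputed distances.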

    Consensus clustering and functional interpretation of gene-expression data

    Microarray analysis using clustering algorithms can suffer from a lack of inter-method consistency in assigning related gene-expression profiles to clusters. Obtaining a consensus set of clusters from a number of clustering methods should improve confidence in gene-expression analysis. Here we introduce consensus clustering, which provides such an advantage. When coupled with a statistically based gene functional analysis, our method allowed the identification of novel genes regulated by NFκB and the unfolded protein response in certain B-cell lymphomas.
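The abstract does not spell out how the consensus is formed. One widely used construction, assumed here rather than taken from the paper, is a co-association matrix: the fraction of base clusterings that place each pair of genes together, which is then clustered hierarchically on 1 - co-association. `co_association`, `consensus_labels` and the toy label vectors are hypothetical.

```python
import numpy as np
from scipy.cluster.hierarchy import fcluster, linkage
from scipy.spatial.distance import squareform


def co_association(label_sets):
    """Entry (i, j): fraction of methods assigning genes i and j to the same cluster."""
    L = np.asarray(label_sets)  # shape (n_methods, n_genes)
    same = L[:, :, None] == L[:, None, :]
    return same.mean(axis=0)


def consensus_labels(label_sets, n_clusters):
    """Average-linkage clustering of genes on 1 - co-association."""
    D = 1.0 - co_association(label_sets)
    np.fill_diagonal(D, 0.0)
    tree = linkage(squareform(D, checks=False), method="average")
    return fcluster(tree, t=n_clusters, criterion="maxclust")
```

Label values only need to be consistent within each method, so outputs from different algorithms can be stacked without relabelling.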