107,138 research outputs found
Spatial clustering of array CGH features in combination with hierarchical multiple testing
We propose a new approach for clustering DNA features using array CGH data
from multiple tumor samples. We distinguish data-collapsing: joining contiguous
DNA clones or probes with extremely similar data into regions, from clustering:
joining contiguous, correlated regions based on a maximum likelihood principle.
The model-based clustering algorithm accounts for the apparent spatial patterns
in the data. We evaluate the randomness of the clustering result by a cluster
stability score in combination with cross-validation. Moreover, we argue that
the clustering really captures spatial genomic dependency by showing that
coincidental clustering of independent regions is very unlikely. Using the
region and cluster information, we combine testing of these for association
with a clinical variable in an hierarchical multiple testing approach. This
allows for interpreting the significance of both regions and clusters while
controlling the Family-Wise Error Rate simultaneously. We prove that in the
context of permutation tests and permutation-invariant clusters it is allowed
to perform clustering and testing on the same data set. Our procedures are
illustrated on two cancer data sets
Family names as indicators of Britainâs changing regional geography
In recent years the geography of surnames has become increasingly researched in genetics, epidemiology, linguistics and geography. Surnames provide a useful data source for the analysis of population structure, migrations, genetic relationships and levels of cultural diffusion and interaction between communities. The Worldnames database (www.publicprofiler.org/worldnames) of 300 million people from 26 countries georeferenced in many cases to the equivalent of UK Postcode level provides a rich source of surname data. This work has focused on the UK component of this dataset, that is the 2001 Enhanced Electoral Role, georeferenced to Output Area level. Exploratory analysis of the distribution of surnames across the UK shows that clear regions exist, such as Cornwall, Central Wales and Scotland, in agreement with anecdotal evidence. This study is concerned with applying a wide range of methods to the UK dataset to test their sensitivity and consistency to surname regions. Methods used thus far are hierarchical and non-hierarchical clustering, barrier algorithms, such as the Monmonier Algorithm, and Multidimensional Scaling. These, to varying degrees, have highlighted the regionality of UK surnames and provide strong foundations to future work and refinement in the UK context. Establishing a firm methodology has enabled comparisons to be made with data from the Great British 1881 census, developing insights into population movements from within and outside Great Britain
A General Spatio-Temporal Clustering-Based Non-local Formulation for Multiscale Modeling of Compartmentalized Reservoirs
Representing the reservoir as a network of discrete compartments with
neighbor and non-neighbor connections is a fast, yet accurate method for
analyzing oil and gas reservoirs. Automatic and rapid detection of coarse-scale
compartments with distinct static and dynamic properties is an integral part of
such high-level reservoir analysis. In this work, we present a hybrid framework
specific to reservoir analysis for an automatic detection of clusters in space
using spatial and temporal field data, coupled with a physics-based multiscale
modeling approach. In this work a novel hybrid approach is presented in which
we couple a physics-based non-local modeling framework with data-driven
clustering techniques to provide a fast and accurate multiscale modeling of
compartmentalized reservoirs. This research also adds to the literature by
presenting a comprehensive work on spatio-temporal clustering for reservoir
studies applications that well considers the clustering complexities, the
intrinsic sparse and noisy nature of the data, and the interpretability of the
outcome.
Keywords: Artificial Intelligence; Machine Learning; Spatio-Temporal
Clustering; Physics-Based Data-Driven Formulation; Multiscale Modelin
Variance and Skewness in the FIRST survey
We investigate the large-scale clustering of radio sources in the FIRST
1.4-GHz survey by analysing the distribution function (counts in cells). We
select a reliable sample from the the FIRST catalogue, paying particular
attention to the problem of how to define single radio sources from the
multiple components listed. We also consider the incompleteness of the
catalogue. We estimate the angular two-point correlation function ,
the variance , and skewness of the distribution for the
various sub-samples chosen on different criteria. Both and
show power-law behaviour with an amplitude corresponding a spatial correlation
length of Mpc. We detect significant skewness in the
distribution, the first such detection in radio surveys. This skewness is found
to be related to the variance through , with
, consistent with the non-linear gravitational growth of
perturbations from primordial Gaussian initial conditions. We show that the
amplitude of variance and skewness are consistent with realistic models of
galaxy clustering.Comment: 13 pages, 21 inline figures, to appear in MNRA
Genome-wide profiling of chromosome interactions in Plasmodium falciparum characterizes nuclear architecture and reconfigurations associated with antigenic variation.
Spatial relationships within the eukaryotic nucleus are essential for proper nuclear function. In Plasmodium falciparum, the repositioning of chromosomes has been implicated in the regulation of the expression of genes responsible for antigenic variation, and the formation of a single, peri-nuclear nucleolus results in the clustering of rDNA. Nevertheless, the precise spatial relationships between chromosomes remain poorly understood, because, until recently, techniques with sufficient resolution have been lacking. Here we have used chromosome conformation capture and second-generation sequencing to study changes in chromosome folding and spatial positioning that occur during switches in var gene expression. We have generated maps of chromosomal spatial affinities within the P.âfalciparum nucleus at 25âKb resolution, revealing a structured nucleolus, an absence of chromosome territories, and confirming previously identified clustering of heterochromatin foci. We show that switches in var gene expression do not appear to involve interaction with a distant enhancer, but do result in local changes at the active locus. These maps reveal the folding properties of malaria chromosomes, validate known physical associations, and characterize the global landscape of spatial interactions. Collectively, our data provide critical information for a better understanding of gene expression regulation and antigenic variation in malaria parasites
A supervised clustering approach for fMRI-based inference of brain states
We propose a method that combines signals from many brain regions observed in
functional Magnetic Resonance Imaging (fMRI) to predict the subject's behavior
during a scanning session. Such predictions suffer from the huge number of
brain regions sampled on the voxel grid of standard fMRI data sets: the curse
of dimensionality. Dimensionality reduction is thus needed, but it is often
performed using a univariate feature selection procedure, that handles neither
the spatial structure of the images, nor the multivariate nature of the signal.
By introducing a hierarchical clustering of the brain volume that incorporates
connectivity constraints, we reduce the span of the possible spatial
configurations to a single tree of nested regions tailored to the signal. We
then prune the tree in a supervised setting, hence the name supervised
clustering, in order to extract a parcellation (division of the volume) such
that parcel-based signal averages best predict the target information.
Dimensionality reduction is thus achieved by feature agglomeration, and the
constructed features now provide a multi-scale representation of the signal.
Comparisons with reference methods on both simulated and real data show that
our approach yields higher prediction accuracy than standard voxel-based
approaches. Moreover, the method infers an explicit weighting of the regions
involved in the regression or classification task
Clustering multivariate spatial data based on local measures of spatial autocorrelation.
A growing interest in clustering spatial data is emerging in several areas, from local economic development to epidemiology, from remote sensing data to environment analyses. However, methods and procedures to face such problem are still lacking. Local measures of spatial autocorrelation aim at identifying patterns of spatial dependence within the study region. Mapping these measures provide the basic building block for identifying spatial clusters of units. If this may work satisfactorily in the univariate case, most of the real problems have a multidimensional nature. Thus, we need a clustering method based on both the multivariate data information and the spatial distribution of units. In this paper we propose a procedure for exploring and discover patterns of spatial clustering. We discuss an implementation of the popular partitioning algorithm known as K-means which incorporates the spatial structure of the data through the use of local measures of spatial autocorrelation. An example based on a set of variables related to the labour market of the Italian region Umbria is presented and deeply discussed.
- âŠ