107,138 research outputs found

Spatial clustering of array CGH features in combination with hierarchical multiple testing

Authors: Kim Kyung In; Roquain Etienne; Van De Wiel Mark
Publication venue: 'Walter de Gruyter GmbH'
Publication date: 01/01/2010

We propose a new approach for clustering DNA features using array CGH data from multiple tumor samples. We distinguish data-collapsing: joining contiguous DNA clones or probes with extremely similar data into regions, from clustering: joining contiguous, correlated regions based on a maximum likelihood principle. The model-based clustering algorithm accounts for the apparent spatial patterns in the data. We evaluate the randomness of the clustering result by a cluster stability score in combination with cross-validation. Moreover, we argue that the clustering really captures spatial genomic dependency by showing that coincidental clustering of independent regions is very unlikely. Using the region and cluster information, we combine testing of these for association with a clinical variable in a hierarchical multiple testing approach. This allows for interpreting the significance of both regions and clusters while controlling the Family-Wise Error Rate simultaneously. We prove that in the context of permutation tests and permutation-invariant clusters it is allowed to perform clustering and testing on the same data set. Our procedures are illustrated on two cancer data sets.

arXiv.org e-Print Archive

Crossref

VU Research Portal

Hal-Diderot

Family names as indicators of Britain’s changing regional geography

Authors: Cheshire JA; Longley PA; Mateos P
Publication venue: Centre for Advanced Spatial Analysis, UCL
Publication date: 01/01/2009

In recent years the geography of surnames has become increasingly researched in genetics, epidemiology, linguistics and geography. Surnames provide a useful data source for the analysis of population structure, migrations, genetic relationships and levels of cultural diffusion and interaction between communities. The Worldnames database (www.publicprofiler.org/worldnames) of 300 million people from 26 countries, georeferenced in many cases to the equivalent of UK Postcode level, provides a rich source of surname data. This work has focused on the UK component of this dataset, that is, the 2001 Enhanced Electoral Roll, georeferenced to Output Area level. Exploratory analysis of the distribution of surnames across the UK shows that clear regions exist, such as Cornwall, Central Wales and Scotland, in agreement with anecdotal evidence. This study is concerned with applying a wide range of methods to the UK dataset to test their sensitivity and consistency in identifying surname regions. Methods used thus far are hierarchical and non-hierarchical clustering, barrier algorithms, such as the Monmonier Algorithm, and Multidimensional Scaling. These, to varying degrees, have highlighted the regionality of UK surnames and provide strong foundations for future work and refinement in the UK context. Establishing a firm methodology has enabled comparisons to be made with data from the Great British 1881 census, developing insights into population movements from within and outside Great Britain.

UCL Discovery

A General Spatio-Temporal Clustering-Based Non-local Formulation for Multiscale Modeling of Compartmentalized Reservoirs

Authors: Castineira David; Darabi Hamed; Esmaeilzadeh Soheil; Hetz Gill; Olalotiti-lawal Feyisayo; Salehi Amir
Publication venue: 'Society of Petroleum Engineers (SPE)'
Publication date: 01/01/2019

Representing the reservoir as a network of discrete compartments with neighbor and non-neighbor connections is a fast, yet accurate method for analyzing oil and gas reservoirs. Automatic and rapid detection of coarse-scale compartments with distinct static and dynamic properties is an integral part of such high-level reservoir analysis. In this work, we present a hybrid framework specific to reservoir analysis for automatic detection of clusters in space using spatial and temporal field data, coupled with a physics-based multiscale modeling approach. Specifically, we couple a physics-based non-local modeling framework with data-driven clustering techniques to provide fast and accurate multiscale modeling of compartmentalized reservoirs. This research also adds to the literature by presenting a comprehensive treatment of spatio-temporal clustering for reservoir studies that carefully considers the clustering complexities, the intrinsically sparse and noisy nature of the data, and the interpretability of the outcome. Keywords: Artificial Intelligence; Machine Learning; Spatio-Temporal Clustering; Physics-Based Data-Driven Formulation; Multiscale Modeling

arXiv.org e-Print Archive

Crossref

Variance and Skewness in the FIRST survey

Authors: Lahav O.; Maddox S. J.; Magliocchetti M.; Wall J. V.
Publication venue: 'Wiley'
Publication date: 24/02/1998

We investigate the large-scale clustering of radio sources in the FIRST 1.4-GHz survey by analysing the distribution function (counts in cells). We select a reliable sample from the FIRST catalogue, paying particular attention to the problem of how to define single radio sources from the multiple components listed. We also consider the incompleteness of the catalogue. We estimate the angular two-point correlation function $w(\theta)$, the variance $\Psi_2$, and skewness $\Psi_3$ of the distribution for the various sub-samples chosen on different criteria. Both $w(\theta)$ and $\Psi_2$ show power-law behaviour with an amplitude corresponding to a spatial correlation length of $r_0 \sim 10 h^{-1}$ Mpc. We detect significant skewness in the distribution, the first such detection in radio surveys. This skewness is found to be related to the variance through $\Psi_3=S_3(\Psi_2)^{\alpha}$, with $\alpha=1.9\pm 0.1$, consistent with the non-linear gravitational growth of perturbations from primordial Gaussian initial conditions. We show that the amplitude of variance and skewness are consistent with realistic models of galaxy clustering. Comment: 13 pages, 21 inline figures, to appear in MNRAS

arXiv.org e-Print Archive

Crossref

CERN Document Server
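
As a rough illustration of the counts-in-cells statistics named in the FIRST survey abstract above, the sketch below computes a shot-noise-corrected variance and skewness from a list of cell counts. It assumes the standard Poisson correction, $\Psi_2 = (\mu_2 - \bar{N})/\bar{N}^2$ and $\Psi_3 = (\mu_3 - 3\mu_2 + 2\bar{N})/\bar{N}^3$, where $\bar{N}$ is the mean count and $\mu_2$, $\mu_3$ are the central moments of the counts. The abstract does not state the exact estimator, so this convention and the function name `counts_in_cells_moments` are assumptions for illustration, not the authors' code.

```python
def counts_in_cells_moments(counts):
    """Return (mean, psi2, psi3) for a list of object counts per cell.

    Assumes the usual Poisson shot-noise correction; this is an
    illustrative convention, not necessarily the one used in the paper.
    """
    n_cells = len(counts)
    if n_cells == 0:
        raise ValueError("need at least one cell")

    mean = sum(counts) / n_cells
    if mean == 0:
        raise ValueError("mean count is zero; moments are undefined")

    # Central moments of the raw counts.
    mu2 = sum((c - mean) ** 2 for c in counts) / n_cells
    mu3 = sum((c - mean) ** 3 for c in counts) / n_cells

    # Subtract the Poisson (shot-noise) contribution and normalise.
    psi2 = (mu2 - mean) / mean ** 2
    psi3 = (mu3 - 3 * mu2 + 2 * mean) / mean ** 3
    return mean, psi2, psi3


if __name__ == "__main__":
    mean, psi2, psi3 = counts_in_cells_moments([0, 2, 4, 6])
    print(mean, psi2, psi3)
```

Repeating the calculation for several cell sizes and fitting $\log \Psi_3$ against $\log \Psi_2$ would give an estimate of the exponent $\alpha$ in the relation quoted in the abstract.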

Genome-wide profiling of chromosome interactions in Plasmodium falciparum characterizes nuclear architecture and reconfigurations associated with antigenic variation.

Authors: Berriman Matthew; Eastman Richard T.; Feller Avi I.; Kyes Sue A.; Lemieux Jacob E.; Newbold Chris I.; Otto Thomas D.; Pinches Robert A.; Su Xin-Zhuan
Publication venue: 'Foundation for Cellular and Molecular Medicine'
Publication date: 01/11/2013

Spatial relationships within the eukaryotic nucleus are essential for proper nuclear function. In Plasmodium falciparum, the repositioning of chromosomes has been implicated in the regulation of the expression of genes responsible for antigenic variation, and the formation of a single, peri-nuclear nucleolus results in the clustering of rDNA. Nevertheless, the precise spatial relationships between chromosomes remain poorly understood, because, until recently, techniques with sufficient resolution have been lacking. Here we have used chromosome conformation capture and second-generation sequencing to study changes in chromosome folding and spatial positioning that occur during switches in var gene expression. We have generated maps of chromosomal spatial affinities within the P. falciparum nucleus at 25 Kb resolution, revealing a structured nucleolus, an absence of chromosome territories, and confirming previously identified clustering of heterochromatin foci. We show that switches in var gene expression do not appear to involve interaction with a distant enhancer, but do result in local changes at the active locus. These maps reveal the folding properties of malaria chromosomes, validate known physical associations, and characterize the global landscape of spatial interactions. Collectively, our data provide critical information for a better understanding of gene expression regulation and antigenic variation in malaria parasites

Oxford University Research Archive

Enlighten

A supervised clustering approach for fMRI-based inference of brain states

Authors: Alexandre Gramfort; Bertrand Thirion; Christine Keribin; Evelyn Eger; Gaël Varoquaux; Vincent Michel
Publication venue: 'Elsevier BV'
Publication date: 20/04/2011

We propose a method that combines signals from many brain regions observed in functional Magnetic Resonance Imaging (fMRI) to predict the subject's behavior during a scanning session. Such predictions suffer from the huge number of brain regions sampled on the voxel grid of standard fMRI data sets: the curse of dimensionality. Dimensionality reduction is thus needed, but it is often performed using a univariate feature selection procedure, that handles neither the spatial structure of the images, nor the multivariate nature of the signal. By introducing a hierarchical clustering of the brain volume that incorporates connectivity constraints, we reduce the span of the possible spatial configurations to a single tree of nested regions tailored to the signal. We then prune the tree in a supervised setting, hence the name supervised clustering, in order to extract a parcellation (division of the volume) such that parcel-based signal averages best predict the target information. Dimensionality reduction is thus achieved by feature agglomeration, and the constructed features now provide a multi-scale representation of the signal. Comparisons with reference methods on both simulated and real data show that our approach yields higher prediction accuracy than standard voxel-based approaches. Moreover, the method infers an explicit weighting of the regions involved in the regression or classification task

arXiv.org e-Print Archive

Crossref

INRIA a CCSD electronic archive server

HAL-Inserm

HAL-CEA

Clustering multivariate spatial data based on local measures of spatial autocorrelation.

Author: Luca Scrucca

A growing interest in clustering spatial data is emerging in several areas, from local economic development to epidemiology, from remote sensing data to environmental analyses. However, methods and procedures to address this problem are still lacking. Local measures of spatial autocorrelation aim at identifying patterns of spatial dependence within the study region. Mapping these measures provides the basic building block for identifying spatial clusters of units. While this may work satisfactorily in the univariate case, most real problems have a multidimensional nature. Thus, we need a clustering method based on both the multivariate data information and the spatial distribution of units. In this paper we propose a procedure for exploring and discovering patterns of spatial clustering. We discuss an implementation of the popular partitioning algorithm known as K-means which incorporates the spatial structure of the data through the use of local measures of spatial autocorrelation. An example based on a set of variables related to the labour market of the Italian region Umbria is presented and discussed in depth.

Research Papers in Economics
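
To make the phrase "local measures of spatial autocorrelation" in the abstract above more concrete, here is a minimal sketch of one such measure, the local Moran's I, for a single variable. It uses the common form $I_i = (z_i / m_2) \sum_j w_{ij} z_j$ with $z_i = x_i - \bar{x}$, $m_2 = \frac{1}{n}\sum_i z_i^2$, and row-standardised weights. The abstract does not say which local measure or weighting scheme the author uses, so the choice of local Moran's I, the adjacency-list input, and the name `local_morans_i` are assumptions for illustration only.

```python
def local_morans_i(values, neighbors):
    """Local Moran's I for each spatial unit.

    values    : list of numbers, one per unit
    neighbors : list of lists; neighbors[i] holds the indices of the
                units adjacent to unit i (hypothetical input format)

    Weights are row-standardised: each neighbour of unit i gets weight
    1 / len(neighbors[i]). Units without neighbours get I_i = 0.
    """
    n = len(values)
    if n == 0 or len(neighbors) != n:
        raise ValueError("values and neighbors must be non-empty and equal length")

    mean = sum(values) / n
    z = [v - mean for v in values]
    m2 = sum(d * d for d in z) / n
    if m2 == 0:
        raise ValueError("values are constant; local Moran's I is undefined")

    result = []
    for i in range(n):
        nbrs = neighbors[i]
        # Spatial lag: average deviation among the neighbours of unit i.
        lag = sum(z[j] for j in nbrs) / len(nbrs) if nbrs else 0.0
        result.append(z[i] / m2 * lag)
    return result


if __name__ == "__main__":
    # Four units on a line: 0 - 1 - 2 - 3
    x = [1.0, 2.0, 3.0, 4.0]
    adj = [[1], [0, 2], [1, 3], [2]]
    print(local_morans_i(x, adj))
```

One plausible way to bring spatial structure into a partitioning algorithm such as K-means would be to compute this statistic for each variable and cluster on the resulting values alongside, or instead of, the raw variables; whether that matches the procedure in the paper is not stated in the abstract.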