2,218 research outputs found

Model-Based Clustering and Classification of Functional Data

Author: Breiman L.
Celeux G.
Cormen T. H.
Dempster A. P.
Diebold F.
Ferraty F.
Frühwirth‐Schnatter S.
Hastie T.
McLachlan G. J.
Raftery A. E.
Titterington D.
Publication date: 01/03/2018

The problem of complex data analysis is a central topic of modern statistical science and learning systems and is becoming of broader interest with the increasing prevalence of high-dimensional data. The challenge is to develop statistical models and autonomous algorithms that are able to acquire knowledge from raw data for exploratory analysis, which can be achieved through clustering techniques, or to make predictions of future data via classification (i.e., discriminant analysis) techniques. Latent data models, including mixture model-based approaches, are among the most popular and successful approaches in both the unsupervised context (i.e., clustering) and the supervised one (i.e., classification or discrimination). Although traditionally tools of multivariate analysis, they are growing in popularity when considered in the framework of functional data analysis (FDA). FDA is the data analysis paradigm in which the individual data units are functions (e.g., curves, surfaces), rather than simple vectors. In many areas of application, the analyzed data are indeed often available in the form of discretized values of functions or curves (e.g., time series, waveforms) and surfaces (e.g., 2D images, spatio-temporal data). This functional aspect of the data adds difficulties compared to the case of a classical multivariate (non-functional) data analysis. We review and present approaches for model-based clustering and classification of functional data. We derive well-established statistical models along with efficient algorithmic tools to address problems regarding the clustering and the classification of these high-dimensional data, including their heterogeneity, missing information, and dynamical hidden structure. The presented models and algorithms are illustrated on real-world functional data analysis problems from several application areas.
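As a rough illustration of the mixture model-based clustering this abstract refers to, the sketch below fits a two-component univariate Gaussian mixture by expectation-maximization (EM) and assigns each observation to its most probable component. It is not the authors' functional-data model: the function names, the initialization at the data extremes, the variance floor, and the toy data (one scalar summary per curve) are all assumptions made for illustration.

```python
import math


def normal_pdf(x, mean, var):
    """Density of a univariate Gaussian with the given mean and variance."""
    return math.exp(-((x - mean) ** 2) / (2.0 * var)) / math.sqrt(2.0 * math.pi * var)


def fit_two_component_mixture(xs, n_iter=50):
    """Fit a 2-component Gaussian mixture by EM (illustrative, hypothetical helper)."""
    n = len(xs)
    overall_mean = sum(xs) / n
    overall_var = sum((x - overall_mean) ** 2 for x in xs) / n
    # Assumed initialization: means at the data extremes, shared variance.
    means = [min(xs), max(xs)]
    variances = [overall_var, overall_var]
    weights = [0.5, 0.5]
    resp = []
    for _ in range(n_iter):
        # E-step: posterior probability of each component for each point.
        resp = []
        for x in xs:
            dens = [weights[k] * normal_pdf(x, means[k], variances[k]) for k in range(2)]
            total = sum(dens)
            resp.append([d / total for d in dens])
        # M-step: re-estimate weights, means and variances from the posteriors.
        for k in range(2):
            nk = sum(r[k] for r in resp)
            weights[k] = nk / n
            means[k] = sum(r[k] * x for r, x in zip(resp, xs)) / nk
            var_k = sum(r[k] * (x - means[k]) ** 2 for r, x in zip(resp, xs)) / nk
            variances[k] = max(var_k, 1e-6)  # assumed floor to avoid collapse
    # Clustering rule: assign each point to its most probable component.
    labels = [0 if r[0] >= r[1] else 1 for r in resp]
    return weights, means, variances, labels


# Toy data: one scalar summary per curve (made-up values).
data = [0.9, 1.0, 1.1, 1.2, 4.8, 5.0, 5.1, 5.3]
weights, means, variances, labels = fit_two_component_mixture(data)
print(labels)
```

In a functional-data setting, the scalar observations would typically be replaced by curve representations (for example, basis coefficients), with the same alternation of E- and M-steps.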

arXiv.org e-Print Archive

HAL - Normandie Université

Crossref

Slingshot: cell lineage and pseudotime inference for single-cell transcriptomics.

Author: Das Diya
Dudoit Sandrine
Fletcher Russell B
Ngai John
Purdom Elizabeth
Risso Davide
Street Kelly
Yosef Nir
Publication venue: eScholarship, University of California
Publication date: 01/01/2018

Background: Single-cell transcriptomics allows researchers to investigate complex communities of heterogeneous cells. It can be applied to stem cells and their descendants in order to chart the progression from multipotent progenitors to fully differentiated cells. While a variety of statistical and computational methods have been proposed for inferring cell lineages, the problem of accurately characterizing multiple branching lineages remains difficult to solve. Results: We introduce Slingshot, a novel method for inferring cell lineages and pseudotimes from single-cell gene expression data. In previously published datasets, Slingshot correctly identifies the biological signal for one to three branching trajectories. Additionally, our simulation study shows that Slingshot infers more accurate pseudotimes than other leading methods. Conclusions: Slingshot is a uniquely robust and flexible tool which combines the highly stable techniques necessary for noisy single-cell data with the ability to identify multiple trajectories. Accurate lineage inference is a critical step in the identification of dynamic temporal gene expression

Directory of Open Access Journals

eScholarship - University of California

Archivio istituzionale della ricerca - Università di Padova

Automatic pharynx and larynx cancer segmentation framework (PLCSF) on contrast enhanced MR images

Author: Di Caterina Gaetano
Doshi Trushali
Grose Derek
Mackenzie Kenneth
Petropoulakis Lykourgos
Soraghan John
Wilson Christina
Publication venue: Elsevier BV
Publication date: 01/03/2017

A novel and effective pharynx and larynx cancer segmentation framework (PLCSF) is presented for automatic base of tongue and larynx cancer segmentation from gadolinium-enhanced T1-weighted magnetic resonance images (MRI). The aim of the proposed PLCSF is to assist clinicians in radiotherapy treatment planning. The initial processing of MRI data in PLCSF includes cropping of the region of interest, reduction of artefacts, and detection of the throat region for the location prior. Further, a modified fuzzy c-means clustering is developed to robustly separate candidate cancer pixels from other tissue types. In addition, a region-based level set method is evolved to ensure spatial smoothness for the final segmentation boundary after noise removal using non-linear and morphological filtering. A validation study of PLCSF on 102 axial MRI slices demonstrates a mean Dice similarity coefficient of 0.79 and a mean modified Hausdorff distance of 2.2 mm when compared with manual segmentations. Comparison of PLCSF with other algorithms validates the robustness of the PLCSF. Inter- and intra-variability calculations from manual segmentations suggest that PLCSF can help to reduce the human subjectivity
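The two validation metrics named in this abstract can be sketched as follows. This is a minimal illustration, not the authors' evaluation code: masks are represented as sets of (row, column) pixel coordinates, the modified Hausdorff distance uses one common definition (the larger of the two directed mean nearest-point distances), and the toy masks are made up. Distances come out in pixel units; converting to millimetres would require the image's pixel spacing, which is not given here.

```python
from math import hypot


def dice_coefficient(a, b):
    """Dice similarity coefficient between two sets of pixel coordinates."""
    a, b = set(a), set(b)
    if not a and not b:
        return 1.0  # assumed convention: two empty masks agree perfectly
    return 2.0 * len(a & b) / (len(a) + len(b))


def _mean_nearest_distance(src, dst):
    """Mean, over points in src, of the Euclidean distance to the nearest point in dst."""
    return sum(
        min(hypot(p[0] - q[0], p[1] - q[1]) for q in dst) for p in src
    ) / len(src)


def modified_hausdorff(a, b):
    """Modified Hausdorff distance: max of the two directed mean nearest-point distances.

    Both inputs must be non-empty collections of (row, col) points.
    """
    a, b = list(a), list(b)
    return max(_mean_nearest_distance(a, b), _mean_nearest_distance(b, a))


# Toy example: two 2x2 masks that overlap in one column (hypothetical data).
automatic = {(0, 0), (0, 1), (1, 0), (1, 1)}
manual = {(0, 1), (1, 1), (0, 2), (1, 2)}

print(dice_coefficient(automatic, manual))    # 0.5
print(modified_hausdorff(automatic, manual))  # 0.5 (pixel units)
```

A Dice value of 1 means the two masks coincide and 0 means they share no pixels, so the reported mean of 0.79 indicates substantial but imperfect overlap with the manual segmentations.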

University of Strathclyde Institutional Repository

Quantitative Classification of Somatostatin-Positive Neocortical Interneurons Identifies Three Interneuron Subtypes

Author: Fino Elodie
McGarry Laura M.
Nikolenko Volodymyr
Packer Adam M.
Sippy Tanya
Yuste Rafael
Publication venue: Frontiers Research Foundation
Publication date: 01/01/2010

Deciphering the circuitry of the neocortex requires knowledge of its components, making a systematic classification of neocortical neurons necessary. GABAergic interneurons contribute most of the morphological, electrophysiological and molecular diversity of the cortex, yet interneuron subtypes are still not well defined. To quantitatively identify classes of interneurons, 59 GFP-positive interneurons from a somatostatin-positive mouse line were characterized by whole-cell recordings and anatomical reconstructions. For each neuron, we measured a series of physiological and morphological variables and analyzed these data using unsupervised classification methods. PCA and cluster analysis of morphological variables revealed three groups of cells: one comprised of Martinotti cells, and two other groups of interneurons with short asymmetric axons targeting layers 2/3 and bending medially. PCA and cluster analysis of electrophysiological variables also revealed the existence of these three groups of neurons, particularly with respect to action potential time course. These different morphological and electrophysiological characteristics could make each of these three interneuron subtypes particularly suited for a different function within the cortical circuit

Crossref

Hal - Université Grenoble Alpes

Directory of Open Access Journals

PubMed Central

Spike sorting for large, dense electrode arrays

Author: Belluscio M
Buzsáki G
Carandini M
Denfield GH
Ecker AS
Goodman DF
Grosmark A
Harris KD
Hunter ML
Kadir SN
Rossant C
Saleem AB
Schulman J
Solomon S
Tolias AS
Publication venue: Springer Science and Business Media LLC
Publication date: 11/02/2016

Developments in microfabrication technology have enabled the production of neural electrode arrays with hundreds of closely spaced recording sites, and electrodes with thousands of sites are under development. These probes in principle allow the simultaneous recording of very large numbers of neurons. However, use of this technology requires the development of techniques for decoding the spike times of the recorded neurons from the raw data captured from the probes. Here we present a set of tools to solve this problem, implemented in a suite of practical, user-friendly, open-source software. We validate these methods on data from the cortex, hippocampus and thalamus of rat, mouse, macaque and marmoset, demonstrating error rates as low as 5%

LAReferencia - Red Federada de Repositorios Institucionales de Publicaciones Científicas Latinoamericanas

CONICET Digital

UCL Discovery

PubMed Central

Spiral - Imperial College Digital Repository

University of Hertfordshire Research Archive

Benchmark data and model independent event classification for the large hadron collider

Author: Aarrestad T
Bona M
Boveia A
Caron S
Davies J
de Austri RR
De Simone A
Doglioni C
Duarte JM
Farbin A
Gupta H
Heinrich LA
Hendriks L
Howarth J
Jawahar P
Jueid A
Lastow J
Leinweber A
Mamuzic J
Merenyi E
Morandini A
Moskvitina P
Nellist C
Ngadiuba J
Ostdiek B
Pierini M
Ravina B
Sekmen S
Touranakou M
van Beekveld M
Vaskeviciute M
Verheyen R
Vilalta R
Vlimant J-R
Wallin E
White M
Wozniak KA
Wulff E
Zhang Z
Publication venue: Stichting SciPost
Publication date: 28/05/2021

We describe the outcome of a data challenge conducted as part of the Dark Machines (https://www.darkmachines.org) initiative and the Les Houches 2019 workshop on Physics at TeV colliders. The challenge aims to detect signals of new physics at the Large Hadron Collider (LHC) using unsupervised machine learning algorithms. First, we propose how an anomaly score could be implemented to define model-independent signal regions in LHC searches. We define and describe a large benchmark dataset, consisting of > 1 billion simulated LHC events corresponding to 10 fb⁻¹ of proton-proton collisions at a center-of-mass energy of 13 TeV. We then review a wide range of anomaly detection and density estimation algorithms, developed in the context of the data challenge, and we measure their performance in a set of realistic analysis environments. We draw a number of useful conclusions that will aid the development of unsupervised new physics searches during the third run of the LHC, and provide our benchmark dataset for future studies at https://www.phenoMLdata.org. Code to reproduce the analysis is provided at https://github.com/bostdiek/DarkMachines-UnsupervisedChallenge

Lund University Publications

UCL Discovery

Queen Mary Research Online

CERN Document Server

Landscape mapping at sub-Antarctic South Georgia provides a protocol for underpinning large-scale marine protected areas

Author: Dorschel Boris
Griffiths Huw
Hogg Oliver
Huvenne Veerle
Linse Katrin
Publication venue: Springer Science and Business Media LLC
Publication date: 01/01/2016

Global biodiversity is in decline, with the marine environment experiencing significant and increasing anthropogenic pressures. In response, marine protected areas (MPAs) have increasingly been adopted as the flagship approach to marine conservation, many covering enormous areas. At present, however, the lack of biological sampling makes prioritising which regions of the ocean to protect, especially over large spatial scales, particularly problematic. Here we present an interdisciplinary approach to marine landscape mapping at the sub-Antarctic island of South Georgia as an effective protocol for underpinning large-scale (10⁵–10⁶ km²) MPA designations. We have developed a new high-resolution (100 m) digital elevation model (DEM) of the region and integrated this DEM with bathymetry-derived parameters, modelled oceanographic data, and satellite primary productivity data. These interdisciplinary datasets were used to apply an objective statistical approach to hierarchically partition and map the benthic environment into physical habitat types. We assess the potential application of physical habitat classifications as proxies for biological structuring and the application of the landscape mapping for informing on marine spatial planning

Southampton (e-Prints Soton)

PubMed Central

Electronic Publication Information Center

Open Marine Archive

NERC Open Research Archive

Robust Detection of Hierarchical Communities from Escherichia coli Gene Expression Data

Author: A Beyer
AL Barabási
BH Good
BW Kernighan
CO Daub
D Duewer
D Marbach
DFT Veiga
E Bonnet
E Ravasz
E Segal
EH Davidson
F Luo
G Balázsi
G Getz
G Palla
H Zare
HW Ma
J Chen
J Duch
J Hubble
J Lemke
J Reichardt
JJ Faith
JN Weinstein
K Baggerly
Kevin E. Bassler
KY Yeung
M Blatt
M Riley
MB Eisen
MEJ Newman
MF Traxler
MM Barker
N Friedman
O Alter
PD Karp
Q Lu
R Guimerà
RA Irizarry
S Fortunato
S Gama-Castro
S Raychaudhuri
S Tavazoie
Santiago Treviño
Satoru Miyano
SB Seidman
SP Borgatii
TF Cooper
Tim F. Cooper
TS Gardner
U Brandes
UN Raghavan
X Wen
Y Benjamini
Y Sun
Yudong Sun
Z Shi
Publication venue: Public Library of Science (PLoS)
Publication date: 11/01/2012

Determining the functional structure of biological networks is a central goal of systems biology. One approach is to analyze gene expression data to infer a network of gene interactions on the basis of their correlated responses to environmental and genetic perturbations. The inferred network can then be analyzed to identify functional communities. However, commonly used algorithms can yield unreliable results due to experimental noise, algorithmic stochasticity, and the influence of arbitrarily chosen parameter values. Furthermore, the results obtained typically provide only a simplistic view of the network partitioned into disjoint communities and provide no information about the relationships between communities. Here, we present methods to robustly detect coregulated and functionally enriched gene communities and demonstrate their application and validity for Escherichia coli gene expression data. Applying a recently developed community detection algorithm to the network of interactions identified with the context likelihood of relatedness (CLR) method, we show that a hierarchy of network communities can be identified. These communities are significantly enriched for gene ontology (GO) terms, consistent with them representing biologically meaningful groups. Further, analysis of the most significantly enriched communities identified several candidate new regulatory interactions. The robustness of our methods is demonstrated by showing that a core set of functional communities is reliably found when artificial noise, modeling experimental noise, is added to the data. We find that noise mainly acts conservatively, increasing the relatedness required for a network link to be reliably assigned and decreasing the size of the core communities, rather than causing association of genes into new communities.

Comment: Due to appear in PLoS Computational Biology. Supplementary Figure S1 was not uploaded but is available by contacting the author. 27 pages, 5 figures, 15 supplementary files

arXiv.org e-Print Archive

CiteSeerX

Public Library of Science (PLOS)

Crossref

Directory of Open Access Journals

PubMed Central

FigShare