1,203 research outputs found

Clustering Via Nonparametric Density Estimation: the R Package pdfCluster

Author: Azzalini Adelchi
Menardi Giovanna
Publication date: 28/01/2013

The R package pdfCluster performs cluster analysis based on a nonparametric estimate of the density of the observed variables. After summarizing the main aspects of the methodology, we describe the features and usage of the package, and finally illustrate how it works with the aid of two datasets.
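The idea of clustering through a nonparametric density estimate can be sketched in a few lines. The example below is not the pdfCluster implementation, which is written in R and handles multivariate data; it is a minimal one-dimensional illustration, assuming a Gaussian kernel, a fixed bandwidth, and a single density level. The function names (`gaussian_kde`, `level_set_clusters`) and all parameter values are hypothetical.

```python
import math

def gaussian_kde(data, bandwidth):
    """Return a function evaluating a Gaussian kernel density estimate."""
    norm = 1.0 / (len(data) * bandwidth * math.sqrt(2.0 * math.pi))
    def density(x):
        return norm * sum(math.exp(-0.5 * ((x - xi) / bandwidth) ** 2) for xi in data)
    return density

def stays_above(density, a, b, level, steps=20):
    """Check on a grid that the density does not drop below `level` between a and b."""
    return all(density(a + (b - a) * t / steps) >= level for t in range(1, steps))

def level_set_clusters(data, bandwidth, level):
    """Group points lying in the same connected region of {density >= level} (1-D only)."""
    density = gaussian_kde(data, bandwidth)
    kept = sorted(x for x in data if density(x) >= level)
    clusters = []
    for x in kept:
        # Extend the current cluster only if the density stays high up to x.
        if clusters and stays_above(density, clusters[-1][-1], x, level):
            clusters[-1].append(x)
        else:
            clusters.append([x])
    return clusters

if __name__ == "__main__":
    sample = [-0.5, -0.2, 0.0, 0.2, 0.5, 4.5, 4.8, 5.0, 5.2, 5.5]
    for group in level_set_clusters(sample, bandwidth=0.3, level=0.05):
        print(group)
```

In this sketch each cluster corresponds to one high-density region of the estimate, so the number of clusters follows from the data and the chosen level rather than being fixed in advance.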

arXiv.org e-Print Archive

Directory of Open Access Journals

Journal of Statistical Software

Archivio istituzionale della ricerca - Università di Padova

A probabilistic approach to emission-line galaxy classification

Author: Beck R.
Costa-Duarte M. V.
Dantas M. L. L.
de Souza R. S.
Feigelson E. D.
Gieseke F.
Killedar M.
Krone-Martins A.
Lablanche P. -Y.
Vilalta R.
Publication venue: 'Oxford University Press (OUP)'
Publication date: 01/01/2017

We invoke a Gaussian mixture model (GMM) to jointly analyse two traditional emission-line classification schemes of galaxy ionization sources: the Baldwin-Phillips-Terlevich (BPT) and $\rm W_{H\alpha}$ vs. [NII]/H$\alpha$ (WHAN) diagrams, using spectroscopic data from the Sloan Digital Sky Survey Data Release 7 and SEAGal/STARLIGHT datasets. We apply a GMM to empirically define classes of galaxies in a three-dimensional space spanned by the $\log$ [OIII]/H$\beta$, $\log$ [NII]/H$\alpha$, and $\log$ EW(H$\alpha$) optical parameters. The best-fit GMM based on several statistical criteria suggests a solution around four Gaussian components (GCs), which are capable of explaining up to 97 per cent of the data variance. Using elements of information theory, we compare each GC to its respective astronomical counterpart. GC1 and GC4 are associated with star-forming galaxies, suggesting the need to define a new starburst subgroup. GC2 is associated with BPT's Active Galaxy Nuclei (AGN) class and WHAN's weak AGN class. GC3 is associated with BPT's composite class and WHAN's strong AGN class. Conversely, there is no statistical evidence -- based on four GCs -- for the existence of a Seyfert/LINER dichotomy in our sample. Notwithstanding, the inclusion of an additional GC5 unravels it. The GC5 appears to be associated with the LINER and Passive galaxies on the BPT and WHAN diagrams respectively. Subtleties aside, we demonstrate the potential of our methodology to recover/unravel different objects inside the wilderness of astronomical datasets, without lacking the ability to convey physically interpretable results. The probabilistic classifications from the GMM analysis are publicly available within the COINtoolbox (https://cointoolbox.github.io/GMM_Catalogue/). Comment: Accepted for publication in MNRA
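To make the notion of a probabilistic classification concrete, the sketch below fits a two-component Gaussian mixture to one-dimensional toy data with the expectation-maximization algorithm and returns the per-point membership probabilities. It is an illustration only, not the authors' pipeline: the study works in three dimensions with four or five components, whereas the toy data, the initialization, and the function names here (`normal_pdf`, `fit_gmm_1d`) are assumptions made for brevity.

```python
import math

def normal_pdf(x, mean, var):
    """Density of a univariate normal distribution."""
    return math.exp(-0.5 * (x - mean) ** 2 / var) / math.sqrt(2.0 * math.pi * var)

def fit_gmm_1d(data, n_iter=50, min_var=1e-6):
    """Fit a two-component 1-D Gaussian mixture with plain EM."""
    n = len(data)
    overall_mean = sum(data) / n
    overall_var = sum((x - overall_mean) ** 2 for x in data) / n
    # Assumed initialization: means at the data extremes, shared variance.
    means = [min(data), max(data)]
    variances = [overall_var, overall_var]
    weights = [0.5, 0.5]
    resp = []
    for _ in range(n_iter):
        # E-step: posterior probability of each component for each point.
        resp = []
        for x in data:
            joint = [weights[k] * normal_pdf(x, means[k], variances[k]) for k in range(2)]
            total = sum(joint)
            resp.append([j / total for j in joint])
        # M-step: re-estimate weights, means, and variances from the posteriors.
        for k in range(2):
            nk = sum(r[k] for r in resp)
            weights[k] = nk / n
            means[k] = sum(r[k] * x for r, x in zip(resp, data)) / nk
            var_k = sum(r[k] * (x - means[k]) ** 2 for r, x in zip(resp, data)) / nk
            variances[k] = max(var_k, min_var)  # floor avoids a collapsed component
    return weights, means, variances, resp

if __name__ == "__main__":
    toy = [-0.5, -0.2, 0.0, 0.2, 0.5, 4.5, 4.8, 5.0, 5.2, 5.5]
    weights, means, variances, resp = fit_gmm_1d(toy)
    print("weights:", weights)
    print("means:", means)
```

Each row of `resp` sums to one and plays the role of a soft class label, which is the kind of output a probabilistic catalogue would publish instead of a hard assignment.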

arXiv.org e-Print Archive

Leiden University Scholarly Publications

Radboud Repository

Inflammation-associated enterotypes, host genotype, cage and inter-individual effects drive gut microbiota variation in common laboratory mice

Author: Brinkman Brigitta
Cauwe Benedicte
Garcia Yunta Roberto
Hildebrand Falk
Liston Adrian
Nguyen Thi Loan Anh
Raes Jeroen
Vandenabeele Peter
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 01/01/2013

Background: Murine models are a crucial component of gut microbiome research. Unfortunately, a multitude of genetic backgrounds and experimental setups, together with inter-individual variation, complicates cross-study comparisons and a global understanding of the mouse microbiota landscape. Here, we investigate the variability of the healthy mouse microbiota of five common lab mouse strains using 16S rDNA pyrosequencing. Results: We find initial evidence for richness-driven, strain-independent murine enterotypes that show a striking resemblance to those in humans, and which associate with calprotectin levels, a marker for intestinal inflammation. After enterotype stratification, we find that genetic, caging and inter-individual variation contribute on average 19%, 31.7% and 45.5%, respectively, to the variance in the murine gut microbiota composition. Genetic distance correlates positively with microbiota distance, so that genetically similar strains have more similar microbiota than genetically distant ones. Specific mouse strains are enriched for specific operational taxonomic units and taxonomic groups, while the 'cage effect' can occur across mouse strain boundaries and is mainly driven by Helicobacter infections. Conclusions: The detection of enterotypes suggests a common ecological cause, possibly low-grade inflammation that might drive differences among gut microbiota composition in mammals. Furthermore, the observed environmental and genetic effects have important consequences for experimental design in mouse microbiome research.

Ghent University Academic Bibliography

PubMed Central

On the non-local geometry of turbulence

Author: Bermejo-Moreno Iván
Pullin D. I.
Publication date: 01/01/2008

A multi-scale methodology for the study of the non-local geometry of eddy structures in turbulence is developed. Starting from a given three-dimensional field, this consists of three main steps: extraction, characterization and classification of structures. The extraction step is done in two stages. First, a multi-scale decomposition based on the curvelet transform is applied to the full three-dimensional field, resulting in a finite set of component three-dimensional fields, one per scale. Second, by iso-contouring each component field at one or more iso-contour levels, a set of closed iso-surfaces is obtained that represents the structures at that scale. The characterization stage is based on the joint probability density function (p.d.f.), in terms of area coverage on each individual iso-surface, of two differential-geometry properties, the shape index and curvedness, plus the stretching parameter, a dimensionless global invariant of the surface. Taken together, this defines the geometrical signature of the iso-surface. The classification step is based on the construction of a finite set of parameters, obtained from algebraic functions of moments of the joint p.d.f. of each structure, that specify its location as a point in a multi-dimensional ‘feature space’. At each scale the set of points in feature space represents all structures at that scale, for the specified iso-contour value. This then allows the application, to the set, of clustering techniques that search for groups of structures with a common geometry. Results are presented of a first application of this technique to a passive scalar field obtained from $512^3$ direct numerical simulation of scalar mixing by forced, isotropic turbulence ($Re_\lambda = 265$). These show transition, with decreasing scale, from blob-like structures in the larger scales to blob- and tube-like structures with small or moderate stretching in the inertial range of scales, and then toward tube and, predominantly, sheet-like structures with high level of stretching in the dissipation range of scales. Implications of these results for the dynamical behaviour of passive scalar stirring and mixing by turbulence are discussed.
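The two pointwise quantities named in the characterization stage can be computed from the principal curvatures of a surface. The sketch below uses one common textbook convention (Koenderink's shape index, mapped to the range [-1, 1], and curvedness as the root mean square of the principal curvatures). This is an assumption: the abstract does not state the definitions, and the paper's sign convention and nondimensionalization may differ. The function name `shape_index_and_curvedness` is hypothetical.

```python
import math

def shape_index_and_curvedness(k1, k2):
    """Return (shape index, curvedness) for principal curvatures k1 and k2.

    Assumed convention: S = (2/pi) * atan((k_max + k_min) / (k_max - k_min)),
    C = sqrt((k_max**2 + k_min**2) / 2). The shape index is undefined on
    planar points (both curvatures zero), for which None is returned.
    """
    k_max, k_min = max(k1, k2), min(k1, k2)
    curvedness = math.sqrt((k_max ** 2 + k_min ** 2) / 2.0)
    if curvedness == 0.0:
        return None, 0.0
    # atan2 handles umbilic points (k_max == k_min) without dividing by zero.
    shape_index = (2.0 / math.pi) * math.atan2(k_max + k_min, k_max - k_min)
    return shape_index, curvedness

if __name__ == "__main__":
    for name, (a, b) in {"sphere-like": (1.0, 1.0),
                         "cylinder-like": (1.0, 0.0),
                         "saddle": (1.0, -1.0)}.items():
        print(name, shape_index_and_curvedness(a, b))
```

Under this convention the shape index separates local shapes (cap, ridge, saddle) independently of size, while the curvedness measures how strongly the surface bends, which is why the pair is a natural basis for a joint p.d.f. over a surface.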

CiteSeerX

Caltech Authors

Factor PD-Clustering

Author: A. Ben-Israel
A. K. Jain
A. Montanari
G. Menardi
H. Kiers
M. Vichi
P. Kroonenberg
Publication date: 03/07/2012

Factorial clustering methods have been developed in recent years thanks to improvements in computational power. These methods perform a linear transformation of the data and a clustering of the transformed data, optimizing a common criterion. Factorial PD-clustering is based on Probabilistic Distance clustering (PD-clustering). PD-clustering is an iterative, distribution-free, probabilistic clustering method. Factor PD-clustering makes a linear transformation of the original variables into a reduced number of orthogonal ones, using a criterion shared with PD-clustering. It is demonstrated that the Tucker 3 decomposition makes it possible to obtain this transformation. Factor PD-clustering alternates between a Tucker 3 decomposition and a PD-clustering of the transformed data until convergence. This method could significantly improve the algorithm's performance, makes it possible to work with large datasets, and improves the stability and robustness of the method.
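The PD-clustering step on which the method builds can be illustrated as follows. The sketch assumes the usual formulation of probabilistic distance clustering, in which the membership probability of a point in a cluster is inversely proportional to its distance from that cluster's center, and centers are updated as weighted means with weights p²/d. It covers one-dimensional data only and omits the Tucker 3 factorial step entirely; the function name `pd_clustering_1d`, the starting centers, and the toy data are hypothetical.

```python
def pd_clustering_1d(data, centers, n_iter=200, eps=1e-12):
    """Iterate probabilistic distance clustering on 1-D data.

    Assumed principle: for every point, probability times distance is the
    same for all clusters, so p_k is proportional to 1 / d_k.
    """
    centers = list(centers)
    k = len(centers)
    probs = []
    for _ in range(n_iter):
        probs = []
        weighted_sum = [0.0] * k
        weight_total = [0.0] * k
        for x in data:
            # Distances are floored at eps to avoid division by zero.
            dist = [max(abs(x - c), eps) for c in centers]
            inverse = [1.0 / d for d in dist]
            total = sum(inverse)
            p = [v / total for v in inverse]
            probs.append(p)
            for j in range(k):
                u = p[j] ** 2 / dist[j]  # weight of x in the update of center j
                weighted_sum[j] += u * x
                weight_total[j] += u
        centers = [weighted_sum[j] / weight_total[j] for j in range(k)]
    # probs holds the memberships computed in the final pass.
    return centers, probs

if __name__ == "__main__":
    toy = [-0.5, -0.2, 0.0, 0.2, 0.5, 4.5, 4.8, 5.0, 5.2, 5.5]
    centers, probs = pd_clustering_1d(toy, centers=[1.0, 4.0])
    print("centers:", centers)
```

No distributional form is assumed anywhere in the loop, which is what "distribution-free" refers to; only distances to the current centers enter the updates.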

arXiv.org e-Print Archive

Base de publications de l'université Paris-Dauphine

Archivio della ricerca - Università degli studi di Napoli Federico II

Crossref

Similarity-based clustering for patterns of extreme values

Author: de Carvalho Miguel
Huser Raphael
Rubio Rodrigo
Publication venue: 'Wiley'
Publication date: 31/12/2023

Edinburgh Research Explorer