Clustering Via Nonparametric Density Estimation: the R Package pdfCluster
The R package pdfCluster performs cluster analysis based on a nonparametric
estimate of the density of the observed variables. After summarizing the main
aspects of the methodology, we describe the features and the usage of the
package, and finally illustrate its working with the aid of two datasets
A probabilistic approach to emission-line galaxy classification
We invoke a Gaussian mixture model (GMM) to jointly analyse two traditional
emission-line classification schemes of galaxy ionization sources: the
Baldwin-Phillips-Terlevich (BPT) and W_Hα vs. [NII]/Hα
(WHAN) diagrams, using spectroscopic data from the Sloan Digital Sky Survey
Data Release 7 and SEAGal/STARLIGHT datasets. We apply a GMM to empirically
define classes of galaxies in a three-dimensional space spanned by the
[OIII]/Hβ, [NII]/Hα, and EW(Hα), optical
parameters. The best-fit GMM based on several statistical criteria suggests a
solution around four Gaussian components (GCs), which are capable of explaining up
to 97 per cent of the data variance. Using elements of information theory, we
compare each GC to its respective astronomical counterpart. GC1 and GC4 are
associated with star-forming galaxies, suggesting the need to define a new
starburst subgroup. GC2 is associated with BPT's Active Galactic Nuclei (AGN)
class and WHAN's weak AGN class. GC3 is associated with BPT's composite class
and WHAN's strong AGN class. Conversely, there is no statistical evidence --
based on four GCs -- for the existence of a Seyfert/LINER dichotomy in our
sample. Nevertheless, the inclusion of an additional component, GC5, reveals it.
GC5 appears associated with the LINER and Passive galaxies on the BPT and WHAN
diagrams respectively. Subtleties aside, we demonstrate the potential of our
methodology to recover/unravel different objects inside the wilderness of
astronomical datasets while retaining the ability to convey physically
interpretable results. The probabilistic classifications from the GMM analysis
are publicly available within the COINtoolbox
(https://cointoolbox.github.io/GMM_Catalogue/). Comment: Accepted for publication in MNRA
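The probabilistic classification described in this abstract can be illustrated with a minimal sketch: once a mixture has been fitted, the membership probability of an object in each Gaussian component is its posterior responsibility. Everything numeric below is a hypothetical placeholder (two components instead of four, diagonal covariances, made-up weights, means and variances, logarithmic axes); none of it is taken from the paper's fitted model.

```python
import math

def log_gaussian_diag(x, mean, var):
    """Log density of a Gaussian with diagonal covariance at point x."""
    return sum(
        -0.5 * (math.log(2.0 * math.pi * v) + (xi - m) ** 2 / v)
        for xi, m, v in zip(x, mean, var)
    )

def responsibilities(x, weights, means, variances):
    """Posterior probability that x belongs to each mixture component."""
    log_terms = [
        math.log(w) + log_gaussian_diag(x, m, v)
        for w, m, v in zip(weights, means, variances)
    ]
    # Subtract the maximum before exponentiating for numerical stability.
    top = max(log_terms)
    unnorm = [math.exp(t - top) for t in log_terms]
    total = sum(unnorm)
    return [u / total for u in unnorm]

# Hypothetical mixture in (log [OIII]/Hβ, log [NII]/Hα, log EW(Hα)) space.
WEIGHTS = [0.6, 0.4]
MEANS = [[-0.3, -0.5, 1.5], [0.4, 0.0, 0.5]]
VARIANCES = [[0.04, 0.03, 0.10], [0.05, 0.02, 0.08]]

# Membership probabilities of one made-up galaxy in each component.
probs = responsibilities([-0.25, -0.45, 1.4], WEIGHTS, MEANS, VARIANCES)
print([round(p, 3) for p in probs])
```

Working in log space and normalising at the end avoids underflow when a point lies far from every component.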
Inflammation-associated enterotypes, host genotype, cage and inter-individual effects drive gut microbiota variation in common laboratory mice
Background: Murine models are a crucial component of gut microbiome research. Unfortunately, a multitude of genetic backgrounds and experimental setups, together with inter-individual variation, complicates cross-study comparisons and a global understanding of the mouse microbiota landscape. Here, we investigate the variability of the healthy mouse microbiota of five common lab mouse strains using 16S rDNA pyrosequencing.
Results: We find initial evidence for richness-driven, strain-independent murine enterotypes that show a striking resemblance to those in human, and which associate with calprotectin levels, a marker for intestinal inflammation. After enterotype stratification, we find that genetic, caging and inter-individual variation contribute on average 19%, 31.7% and 45.5%, respectively, to the variance in the murine gut microbiota composition. Genetic distance correlates positively to microbiota distance, so that genetically similar strains have more similar microbiota than genetically distant ones. Specific mouse strains are enriched for specific operational taxonomic units and taxonomic groups, while the 'cage effect' can occur across mouse strain boundaries and is mainly driven by Helicobacter infections.
Conclusions: The detection of enterotypes suggests a common ecological cause, possibly low-grade inflammation that might drive differences among gut microbiota composition in mammals. Furthermore, the observed environmental and genetic effects have important consequences for experimental design in mouse microbiome research
On the non-local geometry of turbulence
A multi-scale methodology for the study of the non-local geometry of eddy structures in turbulence is developed. Starting from a given three-dimensional field, this consists of three main steps: extraction, characterization and classification of structures. The extraction step is done in two stages. First, a multi-scale decomposition based on the curvelet transform is applied to the full three-dimensional field, resulting in a finite set of component three-dimensional fields, one per scale. Second, by iso-contouring each component field at one or more iso-contour levels, a set of closed iso-surfaces is obtained that represents the structures at that scale. The characterization stage is based on the joint probability density function (p.d.f.), in terms of area coverage on each individual iso-surface, of two differential-geometry properties, the shape index and curvedness, plus the stretching parameter, a dimensionless global invariant of the surface. Taken together, this defines the geometrical signature of the iso-surface. The classification step is based on the construction of a finite set of parameters, obtained from algebraic functions of moments of the joint p.d.f. of each structure, that specify its location as a point in a multi-dimensional ‘feature space’. At each scale the set of points in feature space represents all structures at that scale, for the specified iso-contour value. This then allows the application, to the set, of clustering techniques that search for groups of structures with a common geometry. Results are presented of a first application of this technique to a passive scalar field obtained from a 512³ direct numerical simulation of scalar mixing by forced, isotropic turbulence (Re_λ = 265).
These show a transition, with decreasing scale, from blob-like structures in the larger scales to blob- and tube-like structures with small or moderate stretching in the inertial range of scales, and then toward tube and, predominantly, sheet-like structures with a high level of stretching in the dissipation range of scales. Implications of these results for the dynamical behaviour of passive scalar stirring and mixing by turbulence are discussed
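The two local properties named in this abstract, shape index and curvedness, can be sketched from the principal curvatures at a surface point. The formulas below follow Koenderink-style definitions with one common sign convention; this is an assumption for illustration, and the paper's own normalisation and sign may differ. The stretching parameter is a global quantity and is not covered here.

```python
import math

def shape_index(k1, k2):
    """Shape index in [-1, 1] from two principal curvatures.

    Assumed convention: S = (2/pi) * atan((k_hi + k_lo) / (k_hi - k_lo)).
    Returns NaN at planar points, where the shape index is undefined.
    """
    hi, lo = max(k1, k2), min(k1, k2)
    if hi == 0.0 and lo == 0.0:
        return float("nan")
    # atan2 handles umbilic points (hi == lo) without dividing by zero.
    return (2.0 / math.pi) * math.atan2(hi + lo, hi - lo)

def curvedness(k1, k2):
    """Curvedness: root-mean-square of the two principal curvatures."""
    return math.sqrt((k1 * k1 + k2 * k2) / 2.0)

# Illustrative points: sphere-like, tube-like and saddle-like.
for name, (k1, k2) in {
    "sphere, radius 2": (0.5, 0.5),
    "cylinder, radius 1": (1.0, 0.0),
    "symmetric saddle": (1.0, -1.0),
}.items():
    print(name, shape_index(k1, k2), curvedness(k1, k2))
```

Under this convention umbilic points sit at the ends of the shape-index range, tube-like points at mid-range magnitude, and symmetric saddles at zero, while curvedness carries the scale information.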
Factor PD-Clustering
Factorial clustering methods have been developed in recent years thanks to
improvements in computational power. These methods perform a linear
transformation of the data and a clustering of the transformed data, optimizing a
common criterion. Factor PD-clustering is based on Probabilistic Distance
clustering (PD-clustering), an iterative, distribution-free,
probabilistic clustering method. Factor PD-clustering makes a linear
transformation of the original variables into a reduced number of orthogonal ones
using a criterion shared with PD-clustering. It is demonstrated that a Tucker 3
decomposition yields this transformation. Factor PD-clustering alternates
between a Tucker 3 decomposition and a PD-clustering of the transformed data
until convergence. This method could significantly improve the algorithm's
performance and makes it possible to work with large datasets while improving the
stability and robustness of the method
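As a rough illustration of the clustering half of this alternation, here is a minimal sketch of plain PD-clustering, assuming the usual formulation in which membership probabilities are inversely proportional to the distance from each center and centers are updated with weights p²/d. The Tucker 3 step of Factor PD-clustering is not included, and the function name, data and starting centers are hypothetical.

```python
import math

def pd_clustering(points, centers, n_iter=50, eps=1e-12):
    """Plain PD-clustering sketch: returns (centers, membership probabilities).

    The probabilities are those computed during the final pass, i.e. with
    respect to the centers in use just before the last update.
    """
    centers = [list(c) for c in centers]
    probs = []
    for _ in range(n_iter):
        probs = []
        sums = [[0.0] * len(centers[0]) for _ in centers]
        totals = [0.0] * len(centers)
        for x in points:
            # eps keeps inverse distances finite if x coincides with a center.
            dists = [math.dist(x, c) + eps for c in centers]
            inv = [1.0 / d for d in dists]
            norm = sum(inv)
            p = [v / norm for v in inv]  # p_k is proportional to 1 / d_k
            probs.append(p)
            for k, (pk, dk) in enumerate(zip(p, dists)):
                u = pk * pk / dk  # weight of x in the update of center k
                totals[k] += u
                for j, xj in enumerate(x):
                    sums[k][j] += u * xj
        centers = [[s / t for s in row] for row, t in zip(sums, totals)]
    return centers, probs

# Two well-separated toy groups (made-up data).
POINTS = [
    [0.0, 0.0], [0.0, 1.0], [1.0, 0.0], [1.0, 1.0],
    [9.0, 9.0], [9.0, 10.0], [10.0, 9.0], [10.0, 10.0],
]
centers, probs = pd_clustering(POINTS, [[2.0, 2.0], [8.0, 8.0]])
print(centers)
```

No distributional assumption enters the update, which is what "distribution-free" refers to; in the factor variant the same loop would run on the transformed, lower-dimensional data.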