1,203 research outputs found

    Clustering Via Nonparametric Density Estimation: the R Package pdfCluster

    The R package pdfCluster performs cluster analysis based on a nonparametric estimate of the density of the observed variables. After summarizing the main aspects of the methodology, we describe the features and usage of the package, and finally illustrate how it works with the aid of two datasets.

    A probabilistic approach to emission-line galaxy classification

    We invoke a Gaussian mixture model (GMM) to jointly analyse two traditional emission-line classification schemes of galaxy ionization sources: the Baldwin-Phillips-Terlevich (BPT) and W_Hα vs. [NII]/Hα (WHAN) diagrams, using spectroscopic data from the Sloan Digital Sky Survey Data Release 7 and SEAGal/STARLIGHT datasets. We apply a GMM to empirically define classes of galaxies in a three-dimensional space spanned by the optical parameters log [OIII]/Hβ, log [NII]/Hα, and log EW(Hα). The best-fit GMM, based on several statistical criteria, suggests a solution of around four Gaussian components (GCs), which are able to explain up to 97 per cent of the data variance. Using elements of information theory, we compare each GC to its respective astronomical counterpart. GC1 and GC4 are associated with star-forming galaxies, suggesting the need to define a new starburst subgroup. GC2 is associated with BPT's Active Galactic Nuclei (AGN) class and WHAN's weak AGN class. GC3 is associated with BPT's composite class and WHAN's strong AGN class. Conversely, there is no statistical evidence, based on four GCs, for the existence of a Seyfert/LINER dichotomy in our sample. Nevertheless, the inclusion of an additional component, GC5, reveals it. GC5 appears to be associated with the LINER and Passive galaxies on the BPT and WHAN diagrams, respectively. Subtleties aside, we demonstrate the potential of our methodology to recover and disentangle different objects within the wilderness of astronomical datasets while still conveying physically interpretable results. The probabilistic classifications from the GMM analysis are publicly available within the COINtoolbox (https://cointoolbox.github.io/GMM_Catalogue/). Comment: Accepted for publication in MNRA
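The model-selection idea in this abstract (fit mixtures with different numbers of Gaussian components and compare them with a statistical criterion) can be illustrated with a minimal sketch. This is not the authors' code: it assumes a one-dimensional toy problem instead of the three-dimensional line-ratio space, uses a plain EM loop, and takes BIC as one example of a selection criterion. The function names `fit_gmm_1d` and `bic` are hypothetical.

```python
import math


def gaussian_pdf(x, mean, var):
    """Density of a univariate normal distribution."""
    return math.exp(-0.5 * (x - mean) ** 2 / var) / math.sqrt(2.0 * math.pi * var)


def fit_gmm_1d(data, k, n_iter=200):
    """Fit a k-component 1-D Gaussian mixture with plain EM.

    Returns (weights, means, variances, log_likelihood).
    Means are initialised at evenly spaced quantiles (an assumption of this sketch).
    """
    n = len(data)
    ordered = sorted(data)
    means = [ordered[int((j + 0.5) * n / k)] for j in range(k)]
    mean_all = sum(data) / n
    var_all = sum((x - mean_all) ** 2 for x in data) / n
    variances = [var_all] * k
    weights = [1.0 / k] * k

    for _ in range(n_iter):
        # E-step: resp[i][j] is the probability that point i came from component j.
        resp = []
        for x in data:
            joint = [w * gaussian_pdf(x, m, v)
                     for w, m, v in zip(weights, means, variances)]
            total = sum(joint) or 1e-300
            resp.append([p / total for p in joint])
        # M-step: re-estimate weights, means and variances from the responsibilities.
        for j in range(k):
            nj = sum(r[j] for r in resp) or 1e-300
            weights[j] = nj / n
            means[j] = sum(r[j] * x for r, x in zip(resp, data)) / nj
            var_j = sum(r[j] * (x - means[j]) ** 2 for r, x in zip(resp, data)) / nj
            variances[j] = max(var_j, 1e-6)  # floor avoids a collapsing component

    log_lik = 0.0
    for x in data:
        mix = sum(w * gaussian_pdf(x, m, v)
                  for w, m, v in zip(weights, means, variances))
        log_lik += math.log(mix or 1e-300)
    return weights, means, variances, log_lik


def bic(log_lik, k, n):
    """BIC of a 1-D mixture with (k - 1) free weights, k means and k variances."""
    n_params = 3 * k - 1
    return n_params * math.log(n) - 2.0 * log_lik


if __name__ == "__main__":
    # Synthetic, well-separated toy data; not related to the galaxy catalogue.
    toy = [-1 + 0.02 * i for i in range(100)] + [9 + 0.02 * i for i in range(100)]
    for k in (1, 2, 3):
        *_, ll = fit_gmm_1d(toy, k)
        print(k, round(bic(ll, k, len(toy)), 1))
```

The candidate with the lowest BIC would be preferred. The responsibilities computed in the E-step are the kind of per-object probabilistic classification the abstract refers to.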

    Inflammation-associated enterotypes, host genotype, cage and inter-individual effects drive gut microbiota variation in common laboratory mice

    Background: Murine models are a crucial component of gut microbiome research. Unfortunately, a multitude of genetic backgrounds and experimental setups, together with inter-individual variation, complicates cross-study comparisons and a global understanding of the mouse microbiota landscape. Here, we investigate the variability of the healthy mouse microbiota of five common lab mouse strains using 16S rDNA pyrosequencing. Results: We find initial evidence for richness-driven, strain-independent murine enterotypes that show a striking resemblance to those in humans, and which associate with calprotectin levels, a marker for intestinal inflammation. After enterotype stratification, we find that genetic, caging and inter-individual variation contribute on average 19%, 31.7% and 45.5%, respectively, to the variance in the murine gut microbiota composition. Genetic distance correlates positively with microbiota distance, so that genetically similar strains have more similar microbiota than genetically distant ones. Specific mouse strains are enriched for specific operational taxonomic units and taxonomic groups, while the 'cage effect' can occur across mouse strain boundaries and is mainly driven by Helicobacter infections. Conclusions: The detection of enterotypes suggests a common ecological cause, possibly low-grade inflammation, that might drive differences in gut microbiota composition among mammals. Furthermore, the observed environmental and genetic effects have important consequences for experimental design in mouse microbiome research.

    On the non-local geometry of turbulence

    A multi-scale methodology for the study of the non-local geometry of eddy structures in turbulence is developed. Starting from a given three-dimensional field, this consists of three main steps: extraction, characterization and classification of structures. The extraction step is done in two stages. First, a multi-scale decomposition based on the curvelet transform is applied to the full three-dimensional field, resulting in a finite set of component three-dimensional fields, one per scale. Second, by iso-contouring each component field at one or more iso-contour levels, a set of closed iso-surfaces is obtained that represents the structures at that scale. The characterization stage is based on the joint probability density function (p.d.f.), in terms of area coverage on each individual iso-surface, of two differential-geometry properties, the shape index and curvedness, plus the stretching parameter, a dimensionless global invariant of the surface. Taken together, this defines the geometrical signature of the iso-surface. The classification step is based on the construction of a finite set of parameters, obtained from algebraic functions of moments of the joint p.d.f. of each structure, that specify its location as a point in a multi-dimensional ‘feature space’. At each scale the set of points in feature space represents all structures at that scale, for the specified iso-contour value. This then allows clustering techniques that search for groups of structures with a common geometry to be applied to the set. Results are presented of a first application of this technique to a passive scalar field obtained from a 512^3 direct numerical simulation of scalar mixing by forced, isotropic turbulence (Re_λ = 265). These show a transition, with decreasing scale, from blob-like structures in the larger scales to blob- and tube-like structures with small or moderate stretching in the inertial range of scales, and then toward tube and, predominantly, sheet-like structures with a high level of stretching in the dissipation range of scales. Implications of these results for the dynamical behaviour of passive scalar stirring and mixing by turbulence are discussed.
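The two differential-geometry quantities named in the characterization stage can be sketched from the principal curvatures of a surface point. The sketch below assumes the standard Koenderink and van Doorn definitions, takes the absolute value of the shape index so that it lies in [0, 1], and leaves curvedness in dimensional form; the paper's exact sign convention and normalisation may differ. The helper `area_weighted_mean` is a hypothetical stand-in for a first moment of the area-based joint p.d.f.

```python
import math


def shape_index(k1, k2):
    """Absolute shape index in [0, 1] from two principal curvatures.

    With this convention, 1 is a spherical cap (blob-like), 0.5 a cylinder
    (tube-like) and 0 a symmetric saddle. Undefined where both curvatures vanish.
    """
    kmax, kmin = max(k1, k2), min(k1, k2)
    if kmax == 0.0 and kmin == 0.0:
        raise ValueError("shape index is undefined at planar points")
    # atan2 handles the umbilic case kmax == kmin without dividing by zero.
    return abs(2.0 / math.pi * math.atan2(kmax + kmin, kmax - kmin))


def curvedness(k1, k2):
    """Curvedness: root mean square of the two principal curvatures."""
    return math.sqrt((k1 * k1 + k2 * k2) / 2.0)


def area_weighted_mean(values, areas):
    """Mean of a per-element quantity weighted by surface-element area."""
    total_area = sum(areas)
    return sum(v * a for v, a in zip(values, areas)) / total_area


if __name__ == "__main__":
    # Hypothetical surface elements: (k1, k2, area). Values are illustrative only.
    elements = [(1.0, 1.0, 0.2), (1.0, 0.0, 0.5), (0.5, 0.1, 0.3)]
    s = [shape_index(k1, k2) for k1, k2, _ in elements]
    c = [curvedness(k1, k2) for k1, k2, _ in elements]
    areas = [a for _, _, a in elements]
    print(area_weighted_mean(s, areas), area_weighted_mean(c, areas))
```

Area-weighted moments of this kind, computed per iso-surface, are one way to place each structure as a point in a feature space before clustering.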

    Factor PD-Clustering

    Factorial clustering methods have been developed in recent years thanks to improvements in computational power. These methods perform a linear transformation of the data and a clustering of the transformed data, optimizing a common criterion. Factor PD-clustering is based on Probabilistic Distance clustering (PD-clustering), an iterative, distribution-free, probabilistic clustering method. Factor PD-clustering makes a linear transformation of the original variables into a reduced number of orthogonal ones, using a criterion shared with PD-clustering. It is demonstrated that the Tucker 3 decomposition makes it possible to obtain this transformation. Factor PD-clustering alternates between a Tucker 3 decomposition and a PD-clustering of the transformed data until convergence. This method could significantly improve the performance of the algorithm, makes it possible to work with large datasets, and improves the stability and robustness of the method.
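The PD-clustering step on its own can be sketched as follows. This is a minimal illustration, not the authors' implementation: it assumes Euclidean distances and the usual PD-clustering principle that, for each point, the product of membership probability and distance to the centre is the same for every cluster, with centres updated as weighted means using weights p^2/d. The Tucker 3 step of Factor PD-clustering is not shown, and the function names are hypothetical.

```python
import math


def memberships(x, centers, eps=1e-12):
    """Return (probabilities, distances) of point x with respect to each centre.

    Probabilities are proportional to 1/distance, so p_k * d_k is constant over k.
    """
    dists = [max(math.dist(x, c), eps) for c in centers]
    inv = [1.0 / d for d in dists]
    total = sum(inv)
    return [v / total for v in inv], dists


def pd_clustering(points, centers, n_iter=100):
    """Iterate PD-clustering centre updates; returns (centers, probabilities)."""
    centers = [list(c) for c in centers]
    dim = len(points[0])
    for _ in range(n_iter):
        sums = [[0.0] * dim for _ in centers]
        totals = [0.0] * len(centers)
        for x in points:
            probs, dists = memberships(x, centers)
            for k, (p, d) in enumerate(zip(probs, dists)):
                u = p * p / d  # weight of point x in the update of centre k
                totals[k] += u
                for j in range(dim):
                    sums[k][j] += u * x[j]
        centers = [[sums[k][j] / totals[k] for j in range(dim)]
                   for k in range(len(centers))]
    # Memberships are recomputed against the final centres.
    return centers, [memberships(x, centers)[0] for x in points]


if __name__ == "__main__":
    # Two small synthetic clumps; starting centres are arbitrary guesses.
    pts = [(0, 0), (0, 1), (1, 0), (1, 1), (10, 10), (10, 11), (11, 10), (11, 11)]
    final_centers, final_probs = pd_clustering(pts, [(2, 2), (9, 9)])
    print(final_centers)
```

In the factorial variant described above, `points` would be the data projected onto the reduced set of orthogonal variables, and the projection and the clustering would be recomputed in turn until the centres stop moving.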