Detecting Bimodality in Astronomical Datasets
We discuss statistical techniques for detecting and quantifying bimodality in
astronomical datasets. We concentrate on the KMM algorithm, which estimates the
statistical significance of bimodality in such datasets and objectively
partitions data into sub-populations. By simulating bimodal distributions with
a range of properties we investigate the sensitivity of KMM to datasets with
varying characteristics. Our results facilitate the planning of optimal
observing strategies for systems where bimodality is suspected.
Mixture-modeling algorithms similar to the KMM algorithm have been used in
previous studies to partition the stellar population of the Milky Way into
subsystems. We illustrate the broad applicability of KMM by analysing published
data on globular cluster metallicity distributions, velocity distributions of
galaxies in clusters, and burst durations of gamma-ray sources. PostScript
versions of the tables and figures, as well as FORTRAN code for KMM and
instructions for its use, are available by anonymous ftp from
kula.phsx.ukans.edu. Comment: 32 page
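The idea of testing a two-group fit against a one-group fit can be sketched as follows. This is a minimal illustration, not the KMM FORTRAN code mentioned above: it assumes a one-dimensional sample, two Gaussian groups with a common variance, and a plain EM fit. The function names (`fit_bimodal`, `loglik_unimodal`) are hypothetical. Turning the likelihood-ratio statistic into a significance level needs a calibration step (for example by simulation) that is omitted here.

```python
import math
import random


def normal_logpdf(x, mu, sigma):
    """Log-density of N(mu, sigma^2) evaluated at x."""
    return -0.5 * math.log(2.0 * math.pi * sigma * sigma) - (x - mu) ** 2 / (2.0 * sigma * sigma)


def log_add(a, b):
    """Numerically stable log(exp(a) + exp(b))."""
    m = max(a, b)
    return m + math.log(math.exp(a - m) + math.exp(b - m))


def loglik_unimodal(data):
    """Maximised log-likelihood of a single Gaussian."""
    n = len(data)
    mu = sum(data) / n
    sigma = math.sqrt(sum((x - mu) ** 2 for x in data) / n)
    return sum(normal_logpdf(x, mu, sigma) for x in data)


def fit_bimodal(data, n_iter=200):
    """EM fit of two Gaussians sharing one variance (an assumption of this sketch)."""
    xs = sorted(data)
    n = len(xs)
    mu1, mu2 = xs[n // 4], xs[(3 * n) // 4]  # start the means at the quartiles
    mean = sum(xs) / n
    sigma = math.sqrt(sum((x - mean) ** 2 for x in xs) / n)
    w = 0.5  # mixing weight of the first group
    for _ in range(n_iter):
        # E-step: probability that each point belongs to the first group
        resp = []
        for x in xs:
            a = math.log(w) + normal_logpdf(x, mu1, sigma)
            b = math.log(1.0 - w) + normal_logpdf(x, mu2, sigma)
            resp.append(math.exp(a - log_add(a, b)))
        # M-step: update means, common variance, and weight
        n1 = max(sum(resp), 1e-12)
        n2 = max(n - n1, 1e-12)
        mu1 = sum(r * x for r, x in zip(resp, xs)) / n1
        mu2 = sum((1.0 - r) * x for r, x in zip(resp, xs)) / n2
        var = sum(r * (x - mu1) ** 2 + (1.0 - r) * (x - mu2) ** 2
                  for r, x in zip(resp, xs)) / n
        sigma = math.sqrt(var)
        w = min(max(n1 / n, 1e-9), 1.0 - 1e-9)
    loglik = sum(log_add(math.log(w) + normal_logpdf(x, mu1, sigma),
                         math.log(1.0 - w) + normal_logpdf(x, mu2, sigma))
                 for x in xs)
    return {"weight": w, "means": (mu1, mu2), "sigma": sigma, "loglik": loglik}


# Usage on synthetic data: a large statistic favours the two-group description.
random.seed(1)
sample = ([random.gauss(0.0, 1.0) for _ in range(150)]
          + [random.gauss(4.0, 1.0) for _ in range(150)])
fit = fit_bimodal(sample)
lrt = 2.0 * (fit["loglik"] - loglik_unimodal(sample))
print(fit["means"], lrt)
```

The common-variance choice keeps the likelihood bounded, so the plain EM iteration does not collapse onto a single data point.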
Clustering student skill set profiles in a unit hypercube using mixtures of multivariate betas
This paper presents a finite mixture of multivariate betas as a new model-based clustering method tailored to applications where the feature space is constrained to the unit hypercube. The mixture component densities are taken to be conditionally independent, univariate unimodal beta densities (from the subclass of reparameterized beta densities given by Bagnato and Punzo 2013). The EM algorithm used to fit this mixture is discussed in detail, and results from both this beta mixture model and the more standard Gaussian model-based clustering are presented for simulated skill mastery data from a common cognitive diagnosis model and for real data from the Assistment System online mathematics tutor (Feng et al. 2009). The multivariate beta mixture appears to outperform the standard Gaussian model-based clustering approach, as would be expected on the constrained space. Fewer components are selected (by BIC-ICL) in the beta mixture than in the Gaussian mixture, and the resulting clusters seem more reasonable and interpretable.
This article is in technical report form; the final publication is available at http://www.springerlink.com/openurl.asp?genre=article&id=doi:10.1007/s11634-013-0149-z
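The E-step of such a mixture can be sketched as below. This is a rough illustration under stated assumptions: it uses the standard shape parameters (a, b) of the beta density rather than the reparameterized form of Bagnato and Punzo cited in the abstract, and the function names and parameter values are hypothetical. Each component density is a product of independent univariate betas, one per coordinate of the unit hypercube.

```python
import math


def beta_logpdf(x, a, b):
    """Log-density of Beta(a, b) at x, for x strictly inside (0, 1)."""
    log_norm = math.lgamma(a + b) - math.lgamma(a) - math.lgamma(b)
    return log_norm + (a - 1.0) * math.log(x) + (b - 1.0) * math.log(1.0 - x)


def component_logpdf(point, shapes):
    """Sum of per-coordinate beta log-densities (conditional independence)."""
    return sum(beta_logpdf(x, a, b) for x, (a, b) in zip(point, shapes))


def responsibilities(point, weights, components):
    """E-step: posterior probability of each mixture component for one point."""
    logs = [math.log(w) + component_logpdf(point, shapes)
            for w, shapes in zip(weights, components)]
    m = max(logs)  # subtract the maximum before exponentiating for stability
    unnorm = [math.exp(v - m) for v in logs]
    total = sum(unnorm)
    return [u / total for u in unnorm]


# Hypothetical two-skill example: one "mastery" and one "non-mastery" cluster.
weights = [0.5, 0.5]
components = [
    [(8.0, 2.0), (8.0, 2.0)],  # both coordinates concentrated near 1
    [(2.0, 8.0), (2.0, 8.0)],  # both coordinates concentrated near 0
]
print(responsibilities((0.9, 0.85), weights, components))
```

Observations exactly equal to 0 or 1 would need separate handling, since the logarithms above are undefined there.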
A probabilistic approach to emission-line galaxy classification
We invoke a Gaussian mixture model (GMM) to jointly analyse two traditional
emission-line classification schemes of galaxy ionization sources: the
Baldwin-Phillips-Terlevich (BPT) and W_Hα vs. [NII]/Hα
(WHAN) diagrams, using spectroscopic data from the Sloan Digital Sky Survey
Data Release 7 and SEAGal/STARLIGHT datasets. We apply a GMM to empirically
define classes of galaxies in a three-dimensional space spanned by the
[OIII]/Hβ, [NII]/Hα, and EW(Hα) optical
parameters. The best-fit GMM based on several statistical criteria suggests a
solution of around four Gaussian components (GCs), which can explain up
to 97 per cent of the data variance. Using elements of information theory, we
compare each GC to its respective astronomical counterpart. GC1 and GC4 are
associated with star-forming galaxies, suggesting the need to define a new
starburst subgroup. GC2 is associated with BPT's Active Galactic Nuclei (AGN)
class and WHAN's weak AGN class. GC3 is associated with BPT's composite class
and WHAN's strong AGN class. Conversely, there is no statistical evidence --
based on four GCs -- for the existence of a Seyfert/LINER dichotomy in our
sample. However, the inclusion of an additional component, GC5, reveals it.
GC5 appears to be associated with the LINER and Passive galaxies on the BPT and
WHAN diagrams, respectively. Subtleties aside, we demonstrate the potential of
our methodology to recover and disentangle different classes of objects within
astronomical datasets while still conveying physically
interpretable results. The probabilistic classifications from the GMM analysis
are publicly available within the COINtoolbox
(https://cointoolbox.github.io/GMM_Catalogue/). Comment: Accepted for publication in MNRA
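The abstract does not name the statistical criteria used to choose the number of components. As one common possibility, the Bayesian information criterion (BIC) can be sketched as below. The choice of BIC, the full-covariance parameter count, and the function names are all assumptions of this sketch rather than details taken from the paper.

```python
import math


def gmm_free_parameters(n_components, n_dims):
    """Free parameters of a GMM with full covariance matrices (assumed here)."""
    weights = n_components - 1                                # weights sum to one
    means = n_components * n_dims                             # one mean vector each
    covariances = n_components * n_dims * (n_dims + 1) // 2   # symmetric matrices
    return weights + means + covariances


def bic(log_likelihood, n_params, n_samples):
    """BIC = k ln(n) - 2 ln(L); lower values indicate a preferred model."""
    return n_params * math.log(n_samples) - 2.0 * log_likelihood


# In the three-dimensional space of the abstract, going from four to five
# components adds ten free parameters under the full-covariance assumption.
print(gmm_free_parameters(4, 3), gmm_free_parameters(5, 3))
```

Comparing `bic` across fits with different numbers of components penalises the extra parameters of a larger mixture against its gain in log-likelihood.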
Where are compact groups in the local Universe?
The purpose of this work is to perform a statistical analysis of the location
of compact groups in the Universe from observational and semi-analytical points
of view. We used the velocity-filtered compact group sample extracted from the
Two Micron All Sky Survey for our analysis. We also used a new sample of galaxy
groups identified in the 2M++ galaxy redshift catalogue as tracers of the
large-scale structure. We defined a procedure to search in redshift space for
compact groups that can be considered embedded in other overdense systems and
applied this criterion to several possible combinations of different compact
and galaxy group subsamples. We also performed similar analyses for simulated
compact and galaxy groups identified in a 2M++ mock galaxy catalogue
constructed from the Millennium Run Simulation I plus a semi-analytical model
of galaxy formation. We observed that only of the compact groups can
be considered to be embedded in larger overdense systems, that is, most of the
compact groups are more likely to be isolated systems. The embedded compact
groups show statistically smaller sizes and brighter surface brightnesses than
non-embedded systems. No evidence was found that embedded compact groups are
more likely to inhabit galaxy groups with a given virial mass or with a
particular dynamical state. We found very similar results when the analysis was
performed using mock compact and galaxy groups. Based on the semi-analytical
studies, we predict that of the embedded compact groups probably are 3D
physically dense systems. Finally, real space information allowed us to reveal
the bimodal behaviour of the distribution of 3D minimum distances between
compact and galaxy groups. The location of compact groups should be carefully
taken into account when comparing properties of galaxies in environments that
are a priori different. Comment: 14 pages, 5 figures, 8 tables. Accepted for publication in Astronomy
& Astrophysics. Tables B1 and B2 will only be available in electronic form at
the CDS via anonymous ftp to cdsarc.u-strasbg.fr (130.79.128.5) or via
http://cdsweb.u-strasbg.fr/cgi-bin/qcat?J/A+A
Quantifying the Evolutionary Self Structuring of Embodied Cognitive Networks
We outline a possible theoretical framework for the quantitative modeling of
networked embodied cognitive systems. We note that: 1) information self
structuring through sensory-motor coordination does not deterministically occur
in the vector space R^n, a generic multivariable space, but in SE(3), the group
structure of the possible motions of a body in space; 2) it happens in a
stochastic open ended environment. These observations may simplify, at the
price of a certain abstraction, the modeling and the design of self
organization processes based on the maximization of some informational
measures, such as mutual information. Furthermore, by providing closed form or
computationally lighter algorithms, it may significantly reduce the
computational burden of their implementation. We propose a modeling framework
which aims to give new tools for the design of networks of new artificial self
organizing, embodied and intelligent agents and the reverse engineering of
natural ones. At this point, it remains largely a theoretical conjecture, and
whether this model will be useful in practice has yet to be verified
experimentally.
Bridge Simulation and Metric Estimation on Landmark Manifolds
We present an inference algorithm and connected Monte Carlo based estimation
procedures for metric estimation from landmark configurations distributed
according to the transition distribution of a Riemannian Brownian motion
arising from the Large Deformation Diffeomorphic Metric Mapping (LDDMM) metric.
The distribution possesses properties similar to the regular Euclidean normal
distribution but its transition density is governed by a high-dimensional PDE
with no closed-form solution in the nonlinear case. We show how the density can
be numerically approximated by Monte Carlo sampling of conditioned Brownian
bridges, and we use this to estimate parameters of the LDDMM kernel and thus
the metric structure by maximum likelihood.
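The notion of a conditioned Brownian bridge can be illustrated in the simplest setting. The sketch below assumes a one-dimensional Euclidean Brownian motion, not the Riemannian Brownian motion on the landmark manifold treated in the paper, and the function name `brownian_bridge` is hypothetical. It pins a free Brownian path to a prescribed endpoint by subtracting a linear correction.

```python
import math
import random


def brownian_bridge(start, end, total_time=1.0, n_steps=100, rng=random):
    """Sample a 1D Brownian path conditioned to go from start to end."""
    dt = total_time / n_steps
    # Free Brownian path W with W_0 = 0, built from independent Gaussian steps.
    w = [0.0]
    for _ in range(n_steps):
        w.append(w[-1] + rng.gauss(0.0, math.sqrt(dt)))
    # Pin the endpoint: X_t = start + W_t - (t / T) * (W_T - (end - start)).
    correction = w[-1] - (end - start)
    return [start + w[k] - (k / n_steps) * correction for k in range(n_steps + 1)]


# Usage: draw several bridges between the same two points.
random.seed(0)
paths = [brownian_bridge(0.0, 2.0, n_steps=50) for _ in range(5)]
print([round(p[25], 3) for p in paths])  # midpoints vary, endpoints do not
```

In the Euclidean case this construction is exact. On a curved manifold no such closed-form pinning is available, which is why the abstract resorts to Monte Carlo sampling.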
Maximum Fidelity
The most fundamental problem in statistics is the inference of an unknown
probability distribution from a finite number of samples. For a specific
observed data set, answers to the following questions would be desirable: (1)
Estimation: Which candidate distribution provides the best fit to the observed
data? (2) Goodness-of-fit: How concordant is this distribution with the
observed data? (3) Uncertainty: How concordant are other candidate
distributions with the observed data? A simple unified approach for univariate
data that addresses these traditionally distinct statistical notions, called
"maximum fidelity", is presented. Maximum fidelity is a strict frequentist
approach that is fundamentally based on model concordance with the observed
data. The fidelity statistic is a general information measure based on the
coordinate-independent cumulative distribution and critical yet previously
neglected symmetry considerations. An approximation for the null distribution
of the fidelity allows its direct conversion to absolute model concordance (p
value). Fidelity maximization allows identification of the most concordant
model distribution, generating a method for parameter estimation, with
neighboring, less concordant distributions providing the "uncertainty" in this
estimate. Maximum fidelity provides an optimal approach for parameter
estimation (superior to maximum likelihood) and a generally optimal approach
for goodness-of-fit assessment of arbitrary models applied to univariate data.
Extensions to binary data, binned data, multidimensional data, and classical
parametric and nonparametric statistical tests are described. Maximum fidelity
provides a philosophically consistent, robust, and seemingly optimal foundation
for statistical inference. All findings are presented in an elementary way to
be immediately accessible to all researchers utilizing statistical analysis. Comment: 66 pages, 32 figures, 7 tables, submitte