Coefficient of intrinsic dependence: a new measure of association
To detect dependence among variables is an essential task in many scientific
investigations. In this study we propose a new measure of association, the coefficient
of intrinsic dependence (CID), which takes values in [0,1] and faithfully reflects the full
range of dependence for two random variables. The CID is free of distributional and
functional assumptions. It can be easily implemented and extended to multivariate
situations.
Traditionally, the correlation coefficient is the preferred measure of association.
However, its effectiveness is considerably compromised when the random variables
are not normally distributed. Besides, the interpretation of the correlation coefficient
is difficult when the data are categorical. By contrast, the CID is free of these problems.
In our simulation studies, we find that the CID's ability to differentiate
levels of dependence remains robust across different data types (categorical
or continuous) and model features (linear or curvilinear). Also, the CID is particularly
effective when the dependence is strong, making it a powerful tool for variable
selection.
As an illustration, the CID is applied to variable selection in two aspects: classification
and prediction. The analysis of actual data from a study of breast cancer gene expression
is included. For the classification problem, we identify a pair of genes that best
classify a patient's prognosis signature, and for the prediction problem, we identify a
pair of genes that best relates to the expression of a specific gene.
Statistical identification of gene association by CID in application of constructing ER regulatory network
Background: A variety of high-throughput techniques are now available for constructing comprehensive gene regulatory networks in systems biology. In this study, we report a new statistical approach for facilitating in silico inference of regulatory network structure. The new measure of association, the coefficient of intrinsic dependence (CID), is model-free and can be applied to both continuous and categorical distributions. Given two variables X and Y, CID answers whether Y is dependent on X by examining the conditional distribution of Y given X. In this paper, we apply CID to analyze the regulatory relationships between transcription factors (TFs) (X) and their downstream genes (Y) based on clinical data. More specifically, we use estrogen receptor α (ERα) as the variable X, and the analyses are based on 48 clinical breast cancer gene expression arrays (48A).
Results: The analytical utility of CID was evaluated in comparison with four commonly used statistical methods: Galton-Pearson's correlation coefficient (GPCC), Student's t-test (STT), coefficient of determination (CoD), and mutual information (MI). Compared with GPCC, CoD, and MI, CID is better able to discover regulatory associations where the distribution of the mRNA expression levels of X and Y does not fit linear models. On the other hand, when CID is used to measure the association of a continuous variable (Y) against a discrete variable (X), it performs similarly to STT and appears to outperform CoD and MI. In addition, this study established a two-layer transcriptional regulatory network to exemplify the usage of CID, in combination with GPCC, in deciphering gene networks based on gene expression profiles from patient arrays.
Conclusion: CID is shown to provide useful information for identifying associations between genes and transcription factors of interest in patient arrays. When coupled with the relationships detected by GPCC, the associations predicted by CID are applicable to the construction of transcriptional regulatory networks. This study shows how information from different data sources and learning algorithms can be integrated to investigate whether relevant regulatory mechanisms identified in cell models can also be partially re-identified in clinical samples of breast cancers.
Availability: The implementation of CID in R code can be freely downloaded from http://homepage.ntu.edu.tw/~lyliu/BC/
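The idea of judging dependence from the conditional distribution of Y given X can be illustrated with a small sketch. The statistic below is a hypothetical stand-in, not the published CID estimator: it bins X into quantiles and measures how much of the variance of the indicator 1{Y <= t} is explained by the bin, averaged over thresholds t taken from the sample. The function name `conditional_cdf_dependence` and the choice of 10 bins are assumptions made for illustration only.

```python
import numpy as np

def conditional_cdf_dependence(x, y, n_bins=10):
    """Hypothetical score in [0, 1] (not the published CID estimator):
    share of the variance of 1{Y <= t} explained by the quantile bin of X,
    pooled over thresholds t taken from the observed Y values."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    # Quantile bins of X stand in for "conditioning on X".
    edges = np.quantile(x, np.linspace(0, 1, n_bins + 1)[1:-1])
    bins = np.searchsorted(edges, x, side="right")
    # Indicator matrix: rows are observations, columns are thresholds t = y_j.
    ind = (y[:, None] <= y[None, :]).astype(float)
    marginal = ind.mean(axis=0)              # empirical F(t)
    explained = np.zeros_like(marginal)
    for b in np.unique(bins):
        mask = bins == b
        cond = ind[mask].mean(axis=0)        # empirical F(t | X in bin b)
        explained += mask.mean() * (cond - marginal) ** 2
    total = marginal * (1.0 - marginal)      # variance of the indicator
    return explained.sum() / total.sum()

# Curvilinear dependence: Y is a deterministic function of X,
# yet the linear correlation vanishes because the relation is symmetric.
x = np.linspace(-1.0, 1.0, 201)
y = x ** 2
pearson = np.corrcoef(x, y)[0, 1]
score_curved = conditional_cdf_dependence(x, y)

# Independent variables for comparison.
rng = np.random.default_rng(0)
score_indep = conditional_cdf_dependence(rng.normal(size=500),
                                         rng.normal(size=500))

print(f"Pearson r (y = x^2):       {pearson:.3f}")
print(f"binned score (y = x^2):    {score_curved:.3f}")
print(f"binned score (independent): {score_indep:.3f}")
```

On the quadratic example the correlation coefficient is essentially zero while the conditional-distribution score is large, which is the kind of non-linear association the abstract says a correlation-based measure can miss.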
Evidence for Quasar Activity Triggered by Galaxy Mergers in HST Observations of Dust-reddened Quasars
We present Hubble ACS images of thirteen dust reddened Type-1 quasars
selected from the FIRST/2MASS Red Quasar Survey. These quasars have high
intrinsic luminosities after correction for dust obscuration (-23.5 > M_B >
-26.2 from K-magnitude). The images show strong evidence of recent or ongoing
interaction in eleven of the thirteen cases, even before the quasar nucleus is
subtracted. None of the host galaxies are well fit by a simple elliptical
profile. The fraction of quasars showing interaction is significantly higher
than the 30% seen in samples of host galaxies of normal, unobscured quasars.
There is a weak correlation between the amount of dust reddening and the
magnitude of interaction in the host galaxy, measured using the Gini
coefficient and the Concentration index. Although few host galaxy studies of
normal quasars are matched to ours in intrinsic quasar luminosity, no evidence
has been found for a strong dependence of merger activity on host luminosity in
samples of the host galaxies of normal quasars. We thus believe that the high
merger fraction in our sample is related to their obscured nature, with a
significant amount of reddening occurring in the host galaxy. The red quasar
phenomenon seems to have an evolutionary explanation, with the young quasar
spending the early part of its lifetime enshrouded in an interacting galaxy.
This might be further indication of a link between AGN and starburst galaxies.
Comment: 18 pages, 6 low resolution figures, accepted for publication in Ap
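The abstract does not state which definition of the Gini coefficient is used. As a minimal sketch, the code below assumes the rank-based form commonly applied to galaxy pixel fluxes, G = sum_i (2i - n - 1)|X_i| / (|mean(X)| n (n - 1)) with the pixel values sorted in increasing order. The function name `gini` and the toy images are hypothetical.

```python
import numpy as np

def gini(pixel_flux):
    """Rank-based Gini coefficient of a set of pixel fluxes.
    Assumed form: sum_i (2i - n - 1)|X_i| / (mean|X| * n * (n - 1)),
    with |X_i| sorted in increasing order."""
    values = np.sort(np.abs(np.asarray(pixel_flux, dtype=float).ravel()))
    n = values.size
    ranks = np.arange(1, n + 1)
    return np.sum((2 * ranks - n - 1) * values) / (values.mean() * n * (n - 1))

# Flux spread evenly over all pixels versus concentrated in one pixel.
uniform_image = np.ones((8, 8))
point_source = np.zeros((8, 8))
point_source[3, 4] = 1.0

g_uniform = gini(uniform_image)
g_point = gini(point_source)
print(g_uniform, g_point)
```

With this definition an evenly lit image gives 0 and an image with all flux in a single pixel gives 1, so higher values indicate light concentrated in a few pixels wherever they lie in the image.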
Design, fabrication, and delivery of a charge injection device as a stellar tracking device
Six 128 x 128 CID imagers fabricated on bulk silicon and with thin polysilicon upper-level electrodes were tested in a star tracking mode. Noise and spectral response were measured as a function of temperature over the range of +25 C to -40 C. Noise at 0 C and below was less than 40 rms carriers/pixel for all devices at an effective noise bandwidth of 150 Hz. Quantum yield for all devices averaged 40% from 0.4 to 1.0 microns with no measurable temperature dependence. Extrapolating from these performance parameters to those of a large (400 x 400) array, and accounting for design and processing improvements, indicates that the larger array would show a further improvement in noise performance, on the order of 25 carriers. A preliminary evaluation of the projected performance of the 400 x 400 array and a representative set of star sensor requirements indicates that the CID has excellent potential as a stellar tracking device.
Fitting Analysis using Differential Evolution Optimization (FADO): Spectral population synthesis through genetic optimization under self-consistency boundary conditions
The goal of population spectral synthesis (PSS) is to decipher from the
spectrum of a galaxy the mass, age and metallicity of its constituent stellar
populations. This technique has been established as a fundamental tool in
extragalactic research. It has been extensively applied to large spectroscopic
data sets, notably the SDSS, leading to important insights into the galaxy
assembly history. However, despite significant improvements over the past
decade, all current PSS codes suffer from two major deficiencies that inhibit
us from gaining sharp insights into the star-formation history (SFH) of
galaxies and potentially introduce substantial biases in studies of their
physical properties (e.g., stellar mass, mass-weighted stellar age and specific
star formation rate). These are i) the neglect of nebular emission in spectral
fits and, consequently, ii) the lack of a mechanism that ensures consistency
between the best-fitting SFH and the observed nebular emission characteristics
of a star-forming (SF) galaxy. In this article, we present FADO (Fitting
Analysis using Differential evolution Optimization): a conceptually novel,
publicly available PSS tool with the distinctive capability of permitting
identification of the SFH that reproduces the observed nebular characteristics
of a SF galaxy. This so-far unique self-consistency concept allows us to
significantly alleviate degeneracies in current spectral synthesis. The
innovative character of FADO is further augmented by its mathematical
foundation: FADO is the first PSS code employing genetic differential evolution
optimization. This, in conjunction with other unique elements in its
mathematical concept (e.g., optimization of the spectral library using
artificial intelligence, convergence test, quasi-parallelization) results in
key improvements with respect to computational efficiency and uniqueness of the
best-fitting SFHs.
Comment: 25 pages, 12 figures, A&A accepte
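The abstract does not describe FADO's internals, so the following is only a generic sketch of differential evolution (the classic rand/1/bin scheme) applied to a toy version of the fitting problem: recovering the weights of two template spectra from their sum. The templates, the weights, and all parameter values (`pop_size`, `f`, `cr`, `n_gen`) are hypothetical choices for illustration.

```python
import numpy as np

def differential_evolution(cost, bounds, pop_size=30, f=0.7, cr=0.9,
                           n_gen=200, seed=0):
    """Minimal rand/1/bin differential evolution minimiser."""
    rng = np.random.default_rng(seed)
    lo, hi = np.asarray(bounds, dtype=float).T
    dim = lo.size
    pop = rng.uniform(lo, hi, size=(pop_size, dim))
    costs = np.array([cost(p) for p in pop])
    for _ in range(n_gen):
        for i in range(pop_size):
            # Mutation: combine three distinct members other than i.
            others = [j for j in range(pop_size) if j != i]
            a, b, c = pop[rng.choice(others, size=3, replace=False)]
            mutant = np.clip(a + f * (b - c), lo, hi)
            # Binomial crossover; force at least one gene from the mutant.
            cross = rng.random(dim) < cr
            cross[rng.integers(dim)] = True
            trial = np.where(cross, mutant, pop[i])
            # Greedy selection: keep the trial only if it is no worse.
            trial_cost = cost(trial)
            if trial_cost <= costs[i]:
                pop[i], costs[i] = trial, trial_cost
    best = int(np.argmin(costs))
    return pop[best], costs[best]

# Toy "spectrum": a weighted sum of two hypothetical templates.
lam = np.linspace(0.0, 1.0, 50)
templates = np.vstack([np.exp(-3.0 * lam), 0.2 + lam ** 2])
true_weights = np.array([0.3, 0.7])
observed = true_weights @ templates

def residual_cost(weights):
    # Sum of squared residuals between observed and modelled spectrum.
    return np.sum((observed - weights @ templates) ** 2)

best_weights, best_cost = differential_evolution(residual_cost,
                                                 [(0.0, 1.0), (0.0, 1.0)])
print(best_weights, best_cost)
```

A real population synthesis fit would involve many more templates, noise, extinction, and kinematics; the sketch only shows the mutate, cross over, select loop that gives differential evolution its name.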
A detailed look at the stellar populations in green valley galaxies
The green valley (GV) represents an important
transitional state from actively star-forming galaxies to passively evolving
systems. Its traditional definition, based on colour, rests on a number of
assumptions that can be subject to non-trivial systematics. In Angthopo et al.
(2019), we proposed a new definition of the GV based on the 4000 Å break
strength. In this paper, we explore in detail the properties of the underlying
stellar populations by use of ~230 thousand high-quality spectra from the Sloan
Digital Sky Survey (SDSS), contrasting our results with a traditional approach
via dust-corrected colours. We explore high quality stacked SDSS spectra, and
find a population trend that suggests a substantial difference between low- and
high-mass galaxies, with the former featuring younger populations with star
formation quenching, and the latter showing older (post-quenching) populations
that include rejuvenation events. Subtle but measurable differences are found
between a colour-based approach and our definition, especially as our selection
of GV galaxies produces a cleaner "stratification" of the GV, with more
homogeneous population properties within sections of the GV. Our definition
based on the 4000 Å break strength gives a clean representation of the
transition to quiescence, easily measurable in the upcoming and future
spectroscopic surveys.
Comment: 20 pages, 13+3 figures. Accepted for publication in MNRA
Galaxy properties from J-PAS narrow-band photometry
We study the consistency of the physical properties of galaxies retrieved
from SED-fitting as a function of spectral resolution and signal-to-noise ratio
(SNR). Using a selection of physically motivated star formation histories, we
set up a control sample of mock galaxy spectra representing observations of the
local universe in high-resolution spectroscopy, and in 56 narrow-band and 5
broad-band photometry. We fit the SEDs at these spectral resolutions and
compute the corresponding stellar mass, the mass- and luminosity-weighted
age and metallicity, and the dust extinction. We study the biases,
correlations, and degeneracies affecting the retrieved parameters and explore
the role of the spectral resolution and the SNR in regulating these
degeneracies. We find that narrow-band photometry and spectroscopy yield
similar trends in the physical properties derived, the former being
considerably more precise. Using a galaxy sample from the SDSS, we compare more
realistically the results obtained from high-resolution and narrow-band SEDs
(synthesized from the same SDSS spectra) following the same spectral fitting
procedures. We use results from the literature as a benchmark to our
spectroscopic estimates and show that the prior PDFs, commonly adopted in
parametric methods, may introduce biases not accounted for in a Bayesian
framework. We conclude that narrow-band photometry yields the same trend in the
age-metallicity relation in the literature, provided it is affected by the same
biases as spectroscopy; albeit the precision achieved with the latter is
generally twice as large as with the narrow-band, at SNR values typical of the
different kinds of data.
Comment: 26 pages, 15 figures. Accepted for publication in MNRA
SDSS-IV MaNGA: Stellar initial mass function variation inferred from Bayesian analysis of the integral field spectroscopy of early type galaxies
We analyze the stellar initial mass functions (IMF) of a large sample of
early type galaxies (ETGs) provided by MaNGA. The large number of IFU spectra
of individual galaxies provides high signal-to-noise composite spectra that are
essential for constraining the IMF and for investigating possible radial gradients of
the IMF within individual galaxies. The large sample of ETGs also makes it
possible to study how the IMF shape depends on various properties of galaxies.
We adopt a novel approach to study IMF variations in ETGs, using Bayesian
inference based on full spectrum fitting. The Bayesian method provides a
statistically rigorous way to explore potential degeneracy in spectrum fitting,
and to distinguish different IMF models with Bayesian evidence. We find that
the IMF slope depends systematically on galaxy velocity dispersion, in that
galaxies of higher velocity dispersion prefer a more bottom-heavy IMF, but the
dependence is almost entirely due to the change of metallicity with
velocity dispersion. The IMF shape also depends on stellar age, but the
dependence is completely degenerate with that on metallicity through a
combination of the two. Using independent age and metallicity estimates we
find that the IMF variation is produced by metallicity instead of age. The IMF
near the centers of massive ETGs appears more bottom-heavy than that in the
outer parts, while a weak opposite trend is seen for low-mass ETGs.
Uncertainties produced by star formation history, dust extinction,
α-element abundance enhancement and noise in the spectra are tested.
Comment: 21 pages, 20 figures, accepted for publication in MNRA
Fitting the integrated Spectral Energy Distributions of Galaxies
Fitting the spectral energy distributions (SEDs) of galaxies is an almost
universally used technique that has matured significantly in the last decade.
Model predictions and fitting procedures have improved significantly over this
time, attempting to keep up with the vastly increased volume and quality of
available data. We review here the field of SED fitting, describing the
modelling of ultraviolet to infrared galaxy SEDs, the creation of
multiwavelength data sets, and the methods used to fit model SEDs to observed
galaxy data sets. We touch upon the achievements and challenges in the major
ingredients of SED fitting, with a special emphasis on describing the interplay
between the quality of the available data, the quality of the available models,
and the best fitting technique to use in order to obtain a realistic
measurement as well as realistic uncertainties. We conclude that SED fitting
can be used effectively to derive a range of physical properties of galaxies,
such as redshift, stellar masses, star formation rates, dust masses, and
metallicities, with care taken not to over-interpret the available data. Yet
there still exist many issues such as estimating the age of the oldest stars in
a galaxy, finer details of dust properties and dust-star geometry, and the
influences of poorly understood, luminous stellar types and phases. The
challenge for the coming years will be to improve both the models and the
observational data sets to resolve these uncertainties. The present review will
be made available on an interactive, moderated web page (sedfitting.org), where
the community can access and change the text. The intention is to expand the
text and keep it up to date over the coming years.
Comment: 54 pages, 26 figures, Accepted for publication in Astrophysics & Space Scienc
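One of the simplest ways to fit model SEDs to observed photometry is a chi-square comparison against a grid of templates, sketched below. For each template the best normalisation has a closed form, a = sum(f m / sigma^2) / sum(m^2 / sigma^2), and the template with the lowest chi-square is selected. The three templates, the five bands, and the function names are hypothetical; this is an illustration of the general technique, not a method taken from the review.

```python
import numpy as np

def best_scale_and_chi2(flux, sigma, model):
    """Analytic best-fit normalisation of one template and its chi-square."""
    w = 1.0 / sigma ** 2
    scale = np.sum(w * flux * model) / np.sum(w * model ** 2)
    chi2 = np.sum(w * (flux - scale * model) ** 2)
    return scale, chi2

def fit_grid(flux, sigma, models):
    """Return the index of the best template, its scale, and all chi-squares."""
    results = [best_scale_and_chi2(flux, sigma, m) for m in models]
    chi2 = np.array([r[1] for r in results])
    best = int(np.argmin(chi2))
    return best, results[best][0], chi2

# Hypothetical template SEDs sampled in five photometric bands.
models = np.array([[1.0, 2.0, 3.0, 4.0, 5.0],
                   [5.0, 4.0, 3.0, 2.0, 1.0],
                   [1.0, 3.0, 5.0, 3.0, 1.0]])
sigma = np.full(5, 0.1)
flux = 2.5 * models[1]          # noise-free "observation" of template 1

best_index, best_scale, chi2_values = fit_grid(flux, sigma, models)
print(best_index, best_scale, chi2_values)
```

Bayesian approaches replace the single minimum with a posterior over the grid, for example by weighting each template by exp(-chi2 / 2) times a prior, which is where the choice of prior starts to matter.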