
    Coefficient of intrinsic dependence: a new measure of association

    Detecting dependence among variables is an essential task in many scientific investigations. In this study we propose a new measure of association, the coefficient of intrinsic dependence (CID), which takes values in [0,1] and faithfully reflects the full range of dependence for two random variables. The CID is free of distributional and functional assumptions. It can be easily implemented and extended to multivariate situations. Traditionally, the correlation coefficient is the preferred measure of association. However, its effectiveness is considerably compromised when the random variables are not normally distributed. Moreover, the correlation coefficient is difficult to interpret when the data are categorical. By contrast, the CID is free of these problems. In our simulation studies, we find that the ability of the CID to differentiate levels of dependence remains robust across data types (categorical or continuous) and model features (linear or curvilinear). The CID is also particularly effective when the dependence is strong, making it a powerful tool for variable selection. As an illustration, the CID is applied to variable selection in two settings: classification and prediction. An analysis of actual data from a study of breast cancer gene expression is included. For the classification problem, we identify a pair of genes that best classifies a patient's prognosis signature, and for the prediction problem, we identify a pair of genes that best relates to the expression of a specific gene.

    Statistical identification of gene association by CID in application of constructing ER regulatory network

    Background: A variety of high-throughput techniques are now available for constructing comprehensive gene regulatory networks in systems biology. In this study, we report a new statistical approach for facilitating in silico inference of regulatory network structure. The new measure of association, the coefficient of intrinsic dependence (CID), is model-free and can be applied to both continuous and categorical distributions. Given two variables X and Y, CID answers whether Y is dependent on X by examining the conditional distribution of Y given X. In this paper, we apply CID to analyze the regulatory relationships between transcription factors (TFs) (X) and their downstream genes (Y) based on clinical data. More specifically, we use estrogen receptor α (ERα) as the variable X, and the analyses are based on 48 clinical breast cancer gene expression arrays (48A). Results: The analytical utility of CID was evaluated in comparison with four commonly used statistical methods: Galton-Pearson's correlation coefficient (GPCC), Student's t-test (STT), coefficient of determination (CoD), and mutual information (MI). Compared with GPCC, CoD, and MI, CID is better able to discover regulatory associations where the distribution of the mRNA expression levels of X and Y does not fit linear models. On the other hand, when CID is used to measure the association of a continuous variable (Y) against a discrete variable (X), it performs similarly to STT and appears to outperform CoD and MI. In addition, this study established a two-layer transcriptional regulatory network to exemplify the use of CID, in combination with GPCC, in deciphering gene networks based on gene expression profiles from patient arrays. Conclusion: CID is shown to provide useful information for identifying associations between genes and transcription factors of interest in patient arrays. When coupled with the relationships detected by GPCC, the associations predicted by CID are applicable to the construction of transcriptional regulatory networks. This study shows how information from different data sources and learning algorithms can be integrated to investigate whether relevant regulatory mechanisms identified in cell models can also be partially re-identified in clinical samples of breast cancers. Availability: The implementation of CID in R code can be freely downloaded from http://homepage.ntu.edu.tw/~lyliu/BC/.

    Evidence for Quasar Activity Triggered by Galaxy Mergers in HST Observations of Dust-reddened Quasars

    We present Hubble ACS images of thirteen dust-reddened Type-1 quasars selected from the FIRST/2MASS Red Quasar Survey. These quasars have high intrinsic luminosities after correction for dust obscuration (-23.5 > M_B > -26.2 from K-magnitude). The images show strong evidence of recent or ongoing interaction in eleven of the thirteen cases, even before the quasar nucleus is subtracted. None of the host galaxies are well fit by a simple elliptical profile. The fraction of quasars showing interaction is significantly higher than the 30% seen in samples of host galaxies of normal, unobscured quasars. There is a weak correlation between the amount of dust reddening and the magnitude of interaction in the host galaxy, measured using the Gini coefficient and the Concentration index. Although few host galaxy studies of normal quasars are matched to ours in intrinsic quasar luminosity, no evidence has been found for a strong dependence of merger activity on host luminosity in samples of the host galaxies of normal quasars. We thus believe that the high merger fraction in our sample is related to their obscured nature, with a significant amount of reddening occurring in the host galaxy. The red quasar phenomenon seems to have an evolutionary explanation, with the young quasar spending the early part of its lifetime enshrouded in an interacting galaxy. This might be a further indication of a link between AGN and starburst galaxies. Comment: 18 pages, 6 low resolution figures, accepted for publication in Ap

    Design, fabrication, and delivery of a charge injection device as a stellar tracking device

    Six 128 x 128 CID imagers fabricated on bulk silicon and with thin polysilicon upper-level electrodes were tested in a star tracking mode. Noise and spectral response were measured as a function of temperature over the range of +25 C to -40 C. Noise at 0 C and below was less than 40 rms carriers/pixel for all devices at an effective noise bandwidth of 150 Hz. Quantum yield for all devices averaged 40% from 0.4 to 1.0 microns with no measurable temperature dependence. Extrapolating from these performance parameters to those of a large (400 x 400) array, and accounting for design and processing improvements, indicates that the larger array would show a further improvement in noise performance, on the order of 25 carriers. A preliminary evaluation of the projected performance of the 400 x 400 array against a representative set of star sensor requirements indicates that the CID has excellent potential as a stellar tracking device.

    Fitting Analysis using Differential Evolution Optimization (FADO): Spectral population synthesis through genetic optimization under self-consistency boundary conditions

    The goal of population spectral synthesis (PSS) is to decipher from the spectrum of a galaxy the mass, age and metallicity of its constituent stellar populations. This technique has been established as a fundamental tool in extragalactic research. It has been extensively applied to large spectroscopic data sets, notably the SDSS, leading to important insights into the galaxy assembly history. However, despite significant improvements over the past decade, all current PSS codes suffer from two major deficiencies that inhibit us from gaining sharp insights into the star-formation history (SFH) of galaxies and potentially introduce substantial biases in studies of their physical properties (e.g., stellar mass, mass-weighted stellar age and specific star formation rate). These are i) the neglect of nebular emission in spectral fits and, consequently, ii) the lack of a mechanism that ensures consistency between the best-fitting SFH and the observed nebular emission characteristics of a star-forming (SF) galaxy. In this article, we present FADO (Fitting Analysis using Differential evolution Optimization): a conceptually novel, publicly available PSS tool with the distinctive capability of permitting identification of the SFH that reproduces the observed nebular characteristics of a SF galaxy. This so-far unique self-consistency concept allows us to significantly alleviate degeneracies in current spectral synthesis. The innovative character of FADO is further augmented by its mathematical foundation: FADO is the first PSS code employing genetic differential evolution optimization. This, in conjunction with other unique elements in its mathematical concept (e.g., optimization of the spectral library using artificial intelligence, convergence test, quasi-parallelization), results in key improvements with respect to computational efficiency and uniqueness of the best-fitting SFHs. Comment: 25 pages, 12 figures, A&A accepte

    A detailed look at the stellar populations in green valley galaxies

    The green valley (GV) represents an important transitional state from actively star-forming galaxies to passively evolving systems. Its traditional definition, based on colour, rests on a number of assumptions that can be subject to non-trivial systematics. In Angthopo et al. (2019), we proposed a new definition of the GV based on the 4000 Å break strength. In this paper, we explore in detail the properties of the underlying stellar populations by use of ~230 thousand high-quality spectra from the Sloan Digital Sky Survey (SDSS), contrasting our results with a traditional approach via dust-corrected colours. We explore high quality stacked SDSS spectra, and find a population trend that suggests a substantial difference between low- and high-mass galaxies, with the former featuring younger populations with star formation quenching, and the latter showing older (post-quenching) populations that include rejuvenation events. Subtle but measurable differences are found between a colour-based approach and our definition, especially as our selection of GV galaxies produces a cleaner "stratification" of the GV, with more homogeneous population properties within sections of the GV. Our definition based on 4000 Å break strength gives a clean representation of the transition to quiescence, easily measurable in the upcoming and future spectroscopic surveys. Comment: 20 pages, 13+3 figures. Accepted for publication in MNRA

    Galaxy properties from J-PAS narrow-band photometry

    We study the consistency of the physical properties of galaxies retrieved from SED-fitting as a function of spectral resolution and signal-to-noise ratio (SNR). Using a selection of physically motivated star formation histories, we set up a control sample of mock galaxy spectra representing observations of the local universe in high-resolution spectroscopy, and in 56 narrow-band and 5 broad-band photometry. We fit the SEDs at these spectral resolutions and compute the corresponding stellar mass, the mass- and luminosity-weighted age and metallicity, and the dust extinction. We study the biases, correlations, and degeneracies affecting the retrieved parameters and explore the role of the spectral resolution and the SNR in regulating these degeneracies. We find that narrow-band photometry and spectroscopy yield similar trends in the physical properties derived, the former being considerably more precise. Using a galaxy sample from the SDSS, we compare more realistically the results obtained from high-resolution and narrow-band SEDs (synthesized from the same SDSS spectra) following the same spectral fitting procedures. We use results from the literature as a benchmark to our spectroscopic estimates and show that the prior PDFs, commonly adopted in parametric methods, may introduce biases not accounted for in a Bayesian framework. We conclude that narrow-band photometry yields the same trend in the age-metallicity relation in the literature, provided it is affected by the same biases as spectroscopy; albeit the precision achieved with the latter is generally twice as large as with the narrow-band, at SNR values typical of the different kinds of data. Comment: 26 pages, 15 figures. Accepted for publication in MNRA

    SDSS-IV MaNGA: Stellar initial mass function variation inferred from Bayesian analysis of the integral field spectroscopy of early type galaxies

    We analyze the stellar initial mass functions (IMF) of a large sample of early type galaxies (ETGs) provided by MaNGA. The large number of IFU spectra of individual galaxies provides high signal-to-noise composite spectra that are essential for constraining the IMF and for investigating possible radial gradients of the IMF within individual galaxies. The large sample of ETGs also makes it possible to study how the IMF shape depends on various properties of galaxies. We adopt a novel approach to study IMF variations in ETGs, using Bayesian inference based on full spectrum fitting. The Bayesian method provides a statistically rigorous way to explore potential degeneracy in spectrum fitting, and to distinguish different IMF models with Bayesian evidence. We find that the IMF slope depends systematically on galaxy velocity dispersion, in that galaxies of higher velocity dispersion prefer a more bottom-heavy IMF, but the dependence is almost entirely due to the change of metallicity, Z, with velocity dispersion. The IMF shape also depends on stellar age, A, but the dependence is completely degenerate with that on metallicity through a combination AZ^{-1.42}. Using independent age and metallicity estimates we find that the IMF variation is produced by metallicity instead of age. The IMF near the centers of massive ETGs appears more bottom-heavy than that in the outer parts, while a weak opposite trend is seen for low-mass ETGs. Uncertainties produced by star formation history, dust extinction, α-element abundance enhancement and noise in the spectra are tested. Comment: 21 pages, 20 figures, accepted for publication in MNRA

    Fitting the integrated Spectral Energy Distributions of Galaxies

    Fitting the spectral energy distributions (SEDs) of galaxies is an almost universally used technique that has matured significantly in the last decade. Model predictions and fitting procedures have improved significantly over this time, attempting to keep up with the vastly increased volume and quality of available data. We review here the field of SED fitting, describing the modelling of ultraviolet to infrared galaxy SEDs, the creation of multiwavelength data sets, and the methods used to fit model SEDs to observed galaxy data sets. We touch upon the achievements and challenges in the major ingredients of SED fitting, with a special emphasis on describing the interplay between the quality of the available data, the quality of the available models, and the best fitting technique to use in order to obtain a realistic measurement as well as realistic uncertainties. We conclude that SED fitting can be used effectively to derive a range of physical properties of galaxies, such as redshift, stellar masses, star formation rates, dust masses, and metallicities, with care taken not to over-interpret the available data. Yet many issues remain, such as estimating the age of the oldest stars in a galaxy, finer details of dust properties and dust-star geometry, and the influences of poorly understood, luminous stellar types and phases. The challenge for the coming years will be to improve both the models and the observational data sets to resolve these uncertainties. The present review will be made available on an interactive, moderated web page (sedfitting.org), where the community can access and change the text. The intention is to expand the text and keep it up to date over the coming years. Comment: 54 pages, 26 figures, Accepted for publication in Astrophysics & Space Scienc