Accelerating Bayesian hierarchical clustering of time series data with a randomised algorithm
We live in an era of abundant data. This has necessitated the development of new and innovative statistical algorithms to get the most from experimental data. For example, faster algorithms make practical the analysis of larger genomic data sets, allowing us to extend the utility of cutting-edge statistical methods. We present a randomised algorithm that accelerates the clustering of time series data using the Bayesian Hierarchical Clustering (BHC) statistical method. BHC is a general method for clustering any discretely sampled time series data. In this paper we focus on a particular application to microarray gene expression data. We define and analyse the randomised algorithm, before presenting results on both synthetic and real biological data sets. We show that the randomised algorithm leads to substantial gains in speed with minimal loss in clustering quality. The randomised time series BHC algorithm is available as part of the R package BHC, which can be downloaded from Bioconductor (version 2.10 and above) via http://bioconductor.org/packages/2.10/bioc/html/BHC.html. We have also made available a set of R scripts which can be used to reproduce the analyses carried out in this paper. These are available from the following URL: https://sites.google.com/site/randomisedbhc/
Did Ebola emerge in West Africa by a policy-driven phase change in agroecology? Ebola's social context
Pair-instability and super-luminous supernova discoveries at z = 2.05, z = 2.50, and z = 3.90
We present the discovery of three super-luminous supernovae (SLSNe) at z = 2 - 4 as part of our survey to detect ultraviolet-luminous supernovae at z > 2. SLSNe are ≥10 times more luminous than normal supernova types, reaching peak luminosities of ≳10^(44) erg s^(−1). A small subset of SLSNe (type SLSN-R) exhibit a slow evolution, and thus enormous integrated energies (≳10^(51) erg), consistent with the radioactive decay of several solar masses of ^(56)Ni. SLSN-R are believed to be the deaths of very massive stars, ∼140 - 260 M_⊙, that are theorized to result in pair-instability supernovae. Two of the high redshift SLSNe presented here are consistent with the behavior of SLSN-R to the extent that their light curves are sampled, with the third event being consistent with the more rapid fade of the type II-L SLSN SN 2008es at z = 0.205. SLSNe are extremely rare locally but are expected to have been more common in the early Universe and as members of the first generation of stars to form after the Big Bang, the Population III stars. The high intrinsic luminosity of SLSNe and their detectability using our image-stacking technique out to z ∼ 6 provide the first viable route to detect and study the deaths of massive Population III stars which are expected to form in pristine gas at redshifts as low as z ∼ 2.
Bayesian hierarchical clustering for studying cancer gene expression data with unknown statistics
Clustering analysis is an important tool in studying gene expression data. The Bayesian hierarchical clustering (BHC) algorithm can automatically infer the number of clusters and uses Bayesian model selection to improve clustering quality. In this paper, we present an extension of the BHC algorithm. Our Gaussian BHC (GBHC) algorithm represents data as a mixture of Gaussian distributions. It uses a normal-gamma distribution as a conjugate prior on the mean and precision of each of the Gaussian components. We tested GBHC over 11 cancer and 3 synthetic datasets. The results on cancer datasets show that in sample clustering, GBHC on average produces a clustering partition that is more concordant with the ground truth than those obtained from other commonly used algorithms. Furthermore, GBHC frequently infers a number of clusters that is close to the ground truth. In gene clustering, GBHC also produces a clustering partition that is more biologically plausible than several other state-of-the-art methods. This suggests GBHC as an alternative tool for studying gene expression data. The implementation of GBHC is available at https://sites.google.com/site/gaussianbhc
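The conjugate update mentioned in the abstract above can be sketched for the one-dimensional case. This is a minimal illustration of the textbook normal-gamma update and the resulting marginal likelihood, the kind of quantity Bayesian model selection compares. It is not the GBHC implementation, and the hyperparameter names (mu0, kappa0, alpha0, beta0) are assumptions chosen for the example.

```python
import math

def normal_gamma_posterior(data, mu0, kappa0, alpha0, beta0):
    """Update hypothetical normal-gamma hyperparameters after observing 1-D data."""
    n = len(data)
    mean = sum(data) / n
    ss = sum((x - mean) ** 2 for x in data)  # sum of squared deviations
    kappa_n = kappa0 + n
    mu_n = (kappa0 * mu0 + n * mean) / kappa_n
    alpha_n = alpha0 + n / 2.0
    beta_n = beta0 + 0.5 * ss + kappa0 * n * (mean - mu0) ** 2 / (2.0 * kappa_n)
    return mu_n, kappa_n, alpha_n, beta_n

def log_marginal_likelihood(data, mu0, kappa0, alpha0, beta0):
    """Log evidence of the data under a Gaussian with a normal-gamma prior."""
    n = len(data)
    _, kappa_n, alpha_n, beta_n = normal_gamma_posterior(
        data, mu0, kappa0, alpha0, beta0)
    return (math.lgamma(alpha_n) - math.lgamma(alpha0)
            + alpha0 * math.log(beta0) - alpha_n * math.log(beta_n)
            + 0.5 * (math.log(kappa0) - math.log(kappa_n))
            - 0.5 * n * math.log(2.0 * math.pi))

# Illustrative values only; not taken from the paper.
posterior = normal_gamma_posterior([1.0, 2.0, 3.0], 0.0, 1.0, 1.0, 1.0)
evidence = log_marginal_likelihood([1.0, 2.0, 3.0], 0.0, 1.0, 1.0, 1.0)
```

In a hierarchical clustering setting, evidences like this one would be computed for candidate merges and compared, though how GBHC combines them is not described in the abstract.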
Measurements of the UV background at 4.6 < z < 6.4 using the quasar proximity effect
We present measurements of the ionising ultraviolet background (UVB) at z ~
5-6 using the quasar proximity effect. The fifteen quasars in our sample cover
the range 4.6 < z_q < 6.4, enabling the first proximity effect measurements of
the UVB at z > 5. The metagalactic hydrogen ionisation rate, Gamma_bkg, was
determined by modelling the combined ionisation field from the quasar and the
UVB in the proximity zone on a pixel-by-pixel basis. The optical depths in the
spectra were corrected for the expected effect of the quasar until the mean
flux in the proximity region equalled that in the average Ly-alpha forest, and
from this we make a measurement of Gamma_bkg. A number of systematic effects
were tested using synthetic spectra. Noise in the flux was found to be the
largest source of bias at z ~ 5, while uncertainties in the mean transmitted
Ly-alpha flux are responsible for the largest bias at z ~ 6. The impacts of
large-scale overdensities and Lyman limit systems on Gamma_bkg were also
investigated, but found to be small at z > 5. We find a decline in Gamma_bkg
with redshift, from log(Gamma_bkg) = -12.15 ± 0.16 at z ~ 5 to
log(Gamma_bkg) = -12.84 ± 0.18 at z ~ 6 (1 sigma errors). Compared to UVB
measurements at lower redshifts, our measurements suggest a drop of a factor of
five in the HI photoionisation rate between z ~ 4 and z ~ 6. The decline of
Gamma_bkg appears to be gradual, and we find no evidence for a sudden change in
the UVB at any redshift that would indicate a rapid change in the attenuation
length of ionising photons. Combined with recent measurements of the evolution
of the mean free path of ionising photons, our results imply a decline in the
emissivity of ionising photons by roughly a factor of two from z ~ 5 to 6,
albeit with significant uncertainty due to the measurement errors in both
Gamma_bkg and the mean free path.
Comment: 22 pages, 19 figures, 5 tables; accepted for publication in MNRA
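As a quick arithmetic check on the two logarithmic values quoted in the abstract above, the sketch below converts their difference into a linear ratio. It compares only the z ~ 5 and z ~ 6 measurements; the abstract's "factor of five" refers to z ~ 4 to z ~ 6 and draws on lower-redshift data not reproduced here. Combining the two 1 sigma errors in quadrature assumes they are independent, which is an assumption of this example and not a statement from the paper.

```python
import math

# Values quoted in the abstract: log10 of the hydrogen ionisation rate.
log_gamma_z5, err_z5 = -12.15, 0.16
log_gamma_z6, err_z6 = -12.84, 0.18

# Difference in dex and the corresponding linear ratio.
delta_dex = log_gamma_z5 - log_gamma_z6
ratio = 10 ** delta_dex

# Assumption: independent errors, combined in quadrature.
delta_err = math.hypot(err_z5, err_z6)

print(f"drop = {delta_dex:.2f} dex (factor {ratio:.1f}), "
      f"uncertainty about {delta_err:.2f} dex")
```

The two quoted values alone therefore differ by about 0.69 dex, a factor close to five, with a combined uncertainty of roughly a quarter of a dex under the independence assumption.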
Simulated recovery of LEO objects using sCMOS blind stacking
We present the methodology and results of a simulation to determine the recoverability of LEO objects using a blind stacking technique. The method utilises sCMOS and GPU technology to inject and recover LEO objects in real observed data. We explore the target recovery fraction and pipeline run-time as a function of three optimisation parameters: number of frames per data-set, exposure time, and binning factor. Results are presented as a function of magnitude and velocity. We find that target recovery using blind stacking is significantly more complete, and can reach fainter magnitudes, than using individual frames alone. We present results showing that, depending on the combination of optimisation parameters, recovery fraction is up to 90% of detectable targets for magnitudes up to 13.5, and then falls off steadily up to a magnitude limit around 14.5. Run-time is shown to be a few multiples of the observing time for the best combinations of optimisation parameters, approaching real-time processing.
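The inject-and-recover idea described in the abstract above can be illustrated with a toy shift-and-add search. This is a deliberately simplified sketch, not the authors' pipeline: frames are one-dimensional pixel rows with wrap-around edges, the noise is Gaussian, and the function names, frame count, and amplitudes are all assumptions made for the example.

```python
import random

def make_frames(n_frames, width, x0, velocity, amplitude, rng):
    """Simulate 1-D frames of unit-variance noise with a faint moving source injected."""
    frames = []
    for t in range(n_frames):
        frame = [rng.gauss(0.0, 1.0) for _ in range(width)]
        frame[(x0 + velocity * t) % width] += amplitude  # inject the moving target
        frames.append(frame)
    return frames

def blind_stack(frames, velocities):
    """Shift-and-add over trial velocities; return (peak, velocity, start_pixel)."""
    width = len(frames[0])
    best = (float("-inf"), None, None)
    for v in velocities:
        for x in range(width):
            # Sum the pixels along the trial track starting at x with speed v.
            total = sum(frame[(x + v * t) % width] for t, frame in enumerate(frames))
            if total > best[0]:
                best = (total, v, x)
    return best

rng = random.Random(0)
# The source is only 2 sigma in any single frame, so it is hard to pick out alone.
frames = make_frames(n_frames=25, width=64, x0=10, velocity=2, amplitude=2.0, rng=rng)
peak, v_found, x_found = blind_stack(frames, velocities=range(-3, 4))
print(v_found, x_found, round(peak, 1))
```

Summing along the correct track grows the signal linearly with the number of frames while the noise grows only as its square root, which is why stacking can reach fainter targets than single frames. The cost is the search over trial velocities, consistent with the run-time trade-off the abstract discusses.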
Developing independent investigators for clinical research relevant for Africa
Sustainable research capacity building requires training individuals at multiple levels within a supportive institutional infrastructure to develop a critical mass of independent researchers. At many African medical institutions, a PhD is important for academic promotion and is, therefore, an important focal area for capacity building programs. We examine the training at the Infectious Diseases Institute (IDI) as a model for in-country training based on systems capacity building and attention to the academic environment. PhD training in Africa should provide a strong research foundation for individuals to perform independent, original research and to mentor others. Training the next generation of researchers within indigenous academic centers of excellence with strong institutional infrastructure will empower trainees to ask regionally relevant research questions that will benefit Africans.
Web-based alcohol intervention: study of systematic attrition of heavy drinkers
Background: Web-based alcohol interventions are a promising way to reduce alcohol consumption because of their anonymity and the possibility of reaching a large number of individuals, including heavy drinkers. However, Web-based interventions are often characterized by high rates of attrition. To date, very few studies have investigated whether individuals with higher alcohol consumption show higher attrition rates in Web-based alcohol interventions as compared with individuals with lower alcohol consumption. Objectives: The aim of this study was to examine the attrition rate and predictors of attrition in a Web-based intervention study on alcohol consumption. Methods: The analysis of the predictors of attrition rate was performed on data collected in a Web-based randomized controlled trial. Data collection took place at the University of Konstanz, Germany. A total of 898 people, of whom 46.8% were male (420/898) and 53.2% female (478/898), with a mean age of 23.57 years (SD 5.19), initially volunteered to participate in a Web-based intervention study to reduce alcohol consumption. Out of the sample, 86.9% (781/898) were students. Participants were classified as non-completers (439/898, 48.9%) if they did not complete the Web-based intervention. Potential predictors of attrition were self-reported: alcohol consumption in the last seven days, per week, from Monday to Thursday, on weekends, excessive drinking behavior measured with the Alcohol Use Disorder Identification Test (AUDIT), and drinking motives measured by the Drinking Motive Questionnaire (DMQ-R SF). Results: Significant differences between completers and non-completers emerged regarding alcohol consumption in the last seven days (B=-.02, P=.05, 95% CI [0.97-1.00]), on weekends (B=-.05, P=.003, 95% CI [0.92-0.98]), the AUDIT (B=-.06, P=.007, 95% CI [0.90-0.98]), and the status as a student (B=.72, P=.001, 95% CI [1.35-3.11]). Most importantly, non-completers had a significantly higher alcohol consumption compared with completers. Conclusions: Hazardous alcohol consumption appears to be a key factor of the dropout rate in a Web-based alcohol intervention study. Thus, it is important to develop strategies to keep participants who are at high risk in Web-based interventions.
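The reported confidence intervals in the abstract above bracket values near 1, not near 0, which suggests they are on the odds-ratio scale while the B values are log-odds coefficients. The abstract does not name the model, so treating B as logistic regression coefficients is an assumption of this sketch. It exponentiates each B and checks that the result falls inside the quoted interval.

```python
import math

# (B, CI lower, CI upper) as quoted in the abstract; labels are shortened.
reported = {
    "last seven days": (-0.02, 0.97, 1.00),
    "weekends": (-0.05, 0.92, 0.98),
    "AUDIT": (-0.06, 0.90, 0.98),
    "student status": (0.72, 1.35, 3.11),
}

# Assumption: B is a log-odds coefficient, so exp(B) is an odds ratio.
odds_ratios = {name: math.exp(b) for name, (b, _, _) in reported.items()}

# True where exp(B) lies inside the quoted 95% CI.
inside_ci = {
    name: lo <= odds_ratios[name] <= hi
    for name, (_, lo, hi) in reported.items()
}

for name, ratio in odds_ratios.items():
    print(f"{name}: odds ratio {ratio:.2f}, inside quoted CI: {inside_ci[name]}")
```

Under that reading, each additional unit of weekend consumption lowers the odds of the modelled outcome by about 5%, while student status roughly doubles them.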
A Mismatch in the Ultraviolet Spectra between Low-Redshift and Intermediate-Redshift Type Ia Supernovae as a Possible Systematic Uncertainty for Supernova Cosmology
We present Keck high-quality rest-frame ultraviolet (UV) through optical
spectra of 21 Type Ia supernovae (SNe Ia) in the redshift range 0.11 < z < 0.37
and a mean redshift of 0.22 that were discovered during the Sloan Digital Sky
Survey-II (SDSS-II) SN Survey. Using the broad-band photometry of the SDSS
survey, we are able to reconstruct the SN host-galaxy spectral energy
distributions (SEDs), allowing for a correction for the host-galaxy
contamination in the SN Ia spectra. Comparison of composite spectra constructed
from a subsample of 17 high-quality spectra to those created from a
low-redshift sample with otherwise similar properties shows that the Keck/SDSS
SNe Ia have, on average, extremely similar rest-frame optical spectra but show
a UV flux excess. This observation is confirmed by comparing synthesized
broad-band colors of the individual spectra, showing a difference in mean
colors at the 2.4 - 4.4 sigma level for various UV colors. We further see a
slight difference in the UV spectral shape between SNe with low-mass and
high-mass host galaxies. Additionally, we detect a relationship between the
flux ratio at 2770 and 2900 A and peak luminosity that differs from that
observed at low redshift. We find that changing the UV SED of an SN Ia within
the observed dispersion can change the inferred distance moduli by ~0.1 mag.
This effect only occurs when the data probe the rest-frame UV. We suggest that
this discrepancy could be due to differences in the host-galaxy population of
the two SN samples or to small-sample statistics.
Comment: 28 pages, 21 figures, accepted by AJ, spectra are available at
http://www.cfa.harvard.edu/~rfoley/data
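To put the ~0.1 mag shift in distance modulus quoted in the abstract above into more familiar terms, the sketch below converts it to a fractional change in inferred luminosity distance using the standard definition mu = 5 log10(d / 10 pc). The function name is hypothetical and the calculation is generic arithmetic, not part of the paper's analysis.

```python
def distance_ratio(delta_mu):
    """Ratio of inferred luminosity distances for a shift delta_mu in distance modulus."""
    # mu = 5 * log10(d / 10 pc), so d2 / d1 = 10 ** (delta_mu / 5).
    return 10 ** (delta_mu / 5.0)

ratio = distance_ratio(0.1)
percent_change = (ratio - 1.0) * 100.0
print(f"0.1 mag in distance modulus is about {percent_change:.1f}% in distance")
```

A shift of this size corresponds to a distance change of a little under 5%, which gives a sense of why the abstract flags it as a possible systematic uncertainty.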