Iterative Random Forests to detect predictive and stable high-order interactions
Genomics has revolutionized biology, enabling the interrogation of whole
transcriptomes, genome-wide binding sites for proteins, and many other
molecular processes. However, individual genomic assays measure elements that
interact in vivo as components of larger molecular machines. Understanding how
these high-order interactions drive gene expression presents a substantial
statistical challenge. Building on Random Forests (RF), Random Intersection
Trees (RITs), and through extensive, biologically inspired simulations, we
developed the iterative Random Forest algorithm (iRF). iRF trains a
feature-weighted ensemble of decision trees to detect stable, high-order
interactions with the same order of computational cost as RF. We demonstrate the
utility of iRF for high-order interaction discovery in two prediction problems:
enhancer activity in the early Drosophila embryo and alternative splicing of
primary transcripts in human-derived cell lines. In Drosophila, among the 20
pairwise transcription factor interactions iRF identifies as stable (returned
in more than half of bootstrap replicates), 80% have been previously reported
as physical interactions. Moreover, novel third-order interactions, e.g.
between Zelda (Zld), Giant (Gt), and Twist (Twi), suggest high-order
relationships that are candidates for follow-up experiments. In human-derived
cells, iRF re-discovered a central role of H3K36me3 in chromatin-mediated
splicing regulation, and identified novel 5th and 6th order interactions,
indicative of multi-valent nucleosomes with specific roles in splicing
regulation. By decoupling the order of interactions from the computational cost
of identification, iRF opens new avenues of inquiry into the molecular
mechanisms underlying genome biology.
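The stability criterion quoted in the abstract above (an interaction counts as stable when it is returned in more than half of bootstrap replicates) can be illustrated with a minimal sketch. This is not the iRF implementation: the function name `stable_interactions`, the `threshold` parameter, and the replicate contents are hypothetical, and the example pairings are made up for illustration only.

```python
from collections import Counter

def stable_interactions(replicates, threshold=0.5):
    """Return {interaction: stability} for interactions recovered in more
    than `threshold` of the bootstrap replicates (hypothetical helper)."""
    counts = Counter()
    for recovered in replicates:
        # Count each interaction once per replicate; feature order is irrelevant.
        counts.update({frozenset(interaction) for interaction in recovered})
    n = len(replicates)
    return {i: c / n for i, c in counts.items() if c / n > threshold}

# Made-up output of four bootstrap replicates (illustrative only).
replicates = [
    [("Zld", "Gt"), ("Gt", "Twi")],
    [("Gt", "Zld")],
    [("Zld", "Gt"), ("Zld", "Gt", "Twi")],
    [("Gt", "Twi")],
]
stable = stable_interactions(replicates)
```

With these made-up replicates only the Zld/Gt pair exceeds the one-half threshold; an interaction recovered in exactly half of the replicates is excluded because the comparison is strict.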
Refining interaction search through signed iterative Random Forests
Advances in supervised learning have enabled accurate prediction in
biological systems governed by complex interactions among biomolecules.
However, state-of-the-art predictive algorithms are typically black-boxes,
learning statistical interactions that are difficult to translate into testable
hypotheses. The iterative Random Forest algorithm took a step towards bridging
this gap by providing a computationally tractable procedure to identify the
stable, high-order feature interactions that drive the predictive accuracy of
Random Forests (RF). Here we refine the interactions identified by iRF to
explicitly map responses as a function of interacting features. Our method,
signed iRF, describes subsets of rules that frequently occur on RF decision
paths. We refer to these rule subsets as signed interactions. Signed
interactions share not only the same set of interacting features but also
exhibit similar thresholding behavior, and thus describe a consistent
functional relationship between interacting features and responses. We describe
stable and predictive importance metrics (SPIMs) to rank signed interactions. For each
SPIM, we define null importance metrics that characterize its expected behavior
under known structure. We evaluate our proposed approach in biologically
inspired simulations and two case studies: predicting enhancer activity and
spatial gene expression patterns. In the case of enhancer activity, s-iRF
recovers one of the few experimentally validated high-order interactions and
suggests novel enhancer elements where this interaction may be active. In the
case of spatial gene expression patterns, s-iRF recovers all 11 reported links
in the gap gene network. By refining the process of interaction recovery, our
approach has the potential to guide mechanistic inquiry into systems whose
scale and complexity are beyond human comprehension.
Firm Size, Technical Change and Wages: Evidence from the Pork Sector from 1990-2005
A long-standing puzzle in labor economics has been the positive relationship between wages and firm size. Even after controlling for workers' observed characteristics such as education, work experience, gender, and geographic location, a significant firm size wage effect averaging 15 percent remains. This paper investigates whether the size-wage premium on hog farms persists over time and whether the magnitude is growing or shrinking. The paper pays particular attention to the matching process by which workers are allocated to farms of different size and technology use, and whether the matching process may explain differences in wages across farms. The study relies on four surveys of employees on hog farms collected in 1990, 1995, 2000, and 2005. The survey was conducted across the United States. The data allow us to evaluate how farm size and technology adoption have changed over time and how employee pay has changed in response to these changes. Detailed investigations of these pay differences between small and large hog farms and between farms using few and many technologies show that the differences cannot be explained away by differences in workers' education or work experience or in the geographic location of the farm. Although more educated and experienced workers are more likely to work on larger and more technologically advanced hog farms, the positive relationships between wages and both farm size and technology remain large and statistically significant when differences in observable worker attributes are controlled. Furthermore, these effects are reinforcing in that large hog farms also adopt more technologies, and so the firm size effect persists even after differences in the number of technologies are held constant. The size-wage and technology-wage premia have persisted over time, and we cannot reject the null hypothesis that the premia are constant over the sample period.
Keywords: Agribusiness, Livestock Production/Industries
Non-gaussianity of optical emission lines in SDSS star-forming galaxies and its implications on galactic outflows
The shape of emission lines in the optical spectra of star-forming galaxies
reveals the kinematics of the diffuse gaseous component. We analyse the shape
of prominent emission lines in a sample of ~53,000 star-forming galaxies from
the Sloan Digital Sky Survey, focusing on departures from gaussianity.
Departures from a single gaussian profile allow us to probe the motion of gas
and to assess the role of outflows. The sample is divided into groups according
to their stellar velocity dispersion and star formation rate. The spectra
within each group are stacked to improve the signal-to-noise ratio of the
emission lines, to remove individual signatures, and to enhance the effect of
star formation rate on the shapes of the emission lines. The moments of the
emission lines, including kurtosis and skewness, are determined. We find that
most of the emission lines in strong star-forming systems unequivocally feature
negative kurtosis. This signature is present in Hα, Hβ, [N II] and
[S II] in massive galaxies with high star formation rates. We interpret it as
evidence of radial outflows of ionised gas driven by the star formation of the
galaxies. Also, most of the emission lines in low-mass systems with high star
formation rates feature negative skewness, and we interpret it as evidence of
dust obscuration in the galactic disk. These signatures are however absent in
the [O III] line, which is believed to trace a different gas component. The
observed trend is significantly stronger in face-on galaxies, indicating that
star formation drives the outflows along the galactic rotation axis, presumably
the path of least resistance. The data suggest that outflows driven by star
formation exert accumulated impacts on the interstellar medium, and the outflow
signature is more evident in older galaxies as they have experienced a longer
total duration of star formation.
Comment: 16 pages, 13 figures, accepted for publication in PAS
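The moment analysis described in the abstract above can be sketched as a flux-weighted calculation over a line profile. The abstract does not state how the moments were computed, so everything here is an assumption: the function name `line_moments`, the velocity grid, the synthetic profiles, and the choice to report excess kurtosis (zero for a Gaussian) are all illustrative.

```python
import numpy as np

def line_moments(velocity, flux):
    """Flux-weighted mean, standard deviation, skewness and excess kurtosis
    of an emission-line profile (hypothetical helper, not the paper's code)."""
    v = np.asarray(velocity, dtype=float)
    w = np.asarray(flux, dtype=float)
    w = w / w.sum()                       # normalise the profile to unit weight
    mean = np.sum(w * v)
    std = np.sqrt(np.sum(w * (v - mean) ** 2))
    skew = np.sum(w * (v - mean) ** 3) / std ** 3
    excess_kurt = np.sum(w * (v - mean) ** 4) / std ** 4 - 3.0
    return mean, std, skew, excess_kurt

# Synthetic profiles on an assumed velocity grid in km/s.
v = np.linspace(-1000.0, 1000.0, 2001)
gauss = np.exp(-0.5 * (v / 100.0) ** 2)
# Two blended, symmetric components give a flatter-topped profile.
flat = (np.exp(-0.5 * ((v - 120.0) / 100.0) ** 2)
        + np.exp(-0.5 * ((v + 120.0) / 100.0) ** 2))

_, _, gauss_skew, gauss_kurt = line_moments(v, gauss)
_, _, flat_skew, flat_kurt = line_moments(v, flat)
```

In this sketch the single Gaussian has skewness and excess kurtosis close to zero, while the blended profile has negative excess kurtosis, the kind of departure from gaussianity the abstract reports for strongly star-forming systems.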
Dynamics of correlations due to a phase noisy laser
We analyze the dynamics of various kinds of correlations present between two
initially entangled independent qubits, each one subject to a local phase noisy
laser. We give explicit expressions of the relevant quantifiers of correlations
for the general case of single-qubit unital evolution, which includes the case
of a phase noisy laser. Although the light field is treated as classical, we
find that this model can describe revivals of quantum correlations. Two
different dynamical regimes of decay of correlations occur, a Markovian one
(exponential decay) and a non-Markovian one (oscillatory decay with revivals)
depending on the values of system parameters. In particular, in the
non-Markovian regime, quantum correlations quantified by quantum discord show
an oscillatory decay faster than that of classical correlations. Moreover,
there are time regions where nonzero discord is present while entanglement is
zero.
Comment: 7 pages, 3 figures, accepted for publication in Phys. Scripta,
special issue for CEWQO 2011 proceeding
Phosphodiesterase 5 (PDE5) restricts intracellular cGMP accumulation during enterotoxigenic Escherichia coli infection
Diarrhea caused by enterotoxigeni
Laser Amplifier Development for the Remote Sensing of CO2 from Space
Accurate global measurements of tropospheric CO2 mixing ratios are needed to study CO2 emissions and CO2 exchange with the land and oceans. NASA Goddard Space Flight Center (GSFC) is developing a pulsed lidar approach for an integrated path differential absorption (IPDA) lidar to allow global measurements of atmospheric CO2 column densities from space. Our group has developed, and successfully flown, an airborne pulsed lidar instrument that uses two tunable pulsed laser transmitters, allowing simultaneous measurement of a single CO2 absorption line in the 1570 nm band, absorption of an O2 line pair in the oxygen A-band (765 nm), range, and atmospheric backscatter profiles in the same path. Both lasers are pulsed at 10 kHz, and the two absorption line regions are typically sampled at a 300 Hz rate. A space-based version of this lidar must have a much larger lidar power-area product because of the approximately 40 times longer range and faster along-track velocity compared to the airborne instrument. Initial link budget analysis indicated that for a 400 km orbit, a 1.5 m diameter telescope, and a 10 second integration time, a laser energy of approximately 2 mJ is required to attain the precision needed for each measurement. To meet this energy requirement, we have pursued parallel power scaling efforts to enable space-based lidar measurement of CO2 concentrations. These included a multiple-aperture approach consisting of multi-element large mode area fiber amplifiers and a single-aperture approach consisting of a multi-pass Er:Yb:Phosphate glass based planar waveguide amplifier (PWA). In this paper we will present our laser amplifier design approaches and preliminary results.
Calibration and application of B/Ca, Cd/Ca, and δ^(11)B in Neogloboquadrina pachyderma (sinistral) to constrain CO_2 uptake in the subpolar North Atlantic during the last deglaciation
The North Atlantic and Norwegian Sea are prominent sinks of atmospheric CO_2 today, but their roles in the past remain poorly constrained. In this study, we attempt to use B/Ca and δ^(11)B ratios in the planktonic foraminifera Neogloboquadrina pachyderma (sinistral variety) to reconstruct subsurface water pH and pCO_2 changes in the polar North Atlantic during the last deglaciation. Comparison of core-top results with nearby hydrographic data shows that B/Ca in N. pachyderma (s) is mainly controlled by seawater B(OH)_4−/HCO_3− with a roughly constant partition coefficient (K_D = [B/Ca]_(CaCO_3) / [B(OH)_4−/HCO_3−]_(seawater)) of 1.48 ± 0.15 × 10^(−3) (2σ), and δ^(11)B in this species is offset below δ^(11)B of the borate in seawater by 3.38 ± 0.71‰ (2σ). These values represent our best estimates with the sparse available hydrographic data close to our core-tops. More culturing and sediment trap work is needed to improve our understanding of boron incorporation into N. pachyderma (s). Application of a constant K_D of 1.48 × 10^(−3) to high resolution N. pachyderma (s) B/Ca records from two adjacent cores off Iceland shows that subsurface pCO_2 at the habitat depth of N. pachyderma (s) (~50 m) generally followed the atmospheric CO_2 trend but with negative offsets of ~10–50 ppmv during 19–10 ka. These B/Ca-based reconstructions are supported by independent estimates from low-resolution δ^(11)B measurements in the same cores. We also calibrate and apply Cd/Ca in N. pachyderma (s) to reconstruct nutrient levels for the same down cores. Like today's North Atlantic, past subsurface pCO_2 variability off Iceland was significantly correlated with nutrient changes that might be linked to surface nutrient utilization and mixing within the upper water column.
Because surface pCO_2 (at 0 m water depth) is always lower than at deeper depths and if the application of a constant K_D is valid, our results suggest that the polar North Atlantic has remained a CO_2 sink during the calcification seasons of N. pachyderma (s) over the last deglaciation.
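As a worked reading of the partition-coefficient definition in this abstract, the seawater borate-to-bicarbonate ratio follows from a measured B/Ca by rearrangement. The B/Ca value of 60 µmol/mol used below is a hypothetical input chosen only for illustration, not a measurement from the study, and converting the resulting ratio to pH or pCO_2 would need further carbonate-system constraints that the abstract does not give.

```latex
\[
K_D = \frac{[\mathrm{B/Ca}]_{\mathrm{CaCO_3}}}
           {[\mathrm{B(OH)_4^-}/\mathrm{HCO_3^-}]_{\mathrm{seawater}}}
\quad\Longrightarrow\quad
\left[\frac{\mathrm{B(OH)_4^-}}{\mathrm{HCO_3^-}}\right]_{\mathrm{seawater}}
= \frac{[\mathrm{B/Ca}]_{\mathrm{CaCO_3}}}{K_D}
\]
% Hypothetical example: B/Ca = 60 umol/mol with the reported K_D = 1.48e-3
\[
\frac{60 \times 10^{-6}}{1.48 \times 10^{-3}} \approx 0.0405
\]
```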