
    Iterative Random Forests to detect predictive and stable high-order interactions

    Genomics has revolutionized biology, enabling the interrogation of whole transcriptomes, genome-wide binding sites for proteins, and many other molecular processes. However, individual genomic assays measure elements that interact in vivo as components of larger molecular machines. Understanding how these high-order interactions drive gene expression presents a substantial statistical challenge. Building on Random Forests (RF), Random Intersection Trees (RITs), and through extensive, biologically inspired simulations, we developed the iterative Random Forest algorithm (iRF). iRF trains a feature-weighted ensemble of decision trees to detect stable, high-order interactions with the same order of computational cost as RF. We demonstrate the utility of iRF for high-order interaction discovery in two prediction problems: enhancer activity in the early Drosophila embryo and alternative splicing of primary transcripts in human-derived cell lines. In Drosophila, among the 20 pairwise transcription factor interactions iRF identifies as stable (returned in more than half of bootstrap replicates), 80% have been previously reported as physical interactions. Moreover, novel third-order interactions, e.g. between Zelda (Zld), Giant (Gt), and Twist (Twi), suggest high-order relationships that are candidates for follow-up experiments. In human-derived cells, iRF re-discovered a central role of H3K36me3 in chromatin-mediated splicing regulation, and identified novel 5th and 6th order interactions, indicative of multi-valent nucleosomes with specific roles in splicing regulation. By decoupling the order of interactions from the computational cost of identification, iRF opens new avenues of inquiry into the molecular mechanisms underlying genome biology.

    Refining interaction search through signed iterative Random Forests

    Advances in supervised learning have enabled accurate prediction in biological systems governed by complex interactions among biomolecules. However, state-of-the-art predictive algorithms are typically black boxes, learning statistical interactions that are difficult to translate into testable hypotheses. The iterative Random Forest algorithm (iRF) took a step towards bridging this gap by providing a computationally tractable procedure to identify the stable, high-order feature interactions that drive the predictive accuracy of Random Forests (RF). Here we refine the interactions identified by iRF to explicitly map responses as a function of interacting features. Our method, signed iRF (s-iRF), describes subsets of rules that frequently occur on RF decision paths. We refer to these rule subsets as signed interactions. Signed interactions share not only the same set of interacting features but also exhibit similar thresholding behavior, and thus describe a consistent functional relationship between interacting features and responses. We describe stable and predictive importance metrics (SPIMs) to rank signed interactions. For each SPIM, we define null importance metrics that characterize its expected behavior under known structure. We evaluate our proposed approach in biologically inspired simulations and two case studies: predicting enhancer activity and spatial gene expression patterns. In the case of enhancer activity, s-iRF recovers one of the few experimentally validated high-order interactions and suggests novel enhancer elements where this interaction may be active. In the case of spatial gene expression patterns, s-iRF recovers all 11 reported links in the gap gene network. By refining the process of interaction recovery, our approach has the potential to guide mechanistic inquiry into systems whose scale and complexity are beyond human comprehension.

    Firm Size, Technical Change and Wages: Evidence from the Pork Sector from 1990-2005

    A long-standing puzzle in labor economics has been the positive relationship between wages and firm size. Even after controlling for workers' observed characteristics such as education, work experience, gender, and geographic location, a significant firm size wage effect averaging 15 percent remains. This paper investigates whether the size-wage premium on hog farms persists over time and whether the magnitude is growing or shrinking. The paper pays particular attention to the matching process by which workers are allocated to farms of different size and technology use, and whether the matching process may explain differences in wages across farms. The study relies on four surveys of employees on hog farms collected in 1990, 1995, 2000, and 2005. The surveys were conducted across the United States. The data allow us to evaluate how farm size and technology adoption have changed over time and how employee pay has changed in response to these changes. Detailed investigations of these pay differences between small and large hog farms and between farms using few and many technologies show that the differences cannot be explained away by differences in workers' education or work experience, or in the geographic location of the farm. Although more educated and experienced workers are more likely to work on larger and more technologically advanced hog farms, the positive relationships between wages and both farm size and technology remain large and statistically significant when differences in observable worker attributes are controlled. Furthermore, these effects are reinforcing in that large hog farms also adopt more technologies, and so the firm size effect persists even after differences in the number of technologies are held constant. The size-wage and technology-wage premia have persisted over time, and we cannot reject the null hypothesis that the premia are constant over the sample period. Keywords: Agribusiness, Livestock Production/Industries

    Non-gaussianity of optical emission lines in SDSS star-forming galaxies and its implications on galactic outflows

    The shape of emission lines in the optical spectra of star-forming galaxies reveals the kinematics of the diffuse gaseous component. We analyse the shape of prominent emission lines in a sample of ~53,000 star-forming galaxies from the Sloan Digital Sky Survey, focusing on departures from gaussianity. Departures from a single gaussian profile allow us to probe the motion of gas and to assess the role of outflows. The sample is divided into groups according to their stellar velocity dispersion and star formation rate. The spectra within each group are stacked to improve the signal-to-noise ratio of the emission lines, to remove individual signatures, and to enhance the effect of star formation rate on the shapes of the emission lines. The moments of the emission lines, including kurtosis and skewness, are determined. We find that most of the emission lines in strong star-forming systems unequivocally feature negative kurtosis. This signature is present in Hβ, Hα, [N II] and [S II] in massive galaxies with high star formation rates. We interpret it as evidence of radial outflows of ionised gas driven by the star formation of the galaxies. Also, most of the emission lines in low-mass systems with high star formation rates feature negative skewness, and we interpret it as evidence of dust obscuration in the galactic disk. These signatures are, however, absent in the [O III] line, which is believed to trace a different gas component. The observed trend is significantly stronger in face-on galaxies, indicating that star formation drives the outflows along the galactic rotation axis, presumably the path of least resistance. The data suggest that outflows driven by star formation exert accumulated impacts on the interstellar medium, and the outflow signature is more evident in older galaxies as they have experienced a longer total duration of star formation. Comment: 16 pages, 13 figures, accepted for publication in PAS
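
    As an illustration of the moment analysis described in the abstract above, the following minimal Python sketch computes the flux-weighted skewness and excess kurtosis of a line profile. It is not the authors' pipeline: the function name `line_moments`, the velocity grid, and the two synthetic profiles are assumptions made for this example, and "negative kurtosis" is taken here to mean kurtosis below the Gaussian value of 3.

```python
import math


def line_moments(velocity, flux):
    """Return (mean, sigma, skewness, excess kurtosis) of a line profile.

    The continuum-subtracted flux is treated as a weight on each velocity bin.
    """
    total = sum(flux)
    weights = [f / total for f in flux]
    mean = sum(w * v for w, v in zip(weights, velocity))
    variance = sum(w * (v - mean) ** 2 for w, v in zip(weights, velocity))
    sigma = math.sqrt(variance)
    skewness = sum(w * (v - mean) ** 3 for w, v in zip(weights, velocity)) / sigma**3
    # Subtract 3 so that a Gaussian profile has excess kurtosis 0.
    excess_kurtosis = (
        sum(w * (v - mean) ** 4 for w, v in zip(weights, velocity)) / sigma**4 - 3.0
    )
    return mean, sigma, skewness, excess_kurtosis


# Hypothetical velocity grid in km/s (assumed for illustration only).
velocity = [-1000.0 + 0.5 * i for i in range(4001)]

# Synthetic profiles: a Gaussian and a flat-topped (boxier) line.
gaussian = [math.exp(-0.5 * (v / 100.0) ** 2) for v in velocity]
flat_top = [math.exp(-0.5 * (v / 100.0) ** 4) for v in velocity]

gaussian_moments = line_moments(velocity, gaussian)
flat_top_moments = line_moments(velocity, flat_top)

print("Gaussian skewness, excess kurtosis:", gaussian_moments[2], gaussian_moments[3])
print("Flat-top skewness, excess kurtosis:", flat_top_moments[2], flat_top_moments[3])
```

    In this sketch the Gaussian profile gives both moments close to zero, while the flat-topped profile gives a negative excess kurtosis, the kind of departure from gaussianity the abstract refers to.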

    Dynamics of correlations due to a phase noisy laser

    We analyze the dynamics of various kinds of correlations present between two initially entangled independent qubits, each one subject to a local phase noisy laser. We give explicit expressions of the relevant quantifiers of correlations for the general case of single-qubit unital evolution, which includes the case of a phase noisy laser. Although the light field is treated as classical, we find that this model can describe revivals of quantum correlations. Two different dynamical regimes of decay of correlations occur, a Markovian one (exponential decay) and a non-Markovian one (oscillatory decay with revivals) depending on the values of system parameters. In particular, in the non-Markovian regime, quantum correlations quantified by quantum discord show an oscillatory decay faster than that of classical correlations. Moreover, there are time regions where nonzero discord is present while entanglement is zero. Comment: 7 pages, 3 figures, accepted for publication in Phys. Scripta, special issue for CEWQO 2011 proceeding

    Laser Amplifier Development for the Remote Sensing of CO2 from Space

    Accurate global measurements of tropospheric CO2 mixing ratios are needed to study CO2 emissions and CO2 exchange with the land and oceans. NASA Goddard Space Flight Center (GSFC) is developing a pulsed lidar approach for an integrated path differential absorption (IPDA) lidar to allow global measurements of atmospheric CO2 column densities from space. Our group has developed, and successfully flown, an airborne pulsed lidar instrument that uses two tunable pulsed laser transmitters allowing simultaneous measurement of a single CO2 absorption line in the 1570 nm band, absorption of an O2 line pair in the oxygen A-band (765 nm), range, and atmospheric backscatter profiles in the same path. Both lasers are pulsed at 10 kHz, and the two absorption line regions are typically sampled at a 300 Hz rate. A space-based version of this lidar must have a much larger lidar power-area product due to the approximately 40 times longer range and faster along-track velocity compared to the airborne instrument. Initial link budget analysis indicated that for a 400 km orbit, a 1.5 m diameter telescope and a 10 second integration time, a laser energy of approximately 2 mJ is required to attain the precision needed for each measurement. To meet this energy requirement, we have pursued parallel power scaling efforts to enable space-based lidar measurement of CO2 concentrations. These included a multiple-aperture approach consisting of multi-element large mode area fiber amplifiers and a single-aperture approach consisting of a multi-pass Er:Yb:Phosphate glass based planar waveguide amplifier (PWA). In this paper we will present our laser amplifier design approaches and preliminary results.

    Calibration and application of B/Ca, Cd/Ca, and δ^(11)B in Neogloboquadrina pachyderma (sinistral) to constrain CO_2 uptake in the subpolar North Atlantic during the last deglaciation

    The North Atlantic and Norwegian Sea are prominent sinks of atmospheric CO_2 today, but their roles in the past remain poorly constrained. In this study, we attempt to use B/Ca and δ^(11)B ratios in the planktonic foraminifera Neogloboquadrina pachyderma (sinistral variety) to reconstruct subsurface water pH and pCO_2 changes in the polar North Atlantic during the last deglaciation. Comparison of core-top results with nearby hydrographic data shows that B/Ca in N. pachyderma (s) is mainly controlled by seawater B(OH)_4−/HCO_3− with a roughly constant partition coefficient (K_D = [B/Ca]_(CaCO_3) / [B(OH)_4−/HCO_3−]_(seawater)) of 1.48 ± 0.15 × 10^(−3) (2σ), and δ^(11)B in this species is offset below δ^(11)B of the borate in seawater by 3.38 ± 0.71‰ (2σ). These values represent our best estimates with the sparse available hydrographic data close to our core-tops. More culturing and sediment trap work is needed to improve our understanding of boron incorporation into N. pachyderma (s). Application of a constant K_D of 1.48 × 10^(−3) to high resolution N. pachyderma (s) B/Ca records from two adjacent cores off Iceland shows that subsurface pCO_2 at the habitat depth of N. pachyderma (s) (~50 m) generally followed the atmospheric CO_2 trend but with negative offsets of ~10–50 ppmv during 19–10 ka. These B/Ca-based reconstructions are supported by independent estimates from low-resolution δ^(11)B measurements in the same cores. We also calibrate and apply Cd/Ca in N. pachyderma (s) to reconstruct nutrient levels for the same down cores. Like today's North Atlantic, past subsurface pCO_2 variability off Iceland was significantly correlated with nutrient changes that might be linked to surface nutrient utilization and mixing within the upper water column. Because surface pCO_2 (at 0 m water depth) is always lower than at deeper depths, and if the application of a constant K_D is valid, our results suggest that the polar North Atlantic has remained a CO_2 sink during the calcification seasons of N. pachyderma (s) over the last deglaciation.
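
    The partition-coefficient definition in the abstract above can be rearranged to estimate the seawater B(OH)_4−/HCO_3− ratio from a measured shell B/Ca. The Python sketch below shows only that arithmetic step. The K_D value and its 2σ uncertainty come from the abstract; the function names, the assumption that B/Ca is reported in µmol/mol, and the example B/Ca value are hypothetical and chosen for illustration, and the further conversion to pH or pCO_2 is not attempted here.

```python
# Partition coefficient and its 2-sigma uncertainty, as quoted in the abstract.
K_D = 1.48e-3
K_D_2SIGMA = 0.15e-3


def borate_to_bicarbonate(b_ca_umol_per_mol, k_d=K_D):
    """Seawater B(OH)4-/HCO3- (mol/mol) implied by shell B/Ca.

    Rearranges K_D = [B/Ca]_CaCO3 / [B(OH)4-/HCO3-]_seawater.
    Assumes B/Ca is given in umol/mol (an assumption of this sketch).
    """
    b_ca_mol_per_mol = b_ca_umol_per_mol * 1e-6
    return b_ca_mol_per_mol / k_d


def borate_to_bicarbonate_range(b_ca_umol_per_mol):
    """Lower and upper ratio estimates from the 2-sigma bounds on K_D."""
    low = borate_to_bicarbonate(b_ca_umol_per_mol, K_D + K_D_2SIGMA)
    high = borate_to_bicarbonate(b_ca_umol_per_mol, K_D - K_D_2SIGMA)
    return low, high


# Hypothetical B/Ca value, not taken from the study.
example_b_ca = 60.0
ratio = borate_to_bicarbonate(example_b_ca)
ratio_low, ratio_high = borate_to_bicarbonate_range(example_b_ca)

print(f"B(OH)4-/HCO3- = {ratio:.4f} (range {ratio_low:.4f} to {ratio_high:.4f})")
```

    Because K_D sits in the denominator, a larger K_D gives a smaller implied ratio, which is why the upper bound on K_D maps to the lower bound on the ratio.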

    Design of a Direct-Detection Wind and Aerosol Lidar for Mars Orbit

    The present knowledge of the Mars atmosphere is greatly limited by a lack of global measurements of winds and aerosols. Hence, measurements of height-resolved wind and aerosol profiles are a priority for new Mars orbiting missions. We have designed a direct-detection lidar (MARLI) to provide global measurements of dust, winds and water ice profiles from Mars orbit. From a 400-km polar orbit, the instrument is designed to provide wind and backscatter measurements with a vertical resolution of 2 km and an along-track resolution of 2° in latitude. The instrument uses a single-frequency, seeded Nd:YAG laser that emits 4 mJ pulses at 1064 nm at a 250 Hz pulse rate. The receiver utilizes a 50-cm diameter telescope and a double-edge Fabry-Pérot etalon as a frequency discriminator to measure the Doppler shift of the aerosol-backscatter profiles. The receiver also includes a polarization-sensitive channel to detect the cross-polarized backscatter profiles from water ice. The receiver uses a sensitive 4 × 4 pixel HgCdTe avalanche photodiode array as a detector for all signals. Here we describe the measurement concept and instrument design, and calculate the instrument's performance for several cases of Mars atmospheric conditions. The calculations show that under a range of atmospheric conditions MARLI is capable of measuring wind speed profiles with random error of 2–4 m/s within the first three scale heights, enabling vertically resolved mapping of transport processes in this important region of the atmosphere.
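
    To give a sense of scale for the Doppler measurement mentioned in the abstract above, the Python sketch below applies the standard backscatter relation Δν = 2·v/λ, where v is the line-of-sight wind speed and λ the laser wavelength. The abstract does not state this relation or any frequency figures; only the 1064 nm wavelength is taken from it. The function names and the example speeds are assumptions for illustration, and the sketch says nothing about how the etalon discriminator converts the shift into a signal.

```python
# Laser wavelength quoted in the abstract (Nd:YAG, 1064 nm), in metres.
NDYAG_WAVELENGTH_M = 1064e-9


def doppler_shift_hz(line_of_sight_speed_m_s, wavelength_m=NDYAG_WAVELENGTH_M):
    """Frequency shift of backscattered light for a given line-of-sight speed.

    The factor of 2 accounts for the round trip: the moving aerosol both
    receives and re-emits Doppler-shifted light.
    """
    return 2.0 * line_of_sight_speed_m_s / wavelength_m


def line_of_sight_speed_m_s(shift_hz, wavelength_m=NDYAG_WAVELENGTH_M):
    """Inverse relation: line-of-sight speed implied by a measured shift."""
    return shift_hz * wavelength_m / 2.0


# Example line-of-sight speeds in m/s (illustrative values only).
for speed in (1.0, 2.0, 4.0):
    shift_mhz = doppler_shift_hz(speed) / 1e6
    print(f"{speed:.1f} m/s -> {shift_mhz:.2f} MHz")
```

    At 1064 nm each 1 m/s of line-of-sight motion corresponds to a shift of a little under 2 MHz, which indicates the frequency sensitivity a discriminator needs for wind errors of a few m/s.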