htsint: a Python library for sequencing pipelines that combines data through gene set generation
Background: Sequencing technologies provide a wealth of details in terms of genes, expression, splice variants, polymorphisms, and other features. A standard for sequencing analysis pipelines is to put genomic or transcriptomic features into a context of known functional information, but the relationships between ontology terms are often ignored. For RNA-Seq, considering genes and their genetic variants at the group level enables a convenient way to both integrate annotation data and detect small coordinated changes between experimental conditions, a known caveat of gene level analyses.
Results: We introduce the high throughput data integration tool, htsint, as an extension to the commonly used gene set enrichment frameworks. The central aim of htsint is to compile annotation information from one or more taxa in order to calculate functional distances among all genes in a specified gene space. Spectral clustering is then used to partition the genes, thereby generating functional modules. The gene space can range from a targeted list of genes, like a specific pathway, all the way to an ensemble of genomes. Given a collection of gene sets and a count matrix of transcriptomic features (e.g. expression, polymorphisms), the gene sets produced by htsint can be tested for 'enrichment' or conditional differences using one of a number of commonly available packages.
Conclusion: The database and bundled tools to generate functional modules were designed with sequencing pipelines in mind, but the toolkit nature of htsint allows it to also be used in other areas of genomics. The software is freely available as a Python library through GitHub at https://github.com/ajrichards/htsint
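The partitioning step described in the Results (functional distances between genes, then spectral clustering into modules) can be illustrated with a small, dependency-free sketch. This is not htsint's API: the function name `spectral_bipartition`, the Gaussian kernel width `sigma`, the toy distance matrix, and the restriction to a two-way split are all assumptions made for illustration.

```python
import math

def spectral_bipartition(dist, sigma=1.0, iters=500):
    """Split items into two modules using the sign of the Fiedler vector.

    dist is a symmetric matrix of pairwise distances (hypothetical values here).
    """
    n = len(dist)
    # Gaussian kernel turns distances into similarities (no self-loops).
    w = [[math.exp(-dist[i][j] ** 2 / (2 * sigma ** 2)) if i != j else 0.0
          for j in range(n)] for i in range(n)]
    deg = [sum(row) for row in w]
    # Unnormalized graph Laplacian L = D - W.
    lap = [[(deg[i] if i == j else 0.0) - w[i][j] for j in range(n)]
           for i in range(n)]
    # Power iteration on (c*I - L). With the constant vector projected out,
    # the dominant eigenvector is the Fiedler vector of L.
    c = 2.0 * max(deg)  # upper bound on the largest eigenvalue of L
    v = [float(i) for i in range(n)]
    for _ in range(iters):
        mean = sum(v) / n
        v = [x - mean for x in v]
        v = [c * v[i] - sum(lap[i][j] * v[j] for j in range(n))
             for i in range(n)]
        norm = math.sqrt(sum(x * x for x in v))
        v = [x / norm for x in v]
    return [0 if x < 0 else 1 for x in v]

# Toy gene space: genes 0-2 are functionally close, genes 3-5 are close,
# and the two groups are far apart (made-up distances).
near, far = 0.2, 3.0
dist = [[0.0 if i == j else (near if (i < 3) == (j < 3) else far)
         for j in range(6)] for i in range(6)]
labels = spectral_bipartition(dist)
```

In this toy case the first three genes receive one label and the last three the other. A full pipeline would typically use more eigenvectors and a k-means step to obtain more than two modules.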
Clustering Analyses of 300,000 Photometrically Classified Quasars--II. The Excess on Very Small Scales
We study quasar clustering on small scales, modeling clustering amplitudes
using halo-driven dark matter descriptions. From 91 pairs on scales <35 kpc/h,
we detect only a slight excess in quasar clustering over our best-fit
large-scale model. Integrated across all redshifts, the implied quasar bias is
b_Q = 4.21+/-0.98 (b_Q = 3.93+/-0.71) at ~18 kpc/h (~28 kpc/h). Our best-fit
(real-space) power index is ~-2 (i.e., \xi(r) \propto r^{-2}), implying
steeper halo profiles than currently found in simulations. Alternatively,
quasar binaries with separation <35 kpc/h may trace merging galaxies, with
typical dynamical merger times t_d~(610+/-260)m^{-1/2} Myr/h, for quasars of
host halo mass m x 10^{12} Msolar/h. We find UVX quasars at ~28 kpc/h cluster
>5 times higher at z > 2, than at z < 2, at the level. However, as
the space density of quasars declines as z increases, an excess of quasar
binaries (over expectation) at z > 2 could be consistent with reduced merger
rates at z > 2 for the galaxies forming UVX quasars. Comparing our clustering
at ~28 kpc/h to a \xi(r)=(r/4.8 Mpc/h)^{-1.53} power-law, we find an upper
limit on any excess of a factor of 4.3+/-1.3, which, noting some caveats,
differs from large excesses recently measured for binary quasars, at
. We speculate that binary quasar surveys that are biased to z > 2
may find inflated clustering excesses when compared to models fit at z < 2. We
provide details of 111 photometrically classified quasar pairs with separations
<0.1'. Spectroscopy of these pairs could significantly constrain quasar
dynamics in merging galaxies.
Comment: 12 pages, 3 figures, 2 tables; uses emulateapj; accepted to Ap
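The two scaling relations quoted in this abstract are easy to evaluate numerically. The sketch below uses only the central values given in the text (a correlation length of 4.8 Mpc/h, a slope of -1.53, and a merger-time prefactor of 610 Myr/h) and ignores the quoted uncertainties. The function names are illustrative, and separations are assumed to be expressed in Mpc/h, so 28 kpc/h becomes 0.028.

```python
def xi_power_law(r_mpc_h, r0_mpc_h=4.8, gamma=-1.53):
    """Power-law correlation function xi(r) = (r / r0) ** gamma."""
    return (r_mpc_h / r0_mpc_h) ** gamma

def merger_time_myr_h(m, t0=610.0):
    """Central value of t_d ~ t0 * m**(-1/2) in Myr/h.

    m is the host halo mass in units of 10^12 Msolar/h.
    """
    return t0 * m ** -0.5

# Reference power law evaluated at the two small scales named in the abstract.
xi_28 = xi_power_law(0.028)  # ~28 kpc/h
xi_18 = xi_power_law(0.018)  # ~18 kpc/h

# Merger time for a host halo of 4 x 10^12 Msolar/h: 610 / sqrt(4) Myr/h.
t_d = merger_time_myr_h(4.0)
```

Because the slope is negative, the reference power law rises steeply toward small separations, which is the baseline against which any small-scale excess is measured.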
lpEdit: an editor to facilitate reproducible analysis via literate programming
Copyright 2013 Adam J Richards et al. This is an open-access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.
There is evidence to suggest that a surprising proportion of published experiments in science are difficult if not impossible to reproduce. The concepts of data sharing, leaving an audit trail and extensive documentation are fundamental to reproducible research, whether it is in the laboratory or as part of an analysis. In this work, we introduce a tool for documentation that aims to make analyses more reproducible in the general scientific community.
The application, lpEdit, is a cross-platform editor, written with PyQt4, that enables a broad range of scientists to carry out the analytic component of their work in a reproducible manner—through the use of literate programming.
Literate programming mixes code and prose to produce a final report that reads like an article or book. lpEdit targets researchers getting started with statistics or programming, so the hurdles associated with setting up a proper pipeline are kept to a minimum and the learning burden is reduced through the use of templates and documentation. The documentation for lpEdit is centered around learning by example, and accordingly we use several increasingly involved examples to demonstrate the software’s capabilities.
We first consider applications of lpEdit to process analyses mixing R and Python code with the LaTeX documentation system. Finally, we illustrate the use of lpEdit to conduct a reproducible functional analysis of high-throughput sequencing data, using the transcriptome of the butterfly species Pieris brassica
A Simple Likelihood Method for Quasar Target Selection
We present a new method for quasar target selection using photometric fluxes
and a Bayesian probabilistic approach. For our purposes we target quasars using
Sloan Digital Sky Survey (SDSS) photometry to a magnitude limit of g=22. The
efficiency and completeness of this technique are measured using the Baryon
Oscillation Spectroscopic Survey (BOSS) data, taken in 2010. This technique was
used for the uniformly selected (CORE) sample of targets in BOSS year one
spectroscopy to be realized in the 9th SDSS data release. When targeting at a
density of 40 objects per sq-deg (the BOSS quasar targeting density) the
efficiency of this technique in recovering z>2.2 quasars is 40%. The
completeness compared to all quasars identified in BOSS data is 65%. This paper
also describes possible extensions and improvements for this technique.
Comment: Updated to accepted version for publication in the Astrophysical
Journal. 10 pages, 10 figures, 3 table
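The two figures of merit quoted in this abstract can be made concrete with a short sketch. It assumes the usual definitions: efficiency is the fraction of selected targets that turn out to be quasars, and completeness is the fraction of all known quasars that were selected. The probabilities, truth labels, and function names below are made up for illustration and do not come from the paper or from BOSS data.

```python
def select_targets(probs, n_targets):
    """Return indices of the n_targets objects with the highest quasar probability."""
    ranked = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    return ranked[:n_targets]

def efficiency_and_completeness(selected, is_quasar):
    """Efficiency = hits / targets selected; completeness = hits / all quasars."""
    hits = sum(1 for i in selected if is_quasar[i])
    efficiency = hits / len(selected)
    completeness = hits / sum(is_quasar)
    return efficiency, completeness

# Hypothetical per-object quasar probabilities and spectroscopic truth labels.
probs = [0.9, 0.8, 0.7, 0.6, 0.3, 0.2, 0.1, 0.05]
is_quasar = [True, False, True, False, True, False, False, False]

# A fixed target budget plays the role of the targeting density.
selected = select_targets(probs, n_targets=4)
efficiency, completeness = efficiency_and_completeness(selected, is_quasar)
```

With this toy input, four targets are selected, two of which are quasars, so the efficiency is 1/2 and the completeness is 2/3. Raising the target budget generally trades efficiency for completeness.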
Hadronic Resonances from Lattice QCD
The determination of the pattern of hadronic resonances as predicted by
Quantum Chromodynamics requires the use of non-perturbative techniques. Lattice
QCD has emerged as the dominant tool for such calculations, and has produced
many QCD predictions which can be directly compared to experiment. The concepts
underlying lattice QCD are outlined, methods for calculating excited states are
discussed, and results from an exploratory Nucleon and Delta baryon spectrum
study are presented.
Comment: 8 pages, VII Latin American Symposium on Nuclear Physics and
Application
Results and Frontiers in Lattice Baryon Spectroscopy
The Lattice Hadron Physics Collaboration (LHPC) baryon spectroscopy effort is
reviewed. To date the LHPC has performed exploratory Lattice QCD calculations
of the low-lying spectrum of Nucleon and Delta baryons. These calculations
demonstrate the effectiveness of our method by obtaining the masses of an
unprecedented number of excited states with definite quantum numbers. Future
work of the project is outlined.
Comment: To appear in the proceedings for the VII Latin American Symposium of
Nuclear Physics and Application