Jabba: hybrid error correction for long sequencing reads using maximal exact matches
Third generation sequencing platforms produce longer reads with higher error rates than second generation sequencing technologies. While the improved read length can provide useful information for downstream analysis, underlying algorithms are challenged by the high error rate. Error correction methods in which accurate short reads are used to correct noisy long reads are an attractive way to generate high-quality long reads. Methods that align short reads to long reads do not optimally use the information contained in the second generation data, and suffer from large runtimes. Recently, a new hybrid error correction method has been proposed, where the second generation data is first assembled into a de Bruijn graph, on which the long reads are then aligned. In this context we present Jabba, a hybrid method to correct long third generation reads by mapping them on a corrected de Bruijn graph that was constructed from second generation data. Unique to our method is that this mapping is constructed with a seed-and-extend methodology, using maximal exact matches as seeds. In addition to benchmark results, certain theoretical results concerning the possibilities and limitations of the use of maximal exact matches in the context of third generation reads are presented.
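The seeding step can be illustrated with a toy enumerator of maximal exact matches (MEMs): exact matches that cannot be extended left or right. This is only a sketch with a hypothetical `maximal_exact_matches` helper and a quadratic dynamic program; practical tools such as Jabba index the sequence with suffix-array-based structures instead.

```python
def maximal_exact_matches(read, ref, min_len=3):
    """Enumerate maximal exact matches (MEMs) between two strings.

    A MEM is an exact match extendable in neither direction.
    O(len(read) * len(ref)) dynamic program, for illustration only.
    Returns (read_start, ref_start, length) triples.
    """
    n, m = len(read), len(ref)
    prev = [0] * m  # longest common suffix lengths for previous read row
    mems = []
    for i in range(n):
        cur = [0] * m
        for j in range(m):
            if read[i] == ref[j]:
                # extend the common suffix ending at (i-1, j-1)
                cur[j] = (prev[j - 1] if j > 0 else 0) + 1
                # left-maximal by construction; check right-maximality
                right_blocked = (i == n - 1 or j == m - 1
                                 or read[i + 1] != ref[j + 1])
                if cur[j] >= min_len and right_blocked:
                    mems.append((i - cur[j] + 1, j - cur[j] + 1, cur[j]))
        prev = cur
    return mems
```

Each MEM then serves as a seed that the aligner extends along paths of the de Bruijn graph.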
High Performance P3M N-body code: CUBEP3M
This paper presents CUBEP3M, a publicly-available high performance
cosmological N-body code and describes many utilities and extensions that have
been added to the standard package. These include a memory-light runtime SO
halo finder, a non-Gaussian initial conditions generator, and a system of
unique particle identification. CUBEP3M is fast, its accuracy is tuneable to
optimize speed or memory, and it has been run on more than 27,000 cores,
achieving weak scaling within a factor of two of ideal even at this problem
size. The
code can be run in an extra-lean mode where the peak memory footprint for large
runs is as low as 37 bytes per particle, almost two times leaner than
other widely used N-body codes. However, load imbalances can increase this
requirement by a factor of two, such that fast configurations with all the
utilities enabled and load imbalances factored in require between 70 and 120
bytes per particle. CUBEP3M is well suited to studying large-scale
cosmological systems, where imbalances are not too large and adaptive
time-stepping is not essential. It has already been used for a broad range of
science applications that require either large samples of non-linear
realizations or very large dark matter N-body simulations, including
cosmological reionization, halo formation, baryonic acoustic oscillations, weak
lensing or non-Gaussian statistics. We discuss the structure, the accuracy,
known systematic effects and the scaling performance of the code and its
utilities, when applicable.
Comment: 20 pages, 17 figures; added halo profiles; updated to match MNRAS accepted version
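The quoted per-particle figures translate directly into a memory budget. A minimal sketch, using a hypothetical helper and a 2048^3-particle run chosen purely for illustration:

```python
def peak_memory_gib(n_particles, bytes_per_particle):
    """Peak memory footprint in GiB for an N-body run.

    Per the abstract: ~37 bytes/particle in CUBEP3M's extra-lean mode,
    70-120 bytes/particle with all utilities and load imbalance included.
    """
    return n_particles * bytes_per_particle / 2**30

# Illustrative 2048^3-particle run (2^33 particles):
lean = peak_memory_gib(2048**3, 37)    # extra-lean mode -> 296 GiB total
heavy = peak_memory_gib(2048**3, 120)  # worst quoted case -> 960 GiB total
```

The total is spread across MPI ranks, so the per-node requirement depends on the decomposition and on how even the particle load stays.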
CalFUSE v3: A Data-Reduction Pipeline for the Far Ultraviolet Spectroscopic Explorer
Since its launch in 1999, the Far Ultraviolet Spectroscopic Explorer (FUSE)
has made over 4600 observations of some 2500 individual targets. The data are
reduced by the Principal Investigator team at the Johns Hopkins University and
archived at the Multimission Archive at Space Telescope (MAST). The
data-reduction software package, called CalFUSE, has evolved considerably over
the lifetime of the mission. The entire FUSE data set has recently been
reprocessed with CalFUSE v3.2, the latest version of this software. This paper
describes CalFUSE v3.2, the instrument calibrations upon which it is based, and
the format of the resulting calibrated data files.
Comment: To appear in PASP; 29 pages, 13 figures; uses aastex, emulateapj
DUDE-Seq: Fast, Flexible, and Robust Denoising for Targeted Amplicon Sequencing
We consider the correction of errors from nucleotide sequences produced by
next-generation targeted amplicon sequencing. The next-generation sequencing
(NGS) platforms can provide a great deal of sequencing data thanks to their
high throughput, but the associated error rates often tend to be high.
Denoising in high-throughput sequencing has thus become a crucial process for
boosting the reliability of downstream analyses. Our methodology, named
DUDE-Seq, is derived from a general setting of reconstructing finite-valued
source data corrupted by a discrete memoryless channel and effectively corrects
substitution and homopolymer indel errors, the two major types of sequencing
errors in most high-throughput targeted amplicon sequencing platforms. Our
experimental studies with real and simulated datasets suggest that the proposed
DUDE-Seq not only outperforms existing alternatives in terms of
error-correction capability and time efficiency, but also boosts the
reliability of downstream analyses. Further, the flexibility of DUDE-Seq
enables its robust application to different sequencing platforms and analysis
pipelines by simple updates of the noise model. DUDE-Seq is available at
http://data.snu.ac.kr/pub/dude-seq
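The underlying DUDE principle can be illustrated in its simplest setting: a binary sequence observed through a binary symmetric channel with Hamming loss. The sketch below is an illustrative two-pass sliding-window denoiser, not the released tool; DUDE-Seq itself generalizes the idea to DNA alphabets and homopolymer indel errors.

```python
from collections import Counter

def dude_binary(z, delta, k=1):
    """Binary DUDE for a binary symmetric channel with crossover delta.

    Pass 1: count center symbols per double-sided context of length k.
    Pass 2: flip z[i] when the empirical context statistics, pushed
    through the inverted channel, favor the flipped symbol.
    """
    n = len(z)
    counts = Counter()
    for i in range(k, n - k):
        ctx = (tuple(z[i - k:i]), tuple(z[i + 1:i + k + 1]))
        counts[ctx, z[i]] += 1
    # Flip threshold from the BSC channel matrix + Hamming loss:
    # flip iff m[same] < thr * m[flipped]
    thr = 2 * delta * (1 - delta) / ((1 - delta) ** 2 + delta ** 2)
    out = list(z)
    for i in range(k, n - k):
        ctx = (tuple(z[i - k:i]), tuple(z[i + 1:i + k + 1]))
        if counts[ctx, z[i]] < thr * counts[ctx, 1 - z[i]]:
            out[i] = 1 - z[i]
    return out
```

Because the rule depends only on context counts and the channel model, swapping in a different alphabet or channel matrix (as DUDE-Seq does for sequencers) changes the statistics but not the two-pass structure.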
Circular polarization measurement in millimeter-wavelength spectral-line VLBI observations
This paper considers the problem of accurate measurement of circular
polarization in imaging spectral-line VLBI observations in the lambda=7 mm and
lambda=3 mm wavelength bands. This capability is especially valuable for the
full observational study of compact, polarized SiO maser components in the
near-circumstellar environment of late-type, evolved stars. Circular VLBI
polarimetry provides important constraints on SiO maser astrophysics, including
the theory of polarized maser emission transport, and on the strength and
distribution of the stellar magnetic field and its dynamical role in this
critical circumstellar region. Here we analyze the data model containing the
instrumental factors that limit the accuracy of circular polarization
measurements in such observations, and we present a corresponding
data-reduction algorithm to correct for them. The algorithm is an enhancement
of existing spectral line VLBI polarimetry methods using autocorrelation data
for calibration, but with innovations in bandpass determination,
autocorrelation polarization self-calibration, and general optimizations for
the case of low SNR, as applicable at these wavelengths. We present an example
data reduction at millimeter wavelengths and derive an estimate of the
predicted accuracy of the method of m_c < 0.5% at lambda=7 mm and
m_c < 0.5-1% at lambda=3 mm. Both the strengths and weaknesses of the proposed
algorithm are discussed, along with suggestions for future work.
Comment: 23 pages, 13 figures
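For a circular (R/L) feed basis, the fractional circular polarization m_c follows from the parallel-hand correlations via the standard Stokes relations. A minimal sketch that deliberately ignores the gain, bandpass, and leakage calibration that the paper's algorithm actually addresses:

```python
def circular_pol_fraction(rr, ll):
    """Fractional circular polarization m_c = V / I from calibrated
    parallel-hand correlations in a circular (R/L) feed basis.

    Stokes I = (RR + LL) / 2 and Stokes V = (RR - LL) / 2, so m_c is
    driven by the small RR-LL difference -- which is why residual gain
    errors between the R and L signal paths dominate the error budget.
    """
    stokes_i = 0.5 * (rr + ll)
    stokes_v = 0.5 * (rr - ll)
    return stokes_v / stokes_i
```

A 1% gain offset between the R and L paths mimics m_c of order 0.5%, comparable to the accuracies quoted above, which motivates the autocorrelation-based self-calibration the paper develops.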