Near-optimal Assembly for Shotgun Sequencing with Noisy Reads
Recent work identified the fundamental limits on the information requirements
in terms of read length and coverage depth required for successful de novo
genome reconstruction from shotgun sequencing data, based on the idealistic
assumption of no errors in the reads (noiseless reads). In this work, we show
that even when there is noise in the reads, one can successfully reconstruct
with information requirements close to the noiseless fundamental limit. A new
assembly algorithm, X-phased Multibridging, is designed based on a
probabilistic model of the genome. It is shown through analysis to perform well
on the model, and through simulations to perform well on real genomes.
First Author Advantage: Citation Labeling in Research
Citations among research papers, and the networks they form, are the primary
object of study in scientometrics. The act of making a citation reflects the
citer's knowledge of the related literature, and of the work being cited. We
aim to gain insight into this process by studying citation keys: user-chosen
labels to identify a cited work. Our main observation is that the first listed
author is disproportionately represented in such labels, implying a strong
mental bias towards the first author.
Comment: Computational Scientometrics: Theory and Applications at The 22nd
CIKM 201
Randomly Charged Polymers, Random Walks, and Their Extremal Properties
Motivated by an investigation of ground state properties of randomly charged
polymers, we discuss the size distribution of the largest Q-segments (segments
with total charge Q) in such N-mers. Upon mapping the charge sequence to
one-dimensional random walks (RWs), this corresponds to finding the
probability for the largest segment with total displacement Q in an N-step RW
to have length L. Using analytical, exact enumeration, and Monte Carlo methods,
we reveal the complex structure of the probability distribution in the large N
limit. In particular, the size of the longest neutral segment has a
distribution with a square-root singularity at l=L/N=1, an essential
singularity at l=0, and a discontinuous derivative at l=1/2. The behavior near
l=1 is related to another interesting RW problem which we call the "staircase
problem". We also discuss the generalized problem for d-dimensional RWs.
Comment: 33 pages, 19 Postscript figures, RevTe
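
The mapping from charge sequences to random walks described in the abstract above can be illustrated with a minimal sketch. It assumes charges of +1 or -1 drawn independently with equal probability, which the abstract does not state explicitly, and the function names longest_q_segment and sample_scaled_lengths are hypothetical, chosen only for this illustration. A segment has total charge Q exactly when the walk's positions at its two ends differ by Q, so the longest such segment can be found from the earliest visit to each position.

```python
import random


def longest_q_segment(charges, q=0):
    """Return the length of the longest contiguous segment with total charge q.

    The running sum of the charges is the position of a one-dimensional
    random walk; a segment (j, i] has total charge q exactly when
    position[i] - position[j] == q.
    """
    first_seen = {0: 0}  # walk position -> earliest step at which it occurs
    position = 0
    best = 0
    for step, charge in enumerate(charges, start=1):
        position += charge
        first_seen.setdefault(position, step)
        start = first_seen.get(position - q)
        if start is not None:
            best = max(best, step - start)
    return best


def sample_scaled_lengths(n, trials, q=0, seed=0):
    """Monte Carlo samples of l = L/N for random +1/-1 charge sequences.

    Assumption: charges are independent and equally likely to be +1 or -1.
    """
    rng = random.Random(seed)
    samples = []
    for _ in range(trials):
        charges = [rng.choice((-1, 1)) for _ in range(n)]
        samples.append(longest_q_segment(charges, q) / n)
    return samples
```

A histogram of sample_scaled_lengths(n, trials) for large n gives a rough numerical picture of the distribution of l for neutral segments; it is an illustration only and does not reproduce the paper's analytical or exact-enumeration results.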