
    Near-optimal Assembly for Shotgun Sequencing with Noisy Reads

    Recent work identified the fundamental limits on the information requirements, in terms of read length and coverage depth, for successful de novo genome reconstruction from shotgun sequencing data, under the idealized assumption that the reads contain no errors (noiseless reads). In this work, we show that even when the reads are noisy, one can successfully reconstruct the genome with information requirements close to the noiseless fundamental limit. A new assembly algorithm, X-phased Multibridging, is designed based on a probabilistic model of the genome. Analysis shows that it performs well on the model, and simulations show that it performs well on real genomes.

    First Author Advantage: Citation Labeling in Research

    Citations among research papers, and the networks they form, are the primary object of study in scientometrics. The act of making a citation reflects the citer's knowledge of the related literature and of the work being cited. We aim to gain insight into this process by studying citation keys: user-chosen labels that identify a cited work. Our main observation is that the first listed author is disproportionately represented in such labels, implying a strong mental bias towards the first author. Comment: Computational Scientometrics: Theory and Applications at The 22nd CIKM 201
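
    The measurement described above (which author's name shows up in a citation key) can be illustrated with a small tally. This is a minimal sketch, not the authors' method: it assumes each record is a (citation key, ordered list of author surnames) pair, and it uses a simple case-insensitive substring match. The function names and the toy records are hypothetical and are not data from the study.

```python
from collections import Counter


def author_positions_in_key(key, surnames):
    """Return 0-based author positions whose surname appears in the key.

    Assumption: a surname "appears" if it is a case-insensitive substring
    of the citation key (e.g. "smith2010" contains "Smith").
    """
    k = key.lower()
    return [i for i, s in enumerate(surnames) if s.lower() in k]


def tally_first_author(records):
    """Classify each (key, surnames) record by which author the key names."""
    counts = Counter()
    for key, surnames in records:
        positions = author_positions_in_key(key, surnames)
        if not positions:
            counts["none"] += 1    # no surname found in the key
        elif 0 in positions:
            counts["first"] += 1   # key mentions the first listed author
        else:
            counts["other"] += 1   # key mentions only later authors
    return counts


# Hypothetical toy records for illustration only.
records = [
    ("smith2010", ["Smith", "Jones"]),
    ("jones:graphs", ["Smith", "Jones"]),
    ("SJ10", ["Smith", "Jones"]),
    ("leeKim2012", ["Lee", "Kim"]),
]

print(tally_first_author(records))
```

    Substring matching is deliberately crude. Very short surnames can match by accident, and initial-style keys such as "SJ10" go uncounted. A real analysis would need a more careful matching rule.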

    Randomly Charged Polymers, Random Walks, and Their Extremal Properties

    Motivated by an investigation of the ground-state properties of randomly charged polymers, we discuss the size distribution of the largest Q-segments (segments with total charge Q) in such N-mers. When the charge sequence is mapped to a one-dimensional random walk (RW), this corresponds to finding the probability that the largest segment with total displacement Q in an N-step RW has length L. Using analytical, exact enumeration, and Monte Carlo methods, we reveal the complex structure of the probability distribution in the large-N limit. In particular, the size of the longest neutral segment has a distribution with a square-root singularity at l = L/N = 1, an essential singularity at l = 0, and a discontinuous derivative at l = 1/2. The behavior near l = 1 is related to another interesting RW problem, which we call the "staircase problem". We also discuss the generalized problem for d-dimensional RWs. Comment: 33 pages, 19 Postscript figures, RevTe
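
    The mapping from a charge sequence to a random walk can be made concrete. This is a minimal sketch, not the authors' code. It assumes each monomer carries a charge of +1 or -1 with equal probability, and the function names are hypothetical. With prefix sums S_0 = 0 and S_j = c_1 + ... + c_j, the segment (i, j] has total charge S_j - S_i. The longest Q-segment is therefore the largest j - i with S_j - S_i = Q, which a single pass can find by recording the earliest index at which each walk position is reached. Repeating this over random sequences gives a Monte Carlo sample of l = L/N.

```python
import random


def longest_q_segment(charges, q=0):
    """Length L of the longest contiguous segment whose charges sum to q.

    Uses the random-walk picture: S_0 = 0, S_j = c_1 + ... + c_j, and the
    segment (i, j] has total charge S_j - S_i. Returns 0 if no such segment.
    """
    first_seen = {0: 0}  # earliest step index at which each walk position occurs
    best = 0
    s = 0
    for j, c in enumerate(charges, start=1):
        s += c
        i = first_seen.get(s - q)  # earliest i < j with S_i = S_j - q
        if i is not None and j - i > best:
            best = j - i
        first_seen.setdefault(s, j)  # record after lookup, so i < j always
    return best


def sample_l(n, q=0, trials=10_000, seed=0):
    """Monte Carlo samples of l = L/N for random +/-1 charge sequences.

    Assumption: charges are i.i.d. +1 or -1 with probability 1/2 each.
    """
    rng = random.Random(seed)
    samples = []
    for _ in range(trials):
        charges = [rng.choice((-1, 1)) for _ in range(n)]
        samples.append(longest_q_segment(charges, q) / n)
    return samples


print(longest_q_segment([1, 1, -1, -1]))   # whole chain is neutral
print(longest_q_segment([1, 1, -1, 1], q=2))
```

    For example, histogramming `sample_l(100)` gives a rough empirical picture of the distribution of the longest neutral segment. Exact enumeration and the large-N analysis described above are needed to resolve its singular features.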