36 research outputs found

    pkerpedjiev/gene-citation-counts: Code from Nature News article

    The code used in the Nature News popular genes article: https://www.nature.com/articles/d41586-017-07291-

    NCBI genes with citation counts

    This dataset is a tab-separated file containing a list of human genes as listed in NCBI's Gene database and annotated with citation counts. The columns are:
    taxid    gene_id    unused    gene_symbol    gene_description    gene_type    citation_count
    Example:
    9606    3553    -    IL1B    interleukin 1 beta    protein-coding    2462
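    The column layout described for this dataset can be read with a few lines of Python. The following is a minimal sketch, not code from the dataset's authors: the names `GeneRecord` and `parse_gene_citations` are hypothetical, and it assumes each data row has exactly the seven tab-separated fields listed in the description. Whether the real file carries a header row is not stated, so rows whose first field is not numeric are skipped.

```python
import csv
import io
from typing import NamedTuple


class GeneRecord(NamedTuple):
    """One row of the file (hypothetical type; field names follow the listed columns)."""
    taxid: int
    gene_id: int
    unused: str
    gene_symbol: str
    gene_description: str
    gene_type: str
    citation_count: int


def parse_gene_citations(handle):
    """Parse tab-separated rows from an open text handle into GeneRecord objects."""
    records = []
    # QUOTE_NONE keeps any quote characters in gene descriptions as literal text.
    for row in csv.reader(handle, delimiter="\t", quoting=csv.QUOTE_NONE):
        # Assumption: valid rows have 7 fields and a numeric taxid;
        # this also skips blank lines and a possible header row.
        if len(row) != 7 or not row[0].isdigit():
            continue
        taxid, gene_id, unused, symbol, description, gene_type, count = row
        records.append(
            GeneRecord(int(taxid), int(gene_id), unused, symbol,
                       description, gene_type, int(count))
        )
    return records


# The example row from the dataset description.
sample = "9606\t3553\t-\tIL1B\tinterleukin 1 beta\tprotein-coding\t2462\n"
records = parse_gene_citations(io.StringIO(sample))
most_cited = max(records, key=lambda r: r.citation_count)
```

    To read a real file, pass an open file handle instead of the `io.StringIO` sample.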

    Adaptable probabilistic mapping of short reads using position specific scoring matrices

    BACKGROUND: Modern DNA sequencing methods produce vast amounts of data that often require mapping to a reference genome. Most existing programs use the number of mismatches between the read and the genome as a measure of quality. This approach lacks a statistical foundation and can, for some data types, result in many wrongly mapped reads. Here we present a probabilistic mapping method based on position-specific scoring matrices, which can take into account not only the quality scores of the reads but also user-specified models of evolution and data-specific biases. RESULTS: We show how evolution, data-specific biases, and sequencing errors are naturally dealt with probabilistically. Our method achieves better results than Bowtie and BWA on simulated and real ancient and PAR-CLIP reads, as well as on simulated reads from the AT-rich organism P. falciparum, when modeling the biases of these data. For simulated Illumina reads, the method has consistently higher sensitivity for both single-end and paired-end data. We also show that our probabilistic approach can limit the problem of random matches from short reads of contamination and that it improves the mapping of real reads from one organism (D. melanogaster) to a related genome (D. simulans). CONCLUSION: The presented work is an implementation of a novel approach to short read mapping where quality scores, prior mismatch probabilities and mapping qualities are handled in a statistically sound manner. The resulting implementation provides not only a tool for biologists working with low-quality and/or biased sequencing data but also a demonstration of the feasibility of using a probability-based alignment method on real and simulated data sets.
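    The idea of turning a read and its quality scores into a position-specific scoring matrix can be illustrated with a small sketch. This is not the authors' implementation: it covers only the quality-score part, leaves out the models of evolution and data-specific biases that the abstract mentions, and assumes a uniform background of 0.25 per base and an exhaustive scan instead of an index. All function names here are hypothetical.

```python
import math

BASES = "ACGT"


def phred_to_error(q):
    """Convert a Phred quality score to an error probability, capped at 0.75."""
    # At e = 0.75 all four bases are equally likely, so the call is uninformative.
    return min(10 ** (-q / 10), 0.75)


def read_to_pssm(read, quals, background=0.25):
    """Build one log-odds column per read position from the base call and its quality."""
    pssm = []
    for base, q in zip(read, quals):
        e = phred_to_error(q)
        column = {}
        for b in BASES:
            if base not in BASES:
                p = 0.25            # ambiguous call (e.g. N): uniform over bases
            elif b == base:
                p = 1 - e           # the called base is correct
            else:
                p = e / 3           # error spread evenly over the other three bases
            column[b] = math.log2(p / background)
        pssm.append(column)
    return pssm


def score_window(pssm, window):
    """Sum the log-odds scores of a genome window; unknown genome bases score 0."""
    return sum(col.get(b, 0.0) for col, b in zip(pssm, window))


def best_hit(pssm, genome):
    """Return (position, score) of the highest-scoring ungapped placement."""
    n = len(pssm)
    positions = range(len(genome) - n + 1)
    pos = max(positions, key=lambda i: score_window(pssm, genome[i:i + n]))
    return pos, score_window(pssm, genome[pos:pos + n])


pssm = read_to_pssm("ACGT", [30, 30, 30, 30])
position, score = best_hit(pssm, "TTACGTTT")
```

    Because each column depends on the quality score, a mismatch at a low-quality position costs less than one at a high-quality position, which a plain mismatch count cannot express.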

    abdenlab/oxbow: v0.3.1

    What's Changed
    - Simplify GFF/GTF reader types by @GarrettNg in https://github.com/abdenlab/oxbow/pull/53
    - Fastq file like objects and cleanup by @GarrettNg in https://github.com/abdenlab/oxbow/pull/52
    Full Changelog: https://github.com/abdenlab/oxbow/compare/v0.3.0...v0.3.1

    3D based on 2D: Forgi 2.0 Extended Data

    Supplementary data for "3D based on 2D: Calculating helix angles and stacking patterns using forgi 2.0, an RNA Python library centered on secondary structure elements."