225 research outputs found

Detection of microRNAs in color space

Author: Altschul
Antonio Marco
Applied
Applied
Bartel
Berezikov
Cai
Chen
Cloonan
David
Flicek
Friedlander
Goff
Hofacker
Homer
Kozomara
Langmead
Li
Li
Li
Lowe
Marco
Mardis
Morin
Moxon
Robin
Ruby
Rumble
Sam Griffiths-Jones
Sasson
Publication venue: 'Oxford University Press (OUP)'
Publication date: 09/12/2011

Motivation: Deep sequencing provides inexpensive opportunities to characterize the transcriptional diversity of known genomes. The AB SOLiD technology generates millions of short sequencing reads in color space; that is, the raw data is a sequence of colors, where each color represents 2 nt and each nucleotide is represented by two consecutive colors. This strategy is purported to have several advantages, including increased ability to distinguish sequencing errors from polymorphisms. Several programs have been developed to map short reads to genomes in color space. However, a number of previously unexplored technical issues arise when using SOLiD technology to characterize microRNAs. Results: Here we explore these technical difficulties. First, since the sequenced reads are longer than the biological sequences, every read is expected to contain linker fragments. The color-calling error rate increases toward the 3' end of the read such that recognizing the linker sequence for removal becomes problematic. Second, mapping in color space may lead to the loss of the first nucleotide of each read. We propose a sequential trimming and mapping approach to map small RNAs. Using our strategy, we reanalyze three published insect small RNA deep sequencing datasets and characterize 22 new microRNAs. Availability and implementation: A bash shell script to perform the sequential trimming and mapping procedure, called SeqTrimMap, is available at: http://www.mirbase.org/tools/seqtrimmap/ Contact: [email protected] Supplementary information: Supplementary data are available at Bioinformatics online.
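
The color-space idea in the abstract above can be illustrated with a minimal sketch. It assumes the widely documented di-base scheme in which A, C, G, T are coded 0 to 3 and each color is the XOR of two adjacent base codes, and it assumes reads begin with a known primer base (here T). The function names are hypothetical and are not part of SeqTrimMap.

```python
# Assumed 2-bit base codes; a color is the XOR of two adjacent base codes.
_CODE = {"A": 0, "C": 1, "G": 2, "T": 3}
_BASE = "ACGT"


def encode_color_space(primer_base: str, sequence: str) -> str:
    """Return the primer base followed by one color (0-3) per nucleotide."""
    previous = _CODE[primer_base]
    colors = []
    for base in sequence:
        current = _CODE[base]
        colors.append(str(previous ^ current))
        previous = current
    return primer_base + "".join(colors)


def decode_color_space(read: str) -> str:
    """Naively decode a color-space read back to nucleotides.

    Decoding is a running XOR starting from the primer base, so a single
    wrong color changes every base decoded after it.
    """
    previous = _CODE[read[0]]
    bases = []
    for color in read[1:]:
        previous ^= int(color)
        bases.append(_BASE[previous])
    return "".join(bases)


print(encode_color_space("T", "ACGT"))   # T3131
print(encode_color_space("T", "ATGT"))   # T3311: one substitution changes two adjacent colors
print(decode_color_space("T3031"))       # AATG: one color error corrupts all later bases
```

Under these assumptions a true substitution alters two adjacent colors, whereas an isolated color change is more likely a calling error. The first color also depends on the primer base rather than on two read nucleotides, which is one way to see why the first nucleotide needs special handling when mapping in color space.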

University of Essex Research Repository

Crossref

PubMed Central

The University of Manchester - Institutional Repository

Sensitive Long-Indel-Aware Alignment of Sequencing Reads

Author: Marschall Tobias
Schönhuth Alexander
Publication date: 01/01/2013

The tremendous advances in high-throughput sequencing technologies have made population-scale sequencing, as performed in the 1000 Genomes project and the Genome of the Netherlands project, possible. Next-generation sequencing has allowed genome-wide discovery of variations beyond single-nucleotide polymorphisms (SNPs), in particular of structural variations (SVs) like deletions, insertions, duplications, translocations, inversions, and even more complex rearrangements. Here, we design a read aligner with special emphasis on the following properties: (1) high sensitivity, i.e. find all (reasonable) alignments; (2) ability to find (long) indels; (3) statistically sound alignment scores; and (4) runtime fast enough to be applied to whole genome data. We compare performance to BWA, bowtie2, and stampy and find that our method is especially advantageous on reads containing larger indels.

arXiv.org e-Print Archive

Publications at Bielefeld University

Approximate Two-Party Privacy-Preserving String Matching with Linear Complexity

Author: Beck Martin
Kerschbaum Florian
Publication date: 12/02/2013

Consider two parties who want to compare their strings, e.g., genomes, but do not want to reveal them to each other. We present a system for privacy-preserving matching of strings, which differs from existing systems by providing a deterministic approximation instead of an exact distance. It is efficient (linear complexity), non-interactive, and does not involve a third party, which makes it particularly suitable for cloud computing. We extend our protocol such that it mitigates iterated differential attacks proposed by Goodrich. Further, an implementation of the system is evaluated and compared against current privacy-preserving string matching algorithms. Comment: 6 pages, 4 figures

arXiv.org e-Print Archive

Crossref

Languages of lossless seeds

Author: Břinda Karel
Publication venue: 'Open Publishing Association'
Publication date: 21/05/2014

Several algorithms for similarity search employ seeding techniques to quickly discard very dissimilar regions. In this paper, we study theoretical properties of lossless seeds, i.e., spaced seeds having full sensitivity. We prove that lossless seeds coincide with languages of certain sofic subshifts, hence they can be recognized by finite automata. Moreover, we show that these subshifts are fully given by the number of allowed errors k and the seed margin l. We also show that for a fixed k, optimal seeds must asymptotically satisfy l ~ m^(k/(k+1)). Comment: In Proceedings AFL 2014, arXiv:1405.527
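
The notion of a lossless seed can be made concrete with a brute-force check. This is a minimal sketch, not the automaton-based characterization of the paper. It assumes the common convention that `#` marks a position that must match and `-` a don't-care position, and that a seed is lossless for (m, k) if every placement of k mismatches in a window of length m leaves at least one seed offset whose `#` positions are all mismatch-free. The function name `is_lossless` is hypothetical, and the running time grows combinatorially with k.

```python
from itertools import combinations


def is_lossless(seed: str, m: int, k: int) -> bool:
    """Brute-force test of whether `seed` detects every alignment of
    length m that contains at most k mismatches."""
    required = [i for i, symbol in enumerate(seed) if symbol == "#"]
    offsets = range(m - len(seed) + 1)  # empty if the seed is longer than m
    # Adding mismatches can only destroy hits, so checking exactly
    # min(k, m) mismatches covers every case with fewer mismatches.
    for mismatches in combinations(range(m), min(k, m)):
        bad = set(mismatches)
        hit = any(all(offset + i not in bad for i in required)
                  for offset in offsets)
        if not hit:
            return False
    return True


print(is_lossless("##", 4, 1))    # True
print(is_lossless("##", 3, 1))    # False: a mismatch in the middle hits both offsets
print(is_lossless("#-#", 4, 1))   # True
print(is_lossless("#-#", 4, 2))   # False
```

In this notation the seed margin from the abstract would be l = m - len(seed), for example l = 1 for the seed `#-#` with m = 4.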

arXiv.org e-Print Archive

Crossref

Directory of Open Access Journals

Hal-Diderot

HAL-Ecole des Ponts ParisTech

HAL - UPEC / UPEM

A case report of congenital myasthenic syndrome caused by a mutation in the CHRNE gene in the Iranian population

Author: Ebrahimi Neshat
Farjami Zahra
Galehdari Mohammad
Houshmand Massoud
Khodaienia Negar
Moradyar Mehdi
Zamani Ghaletaki Gholamreza
Ashnaei Amirhossein
Publication venue: Iranian Journal of Child Neurology
Publication date: 01/10/2020

Congenital myasthenic syndrome (CMS) refers to a heterogeneous group of inherited disorders, characterized by defective transmission at the neuromuscular junction (NMJ). Patients with CMS show similar muscle weakness, while other clinical manifestations are mostly dependent on genetic factors. This disease, caused by different DNA mutations, is genetically inherited. It is also associated with mutations of genes at the NMJ, involving the acetylcholine receptor (AChR) subunits. Here, we present the case of a five-year-old Iranian boy with CMS, undergoing targeted sequencing of a panel of genes associated with arthrogryposis and CMS. The patient had six affected relatives in his genetic pedigree chart. The investigations indicated a homozygous single base pair deletion at exon 12 of the CHRNE gene (chr17:4802186delC). This region was conserved across mammalian evolution and was not submitted to the 1000 Genomes Project database. Overall, the CHRNE variant may be classified as a significant variant in the etiology of CMS. It can be suggested that the Iranian CMS population carries regional pathogenic mutations, which can be detected via targeted and whole genome sequencing.

Journals Portal, Shahid Beheshti University of Medical Sciences

SEAL: a distributed short read mapping and duplicate removal tool

Author: Durbin
G. Zanetti
Kozarewa
L. Pireddu
Metzker
S. Leo
Publication venue: Oxford University Press

Summary: SEAL is a scalable tool for short read pair mapping and duplicate removal. It computes mappings that are consistent with those produced by BWA and removes duplicates according to the same criteria employed by Picard MarkDuplicates. On a 16-node Hadoop cluster, it is capable of processing about 13 GB per hour in map+rmdup mode, while reaching a throughput of 19 GB per hour in mapping-only mode

Crossref

PubMed Central

Analysis of quality raw data of second generation sequencers with Quality Assessment Software

Author: A Smith
Adriana R Carneiro
Artur Silva
B Ewing
B Ewing
D Gordon
D Hernandez
DR Zerbino
DW Bryant
E Lande
H Li
J Butler
J Dohm
Jan Baumbach
M Chaisson
Maria PC Schneider
Rommel TJ Ramos
S Bentley
SC Schuster
V Pandey
Vasco Azevedo
W Jeck
Publication venue: BioMed Central
Publication date: 01/01/2011

Background: Second-generation technologies have advantages over Sanger; however, they have resulted in new challenges for the genome construction process, especially because of the small size of the reads, despite the high degree of coverage. Independent of the program chosen for the construction process, DNA sequences are superimposed, based on identity, to extend the reads, generating contigs; mismatches indicate a lack of homology and are not included. This process improves our confidence in the sequences that are generated. Findings: We developed Quality Assessment Software, with which one can review graphs showing the distribution of quality values from the sequencing reads. This software allows us to adopt more stringent quality standards for sequence data, based on quality-graph analysis and estimated coverage after applying the quality filter, providing acceptable sequence coverage for genome construction from short reads. Conclusions: Quality filtering is a fundamental step in the process of constructing genomes, as it reduces the frequency of incorrect alignments that are caused by measuring errors, which can occur during the construction process due to the size of the reads, provoking misassemblies. Application of quality filters to sequence data, using the software Quality Assessment, along with graphing analyses, provided greater precision in the definition of cutoff parameters, which increased the accuracy of genome construction.
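
The filter-then-check-coverage workflow described above can be sketched as follows. This is an illustrative example only and not the Quality Assessment Software itself. It assumes qualities are given as Phred+33 encoded strings (as in FASTQ), that reads are filtered on mean quality, and that coverage is estimated as retained bases divided by genome length. All function names and the cutoff value are hypothetical.

```python
def phred_scores(quality_string: str, offset: int = 33) -> list[int]:
    """Decode an ASCII quality string into Phred scores (assumed Phred+33)."""
    return [ord(char) - offset for char in quality_string]


def mean_quality(quality_string: str) -> float:
    """Mean Phred score of a read; 0.0 for an empty quality string."""
    scores = phred_scores(quality_string)
    return sum(scores) / len(scores) if scores else 0.0


def filter_reads(reads, min_mean_quality: float):
    """Keep (sequence, quality) pairs whose mean quality meets the cutoff."""
    return [(seq, qual) for seq, qual in reads
            if mean_quality(qual) >= min_mean_quality]


def estimated_coverage(reads, genome_length: int) -> float:
    """Estimated coverage: total retained bases divided by genome length."""
    return sum(len(seq) for seq, _ in reads) / genome_length


reads = [("ACGT", "IIII"),   # mean Q40
         ("ACGT", "!!!!"),   # mean Q0
         ("ACGT", "II!!")]   # mean Q20
kept = filter_reads(reads, min_mean_quality=20)
print(len(kept))                                   # 2
print(estimated_coverage(kept, genome_length=4))   # 2.0
```

Raising the cutoff and recomputing the estimated coverage gives a simple way to see how stringent a filter can be before coverage drops below what an assembly needs.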

Crossref

Springer - Publisher Connector

Directory of Open Access Journals

PubMed Central

MPG.PuRe

Towards Better Understanding of Artifacts in Variant Calling from High-Coverage Samples

Author: Li Heng
Publication venue: 'Oxford University Press (OUP)'
Publication date: 22/07/2015

Motivation: Whole-genome high-coverage sequencing has been widely used for personal and cancer genomics as well as in various research areas. However, in the absence of an unbiased whole-genome truth set, the global error rate of variant calls and the leading causal artifacts still remain unclear even given the great efforts in the evaluation of variant calling methods. Results: We made ten SNP and INDEL call sets with two read mappers and five variant callers, both on a haploid human genome and a diploid genome at a similar coverage. By investigating false heterozygous calls in the haploid genome, we identified the erroneous realignment in low-complexity regions and the incomplete reference genome with respect to the sample as the two major sources of errors, which press for continued improvements in these two areas. We estimated that the error rate of raw genotype calls is as high as 1 in 10-15kb, but the error rate of post-filtered calls is reduced to 1 in 100-200kb without significant compromise on the sensitivity. Availability: BWA-MEM alignment: http://bit.ly/1g8XqRt; Scripts: https://github.com/lh3/varcmp; Additional data: https://figshare.com/articles/Towards_better_understanding_of_artifacts_in_variating_calling_from_high_coverage_samples/981073 Comment: Published version
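
To put the quoted error rates in perspective, the short calculation below converts "1 error per N bases" into an expected number of erroneous calls genome-wide. The genome length of 3 billion bases is an assumption chosen for illustration (roughly the size of a human genome) and is not a figure from the abstract. The function name is hypothetical.

```python
ASSUMED_GENOME_LENGTH = 3_000_000_000  # illustrative assumption, not from the abstract


def expected_errors(genome_length: int, bases_per_error: int) -> float:
    """Expected number of erroneous calls at a rate of one per `bases_per_error` bases."""
    return genome_length / bases_per_error


# Raw genotype calls: 1 error in 10-15 kb.
print(expected_errors(ASSUMED_GENOME_LENGTH, 15_000))    # 200000.0
print(expected_errors(ASSUMED_GENOME_LENGTH, 10_000))    # 300000.0

# Post-filtered calls: 1 error in 100-200 kb.
print(expected_errors(ASSUMED_GENOME_LENGTH, 200_000))   # 15000.0
print(expected_errors(ASSUMED_GENOME_LENGTH, 100_000))   # 30000.0
```

Under this assumed genome length, filtering reduces the expected number of erroneous calls by about an order of magnitude, from hundreds of thousands to tens of thousands.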

arXiv.org e-Print Archive

CiteSeerX