43 research outputs found

    biobambam: tools for read pair collation based algorithms on BAM files

    Sequence alignment data stored in BAM files is often ordered by coordinate (the ID of the reference sequence plus the position on that sequence where the fragment was mapped), as this simplifies the extraction of variants between the mapped data and the reference, or of variants within the mapped data. In this order, paired reads are usually separated in the file, which complicates other applications, such as duplicate marking or conversion to the FastQ format, that require access to the full information of the pairs. In this paper we introduce biobambam, an API for efficient BAM file reading that supports collating alignments by read name without completely resorting the input file, together with tools based on this API that perform tasks such as marking duplicate reads and converting to the FastQ format. Compared with previous approaches to problems involving the collation of alignments by read name, such as the BAM-to-FastQ or duplicate-marking utilities in the Picard suite, biobambam can often perform an equivalent task more efficiently in terms of required main memory and run time. Comment: 17 pages, 3 figures, 2 tables.
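
The idea of collating mates by read name without a full resort can be illustrated with a minimal sketch. This is not biobambam's API: the `Alignment` record and the `collate_pairs` function are hypothetical names introduced here, and the sketch assumes every read has at most one mate and that all pending reads fit in memory.

```python
from typing import Dict, Iterable, Iterator, NamedTuple, Tuple


class Alignment(NamedTuple):
    """Hypothetical, simplified alignment record (not a real BAM record)."""
    name: str       # read name shared by both mates of a pair
    is_read1: bool  # True for the first mate, False for the second
    ref_id: int     # ID of the reference sequence
    pos: int        # mapping position on that reference


def collate_pairs(
    alignments: Iterable[Alignment],
) -> Iterator[Tuple[Alignment, Alignment]]:
    """Yield (read1, read2) pairs from a coordinate-ordered stream.

    Each alignment waits in a dictionary keyed by read name until its
    mate arrives, so the input is scanned once and never resorted.
    Reads whose mate never appears are silently dropped in this sketch.
    """
    pending: Dict[str, Alignment] = {}
    for aln in alignments:
        mate = pending.pop(aln.name, None)
        if mate is None:
            # First time this read name is seen: park it until the mate shows up.
            pending[aln.name] = aln
        else:
            # Mate found: emit the pair with the first mate in front.
            yield (mate, aln) if mate.is_read1 else (aln, mate)


# Coordinate-ordered input in which the mates of r1 and r2 are interleaved.
records = [
    Alignment("r1", True, 0, 100),
    Alignment("r2", True, 0, 150),
    Alignment("r1", False, 0, 300),
    Alignment("r2", False, 0, 420),
]
pairs = list(collate_pairs(records))
```

A pair is emitted as soon as its second mate is read, so memory use depends on how many reads are waiting for a mate at any one time rather than on the size of the whole file.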

    Editorial


    Development and validation of a comprehensive genomic diagnostic tool for myeloid malignancies.

    The diagnosis of hematologic malignancies relies on multidisciplinary workflows involving morphology, flow cytometry, cytogenetic, and molecular genetic analyses. Advances in cancer genomics have identified numerous recurrent mutations with clear prognostic and/or therapeutic significance to different cancers. In myeloid malignancies, there is a clinical imperative to test for such mutations in mainstream diagnosis; however, progress toward this has been slow and piecemeal. Here we describe Karyogene, an integrated targeted resequencing/analytical platform that detects nucleotide substitutions, insertions/deletions, chromosomal translocations, copy number abnormalities, and zygosity changes in a single assay. We validate the approach against 62 acute myeloid leukemia, 50 myelodysplastic syndrome, and 40 blood DNA samples from individuals without evidence of clonal blood disorders. We demonstrate robust detection of sequence changes in 49 genes, including difficult-to-detect mutations such as FLT3 internal-tandem and mixed-lineage leukemia (MLL) partial-tandem duplications, and clinically significant chromosomal rearrangements including MLL translocations to known and unknown partners, identifying the novel fusion gene MLL-DIAPH2 in the process. Additionally, we identify most significant chromosomal gains and losses, and several copy-neutral loss-of-heterozygosity mutations at a genome-wide level, including previously unreported changes such as homozygosity for DNMT3A R882 mutations. Karyogene represents a dependable genomic diagnosis platform for translational research and for the clinical management of myeloid malignancies, which can be readily adapted for use in other cancers.

    Transcriptome map of mouse isochores

    Background: The availability of fully sequenced genomes and the implementation of transcriptome technologies have increased the studies investigating the expression profiles for a variety of tissues, conditions, and species. In this study, using RNA-seq data for three distinct tissues (brain, liver, and muscle), we investigate how base composition affects mammalian gene expression, an issue of prime practical and evolutionary interest. Results: We present the transcriptome map of the mouse isochores (DNA segments with a fairly homogeneous base composition) for the three different tissues and the effects of isochores' base composition on their expression activity. Our analyses also cover the relations between the genes' expression activity and their localization in the isochore families. Conclusions: This study is the first where next-generation sequencing data are used to associate the effects of both genomic and genic compositional properties to their corresponding expression activity. Our findings confirm previous results, and further support the existence of a relationship between isochores and gene expression. This relationship corroborates that isochores are primarily a product of evolutionary adaptation rather than a simple by-product of neutral evolutionary processes.

    Theorie und Anwendungen parametrisch gewichteter endlicher Automaten

    Parametric weighted finite automata (PWFA) are a multi-dimensional generalization of weighted finite automata. PWFA subsume the expressive power of both weighted finite automata and affine iterated function systems. The thesis discusses the theory and applications of PWFA. The properties of PWFA-definable sets are studied, and it is shown that some fractal generator systems can be simulated using PWFA and that many practically relevant real and complex functions and relations can be represented by PWFA. Furthermore, the decoding of PWFA and the interpretation of PWFA-definable sets are discussed.

    Computing Longest Previous non-overlapping Factors
