121 research outputs found

    CRAC: an integrated approach to the analysis of RNA-seq reads

    A large number of RNA-sequencing studies set out to predict mutations, splice junctions or fusion RNAs. We propose a method, CRAC, that integrates genomic locations and local coverage to enable such predictions to be made directly from RNA-seq read analysis. A k-mer profiling approach detects candidate mutations, indels and splice or chimeric junctions in each single read. CRAC increases precision compared with existing tools, reaching 99.5% for splice junctions, without losing sensitivity. Importantly, CRAC predictions improve with read length. In cancer libraries, CRAC recovered 74% of validated fusion RNAs and predicted novel recurrent chimeric junctions. CRAC is available at http://crac.gforge.inria.fr

    Querying large read collections in main memory: a versatile data structure

    Background: High Throughput Sequencing (HTS) is now heavily exploited for genome (re-)sequencing, metagenomics, epigenomics, and transcriptomics, and requires different but computationally intensive bioinformatic analyses. When a reference genome is available, mapping reads onto it is the first step of this analysis. Read mapping programs owe their efficiency to sophisticated genome indexing data structures, such as the Burrows-Wheeler transform. Recent solutions index both the genome and the k-mers of the reads using hash tables to further increase efficiency and accuracy. In various contexts (e.g. assembly or transcriptome analysis), read processing requires determining the sub-collection of reads that are related to a given sequence, which is done by searching for some k-mers in the reads. So far, many developments have focused on genome indexing structures for read mapping, but the question of read indexing remains largely unexplored. However, the increase in sequencing throughput calls for new algorithmic solutions to query large read collections efficiently.
    Results: Here, we present a solution, named Gk arrays, to index large collections of reads, an algorithm to build the structure, and procedures to query it. Once constructed, the index structure is kept in main memory and is repeatedly accessed to answer queries like "given a k-mer, get the reads containing this k-mer (once/at least once)". We compared our structure to other solutions that adapt uncompressed indexing structures designed for long texts and show that it processes queries fast, while requiring much less memory. Our structure can thus handle larger read collections. We provide examples where such queries are adapted to different types of read analysis (SNP detection, assembly, RNA-Seq).
    Conclusions: Gk arrays constitute a versatile data structure that enables fast and more accurate read analysis in various contexts. The Gk arrays provide a flexible building block for designing innovative programs that efficiently mine genomics, epigenomics, metagenomics, or transcriptomics reads. The Gk arrays library is available under the CeCILL (GPL-compliant) license from http://www.atgc-montpellier.fr/ngs/.
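
The kind of query quoted in this abstract ("given a k-mer, get the reads containing this k-mer (once/at least once)") can be illustrated with a small sketch. This is not the Gk arrays structure or its library API: it is a plain hash-table index, and the function names `build_kmer_index` and `reads_containing` are hypothetical, chosen only for this illustration.

```python
from collections import defaultdict


def build_kmer_index(reads, k):
    """Map each k-mer to {read_id: number of occurrences in that read}.

    Illustrative hash-based index only; it is far less memory-efficient
    than a dedicated structure and is not the Gk arrays algorithm.
    """
    index = defaultdict(lambda: defaultdict(int))
    for read_id, read in enumerate(reads):
        for i in range(len(read) - k + 1):
            index[read[i:i + k]][read_id] += 1
    return index


def reads_containing(index, kmer, exactly_once=False):
    """Return sorted ids of reads containing kmer (at least once by default)."""
    hits = index.get(kmer, {})
    return sorted(r for r, n in hits.items() if not exactly_once or n == 1)


# Toy read collection (made-up sequences, for illustration only).
reads = ["ACGTACGT", "TTACGTTT", "GGGGCCCC"]
index = build_kmer_index(reads, 4)
at_least_once = reads_containing(index, "ACGT")
exactly_once = reads_containing(index, "ACGT", exactly_once=True)
print(at_least_once, exactly_once)
```

A dictionary of dictionaries makes both query variants a single lookup, at the cost of storing every k-mer explicitly, which is the memory overhead the abstract says a dedicated read index avoids.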

    Differential expression of the RTP/Drg1/Ndr1 gene product in proliferating and growth arrested cells

    Using a differential display method to identify differentiation-related genes in human myelomonocytic U937 cells, we cloned the cDNA of a gene identical to Drg1 and homologous to other recently discovered genes, namely the human RTP and Cap43 genes and the mouse Ndr1 and TDD5 genes. Their open reading frames encode proteins that are highly conserved between mouse and man but do not share homology with other known proteins. Conditions in which the mRNAs are up-regulated suggest a role for the protein in cell growth arrest and terminal differentiation. We raised antibodies against a synthetic peptide reproducing a characteristic sequence of the putative polypeptide chain. These antibodies revealed a protein with the expected 43 kDa molecular mass, up-regulated by phorbol ester, retinoids and 1,25-(OH)2 vitamin D3 in U937 cells. It was increased in mammary carcinoma MCF-7 cells treated with retinoids and with the anti-estrogen ICI 182,780 but not with 4-hydroxytamoxifen. The mouse Drg1 homologous protein was up-regulated by retinoic acid in C2 myogenic cells. The diversity of situations in which expression of RTP/Drg1/Ndr1 has now been observed shows that it is widely distributed and up-regulated by various agents. Here we show that ligands of nuclear transcription factors involved in cell differentiation are among the inducers of this novel protein

    Using reads to annotate the genome: influence of length, background distribution, and sequence errors on prediction capacity

    Ultra high-throughput sequencing is used to analyse the transcriptome or interactome at unprecedented depth on a genome-wide scale. These techniques yield short sequence reads that are then mapped on a genome sequence to predict putatively transcribed or protein-interacting regions. We argue that factors such as background distribution, sequence errors, and read length impact on the prediction capacity of sequence census experiments. Here we suggest a computational approach to measure these factors and analyse their influence on both transcriptomic and epigenomic assays. This investigation provides new clues on both methodological and biological issues. For instance, by analysing chromatin immunoprecipitation read sets, we estimate that 4.6% of reads are affected by SNPs. We show that, although the nucleotide error probability is low, it significantly increases with the position in the sequence. Choosing a read length above 19 bp practically eliminates the risk of finding irrelevant positions, while above 20 bp the number of uniquely mapped reads decreases. With our procedure, we obtain 0.6% false positives among genomic locations. Hence, even rare signatures should identify biologically relevant regions, if they are mapped on the genome. This indicates that digital transcriptomics may help to characterize the wealth of yet undiscovered, low-abundance transcripts
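
As a rough back-of-the-envelope illustration of why read length matters for spurious matches, the sketch below computes the expected number of chance occurrences of a fixed sequence in a genome. It assumes a uniform random genome with independent bases and an approximate human genome size of 3 Gb; both are simplifying assumptions for illustration and not the procedure used in the work summarised above, which measures these effects on real data.

```python
def expected_random_hits(read_length, genome_size, both_strands=True):
    """Expected chance occurrences of one fixed sequence of the given length.

    Assumes a uniform random genome (each base independent, probability 1/4),
    which real genomes do not satisfy; treat the result as an order of
    magnitude only.
    """
    positions = genome_size - read_length + 1
    if both_strands:
        positions *= 2
    return positions / 4 ** read_length


GENOME_SIZE = 3_000_000_000  # assumed approximate human genome size, in bp

for length in (14, 16, 19, 20, 25):
    hits = expected_random_hits(length, GENOME_SIZE)
    print(f"{length} bp: {hits:.3g} expected chance occurrences")
```

Under this simplified model the expected count falls below one somewhere between 16 and 19 bp, which is in line with the observation that reads longer than 19 bp rarely hit irrelevant positions. Repeats and biased base composition in real genomes shift the exact threshold.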

    Enhanced overexpression of an HIF-1/hypoxia-related protein in cancer cells.

    Cap43 is a protein whose RNA is induced under conditions of severe hypoxia or prolonged elevations of intracellular calcium. Additionally, Ni and Co also induce Cap43 because they produce a state of hypoxia in cells. Cap43 protein is expressed at low levels in normal tissues; however, in a variety of cancers, including lung, brain, melanoma, liver, prostate, breast, and renal cancers, Cap43 protein is overexpressed in cancer cells. The low level of expression of Cap43 in some normal tissues compared with their cancerous counterparts, combined with the high stability of Cap43 protein and mRNA, makes the Cap43 gene a new, important cancer marker. We hypothesize that the mechanism of Cap43 overexpression in cancer cells involves a state of hypoxia characteristic of cancer cells where the Cap43 protein becomes a signature for this hypoxic state

    Simultaneous gene expression profiling in human macrophages infected with Leishmania major parasites using SAGE

    Background: Leishmania (L) are intracellular protozoan parasites that are able to survive and replicate within the harsh and potentially hostile phagolysosomal environment of mammalian mononuclear phagocytes. A complex interplay then takes place between the macrophage (MΦ) striving to eliminate the pathogen and the parasite struggling for its own survival. To investigate this host-parasite conflict at the transcriptional level, in the context of the infection of monocyte-derived human MΦs (MDM) by L. major metacyclic promastigotes, the quantitative technique of serial analysis of gene expression (SAGE) was used.
    Results: After extracting mRNA from resting human MΦs, Leishmania-infected human MΦs and L. major parasites, three SAGE libraries were constructed and sequenced, generating up to 28,173; 57,514 and 33,906 tags respectively (corresponding to 12,946; 23,442 and 9,530 unique tags). Using computational data analysis and direct comparison to 357,888 publicly available experimental human tags, the parasite and the host cell transcriptomes were then simultaneously characterized from the mixed cellular extract, confidently discriminating host from parasite transcripts. This procedure led us to reliably assign 3,814 tags to MΦ transcripts and 3,666 tags to L. major parasite transcripts. We focused on these, showing significant changes in their expression that are likely to be relevant to the pathogenesis of parasite infection: (i) human MΦ genes encoding key immune response proteins (e.g., IFNγ pathway, S100 and chemokine families) and (ii) a group of Leishmania genes showing preferential expression at the parasite's intracellular developing stage.
    Conclusion: Dual SAGE transcriptome analysis provided a useful, powerful and accurate approach to discriminating genes of human or parasitic origin in Leishmania-infected human MΦs. The findings presented in this work suggest that the Leishmania parasite modulates key transcripts in human MΦs in ways that may be beneficial for its establishment and survival. Furthermore, these results provide an overview of gene expression at two developmental stages of the parasite, namely metacyclic promastigotes and intracellular amastigotes, and indicate a broad difference between their transcriptomic profiles. Finally, our reported set of expressed genes will be useful in future rounds of data mining and gene annotation.

    Transcriptome annotation using tandem SAGE tags

    Analysis of several million expressed gene signatures (tags) revealed an increasing number of different sequences, largely exceeding that of annotated genes in mammalian genomes. Serial analysis of gene expression (SAGE) can reveal new Poly(A) RNAs transcribed from previously unrecognized chromosomal regions. However, conventional SAGE tags are too short to unambiguously identify unique sites in large genomes. Here, we design a novel strategy with tags anchored on two different restriction sites of cDNAs. New transcripts are then tentatively defined by the two SAGE tags in tandem and by the spanning sequence read on the genome between these tagged sites. Having developed a new algorithm to locate these tag-delimited genomic sequences (TDGS), we first validated its capacity to recognize known genes and its ability to reveal new transcripts with two SAGE libraries built in parallel from a single RNA sample. Our algorithm proves fast enough to test this strategy at a large scale. We then collected and processed the complete sets of human SAGE tags to predict as yet unknown transcripts. A cross-validation with tiling array data shows that 47% of these TDGS overlap transcriptionally active regions. Our method provides a new and complementary approach for complex transcriptome annotation

    Contribution to the study of tumours in the horse; Work of the Laboratory of Pathological Anatomy of the Faculty of Medicine of Bordeaux

    Thesis: Medicine: University of Bordeaux: 1913. Order no.: 3

    Mechanisms regulating B lymphopoiesis involved in two malignant B-cell lymphopathies: multiple myeloma and chronic lymphocytic leukemia

    CNRS T Bordereau / INIST-CNRS - Institut de l'Information Scientifique et Technique; SIGLE; FR; France

    Creativity developed through dance in the Primary Education stage

    Starting from dance, a discipline and art form of great breadth, this work addresses the development, in pupils in the first cycle of Primary Education, of one of the capacities most in demand in today's society: creativity. Schools currently set aside no dedicated time for fostering it, and the time devoted to it through other subjects is too limited and therefore insufficient. To this end, through the relevant sessions pupils gradually gain control of a certain level of motor development, knowledge of the body schema, and command of the coordination and expression of their own body; likewise, by working through different dynamics, they gradually master a series of bodily resources that dance provides and that enable them to develop their own creativity. Once this capacity has been developed through dance, pupils will be able to generalize that knowledge and extrapolate it to the expression of creativity through other disciplines, activities and behaviours in their lives