Landscape of transcription in human cells
Eukaryotic cells make many types of primary and processed RNAs that are found either in specific subcellular compartments or throughout the cells. A complete catalogue of these RNAs is not yet available and their characteristic subcellular localizations are also poorly understood. Because RNA represents the direct output of the genetic information encoded by genomes and a significant proportion of a cell’s regulatory capabilities are focused on its synthesis, processing, transport, modification and translation, the generation of such a catalogue is crucial for understanding genome function. Here we report evidence that three-quarters of the human genome is capable of being transcribed, as well as observations about the range and levels of expression, localization, processing fates, regulatory regions and modifications of almost all currently annotated and thousands of previously unannotated RNAs. These observations, taken together, prompt a redefinition of the concept of a gene
Transcriptome characterization by RNA sequencing identifies a major molecular and clinical subdivision in chronic lymphocytic leukemia
Chronic lymphocytic leukemia (CLL) has heterogeneous clinical and biological behavior. Whole-genome and -exome sequencing has contributed to the characterization of the mutational spectrum of the disease, but the underlying transcriptional profile is still poorly understood. We have performed deep RNA sequencing in different subpopulations of normal B-lymphocytes and CLL cells from a cohort of 98 patients, and characterized the CLL transcriptional landscape with unprecedented resolution. We detected thousands of transcriptional elements differentially expressed between the CLL and normal B cells, including protein-coding genes, noncoding RNAs, and pseudogenes. Transposable elements are globally derepressed in CLL cells. In addition, two thousand genes, most of which are not differentially expressed, exhibit CLL-specific splicing patterns. Genes involved in metabolic pathways showed higher expression in CLL, while genes related to spliceosome, proteasome, and ribosome were among the most down-regulated in CLL. Clustering of the CLL samples according to RNA-seq derived gene expression levels unveiled two robust molecular subgroups, C1 and C2. C1/C2 subgroups and the mutational status of the immunoglobulin heavy variable (IGHV) region were the only independent variables in predicting time to treatment in a multivariate analysis with main clinico-biological features. This subdivision was validated in an independent cohort of patients monitored through DNA microarrays. Further analysis shows that B-cell receptor (BCR) activation in the microenvironment of the lymph node may be at the origin of the C1/C2 differences
Characterization of 3D genomic interactions in fetal pig muscle
Genome sequence alone is not sufficient to explain the overall coordination of nuclear activity in a particular tissue. The nuclear organisation and genomic long-range intra- and inter-chromosomal interactions play an important role in the regulation of gene expression and the activation of tissue-specific gene networks. Here we present an overview of the pig genome architecture in muscle at two late developmental stages. The muscle maturation process occurs between the 90th day and the end of gestation (114 days), a key period for survival at birth. To characterise this period we profiled chromatin interactions genome-wide with in situ Hi-C (High Throughput Chromosome Conformation Capture) in muscle samples collected at 90 and 110 days of gestation, specific moments where a drastic change in gene expression has been reported. About 200 million read pairs per library were generated (3 replicates per condition). This allowed: (a) the design of an experimental Hi-C protocol optimized for frozen fetal tissues, (b) the first Hi-C contact heatmaps in fetal porcine muscle cells, and (c) the profiling of Topologically Associated Domains (TADs), defined as genomic domains with high levels of chromatin interactions. Using the new assembly version Sus scrofa v11, we could map 82% of the Hi-C reads on the reference genome. After filtering, 49% of valid read pairs were used to infer the genomic interactions in both developmental stages. In addition, ChIP-seq experiments were performed to map the binding of the structural protein CTCF, known to regulate genome structure by promoting interactions between genes and distal enhancers. The Hi-C and ChIP-seq data were analysed in combination with the results of a previous transcriptome analysis, focusing on the hundreds of genes that were reported as differentially expressed during muscle maturation. We will report the observed general differences between both developmental stages in terms of transcription and structure
Profiling the landscape of transcription, chromatin accessibility and chromosome conformation of cattle, pig, chicken and goat genomes [FAANG pilot project]
Functional annotation of livestock genomes is a critical and obvious next step to derive maximum benefit for agriculture, animal science, animal welfare and human health. The aim of the Fr-AgENCODE project is to generate multi-species functional genome annotations by applying high-throughput molecular assays on three target tissues/cells relevant to the study of immune and metabolic traits. An extensive collection of stored samples from other tissues is available for further use (FAANG Biosamples ‘FR-AGENCODE’). From each of two males and two females per species (pig, cattle, goat, chicken), strand-oriented RNA-seq and chromatin accessibility ATAC-seq assays were performed on liver tissue and on two T-cell types (CD3+CD4+ and CD3+CD8+) sorted from blood (mammals) or spleen (chicken). Chromosome Conformation Capture (in situ Hi-C) was also carried out on liver. Sequencing reads from the 3 assays were processed using standard processing pipelines. While most (50–70%) RNA-seq reads mapped to annotated exons, thousands of novel transcripts and genes were found, including extensions of annotated protein-coding genes and new lncRNAs (see abstract #69857). Consistency of ATAC-seq results was confirmed by the significant proportion of called peaks in promoter regions (36–66%) and by the specific accumulation pattern of peaks around gene starts (TSS) vs. gene ends (TTS). Principal Component Analyses for RNA-seq (based on quantified gene expression) and ATAC-seq (based on quantified chromatin accessibility) highlighted clusters characterised by cell type and sex in all species. From Hi-C data, we generated 40kb-resolution interaction maps, profiled a genome-wide Directionality Index and identified from 4,100 (chicken) to 12,100 (pig) topologically associating domains (TADs). Correlations were reported between RNA-seq and ATAC-seq results (see abstract #71581).
In summary, we present here an overview of the first multi-species and -tissue annotations of chromatin accessibility and genome architecture related to gene expression for farm animals
Empowering bioinformatics communities with Nextflow and nf-core: [Preprint]
Standardised analysis pipelines are an important part of FAIR bioinformatics research. Over the last decade, there has been a notable shift from point-and-click pipeline solutions such as Galaxy towards command-line solutions such as Nextflow and Snakemake. We report on recent developments in the nf-core and Nextflow frameworks that have led to widespread adoption across many scientific communities. We describe how adopting nf-core standards enables faster development, improved interoperability, and collaboration with the >8,000 members of the nf-core community. The recent development of Nextflow Domain-Specific Language 2 (DSL2) allows pipeline components to be shared and combined across projects. The nf-core community has harnessed this with a library of modules and subworkflows that can be integrated into any Nextflow pipeline, enabling research communities to progressively transition to nf-core best practices. We present a case study of nf-core adoption by six European research consortia, grouped under the EuroFAANG umbrella and dedicated to farmed animal genomics. We believe that the process outlined in this report can inspire many large consortia to seek harmonisation of their data analysis procedures
Enhanced Transcriptome Maps from Multiple Mouse Tissues Reveal Evolutionary Constraint in Gene Expression for Thousands of Genes
We characterized by RNA-seq the transcriptional profiles of a large and heterogeneous collection of mouse tissues, augmenting the mouse transcriptome with thousands of novel transcript candidates. Comparison with transcriptome profiles obtained in human cell lines reveals substantial conservation of transcriptional programs, and uncovers a distinct class of genes with levels of expression across cell types and species that have been constrained early in vertebrate evolution. This core set of genes captures a substantial and constant fraction of the transcriptional output of mammalian cells, and participates in basic functional and structural housekeeping processes common to all cell types. Perturbation of these constrained genes is associated with significant phenotypes including embryonic lethality and cancer. Evolutionary constraint in gene expression levels is not reflected in the conservation of the genomic sequences, but is associated with strong and conserved epigenetic marking, as well as with a characteristic post-transcriptional regulatory program in which sub-cellular localization and alternative splicing play comparatively large roles
Landscape of transcription in human cells
Eukaryotic cells make many types of primary and processed RNAs that are found either in specific sub-cellular compartments or throughout the cells. A complete catalogue of these RNAs is not yet available and their characteristic sub-cellular localizations are also poorly understood. Since RNA represents the direct output of the genetic information encoded by genomes and a significant proportion of a cell’s regulatory capabilities are focused on its synthesis, processing, transport, modifications and translation, the generation of such a catalogue is crucial for understanding genome function. Here we report evidence that three-quarters of the human genome is capable of being transcribed, as well as observations about the range and levels of expression, localization, processing fates, regulatory regions and modifications of almost all currently annotated and thousands of previously unannotated RNAs. These observations, taken together, prompt a redefinition of the concept of a gene
The genomic landscape of balanced cytogenetic abnormalities associated with human congenital anomalies
Despite the clinical significance of balanced chromosomal abnormalities (BCAs), their characterization has largely been restricted to cytogenetic resolution. We explored the landscape of BCAs at nucleotide resolution in 273 subjects with a spectrum of congenital anomalies. Whole-genome sequencing revised 93% of karyotypes and demonstrated complexity that was cryptic to karyotyping in 21% of BCAs, highlighting the limitations of conventional cytogenetic approaches. At least 33.9% of BCAs resulted in gene disruption that likely contributed to the developmental phenotype, 5.2% were associated with pathogenic genomic imbalances, and 7.3% disrupted topologically associated domains (TADs) encompassing known syndromic loci. Remarkably, BCA breakpoints in eight subjects altered a single TAD encompassing MEF2C, a known driver of 5q14.3 microdeletion syndrome, resulting in decreased MEF2C expression. We propose that sequence-level resolution dramatically improves prediction of clinical outcomes for balanced rearrangements and provides insight into new pathogenic mechanisms, such as altered regulation due to changes in chromosome topology
