*-DCC: A platform to collect, annotate, and explore a large variety of sequencing experiments.
Background: Over the past few years the variety of experimental designs and protocols for sequencing experiments increased greatly. To ensure the wide usability of the produced data beyond an individual project, rich and systematic annotation of the underlying experiments is crucial. Findings: We first developed an annotation structure that captures the overall experimental design as well as the relevant details of the steps from the biological sample to the library preparation, the sequencing procedure, and the sequencing and processed files. Through various design features, such as controlled vocabularies and different field requirements, we ensured a high annotation quality, comparability, and ease of annotation. The structure can be easily adapted to a large variety of species. We then implemented the annotation strategy in a user-hosted web platform with data import, query, and export functionality. Conclusions: We present here an annotation structure and user-hosted platform for sequencing experiment data, suitable for lab-internal documentation, collaborations, and large-scale annotation efforts
Estimating mutual information using B-spline functions – an improved similarity measure for analysing gene expression data
BACKGROUND: The information theoretic concept of mutual information provides a general framework to evaluate dependencies between variables. In the context of the clustering of genes with similar patterns of expression it has been suggested as a general quantity of similarity to extend commonly used linear measures. Since mutual information is defined in terms of discrete variables, its application to continuous data requires the use of binning procedures, which can lead to significant numerical errors for datasets of small or moderate size. RESULTS: In this work, we propose a method for the numerical estimation of mutual information from continuous data. We investigate the characteristic properties arising from the application of our algorithm and show that our approach outperforms commonly used algorithms: The significance, as a measure of the power of distinction from random correlation, is significantly increased. This concept is subsequently illustrated on two large-scale gene expression datasets and the results are compared to those obtained using other similarity measures. A C++ source code of our algorithm is available for non-commercial use from [email protected] upon request. CONCLUSION: The utilisation of mutual information as similarity measure enables the detection of non-linear correlations in gene expression datasets. Frequently applied linear correlation measures, which are often used on an ad-hoc basis without further justification, are thereby extended
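To make the binning issue mentioned in this abstract concrete, the following is a minimal sketch of the naive histogram estimator of mutual information for continuous data. It is not the authors' B-spline method or their C++ code; the function name `binned_mutual_information` and the equal-width binning scheme are assumptions chosen for illustration only.

```python
import math
from collections import Counter


def binned_mutual_information(x, y, bins=4):
    """Naive histogram estimate of mutual information, in bits.

    Hypothetical helper for illustration: each value is assigned to exactly
    one equal-width bin, which is the hard-binning baseline the abstract
    contrasts with its B-spline approach.
    """
    if len(x) != len(y) or not x:
        raise ValueError("x and y must be non-empty and of equal length")

    def to_bins(values):
        lo, hi = min(values), max(values)
        width = (hi - lo) / bins or 1.0  # avoid division by zero for constant input
        # Clamp so that the maximum value falls into the last bin.
        return [min(int((v - lo) / width), bins - 1) for v in values]

    bx, by = to_bins(x), to_bins(y)
    n = len(x)
    count_x, count_y = Counter(bx), Counter(by)
    count_xy = Counter(zip(bx, by))

    # I(X;Y) = sum over occupied cells of p(x,y) * log2( p(x,y) / (p(x) p(y)) )
    mi = 0.0
    for (i, j), c in count_xy.items():
        mi += (c / n) * math.log2(c * n / (count_x[i] * count_y[j]))
    return mi
```

With few samples, small shifts in the data can move points across bin boundaries and change the estimate noticeably, which is the kind of numerical error the abstract attributes to binning on small or moderate datasets.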
pre-miRNA profiles obtained through application of locked nucleic acids and deep sequencing reveals complex 5′/3′ arm variation including concomitant cleavage and polyuridylation patterns
Recent research hints at an underappreciated complexity in pre-miRNA processing and regulation. Global profiling of pre-miRNA and its potential to increase understanding of the pre-miRNA landscape is impeded by overlap with highly expressed classes of other non-coding (nc) RNA. Here, we present a data set excluding these RNA before sequencing through locked nucleic acids (LNA), greatly increasing pre-miRNA sequence counts with no discernable effect on pre-miRNA or mature miRNA sequencing. Analysis of profiles generated in total, nuclear and cytoplasmic cell fractions reveals that pre-miRNAs are subject to a wide range of regulatory processes involving loci-specific 3′- and 5′-end variation entailing complex cleavage patterns with co-occurring polyuridylation. Additionally, examination of nuclear-enriched flanking sequences of pre-miRNA, particularly those derived from polycistronic miRNA transcripts, provides insight into miRNA and miRNA-offset (moRNA) production, specifically identifying novel classes of RNA potentially functioning as moRNA precursors. Our findings point to particularly intricate regulation of the let-7 family in many ways reminiscent of DICER1-independent, pre-mir-451-like processing, introduce novel and unify known forms of pre-miRNA regulation and processing, and shed new light on overlooked products of miRNA processing pathways
A comprehensive promoter landscape identifies a novel promoter for CD133 in restricted tissues, cancers, and stem cells
PROM1 is the gene encoding prominin-1 or CD133, an important cell surface marker for the isolation of both normal and cancer stem cells. PROM1 transcripts initiate at a range of transcription start sites (TSS) associated with distinct tissue and cancer expression profiles. Using high resolution Cap Analysis of Gene Expression (CAGE) sequencing we characterize TSS utilization across a broad range of normal and developmental tissues. We identify a novel proximal promoter (P6) within CD133+ melanoma cell lines and stem cells. Additional exon array sampling finds P6 to be active in populations enriched for mesenchyme, neural stem cells and within CD133+ enriched Ewing sarcomas. The P6 promoter is enriched with respect to previously characterized PROM1 promoters for a HMGI/Y (HMGA1) family transcription factor binding site motif and exhibits different epigenetic modifications relative to the canonical promoter region of PROM1
Methods for analyzing deep sequencing expression data: constructing the human and mouse promoterome with deepCAGE data
A set of methods is presented for normalization, quantification of noise and co-expression analysis for gene expression studies using deep sequencing
Identification and transfer of spatial transcriptomics signatures for cancer diagnosis
Background: Distinguishing ductal carcinoma in situ (DCIS) from invasive ductal carcinoma (IDC) regions in clinical biopsies constitutes a diagnostic challenge. Spatial transcriptomics (ST) is an in situ capturing method, which allows quantification and visualization of transcriptomes in individual tissue sections. In the past, studies have shown that breast cancer samples can be used to study their transcriptomes with spatial resolution in individual tissue sections. Previously, supervised machine learning methods were used in clinical studies to predict the clinical outcomes for cancer types. Methods: We used four publicly available ST breast cancer datasets from breast tissue sections annotated by pathologists as non-malignant, DCIS, or IDC. We trained and tested a machine learning method (support vector machine) based on the expert annotation as well as based on automatic selection of cell types by their transcriptome profiles. Results: We identified expression signatures for expert-annotated regions (non-malignant, DCIS, and IDC) and built machine learning models. Classification results for 798 expression signature transcripts showed high coincidence with the expert pathologist annotation for DCIS (100%) and IDC (96%). Extending our analysis to include all 25,179 expressed transcripts resulted in an accuracy of 99% for DCIS and 98% for IDC. Further, classification based on an automatically identified expression signature covering all ST spots of tissue sections resulted in prediction accuracy of 95% for DCIS and 91% for IDC. Conclusions: This concept study suggests that the ST signatures learned from expert-selected breast cancer tissue sections can be used to identify breast cancer regions in whole tissue sections including regions not trained on. Furthermore, the identified expression signatures can classify cancer regions in tissue sections not used for training with high accuracy. Expert-generated but even automatically generated cancer signatures from ST data might be able to classify breast cancer regions and provide clinical decision support for pathologists in the future
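The abstract reports agreement with the pathologist annotation separately for each class. The sketch below shows one plausible way such per-class figures could be computed from two label lists. The abstract does not state the exact metric used, so the function `per_class_agreement` and the toy labels in the usage note are assumptions for illustration, not data or code from the study.

```python
from collections import Counter


def per_class_agreement(expert_labels, predicted_labels):
    """Fraction of spots in each expert-annotated class that received the same predicted label.

    Hypothetical helper: treats the expert annotation as the reference and
    computes, per class, matched spots divided by all spots of that class.
    """
    if len(expert_labels) != len(predicted_labels):
        raise ValueError("label lists must have the same length")

    totals = Counter(expert_labels)
    matches = Counter(
        expert for expert, predicted in zip(expert_labels, predicted_labels)
        if expert == predicted
    )
    return {label: matches[label] / totals[label] for label in totals}
```

For example, calling it with the toy lists `["DCIS", "DCIS", "IDC", "IDC"]` and `["DCIS", "DCIS", "IDC", "DCIS"]` gives 1.0 for DCIS and 0.5 for IDC.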
Transcriptional features of genomic regulatory blocks
CAGE tag mapping of transcription start sites across different human tissues shows that genomic regulatory blocks have unique features that are the likely cause of their ability to respond to regulatory inputs from very long distances
Nonimmunoglobulin target loci of activation-induced cytidine deaminase (AID) share unique features with immunoglobulin genes.
Activation-induced cytidine deaminase (AID) is required for both somatic hypermutation and class-switch recombination in activated B cells. AID is also known to target nonimmunoglobulin genes and introduce mutations or chromosomal translocations, eventually causing tumors. To identify as-yet-unknown AID targets, we screened early AID-induced DNA breaks by using two independent genome-wide approaches. Along with known AID targets, this screen identified a set of unique genes (SNHG3, MALAT1, BCL7A, and CUX1) and confirmed that these loci accumulated mutations as frequently as the Ig locus after AID activation. Moreover, these genes share three important characteristics with the Ig gene: translocations in tumors, repetitive sequences, and the epigenetic modification of chromatin by H3K4 trimethylation in the vicinity of cleavage sites