
    Estimating mutual information using B-spline functions – an improved similarity measure for analysing gene expression data

    BACKGROUND: The information-theoretic concept of mutual information provides a general framework for evaluating dependencies between variables. In the context of clustering genes with similar expression patterns, it has been suggested as a general similarity measure that extends commonly used linear measures. Since mutual information is defined in terms of discrete variables, its application to continuous data requires binning procedures, which can lead to significant numerical errors for datasets of small or moderate size. RESULTS: In this work, we propose a method for the numerical estimation of mutual information from continuous data. We investigate the characteristic properties arising from the application of our algorithm and show that our approach outperforms commonly used algorithms: the significance, as a measure of the power to distinguish a dependency from random correlation, is significantly increased. This concept is subsequently illustrated on two large-scale gene expression datasets, and the results are compared to those obtained using other similarity measures. C++ source code of our algorithm is available for non-commercial use from [email protected] upon request. CONCLUSION: Using mutual information as a similarity measure enables the detection of non-linear correlations in gene expression datasets. It thereby extends the frequently applied linear correlation measures, which are often used on an ad hoc basis without further justification.
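
    A minimal Python/NumPy sketch of the B-spline idea, assuming the generalized-binning scheme as it is commonly described: instead of assigning each value to exactly one bin, each value is spread over `order` adjacent bins with weights given by B-spline basis functions, bin probabilities are averages of those weights, and MI = H(X) + H(Y) - H(X,Y). The function names, the clamped knot vector, the defaults (10 bins, spline order 3) and the use of bits are illustrative assumptions; this is not the authors' C++ code.

```python
import numpy as np


def knot_vector(n_bins: int, order: int) -> np.ndarray:
    """Clamped knot vector with n_bins + order knots (ends repeated `order` times)."""
    i = np.arange(n_bins + order)
    return np.clip(i - order + 1, 0, n_bins - order + 1).astype(float)


def bspline_weights(x, n_bins: int = 10, order: int = 3) -> np.ndarray:
    """Return an (N, n_bins) matrix of bin weights; every row sums to 1."""
    x = np.asarray(x, dtype=float)
    t = knot_vector(n_bins, order)
    lo, span = x.min(), x.max() - x.min()
    # Map the data linearly onto the spline domain [0, n_bins - order + 1].
    z = (x - lo) / span * (n_bins - order + 1) if span > 0 else np.zeros_like(x)

    # Order 1: indicator of the half-open knot interval [t_i, t_{i+1}).
    B = ((t[:-1] <= z[:, None]) & (z[:, None] < t[1:])).astype(float)
    # The maximum lies outside every half-open interval; give it to the
    # last non-empty interval so the weights still sum to 1.
    last = np.nonzero(t[:-1] < t[1:])[0][-1]
    B[z >= t[-1], last] = 1.0

    # Cox-de Boor recursion from order k - 1 to order k.
    for k in range(2, order + 1):
        nxt = np.zeros((len(z), len(t) - k))
        for i in range(len(t) - k):
            left = t[i + k - 1] - t[i]
            right = t[i + k] - t[i + 1]
            if left > 0:
                nxt[:, i] += (z - t[i]) / left * B[:, i]
            if right > 0:
                nxt[:, i] += (t[i + k] - z) / right * B[:, i + 1]
        B = nxt
    return B


def entropy_bits(p: np.ndarray) -> float:
    """Shannon entropy in bits, skipping zero-probability cells."""
    p = p[p > 0]
    return float(-np.sum(p * np.log2(p)))


def mutual_information(x, y, n_bins: int = 10, order: int = 3) -> float:
    """MI(X;Y) = H(X) + H(Y) - H(X,Y) from B-spline-weighted bin probabilities."""
    wx = bspline_weights(x, n_bins, order)
    wy = bspline_weights(y, n_bins, order)
    n = wx.shape[0]
    p_x = wx.mean(axis=0)            # p(a_i) = (1/N) sum_u w_i(x_u)
    p_y = wy.mean(axis=0)
    p_xy = wx.T @ wy / n             # p(a_i, b_j) = (1/N) sum_u w_i(x_u) w_j(y_u)
    return entropy_bits(p_x) + entropy_bits(p_y) - entropy_bits(p_xy.ravel())


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    x = rng.normal(size=500)
    print("MI(x, x^2):  ", mutual_information(x, x ** 2))   # non-linear dependence
    print("MI(x, noise):", mutual_information(x, rng.normal(size=500)))
```

    Setting `order=1` turns the weights into plain indicators, i.e. ordinary equal-width binning, which gives a convenient baseline for comparing the two estimators on the same data. A Pearson correlation would be close to zero for the `x` versus `x**2` pair, while the MI estimate is clearly positive.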

    pre-miRNA profiles obtained through application of locked nucleic acids and deep sequencing reveals complex 5′/3′ arm variation including concomitant cleavage and polyuridylation patterns

    Recent research hints at an underappreciated complexity in pre-miRNA processing and regulation. Global profiling of pre-miRNA, and its potential to improve our understanding of the pre-miRNA landscape, is impeded by overlap with highly expressed classes of other non-coding RNA (ncRNA). Here, we present a data set in which these RNAs are excluded before sequencing by means of locked nucleic acids (LNA), greatly increasing pre-miRNA sequence counts with no discernible effect on pre-miRNA or mature miRNA sequencing. Analysis of profiles generated from total, nuclear and cytoplasmic cell fractions reveals that pre-miRNAs are subject to a wide range of regulatory processes involving locus-specific 3′- and 5′-end variation, entailing complex cleavage patterns with co-occurring polyuridylation. Additionally, examination of nuclear-enriched flanking sequences of pre-miRNA, particularly those derived from polycistronic miRNA transcripts, provides insight into the production of miRNA and miRNA-offset RNA (moRNA), specifically identifying novel classes of RNA that potentially function as moRNA precursors. Our findings point to a particularly intricate regulation of the let-7 family, in many ways reminiscent of DICER1-independent, pre-mir-451-like processing; they introduce novel forms of pre-miRNA regulation and processing, unify known ones, and shed new light on overlooked products of miRNA processing pathways.

    A comprehensive promoter landscape identifies a novel promoter for CD133 in restricted tissues, cancers, and stem cells

    PROM1 is the gene encoding prominin-1 or CD133, an important cell surface marker for the isolation of both normal and cancer stem cells. PROM1 transcripts initiate at a range of transcription start sites (TSS) associated with distinct tissue and cancer expression profiles. Using high-resolution Cap Analysis of Gene Expression (CAGE) sequencing, we characterize TSS utilization across a broad range of normal and developmental tissues. We identify a novel proximal promoter (P6) in CD133+ melanoma cell lines and stem cells. Additional exon array sampling shows P6 to be active in cell populations enriched for mesenchyme or neural stem cells, and in CD133+-enriched Ewing sarcomas. Compared with previously characterized PROM1 promoters, P6 is enriched for an HMGI/Y (HMGA1) family transcription factor binding site motif and exhibits different epigenetic modifications from the canonical PROM1 promoter region.

    Methods for analyzing deep sequencing expression data: constructing the human and mouse promoterome with deepCAGE data

    A set of methods is presented for normalization, noise quantification and co-expression analysis in gene expression studies that use deep sequencing.
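
    As a hedged illustration of what the normalization and co-expression steps operate on, the sketch uses the simplest common scheme for tag-count data, tags per million (TPM), followed by Pearson correlation of log-transformed TSS profiles. This scheme is an assumption chosen for illustration; the paper's own normalization and noise model may differ, and the count matrix and function names are hypothetical.

```python
import numpy as np


def tags_per_million(counts) -> np.ndarray:
    """Scale each sample (column) so that its tag counts sum to one million."""
    counts = np.asarray(counts, dtype=float)
    totals = counts.sum(axis=0, keepdims=True)
    return counts / totals * 1e6


def coexpression(tpm: np.ndarray, pseudocount: float = 1.0) -> np.ndarray:
    """Pearson correlation between TSS expression profiles (rows) on a log2 scale."""
    return np.corrcoef(np.log2(tpm + pseudocount))


if __name__ == "__main__":
    # Rows: hypothetical TSSs (promoters); columns: hypothetical samples.
    counts = np.array([
        [120, 300, 80],
        [10, 25, 9],
        [0, 4, 1],
        [500, 900, 410],
    ])
    tpm = tags_per_million(counts)
    print(tpm.sum(axis=0))              # each column sums to 1e6 (up to rounding)
    print(coexpression(tpm).round(2))   # 4 x 4 TSS-by-TSS correlation matrix
```

    TPM removes differences in sequencing depth between libraries, and the pseudocount keeps the logarithm defined for TSSs with zero tags in a sample.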

    Transcriptional features of genomic regulatory blocks

    CAGE tag mapping of transcription start sites across different human tissues shows that genomic regulatory blocks have unique features that are the likely cause of their ability to respond to regulatory inputs over very long distances.

    Nonimmunoglobulin target loci of activation-induced cytidine deaminase (AID) share unique features with immunoglobulin genes.

    Activation-induced cytidine deaminase (AID) is required for both somatic hypermutation and class-switch recombination in activated B cells. AID is also known to target nonimmunoglobulin genes and to introduce mutations or chromosomal translocations, eventually causing tumors. To identify as-yet-unknown AID targets, we screened for early AID-induced DNA breaks using two independent genome-wide approaches. Along with known AID targets, this screen identified a set of unique genes (SNHG3, MALAT1, BCL7A, and CUX1) and confirmed that these loci accumulated mutations as frequently as the Ig locus after AID activation. Moreover, these genes share three important characteristics with Ig genes: translocations in tumors, repetitive sequences, and epigenetic modification of chromatin by H3K4 trimethylation in the vicinity of cleavage sites.