57 research outputs found

    CAGE-TSSchip: promoter-based expression profiling using the 5'-leading label of capped transcripts

    A novel approach that combines CAGE expression analysis with oligonucleotide array technology allows for the accurate and sensitive detection of promoter-based transcriptional activity

    Protein-protein interactions of the hyperthermophilic archaeon Pyrococcus horikoshii OT3

    BACKGROUND: Although 2,061 proteins of Pyrococcus horikoshii OT3, a hyperthermophilic archaeon, have been predicted from the recently completed genome sequence, the majority of proteins show no similarity to those from other organisms and are thus hypothetical proteins of unknown function. Because most proteins operate as parts of complexes to regulate biological processes, we systematically analyzed protein-protein interactions in Pyrococcus using the mammalian two-hybrid system to determine the function of the hypothetical proteins. RESULTS: We examined 960 soluble proteins from Pyrococcus and selected 107 interactions based on luciferase reporter activity, which was then evaluated using a computational approach to assess the reliability of the interactions. We also analyzed the expression of the assay samples by western blot, and a few interactions by in vitro pull-down assays. We identified 11 hetero-interactions that we considered to be located in the same operon, as observed in Helicobacter pylori. We annotated and classified proteins in the selected interactions according to their orthologous proteins. Many enzyme proteins showed self-interactions, similar to those seen in other organisms. CONCLUSION: We found 13 unannotated proteins that interacted with annotated proteins; this information is useful for predicting the functions of the hypothetical Pyrococcus proteins from the annotations of their interacting partners. Among the heterogeneous interactions, proteins were more likely to interact with proteins within the same ortholog class than with proteins of different classes. The analysis described here can provide global insights into the biological features of the protein-protein interactions in P. horikoshii

    LRRN4 and UPK3B Are Markers of Primary Mesothelial Cells

    Mesothelioma is a highly malignant tumor that is primarily caused by occupational or environmental exposure to asbestos fibers. Despite worldwide restrictions on asbestos usage, further cases are expected as diagnosis is typically 20–40 years after exposure. Once diagnosed, there is a very poor prognosis with a median survival of 9 months. Considering this, the development of early preclinical diagnostic markers may help improve clinical outcomes. Expression microarrays on mesothelium and other tissues dissected from mice were used to identify candidate mesothelial lineage markers. Candidates were further tested by qRT-PCR and in situ hybridization across a mouse tissue panel. Two candidate biomarkers with the potential for secretion, uroplakin 3B (UPK3B) and leucine rich repeat neuronal 4 (LRRN4), and one commercialized mesothelioma marker, mesothelin (MSLN), were then chosen for validation across a panel of normal human primary cells, 16 established mesothelioma cell lines, 10 lung cancer lines, and a further set of 8 unrelated cancer cell lines. Within the primary cell panel, LRRN4 was only detected in primary mesothelial cells, but MSLN and UPK3B were also detected in other cell types. MSLN was detected in bronchial epithelial cells and alveolar epithelial cells, and UPK3B was detected in retinal pigment epithelial cells and urothelial cells. Testing the cell line panel, MSLN was detected in 15 of the 16 mesothelioma cell lines, whereas LRRN4 was only detected in 8 and UPK3B in 6. Interestingly, MSLN levels appear to be upregulated in the mesothelioma lines compared to the primary mesothelial cells, while LRRN4 and UPK3B are either lost or down-regulated. Despite the higher fraction of mesothelioma lines positive for MSLN, it was also detected at high levels in 2 lung cancer lines and 3 other unrelated cancer lines derived from papillotubular adenocarcinoma, signet ring carcinoma and transitional cell carcinoma

    Automated Workflow for Preparation of cDNA for Cap Analysis of Gene Expression on a Single Molecule Sequencer

    Background: Cap analysis of gene expression (CAGE) is a 5′ sequence tag technology to globally determine transcriptional starting sites in the genome and their expression levels and has most recently been adapted to the HeliScope single molecule sequencer. Despite significant simplifications in the CAGE protocol, it has until now been a labour-intensive protocol. Methodology: In this study we set out to adapt the protocol to a robotic workflow, which would increase throughput and reduce handling. The automated CAGE cDNA preparation system we present here can prepare 96 'HeliScope-ready' CAGE cDNA libraries in 8 days, as opposed to 6 weeks by a manual operator. We compare the results obtained using the same RNA in manual libraries and across multiple automation batches to assess reproducibility. Conclusions: We show that the sequencing was highly reproducible and comparable to manual libraries with an 8-fold increase in productivity. The automated CAGE cDNA preparation system can prepare 96 CAGE sequencing samples simultaneously. Finally, we discuss how the system could be used for CAGE on Illumina/SOLiD platforms, RNA-seq and fulllengt

    Transcript Annotation in FANTOM3: Mouse Gene Catalog Based on Physical cDNAs

    The international FANTOM consortium aims to produce a comprehensive picture of the mammalian transcriptome, based upon an extensive cDNA collection and functional annotation of full-length enriched cDNAs. The previous dataset, FANTOM2, comprised 60,770 full-length enriched cDNAs. Functional annotation revealed that this cDNA dataset contained only about half of the estimated number of mouse protein-coding genes, indicating that a number of cDNAs still remained to be collected and identified. To pursue the complete gene catalog that covers all predicted mouse genes, cloning and sequencing of full-length enriched cDNAs has been continued since FANTOM2. In FANTOM3, 42,031 newly isolated cDNAs were subjected to functional annotation, and the annotation of 4,347 FANTOM2 cDNAs was updated. To accomplish accurate functional annotation, we improved our automated annotation pipeline by introducing new coding sequence prediction programs and developed a Web-based annotation interface for simplifying the annotation procedures to reduce manual annotation errors. Automated coding sequence and function prediction was followed with manual curation and review by expert curators. A total of 102,801 full-length enriched mouse cDNAs were annotated. Out of 102,801 transcripts, 56,722 were functionally annotated as protein coding (including partial or truncated transcripts), providing to our knowledge the greatest current coverage of the mouse proteome by full-length cDNAs. The total number of distinct non-protein-coding transcripts increased to 34,030. The FANTOM3 annotation system, consisting of automated computational prediction, manual curation, and final expert curation, facilitated the comprehensive characterization of the mouse transcriptome, and could be applied to the transcriptomes of other species

    The Constrained Maximal Expression Level Owing to Haploidy Shapes Gene Content on the Mammalian X Chromosome.

    X chromosomes are unusual in many regards, not least of which is their nonrandom gene content. The causes of this bias are commonly discussed in the context of sexual antagonism and the avoidance of activity in the male germline. Here, we examine the notion that, at least in some taxa, functionally biased gene content may more profoundly be shaped by limits imposed on gene expression owing to haploid expression of the X chromosome. Notably, if the X, as in primates, is transcribed at rates comparable to the ancestral rate (per promoter) prior to the X chromosome formation, then the X is not a tolerable environment for genes with very high maximal net levels of expression, owing to transcriptional traffic jams. We test this hypothesis using The Encyclopedia of DNA Elements (ENCODE) and data from the Functional Annotation of the Mammalian Genome (FANTOM5) project. As predicted, the maximal expression of human X-linked genes is much lower than that of genes on autosomes: on average, maximal expression is three times lower on the X chromosome than on autosomes. Similarly, autosome-to-X retroposition events are associated with lower maximal expression of retrogenes on the X than seen for X-to-autosome retrogenes on autosomes. Also as expected, X-linked genes have a lesser degree of increase in gene expression than autosomal ones (compared to the human/chimpanzee common ancestor) if highly expressed, but not if lowly expressed. The traffic jam model also explains the known lower breadth of expression for genes on the X (and the Z of birds), as genes with broad expression are, on average, those with high maximal expression. As then further predicted, highly expressed tissue-specific genes are also rare on the X and broadly expressed genes on the X tend to be lowly expressed, both indicating that the trend is shaped by the maximal expression level, not the breadth of expression per se. Importantly, a limit to the maximal expression level explains biased tissue of expression profiles of X-linked genes. Tissues whose tissue-specific genes are very highly expressed (e.g., secretory tissues, tissues abundant in structural proteins) are also tissues in which gene expression is relatively rare on the X chromosome. These trends cannot be fully accounted for in terms of alternative models of biased expression. In conclusion, the notion that it is hard for genes on the Therian X to be highly expressed, owing to transcriptional traffic jams, provides a simple yet robustly supported rationale of many peculiar features of X's gene content, gene expression, and evolution

    Overview of probe design: genomic coordination of TSSs and CAGE-TSSchip probes

    Copyright information: Taken from "CAGE-TSSchip: promoter-based expression profiling using the 5'-leading label of capped transcripts", http://genomebiology.com/2007/8/3/R42, Genome Biology 2007;8(3):R42. Published online 26 Mar 2007. PMCID: PMC1868931. The upper four tracks are an arrangement example of full-length transcripts (cDNA) and 5'-ends of transcripts derived from various methods (cap analysis gene expression [CAGE], 5'-expressed sequence tag [EST], and 5'-end of gene identification signature/gene signature cloning [4]). Tag clusters (TC; green arrow) are the overlapping regions of the 5'-ends. The most frequent transcription start site (TSS) for each TC is the representative position (vertical line from TC arrows). Fragments for the probe design, which are 120-nucleotide-long genomic sequences, start from the representative position of each TC and are shown by cyan arrows. If the fragment overlaps the 5'-end of any exon-intron junction (diamond of cDNA and 5'-EST transcripts), the fragment skips the intron to the next exon. The Agilent probe design service would then suggest an appropriate 60-nucleotide region within each fragment for array probes (probe; blue arrows). Details of probe preparation are available in Additional data file 8
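
    The fragment-selection steps in this caption can be sketched roughly as follows. This is a minimal illustration, not the authors' pipeline: the function names, the plus-strand half-open coordinates, the tie-breaking rule, and the assumption that the representative position lies inside an annotated exon are all hypothetical choices made for the example.

```python
from collections import Counter


def representative_tss(five_prime_ends):
    """Return the most frequent 5'-end position in one tag cluster.

    Ties are broken by taking the smallest coordinate (an assumption;
    the caption does not say how ties are resolved).
    """
    counts = Counter(five_prime_ends)
    best = max(counts.values())
    return min(pos for pos, n in counts.items() if n == best)


def probe_fragment(start, exons, length=120):
    """Collect up to `length` exonic bases from `start`, skipping introns.

    `exons` is a position-sorted list of half-open (begin, end) intervals
    on the plus strand; `start` is assumed to fall inside one of them.
    Returns the list of (begin, end) pieces that make up the fragment.
    """
    pieces = []
    remaining = length
    for begin, end in exons:
        if end <= start:
            continue  # exon lies entirely upstream of the start position
        piece_begin = max(begin, start)
        piece_end = min(end, piece_begin + remaining)
        pieces.append((piece_begin, piece_end))
        remaining -= piece_end - piece_begin
        if remaining == 0:
            break  # fragment has reached the requested length
    return pieces


tss = representative_tss([100, 101, 101, 102, 101, 100])
fragment = probe_fragment(tss, [(90, 150), (400, 500)])
print(tss, fragment)
```

    In this toy case the fragment takes 49 bases from the first exon, jumps the intron, and takes the remaining 71 bases from the second exon. Choosing the 60-nucleotide probe within the fragment is left out, since the caption attributes that step to the Agilent design service.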

    Unamplified cap analysis of gene expression on a single-molecule sequencer

    We report the development of a simplified cap analysis of gene expression (CAGE) protocol adapted for single-molecule sequencers that avoids second strand synthesis, ligation, digestion, and PCR. HeliScopeCAGE directly sequences the 3′ end of cap trapped first-strand cDNAs. As with previous versions of CAGE, we better define transcription start sites (TSS) than known models, identify novel regions of transcription and alternative promoters, and find two major classes of TSS signal, sharp peaks and broad regions. However, using this protocol, we observe reproducible evidence of regulation at the much finer level of individual TSS positions. The libraries are quantitative over 5 orders of magnitude and highly reproducible (Pearson's correlation coefficient of 0.987). We have also scaled down the sample requirement to 5 μg of total RNA for a standard HeliScopeCAGE library and 100 ng for a low-quantity version. When the same RNA was run as 5-μg and 100-ng versions, the 100 ng was still able to detect expression for ∼60% of the 13,468 loci detected by a 5-μg library using the same threshold, allowing comparative analysis of even rare cell populations. Testing the protocol for differential gene expression measurements on triplicate HeLa and THP-1 samples, we find that the log fold change compared to Illumina microarray measurements is highly correlated (0.871). In addition, HeliScopeCAGE finds differential expression for thousands more loci including those with probes on the array. Finally, although the majority of tags are 5′ associated, we also observe a low level of signal on exons that is useful for defining gene structures
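
    The comparison between the 5-μg and 100-ng libraries amounts to asking what fraction of loci detected in the reference library is still detected in the low-input library at the same threshold. A minimal sketch of that calculation is given below; the function names, the locus identifiers, the expression values, and the threshold are invented for illustration and are not taken from the paper.

```python
def detected(expression, threshold):
    """Return the set of loci whose expression meets the threshold."""
    return {locus for locus, value in expression.items() if value >= threshold}


def recovered_fraction(reference, low_input, threshold):
    """Fraction of loci detected in `reference` that `low_input` also detects.

    Both arguments map a locus identifier to an expression value, and the
    same threshold is applied to both libraries.
    """
    ref_loci = detected(reference, threshold)
    if not ref_loci:
        return 0.0
    return len(ref_loci & detected(low_input, threshold)) / len(ref_loci)


# Hypothetical toy values, not data from the study.
standard_library = {"locusA": 10.0, "locusB": 5.0, "locusC": 0.5, "locusD": 3.0}
low_quantity_library = {"locusA": 8.0, "locusB": 0.2, "locusD": 1.5, "locusE": 4.0}
print(recovered_fraction(standard_library, low_quantity_library, threshold=1.0))
```

    Loci seen only in the low-input library (locusE here) do not affect the result, because the reference library defines the denominator.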