355 research outputs found

    Simcluster: clustering enumeration gene expression data on the simplex space

    Transcript enumeration methods such as SAGE, MPSS, and sequencing-by-synthesis EST "digital northern" are important high-throughput techniques for digital gene expression measurement. Like other counting or voting processes, these measurements constitute compositional data exhibiting properties particular to the simplex space, where the summation of the components is constrained. These properties are not present in regular Euclidean spaces, on which hybridization-based microarray data is often modeled. Therefore, pattern recognition methods commonly used for microarray data analysis may be non-informative for the data generated by transcript enumeration techniques, since they ignore certain fundamental properties of this space.

Here we present a software tool, Simcluster, designed to perform clustering analysis for data on the simplex space. We present Simcluster as a stand-alone command-line C package and as a user-friendly on-line tool. Both versions are available at: http://xerad.systemsbiology.net/simcluster.

Simcluster is designed in accordance with a well-established mathematical framework for compositional data analysis, which provides principled procedures for dealing with the simplex space, and is thus applicable in a number of contexts, including enumeration-based gene expression data.
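
As an illustration of why the simplex needs its own geometry, the following is a minimal Python sketch of the Aitchison distance from the compositional data analysis literature, computed through the centered log-ratio (clr) transform. It is not Simcluster's implementation: the function names and the pseudocount used to avoid taking the logarithm of zero counts are assumptions made for this example only.

```python
import math

def closure(counts, pseudocount=0.5):
    """Rescale raw tag counts into a composition whose parts sum to 1.

    The pseudocount is an assumption of this sketch; it keeps zero
    counts from producing log(0) in the clr transform below.
    """
    shifted = [c + pseudocount for c in counts]
    total = sum(shifted)
    return [s / total for s in shifted]

def clr(composition):
    """Centered log-ratio transform: log of each part minus the mean log."""
    logs = [math.log(p) for p in composition]
    mean_log = sum(logs) / len(logs)
    return [value - mean_log for value in logs]

def aitchison_distance(counts_a, counts_b, pseudocount=0.5):
    """Euclidean distance between the clr transforms of two libraries."""
    za = clr(closure(counts_a, pseudocount))
    zb = clr(closure(counts_b, pseudocount))
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(za, zb)))
```

With the pseudocount set to 0, two libraries that differ only in sequencing depth, such as `[10, 20, 30]` and `[20, 40, 60]`, are at distance zero, whereas the ordinary Euclidean distance between the raw counts is not. This is the kind of property that methods built for Euclidean data overlook.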

    PatternLab for proteomics: a tool for differential shotgun proteomics

    BACKGROUND: A goal of proteomics is to distinguish between states of a biological system by identifying protein expression differences. Liu et al. demonstrated a method to perform semi-relative protein quantitation in shotgun proteomics data by correlating the number of tandem mass spectra obtained for each protein, or "spectral count", with its abundance in a mixture; however, two issues have remained open: how to normalize spectral counting data and how to efficiently pinpoint differences between profiles. Moreover, Chen et al. recently showed how to increase the number of identified proteins in shotgun proteomics by analyzing samples with different MS-compatible detergents while performing proteolytic digestion. The latter introduced new challenges from the data analysis perspective, since replicate readings are not acquired. RESULTS: To address the open issues above, we present a program termed PatternLab for proteomics. This program implements existing strategies and adds two new methods to pinpoint differences in protein profiles. The first method, ACFold, addresses experiments with fewer than three replicates from each state or having assays acquired by different protocols as described by Chen et al. ACFold uses a combined criterion based on expression fold changes, the AC test, and the false-discovery rate, and can supply a "bird's-eye view" of differentially expressed proteins. The other method addresses experimental designs having multiple readings from each state and is referred to as nSVM (natural support vector machine) because of its roots in evolutionary computing and in statistical learning theory. Our observations suggest that nSVM's niche comprises projects that select a minimum set of proteins for classification purposes; for example, the development of an early detection kit for a given pathology. We demonstrate the effectiveness of each method on experimental data and compare them with existing strategies. CONCLUSION: PatternLab offers easy and unified access to a variety of feature selection and normalization strategies, each having its own niche. Additionally, graphing tools are available to aid in the analysis of high-throughput experimental data. PatternLab is available at http://pcarvalho.com/patternlab.

    Transcriptome analysis of mRNA and miRNA in skeletal muscle indicates an important network for differential Residual Feed Intake in pigs

    Feed efficiency (FE) can be measured by feed conversion ratio (FCR) or residual feed intake (RFI). In this study, we measured the FE-related phenotypes of 236 castrated purebred Yorkshire boars, and selected 10 extreme individuals with high and low RFI for transcriptome analysis. We used RNA-seq analyses to determine the differential expression of genes and miRNAs in skeletal muscle. There were 99 differentially expressed genes identified (q ≤ 0.05). The down-regulated genes were mainly involved in mitochondrial energy metabolism, including FABP3, RCAN, PPARGC1 (PGC-1A), HK2 and PRKAG2. The up-regulated genes were mainly involved in skeletal muscle differentiation and proliferation, including IGF2, PDE7A, CEBPD, PIK3R1 and MYH6. Moreover, 15 differentially expressed miRNAs (|log2FC| ≥ 1, total read count ≥ 20, p ≤ 0.05) were identified. Among them, miR-136, miR-30e-5p, miR-1, miR-208b, miR-199a, miR-101 and miR-29c were up-regulated, while miR-215, miR-365-5p, miR-486, miR-1271, miR-145, miR-99b, miR-191 and miR-10b were down-regulated in low RFI pigs. We conclude that decreasing mitochondrial energy metabolism, possibly through AMPK - PGC-1A pathways, and increasing muscle growth, through IGF-1/2 and TGF-β signaling pathways, are potential strategies for the improvement of FE in pigs (and possibly other livestock). This study provides new insights into the molecular mechanisms that determine RFI and FE in pigs.
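
The three miRNA thresholds quoted in this abstract combine into a single filter, which can be sketched in Python as below. The record layout, the field names, and the example values are hypothetical and are not taken from the study; only the cutoffs themselves come from the abstract.

```python
def is_differential(mirna, min_abs_log2fc=1.0, min_reads=20, max_p=0.05):
    """Apply the abstract's cutoffs: |log2FC| >= 1, reads >= 20, p <= 0.05."""
    return (
        abs(mirna["log2fc"]) >= min_abs_log2fc
        and mirna["total_reads"] >= min_reads
        and mirna["p_value"] <= max_p
    )

# Hypothetical records for illustration only (not data from the study).
candidates = [
    {"name": "miR-A", "log2fc": 1.4, "total_reads": 250, "p_value": 0.01},
    {"name": "miR-B", "log2fc": -0.6, "total_reads": 900, "p_value": 0.001},
    {"name": "miR-C", "log2fc": 2.1, "total_reads": 12, "p_value": 0.02},
    {"name": "miR-D", "log2fc": -1.0, "total_reads": 20, "p_value": 0.05},
    {"name": "miR-E", "log2fc": -1.8, "total_reads": 75, "p_value": 0.20},
]

selected = [m["name"] for m in candidates if is_differential(m)]
```

A record passes only if it meets all three conditions at once, so a large fold change with too few reads (miR-C) or a weak p-value (miR-E) is excluded, while values sitting exactly on the cutoffs (miR-D) are kept because the abstract states the thresholds as inclusive.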

    Identification of Lipases Involved in PBAN Stimulated Pheromone Production in Bombyx mori Using the DGE and RNAi Approaches

    BACKGROUND: Pheromone biosynthesis activating neuropeptide (PBAN) is a neurohormone that regulates sex pheromone synthesis in female moths. Bombyx mori is a model organism that has been used to explore the signal transduction pattern of PBAN, which is mediated by a G-protein coupled receptor (GPCR). Although significant progress has been made in elucidating PBAN-regulated lipolysis that releases the precursor of the sex pheromone, little is known about the molecular components involved in this step. To better elucidate the molecular mechanisms of PBAN-stimulated lipolysis of cytoplasmic lipid droplets (LDs), the associated lipase genes involved in PBAN-regulated sex pheromone biosynthesis were identified using digital gene expression (DGE) and subsequent RNA interference (RNAi). RESULTS: Three DGE libraries were constructed from pheromone glands (PGs) at different developmental stages, namely, 72 hours before eclosion (-72 h), new emergence (0 h) and 72 h after eclosion (72 h), to investigate the gene expression profiles during PG development. The DGE evaluated over 5.6 million clean tags in each PG sample and revealed numerous genes that were differentially expressed at these stages. Most importantly, seven lipases were found to be richly expressed during the key stage of sex pheromone synthesis and release (new emergence). RNAi-mediated knockdown confirmed for the first time that four of these seven lipases play important roles in sex pheromone synthesis. CONCLUSION: This study has identified four lipases directly involved in PBAN-stimulated sex pheromone biosynthesis, which improves our understanding of the lipases involved in releasing bombykol precursors from triacylglycerols (TAGs) within the cytoplasmic LDs.

    Wig-1, a novel regulator of N-Myc mRNA and N-Myc-driven tumor growth

    Wig-1 is a transcriptional target of the p53 tumor suppressor and encodes an mRNA stability-regulating protein. We show here that Wig-1 knockdown causes a dramatic inhibition of N-Myc expression and triggers differentiation in neuroblastoma cells carrying amplified N-Myc. Transient Wig-1 knockdown significantly delays development of N-Myc-driven tumors in mice. We also show that N-Myc expression is induced upon moderate p53-activating stress, suggesting a role of the p53-Wig-1-N-Myc axis in promoting cell cycle re-entry upon p53-induced cell cycle arrest and DNA repair. Moreover, our findings raise possibilities for the improved treatment of poor prognosis neuroblastomas that carry amplified N-Myc

    Errors in CGAP xProfiler and cDNA DGED: the importance of library parsing and gene selection algorithms

    BACKGROUND: The Cancer Genome Anatomy Project (CGAP) xProfiler and cDNA Digital Gene Expression Displayer (DGED) were made available to the scientific community over a decade ago and have since been widely used to find genes that are differentially expressed between cancer and normal tissues. The tissue types are usually chosen according to the ontology hierarchy developed by NCBI. The xProfiler uses an internally available flat file database to determine the presence or absence of genes in the chosen libraries, while cDNA DGED uses the publicly available UniGene Expression and Gene relational databases to count the sequences found for each gene in the presented libraries. RESULTS: We discovered that the CGAP approach often includes libraries from dependent or irrelevant tissues (one third of libraries were incorrect on average, and for some tissue searches no correct libraries were selected at all). We also discovered that the CGAP approach reported genes from outside the selected libraries and may omit genes found within the libraries. Other errors include the incorrect estimation of the significance values and inaccurate settings for the library size cut-off values. We advocate a revised approach to finding libraries associated with tissues, in which libraries from dependent or irrelevant tissues are not included in the final library pool. We also revised the method for determining the presence or absence of a gene by searching the UniGene relational database, revised the calculation of statistical significance and sorted the library cut-off filter. CONCLUSION: Our results justify re-evaluation of all previously reported results where NCBI CGAP expression data and tools were used.

    The Genome of Borrelia recurrentis, the Agent of Deadly Louse-Borne Relapsing Fever, Is a Degraded Subset of Tick-Borne Borrelia duttonii

    In an effort to understand how a tick-borne pathogen adapts to the body louse, we sequenced and compared the genomes of the recurrent fever agents Borrelia recurrentis and B. duttonii. The 1,242,163–1,574,910-bp fragmented genomes of B. recurrentis and B. duttonii contain a unique 23-kb linear plasmid. This linear plasmid exhibits a large polyT track within the promoter region of an intact variable large protein gene and a telomere resolvase that is unique to Borrelia. The genome content is characterized by several repeat families, including antigenic lipoproteins. B. recurrentis exhibited a 20.4% genome size reduction and appeared to be a strain of B. duttonii, with a decaying genome, possibly due to the accumulation of genomic errors induced by the loss of recA and mutS. Accompanying this were increases in the number of impaired genes and a reduction in coding capacity, including surface-exposed lipoproteins and putative virulence factors. Analysis of the reconstructed ancestral sequence compared to B. duttonii and B. recurrentis was consistent with the accelerated evolution observed in B. recurrentis. Vector specialization of louse-borne pathogens responsible for major epidemics was associated with rapid genome reduction. The correlation between gene loss and increased virulence of B. recurrentis parallels that of Rickettsia prowazekii, with both species being genomic subsets of less-virulent strains

    Amino Acid Usage Is Asymmetrically Biased in AT- and GC-Rich Microbial Genomes.

    INTRODUCTION: Genomic base composition ranges from less than 25% AT to more than 85% AT in prokaryotes. Since only a small fraction of prokaryotic genomes is not protein coding, even a minor change in genomic base composition will induce profound protein changes. We examined how amino acid and codon frequencies were distributed in over 2000 microbial genomes and how these distributions were affected by base compositional changes. In addition, we wanted to know how genome-wide amino acid usage was biased in the different genomes and how changes to base composition and mutations affected this bias. To carry this out, we used a Generalized Additive Mixed-effects Model (GAMM) to explore non-linear associations and strong data dependences in closely related microbes; principal component analysis (PCA) was used to examine genomic amino acid- and codon frequencies, while the concept of relative entropy was used to analyze genomic mutation rates. RESULTS: We found that genomic amino acid frequencies carried a stronger phylogenetic signal than codon frequencies, but that this signal was weak compared to that of genomic %AT. Further, in contrast to codon usage bias (CUB), amino acid usage bias (AAUB) was differently distributed in AT- and GC-rich genomes in the sense that AT-rich genomes did not prefer specific amino acids over others to the same extent as GC-rich genomes. AAUB was also associated with relative entropy; genomes with low AAUB contained more random mutations as a consequence of relaxed purifying selection than genomes with higher AAUB. CONCLUSION: Genomic base composition has a substantial effect on both amino acid- and codon frequencies in bacterial genomes. While phylogeny influenced amino acid usage more in GC-rich genomes, AT-content was driving amino acid usage in AT-rich genomes. We found the GAMM model to be an excellent tool to analyze the genomic data used in this study.