9,140 research outputs found

    Gill transcriptome response to changes in environmental calcium in the green spotted puffer fish

    Background: Calcium ion is tightly regulated in body fluids, and for euryhaline fish, which are exposed to rapid changes in environmental [Ca2+], homeostasis is especially challenging. The gill is the main organ of active calcium uptake and therefore plays a crucial role in the maintenance of calcium ion homeostasis. To study the molecular basis of the short-term responses to changing calcium availability, the whole gill transcriptome of the euryhaline teleost green spotted puffer fish, Tetraodon nigroviridis, exposed to water with altered [Ca2+] was obtained by Super Serial Analysis of Gene Expression (SuperSAGE) and analysed. Results: Transfer of T. nigroviridis from 10 ppt water salinity containing 2.9 mM Ca2+ to high (10 mM Ca2+) and low (0.01 mM Ca2+) calcium water of similar salinity for 2-12 h resulted in 1,339 differentially expressed SuperSAGE tags (26-bp transcript identifiers) in gills. Of these, 869 tags (65%) were mapped to T. nigroviridis cDNAs or genomic DNA and 497 (57%) were assigned to known proteins. Thirteen percent of the genes matched multiple tags, indicating alternative RNA transcripts. The main enriched gene ontology groups belong to Ca2+ signaling/homeostasis but also to muscle contraction, cytoskeleton, energy production/homeostasis and tissue remodeling. K-means clustering identified co-expressed transcripts with distinct patterns in response to water [Ca2+] and exposure time. Conclusions: The generated transcript expression patterns provide a framework of novel water calcium-responsive genes in the gill during the initial response after transfer to different [Ca2+]. This molecular response entails initial perception of alterations and activation of signaling networks and effectors, and suggests active remodeling of cytoskeletal proteins during the initial acclimation process. Genes related to energy production and energy homeostasis are also up-regulated, probably reflecting the increased energetic needs of the acclimation response. 
This study is the first genome-wide transcriptome analysis of fish gills and is an important resource for future research on the short-term mechanisms involved in the gill acclimation responses to environmental Ca2+ changes and osmoregulation.

    EasyCluster: a fast and efficient gene-oriented clustering tool for large-scale transcriptome data

    Background: ESTs and full-length cDNAs represent an invaluable source of evidence for inferring reliable gene structures and discovering potential alternative splicing events. In newly sequenced genomes, these tasks may not be practicable owing to the lack of appropriate training sets. However, when expression data are available, they can be used to build EST clusters related to specific genomic transcribed loci. Common strategies recently employed to this end are based on sequence similarity between transcripts and can lead, in specific conditions, to inconsistent and erroneous clustering. In order to improve the cluster building and facilitate all downstream annotation analyses, we developed a simple genome-based methodology to generate gene-oriented clusters of ESTs when a genomic sequence and a pool of related expressed sequences are provided. Our procedure has been implemented in the software EasyCluster and takes into account the spliced nature of ESTs after an ad hoc genomic mapping. Methods: EasyCluster uses the well-known GMAP program to perform a very quick EST-to-genome mapping in addition to the detection of reliable splice sites. Given a genomic sequence and a pool of ESTs/FL-cDNAs, EasyCluster starts by building genomic and EST local databases and runs GMAP. Subsequently, it parses the results, creating an initial collection of pseudo-clusters by grouping ESTs according to the overlap of their genomic coordinates on the same strand. In the final step, EasyCluster refines the clustering by again running GMAP on each pseudo-cluster and groups together ESTs sharing at least one splice site. Results: The higher accuracy of EasyCluster with respect to other clustering tools has been verified by means of a manually curated benchmark of human EST clusters. 
Additional datasets including the Unigene cluster Hs.122986 and ESTs related to the human HOXA gene family have also been used to demonstrate the better clustering capability of EasyCluster over current genome-based web service tools such as ASmodeler and BIPASS. EasyCluster has also been used to provide a first compilation of gene-oriented clusters in the Ricinus communis oilseed plant, for which no Unigene clusters are yet available, as well as an evaluation of the alternative splicing in this plant species.
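
The two grouping steps described in the Methods of this abstract (overlap of genomic coordinates on the same strand, then at least one shared splice site) can be illustrated with a minimal sketch. This is not EasyCluster's code and it does not call GMAP: the alignment records, their dictionary keys (`id`, `strand`, `start`, `end`, `splice_sites`) and the function names are hypothetical, and the sketch assumes the mapping has already been done. It also assumes that an EST with no splice sites ends up in a cluster of its own, which the abstract does not specify.

```python
from collections import defaultdict


def pseudo_clusters(alignments):
    """Step 1: group ESTs whose genomic spans overlap on the same strand."""
    by_strand = defaultdict(list)
    for aln in alignments:
        by_strand[aln["strand"]].append(aln)

    clusters = []
    for strand_alns in by_strand.values():
        # Sweep left to right, extending the current cluster while spans overlap.
        ordered = sorted(strand_alns, key=lambda a: a["start"])
        current, current_end = [], None
        for aln in ordered:
            if current and aln["start"] <= current_end:
                current.append(aln)
                current_end = max(current_end, aln["end"])
            else:
                if current:
                    clusters.append(current)
                current, current_end = [aln], aln["end"]
        if current:
            clusters.append(current)
    return clusters


def refine_by_splice_sites(cluster):
    """Step 2: within one pseudo-cluster, join ESTs sharing a splice site."""
    parent = {aln["id"]: aln["id"] for aln in cluster}

    def find(x):
        # Union-find lookup with path halving.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    first_est_with_site = {}
    for aln in cluster:
        for site in aln["splice_sites"]:
            if site in first_est_with_site:
                parent[find(aln["id"])] = find(first_est_with_site[site])
            else:
                first_est_with_site[site] = aln["id"]

    groups = defaultdict(set)
    for aln in cluster:
        groups[find(aln["id"])].add(aln["id"])
    return [frozenset(members) for members in groups.values()]


def gene_clusters(alignments):
    """Run both steps and return the final clusters as a set of frozensets."""
    result = set()
    for cluster in pseudo_clusters(alignments):
        result.update(refine_by_splice_sites(cluster))
    return result


# Hypothetical, hand-made alignments (coordinates are illustrative only).
example_alignments = [
    {"id": "est1", "strand": "+", "start": 100, "end": 500, "splice_sites": {200, 300}},
    {"id": "est2", "strand": "+", "start": 150, "end": 600, "splice_sites": {300, 450}},
    {"id": "est3", "strand": "+", "start": 400, "end": 700, "splice_sites": {650}},
    {"id": "est4", "strand": "-", "start": 120, "end": 480, "splice_sites": {200, 300}},
    {"id": "est5", "strand": "+", "start": 1000, "end": 1200, "splice_sites": set()},
]
```

In this toy input, `est1` and `est2` share the splice site at position 300 and are merged, while `est3` overlaps them on the genome but shares no splice site, so the refinement step separates it again.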

    EGassembler: online bioinformatics service for large-scale processing, clustering and assembling ESTs and genomic DNA fragments

    Expressed sequence tag (EST) sequencing has proven to be an economically feasible alternative for gene discovery in species lacking a draft genome sequence. Ongoing large-scale EST sequencing projects need bioinformatics tools to facilitate uniform EST handling. This brings about a renewed importance for a universal tool for processing and functional annotation of large sets of ESTs. EGassembler is a web server that provides an automated as well as a user-customized analysis tool for cleaning, repeat masking, vector trimming, organelle masking, clustering and assembling of ESTs and genomic fragments. The web server is publicly available and provides the community with a unique all-in-one online application web service for large-scale EST and genomic DNA clustering and assembling. Running on a Sun Fire 15K supercomputer, it can process a large volume of data in a short period of time. The results can be used to functionally annotate genes, to facilitate splice alignment analysis, to link the transcripts to genetic and physical maps, to design microarray chips, to perform transcriptome analysis and to map to KEGG metabolic pathways. The service provides an excellent bioinformatics tool to wet-lab research groups as well as an all-in-one tool for sequence handling to bioinformatics researchers.

    Parallelisation of EST clustering

    Master of Science - Science. The field of bioinformatics has been developing steadily, with computational problems related to biology taking on an increased importance as further advances are sought. The large data sets involved in problems within computational biology have dictated a search for good, fast approximations to computationally complex problems. This research aims to improve a method used to discover and understand genes, which are small subsequences of DNA. A difficulty arises because genes contain parts we know to be functional and other parts we assume are non-functional, as their functions have not been determined. Isolating the functional parts requires the use of natural biological processes which perform this separation. However, these processes cannot read long sequences, forcing biologists to break a long sequence into a large number of small sequences and then read these. This creates the computational difficulty of categorizing the short fragments according to gene membership. Expressed Sequence Tag Clustering is a technique used to facilitate the identification of expressed genes by grouping together similar fragments with the assumption that they belong to the same gene. The aim of this research was to investigate the usefulness of distributed memory parallelisation for the Expressed Sequence Tag Clustering problem. This was investigated empirically, with a distributed system tested for speed against a sequential one. It was found that distributed memory parallelisation can be very effective in this domain. The results showed a super-linear speedup for up to 100 processors; higher numbers were not tested but are likely to produce further speedups. The system was able to cluster 500,000 ESTs in 641 minutes using 101 processors.
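
For readers unfamiliar with the term, "super-linear speedup" means that the parallel run is faster than the sequential run by a factor larger than the number of processors. The sketch below states the standard definitions of speedup and parallel efficiency; the function names are hypothetical, and the timings in the usage note are invented for illustration, since the abstract does not report the sequential running time.

```python
def speedup(t_sequential, t_parallel):
    """Speedup S = T_sequential / T_parallel (both in the same time unit)."""
    return t_sequential / t_parallel


def efficiency(t_sequential, t_parallel, processors):
    """Parallel efficiency E = S / p; E > 1 indicates super-linear speedup."""
    return speedup(t_sequential, t_parallel) / processors


def is_super_linear(t_sequential, t_parallel, processors):
    """True when the speedup exceeds the number of processors used."""
    return speedup(t_sequential, t_parallel) > processors
```

With made-up timings of 1200 minutes sequentially and 10 minutes on 100 processors, `speedup(1200.0, 10.0)` is 120, which exceeds 100 and would therefore count as super-linear.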

    Large-scale methods in computational genomics

    The explosive growth in biological sequence data, coupled with the design and deployment of increasingly high-throughput sequencing technologies, has created a need for methods capable of processing large-scale sequence data in a time- and cost-effective manner. In this dissertation, we address this need through the development of faster algorithms, space-efficient methods, and high-performance parallel computing techniques for some key problems in computational genomics. The first problem addressed is the clustering of DNA sequences based on a measure of sequence similarity. Our clustering method: (i) guarantees linear space complexity, in contrast to the quadratic memory requirements of previously developed methods; (ii) identifies sequence pairs containing long maximal matches in the decreasing order of their maximal match lengths in run-time proportional to the sum of input and output sizes; (iii) provides heuristics to significantly reduce the number of pairs evaluated for checking sequence similarity without affecting quality; and (iv) has parallel strategies that provide linear speedup and a proportionate reduction in space per processor. Our approach has significantly enhanced the problem size reach while also drastically reducing the time to solution. The next problem we address is the de novo detection of genomic repeats called Long Terminal Repeat (LTR) retrotransposons. Our algorithm guarantees linear space complexity and produces high quality candidates for prediction in run-time proportional to the sum of input and output sizes. Validation of our approach on the yeast genome demonstrates both superior quality and performance results when compared to previously developed software. In a genome assembly project, fragments sequenced from a target genome are computationally assembled into numerous supersequences called contigs, which are then ordered and oriented into scaffolds. 
In this dissertation, we introduce a new problem called retroscaffolding for scaffolding contigs based on the knowledge of their LTR retrotransposon content. Through identification of sequencing gaps that span LTR retrotransposons, retroscaffolding provides a mechanism for prioritizing sequencing gaps for finishing purposes. While most of the problems addressed here have been studied previously, the main contribution in this dissertation is the development of methods that can scale to the largest available sequence collections.

    CLU: A new algorithm for EST clustering

    BACKGROUND: The continuous flow of EST data remains one of the richest sources for discoveries in modern biology. The first step in EST data mining is usually associated with EST clustering, the process of grouping of original fragments according to their annotation, similarity to known genomic DNA or each other. Clustered EST data, accumulated in databases such as UniGene, STACK and TIGR Gene Indices have proven to be crucial in research areas from gene discovery to regulation of gene expression. RESULTS: We have developed a new nucleotide sequence matching algorithm and its implementation for clustering EST sequences. The program is based on the original CLU match detection algorithm, which has improved performance over the widely used d2_cluster. The CLU algorithm automatically ignores low-complexity regions like poly-tracts and short tandem repeats. CONCLUSION: CLU represents a new generation of EST clustering algorithm with improved performance over current approaches. An early implementation can be applied in small and medium-size projects. The CLU program is available on an open source basis free of charge. It can be downloaded fro

    Simcluster: clustering enumeration gene expression data on the simplex space

    Transcript enumeration methods such as SAGE, MPSS, and sequencing-by-synthesis EST "digital northern" are important high-throughput techniques for digital gene expression measurement. Like other counting or voting processes, these measurements constitute compositional data exhibiting properties particular to the simplex space, where the summation of the components is constrained. These properties are not present in regular Euclidean spaces, on which hybridization-based microarray data is often modeled. Therefore, pattern recognition methods commonly used for microarray data analysis may be non-informative for the data generated by transcript enumeration techniques, since they ignore certain fundamental properties of this space. Here we present a software tool, Simcluster, designed to perform clustering analysis for data on the simplex space. We present Simcluster as a stand-alone command-line C package and as a user-friendly on-line tool. Both versions are available at: http://xerad.systemsbiology.net/simcluster. Simcluster is designed in accordance with a well-established mathematical framework for compositional data analysis, which provides principled procedures for dealing with the simplex space, and is thus applicable in a number of contexts, including enumeration-based gene expression data.
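
To make the contrast between simplex and Euclidean geometry concrete, the sketch below computes the Aitchison distance, a standard distance from compositional data analysis, through the centered log-ratio (clr) transform. It is an illustration of the general framework only: the abstract does not say which distance Simcluster uses, the function names are hypothetical, and the `pseudocount` added to avoid taking the logarithm of zero counts is an assumption of this sketch.

```python
import math


def clr(counts, pseudocount=0.5):
    """Centered log-ratio transform of a vector of tag counts.

    The pseudocount (an assumption of this sketch) keeps zero counts finite.
    """
    shifted = [c + pseudocount for c in counts]
    total = sum(shifted)
    logs = [math.log(s / total) for s in shifted]
    mean_log = sum(logs) / len(logs)
    return [value - mean_log for value in logs]


def aitchison_distance(a, b, pseudocount=0.5):
    """Euclidean distance between the clr-transformed count vectors."""
    clr_a = clr(a, pseudocount)
    clr_b = clr(b, pseudocount)
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(clr_a, clr_b)))


def euclidean_distance(a, b):
    """Plain Euclidean distance on raw counts, for comparison."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))
```

Two libraries with the same relative composition but different sequencing depth, for example `[10, 20, 70]` and `[100, 200, 700]`, are far apart under `euclidean_distance` but have an Aitchison distance of essentially zero when `pseudocount=0.0`, which is the behaviour one wants for count data whose total is an artefact of sampling depth.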

    Computational studies with ESTs: assembly, SNP detection, and applications in alternative splicing

    EST sequences are important in functional genomics studies. To better use available EST resources, clustering and assembling are crucial techniques. For EST sequences with deep coverage, no current assembly program can handle them well. We describe a deep assembly program named DA. The program keeps the number of differences in each contig alignment under control by making corrections to differences that are likely due to sequencing errors. Experimental results on the 115 clusters from the UniGene database show that DA can handle data sets of deep coverage efficiently. A comparison of the DA consensus sequences with the finished human and mouse genomes indicates that the consensus sequences are of acceptable quality. EST sequences can be used in SNP discovery. We describe a computational method for finding common SNPs with allele frequencies in single-pass sequences of deep coverage. The method enhances a widely used program named PolyBayes in several aspects. We present results from our method and PolyBayes on eighteen data sets of human expressed sequence tags (ESTs) with deep coverage. The results indicate that our method used almost all single-pass sequences in computation of the allele frequencies of SNPs. EST sequences can also be used to study alternative splicing (AS), which is the most common post-transcription event in metazoans. We first developed a pipeline to identify AS forms by comparing alignments between expressed sequences and genomic sequences. Then we studied the relationship between AS and gene duplication. We observed that duplicate genes have fewer AS forms than single-copy genes; we also found that the loss of alternative splicing in duplicate genes may occur shortly after the gene duplication. Further analysis of the alternative splicing distribution in human duplicate pairs showed the asymmetric evolution of alternative splicing after gene duplications. We also compared AS among six species. 
Detailed and categorized studies revealed significant differences in both AS rates and splice forms per gene among the studied species. The difference in AS rate between rice and Arabidopsis is significant enough to lead to a difference in protein diversity between those two species.

    High-throughput gene discovery in the rat

    The rat is an important animal model for human diseases and is widely used in physiology. In this article we present a new strategy for gene discovery based on the production of ESTs from serially subtracted and normalized cDNA libraries, and we describe its application for the development of a comprehensive nonredundant collection of rat ESTs. Our new strategy appears to yield substantially more EST clusters per ESTs sequenced than do previous approaches that did not use serial subtraction. However, multiple rounds of library subtraction resulted in high frequencies of otherwise rare internally primed cDNAs, defining the limits of this powerful approach. To date, we have generated >200,000 3′ ESTs from >100 cDNA libraries representing a wide range of tissues and developmental stages of the laboratory rat. Most importantly, we have contributed to ∼50,000 rat UniGene clusters. We have identified, arrayed, and derived 5′ ESTs from >30,000 unique rat cDNA clones. Complete information, including radiation hybrid mapping data, is also maintained locally at http://genome.uiowa.edu/clcg.html. All of the sequences described in this article have been submitted to the dbEST division of the NCBI

    Deep and comparative analysis of the mycelium and appressorium transcriptomes of Magnaporthe grisea using MPSS, RL-SAGE, and oligoarray methods

    BACKGROUND: Rice blast, caused by the fungal pathogen Magnaporthe grisea, is a devastating disease causing tremendous yield loss in rice production. The public availability of the complete genome sequence of M. grisea provides ample opportunities to understand the molecular mechanism of its pathogenesis on rice plants at the transcriptome level. To identify all the expressed genes encoded in the fungal genome, we have analyzed the mycelium and appressorium transcriptomes using massively parallel signature sequencing (MPSS), robust-long serial analysis of gene expression (RL-SAGE) and oligoarray methods. RESULTS: The MPSS analyses identified 12,531 and 12,927 distinct significant tags from mycelia and appressoria, respectively, while the RL-SAGE analysis identified 16,580 distinct significant tags from the mycelial library. When matching these 12,531 mycelial and 12,927 appressorial significant tags to the annotated CDS, 500 bp upstream and 500 bp downstream of CDS, 6,735 unique genes in mycelia and 7,686 unique genes in appressoria were identified. A total of 7,135 mycelium-specific and 7,531 appressorium-specific significant MPSS tags were identified, which correspond to 2,088 and 1,784 annotated genes, respectively, when matching to the same set of reference sequences. Nearly 85% of the significant MPSS tags from mycelia and appressoria and 65% of the significant tags from the RL-SAGE mycelium library matched to the M. grisea genome. MPSS and RL-SAGE methods supported the expression of more than 9,000 genes, representing over 80% of the predicted genes in M. grisea. About 40% of the MPSS tags and 55% of the RL-SAGE tags represent novel transcripts since they had no matches in the existing M. grisea EST collections. Over 19% of the annotated genes were found to produce both sense and antisense tags in the protein-coding region. The oligoarray analysis identified the expression of 3,793 mycelium-specific and 4,652 appressorium-specific genes. 
A total of 2,430 mycelial genes and 1,886 appressorial genes were identified by both MPSS and oligoarray. CONCLUSION: The comprehensive and deep transcriptome analysis by MPSS and RL-SAGE methods identified many novel sense and antisense transcripts in the M. grisea genome at two important growth stages. The differentially expressed transcripts that were identified, especially those specifically expressed in appressoria, represent a genomic resource useful for gaining a better understanding of the molecular basis of M. grisea pathogenicity. Further analysis of the novel antisense transcripts will provide new insights into the regulation and function of these genes in fungal growth, development and pathogenesis in the host plants