    CLU: A new algorithm for EST clustering

    BACKGROUND: The continuous flow of EST data remains one of the richest sources for discoveries in modern biology. The first step in EST data mining is usually EST clustering, the process of grouping original fragments according to their annotation or their similarity to known genomic DNA or to each other. Clustered EST data, accumulated in databases such as UniGene, STACK and TIGR Gene Indices, have proven to be crucial in research areas from gene discovery to regulation of gene expression. RESULTS: We have developed a new nucleotide sequence matching algorithm and its implementation for clustering EST sequences. The program is based on the original CLU match detection algorithm, which has improved performance over the widely used d2_cluster. The CLU algorithm automatically ignores low-complexity regions like poly-tracts and short tandem repeats. CONCLUSION: CLU represents a new generation of EST clustering algorithm with improved performance over current approaches. An early implementation can be applied in small and medium-size projects. The CLU program is available on an open source basis free of charge. It can be downloaded fro

    phorest: a web-based tool for comparative analyses of expressed sequence tag data

    Comparative analysis of expressed sequence tags is becoming an important tool in molecular ecology for comparing gene expression in organisms grown in certain environments. Additionally, expressed sequence tag database information can be used for the construction of DNA microarrays and for the detection of single nucleotide polymorphisms. For such applications, we present PHOREST, a web-based tool for managing, analysing and comparing various collections of expressed sequence tags. It is written in PHP (PHP: Hypertext Preprocessor) and runs on UNIX, Microsoft Windows and Macintosh (Mac OS X) platforms

    annot8r: GO, EC and KEGG annotation of EST datasets

    BACKGROUND: The expressed sequence tag (EST) methodology is an attractive option for the generation of sequence data for species for which no completely sequenced genome is available. The annotation and comparative analysis of such datasets pose a formidable challenge for research groups that do not have the bioinformatics infrastructure of major genome sequencing centres. Therefore, there is a need for user-friendly tools to facilitate the annotation of non-model species EST datasets with well-defined ontologies that enable meaningful cross-species comparisons. To address this, we have developed annot8r, a platform for the rapid annotation of EST datasets with GO-terms, EC-numbers and KEGG-pathways. RESULTS: annot8r automatically downloads all files relevant for the annotation process and generates a reference database that stores UniProt entries, their associated Gene Ontology (GO), Enzyme Commission (EC) and Kyoto Encyclopaedia of Genes and Genomes (KEGG) annotation and additional relevant data. For each of GO, EC and KEGG, annot8r extracts a specific sequence subset from the UniProt dataset based on the information stored in the reference database. These three subsets are then formatted for BLAST searches. The user provides the protein or nucleotide sequences to be annotated and annot8r runs BLAST searches against these three subsets. The BLAST results are parsed and the corresponding annotations retrieved from the reference database. The annotations are saved both as flat files and in a relational PostgreSQL results database to facilitate more advanced searches within the results. annot8r is integrated with the PartiGene suite of EST analysis tools. CONCLUSION: annot8r is a tool that assigns GO, EC and KEGG annotations for data sets resulting from EST sequencing projects both rapidly and efficiently. The benefits of an underlying relational database, flexibility and the ease of use of the program make it ideally suited for non-model species EST-sequencing projects.

    Composition-based statistics and translated nucleotide searches: Improving the TBLASTN module of BLAST

    BACKGROUND: TBLASTN is a mode of operation for BLAST that aligns protein sequences to a nucleotide database translated in all six frames. We present the first description of the modern implementation of TBLASTN, focusing on new techniques that were used to implement composition-based statistics for translated nucleotide searches. Composition-based statistics use the composition of the sequences being aligned to generate more accurate E-values, which allows for a more accurate distinction between true and false matches. Until recently, composition-based statistics were available only for protein-protein searches. They are now available as a command line option for recent versions of TBLASTN and as an option for TBLASTN on the NCBI BLAST web server. RESULTS: We evaluate the statistical and retrieval accuracy of the E-values reported by a baseline version of TBLASTN and by two variants that use different types of composition-based statistics. To test the statistical accuracy of TBLASTN, we ran 1000 searches using scrambled proteins from the mouse genome and a database of human chromosomes. To test retrieval accuracy, we modernize and adapt to translated searches a test set previously used to evaluate the retrieval accuracy of protein-protein searches. We show that composition-based statistics greatly improve the statistical accuracy of TBLASTN, at a small cost to the retrieval accuracy. CONCLUSION: TBLASTN is widely used, as it is common to wish to compare proteins to chromosomes or to libraries of mRNAs. Composition-based statistics improve the statistical accuracy, and therefore the reliability, of TBLASTN results. The algorithms used by TBLASTN are not widely known, and some of the most important are reported here. The data used to test TBLASTN are available for download and may be useful in other studies of translated search algorithms
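
The idea of translating a nucleotide sequence "in all six frames" can be illustrated with a minimal Python sketch. It is not TBLASTN's implementation and does not cover composition-based statistics; it only assumes the standard genetic code and uses illustrative function names (`reverse_complement`, `translate`, `six_frame_translations`) that do not come from BLAST.

```python
from itertools import product

# Standard genetic code, codons enumerated with bases in the order T, C, A, G.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): a for c, a in zip(product(BASES, repeat=3), AMINO)}

COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of an uppercase DNA string."""
    return seq.translate(COMPLEMENT)[::-1]


def translate(seq):
    """Translate complete codons only; unknown codons (e.g. with N) become 'X'."""
    return "".join(
        CODON_TABLE.get(seq[i:i + 3], "X") for i in range(0, len(seq) - 2, 3)
    )


def six_frame_translations(seq):
    """Map frame labels (+1, +2, +3, -1, -2, -3) to protein strings."""
    seq = seq.upper()
    frames = {}
    for strand, s in (("+", seq), ("-", reverse_complement(seq))):
        for offset in range(3):
            frames[f"{strand}{offset + 1}"] = translate(s[offset:])
    return frames


if __name__ == "__main__":
    for label, protein in six_frame_translations("ATGGCCTAA").items():
        print(label, protein)
```

A protein query would then be compared against each of the six translated strings; stop codons appear as `*` in the output.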

    Analysis of multiplex gene expression maps obtained by voxelation

    BACKGROUND: Gene expression signatures in the mammalian brain hold the key to understanding neural development and neurological disease. Researchers have previously used voxelation in combination with microarrays for acquisition of genome-wide atlases of expression patterns in the mouse brain. On the other hand, some work has been performed on studying gene functions, without taking into account the location information of a gene's expression in a mouse brain. In this paper, we present an approach for identifying the relation between gene expression maps obtained by voxelation and gene functions. RESULTS: To analyze the dataset, we chose typical genes as queries and aimed at discovering similar gene groups. Gene similarity was determined by using the wavelet features extracted from the gene expression maps averaged over the left and right hemispheres, and by the Euclidean distance between each pair of feature vectors. We also performed a multiple clustering approach on the gene expression maps, combined with hierarchical clustering. Among each group of similar genes and clusters, the gene function similarity was measured by calculating the average gene function distances in the gene ontology structure. By applying our methodology to find genes similar to certain target genes, we were able to improve our understanding of gene expression patterns and gene functions. By applying the clustering analysis method, we obtained significant clusters, which have both very similar gene expression maps and very similar gene functions with respect to their corresponding gene ontologies. The cellular component ontology resulted in prominent clusters expressed in cortex and corpus callosum. The molecular function ontology gave prominent clusters in cortex, corpus callosum and hypothalamus. The biological process ontology resulted in clusters in cortex, hypothalamus and choroid plexus. Clusters from all three ontologies combined were most prominently expressed in cortex and corpus callosum. CONCLUSION: The experimental results confirm the hypothesis that genes with similar gene expression maps might have similar gene functions. The voxelation data takes into account the location information of gene expression level in mouse brain, which is novel in related research. The proposed approach can potentially be used to predict gene functions and provide helpful suggestions to biologists.
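
The query step described in this abstract, ranking genes by the Euclidean distance between feature vectors, can be sketched as follows. This is a minimal illustration, not the authors' code: the wavelet feature extraction is not shown, the feature vectors are made-up toy values, and the names `rank_similar_genes`, `features` and `top_k` are hypothetical.

```python
import math


def rank_similar_genes(query, features, top_k=3):
    """Return the top_k genes closest to `query` by Euclidean distance.

    `features` maps a gene name to its feature vector (assumed here to be
    precomputed, e.g. from wavelet coefficients of an expression map).
    """
    q = features[query]
    distances = [
        (math.dist(q, vec), gene)
        for gene, vec in features.items()
        if gene != query
    ]
    # Sort by distance; ties fall back to alphabetical gene name.
    return [gene for _, gene in sorted(distances)[:top_k]]


if __name__ == "__main__":
    # Toy two-dimensional feature vectors, for illustration only.
    toy_features = {
        "geneA": [0.0, 0.0],
        "geneB": [1.0, 0.0],
        "geneC": [3.0, 4.0],
        "geneD": [0.0, 0.5],
    }
    print(rank_similar_genes("geneA", toy_features, top_k=2))
```

`math.dist` requires Python 3.8 or later. In practice the vectors would be much longer, and the ranked neighbours would then be compared by their gene ontology distances.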

    Evolutionary History of the HAP2/GCS1 Gene and Sexual Reproduction in Metazoans

    The HAP2/GCS1 gene first appeared in the common ancestor of plants, animals, and protists, and is required in the male gamete for fusion to the female gamete in the unicellular organisms Chlamydomonas and Plasmodium. We have identified a HAP2/GCS1 gene in the genome sequence of the sponge Amphimedon queenslandica. This finding provides a continuous evolutionary history of HAP2/GCS1 from unicellular organisms into the metazoan lineage. Divergent versions of the HAP2/GCS1 gene are also present in the genomes of some but not all arthropods. By examining the expression of the HAP2/GCS1 gene in the cnidarian Hydra, we have found the first evidence supporting the hypothesis that HAP2/GCS1 was used for male gamete fusion in the ancestor of extant metazoans and that it retains that function in modern cnidarians

    SILAC-based proteomic quantification of chemoattractant-induced cytoskeleton dynamics on a second to minute timescale

    Cytoskeletal dynamics during cell behaviours ranging from endocytosis and exocytosis to cell division and movement are controlled by a complex network of signalling pathways, the full details of which are as yet unresolved. Here we show that SILAC-based proteomic methods can be used to characterize the rapid chemoattractant-induced dynamic changes in the actin–myosin cytoskeleton and regulatory elements on a proteome-wide scale with a second to minute timescale resolution. This approach provides novel insights into the ensemble kinetics of key cytoskeletal constituents and the association of known and newly identified binding proteins. We validate the proteomic data by detailed microscopy-based analysis of in vivo translocation dynamics for key signalling factors. This rapid large-scale proteomic approach may be applied to other situations where highly dynamic changes in complex cellular compartments are expected to play a key role.

    The vertebrate phylotypic stage and an early bilaterian-related stage in mouse embryogenesis defined by genomic information

    BACKGROUND: Embryos of taxonomically different vertebrates are thought to pass through a stage in which they resemble one another morphologically. This "vertebrate phylotypic stage" may represent the basic vertebrate body plan that was established in the common ancestor of vertebrates. However, much controversy remains about when the phylotypic stage appears, and whether it even exists. To overcome the limitations of studies based on morphological comparison, we explored a comprehensive quantitative method for defining the constrained stage using expressed sequence tag (EST) data, gene ontologies (GO), and available genomes of various animals. If strong developmental constraints occur during the phylotypic stage of vertebrate embryos, then genes conserved among vertebrates would be highly expressed at this stage. RESULTS: We established a novel method for evaluating the ancestral nature of mouse embryonic stages that does not depend on comparative morphology. The numerical "ancestor index" revealed that the mouse indeed has a highly conserved embryonic period at embryonic day 8.0–8.5, the time of appearance of the pharyngeal arch and somites. During this period, the mouse prominently expresses GO-determined developmental genes shared among vertebrates. Similar analyses revealed the existence of a bilaterian-related period, during which GO-determined developmental genes shared among bilaterians are markedly expressed at the cleavage-to-gastrulation period. The genes associated with the phylotypic stage identified by our method are essential in embryogenesis. CONCLUSION: Our results demonstrate that the mid-embryonic stage of the mouse is indeed highly constrained, supporting the existence of the phylotypic stage. Furthermore, this candidate stage is preceded by a putative bilaterian ancestor-related period. These results not only support the developmental hourglass model, but also highlight the hierarchical aspect of embryogenesis proposed by von Baer. Identification of conserved stages and tissues by this method in various animals would be a powerful tool to examine the phylotypic stage hypothesis, and to understand which kinds of developmental events and gene sets are evolutionarily constrained and how they limit the possible variations of animal basic body plans.

    BCR and its mutants, the reciprocal t(9;22)-associated ABL/BCR fusion proteins, differentially regulate the cytoskeleton and cell motility

    BACKGROUND: The reciprocal (9;22) translocation fuses the bcr (breakpoint cluster region) gene on chromosome 22 to the abl (Abelson-leukemia-virus) gene on chromosome 9. Depending on the breakpoint on chromosome 22 (the Philadelphia chromosome – Ph+) the derivative 9+ encodes either the p40(ABL/BCR) fusion transcript, detectable in about 65% of patients suffering from chronic myeloid leukemia, or the p96(ABL/BCR) fusion transcript, detectable in 100% of Ph+ acute lymphatic leukemia patients. The ABL/BCRs are N-terminally truncated BCR mutants. The fact that BCR contains Rho-GEF and Rac-GAP functions strongly suggests an important role in cytoskeleton modeling by regulating the activity of Rho-like GTPases, such as Rho, Rac and cdc42. We therefore compared the function of the ABL/BCR proteins with that of wild-type BCR. METHODS: We investigated the effects of BCR and ABL/BCRs i) on the activation status of Rho, Rac and cdc42 in GTPase-activation assays; ii) on the actin cytoskeleton by direct immunofluorescence; and iii) on cell motility by studying migration into a three-dimensional stroma spheroid model, adhesion on an endothelial cell layer under shear stress in a flow chamber model, and chemotaxis and endothelial transmigration in a transwell model with an SDF-1α gradient. RESULTS: Here we show that both ABL/BCRs lost fundamental functional features of BCR regarding the regulation of small Rho-like GTPases with negative consequences on cell motility, in particular on the capacity to adhere to endothelial cells. CONCLUSION: Our data describe for the first time an analysis of the biological function of the reciprocal t(9;22) ABL/BCR fusion proteins in comparison to their physiological counterpart BCR.

    Systematic identification of abundant A-to-I editing sites in the human transcriptome

    RNA editing by members of the double-stranded RNA-specific ADAR family leads to site-specific conversion of adenosine to inosine (A-to-I) in precursor messenger RNAs. Editing by ADARs is believed to occur in all metazoa, and is essential for mammalian development. Currently, only a limited number of human ADAR substrates are known, while indirect evidence suggests that a substantial fraction of all pre-mRNAs is affected. Here we describe a computational search for ADAR editing sites in the human transcriptome, using millions of available expressed sequences. 12,723 A-to-I editing sites were mapped in 1,637 different genes, with an estimated accuracy of 95%, raising the number of known editing sites by two orders of magnitude. We experimentally validated our method by verifying the occurrence of editing in 26 novel substrates. A-to-I editing in humans primarily occurs in non-coding regions of the RNA, typically in Alu repeats. Analysis of the large set of editing sites indicates the role of editing in controlling dsRNA stability. Comment: Pre-print version. See http://dx.doi.org/10.1038/nbt996 for a reprin