49 research outputs found
miROrtho: computational survey of microRNA genes
MicroRNAs (miRNAs) are short, non-protein coding RNAs that direct the widespread phenomenon of post-transcriptional regulation of metazoan genes. The mature βΌ22-nt long RNA molecules are processed from genome-encoded stem-loop structured precursor genes. Hundreds of such genes have been experimentally validated in vertebrate genomes, yet their discovery remains challenging, and substantially higher numbers have been estimated. The miROrtho database (http://cegg.unige.ch/mirortho) presents the results of a comprehensive computational survey of miRNA gene candidates across the majority of sequenced metazoan genomes. We designed and applied a three-tier analysis pipeline: (i) an SVM-based ab initio screen for potent hairpins, plus homologs of known miRNAs, (ii) an orthology delineation procedure and (iii) an SVM-based classifier of the ortholog multiple sequence alignments. The web interface provides direct access to putative miRNA annotations, ortholog multiple alignments, RNA secondary structure conservation, and sequence data. The miROrtho data are conceptually complementary to the miRBase catalog of experimentally verified miRNA sequences, providing a consistent comparative genomics perspective as well as identifying many novel miRNA genes with strong evolutionary suppor
OrthoDB: the hierarchical catalog of eukaryotic orthologs
The concept of orthology is widely used to relate genes across different species using comparative genomics, and it provides the basis for inferring gene function. Here we present the web accessible OrthoDB database that catalogs groups of orthologous genes in a hierarchical manner, at each radiation of the species phylogeny, from more general groups to more fine-grained delineations between closely related species. We used a COG-like and Inparanoid-like ortholog delineation procedure on the basis of all-against-all Smith-Waterman sequence comparisons to analyze 58 eukaryotic genomes, focusing on vertebrates, insects and fungi to facilitate further comparative studies. The database is freely available at http://cegg.unige.ch/orthod
miROrtho: computational survey of microRNA genes
MicroRNAs (miRNAs) are short, non-protein coding RNAs that direct the widespread phenomenon of post-transcriptional regulation of metazoan genes. The mature βΌ22-nt long RNA molecules are processed from genome-encoded stem-loop structured precursor genes. Hundreds of such genes have been experimentally validated in vertebrate genomes, yet their discovery remains challenging, and substantially higher numbers have been estimated. The miROrtho database (http://cegg.unige.ch/mirortho) presents the results of a comprehensive computational survey of miRNA gene candidates across the majority of sequenced metazoan genomes. We designed and applied a three-tier analysis pipeline: (i) an SVM-based ab initio screen for potent hairpins, plus homologs of known miRNAs, (ii) an orthology delineation procedure and (iii) an SVM-based classifier of the ortholog multiple sequence alignments. The web interface provides direct access to putative miRNA annotations, ortholog multiple alignments, RNA secondary structure conservation, and sequence data. The miROrtho data are conceptually complementary to the miRBase catalog of experimentally verified miRNA sequences, providing a consistent comparative genomics perspective as well as identifying many novel miRNA genes with strong evolutionary support
Partitioning clustering algorithms for protein sequence data sets
<p>Abstract</p> <p>Background</p> <p>Genome-sequencing projects are currently producing an enormous amount of new sequences and cause the rapid increasing of protein sequence databases. The unsupervised classification of these data into functional groups or families, clustering, has become one of the principal research objectives in structural and functional genomics. Computer programs to automatically and accurately classify sequences into families become a necessity. A significant number of methods have addressed the clustering of protein sequences and most of them can be categorized in three major groups: hierarchical, graph-based and partitioning methods. Among the various sequence clustering methods in literature, hierarchical and graph-based approaches have been widely used. Although partitioning clustering techniques are extremely used in other fields, few applications have been found in the field of protein sequence clustering. It is not fully demonstrated if partitioning methods can be applied to protein sequence data and if these methods can be efficient compared to the published clustering methods.</p> <p>Methods</p> <p>We developed four partitioning clustering approaches using Smith-Waterman local-alignment algorithm to determine pair-wise similarities of sequences. Four different sets of protein sequences were used as evaluation data sets for the proposed methods.</p> <p>Results</p> <p>We show that these methods outperform several other published clustering methods in terms of correctly predicting a classifier and especially in terms of the correctness of the provided prediction. The software is available to academic users from the authors upon request.</p
Insights into corn genes derived from large-scale cDNA sequencing
We present a large portion of the transcriptome of Zea mays, including ESTs representing 484,032 cDNA clones from 53 libraries and 36,565 fully sequenced cDNA clones, out of which 31,552 clones are non-redundant. These and other previously sequenced transcripts have been aligned with available genome sequences and have provided new insights into the characteristics of gene structures and promoters within this major crop species. We found that although the average number of introns per gene is about the same in corn and Arabidopsis, corn genes have more alternatively spliced isoforms. Examination of the nucleotide composition of coding regions reveals that corn genes, as well as genes of other Poaceae (Grass family), can be divided into two classes according to the GC content at the third position in the amino acid encoding codons. Many of the transcripts that have lower GC content at the third position have dicot homologs but the high GC content transcripts tend to be more specific to the grasses. The high GC content class is also enriched with intronless genes. Together this suggests that an identifiable class of genes in plants is associated with the Poaceae divergence. Furthermore, because many of these genes appear to be derived from ancestral genes that do not contain introns, this evolutionary divergence may be the result of horizontal gene transfer from species not only with different codon usage but possibly that did not have introns, perhaps outside of the plant kingdom. By comparing the cDNAs described herein with the non-redundant set of corn mRNAs in GenBank, we estimate that there are about 50,000 different protein coding genes in Zea. All of the sequence data from this study have been submitted to DDBJ/GenBank/EMBL under accession numbers EU940701βEU977132 (FLI cDNA) and FK944382-FL482108 (EST)
Allelic Gene Structure Variations in Anopheles gambiae Mosquitoes
Allelic gene structure variations and alternative splicing are responsible for transcript structure variations. More than 75% of human genes have structural isoforms of transcripts, but to date few studies have been conducted to verify the alternative splicing systematically.The present study used expressed sequence tags (ESTs) and EST tagged SNP patterns to examine the transcript structure variations resulting from allelic gene structure variations in the major human malaria vector, Anopheles gambiae. About 80% of 236,004 available A. gambiae ESTs were successfully aligned to A. gambiae reference genomes. More than 2,340 transcript structure variation events were detected. Because the current A. gambiae annotation is incomplete, we re-annotated the A. gambiae genome with an A. gambiae-specific gene model so that the effect of variations on gene coding could be better evaluated. A total of 15,962 genes were predicted. Among them, 3,873 were novel genes and 12,089 were previously identified genes. The gene completion rate improved from 60% to 84%. Based on EST support, 82.5% of gene structures were predicted correctly. In light of the new annotation, we found that approximately 78% of transcript structure variations were located within the coding sequence (CDS) regions, and >65% of variations in the CDS regions have the same open-reading-frame. The association between transcript structure isoforms and SNPs indicated that more than 28% of transcript structure variation events were contributed by different gene alleles in A. gambiae.We successfully expanded the A. gambiae genome annotation. We predicted and analyzed transcript structure variations in A. gambiae and found that allelic gene structure variation plays a major role in transcript diversity in this important human malaria vector
Semaphorin-1a Is Required for Aedes aegypti Embryonic Nerve Cord Development
Although mosquito genome projects have uncovered orthologues of many known developmental regulatory genes, extremely little is known about mosquito development. In this study, the role of semaphorin-1a (sema1a) was investigated during vector mosquito embryonic ventral nerve cord development. Expression of sema1a and the plexin A (plexA) receptor are detected in the embryonic ventral nerve cords of Aedes aegypti (dengue vector) and Anopheles gambiae (malaria vector), suggesting that Sema1a signaling may regulate mosquito nervous system development. Analysis of sema1a function was investigated through siRNA-mediated knockdown in A. aegypti embryos. Knockdown of sema1a during A. aegypti development results in a number of nerve cord phenotypes, including thinning, breakage, and occasional fusion of the longitudinal connectives, thin or absent commissures, and general distortion of the nerve cord. Although analysis of Drosophila melanogaster sema1a loss-of-function mutants uncovered many similar phenotypes, aspects of the longitudinal phenotypes differed between D. melanogaster and A. aegypti. The results of this investigation suggest that Sema1a is required for development of the insect ventral nerve cord, but that the developmental roles of this guidance molecule have diverged in dipteran insects
Expansion of the human ΞΌ-opioid receptor gene architecture: novel functional variants
The ΞΌ-opioid receptor (OPRM1) is the principal receptor target for both endogenous and exogenous opioid analgesics. There are substantial individual differences in human responses to painful stimuli and to opiate drugs that are attributed to genetic variations in OPRM1. In searching for new functional variants, we employed comparative genome analysis and obtained evidence for the existence of an expanded human OPRM1 gene locus with new promoters, alternative exons and regulatory elements. Examination of polymorphisms within the human OPRM1 gene locus identified strong association between single nucleotide polymorphism (SNP) rs563649 and individual variations in pain perception. SNP rs563649 is located within a structurally conserved internal ribosome entry site (IRES) in the 5β²-UTR of a novel exon 13-containing OPRM1 isoforms (MOR-1K) and affects both mRNA levels and translation efficiency of these variants. Furthermore, rs563649 exhibits very strong linkage disequilibrium throughout the entire OPRM1 gene locus and thus affects the functional contribution of the corresponding haplotype that includes other functional OPRM1 SNPs. Our results provide evidence for an essential role for MOR-1K isoforms in nociceptive signaling and suggest that genetic variations in alternative OPRM1 isoforms may contribute to individual differences in opiate responses
Unity in defence: honeybee workers exhibit conserved molecular responses to diverse pathogens
This is the final version of the article. Available from the publisher via the DOI in this record.Background: Organisms typically face infection by diverse pathogens, and hosts are thought to have developed specific responses to each type of pathogen they encounter. The advent of transcriptomics now makes it possible to test this hypothesis and compare host gene expression responses to multiple pathogens at a genome-wide scale. Here, we performed a meta-analysis of multiple published and new transcriptomes using a newly developed bioinformatics approach that filters genes based on their expression profile across datasets. Thereby, we identified common and unique molecular responses of a model host species, the honey bee (Apis mellifera), to its major pathogens and parasites: the Microsporidia Nosema apis and Nosema ceranae, RNA viruses, and the ectoparasitic mite Varroa destructor, which transmits viruses.
Results:
We identified a common suite of genes and conserved molecular pathways that respond to all investigated pathogens, a result that suggests a commonality in response mechanisms to diverse pathogens. We found that genes differentially expressed after infection exhibit a higher evolutionary rate than non-differentially expressed genes. Using our new bioinformatics approach, we unveiled additional pathogen-specific responses of honey bees; we found that apoptosis appeared to be an important response following microsporidian infection, while genes from the immune signalling pathways, Toll and Imd, were differentially expressed after Varroa/virus infection. Finally, we applied our bioinformatics approach and generated a gene co-expression network to identify highly connected (hub) genes that may represent important mediators and regulators of anti-pathogen responses.
Conclusions:
Our meta-analysis generated a comprehensive overview of the host metabolic and other biological processes that mediate interactions between insects and their pathogens. We identified key host genes and pathways that respond to phylogenetically diverse pathogens, representing an important source for future functional studies as well as offering new routes to identify or generate pathogen resilient honey bee stocks. The statistical and bioinformatics approaches that were developed for this study are broadly applicable to synthesize information across transcriptomic datasets. These approaches will likely have utility in addressing a variety of biological questions.This article is a joint effort of the working group TRANSBEE and an
outcome of two workshops kindly supported by sDiv, the Synthesis
Centre for Biodiversity Sciences within the German Centre for Integrative
Biodiversity Research (iDiv) Halle-Jena-Leipzig, funded by the German Science
Foundation (FZT 118). New datasets were performed thanks to the Insect
Pollinators Initiative (IPI grant BB/I000100/1 and BB/I000151/1), with participation
of the UK-USA exchange funded by the BBSRC BB/I025220/1 (datasets #4,
11 and 14). The IPI is funded jointly by the Biotechnology and Biological
Sciences Research Council, the Department for Environment, Food and Rural
Affairs, the Natural Environment Research Council, the Scottish Government
and the Wellcome Trust, under the Living with Environmental Change
Partnershi
Comparative Genomic Analysis of Drosophila melanogaster and Vector Mosquito Developmental Genes
Genome sequencing projects have presented the opportunity for analysis of developmental genes in three vector mosquito species: Aedes aegypti, Culex quinquefasciatus, and Anopheles gambiae. A comparative genomic analysis of developmental genes in Drosophila melanogaster and these three important vectors of human disease was performed in this investigation. While the study was comprehensive, special emphasis centered on genes that 1) are components of developmental signaling pathways, 2) regulate fundamental developmental processes, 3) are critical for the development of tissues of vector importance, 4) function in developmental processes known to have diverged within insects, and 5) encode microRNAs (miRNAs) that regulate developmental transcripts in Drosophila. While most fruit fly developmental genes are conserved in the three vector mosquito species, several genes known to be critical for Drosophila development were not identified in one or more mosquito genomes. In other cases, mosquito lineage-specific gene gains with respect to D. melanogaster were noted. Sequence analyses also revealed that numerous repetitive sequences are a common structural feature of Drosophila and mosquito developmental genes. Finally, analysis of predicted miRNA binding sites in fruit fly and mosquito developmental genes suggests that the repertoire of developmental genes targeted by miRNAs is species-specific. The results of this study provide insight into the evolution of developmental genes and processes in dipterans and other arthropods, serve as a resource for those pursuing analysis of mosquito development, and will promote the design and refinement of functional analysis experiments