43 research outputs found

    A new bioinformatics analysis tools framework at EMBL–EBI

    The EMBL-EBI provides access to various mainstream sequence analysis applications. These include sequence similarity search services such as BLAST, FASTA and InterProScan, and multiple sequence alignment tools such as ClustalW, T-Coffee and MUSCLE. Through the sequence similarity search services, users can search mainstream sequence databases such as EMBL-Bank and UniProt, and more than 2000 completed genomes and proteomes. We present here a new framework aimed at both novice and expert users that exposes novel methods of obtaining annotations and visualizing sequence analysis results through one uniform and consistent interface. These services are available over the web and via Web Services interfaces for users who require systematic access or want to interface with customized pipelines and workflows using common programming languages. The framework features novel result visualizations and integration of domain and functional predictions for protein database searches. It is available at http://www.ebi.ac.uk/Tools/sss for sequence similarity searches and at http://www.ebi.ac.uk/Tools/msa for multiple sequence alignments.

    Uncovering hidden biodiversity in the Cryptophyta: New picoplanktonic clades from clone library studies at the Helgoland time series site in the southern German Bight.

    Cryptophyceae are an important group in marine phytoplankton, but little is known about the occurrence and distribution of individual species. Recently, with the use of molecular probes and microarray technology, it has been shown that species related to Teleaulax spp. or Chroomonas spp. (clades 4 and 6) contributed most to cryptophycean biomass in the North Sea. A single probe recognises members of both clades 4 and 6 and therefore cannot separate them. Here, we extend our investigations of cryptophycean diversity in the North Sea by sequencing 18S rRNA clone libraries made from fractionated water samples to examine specifically the picoplanktonic fraction and to determine whether clade 4 or clade 6 cryptophytes were dominant. We focused on samples from the spring phytoplankton bloom in 2004 because the microarray signals were the strongest at this time. Excluding chimeric sequences, we detected nine cryptophycean OTUs, seven of which fell into the Teleaulax/Plagioselmis branch, whereas two grouped with Geminigera spp. Our results indicate that these OTUs, affiliated with clade 4, may be an important component of the cryptophyte community during the spring bloom in the North Sea.

    The EMBL Nucleotide Sequence Database

    The EMBL Nucleotide Sequence Database (http://www.ebi.ac.uk/embl), maintained at the European Bioinformatics Institute (EBI) near Cambridge, UK, is a comprehensive collection of nucleotide sequences and annotation from available public sources. The database is part of an international collaboration with DDBJ (Japan) and GenBank (USA). Data are exchanged daily between the collaborating institutes to achieve swift synchrony. Webin is the preferred tool for individual submissions of nucleotide sequences, including Third Party Annotation (TPA) and alignments. Automated procedures are provided for submissions from large-scale sequencing projects and data from the European Patent Office. New and updated data records are distributed daily and the whole EMBL Nucleotide Sequence Database is released four times a year. Access to the sequence data is provided via ftp and several WWW interfaces. With the web-based Sequence Retrieval System (SRS) it is also possible to link nucleotide data to other specialist molecular biology databases maintained at the EBI. Other tools are available for sequence similarity searching (e.g. FASTA and BLAST). Changes over the past year include the removal of the sequence length limit, the launch of the EMBLCDSs dataset, extension of the Sequence Version Archive functionality and the revision of quality rules for TPA data

    Analysis of the Human Kinome Using Methods Including Fold Recognition Reveals Two Novel Kinases

    Background: Protein sequence similarity is a commonly used criterion for inferring the unknown function of a protein from a protein of known function. However, proteins can diverge significantly over time such that sequence similarity is difficult, if not impossible, to find. In some cases, a structural similarity remains over long evolutionary time scales and once detected can be used to predict function. Methodology/Principal Findings: Here we employed a high-throughput approach to assign structural and functional annotation to the human proteome, focusing on the collection of human protein kinases, the human kinome. We compared human protein sequences to a library of domains from known structures using WU-BLAST, PSI-BLAST, and 123D. This approach utilized both sequence comparison and fold recognition methods. The resulting set of potential protein kinases was cross-checked against previously identified human protein kinases, and analyzed for conserved kinase motifs. Conclusions/Significance: We demonstrate that our structure-based method can be used to identify both typical and atypical human protein kinases. We also identify two potentially novel kinases that contain an interesting combination o

    RiboSubstrates: a web application addressing the cleavage specificities of ribozymes in designated genomes

    BACKGROUND: RNA-dependent gene silencing is becoming a routine tool used in laboratories worldwide. One of the important remaining hurdles in the selection of the target sequence, if not the most important one, is the design of tools that have minimal off-target effects (i.e. that cleave only the desired sequence). Increasingly, in the current dawn of the post-genomic era, there is a heavy reliance on tools that are suitable for high-throughput functional genomics; consequently, more and more bioinformatic software is becoming available. However, to date none has been designed to satisfy the ever-increasing need for the accurate selection of targets for a specific silencing reagent. RESULTS: In order to overcome this hurdle we have developed RiboSubstrates. This integrated bioinformatic software permits the searching of a cDNA database for all potential substrates for a given ribozyme. This includes the mRNAs that perfectly match the specific requirements of a given ribozyme, as well as those including wobble base pairs and mismatches. The results generated allow rapid selection of sequences suitable as targets for RNA degradation. The current web-based RiboSubstrates version permits the identification of potential gene targets for both SOFA-HDV ribozymes and hammerhead ribozymes. Moreover, a minimal template for the search of siRNAs is also available. This flexible and reliable tool is easily adaptable for use with any RNA tool (e.g. other ribozymes, deoxyribozymes and antisense), and may use the information present in any cDNA bank. CONCLUSION: RiboSubstrates should become an essential step for all, even including "non-RNA biologists", who endeavor to develop a gene-inactivation system.

    WormBase: a comprehensive data resource for Caenorhabditis biology and genomics

    WormBase (http://www.wormbase.org), the model organism database for information about Caenorhabditis elegans and related nematodes, continues to expand in breadth and depth. Over the past year, WormBase has added multiple large-scale datasets including SAGE, interactome, 3D protein structure datasets and NCBI KOGs. To accommodate this growth, the International WormBase Consortium has improved the user interface by adding new features to aid in navigation, visualization of large-scale datasets, advanced searching and data mining. Internally, we have restructured the database models to rationalize the representation of genes and to prepare the system to accept the genome sequences of three additional Caenorhabditis species over the coming year

    PBmice: an integrated database system of piggyBac (PB) insertional mutations and their characterizations in mice

    The DNA transposon piggyBac (PB) is a newly established mutagen for large-scale mutagenesis in mice. We have designed and implemented an integrated database system called PBmice (PB Mutagenesis Information CEnter) for storing, retrieving and displaying the information derived from PB insertions (INSERTs) in the mouse genome. This system is centered on INSERTs with information including their genomic locations and flanking genomic sequences, the expression levels of the hit genes, and the expression patterns of the trapped genes if a trapping vector was used. It also archives mouse phenotyping data linked to INSERTs, and allows users to conduct quick and advanced searches for genotypic and phenotypic information relevant to a particular INSERT or a set of INSERTs. Sequence-based information can be cross-referenced with other genomic databases such as Ensembl; the BLAST and GBrowse tools used in PBmice offer enhanced search and display for additional information relevant to INSERTs. The total number and genomic distribution of PB INSERTs, as well as the availability of each PB insertional LINE, can also be viewed with user-friendly interfaces. PBmice is freely available at http://www.idmshanghai.cn/PBmice or http://www.scbit.org/PBmice/

    Comparative genomics of the syndecans defines an ancestral genomic context associated with matrilins in vertebrates

    BACKGROUND: The syndecans are the major family of transmembrane proteoglycans in animals and are known for multiple roles in cell interactions and growth factor signalling during development, inflammatory response, wound-repair and tumorigenesis. Although syndecans have been cloned from several invertebrate and vertebrate species, the extent of conservation of the family across the animal kingdom is unknown and there are gaps in our knowledge of chordate syndecans. Here, we develop a new level of knowledge for the whole syndecan family, by combining molecular phylogeny of syndecan protein sequences with analysis of the genomic contexts of syndecan genes in multiple vertebrate organisms. RESULTS: We identified syndecan-encoding sequences in representative Cnidaria and throughout the Bilateria. The C1 and C2 regions of the cytoplasmic domain are highly conserved throughout the animal kingdom. We identified in the variable region a universally-conserved leucine residue and a tyrosine residue that is conserved throughout the Bilateria. Of all the genomes examined, only tetrapod and fish genomes encode multiple syndecans. No syndecan-1 was identified in fish. The genomic context of each vertebrate syndecan gene is syntenic between human, mouse and chicken, and this conservation clearly extends to syndecan-2 and -3 in T. nigroviridis. In addition, tetrapod syndecans were found to be encoded from paralogous chromosomal regions that also contain the four members of the matrilin family. Whereas the matrilin-3 and syndecan-1 genes are adjacent in tetrapods, this chromosomal region appears to have undergone extensive lineage-specific rearrangements in fish. CONCLUSION: Throughout the animal kingdom, syndecan extracellular domains have undergone rapid change and elements of the cytoplasmic domains have been very conserved. The four syndecan genes of vertebrates are syntenic across tetrapods, and synteny of the syndecan-2 and -3 genes is apparent between tetrapods and fish. 
    In vertebrates, each of the four family members is encoded from paralogous genomic regions in which members of the matrilin family are also syntenic between tetrapods and fish. This genomic organization appears to have been set up after the divergence of urochordates (Ciona) and vertebrates. The syndecan-1 gene appears to have been lost relatively early in the fish lineage. These conclusions provide the basis for a new model of syndecan evolution in vertebrates and a new perspective for analyzing the roles of syndecans in cells and whole organisms.