238 research outputs found

    Improving interoperability between microbial information and sequence databases

    BACKGROUND: Biological resources are essential tools for biomedical research. Their availability is promoted through on-line catalogues. Common Access to Biological Resources and Information (CABRI) is a service for distribution of biological resources and related data collected by 28 European culture collections. Linking this information to bioinformatics databanks can make the collections' holdings more visible after a search in molecular biology databanks and vice versa. Identification of links to sequence databases can be useful, but annotation and indexing problems, together with compilation errors, immediately arise. In this paper, we present our efforts to identify cross-references between CABRI catalogues and the EMBL Data Library, and the related results. RESULTS: An SRS site with both EMBL and CABRI catalogues has been set up. Ad-hoc changes in indexing scripts made it possible to achieve homogeneous index keys, and SRS link features have been used to identify links between databases. After manual checking and comparison with an alternative procedure, about 67,500 valid cross-references were identified, added to the EMBL Data Library and are now distributed with it. HTML links can be established from EMBL to the CABRI network service. Procedures can be executed whenever needed. CONCLUSION: Links between EMBL and CABRI catalogues provide improved access to micro-organisms of certified quality and can produce positive effects on biomedical research. Further links between CABRI catalogues and other bioinformatics databases can now easily be defined by using these cross-references. Linking genetic information to natural resources information may serve as a model for the integration of other databases containing empirical data on these materials.

    Primary structure of rat pancreatic lipase mRNA

    The sequence of a rat pancreatic lipase mRNA was determined. The data have been assigned the following accession number, X61925, in the EMBL data library. The total length of the messenger is 1531 nucleotides, plus a poly(A) stretch of about 60 nucleotides. A 72-nucleotide 5′-noncoding region is followed by a 1419-nucleotide open reading frame which encodes a protein of 473 amino acids, including the 17 amino acid signal peptide. The mature enzyme (456 residues) has 6 additional C-terminal amino acids, as compared with the amino acid sequence of pig (direct amino acid sequence), dog, man and rat isoenzyme from Genbank, M58369 (all deduced from the nucleotide sequence). A higher degree of homology exists between the amino acid sequence of rat mature enzyme and those of dog (88%), pig (75%) and man (75%) than with that of rat isolipase (74%).
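The lengths quoted in this abstract can be cross-checked with a few lines of arithmetic. The sketch below is only a consistency check on the published figures; the variable names are illustrative, and the interpretation of the leftover nucleotides as the 3′ end of the message is an assumption, since the abstract does not itemize it.

```python
# Figures quoted in the abstract (poly(A) stretch excluded from the total).
TOTAL_MRNA_NT = 1531
FIVE_PRIME_UTR_NT = 72
ORF_NT = 1419
SIGNAL_PEPTIDE_AA = 17

# An open reading frame must be a whole number of codons.
assert ORF_NT % 3 == 0
precursor_aa = ORF_NT // 3                      # codons in the reading frame
mature_aa = precursor_aa - SIGNAL_PEPTIDE_AA    # after signal peptide removal

# Assumption: whatever is left after the 5' region and the reading frame
# belongs to the 3' end of the message (not itemized in the abstract).
remaining_nt = TOTAL_MRNA_NT - FIVE_PRIME_UTR_NT - ORF_NT

print(precursor_aa, mature_aa, remaining_nt)
```

The computed precursor and mature lengths agree with the 473 and 456 residues stated in the abstract.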

    GenBank

    GenBank® is a comprehensive database that contains publicly available nucleotide sequences for more than 240 000 named organisms, obtained primarily through submissions from individual laboratories and batch submissions from large-scale sequencing projects. Most submissions are made using the web-based BankIt or standalone Sequin programs, and accession numbers are assigned by GenBank staff upon receipt. Daily data exchange with the EMBL Data Library in Europe and the DNA Data Bank of Japan ensures worldwide coverage. GenBank is accessible through NCBI's retrieval system, Entrez, which integrates data from the major DNA and protein sequence databases along with taxonomy, genome, mapping, protein structure and domain information, and the biomedical journal literature via PubMed. BLAST provides sequence similarity searches of GenBank and other sequence databases. Complete bimonthly releases and daily updates of the GenBank database are available by FTP. To access GenBank and its related retrieval and analysis services, begin at the NCBI Homepage.

    Sequence of a putative human housekeeping gene (HK33) localized on chromosome 1

    A gene (HK33) localized on human chromosome 1 has been detected by crossreaction of its fusion protein with a monospecific antiserum directed against human vitamin-D-binding protein (hDBP; group-specific component). Its cDNA sequence analysis showed no evident homology either to the sequence encoding hDBP or to any other sequence. The largest cDNA clone of 3.2 kb includes an 897-bp coding region and a large 3′ untranslated region with at least four polyadenylation sites. Further cDNA amplification using PCR demonstrated a total cDNA length of approx. 3.7 kb. Northern blot analysis revealed signals at about 2.2-2.5 kb and 4.0 kb, the shorter transcripts representing mRNAs using one of the two polyadenylation sites at about 2.0 kb. Synthesis of the 299-amino-acid polypeptide (33 kDa) in the bacterial host, with subsequent Western blot analysis, verified the sequence-specific recognition by the hDBP-specific antiserum. The search of protein databanks revealed no homology of HK33 to any known sequence. Since the gene is transcribed in all cells and tissues tested so far, it is a strong candidate for another housekeeping gene.

    A compression mechanism for sequence databases to improve the efficiency of conventional tools

    This paper describes a method to compress molecular biology databases that are characterized by an increasing proportion of data derived from genome projects. The performance of our tool has been tested on various data files of the EMBL nucleotide sequence database. The best compression ratios were achieved on EST (Expressed Sequence Tags) data, typically derived from large-scale sequence projects. The compression of sequence database updates was tested in combination with the common Unix compression program 'compress'. Our tool improved the efficiency of 'compress' on average by 16

    RNA ligands selected by cleavage stimulation factor contain distinct sequence motifs that function as downstream elements in 3'-end processing of pre-mRNA

    Critical events in 3'-end processing of pre-mRNA are the recognition of the AAUAAA polyadenylation signal by cleavage and polyadenylation specificity factor (CPSF) and the binding of cleavage stimulation factor (CstF) via its 64-kDa subunit to the downstream element. The stability of this CPSF.CstF.RNA complex is thought to determine the efficiency of 3'-end processing. Since downstream elements reveal high sequence variability, in vitro selection experiments with highly purified CstF were performed to investigate the sequence requirements for CstF-RNA interaction. CstF was purified from calf thymus and from HeLa cells. Surprisingly, calf thymus CstF contained an additional, novel form of the 64-kDa subunit with a molecular mass of 70 kDa. RNA ligands selected by HeLa and calf thymus CstF contained three highly conserved sequence elements as follows: element 1 (AUGCGUUCCUCGUCC) and two closely related elements, element 2a (YGUGUYN(0-4)UUYAYUGYGU) and element 2b (UUGYUN(0-4)AUUUACU(U/G)N(0-2)YCU). All selected sequences tested functioned as downstream elements in 3'-end processing in vitro. A computer survey of the EMBL data library revealed significant homologies to all selected elements in naturally occurring 3'-untranslated regions. The majority of element 2a homologies was found downstream of coding sequences. Therefore, we postulate that this element represents a novel consensus sequence for downstream elements in 3'-end processing of pre-mRNA.
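The three elements listed in this abstract can be written as regular expressions, which is one simple way to scan a sequence for exact matches. The sketch below is not the authors' survey method (the abstract reports homology searches, which tolerate mismatches); it assumes the usual IUPAC reading of Y as C or U and N as any nucleotide, and it reads the numbers following N as repeat ranges (0 to 4 and 0 to 2). The function name and the test sequence are made up for illustration.

```python
import re

# Assumed IUPAC expansion: Y -> [CU], N -> [ACGU]; (U/G) -> [UG].
# Numbers after N are read as repeat ranges, e.g. N 0-4 -> [ACGU]{0,4}.
ELEMENTS = {
    "element 1": re.compile(r"AUGCGUUCCUCGUCC"),
    "element 2a": re.compile(r"[CU]GUGU[CU][ACGU]{0,4}UU[CU]A[CU]UG[CU]GU"),
    "element 2b": re.compile(
        r"UUG[CU]U[ACGU]{0,4}AUUUACU[UG][ACGU]{0,2}[CU]CU"
    ),
}


def find_elements(sequence):
    """Return (element name, start, end) for every exact match.

    The input may be RNA or DNA; T is converted to U before scanning.
    Only non-overlapping matches per element are reported.
    """
    rna = sequence.upper().replace("T", "U")
    hits = []
    for name, pattern in ELEMENTS.items():
        for match in pattern.finditer(rna):
            hits.append((name, match.start(), match.end()))
    return hits


# Made-up sequence containing one instance of element 2a.
print(find_elements("GGCGUGUCAAUUCACUGCGUAA"))
```

Exact matching is deliberately strict; a survey closer to the one described would score partial matches rather than require every position to agree.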