86 research outputs found

    DNA sequencing: bench to bedside and beyond†

    Fifteen years elapsed between the discovery of the double helix (1953) and the first DNA sequencing (1968). Modern DNA sequencing began in 1977, with development of the chemical method of Maxam and Gilbert and the dideoxy method of Sanger, Nicklen and Coulson, and with the first complete DNA sequence (phage ϕX174), which demonstrated that sequence could give profound insights into genetic organization. Incremental improvements allowed sequencing of molecules >200 kb (human cytomegalovirus), leading to an avalanche of data that demanded computational analysis and spawned the field of bioinformatics. The US Human Genome Project spurred sequencing activity. By 1992 the first ‘sequencing factory’ was established, and others soon followed. The first complete cellular genome sequences, from bacteria, appeared in 1995 and other eubacterial, archaebacterial and eukaryotic genomes were soon sequenced. Competition between the public Human Genome Project and Celera Genomics produced working drafts of the human genome sequence, published in 2001, but refinement and analysis of the human genome sequence will continue for the foreseeable future. New ‘massively parallel’ sequencing methods are greatly increasing sequencing capacity, but further innovations are needed to achieve the ‘thousand dollar genome’ that many feel is prerequisite to personalized genomic medicine. These advances will also allow new approaches to a variety of problems in biology, evolution and the environment.

    doi:10.1093/nar/gkm688

    Similarity-based gene detection: using COGs to find evolutionarily-conserved ORFs

    BACKGROUND: Experimental verification of gene products has not kept pace with the rapid growth of microbial sequence information. However, existing annotations of gene locations contain sufficient information to screen for probable errors. Furthermore, comparisons among genomes become more informative as more genomes are examined. We studied all open reading frames (ORFs) of at least 30 codons from the genomes of 27 sequenced bacterial strains. We grouped the potential peptide sequences encoded from the ORFs by forming Clusters of Orthologous Groups (COGs). We used this grouping in order to find homologous relationships that would not be distinguishable from noise when using simple BLAST searches. Although COG analysis was initially developed to group annotated genes, we applied it to the task of grouping anonymous DNA sequences that may encode proteins. RESULTS: "Mixed COGs" of ORFs (clusters in which some sequences correspond to annotated genes and some do not) are attractive targets when seeking errors of gene prediction. Examination of mixed COGs reveals some situations in which genes appear to have been missed in current annotations and a smaller number of regions that appear to have been annotated as gene loci erroneously. This technique can also be used to detect potential pseudogenes or sequencing errors. Our method uses an adjustable parameter for degree of conservation among the studied genomes (stringency). We detail results for one level of stringency at which we found 83 potential genes which had not previously been identified, 60 potential pseudogenes, and 7 sequences with existing gene annotations that are probably incorrect. CONCLUSION: Systematic study of sequence conservation offers a way to improve existing annotations by identifying potentially homologous regions where the annotation of the presence or absence of a gene is inconsistent among genomes.
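The first step described in the abstract above, collecting every ORF of at least 30 codons, can be illustrated with a minimal sketch. The abstract does not say how an ORF is delimited, so the definition used here (an ATG start codon followed in frame by a stop codon, with the stop excluded from the codon count) is an assumption, as are the names `find_orfs` and `MIN_CODONS`. This is not the authors' implementation, and it does not cover the COG clustering step.

```python
from typing import Iterator, Tuple

STOP_CODONS = {"TAA", "TAG", "TGA"}
START_CODONS = {"ATG"}   # assumption: only ATG counts as a start codon
MIN_CODONS = 30          # threshold quoted in the abstract

_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of an upper-case DNA string."""
    return seq.translate(_COMPLEMENT)[::-1]


def find_orfs(seq: str, min_codons: int = MIN_CODONS) -> Iterator[Tuple[str, int, int, int]]:
    """Yield (strand, frame, start, end) for each ORF of at least min_codons.

    Coordinates are 0-based, end-exclusive, and relative to the strand
    being scanned; the end includes the stop codon.
    """
    seq = seq.upper()
    for strand, s in (("+", seq), ("-", reverse_complement(seq))):
        for frame in range(3):
            start = None
            for i in range(frame, len(s) - 2, 3):
                codon = s[i:i + 3]
                if start is None and codon in START_CODONS:
                    start = i  # first in-frame start codon opens the ORF
                elif start is not None and codon in STOP_CODONS:
                    n_codons = (i - start) // 3  # stop codon not counted
                    if n_codons >= min_codons:
                        yield strand, frame, start, i + 3
                    start = None  # look for the next ORF in this frame
```

Calling `list(find_orfs(genome_sequence))` on a genome string returns candidates on both strands. In the approach the abstract describes, the peptides translated from such candidates would then be clustered and compared against existing annotations.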

    Mapping phosphoproteins in Mycoplasma genitalium and Mycoplasma pneumoniae

    BACKGROUND: Little is known regarding the extent or targets of phosphorylation in mycoplasmas, yet in many other bacterial species phosphorylation is known to play an important role in signaling and regulation of cellular processes. To determine the prevalence of phosphorylation in mycoplasmas, we examined the CHAPS-soluble protein fractions of Mycoplasma genitalium and Mycoplasma pneumoniae by two-dimensional gel electrophoresis (2-DE), using a combination of Pro-Q Diamond phosphoprotein stain and ³³P labeling. Protein spots that were positive for phosphorylation were identified by peptide mass fingerprinting using MALDI-TOF-TOF mass spectrometry. RESULTS: We identified a total of 24 distinct phosphoproteins, about 3% and 5% of the total protein complement in M. pneumoniae and M. genitalium, respectively, indicating that phosphorylation occurs with prevalence similar to many other bacterial species. Identified phosphoproteins include pyruvate dehydrogenase E1 alpha and beta subunits, enolase, heat shock proteins DnaK and GroEL, elongation factor Tu, cytadherence accessory protein HMW3, P65, and several hypothetical proteins. These proteins are involved in energy metabolism, carbohydrate metabolism, translation/transcription and cytadherence. Interestingly, fourteen of the 24 phosphoproteins we identified (58%) were previously reported as putatively associated with a cytoskeleton-like structure that is present in the mycoplasmas, indicating a potential regulatory role for phosphorylation in this structure. CONCLUSION: This study has shown that phosphorylation in mycoplasmas is comparable to that of other bacterial species. Our evidence supports a link between phosphorylation and cytadherence and/or a cytoskeleton-like structure, since over half of the proteins identified as phosphorylated have been previously associated with these functions. This opens the door to further research into the purposes and mechanisms of phosphorylation for mycoplasmas.

    Insertion site specificity of the transposon Tn3

    The Tn3-deletion method [Davies and Hutchison

    The F-type 5′ motif of mouse L1 elements: a major class of L1 termini similar to the A-type in organization but unrelated in sequence

    It has previously been shown that the L1 family in the mouse (L1Md) contains two alternative 5' ends called the A- and F-type sequences (1,2). We show here that the F-type element is a major class of murine L1 elements and report on the details of organization of the 5' motif of these F-type elements. Although the A- and F-type 5' sequences share no detectable sequence homology, the organization of an F-type 5' end is strikingly similar to that of an A-type. That is, the F-type 5' sequences consist of a tandem array of a small number of 206 bp monomers, while the A-type 5' motif consists of a tandem array of 208 bp monomers. All of the A-type elements characterized to date have a truncated monomer at the 5' end of the array. Many of the F-type elements are also terminated at the 5' end by a truncated copy, but unlike the A-type elements, some F-type elements terminate with a monomer which is within a few nucleotides of being complete. In addition, the F-type consensus sequence, in contrast to the A-type sequence, shows homology (70%) to the body of the L1Md starting at the position where the monomer joins the rest of the L1 element.