15 research outputs found

    Finishing the euchromatic sequence of the human genome

    The sequence of the human genome encodes the genetic instructions for human physiology, as well as rich information about human evolution. In 2001, the International Human Genome Sequencing Consortium reported a draft sequence of the euchromatic portion of the human genome. Since then, the international collaboration has worked to convert this draft into a genome sequence with high accuracy and nearly complete coverage. Here, we report the result of this finishing process. The current genome sequence (Build 35) contains 2.85 billion nucleotides interrupted by only 341 gaps. It covers ∼99% of the euchromatic genome and is accurate to an error rate of ∼1 event per 100,000 bases. Many of the remaining euchromatic gaps are associated with segmental duplications and will require focused work with new methods. The near-complete sequence, the first for a vertebrate, greatly improves the precision of biological analyses of the human genome including studies of gene number, birth and death. Notably, the human genome seems to encode only 20,000-25,000 protein-coding genes. The genome sequence reported here should serve as a firm foundation for biomedical research in the decades ahead.

    The characterisation of three types of genes that overlie copy number variable regions.

    Due to the increased accuracy of Copy Number Variable region (CNV) break point mapping, it is now possible to say with a reasonable degree of confidence whether a gene (i) falls entirely within a CNV; (ii) overlaps the CNV or (iii) actually contains the CNV. We classify these as type I, II and III CNV genes respectively. Here we show that although type I genes vary in copy number along with the CNV, most of these type I genes have the same expression levels as wild type copy numbers of the gene. These genes must, therefore, be under homeostatic dosage compensation control. Looking into possible mechanisms for the regulation of gene expression, we found that type I genes have a significant paucity of genes regulated by miRNAs and are not significantly enriched for monoallelically expressed genes. Type III genes, on the other hand, have a significant excess of genes regulated by miRNAs and are enriched for genes that are monoallelically expressed. Many diseases and genomic disorders are associated with CNVs, so a better understanding of the different ways genes are associated with normal CNVs will help focus on candidate genes in genome wide association studies.
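    The three-way classification described in this abstract can be illustrated as an interval comparison. The following is a minimal sketch, not the authors' method: the function name, the closed-interval coordinate convention, and the tie-breaking rule for identical intervals are all assumptions made for illustration.

```python
def classify_cnv_gene(gene_start, gene_end, cnv_start, cnv_end):
    """Classify a gene relative to a CNV using closed intervals [start, end].

    Returns "I" if the gene lies entirely within the CNV, "III" if the gene
    contains the CNV, "II" if the two only partially overlap, and None if
    they do not overlap at all.

    Assumption: a gene whose coordinates exactly match the CNV is treated
    as type I. The abstract does not specify this edge case.
    """
    # No shared positions: the gene is not a CNV gene for this region.
    if gene_end < cnv_start or cnv_end < gene_start:
        return None
    # Gene falls entirely within the CNV.
    if cnv_start <= gene_start and gene_end <= cnv_end:
        return "I"
    # Gene spans the whole CNV.
    if gene_start <= cnv_start and cnv_end <= gene_end:
        return "III"
    # Remaining case: partial overlap at one end.
    return "II"
```

    For example, a gene at 100-200 inside a CNV at 50-300 would be type I under these assumptions, while a gene at 100-400 around a CNV at 150-300 would be type III.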

    Sequence search algorithms for single pass sequence identification: Does one size fit all?

    Bioinformatic tools have become essential to biologists in their quest to understand the vast quantities of sequence data, and now whole genomes, which are being produced at an ever increasing rate. Many of these sequence data are single-pass sequences, such as sample sequences from organisms closely related to other organisms of interest which have already been sequenced, or cDNAs or expressed sequence tags (ESTs). These single-pass sequences often contain errors, including frameshifts, which complicate the identification of homologues, especially at the protein level. Therefore, sequence searches with this type of data are often performed at the nucleotide level. The most commonly used sequence search algorithms for the identification of homologues are Washington University's and the National Center for Biotechnology Information's (NCBI) versions of the BLAST suites of tools, which are to be found on websites all over the world. The work reported here examines the use of these tools for comparing sample sequence datasets to a known genome. It shows that care must be taken when choosing the parameters to use with the BLAST algorithms. NCBI's version of gapped BLASTn gives much shorter, and sometimes different, top alignments to those found using Washington University's version of BLASTn (which also allows for gaps), when both are used with their default parameters. Most of the differences in performance were found to be due to the choices of default parameters rather than underlying differences between the two algorithms. Washington University's version, used with defaults, compares very favourably with the results obtained using the accurate but computationally intensive Smith-Waterman algorithm. Copyright © 2001 John Wiley & Sons, Ltd.
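    To make concrete why the Smith-Waterman algorithm mentioned in this abstract is accurate but computationally intensive, here is a minimal sketch of its local-alignment scoring recurrence. The scoring values (match, mismatch, linear gap penalty) and the function name are illustrative assumptions, not the parameters used in the paper, and the sketch returns only the best score rather than the alignment itself.

```python
def smith_waterman_score(a, b, match=2, mismatch=-1, gap=-1):
    """Return the best local alignment score between sequences a and b.

    Uses the Smith-Waterman recurrence with a linear gap penalty
    (an assumption; affine gap penalties are also common).
    Every cell of a len(a) x len(b) table is filled, which is why
    the method is exhaustive but slow on large databases.
    """
    prev = [0] * (len(b) + 1)  # previous row of the scoring table
    best = 0
    for i in range(1, len(a) + 1):
        curr = [0] * (len(b) + 1)
        for j in range(1, len(b) + 1):
            # Score for aligning a[i-1] with b[j-1].
            diag = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            # Local alignment: a cell never drops below zero.
            curr[j] = max(0, diag, prev[j] + gap, curr[j - 1] + gap)
            best = max(best, curr[j])
        prev = curr
    return best
```

    Heuristic tools such as BLAST avoid filling the full table by extending only from short seed matches, which is where parameter choices of the kind discussed in the abstract come into play.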

    Disruption of six novel ORFs on the left arm of chromosome XII reveals one gene essential for vegetative growth of Saccharomyces cerevisiae

    Deletion via PCR-mediated gene replacement, together with basic functional and bioinformatic analyses, has been performed on six novel open reading-frames (ORFs) on the left arm of chromosome XII of Saccharomyces cerevisiae (YLL033w, YLL032c, YLL031c, YLL030c, YLL029w and YLL028w). ORF deletion was realized using either a short-flanking homology (SFH) or a long-flanking homology (LFH) replacement cassette in the diploid strain FY1679. Sporulation and tetrad analysis showed that YLL031c is the only essential gene of the six. Microscopic examination of the non-growing spores carrying a disrupted copy of the essential gene showed that most of them were blocked after one or two cell divisions with heterogeneous bud size. The standard EUROFAN growth tests failed to reveal any obvious phenotype resulting from the deletion of each of the five non-essential ORFs. Bioinformatic analysis revealed that YLL029w is probably an aminopeptidase for mitochondrial or nuclear protein processing and YLL028w may be involved in drug resistance in S. cerevisiae. Replacement cassettes, comprising the promoter and terminator regions of each of the six ORFs, were cloned into pUG7 and demonstrated to efficiently mediate gene replacement in an alternative diploid strain, W303. All the cognate gene clones were constructed, using either PCR products amplified from genomic DNA, or gap-repair. All clones and strains generated have been deposited in the EUROFAN genetic stock centre (EUROSCARF, Frankfurt).