
    Prediction of effective genome size in metagenomic samples

    We introduce a novel computational approach to predict effective genome size (EGS; a measure that includes multiple plasmid copies, inserted sequences, and associated phages and viruses) from short sequencing reads of environmental genomics (or metagenomics) projects. We observe considerable EGS differences between environments and link this with ecological complexity as well as species composition (for instance, the presence of eukaryotes). For example, we estimate EGS in a complex, organism-dense farm soil sample at about 6.3 megabases (Mb) whereas that of the bacteria therein is only 4.7 Mb; for bacteria in a nutrient-poor, organism-sparse ocean surface water sample, EGS is as low as 1.6 Mb. The method also permits evaluation of completion status and assembly bias in single-genome sequencing projects.

    Systematic Association of Genes to Phenotypes by Genome and Literature Mining

    One of the major challenges of functional genomics is to unravel the connection between genotype and phenotype. So far, no global analysis has attempted to explore those connections in the light of the large phenotypic variability seen in nature. Here, we use an unsupervised, systematic approach for associating genes and phenotypic characteristics that combines literature mining with comparative genome analysis. We first mine the MEDLINE literature database for terms that reflect phenotypic similarities of species. Subsequently, we predict the likely genomic determinants: genes specifically present in the respective genomes. In a global analysis involving 92 prokaryotic genomes we retrieve 323 clusters containing a total of 2,700 significant gene–phenotype associations. Some clusters contain mostly known relationships, such as genes involved in motility or plant degradation, often with additional hypothetical proteins associated with those phenotypes. Other clusters comprise unexpected associations; for example, a group of terms related to food and spoilage is linked to genes predicted to be involved in bacterial food poisoning. Among the clusters, we observe an enrichment of pathogenicity-related associations, suggesting that the approach reveals many novel genes likely to play a role in infectious diseases.

    Novel transcribed regions in the human genome

    We have used genomic tiling arrays to identify transcribed regions throughout the human genome. Analysis of the mapping results of RNA isolated from five cell/tissue types (NB4 cells, NB4 cells treated with retinoic acid (RA), NB4 cells treated with 12-O-tetradecanoylphorbol-13-acetate (TPA), neutrophils, and placenta) throughout the ENCODE region reveals a large number of novel transcribed regions. Interestingly, neutrophils exhibit a great deal of novel expression in several intronic regions. Comparison of the hybridization results of NB4 cells treated with different stimuli relative to untreated cells reveals that many new regions are expressed upon cell differentiation. One such region is the Hox locus, which contains a large number of novel regions expressed in a number of cell types. Analysis of the trinucleotide composition of the novel transcribed regions reveals that it is similar to that of known exons. These results suggest that many of the novel transcribed regions may have a functional role. Copyright 2006, Cold Spring Harbor Laboratory Press.

    High-Resolution Copy-Number Variation Map Reflects Human Olfactory Receptor Diversity and Evolution

    Olfactory receptors (ORs), which are involved in odorant recognition, form the largest mammalian protein superfamily. The genomic content of OR genes is considerably reduced in humans, as reflected by the relatively small repertoire size and the high fraction (∼55%) of human pseudogenes. Since several recent low-resolution surveys suggested that OR genomic loci are frequently affected by copy-number variants (CNVs), we hypothesized that CNVs may play an important role in the evolution of the human olfactory repertoire. We used high-resolution oligonucleotide tiling microarrays to detect CNVs across 851 OR gene and pseudogene loci. Examining genomic DNA from 25 individuals with ancestry from three populations, we identified 93 OR gene loci and 151 pseudogene loci affected by CNVs, generating a mosaic of OR dosages across persons. Our data suggest that ∼50% of the CNVs involve more than one OR, with the largest CNV spanning 11 loci. In contrast to earlier reports, we observe that CNVs are more frequent among OR pseudogenes than among intact genes, presumably due to both selective constraints and CNV formation biases. Furthermore, our results show an enrichment of CNVs among ORs with a close human paralog or lacking a one-to-one ortholog in chimpanzee. Interestingly, among the latter we observed an enrichment in CNV losses over gains, a finding potentially related to the known diminution of the human OR repertoire. Quantitative PCR experiments performed for 122 sampled ORs agreed well with the microarray results and uncovered 23 additional CNVs. Importantly, these experiments allowed us to uncover nine common deletion alleles that affect 15 OR genes and five pseudogenes. Comparison to the chimpanzee reference genome revealed that all of the deletion alleles are human derived, therefore indicating a profound effect of human-specific deletions on the individual OR gene content. Furthermore, these deletion alleles may be used in future genetic association studies of olfactory inter-individual differences.

    Systematic Inference of Copy-Number Genotypes from Personal Genome Sequencing Data Reveals Extensive Olfactory Receptor Gene Content Diversity

    Copy-number variations (CNVs) are widespread in the human genome, but comprehensive assignments of integer locus copy-numbers (i.e., copy-number genotypes) that, for example, enable discrimination of homozygous from heterozygous CNVs, have remained challenging. Here we present CopySeq, a novel computational approach with an underlying statistical framework that analyzes the depth-of-coverage of high-throughput DNA sequencing reads, and can incorporate paired-end and breakpoint junction analysis based CNV-analysis approaches, to infer locus copy-number genotypes. We benchmarked CopySeq by genotyping 500 chromosome 1 CNV regions in 150 personal genomes sequenced at low coverage. The assessed copy-number genotypes were highly concordant with our qPCR experiments (Pearson correlation coefficient 0.94), and with the published results of two microarray platforms (95–99% concordance). We further demonstrated the utility of CopySeq for analyzing gene regions enriched for segmental duplications by comprehensively inferring copy-number genotypes in the CNV-enriched >800 olfactory receptor (OR) human gene and pseudogene loci. CopySeq revealed that OR loci display an extensive range of locus copy-numbers across individuals, with zero to two copies in some OR loci, and two to nine copies in others. Among genetic variants affecting OR loci we identified deleterious variants including CNVs and SNPs affecting ∼15% and ∼20% of the human OR gene repertoire, respectively, implying that genetic variants with a possible impact on smell perception are widespread. Finally, we found that for several OR loci the reference genome appears to represent a minor-frequency variant, implying a necessary revision of the OR repertoire for future functional studies. CopySeq can ascertain genomic structural variation in specific gene families as well as at a genome-wide scale, where it may enable the quantitative evaluation of CNVs in genome-wide association studies involving high-throughput sequencing.

    Differential (2+1) Jet Event Rates and Determination of alpha_s in Deep Inelastic Scattering at HERA

    Events with a (2+1) jet topology in deep-inelastic scattering at HERA are studied in the kinematic range 200 < Q^2 < 10,000 GeV^2. The rate of (2+1) jet events has been determined with the modified JADE jet algorithm as a function of the jet resolution parameter and is compared with the predictions of Monte Carlo models. In addition, the event rate is corrected for both hadronization and detector effects and is compared with next-to-leading order QCD calculations. A value of the strong coupling constant of alpha_s(M_Z^2) = 0.118 ± 0.002 (stat.) +0.007/-0.008 (syst.) +0.007/-0.006 (theory) is extracted. The systematic error includes uncertainties in the calorimeter energy calibration, in the description of the data by current Monte Carlo models, and in the knowledge of the parton densities. The theoretical error is dominated by the renormalization scale ambiguity. Comment: 25 pages, 6 figures, 3 tables, submitted to Eur. Phys.

    Multi-Jet Event Rates in Deep Inelastic Scattering and Determination of the Strong Coupling Constant

    Jet event rates in deep inelastic ep scattering at HERA are investigated by applying the modified JADE jet algorithm. The analysis uses data taken with the H1 detector in 1994 and 1995. The data are corrected for detector and hadronization effects and then compared with perturbative QCD predictions using next-to-leading order calculations. The strong coupling constant alpha_S(M_Z^2) is determined by evaluating the jet event rates. Values of alpha_S(Q^2) are extracted in four different bins of the negative squared momentum transfer Q^2 in the range from 40 GeV^2 to 4000 GeV^2. A combined fit of the renormalization group equation to these alpha_S(Q^2) values results in alpha_S(M_Z^2) = 0.117 ± 0.003 (stat) +0.009/-0.013 (syst) +0.006 (jet algorithm). Comment: 17 pages, 4 figures, 3 tables, this version to appear in Eur. Phys. J.; it replaces first posted hep-ex/9807019 which had incorrect figure 4

    Measurements of Transverse Energy Flow in Deep-Inelastic Scattering at HERA

    Measurements of transverse energy flow are presented for neutral current deep-inelastic scattering events produced in positron-proton collisions at HERA. The kinematic range covers squared momentum transfers Q^2 from 3.2 to 2,200 GeV^2, the Bjorken scaling variable x from 8×10^{-5} to 0.11 and the hadronic mass W from 66 to 233 GeV. The transverse energy flow is measured in the hadronic centre of mass frame and is studied as a function of Q^2, x, W and pseudorapidity. A comparison is made with QCD based models. The behaviour of the mean transverse energy in the central pseudorapidity region and in an interval corresponding to the photon fragmentation region is analysed as a function of Q^2 and W. Comment: 26 pages, 8 figures, submitted to Eur. Phys.

    Searches at HERA for Squarks in R-Parity Violating Supersymmetry

    A search for squarks in R-parity violating supersymmetry is performed in e^+p collisions at HERA at a centre of mass energy of 300 GeV, using H1 data corresponding to an integrated luminosity of 37 pb^(-1). The direct production of single squarks of any generation in positron-quark fusion via a Yukawa coupling lambda' is considered, taking into account R-parity violating and conserving decays of the squarks. No significant deviation from the Standard Model expectation is found. The results are interpreted in terms of constraints within the Minimal Supersymmetric Standard Model (MSSM), the constrained MSSM and the minimal Supergravity model, and their sensitivity to the model parameters is studied in detail. For a Yukawa coupling of electromagnetic strength, squark masses below 260 GeV are excluded at 95% confidence level in a large part of the parameter space. For a coupling strength 100 times smaller, masses up to 182 GeV are excluded. Comment: 32 pages, 14 figures, 3 tables