    Construction, visualisation, and clustering of transcription networks from microarray expression data.

    Network analysis transcends conventional pairwise approaches to data analysis, as the context of components in a network graph can be taken into account. Such approaches are increasingly being applied to genomics data, where functional linkages are used to connect genes or proteins. However, while microarray gene expression datasets are now abundant and of high quality, few approaches have been developed for analysing such data in a network context. We present a novel approach for 3-D visualisation and analysis of transcriptional networks generated from microarray data. These networks consist of nodes representing transcripts, connected by virtue of the similarity of their expression profiles across multiple conditions. Analysing genome-wide gene transcription across 61 mouse tissues, we describe the unusual topography of the large and highly structured networks produced, and demonstrate how they can be used to visualise, cluster, and mine large datasets. This approach is fast, intuitive, and versatile, and allows the identification of biological relationships that may be missed by conventional analysis techniques. This work has been implemented in a freely available open-source application named BioLayout Express3D.
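
    One common way to build such a network, shown here only as a minimal sketch, is to compute a Pearson correlation coefficient for every pair of transcript profiles and keep an edge wherever it reaches a cut-off. The choice of Pearson correlation, the 0.9 threshold and the names `pearson` and `build_network` are assumptions for illustration; this is not BioLayout Express3D's code, and clustering of the resulting graph is not shown.

```python
import math
from itertools import combinations


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length expression profiles."""
    n = len(x)
    mx = sum(x) / n
    my = sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        # A flat profile has no defined correlation; treat it as unlinked.
        return 0.0
    return sxy / math.sqrt(sxx * syy)


def build_network(profiles, threshold=0.9):
    """Return edges (a, b, r) for transcript pairs whose r >= threshold.

    profiles: {transcript_id: [expression value per condition]} (hypothetical layout)
    """
    edges = []
    for a, b in combinations(sorted(profiles), 2):
        r = pearson(profiles[a], profiles[b])
        if r >= threshold:
            edges.append((a, b, r))
    return edges
```

    The all-pairs loop is quadratic in the number of transcripts, which is fine for a toy example but would need vectorised or blocked computation for genome-wide data.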

    The risk of cryptorchidism among sons of women working in horticulture in Denmark: a cohort study

    Background: Androgens are crucial for normal testicular descent. Studies show that some pesticides have estrogenic or antiandrogenic effects, and that female workers exposed to pesticides have an increased risk of having a boy with cryptorchidism. The main objective of the present study was to investigate whether pregnant women exposed to pesticides through their work in horticulture experience an excess risk of having sons with cryptorchidism. Methods: We conducted a cohort study of pregnant women working in horticulture using four cohorts: one established with data from the departments of occupational medicine in Jutland and Funen, and three existing mother-child cohorts (n = 1,468). A reference group was established from the entire Danish population of boys born in the period 1986-2007 (n = 783,817). Nationwide Danish health registers provided information on birth outcome, cryptorchidism diagnosis and orchiopexy. The level of occupational exposure to pesticides was assessed by expert judgment blinded to outcome status. The risk of cryptorchidism among exposed horticulture workers, compared with the background population and with unexposed horticulture workers, was assessed by Cox regression models. Results: Pesticide-exposed women employed in horticulture had a hazard ratio (HR) of having cryptorchid sons of 1.39 (95% CI 0.84; 2.31) and an HR of orchiopexy of 1.34 (0.72; 2.49) compared with the background population. Analysis by separate cohort revealed a significantly increased risk of cryptorchidism in cohort 2: HR 2.58 (1.07; 6.20) and an increased risk of orchiopexy in cohort 4: HR 2.76 (1.03; 7.35), but no significant associations in the other cohorts. Compared with unexposed women working in horticulture, pesticide-exposed women had an HR of having sons with cryptorchidism of 1.34 (0.30; 5.96) and of orchiopexy of 1.93 (0.24; 15.4). Conclusions: The data are compatible with a slightly increased risk of cryptorchidism in sons of women exposed to pesticides by working in horticulture.
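
    Because the abstract reports only hazard ratios with 95% confidence intervals, a rough standard error and p-value can be back-calculated, assuming each interval was built symmetrically on the log scale (Wald-type). The function name `wald_from_ci` is hypothetical, and rounding in the reported values makes the results approximate.

```python
import math


def wald_from_ci(hr, lower, upper, z_crit=1.959964):
    """Back-calculate SE of log(HR), the z statistic and a two-sided p-value.

    Assumes the 95% CI is symmetric on the log scale: log(HR) +/- z_crit * SE.
    """
    se = (math.log(upper) - math.log(lower)) / (2 * z_crit)
    z = math.log(hr) / se
    # Two-sided normal p-value: 2 * (1 - Phi(|z|)) == erfc(|z| / sqrt(2)).
    p = math.erfc(abs(z) / math.sqrt(2))
    return se, z, p
```

    Under this assumption, the overall HR of 1.39 (0.84; 2.31) gives a p-value of roughly 0.2, whereas cohort 2's HR of 2.58 (1.07; 6.20) falls just below 0.05, consistent with which results the abstract describes as significant.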

    Factors influencing success of clinical genome sequencing across a broad spectrum of disorders

    To assess factors influencing the success of whole-genome sequencing for mainstream clinical diagnosis, we sequenced 217 individuals from 156 independent cases or families across a broad spectrum of disorders in whom previous screening had identified no pathogenic variants. We quantified the number of candidate variants identified using different strategies for variant calling, filtering, annotation and prioritization. We found that jointly calling variants across samples, filtering against both local and external databases, deploying multiple annotation tools and giving familial transmission precedence over biological plausibility contributed to accuracy. Overall, we identified disease-causing variants in 21% of cases, with the proportion increasing to 34% (23/68) for mendelian disorders and 57% (8/14) in family trios. We also discovered 32 potentially clinically actionable variants in 18 genes unrelated to the referral disorder, although only 4 were ultimately considered reportable. Our results demonstrate the value of genome sequencing for routine clinical diagnosis but also highlight many outstanding challenges.
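
    The abstract credits familial transmission as a useful filter. A minimal sketch of that idea for a trio, under simplifying assumptions (autosomal biallelic sites, an affected child, unaffected parents, full penetrance, and only de novo or homozygous recessive models), keeps candidates whose genotypes fit at least one model. The model labels and function names are hypothetical; X-linked and compound heterozygous inheritance are deliberately left out.

```python
def consistent_models(child, mother, father):
    """Return the (hypothetical) inheritance models a trio genotype fits.

    Genotypes are alternate-allele counts (0, 1 or 2). The child is assumed
    affected and both parents unaffected, with full penetrance.
    """
    models = []
    # De novo dominant: heterozygous child, both parents homozygous reference.
    if child == 1 and mother == 0 and father == 0:
        models.append("de_novo")
    # Autosomal recessive: homozygous alternate child, both parents carriers.
    if child == 2 and mother == 1 and father == 1:
        models.append("autosomal_recessive")
    return models


def filter_by_transmission(candidates):
    """Keep IDs of (variant_id, child, mother, father) rows fitting any model."""
    return [row[0] for row in candidates if consistent_models(*row[1:])]
```

    For instance, a heterozygous variant also carried by an unaffected parent is dropped, because under these assumptions it cannot explain the child's disorder on its own.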

    NOX1 loss-of-function genetic variants in patients with inflammatory bowel disease.

    Genetic defects that affect intestinal epithelial barrier function can present with very early-onset inflammatory bowel disease (VEOIBD). Using whole-genome sequencing, a novel hemizygous defect in NOX1, encoding NADPH oxidase 1, was identified in a patient with ulcerative colitis-like VEOIBD. Exome screening of 1,878 pediatric patients identified seven further male inflammatory bowel disease (IBD) patients with rare NOX1 mutations. Loss of function was validated for p.N122H and p.T497A, and to a lesser degree for p.Y470H, p.R287Q, p.I67M and p.Q293R, as well as for the previously described p.P330S and the common NOX1 SNP variant p.D360N (rs34688635). The missense mutation p.N122H abrogated reactive oxygen species (ROS) production in cell lines, ex vivo colonic explants, and patient-derived colonic organoid cultures. Within colonic crypts, NOX1 constitutively generates a high level of ROS in the crypt lumen. Analysis of 9,513 controls and 11,140 IBD patients of non-Jewish European ancestry did not reveal an association between p.D360N and IBD. Our data suggest that loss-of-function variants in NOX1 do not cause a Mendelian disorder of high penetrance but act as context-specific modifiers. Our results imply that variants in NOX1 change brush border ROS within colonic crypts at the interface between the epithelium and luminal microbes.

    Quantification of codon selection for comparative bacterial genomics

    Background: Statistics measuring codon selection seek to compare genes by their sensitivity to selection for translational efficiency, but existing statistics lack a model for testing the significance of differences between genes. Here, we introduce a new statistic for measuring codon selection, the Adaptive Codon Enrichment (ACE). Results: This statistic represents codon usage bias in terms of a probabilistic distribution, quantifying the extent to which preferred codons are over-represented in the gene of interest relative to the mean and variance that would result from stochastic sampling of codons. Expected codon frequencies are derived from the observed codon usage frequencies of a broad set of genes, so that they are likely to reflect nonselective, genome-wide influences on codon usage (e.g. mutational biases). The relative adaptiveness of synonymous codons is deduced from the frequency of codon usage in a pre-selected set of genes relative to the expected frequency. The ACE can predict both transcript abundance during rapid growth and the rate of synonymous substitutions, with accuracy comparable to or greater than that of existing metrics. We further examine how the composition of reference gene sets affects the accuracy of the statistic, and suggest methods for selecting appropriate reference sets for any genome, including bacteriophages. Finally, we demonstrate that the ACE may naturally be extended to quantify the genome-wide influence of codon selection in a manner that is sensitive to a large fraction of codons in the genome. This reveals substantial variation among genomes, correlated with tRNA gene number, even among groups of bacteria where previously proposed whole-genome measures show little variation. Conclusions: The statistical framework of the ACE allows rigorous comparison of the level of codon selection acting on genes, both within a genome and between genomes.
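
    The idea of comparing an observed codon score against the mean and variance expected under stochastic sampling can be sketched as a z-score. This is a simplified illustration, not the published ACE formula: it assumes that, within each amino acid, the gene's codons are independent draws from the expected (background) frequencies, and it scores each codon by a relative-adaptiveness weight. All names and the input layout are assumptions.

```python
import math


def codon_enrichment_z(gene_counts, expected_freqs, weights):
    """z-score of a weighted codon score against multinomial sampling.

    Simplified sketch; not the published ACE definition.
    gene_counts:    {amino_acid: {codon: count}} observed in the gene
    expected_freqs: {amino_acid: {codon: p}} background frequencies (sum to 1 per amino acid)
    weights:        {codon: w} relative adaptiveness of each codon
    """
    observed = expected = variance = 0.0
    for aa, counts in gene_counts.items():
        n = sum(counts.values())
        probs = expected_freqs[aa]
        # Mean and second moment of the weight of a single randomly drawn codon.
        mean_w = sum(p * weights[c] for c, p in probs.items())
        mean_w2 = sum(p * weights[c] ** 2 for c, p in probs.items())
        observed += sum(k * weights[c] for c, k in counts.items())
        # n independent draws: mean and variance add across draws.
        expected += n * mean_w
        variance += n * (mean_w2 - mean_w ** 2)
    if variance == 0:
        # All synonymous codons weighted equally: no room for enrichment.
        return 0.0
    return (observed - expected) / math.sqrt(variance)
```

    For example, a lysine-only gene with 8 of 10 codons as AAG, against a 50:50 background and weights of 0 (AAA) and 1 (AAG), scores z = 3 / sqrt(2.5), about 1.9.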

    Translational Selection Is Ubiquitous in Prokaryotes

    Codon usage bias in prokaryotic genomes is largely a consequence of background substitution patterns in DNA, but highly expressed genes may show a preference towards codons that enable more efficient and/or accurate translation. We introduce a novel approach based on supervised machine learning that detects effects of translational selection on genes, while controlling for local variation in nucleotide substitution patterns, represented as the sequence composition of intergenic DNA. A cornerstone of our method is a Random Forest classifier that outperformed previous distance measure-based approaches, such as the codon adaptation index, in the task of discerning the (highly expressed) ribosomal protein genes by their codon frequencies. Unlike previous reports, we show evidence that translational selection in prokaryotes is practically universal: in 460 of 461 examined microbial genomes, we find that a subset of genes shows a higher codon usage similarity to the ribosomal proteins than would be expected from the local sequence composition. These genes constitute a substantial part of the genome (between 5% and 33%, depending on genome size), while also exhibiting higher experimentally measured mRNA abundances and tending toward codons that match tRNA anticodons by canonical base pairing. Certain gene functional categories are generally enriched with, or depleted of, codon-optimized genes, and the trends of enrichment/depletion are conserved between Archaea and Bacteria. Prominent exceptions from these trends might indicate genes with alternative physiological roles; we speculate on specific examples related to detoxification of oxygen radicals and ammonia and to possible misannotations of asparaginyl–tRNA synthetases. Since the presence of codon optimizations on genes is a valid proxy for expression levels in fully sequenced genomes, we provide an example of an “adaptome” by highlighting gene functions with expression levels elevated specifically in thermophilic Bacteria and Archaea.
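
    The classifier described above takes codon frequencies as features. A minimal sketch of that feature step, assuming in-frame coding sequences and simply skipping codons that contain ambiguous bases, turns each gene into a 64-element vector; `codon_frequencies` and `CODONS` are illustrative names, and the control for intergenic sequence composition is not shown.

```python
from itertools import product

# All 64 codons in a fixed order, so every gene maps to the same feature layout.
CODONS = ["".join(c) for c in product("ACGT", repeat=3)]


def codon_frequencies(cds):
    """Relative frequencies of the 64 codons in an in-frame coding sequence."""
    cds = cds.upper()
    counts = dict.fromkeys(CODONS, 0)
    total = 0
    # Step through complete codons only; a trailing partial codon is ignored.
    for i in range(0, len(cds) - len(cds) % 3, 3):
        codon = cds[i:i + 3]
        if codon in counts:  # skip codons with ambiguous bases such as N
            counts[codon] += 1
            total += 1
    return [counts[c] / total if total else 0.0 for c in CODONS]
```

    Vectors like these, with ribosomal protein genes labelled as the positive class, could then be passed to a Random Forest implementation such as scikit-learn's `RandomForestClassifier` via its `fit(X, y)` method; the authors' exact features and settings may differ.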

    100,000 Genomes Pilot on Rare-Disease Diagnosis in Health Care — Preliminary Report

    BACKGROUND: The U.K. 100,000 Genomes Project is in the process of investigating the role of genome sequencing in patients with undiagnosed rare diseases after usual care and the alignment of this research with health care implementation in the U.K. National Health Service. Other parts of this project focus on patients with cancer and infection. METHODS: We conducted a pilot study involving 4660 participants from 2183 families, among whom 161 disorders covering a broad spectrum of rare diseases were present. We collected data on clinical features with the use of Human Phenotype Ontology terms, undertook genome sequencing, applied automated variant prioritization on the basis of applied virtual gene panels and phenotypes, and identified novel pathogenic variants through research analysis. RESULTS: Diagnostic yields varied among family structures and were highest in family trios (both parents and a proband) and families with larger pedigrees. Diagnostic yields were much higher for disorders likely to have a monogenic cause (35%) than for disorders likely to have a complex cause (11%). Diagnostic yields for intellectual disability, hearing disorders, and vision disorders ranged from 40 to 55%. We made genetic diagnoses in 25% of the probands. A total of 14% of the diagnoses were made by means of the combination of research and automated approaches, which was critical for cases in which we found etiologic noncoding, structural, and mitochondrial genome variants and coding variants poorly covered by exome sequencing. Cohortwide burden testing across 57,000 genomes enabled the discovery of three new disease genes and 19 new associations. Of the genetic diagnoses that we made, 25% had immediate ramifications for clinical decision making for the patients or their relatives. CONCLUSIONS: Our pilot study of genome sequencing in a national health care system showed an increase in diagnostic yield across a range of rare diseases. 
(Funded by the National Institute for Health Research and others.)
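
    Automated prioritisation on virtual gene panels can be pictured as a chain of filters. The sketch below keeps only rare, potentially damaging variants in genes on a panel; the `Variant` fields, the allele-frequency cut-off of 0.001 and the consequence labels are hypothetical choices, not the project's actual rules, and phenotype matching is not shown.

```python
from dataclasses import dataclass

# Hypothetical set of consequence labels treated as potentially damaging.
DAMAGING = frozenset({"stop_gained", "frameshift", "missense", "splice_donor", "splice_acceptor"})


@dataclass
class Variant:
    gene: str
    consequence: str
    allele_freq: float  # population allele frequency from a reference database


def prioritise(variants, panel_genes, max_af=0.001, damaging=DAMAGING):
    """Keep rare, potentially damaging variants in genes on the virtual panel."""
    return [
        v for v in variants
        if v.gene in panel_genes            # gene is on the applied panel
        and v.allele_freq <= max_af         # rare in the population
        and v.consequence in damaging       # predicted to affect the protein
    ]
```

    Each filter is independent, so the panel, frequency cut-off and consequence set can be tuned per disorder without changing the rest of the chain.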

    Integrating sequence and array data to create an improved 1000 Genomes Project haplotype reference panel

    A major use of the 1000 Genomes Project (1000GP) data is genotype imputation in genome-wide association studies (GWAS). Here we develop a method to estimate haplotypes from low-coverage sequencing data that can take advantage of single-nucleotide polymorphism (SNP) microarray genotypes on the same samples. First the SNP array data are phased to build a backbone (or 'scaffold') of haplotypes across each chromosome. We then phase the sequence data 'onto' this haplotype scaffold. This approach can take advantage of relatedness between sequenced and non-sequenced samples to improve accuracy. We use this method to create a new 1000GP haplotype reference set for use by the human genetic community. Using a set of validation genotypes at SNP and bi-allelic indels we show that these haplotypes have lower genotype discordance and improved imputation performance into downstream GWAS samples, especially at low-frequency variants. © 2014 Macmillan Publishers Limited. All rights reserved
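
    Genotype discordance, as used above to compare the haplotype sets, can be illustrated as the fraction of validation genotypes that disagree with the estimated calls. This minimal sketch assumes genotypes coded as alternate-allele counts keyed by (sample, site); the function name is hypothetical, and published comparisons sometimes restrict the denominator (for example, excluding sites where both calls are homozygous reference), which this version does not do.

```python
def genotype_discordance(imputed, validation):
    """Fraction of shared (sample, site) genotypes where the two call sets differ.

    Both arguments map (sample, site) -> alternate-allele count (0, 1 or 2).
    Only keys present in both mappings are compared.
    """
    shared = imputed.keys() & validation.keys()
    if not shared:
        raise ValueError("no overlapping genotypes to compare")
    mismatches = sum(imputed[k] != validation[k] for k in shared)
    return mismatches / len(shared)
```

    A lower value means the estimated haplotypes reproduce the validation genotypes more faithfully.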