113 research outputs found

    Interpretation of multiple probe sets mapping to the same gene in Affymetrix GeneChips

    BACKGROUND: Affymetrix GeneChip technology enables the parallel observations of tens of thousands of genes. It is important that the probe set annotations are reliable so that biological inferences can be made about genes which undergo differential expression. Probe sets representing the same gene might be expected to show similar fold changes/z-scores; however, this is in fact not the case. RESULTS: We have made a case study of the mouse Surf4, chosen because it is a gene that was reported to be represented by the same eight probe sets on the MOE430A array by both Affymetrix and Bioconductor in early 2004. Only five of the probe sets actually detect Surf4 transcripts. Two of the probe sets detect splice variants of Surf2. We have also studied the expression changes of the eight probe sets in a public-domain microarray experiment. The transcripts for Surf4 are correlated in time, and similarly the transcripts for Surf2 are also correlated in time. However, the transcripts for Surf4 and Surf2 are not correlated. This proof of principle shows that observations of expression can be used to confirm, or otherwise, annotation discrepancies. We have also investigated groups of probe sets on the RAE230A array that are assigned to the same LocusID, but which show large variances in differential expression in any one of three different experiments on rat. The probe set groups with high variances are found to represent cases of alternative splicing, use of alternative poly(A) signals, or incorrect annotations. CONCLUSION: Our results indicate that some probe sets should not be considered as unique measures of transcription, because the individual probes map to more than one transcript dependent upon the biological condition. Our results highlight the need for care when assessing whether groups of probe sets all measure the same transcript.

    Consensus coding sequence (CCDS) database: a standardized set of human and mouse protein-coding regions supported by expert curation.

    The Consensus Coding Sequence (CCDS) project provides a dataset of protein-coding regions that are identically annotated on the human and mouse reference genome assembly in genome annotations produced independently by NCBI and the Ensembl group at EMBL-EBI. This dataset is the product of an international collaboration that includes NCBI, Ensembl, HUGO Gene Nomenclature Committee, Mouse Genome Informatics and University of California, Santa Cruz. Identically annotated coding regions, which are generated using an automated pipeline and pass multiple quality assurance checks, are assigned a stable and tracked identifier (CCDS ID). Additionally, coordinated manual review by expert curators from the CCDS collaboration helps in maintaining the integrity and high quality of the dataset. The CCDS data are available through an interactive web page (https://www.ncbi.nlm.nih.gov/CCDS/CcdsBrowse.cgi) and an FTP site (ftp://ftp.ncbi.nlm.nih.gov/pub/CCDS/). In this paper, we outline the ongoing work, growth and stability of the CCDS dataset and provide updates on new collaboration members and new features added to the CCDS user interface. We also present expert curation scenarios, with specific examples highlighting the importance of an accurate reference genome assembly and the crucial role played by input from the research community. Nucleic Acids Res 2018 Jan 4; 46(D1):D221-D228

    Relative impact of key sources of systematic noise in Affymetrix and Illumina gene-expression microarray experiments

    BACKGROUND: Systematic processing noise, which includes batch effects, is very common in microarray experiments but is often ignored despite its potential to confound or compromise experimental results. Compromised results are most likely when re-analysing or integrating datasets from public repositories due to the different conditions under which each dataset is generated. To better understand the relative noise contributions of various factors in experimental design, we assessed several Illumina and Affymetrix datasets for technical variation between replicate hybridisations of Universal Human Reference RNA (UHRR) and individual or pooled breast-tumour RNA. RESULTS: A varying degree of systematic noise was observed in each of the datasets; however, in all cases the relative amount of variation between standard control RNA replicates was found to be greatest at earlier points in the sample-preparation workflow. For example, 40.6% of the total variation in reported expression was attributed to replicate extractions, compared to 13.9% due to amplification/labelling and 10.8% between replicate hybridisations. Deliberate probe-wise batch-correction methods were effective in reducing the magnitude of this variation, although the level of improvement was dependent on the sources of noise included in the model. Systematic noise introduced at the chip, run, and experiment levels of a combined Illumina dataset was found to be highly dependent upon the experimental design. Both UHRR and pools of RNA, which were derived from the samples of interest, modelled technical variation well, although the pools were significantly better correlated (4% average improvement) and better emulated the effects of systematic noise, over all probes, than the UHRRs. The effect of this noise was not uniform over all probes, with low GC-content probes found to be more vulnerable to batch variation than probes with a higher GC-content. CONCLUSIONS: The magnitude of systematic processing noise in a microarray experiment is variable across probes and experiments; however, it is generally the case that procedures earlier in the sample-preparation workflow are liable to introduce the most noise. Careful experimental design is important to protect against noise, detailed meta-data should always be provided, and diagnostic procedures should be routinely performed prior to downstream analyses for the detection of bias in microarray studies.

    The Consensus Coding Sequence (Ccds) Project: Identifying a Common Protein-Coding Gene Set for the Human and Mouse Genomes

    Effective use of the human and mouse genomes requires reliable identification of genes and their products. Although multiple public resources provide annotation, different methods are used that can result in similar but not identical representation of genes, transcripts, and proteins. The collaborative consensus coding sequence (CCDS) project tracks identical protein annotations on the reference mouse and human genomes with a stable identifier (CCDS ID), and ensures that they are consistently represented on the NCBI, Ensembl, and UCSC Genome Browsers. Importantly, the project coordinates on manually reviewing inconsistent protein annotations between sites, as well as annotations for which new evidence suggests a revision is needed, to progressively converge on a complete protein-coding set for the human and mouse reference genomes, while maintaining a high standard of reliability and biological accuracy. To date, the project has identified 20,159 human and 17,707 mouse consensus coding regions from 17,052 human and 16,893 mouse genes. Three evaluation methods indicate that the entries in the CCDS set are highly likely to represent real proteins, more so than annotations from contributing groups not included in CCDS. The CCDS database thus centralizes the function of identifying well-supported, identically-annotated, protein-coding regions. Funding: National Human Genome Research Institute (U.S.) (Grant number 1U54HG004555-01); Wellcome Trust (London, England) (Grant number WT062023); Wellcome Trust (London, England) (Grant number WT077198)

    Ensembl’s 10th year

    Ensembl (http://www.ensembl.org) integrates genomic information for a comprehensive set of chordate genomes with a particular focus on resources for human, mouse, rat, zebrafish and other high-value sequenced genomes. We provide complete gene annotations for all supported species in addition to specific resources that target genome variation, function and evolution. Ensembl data is accessible in a variety of formats including via our genome browser, API and BioMart. This year marks the tenth anniversary of Ensembl and in that time the project has grown with advances in genome technology. As of release 56 (September 2009), Ensembl supports 51 species including marmoset, pig, zebra finch, lizard, gorilla and wallaby, which were added in the past year. Major additions and improvements to Ensembl since our previous report include the incorporation of the human GRCh37 assembly, enhanced visualisation and data-mining options for the Ensembl regulatory features and continued development of our software infrastructure

    Predicting Hemolytic Uremic Syndrome and Renal Replacement Therapy in Shiga Toxin-producing Escherichia coli-infected Children.

    BACKGROUND: Shiga toxin-producing Escherichia coli (STEC) infections are leading causes of pediatric acute renal failure. Identifying hemolytic uremic syndrome (HUS) risk factors is needed to guide care. METHODS: We conducted a multicenter, historical cohort study to identify features associated with development of HUS (primary outcome) and need for renal replacement therapy (RRT) (secondary outcome) in STEC-infected children without HUS at initial presentation. Children agedeligible. RESULTS: Of 927 STEC-infected children, 41 (4.4%) had HUS at presentation; of the remaining 886, 126 (14.2%) developed HUS. Predictors (all shown as odds ratio [OR] with 95% confidence interval [CI]) of HUS included younger age (0.77 [.69-.85] per year), leukocyte count ≥13.0 × 10³/μL (2.54 [1.42-4.54]), higher hematocrit (1.83 [1.21-2.77] per 5% increase) and serum creatinine (10.82 [1.49-78.69] per 1 mg/dL increase), platelet count <250 × 10³/μL (1.92 [1.02-3.60]), lower serum sodium (1.12 [1.02-1.23] per 1 mmol/L decrease), and intravenous fluid administration initiated ≥4 days following diarrhea onset (2.50 [1.14-5.46]). A longer interval from diarrhea onset to index visit was associated with reduced HUS risk (OR, 0.70 [95% CI, .54-.90]). RRT predictors (all shown as OR [95% CI]) included female sex (2.27 [1.14-4.50]), younger age (0.83 [.74-.92] per year), lower serum sodium (1.15 [1.04-1.27] per mmol/L decrease), higher leukocyte count ≥13.0 × 10³/μL (2.35 [1.17-4.72]) and creatinine (7.75 [1.20-50.16] per 1 mg/dL increase) concentrations, and initial intravenous fluid administration ≥4 days following diarrhea onset (2.71 [1.18-6.21]). CONCLUSIONS: The complex nature of STEC infection renders predicting its course a challenge. Risk factors we identified highlight the importance of avoiding dehydration and performing close clinical and laboratory monitoring.

    A new look at the LTR retrotransposon content of the chicken genome

    BACKGROUND: LTR retrotransposons contribute approximately 10 % of the mammalian genome, but it has been previously reported that there is a deficit of these elements in the chicken relative to both mammals and other birds. A novel LTR retrotransposon classification pipeline, LocaTR, was developed and subsequently utilised to re-examine the chicken LTR retrotransposon annotation, and determine if the proposed chicken deficit is biologically accurate or simply a technical artefact. RESULTS: Using LocaTR, 3.01 % of the chicken galGal4 genome assembly was annotated as LTR retrotransposon-derived elements (nearly double the previous annotation), including 1,073 that were structurally intact. Element distribution is significantly correlated with chromosome size and is non-random within each chromosome. Elements are significantly depleted within coding regions and enriched in gene-sparse areas of the genome. Over 40 % of intact elements are found in clusters, unrelated by age or genera, generally in poorly recombining regions. The transcription of most LTR retrotransposons was suppressed or incomplete, but individual domain and full-length retroviral transcripts were produced in some cases, although mostly with regularly interspersed stop codons in all reading frames. Furthermore, RNAseq data from 23 diverse tissues enabled greater characterisation of the co-opted endogenous retrovirus Ovex1. This gene was shown to be expressed ubiquitously but at variable levels across different tissues. LTR retrotransposon content was found to be very variable across the avian lineage and did not correlate with either genome size or phylogenetic position. However, the extent of previous, species-specific LTR retrotransposon annotation appears to be a confounding factor. CONCLUSIONS: Use of the novel LocaTR pipeline has nearly doubled the annotated LTR retrotransposon content of the chicken genome compared to previous estimates. Further analysis has described element distribution, clustering patterns and degree of expression in a variety of adult tissues, as well as in three embryonic stages. This study also enabled better characterisation of the co-opted gamma retroviral envelope gene Ovex1. Additionally, this work suggests that there is no deficit of LTR retrotransposons within the Galliformes relative to other birds, or to mammalian genomes when scaled for the three-fold difference in genome size.

    Standards recommendations for the Earth BioGenome Project

    A global international initiative, such as the Earth BioGenome Project (EBP), requires both agreement and coordination on standards to ensure that the collective effort generates rapid progress toward its goals. To this end, the EBP initiated five technical standards committees comprising volunteer members from the global genomics scientific community: Sample Collection and Processing, Sequencing and Assembly, Annotation, Analysis, and IT and Informatics. The current versions of the resulting standards documents are available on the EBP website, with the recognition that opportunities, technologies, and challenges may improve or change in the future, requiring flexibility for the EBP to meet its goals. Here, we describe some highlights from the proposed standards, and areas where additional challenges will need to be met

    Gram Negative Wound Infection in Hospitalised Adult Burn Patients-Systematic Review and Metanalysis-

    BACKGROUND: Gram-negative infection is a major determinant of morbidity and survival. Traditional teaching suggests that burn wound infections in different centres are caused by differing sets of causative organisms. This study established whether Gram-negative burn wound isolates associated with clinical wound infection differ between burn centres. METHODS: Studies investigating adult hospitalised patients (2000-2010) were critically appraised and qualified against a levels-of-evidence hierarchy. The contribution of bacterial pathogen type and burn centre to the variance in standardised incidence of Gram-negative burn wound infection was analysed using two-way analysis of variance. PRIMARY FINDINGS: Pseudomonas aeruginosa, Klebsiella pneumoniae, Acinetobacter baumannii, Enterobacter spp., Proteus spp. and Escherichia coli emerged as the commonest Gram-negative burn wound pathogens. Individual pathogens' incidence did not differ significantly between burn centres (F (4, 20) = 1.1, p = 0.3797; r2 = 9.84). INTERPRETATION: Gram-negative infections predominate in burn surgery. This study is the first to establish that burn wound infections do not differ significantly between burn centres. It is the first study to report the pathogens responsible for the majority of Gram-negative infections in these patients. Whilst burn wound infection is not exclusive to these bacteria, it is hoped that reporting the presence of this group of common Gram-negative "target organisms" will facilitate clinical practice and target research towards a defined clinical demand.