66 research outputs found
ChromaSig: A Probabilistic Approach to Finding Common Chromatin Signatures in the Human Genome
Computational methods to identify functional genomic elements using genetic information have been very successful in determining gene structure and in identifying a handful of cis-regulatory elements. But the vast majority of regulatory elements have yet to be discovered, and it has become increasingly apparent that their discovery will not come from using genetic information alone. Recently, high-throughput technologies have enabled the creation of information-rich epigenetic maps, most notably for histone modifications. However, tools that search for functional elements using this epigenetic information have been lacking. Here, we describe an unsupervised learning method called ChromaSig to find, in an unbiased fashion, commonly occurring chromatin signatures in both tiling microarray and sequencing data. Applying this algorithm to nine chromatin marks across a 1% sampling of the human genome in HeLa cells, we recover eight clusters of distinct chromatin signatures, five of which correspond to known patterns associated with transcriptional promoters and enhancers. Interestingly, we observe that the distinct chromatin signatures found at enhancers mark distinct functional classes of enhancers in terms of transcription factor and coactivator binding. In addition, we identify three clusters of novel chromatin signatures that contain evolutionarily conserved sequences and potential cis-regulatory elements. Applying ChromaSig to a panel of 21 chromatin marks mapped genome-wide by ChIP-Seq reveals 16 classes of genomic elements marked by distinct chromatin signatures. Interestingly, four classes containing enrichment for repressive histone modifications appear to be locally heterochromatic sites and are enriched in quickly evolving regions of the genome. The utility of this approach in uncovering novel, functionally significant genomic elements will aid future efforts of genome annotation via chromatin modifications.
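To make the idea of "clustering chromatin signatures" concrete, here is a deliberately minimal sketch. ChromaSig itself is an unsupervised probabilistic method; this toy instead assigns loci to the nearest of two hypothetical prototype signatures (the prototype vectors, mark names, and locus values are all invented for illustration).

```python
import math

# Toy illustration only: each locus is a vector of enrichment values for
# three marks (H3K4me1, H3K4me3, H3K27ac), and we assign it to the nearest
# of two hypothetical prototype signatures by Euclidean distance.
PROTOTYPES = {
    "promoter-like": (0.2, 0.9, 0.8),   # high H3K4me3 (hypothetical values)
    "enhancer-like": (0.9, 0.1, 0.7),   # high H3K4me1, low H3K4me3
}

def assign_signature(profile):
    """Return the prototype label with the smallest Euclidean distance."""
    return min(PROTOTYPES, key=lambda name: math.dist(profile, PROTOTYPES[name]))

loci = {
    "locus_1": (0.15, 0.85, 0.75),
    "locus_2": (0.80, 0.05, 0.60),
}
labels = {name: assign_signature(p) for name, p in loci.items()}
```

A real implementation would learn the signatures from the data rather than fix them in advance, which is precisely what distinguishes ChromaSig's unsupervised approach from this nearest-prototype caricature.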
Molecular and cellular mechanisms underlying the evolution of form and function in the amniote jaw.
The amniote jaw complex is a remarkable amalgamation of derivatives from distinct embryonic cell lineages. During development, the cells in these lineages experience concerted movements, migrations, and signaling interactions that take them from their initial origins to their final destinations and imbue their derivatives with aspects of form including their axial orientation, anatomical identity, size, and shape. Perturbations along the way can produce defects and disease, but also generate the variation necessary for jaw evolution and adaptation. We focus on molecular and cellular mechanisms that regulate form in the amniote jaw complex and that enable structural and functional integration. Special emphasis is placed on the role of cranial neural crest mesenchyme (NCM) during the species-specific patterning of bone, cartilage, tendon, muscle, and other jaw tissues. We also address the effects of biomechanical forces during jaw development and discuss ways in which certain molecular and cellular responses add adaptive and evolutionary plasticity to jaw morphology. Overall, we highlight how variation in molecular and cellular programs can promote the phenomenal diversity and functional morphology achieved during amniote jaw evolution or lead to the range of jaw defects and disease that affect the human condition.
Identification of CD8+ T Cell Epitopes in the West Nile Virus Polyprotein by Reverse-Immunology Using NetCTL
West Nile virus (WNV) is a growing threat to public health, and a greater understanding of the immune response raised against WNV is important for the development of prophylactic and therapeutic strategies. In a reverse-immunology approach, we used bioinformatics methods to predict WNV-specific CD8(+) T cell epitopes and selected a set of peptides that constitutes maximum coverage of 20 fully sequenced WNV strains. We then tested these putative epitopes for cellular reactivity in a cohort of WNV-infected patients. We identified 26 new CD8(+) T cell epitopes, which we propose are restricted by 11 different HLA class I alleles. Aiming for optimal coverage of human populations, we suggest that 11 of these new WNV epitopes would be sufficient to cover from 48% to 93% of ethnic populations in various areas of the world. The 26 identified CD8(+) T cell epitopes contribute to our knowledge of the immune response against WNV infection and greatly extend the list of known WNV CD8(+) T cell epitopes. A polytope incorporating these and other epitopes could possibly serve as the basis for a WNV vaccine.
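Population-coverage figures like the 48%-93% range above are typically computed from HLA allele frequencies. A minimal sketch, assuming Hardy-Weinberg equilibrium and entirely hypothetical allele frequencies (the allele names in the comment are illustrative, not from the study):

```python
# Under Hardy-Weinberg equilibrium, the probability that a person carries at
# least one covered allele at an HLA locus is 1 - (1 - p)^2, where p is the
# summed frequency of the alleles restricting the epitope set.

def locus_coverage(covered_allele_freqs):
    """Fraction of a population carrying at least one covered allele."""
    p = sum(covered_allele_freqs)
    return 1.0 - (1.0 - p) ** 2

# Hypothetical HLA-A allele frequencies (e.g. A*02:01, A*01:01, A*24:02;
# values invented for illustration):
freqs = [0.25, 0.10, 0.05]
coverage = locus_coverage(freqs)   # 1 - (1 - 0.40)^2 = 0.64
```

Real coverage estimates combine several loci and population-specific frequency tables, but the per-locus calculation has this shape.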
Relative impact of key sources of systematic noise in Affymetrix and Illumina gene-expression microarray experiments
Background: Systematic processing noise, which includes batch effects, is very common in microarray experiments but is often ignored despite its potential to confound or compromise experimental results. Compromised results are most likely when re-analysing or integrating datasets from public repositories, because each dataset is generated under different conditions. To better understand the relative noise contributions of various factors in experimental design, we assessed several Illumina and Affymetrix datasets for technical variation between replicate hybridisations of Universal Human Reference RNA (UHRR) and individual or pooled breast-tumour RNA.
Results: A varying degree of systematic noise was observed in each of the datasets; however, in all cases the relative amount of variation between standard control RNA replicates was greatest at earlier points in the sample-preparation workflow. For example, 40.6% of the total variation in reported expression was attributed to replicate extractions, compared with 13.9% due to amplification/labelling and 10.8% between replicate hybridisations. Probe-wise batch-correction methods were effective in reducing the magnitude of this variation, although the level of improvement depended on the sources of noise included in the model. Systematic noise introduced at the chip, run, and experiment levels of a combined Illumina dataset was found to be highly dependent upon the experimental design. Both UHRR and pools of RNA derived from the samples of interest modelled technical variation well, although the pools were significantly better correlated (4% average improvement) and better emulated the effects of systematic noise, over all probes, than the UHRRs. The effect of this noise was not uniform across probes: low GC-content probes were more vulnerable to batch variation than probes with higher GC-content.
Conclusions: The magnitude of systematic processing noise in a microarray experiment varies across probes and experiments; however, procedures earlier in the sample-preparation workflow generally introduce the most noise. Careful experimental design is important to protect against noise, detailed meta-data should always be provided, and diagnostic procedures should be routinely performed prior to downstream analyses to detect bias in microarray studies.
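The simplest probe-wise batch correction in the spirit described above is to mean-centre each probe within each batch. A self-contained sketch with invented data (this stands in for, and is much cruder than, the model-based corrections the study evaluates):

```python
from collections import defaultdict

def batch_center(expressions):
    """expressions: list of (probe, batch, value) tuples.

    Returns the same tuples with each probe's per-batch mean subtracted,
    removing additive probe-by-batch offsets.
    """
    sums = defaultdict(lambda: [0.0, 0])   # (probe, batch) -> [sum, count]
    for probe, batch, value in expressions:
        s = sums[(probe, batch)]
        s[0] += value
        s[1] += 1
    return [
        (probe, batch, value - sums[(probe, batch)][0] / sums[(probe, batch)][1])
        for probe, batch, value in expressions
    ]

# Invented example: one probe measured twice in each of two batches,
# with batch B shifted upward by an additive offset.
data = [("p1", "A", 5.0), ("p1", "A", 7.0), ("p1", "B", 10.0), ("p1", "B", 12.0)]
corrected = batch_center(data)   # the batch offset disappears after centering
```

Methods such as ComBat go further by pooling information across probes and modelling both additive and multiplicative batch effects, which is why the level of improvement depends on what the model includes.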
De novo assembly and characterization of a maternal and developmental transcriptome for the emerging model crustacean Parhyale hawaiensis
Background: Arthropods are the most diverse animal phylum, but their genomic resources are relatively few. While the genome of the branchiopod Daphnia pulex is now available, no other large-scale crustacean genomic resources are available for comparison. In particular, genomic resources are lacking for the most tractable laboratory model of crustacean development, the amphipod Parhyale hawaiensis. Insight into shared and divergent characters of crustacean genomes will facilitate interpretation of future developmental, biomedical, and ecological research using crustacean models.
Results: To generate a transcriptome enriched for maternally provided and zygotically transcribed developmental genes, we created cDNA from ovaries and embryos of P. hawaiensis. Using 454 pyrosequencing, we sequenced over 1.1 billion bases of this cDNA and assembled them de novo to create, to our knowledge, the second largest crustacean genomic resource to date. We found an unusually high proportion of C2H2 zinc finger-containing transcripts, as has also been reported for the genome of the pea aphid Acyrthosiphon pisum. Consistent with previous reports, we detected trans-spliced transcripts but found that they did not noticeably impact transcriptome assembly. Our assembly products yielded 19,067 unique BLAST hits against nr (E-value cutoff e-10). These included over 400 predicted transcripts with significant similarity to D. pulex sequences but not to sequences of any other animal. Annotation of several hundred genes revealed P. hawaiensis homologues of genes involved in development and gametogenesis, as well as a majority of the members of six major conserved metazoan signaling pathways.
Conclusions: The amphipod P. hawaiensis has higher transcript complexity than known insect transcriptomes, and trans-splicing does not appear to be a major contributor to this complexity. We discuss the importance of a reliable comparative genomic framework within which to consider findings from new crustacean models such as D. pulex and P. hawaiensis, as well as the need for development of further substantial crustacean genomic resources.
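Counting "unique BLAST hits under an E-value cutoff", as in the annotation step above, amounts to filtering BLAST's tabular output and collecting distinct query IDs. A sketch assuming the standard tabular format (`-outfmt 6`), where column 1 is the query ID and column 11 the E-value; the rows below are invented:

```python
def unique_hits(blast_lines, cutoff=1e-10):
    """Count distinct query IDs with at least one hit at or below the cutoff.

    Assumes BLAST tabular output (-outfmt 6): 12 tab-separated columns,
    query ID first, E-value in column 11 (index 10).
    """
    queries = set()
    for line in blast_lines:
        fields = line.rstrip("\n").split("\t")
        if float(fields[10]) <= cutoff:
            queries.add(fields[0])
    return len(queries)

# Invented rows: contig1 has two passing hits; contig2's E-value (1e-5)
# fails the 1e-10 cutoff, so only one unique query survives.
rows = [
    "contig1\tsbj1\t98.0\t200\t4\t0\t1\t200\t1\t200\t1e-50\t180",
    "contig1\tsbj2\t90.0\t150\t15\t0\t1\t150\t1\t150\t1e-20\t120",
    "contig2\tsbj3\t80.0\t100\t20\t0\t1\t100\t1\t100\t1e-5\t60",
]
n = unique_hits(rows)
```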
Can Survival Prediction Be Improved By Merging Gene Expression Data Sets?
Background: High-throughput gene expression profiling technologies, which generate a wealth of data, are increasingly used for characterization of tumor biopsies in clinical trials. By applying machine learning algorithms to such clinically documented data sets, one hopes to improve tumor diagnosis and prognosis, as well as prediction of treatment response. However, the limited number of patients enrolled in a single trial limits the power of machine learning approaches due to over-fitting. One could partially overcome this limitation by merging data from different studies. Nevertheless, such data sets differ from each other with regard to technical biases, patient selection criteria, and follow-up treatment. It is therefore not at all clear whether the advantage of increased sample size outweighs the disadvantage of higher heterogeneity of merged data sets. Here, we present a systematic study to answer this question specifically for breast cancer data sets. We use survival prediction based on Cox regression as an assay to measure the added value of merged data sets.
Results: Using time-dependent Receiver Operating Characteristic-Area Under the Curve (ROC-AUC) and hazard ratio as performance measures, overall we see no significant improvement or deterioration of survival prediction with merged data sets as compared to individual data sets. This was apparently because a few genes with strong prognostic power were not available on all microarray platforms and thus were not retained in the merged data sets. Surprisingly, we found that the overall best performance was achieved with a single-gene predictor consisting of CYB5D1.
Conclusions: Merging did not deteriorate performance on average despite (a) the diversity of microarray platforms used, (b) the heterogeneity of patient cohorts, (c) the heterogeneity of breast cancer disease, (d) substantial variation in time to death or relapse, and (e) the reduced number of genes in the merged data sets. Predictors derived from the merged data sets were more robust, consistent, and reproducible across microarray platforms. Moreover, merging data sets from different studies helps to better understand the biases of individual studies and can lead to the identification of strong survival factors like CYB5D1 expression.
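Survival predictors like the ones compared above are often scored with Harrell's concordance index, a close cousin of the time-dependent ROC-AUC the study uses. A pure-Python sketch that, for simplicity, assumes all events are observed (no censoring); the risk scores are invented, not from the study:

```python
from itertools import combinations

def concordance_index(times, risks):
    """Fraction of comparable patient pairs where the higher-risk patient
    experiences the event earlier (ties in risk count as 0.5).

    Simplified: assumes every event time is observed (no censoring).
    """
    concordant = total = 0.0
    for i, j in combinations(range(len(times)), 2):
        if times[i] == times[j]:
            continue                       # not comparable
        total += 1
        earlier, later = (i, j) if times[i] < times[j] else (j, i)
        if risks[earlier] > risks[later]:
            concordant += 1
        elif risks[earlier] == risks[later]:
            concordant += 0.5
    return concordant / total

# Invented example: a hypothetical single-gene predictor whose risk scores
# are perfectly anti-ordered with survival time, giving a perfect index.
c = concordance_index(times=[2.0, 5.0, 8.0, 11.0], risks=[0.9, 0.7, 0.4, 0.1])
```

A value of 0.5 corresponds to random ranking and 1.0 to perfect ranking; handling censored observations requires restricting the comparable pairs, as libraries such as lifelines do.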