Hybridization biases of microarray expression data - A model-based analysis of RNA quality and sequence effects
Modern high-throughput technologies like DNA microarrays are powerful
tools that are widely used in biomedical research. They target a
variety of genomics applications ranging from gene expression
profiling and DNA genotyping to gene regulation studies. However, the
recent discovery of false positives among prominent research findings
indicates a lack of awareness or understanding of the non-biological
factors negatively affecting the accuracy of data produced using these
technologies. The aim of this thesis is to study the origins, effects
and potential correction methods for selected methodical biases in
microarray data.
The two-species Langmuir model serves as the basal physicochemical
model of microarray hybridization describing the fluorescence signal
response of oligonucleotide probes. The so-called hook method allows
estimation of essential model parameters and computation of summary
parameters characterizing a particular microarray sample. We show that
this method can be applied successfully to various types of
microarrays which share the same basic mechanism of multiplexed
nucleic acid hybridization.
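For orientation, a competitive two-species Langmuir isotherm can be
written in the generic form below. This is a textbook-style sketch and
not necessarily the exact parameterization used in the thesis. The
symbols are assumptions introduced here for illustration: I is the probe
intensity, I_max the saturation intensity, O an optical background
offset, and K_S c_S and K_N c_N the binding strengths (binding constant
times concentration) of specifically and non-specifically bound
transcripts.

```latex
% Generic competitive Langmuir isotherm for two binding species (S, N).
% All symbols are illustrative assumptions, not taken from the source.
I \;=\; I_{\max}\,
      \frac{K_S\,c_S + K_N\,c_N}{1 + K_S\,c_S + K_N\,c_N} \;+\; O
```

In this form the signal grows approximately linearly with the total
binding strength when K_S c_S + K_N c_N is much smaller than 1 and
saturates at I_max + O when it is much larger than 1.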
Using appropriate modifications of the model we study RNA quality and
sequence effects using publicly available data from Affymetrix
GeneChip expression arrays. Varying amounts of hybridized RNA result
in systematic changes of raw intensity signals and appropriate
indicator variables computed from these. Varying RNA quality strongly
affects intensity signals of probes which are located at the 3' end of
transcripts. We develop new methods that help assess the RNA
quality of a particular microarray sample. We propose a new metric for
determining RNA quality, the degradation index, which
improves on previous RNA quality metrics. Furthermore, we present a
method for the correction of the 3' intensity bias. These
functionalities have been implemented in the freely available program
package AffyRNADegradation.
We show that microarray probe signals are affected by sequence effects
which are studied systematically using position-dependent
nearest-neighbor models. Analysis of the resulting sensitivity
profiles reveals that specific sequence patterns such as runs of
guanines at the solution end of the probes have a strong impact on the
probe signals. The sequence effects differ for different chip- and
target-types, probe types and hybridization modes. Theoretical and
practical solutions for the correction of the introduced sequence bias
are provided.
Assessment of RNA quality and sequence biases in a representative
ensemble of over 8000 available microarray samples reveals that RNA
quality issues are prevalent: about 10% of the samples have
critically low RNA quality. Sequence effects exhibit considerable
variation within the investigated samples but have limited impact on
the most common patterns in the expression space. Variations in RNA
quality and quantity in contrast have a significant impact on the
obtained expression measurements.
These hybridization biases should be considered and controlled in
every microarray experiment to ensure reliable results. Application of
rigorous quality control and signal correction methods is strongly
advised to avoid erroneous findings. Also, incremental refinement of
physicochemical models is a promising way to improve signal
calibration paralleled with the opportunity to better understand the
fundamental processes in microarray hybridization.
G-stack modulated probe intensities on expression arrays - sequence corrections and signal calibration
Background: The brightness of the probe spots on expression microarrays is intended to measure the abundance of specific mRNA targets. Probes with runs of at least three guanines (G) in their sequence show abnormally high intensities which reflect probe effects rather than target concentrations. This G-bias requires correction prior to downstream expression analysis.
Results: Longer runs of three or more consecutive G along the probe sequence, and in particular triple degenerated G at its solution end ((GGG)_1-effect), are associated with exceptionally large probe intensities on GeneChip expression arrays. This intensity bias is related to non-specific hybridization and affects both perfect match and mismatch probes. The (GGG)_1-effect tends to increase gradually for microarrays of later GeneChip generations. It was found for DNA/RNA as well as for DNA/DNA probe/target-hybridization chemistries. Amplification of sample RNA using T7-primers is associated with strong positive amplitudes of the G-bias, whereas alternative amplification protocols using random primers give rise to much smaller and partly even negative amplitudes.
We applied positional dependent sensitivity models to analyze the specifics of probe intensities in the context of all possible short sequence motifs of one to four adjacent nucleotides along the 25meric probe sequence. Most of the longer motifs are adequately described using a nearest-neighbor (NN) model. In contrast, runs of degenerated guanines require explicit consideration of next nearest neighbors (GGG terms). Preprocessing methods such as vsn, RMA, dChip, MAS5 and gcRMA remove the G-bias from the data only insufficiently.
Conclusions: Positional and motif dependent sensitivity models account for sequence effects of oligonucleotide probe intensities. We propose a positional dependent NN+GGG hybrid model to correct the intensity bias associated with probes containing poly-G motifs. It is implemented as a single-chip based calibration algorithm for GeneChips which can be applied in a pre-correction step prior to standard preprocessing.
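The motif criterion described in this abstract (runs of at least three consecutive G, with special attention to a GGG triple at the solution end) can be illustrated with a minimal Python sketch. It only flags affected probes and does not implement the NN+GGG correction model. The function names `g_runs` and `has_ggg1` are hypothetical, and the sketch assumes that sequence position 1 of the probe string corresponds to the solution end.

```python
import re

# Runs of three or more consecutive guanines.
G_RUN = re.compile(r"G{3,}")


def g_runs(probe: str) -> list[tuple[int, int]]:
    """Return (start, length) for every run of >= 3 consecutive G.

    Start positions are 1-based, counted from the first character of the
    probe string (assumed here to be the solution end).
    """
    seq = probe.upper()
    return [(m.start() + 1, m.end() - m.start()) for m in G_RUN.finditer(seq)]


def has_ggg1(probe: str) -> bool:
    """True if the probe starts with GGG, i.e. a run at position 1."""
    return probe.upper().startswith("GGG")


if __name__ == "__main__":
    # Made-up 25-mer sequences for illustration only.
    probes = [
        "GGGACGTACGTACGTACGTACGTAC",
        "ACGTACGGGGTACGTACGTACGTAC",
        "ACGTACGTACGTACGTACGTACGTA",
    ]
    for p in probes:
        print(p, g_runs(p), has_ggg1(p))
```

Probes flagged in this way would be candidates for a sequence-specific correction before standard preprocessing; how the correction itself is computed is not covered by this sketch.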
DARIO: a ncRNA detection and analysis tool for next-generation sequencing experiments
Small non-coding RNAs (ncRNAs) such as microRNAs, snoRNAs and tRNAs are a diverse collection of molecules with several important biological functions. Current methods for high-throughput sequencing offer, for the first time, the opportunity to investigate the entire ncRNAome in an essentially unbiased way. However, there is a substantial need for methods that allow convenient analysis of these overwhelmingly large data sets. Here, we present DARIO, a free web service for studying short read data from small RNA-seq experiments. It provides a wide range of analysis features, including quality control, read normalization, ncRNA quantification and prediction of putative ncRNA candidates. The DARIO web site can be accessed at http://dario.bioinf.uni-leipzig.de/
[Avian cytogenetics goes functional] Third report on chicken genes and chromosomes 2015
High-density gridded libraries of large-insert clones using bacterial artificial chromosome (BAC) and other vectors are essential tools for genetic and genomic research in chicken and other avian species... Taken together, these studies demonstrate that applications of large-insert clones and BAC libraries derived from birds are, and will continue to be, effective tools to aid high-throughput and state-of-the-art genomic efforts and the important biological insight that arises from them
The spotted gar genome illuminates vertebrate evolution and facilitates human-teleost comparisons
To connect human biology to fish biomedical models, we sequenced the genome of spotted gar (Lepisosteus oculatus), whose lineage diverged from teleosts before teleost genome duplication (TGD). The slowly evolving gar genome has conserved in content and size many entire chromosomes from bony vertebrate ancestors. Gar bridges teleosts to tetrapods by illuminating the evolution of immunity, mineralization and development (mediated, for example, by Hox, ParaHox and microRNA genes). Numerous conserved noncoding elements (CNEs; often cis regulatory) undetectable in direct human-teleost comparisons become apparent using gar: functional studies uncovered conserved roles for such cryptic CNEs, facilitating annotation of sequences identified in human genome-wide association studies. Transcriptomic analyses showed that the sums of expression domains and expression levels for duplicated teleost genes often approximate the patterns and levels of expression for gar genes, consistent with subfunctionalization. The gar genome provides a resource for understanding evolution after genome duplication, the origin of vertebrate genomes and the function of human regulatory sequences
Introducing evolutionary biologists to the analysis of big data: guidelines to organize extended bioinformatics training courses
Research in evolutionary biology has been progressively influenced by big data such as massive genome and transcriptome sequencing data, scalar measurements of several phenotypes on tens to thousands of individuals, as well as worldwide environmental data collected at an increasingly detailed scale. The handling and analysis of such data require computational skills that usually exceed the abilities of most traditionally trained evolutionary biologists. Here we discuss the advantages, challenges and considerations for organizing and running bioinformatics training courses of 2–3 weeks in length to introduce evolutionary biologists to the computational analysis of big data. Extended courses have the advantage of offering trainees the opportunity to learn a more comprehensive set of complementary topics and skills and allowing for more time to practice newly acquired competences. Many organizational aspects are common to any course, such as the need to define precise learning objectives and the selection of appropriate and highly motivated instructors and trainees, among others. However, other features assume particular importance in extended bioinformatics training courses. To successfully implement a learning-by-doing philosophy, sufficient and enthusiastic teaching assistants (TAs) are necessary to offer prompt help to trainees. Further, a good balance between theoretical background and practice time needs to be provided, and the schedule should include enough flexibility for extra review sessions or further discussions if desired. A final project enables trainees to apply their newly learned skills to real data or case studies of their interest. To promote a friendly atmosphere throughout the course and to build a close-knit community after the course, time should be allowed for some scientific discussions and social activities. In addition, to avoid exhausting trainees and TAs, some leisure time needs to be organized.
Finally, all organization should be done while keeping the budget within fair limits. In order to create a sustainable course that constantly improves and adapts to the trainees' needs, gathering short- and long-term feedback after the end of the course is important. Based on our experience we have collected a set of recommendations to effectively organize and run extended bioinformatics training courses for evolutionary biologists, which we share here with the community. They offer a complementary way for the practical teaching of modern evolutionary biology and reaching out to the biological community.
Variation of RNA Quality and Quantity Are Major Sources of Batch Effects in Microarray Expression Data
The great utility of microarrays for genome-scale expression analysis is challenged by the widespread presence of batch effects, which bias expression measurements in particular within large data sets. These unwanted technical artifacts can obscure biological variation and thus significantly reduce the reliability of the analysis results. It is largely unknown which are the predominant technical sources leading to batch effects. We here quantitatively assess the prevalence and impact of several known technical effects on microarray expression results. Particularly, we focus on important factors such as RNA degradation, RNA quantity, and sequence biases including multiple guanine effects. We find that the common variation of RNA quality and RNA quantity can not only yield low-quality expression results, but that both factors also correlate with batch effects and biological characteristics of the samples