16 research outputs found

    Optimal cDNA microarray design using expressed sequence tags for organisms with limited genomic information

    BACKGROUND: Expression microarrays are increasingly used to characterize environmental responses and host-parasite interactions for many different organisms. Probe selection for cDNA microarrays using expressed sequence tags (ESTs) is challenging due to high sequence redundancy and potential cross-hybridization between paralogous genes. In organisms with limited genomic information, like marine organisms, this challenge is even greater due to annotation uncertainty. No general tool is available for cDNA microarray probe selection for these organisms. Therefore, the goal of the design procedure described here is to select a subset of ESTs that will minimize sequence redundancy and characterize potential cross-hybridization while providing functionally representative probes. RESULTS: Sequence similarity between ESTs, quantified by the E-value of pair-wise alignment, was used as a surrogate for expected hybridization between corresponding sequences. Using this value as a measure of dissimilarity, sequence redundancy reduction was performed by hierarchical cluster analyses. The choice of how many microarray probes to retain was made based on an index developed for this research: a sequence diversity index (SDI) within a sequence diversity plot (SDP). This index tracked the decreasing within-cluster sequence diversity as the number of clusters increased. For a given stage in the agglomeration procedure, the EST having the highest similarity to all the other sequences within each cluster, the centroid EST, was selected as a microarray probe. A small dataset of ESTs from Atlantic white shrimp (Litopenaeus setiferus) was used to test this algorithm so that the detailed results could be examined. The functional representativeness of the selected probes was quantified using Gene Ontology (GO) annotations. CONCLUSIONS: For organisms with limited genomic information, combining hierarchical clustering methods to analyze ESTs can yield an optimal cDNA microarray design. If biomarker discovery is the goal of the microarray experiments, the average linkage method is more effective, while single linkage is more suitable if identifying physiological mechanisms is of greater interest. This general design procedure is not limited to designing single-species cDNA microarrays for marine organisms; it can equally be applied to multiple-species microarrays for any organisms with limited genomic information.
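The clustering-and-centroid step described in this abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function names `agglomerate` and `centroid`, the EST labels, and the toy dissimilarity values are all hypothetical, and the sequence diversity index (SDI) is not reproduced because the abstract does not define it. The sketch assumes a precomputed symmetric dissimilarity matrix and supports the two linkage methods the abstract names.

```python
from itertools import combinations


def agglomerate(dist, n_clusters, linkage="average"):
    """Merge the closest pair of clusters until n_clusters remain.

    dist is a symmetric matrix (list of lists) of pairwise dissimilarities.
    """
    clusters = [[i] for i in range(len(dist))]

    def between(a, b):
        pair_d = [dist[i][j] for i in a for j in b]
        if linkage == "single":
            return min(pair_d)  # single linkage: closest pair of members
        return sum(pair_d) / len(pair_d)  # average linkage

    while len(clusters) > n_clusters:
        # Pick the pair of cluster indices (a < b) with the smallest distance.
        a, b = min(
            combinations(range(len(clusters)), 2),
            key=lambda p: between(clusters[p[0]], clusters[p[1]]),
        )
        clusters[a] = clusters[a] + clusters[b]
        del clusters[b]
    return clusters


def centroid(cluster, dist):
    """Return the member with the smallest total dissimilarity to the rest."""
    return min(cluster, key=lambda i: sum(dist[i][j] for j in cluster))


# Toy dissimilarities for five hypothetical ESTs (not real alignment scores).
names = ["est_a", "est_b", "est_c", "est_d", "est_e"]
dist = [
    [0.0, 0.1, 0.2, 0.9, 0.8],
    [0.1, 0.0, 0.1, 0.9, 0.9],
    [0.2, 0.1, 0.0, 0.8, 0.9],
    [0.9, 0.9, 0.8, 0.0, 0.2],
    [0.8, 0.9, 0.9, 0.2, 0.0],
]
clusters = agglomerate(dist, n_clusters=2)
probes = [names[centroid(c, dist)] for c in clusters]
```

With these toy values, the first three ESTs fall into one cluster and the last two into another, and one probe is kept per cluster. Passing `linkage="single"` switches the merge criterion without changing the centroid selection.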

    Marine Genomics: A clearing-house for genomic and transcriptomic data of marine organisms

    BACKGROUND: The Marine Genomics project is a functional genomics initiative developed to provide a pipeline for the curation of Expressed Sequence Tags (ESTs) and gene expression microarray data for marine organisms. It provides a unique clearing-house for marine-specific EST and microarray data and is currently available at . DESCRIPTION: The Marine Genomics pipeline automates the processing, maintenance, storage and analysis of EST and microarray data for an increasing number of marine species. It currently contains 19 species databases (over 46,000 EST sequences) that are maintained by registered users from local and remote locations in Europe and South America in addition to the USA. A collection of analysis tools is implemented. These include a pipeline upload tool for EST FASTA files, sequence trace files and microarray data, an annotative text search, automated sequence trimming, sequence quality control (QA/QC) editing, sequence BLAST capabilities and a tool for interactive submission to GenBank. Another feature of this resource is its integration with a scientific computing analysis environment implemented in MATLAB. CONCLUSION: Bringing together multiple marine organisms with integrated analysis tools enables users to focus on comprehensive descriptions of transcriptomic responses to typical marine stresses. This cross-species data comparison and integration enables users to keep their research within a marine-oriented data management and analysis environment.

    ILOOP – a web application for two-channel microarray interwoven loop design

    Microarray technology is widely applied to address complex scientific questions. However, fundamental issues remain about how to design experiments so that the resulting data enable robust statistical analysis. Interwoven loop design has several advantages over other designs. However, it suffers from design complexity. We have implemented an online web application which allows users to find optimal loop designs for two-color microarray experiments. Given a number of conditions (such as treatments or time points) and replicates, the application will find the best possible design of the experiment and output experimental parameters. It is freely available from
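As a rough illustration of what a loop design looks like, the sketch below builds one common cyclic construction in which condition i shares an array with condition i + s for each chosen step s. It does not reproduce the application's search for an optimal design; the function name `interwoven_loop`, the `steps` parameter, and the choice of steps are assumptions made for this example.

```python
from collections import Counter


def interwoven_loop(n_conditions, steps=(1,)):
    """List (cy3, cy5) condition pairs, one pair per array.

    Each step s contributes one loop in which condition i is paired with
    condition (i + s) mod n_conditions.
    """
    arrays = []
    for s in steps:
        for i in range(n_conditions):
            arrays.append((i, (i + s) % n_conditions))
    return arrays


# Five conditions, two interwoven loops (steps 1 and 2): ten arrays in total.
design = interwoven_loop(5, steps=(1, 2))

# Count how often each condition is labeled with each dye.
cy3_counts = Counter(cy3 for cy3, _ in design)
cy5_counts = Counter(cy5 for _, cy5 in design)
```

In this construction every condition appears once per loop in each dye channel, so the design is dye-balanced by construction. Which steps give the most efficient design for a given number of conditions is the question the application is described as answering.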

    Improvement in the Reproducibility and Accuracy of DNA Microarray Quantification by Optimizing Hybridization Conditions

    BACKGROUND: DNA microarrays, which have been increasingly used to monitor mRNA transcripts at a global level, can provide detailed insight into cellular processes involved in response to drugs and toxins. This is leading to new understandings of signaling networks that operate in the cell, and the molecular basis of diseases. Custom printed oligonucleotide arrays have proven to be an effective way to facilitate the applications of DNA microarray technology. A successful microarray experiment, however, involves many steps: well-designed oligonucleotide probes, printing, RNA extraction and labeling, hybridization, and imaging. Optimization is essential to generate reliable microarray data. RESULTS: Hybridization and washing steps are crucial for a successful microarray experiment. When the hybridization and washing conditions recommended by an oligonucleotide provider were followed, the expression ratios were compressed more than expected, and data analysis revealed a high degree of non-specific binding. A series of experiments was conducted using rat mixed tissue RNA reference material (MTRRM) and other RNA samples to optimize the hybridization and washing conditions. The optimized hybridization and washing conditions greatly reduced the non-specific binding and improved the accuracy of spot intensity measurements. CONCLUSION: The optimized hybridization and washing conditions greatly improved the reproducibility and accuracy of expression ratios. These experiments also suggested the importance of probe designs using better bioinformatics approaches and the need for common reference RNA samples for platform performance evaluation in order to fulfill the potential of DNA microarray technology.

    A multivariate prediction model for microarray cross-hybridization

    BACKGROUND: Expression microarray analysis is one of the most popular molecular diagnostic techniques in the post-genomic era. However, this technique faces the fundamental problem of potential cross-hybridization. This is a pervasive problem for both oligonucleotide and cDNA microarrays; it is considered particularly problematic for the latter. No comprehensive multivariate predictive modeling has been performed to understand how multiple variables contribute to (cross-) hybridization. RESULTS: We propose a systematic search strategy using multiple multivariate models [multiple linear regressions, regression trees, and artificial neural network analyses (ANNs)] to select an effective set of predictors for hybridization. We validate this approach on a set of DNA microarrays with cytochrome P450 family genes. The performance of our multiple multivariate models is compared with that of a recently proposed third-order polynomial regression method that uses percent identity as the sole predictor. All multivariate models agree that the 'most contiguous base pairs between probe and target sequences,' rather than percent identity, is the best univariate predictor. The predictive power is improved by inclusion of additional nonlinear effects, in particular target GC content, when regression trees or ANNs are used. CONCLUSION: A systematic multivariate approach is provided to assess the importance of multiple sequence features for hybridization and of relationships among these features. This approach can easily be applied to larger datasets. This will allow future developments of generalized hybridization models that will be able to correct for false-positive cross-hybridization signals in expression experiments.
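Two of the sequence features named in this abstract, the longest contiguous match between probe and target and the GC content, are simple to compute. The sketch below is an illustration only and not the authors' feature extraction: the function names are hypothetical, and it assumes both sequences are given as uppercase strings on the same strand, so no reverse complement is taken.

```python
def longest_contiguous_match(probe, target):
    """Length of the longest run of consecutive bases shared by both sequences.

    Standard longest-common-substring dynamic programming, keeping only the
    previous row of the table.
    """
    best = 0
    prev = [0] * (len(target) + 1)
    for p in probe:
        curr = [0] * (len(target) + 1)
        for j, t in enumerate(target, start=1):
            if p == t:
                # Extend the run that ended at the previous base of each sequence.
                curr[j] = prev[j - 1] + 1
                best = max(best, curr[j])
        prev = curr
    return best


def gc_content(seq):
    """Fraction of bases in seq that are G or C."""
    return sum(base in "GC" for base in seq) / len(seq)


# Made-up sequences for illustration.
match_len = longest_contiguous_match("ACGTACGT", "TTACGTAA")
target_gc = gc_content("TTACGTAA")
```

Features like these could then be fed to any of the model families the abstract lists; the modeling itself is outside the scope of this sketch.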

    Derivation of species-specific hybridization-like knowledge out of cross-species hybridization results

    BACKGROUND: One of the approaches for conducting genomics research in organisms without extant microarray platforms is to profile their expression patterns by using Cross-Species Hybridization (CSH). Several different studies using spotted microarrays and CSH reached contradictory conclusions about the ability of CSH to reflect biological processes described by species-specific hybridization (SSH). RESULTS: We used a tomato spotted cDNA microarray to examine the ability of CSH to reflect SSH data. Potato RNA was hybridized to spotted cDNA tomato and potato microarrays to generate CSH and SSH data, respectively. Difficulties arose in obtaining transcriptomic data from CSH that reflected those obtained from SSH. Nevertheless, once the data were filtered for those corresponding to matching probe sets, by applying appropriate cutoffs of probe homology, the CSH transcriptome data reflected the SSH data more closely. CONCLUSIONS: This study evaluated the relative performance of CSH compared to SSH, and proposes methods to ensure that CSH closely reflects the biological process analyzed by SSH.

    Decoding heterogeneous big data in an integrative way

    Biotechnologies of the post-genomic era, especially those that generate high-throughput data, bring opportunities and challenges that have never been faced before. One of them is how to decode big heterogeneous data for clues that are useful for answering biological questions. The exponential growth of a variety of data has been accompanied by more and more applications of systematic approaches that investigate biological questions in an integrative way. Systematic approaches inherently require the integration of heterogeneous information, which urgently calls for much more effort. In this thesis, the effort is mainly devoted to the development of methods and tools that help to integrate big heterogeneous information. In Chapter 2, we employed a heuristic strategy to summarize and integrate, in the form of networks, genes that are essential for the determination of mouse retinal cells. These networks, which are supported by experimental evidence, could be rediscovered in the analysis of high-throughput data sets and would thus be useful in leveraging high-throughput data. In Chapter 3, we described EnRICH, a tool that we developed to help qualitatively integrate heterogeneous intra-organism information. We also showed how EnRICH could be applied to construct a composite network from different sources, and demonstrated how we used EnRICH to successfully prioritize retinal disease genes. Following the work of Chapter 3 (intra-organism information integration), in Chapter 4 we moved on to the development of a method and tool for inter-organism information integration. The method we proposed is able to match genes in a one-to-one fashion between any two genomes. In summary, this thesis contributes to the integrative analysis of big heterogeneous data through its work on the integration of intra- and inter-organism information.