66 research outputs found

    Similarity Queries for Temporal Toxicogenomic Expression Profiles

    We present an approach for answering similarity queries about gene expression time series that is motivated by the task of characterizing the potential toxicity of various chemicals. Our approach involves two key aspects. First, our method employs a novel alignment algorithm based on time warping. Our time warping algorithm has several advantages over previous approaches. It allows the user to impose fairly strong biases on the form that the alignments can take, and it permits a type of local alignment in which the entirety of only one series has to be aligned. Second, our method employs a relaxed spline interpolation to predict expression responses for unmeasured time points, such that the spline does not necessarily exactly fit every observed point. We evaluate our approach using expression time series from the Edge toxicology database. Our experiments show the value of using spline representations for sparse time series. More significantly, they show that our time warping method provides more accurate alignments and classifications than previous standard alignment methods for time series
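
    The "local alignment in which the entirety of only one series has to be aligned" can be illustrated with a generic subsequence variant of dynamic time warping. The sketch below is not the authors' algorithm: the function name `subsequence_dtw`, the absolute-difference cost, and the unconstrained step pattern are all assumptions chosen for brevity. It aligns every point of a query series to some contiguous stretch of a reference series, with a free start and a free end on the reference side.

```python
import math


def subsequence_dtw(query, reference):
    """Lowest cost of aligning ALL of `query` to any contiguous part of `reference`.

    Hypothetical illustration only: absolute difference as the local cost and
    the three classic DTW steps (match, stretch query, stretch reference).
    """
    n, m = len(query), len(reference)
    # cost[i][j]: best cost of aligning query[:i] to a reference segment ending at j
    cost = [[math.inf] * (m + 1) for _ in range(n + 1)]
    # Free start: the alignment may begin at any reference position at no cost.
    for j in range(m + 1):
        cost[0][j] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            local = abs(query[i - 1] - reference[j - 1])
            cost[i][j] = local + min(
                cost[i - 1][j],      # several query points map to one reference point
                cost[i][j - 1],      # one query point maps to several reference points
                cost[i - 1][j - 1],  # one-to-one step
            )
    # Free end: the alignment may stop at any reference position.
    return min(cost[n][1:])
```

    For example, `subsequence_dtw([1, 2, 3], [0, 0, 1, 2, 3, 0])` finds the embedded copy of the query and ignores the flanking reference points. Biases on the alignment form, as mentioned in the abstract, would correspond to restricting or penalizing the three steps; that part is not shown here.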

    EDGE3: A web-based solution for management and analysis of Agilent two color microarray experiments

    Background: The ability to generate transcriptional data on the scale of entire genomes has been a boon both in the improvement of biological understanding and in the amount of data generated. The latter, the amount of data generated, has implications when it comes to effective storage, analysis and sharing of these data. A number of software tools have been developed to store, analyze, and share microarray data. However, a majority of these tools do not offer all of these features nor do they specifically target the commonly used two color Agilent DNA microarray platform. Thus, the motivating factor for the development of EDGE3 was to incorporate the storage, analysis and sharing of microarray data in a manner that would provide a means for research groups to collaborate on Agilent-based microarray experiments without a large investment in software-related expenditures or extensive training of end-users. Results: EDGE3 has been developed with two major functions in mind. The first function is to provide a workflow process for the generation of microarray data by a research laboratory or a microarray facility. The second is to store, analyze, and share microarray data in a manner that doesn't require complicated software. To satisfy the first function, EDGE3 has been developed as a means to establish a well defined experimental workflow and information system for microarray generation. To satisfy the second function, the software application utilized as the user interface of EDGE3 is a web browser. Within the web browser, a user is able to access the entire functionality, including, but not limited to, the ability to perform a number of bioinformatics based analyses, collaborate between research groups through a user-based security model, and access to the raw data files and quality control files generated by the software used to extract the signals from an array image. Conclusion: Here, we present EDGE3, an open-source, web-based application that allows for the storage, analysis, and controlled sharing of transcription-based microarray data generated on the Agilent DNA platform. In addition, EDGE3 provides a means for managing RNA samples and arrays during the hybridization process. EDGE3 is freely available for download at http://edge.oncology.wisc.edu/.

    Adaptation of a bioinformatics microarray analysis workflow for a toxicogenomic study in rainbow trout

    Sex steroids play a key role in triggering sex differentiation in fish, and exogenous hormone treatment can lead to partial or complete sex reversal. This phenomenon has attracted attention since the discovery that even low environmental doses of exogenous steroids can adversely affect gonad morphology (ovotestis development) and induce reproductive failure. Modern genomic-based technologies have enhanced opportunities to uncover mechanisms of action (MOA) and identify biomarkers related to the toxic action of a compound. However, high throughput data interpretation relies on statistical analysis, species genomic resources, and bioinformatics tools. The goals of this study are to improve the knowledge of feminisation in fish, by the analysis of molecular responses in the gonads of rainbow trout fry after chronic exposure to several doses (0.01, 0.1, 1 and 10 μg/L) of ethynylestradiol (EE2) and to offer target genes as potential biomarkers of ovotestis development. We successfully adapted a bioinformatics microarray analysis workflow elaborated on human data to a toxicogenomic study using rainbow trout, a fish species lacking accurate functional annotation and genomic resources. The workflow produced gene lists expected to be enriched in true-positive differentially expressed genes (DEGs), which were then subjected to over-representation analysis (ORA) methods. Several pathways and ontologies, mostly related to cell division and metabolism, sexual reproduction and steroid production, were significantly enriched in our analyses. Moreover, two sets of potential ovotestis biomarkers were selected using several criteria. The first group displayed specific potential biomarkers belonging to pathways/ontologies highlighted in the experiment. Among them, the early ovarian differentiation gene foxl2a was overexpressed. 
The second group, which was highly sensitive but not specific, included the DEGs presenting the highest fold change and lowest p-value of the statistical workflow output. The methodology can be generalized to other (non-model) species and various types of microarray platforms
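
    Over-representation analysis (ORA) is commonly carried out as a one-sided hypergeometric test on the overlap between a DEG list and a pathway gene set. The abstract does not state which test this study used, so the following is only a minimal sketch of that common formulation; the function name `ora_p_value` and its parameter names are hypothetical.

```python
from math import comb


def ora_p_value(universe, in_set, selected, overlap):
    """One-sided hypergeometric p-value P(X >= overlap).

    universe: number of annotated genes on the array (assumed background)
    in_set:   number of those genes annotated to the pathway/ontology term
    selected: number of differentially expressed genes
    overlap:  number of DEGs that are annotated to the term
    """
    upper = min(in_set, selected)
    total = comb(universe, selected)
    # Sum the probability mass of every overlap at least as large as observed.
    tail = sum(
        comb(in_set, k) * comb(universe - in_set, selected - k)
        for k in range(overlap, upper + 1)
    )
    return tail / total
```

    In practice one such p-value is computed per pathway or ontology term and the results are corrected for multiple testing, which this sketch leaves out.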

    Integrating Overlapping Structures and Background Information of Words Significantly Improves Biological Sequence Comparison

    Word-based models have achieved promising results in sequence comparison. However, how to use the overlapping structures and background information of words, both important statistical properties of words in biological sequences, to improve sequence comparison remains an open problem. This paper proposes a new statistical method that integrates the overlapping structures and the background information of the words in biological sequences. To assess the effectiveness of this integration for sequence comparison, two sets of evaluation experiments were conducted on the proposed model. The first, performed via receiver operating characteristic (ROC) curve analysis, applies the proposed method to discrimination between functionally related regulatory sequences and unrelated sequences, intron and exon. The second evaluates the performance of the proposed method with the F-measure for clustering Hepatitis E virus genotypes. The results demonstrate that the proposed method, by integrating the overlapping structures and the background information of words, significantly improves biological sequence comparison and outperforms the existing models

    Technical Variables in High-Throughput miRNA Expression Profiling: Much Work Remains to Be Done

    MicroRNA (miRNA) gene expression profiling has provided important insights into plant and animal biology. However, there has not been ample published work about pitfalls associated with technical parameters in miRNA gene expression profiling. One source of pertinent information about technical variables in gene expression profiling is the separate and more well-established literature regarding mRNA expression profiling. However, many aspects of miRNA biochemistry are unique. For example, the cellular processing and compartmentation of miRNAs, the differential stability of specific miRNAs, and aspects of global miRNA expression regulation require specific consideration. Additional possible sources of systematic bias in miRNA expression studies include the differential impact of pre-analytical variables, substrate specificity of nucleic acid processing enzymes used in labeling and amplification, and issues regarding new miRNA discovery and annotation. We conclude that greater focus on technical parameters is required to bolster the validity, reliability, and cultural credibility of miRNA gene expression profiling studies

    Integrative Systems Approaches Towards Brain Pharmacology and Polypharmacology

    Polypharmacology is considered the future of drug discovery and is emerging as its next paradigm. Traditional drug design is primarily based on a “one target-one drug” paradigm. In polypharmacology, drug molecules interact with multiple targets, which imposes new challenges in designing and developing new, effective drugs that are less toxic because unexpected drug-target interactions are eliminated. Although still in its infancy, the use of polypharmacology ideas appears to already have a remarkable impact on modern drug development. The current thesis is a detailed study of various systems-level pharmacology approaches to understanding polypharmacology in complex brain and neurodegenerative disorders. The research work in this thesis focuses on the design and construction of a dedicated knowledge base for human brain pharmacology. This pharmacology knowledge base, referred to as the Human Brain Pharmacome (HBP), is a unique and comprehensive resource that aggregates data and knowledge around current drug treatments that are available for major brain and neurodegenerative disorders. The HBP knowledge base provides data in a single place for building models and supporting hypotheses. The HBP also incorporates new data obtained from similarity computations over drug and protein structures, which were analyzed from various aspects including network pharmacology and the application of in-silico computational methods for the discovery of novel multi-target drug candidates. Computational tools and machine learning models were developed to characterize protein targets for their polypharmacological profiles and to distinguish indication-specific or target-specific drugs from other drugs. Systems pharmacology approaches towards drug property predictions provided a highly enriched compound library that was virtually screened against an array of network-pharmacology-derived protein targets by combined docking and molecular dynamics simulation workflows. The approaches developed in this work resulted in the identification of novel multi-target drug candidates that are backed up by existing experimental knowledge, and in proposals for repositioning existing drugs, which are undergoing further experimental validation

    Gene Regulatory Network Analysis and Web-based Application Development

    Microarray data is a valuable source for gene regulatory network analysis. Using earthworm microarray data analysis as an example, this dissertation demonstrates that a bioinformatics-guided reverse engineering approach can be applied to analyze time-series data to uncover the underlying molecular mechanism. My network reconstruction results reinforce previous findings that certain neurotransmitter pathways are the target of two chemicals - carbaryl and RDX. This study also concludes that perturbations to these pathways by sublethal concentrations of these two chemicals were temporary, and earthworms were capable of fully recovering. Moreover, differential networks (DNs) analysis indicates that many pathways other than those related to synaptic and neuronal activities were altered during the exposure phase. A novel differential networks (DNs) approach is developed in this dissertation to connect pathway perturbation with toxicity threshold setting from Live Cell Array (LCA) data. Findings from this proof-of-concept study suggest that this DNs approach has great potential to provide a novel and sensitive tool for threshold setting in chemical risk assessment. In addition, a web-based tool “Web-BLOM” was developed for the reconstruction of gene regulatory networks from time-series gene expression profiles including microarray and LCA data. This tool consists of several modular components: a database, the gene network reconstruction model and a user interface. The Bayesian Learning and Optimization Model (BLOM), originally implemented in MATLAB, was adopted by Web-BLOM to provide an online reconstruction of large-scale gene regulation networks. Compared to other network reconstruction models, BLOM can infer larger networks with comparable accuracy, identify hub genes and is much more computationally efficient

    Modelling-based experiment retrieval: A case study with gene expression clustering

    Motivation: Public and private repositories of experimental data are growing to sizes that require dedicated methods for finding relevant data. To improve on the state of the art of keyword searches from annotations, methods for content-based retrieval have been proposed. In the context of gene expression experiments, most methods retrieve gene expression profiles, requiring each experiment to be expressed as a single profile, typically of case vs. control. A more general, recently suggested alternative is to retrieve experiments whose models are good for modelling the query dataset. However, for very noisy and high-dimensional query data, this retrieval criterion turns out to be very noisy as well. Results: We propose doing retrieval using a denoised model of the query dataset, instead of the original noisy dataset itself. To this end, we introduce a general probabilistic framework, where each experiment is modelled separately and the retrieval is done by finding related models. For retrieval of gene expression experiments, we use a probabilistic model called product partition model, which induces a clustering of genes that show similar expression patterns across a number of samples. The suggested metric for retrieval using clusterings is the normalized information distance. Empirical results finally suggest that inference for the full probabilistic model can be approximated with good performance using computationally faster heuristic clustering approaches (e.g. k-means). The method is highly scalable and straightforward to apply to construct a general-purpose gene expression experiment retrieval method. Availability: The method can be implemented using standard clustering algorithms and normalized information distance, available in many statistical software packages. Comment: Updated figures. The final version of this article will appear in Bioinformatics (https://bioinformatics.oxfordjournals.org/
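
    The retrieval metric named above, the normalized information distance between two clusterings, can be sketched in a few lines. This assumes the common definition NID = 1 - I(U;V) / max(H(U), H(V)), computed from cluster label vectors over the same genes; the function names are hypothetical and the abstract does not confirm that the authors use exactly this normalization.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (in nats) of a cluster label vector."""
    n = len(labels)
    return -sum((c / n) * math.log(c / n) for c in Counter(labels).values())


def mutual_information(a, b):
    """Mutual information (in nats) between two label vectors of equal length."""
    n = len(a)
    count_a, count_b, count_ab = Counter(a), Counter(b), Counter(zip(a, b))
    return sum(
        (c / n) * math.log(c * n / (count_a[x] * count_b[y]))
        for (x, y), c in count_ab.items()
    )


def normalized_information_distance(a, b):
    """Assumed definition: 1 - I(a; b) / max(H(a), H(b))."""
    h = max(entropy(a), entropy(b))
    if h == 0:
        # Both clusterings put every gene in one cluster, so they are identical.
        return 0.0
    return 1.0 - mutual_information(a, b) / h
```

    Under this definition, two clusterings that differ only in how the clusters are named are at distance 0, and two statistically independent clusterings are at distance 1. The label vectors could come from any clustering of the genes, including the faster heuristic approaches such as k-means that the abstract mentions.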