
    GenomeGraphs: integrated genomic data visualization with R.

    Background: Biological studies involve a growing number of distinct high-throughput experiments to characterize samples of interest. There is a lack of methods to visualize these different genomic datasets in a versatile manner. In addition, genomic data analysis requires integrated visualization of experimental data along with constantly changing genomic annotation and statistical analyses. Results: We developed GenomeGraphs, an add-on software package for the statistical programming environment R, to facilitate integrated visualization of genomic datasets. GenomeGraphs uses the biomaRt package to perform on-line annotation queries to Ensembl and translates these to gene/transcript structures in viewports of the grid graphics package. This allows genomic annotation to be plotted together with experimental data. GenomeGraphs can also be used to plot custom annotation tracks in combination with different experimental data types together in one plot using the same genomic coordinate system. Conclusion: GenomeGraphs is a flexible and extensible software package which can be used to visualize a multitude of genomic datasets within the statistical programming environment R.

    Applying weighted network measures to microarray distance matrices

    In recent work we presented a new approach to the analysis of weighted networks, by providing a straightforward generalization of any network measure defined on unweighted networks. This approach is based on the translation of a weighted network into an ensemble of edges, and is particularly suited to the analysis of fully connected weighted networks. Here we apply our method to several such networks including distance matrices, and show that the clustering coefficient, constructed by using the ensemble approach, provides meaningful insights into the systems studied. In the particular case of two data sets from microarray experiments the clustering coefficient identifies a number of biologically significant genes, outperforming existing identification approaches. Comment: Accepted for publication in J. Phys.
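The ensemble idea in this abstract can be illustrated with a minimal sketch. It assumes, as an illustration only, that each weight has already been rescaled to the interval [0, 1] and is read as an independent edge probability; the function name `ensemble_clustering` and the exact formula are assumptions for this example and are not taken from the paper, whose own definition may differ.

```python
def ensemble_clustering(p):
    """Clustering coefficient per node for a symmetric matrix of
    edge probabilities p[i][j] in [0, 1] (an assumed input convention).

    For node i, the numerator sums the probability that a triangle
    i-j-k exists, and the denominator sums the probability that the
    open triple j-i-k exists, over ordered pairs j != k.
    """
    n = len(p)
    result = []
    for i in range(n):
        triangles = 0.0
        triples = 0.0
        for j in range(n):
            if j == i:
                continue
            for k in range(n):
                if k == i or k == j:
                    continue
                triangles += p[i][j] * p[j][k] * p[k][i]
                triples += p[i][j] * p[i][k]
        # Nodes with no possible triple get a coefficient of zero.
        result.append(triangles / triples if triples > 0 else 0.0)
    return result


# A fully connected three-node network with every edge probability 0.5.
uniform = [[0.0, 0.5, 0.5],
           [0.5, 0.0, 0.5],
           [0.5, 0.5, 0.0]]
coefficients = ensemble_clustering(uniform)
```

With 0/1 entries the function reduces to the usual unweighted clustering coefficient, which is the sense in which this kind of construction generalizes an unweighted measure.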

    Non-equilibrium dynamics of gene expression and the Jarzynski equality

    In order to express specific genes at the right time, the transcription of genes is regulated by the presence and absence of transcription factor molecules. With transcription factor concentrations undergoing constant changes, gene transcription takes place out of equilibrium. In this paper we discuss a simple mapping between dynamic models of gene expression and stochastic systems driven out of equilibrium. Using this mapping, results of nonequilibrium statistical mechanics such as the Jarzynski equality and the fluctuation theorem are demonstrated for gene expression dynamics. Applications of this approach include the determination of regulatory interactions between genes from experimental gene expression data.
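For reference, the Jarzynski equality named in this abstract has the following standard form in statistical mechanics. The symbols are the conventional ones and are not taken from the paper: W is the work done along a single trajectory of the driven system, ΔF is the free-energy difference between the initial and final equilibrium states, β = 1/(k_B T), and the average runs over repeated realizations of the driving protocol. How the paper identifies these quantities with gene expression variables is not stated in the abstract.

```latex
\left\langle e^{-\beta W} \right\rangle = e^{-\beta \Delta F},
\qquad \beta = \frac{1}{k_B T}
```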

    Integrating biological knowledge into variable selection : an empirical Bayes approach with an application in cancer biology

    Background: An important question in the analysis of biochemical data is that of identifying subsets of molecular variables that may jointly influence a biological response. Statistical variable selection methods have been widely used for this purpose. In many settings, it may be important to incorporate ancillary biological information concerning the variables of interest. Pathway and network maps are one example of a source of such information. However, although ancillary information is increasingly available, it is not always clear how it should be used nor how it should be weighted in relation to primary data. Results: We put forward an approach in which biological knowledge is incorporated using informative prior distributions over variable subsets, with prior information selected and weighted in an automated, objective manner using an empirical Bayes formulation. We employ continuous, linear models with interaction terms and exploit biochemically-motivated sparsity constraints to permit exact inference. We show an example of priors for pathway- and network-based information and illustrate our proposed method on both synthetic response data and by an application to cancer drug response data. Comparisons are also made to alternative Bayesian and frequentist penalised-likelihood methods for incorporating network-based information. Conclusions: The empirical Bayes method proposed here can aid prior elicitation for Bayesian variable selection studies and help to guard against mis-specification of priors. Empirical Bayes, together with the proposed pathway-based priors, results in an approach with a competitive variable selection performance. In addition, the overall procedure is fast, deterministic, and has very few user-set parameters, yet is capable of capturing interplay between molecular players. The approach presented is general and readily applicable in any setting with multiple sources of biological prior knowledge.

    Modelling time course gene expression data with finite mixtures of linear additive models

    Summary: A model class of finite mixtures of linear additive models is presented. The component-specific parameters in the regression models are estimated using regularized likelihood methods. The advantages of the regularization are that (i) the pre-specified maximum degrees of freedom for the splines is less crucial than for unregularized estimation and that (ii) for each component individually a suitable degree of freedom is selected in an automatic way. The performance is evaluated in a simulation study with artificial data as well as on a yeast cell cycle dataset of gene expression levels over time.

    A phylogeographic and population genetic analysis of a widespread, sedentary North American bird: The Hairy Woodpecker (Picoides villosus)

    The Hairy Woodpecker (Picoides villosus) has one of the broadest breeding distributions of any North American bird and is also one of the most morphologically variable, with as many as 21 described subspecies. This wide distribution and high degree of phenotypic diversity suggest the presence of underlying genetic structure. We used ND2 sequence from 296 individuals from 89 localities throughout the Hairy Woodpecker distribution to address this question and to explore this species’ evolutionary history. Phylogenetic analyses identified three main Hairy Woodpecker clades, each ~1.5% divergent from one another. One clade comprised birds from boreal and eastern zones of North America (N&E); the second, birds from western and southwestern North America (S&W); and the third included only birds from a disjunct population in Costa Rica and Panama. Population genetic analyses and climatic niche models indicated that the N&E and S&W clades have very different recent evolutionary histories. Populations in the N&E are characterized by a lack of genetic structure and a genetic signature of recent population expansion. In contrast, S&W populations are highly structured and relative population stability was inferred. The S&W clade is further structured into three additional geographically and genetically isolated groups: Pacific Coast ranges, interior ranges, and southern Mexico. The continental scale patterns of genetic variation observed suggest that the complex topography of the montane west has probably been more important than latitude in generating phylogenetic diversity within this species.

    The XBabelPhish MAGE-ML and XML Translator

    Background: MAGE-ML has been promoted as a standard format for describing microarray experiments and the data they produce. Two characteristics of the MAGE-ML format compromise its use as a universal standard: First, MAGE-ML files are exceptionally large – too large to be easily read by most people, and often too large to be read by most software programs. Second, the MAGE-ML standard permits many ways of representing the same information. As a result, different producers of MAGE-ML create different documents describing the same experiment and its data. Recognizing all the variants is an unwieldy software engineering task, resulting in software packages that can read and process MAGE-ML from some, but not all producers. This Tower of MAGE-ML Babel bars the unencumbered exchange of microarray experiment descriptions couched in MAGE-ML. Results: We have developed XBabelPhish – an XQuery-based technology for translating one MAGE-ML variant into another. XBabelPhish's use is not restricted to translating MAGE-ML documents. It can transform XML files independent of their DTD, XML schema, or semantic content. Moreover, it is designed to work on very large (> 200 Mb.) files, which are common in the world of MAGE-ML. Conclusion: XBabelPhish provides a way to inter-translate MAGE-ML variants for improved interchange of microarray experiment information. More generally, it can be used to transform most XML files, including very large ones that exceed the capacity of most XML tools.

    Dynamics of gene expression and the regulatory inference problem

    From the response to external stimuli to cell division and death, the dynamics of living cells is based on the expression of specific genes at specific times. The decision when to express a gene is implemented by the binding and unbinding of transcription factor molecules to regulatory DNA. Here, we construct stochastic models of gene expression dynamics and test them on experimental time-series data of messenger-RNA concentrations. The models are used to infer biophysical parameters of gene transcription, including the statistics of transcription factor-DNA binding and the target genes controlled by a given transcription factor. Comment: revised version to appear in Europhys. Lett., new titl