221 research outputs found

    pGQL: A probabilistic graphical query language for gene expression time courses

    Background: Timeboxes are graphical user interface widgets that were proposed to specify queries on time course data. As queries can be very easily defined, an exploratory analysis of time course data is greatly facilitated. While timeboxes are effective, they have no provisions for dealing with noisy data or data with fluctuations along the time axis, which are very common in many applications. In particular, this is true for the analysis of gene expression time courses, which are mostly derived from noisy microarray measurements at few unevenly sampled time points. From a data mining point of view the robust handling of data through a sound statistical model is of great importance. Results: We propose probabilistic timeboxes, which correspond to a specific class of Hidden Markov Models (HMMs), an established method in data mining. Since HMMs are a particular class of probabilistic graphical models we call our method Probabilistic Graphical Query Language. It is implemented in the free software package pGQL. We evaluate its effectiveness in exploratory analysis on a yeast sporulation data set. Conclusions: We introduce a new approach to define dynamic, statistical queries on time course data. It supports an interactive exploration of reasonably large amounts of data and enables users without expert knowledge to specify fairly complex statistical models with ease. By its statistical nature, our approach is more expressive and more robust with respect to amplitude and frequency fluctuation than the prior, deterministic timeboxes.

    apex: phylogenetics with multiple genes.

    Genetic sequences of multiple genes are becoming increasingly common for a wide range of organisms including viruses, bacteria and eukaryotes. While such data may sometimes be treated as a single locus, in practice, a number of biological and statistical phenomena can lead to phylogenetic incongruence. In such cases, different loci should, at least as a preliminary step, be examined and analysed separately. The R software has become a popular platform for phylogenetics, with several packages implementing distance-based, parsimony and likelihood-based phylogenetic reconstruction, and an even greater number of packages implementing phylogenetic comparative methods. Unfortunately, basic data structures and tools for analysing multiple genes have so far been lacking, thereby limiting potential for investigating phylogenetic incongruence. In this study, we introduce the new R package apex to fill this gap. apex implements new object classes, which extend existing standards for storing DNA and amino acid sequences, and provides a number of convenient tools for handling, visualizing and analysing these data. We present the main features of the package and illustrate its functionalities through the analysis of a simple data set.

    Conformational rearrangements upon start codon recognition in human 48S translation initiation complex

    Selection of the translation start codon is a key step during protein synthesis in human cells. We obtained cryo-EM structures of human 48S initiation complexes and characterized the intermediates of codon recognition by kinetic methods using eIF1A as a reporter. Both approaches capture two distinct ribosome populations formed on an mRNA with a cognate AUG codon in the presence of eIF1, eIF1A, eIF2–GTP–Met-tRNAiMet and eIF3. The ‘open’ 40S subunit conformation differs from the human 48S scanning complex and represents an intermediate preceding the codon recognition step. The ‘closed’ form is similar to reported structures of complexes from yeast and mammals formed upon codon recognition, except for the orientation of eIF1A, which is unique in our structure. Kinetic experiments show how various initiation factors mediate the population distribution of open and closed conformations until 60S subunit docking. Our results provide insights into the timing and structure of human translation initiation intermediates and suggest the differences in the mechanisms of start codon selection between mammals and yeast.

    Semi-supervised learning for the identification of syn-expressed genes from fused microarray and in situ image data

    Background: Gene expression measurements during the development of the fly Drosophila melanogaster are routinely used to find functional modules of temporally co-expressed genes. Complementary large data sets of in situ RNA hybridization images for different stages of the fly embryo elucidate the spatial expression patterns. Results: Using a semi-supervised approach, constrained clustering with mixture models, we can find clusters of genes exhibiting spatio-temporal similarities in expression, or syn-expression. The temporal gene expression measurements are taken as primary data for which pairwise constraints are computed in an automated fashion from raw in situ images without the need for manual annotation. We investigate the influence of these pairwise constraints in the clustering and discuss the biological relevance of our results. Conclusion: Spatial information contributes to a detailed, biologically meaningful analysis of temporal gene expression data. Semi-supervised learning provides a flexible, robust and efficient framework for integrating data sources of differing quality and abundance.

    Fast MCMC sampling for hidden Markov models to determine copy number variations

    Background: Hidden Markov Models (HMMs) are often used for analyzing Comparative Genomic Hybridization (CGH) data to identify chromosomal aberrations or copy number variations by segmenting observation sequences. For efficiency reasons the parameters of an HMM are often estimated with maximum likelihood and a segmentation is obtained with the Viterbi algorithm. This introduces considerable uncertainty in the segmentation, which can be avoided with Bayesian approaches integrating out parameters using Markov Chain Monte Carlo (MCMC) sampling. While the advantages of Bayesian approaches have been clearly demonstrated, the likelihood-based approaches are still preferred in practice for their lower running times; datasets coming from high-density arrays and next generation sequencing amplify these problems. Results: We propose an approximate sampling technique, inspired by compression of discrete sequences in HMM computations and by kd-trees to leverage spatial relations between data points in typical data sets, to speed up the MCMC sampling. Conclusions: We test our approximate sampling method on simulated and biological ArrayCGH datasets and high-density SNP arrays, and demonstrate speed-ups of 10 to 60 and of 90, respectively, while achieving competitive results with the state-of-the-art Bayesian approaches. Availability: An implementation of our method will be made available as part of the open source GHMM library from http://ghmm.org.

    Novel selective β1-adrenoceptor antagonists for concomitant cardiovascular and respiratory disease

    β-Blockers reduce mortality and improve symptoms in people with heart disease. However, current clinically available β-blockers have poor selectivity for the cardiac β1-adrenoceptor (AR) over the lung β2-AR. Unwanted β2-blockade risks causing life-threatening bronchospasm and a reduction in the efficacy of β2-agonist emergency rescue therapy. Thus current life-prolonging β-blockers are contraindicated in people with both heart disease and asthma. Here we describe NDD-713 and NDD-825, novel highly β1-selective neutral antagonists with good pharmaceutical properties that can potentially overcome this limitation. Radioligand binding studies and functional assays using human receptors expressed in CHO cells demonstrate that NDD-713 and NDD-825 have nanomolar β1-AR affinity, greater than 500-fold β1-AR vs β2-AR selectivity and no agonism. Studies in conscious rats demonstrated that they are orally bioavailable and cause pronounced β1-mediated reduction of heart rate while showing no effect on β2-mediated hindquarters vasodilatation. The compounds also have good disposition properties and show no adverse toxicological effects. They potentially offer a truly cardioselective β-blocker therapy for the large number of people with heart and respiratory, or peripheral vascular comorbidities.

    Dynamic metabolomic data analysis: a tutorial review

    In metabolomics, time-resolved, dynamic or temporal data are increasingly collected. The number of methods to analyze such data, however, is very limited, and in most cases the dynamic nature of the data is not even taken into account. This paper reviews current methods in use for analyzing dynamic metabolomic data. Moreover, some methods from other fields of science that may be of use to analyze such dynamic metabolomics data are described in some detail. The methods are put in a general framework after providing a formal definition of what constitutes a ‘dynamic’ method. Some of the methods are illustrated with real-life metabolomics examples.

    Multiconstrained gene clustering based on generalized projections

    Background: Gene clustering for annotating gene functions is one of the fundamental issues in bioinformatics. The best clustering solution is often regularized by multiple constraints such as gene expressions, Gene Ontology (GO) annotations and gene network structures. How to integrate multiple constraints for an optimal clustering solution still remains an unsolved problem. Results: We propose a novel multiconstrained gene clustering (MGC) method within the generalized projection onto convex sets (POCS) framework used widely in image reconstruction. Each constraint is formulated as a corresponding set. The generalized projector iteratively projects the clustering solution onto these sets in order to find a consistent solution included in the intersection set that satisfies all constraints. Compared with previous MGC methods, POCS can integrate multiple constraints of different nature without distorting the original constraints. To evaluate the clustering solution, we also propose a new performance measure referred to as Gene Log Likelihood (GLL) that considers genes having more than one function and hence in more than one cluster. Comparative experimental results show that our POCS-based gene clustering method outperforms current state-of-the-art MGC methods. Conclusions: The POCS-based MGC method can successfully combine multiple constraints of different nature for gene clustering. Also, the proposed GLL is an effective performance measure for the soft clustering solutions.

    Computational archaeology of the Pristionchus pacificus genome reveals evidence of horizontal gene transfers from insects

    Background: The recent sequencing of nematode genomes has laid the basis for comparative genomics approaches to study the impact of horizontal gene transfer (HGT) on the adaptation to new environments and the evolution of parasitism. In the beetle-associated nematode Pristionchus pacificus, HGT events were found to involve cellulase genes of microbial origin and Diapausin genes that are known from beetles, but not from other nematodes. The insect-to-nematode horizontal transfer is of special interest given that P. pacificus shows a tight association with insects. Results: In this study we utilized the observation that horizontally transferred genes often exhibit codon usage patterns more similar to that of the donor than that of the acceptor genome. We introduced GC-normalized relative codon frequencies as a measure to detect characteristic features of P. pacificus orphan genes that show no homology to other nematode genes. We found that atypical codon usage is particularly prevalent in P. pacificus orphans. By comparing codon usage profiles of 71 species, we detected the most significant enrichment in insect-like codon usage profiles. In cross-species comparisons, we identified 509 HGT candidates that show a significantly higher similarity to insect-like profiles than genes with nematode homologs. The most abundant gene family among these genes are non-LTR retrotransposons. Speculating that retrotransposons might have served as carriers of foreign genetic material, we found a significant local clustering tendency of orphan genes in the vicinity of retrotransposons. Conclusions: Our study combined codon usage bias, phylogenetic analysis, and genomic colocalization into a general picture of the computational archaeology of the P. pacificus genome and suggests that a substantial fraction of the gene repertoire is of insect origin. We propose that the Pristionchus-beetle association has facilitated HGT and discuss potential vectors of these events.

    A biclustering algorithm based on a Bicluster Enumeration Tree: application to DNA microarray data

    Background: In a number of domains, such as DNA microarray data analysis, we need to simultaneously cluster rows (genes) and columns (conditions) of a data matrix to identify groups of rows coherent with groups of columns. This kind of clustering is called biclustering. Biclustering algorithms are extensively used in DNA microarray data analysis. More effective biclustering algorithms are highly desirable and needed. Methods: We introduce BiMine, a new enumeration algorithm for biclustering of DNA microarray data. The proposed algorithm is based on three original features. First, BiMine relies on a new evaluation function called Average Spearman's rho (ASR). Second, BiMine uses a new tree structure, called Bicluster Enumeration Tree (BET), to represent the different biclusters discovered during the enumeration process. Third, to avoid the combinatorial explosion of the search tree, BiMine introduces a parametric rule that allows the enumeration process to cut tree branches that cannot lead to good biclusters. Results: The performance of the proposed algorithm is assessed using both synthetic and real DNA microarray data. The experimental results show that BiMine competes well with several other biclustering methods. Moreover, we test the biological significance using a gene annotation web-tool to show that our proposed method is able to produce biologically relevant biclusters. The software is available upon request from the authors to academic users.