30 research outputs found

    Optimized mixed Markov models for motif identification

    BACKGROUND: Identifying functional elements, such as transcription factor binding sites, is a fundamental step in reconstructing gene regulatory networks and remains a challenging issue, largely due to the limited availability of training samples. RESULTS: We introduce a novel and flexible model, the Optimized Mixture Markov model (OMiMa), and related methods to allow adjustment of model complexity for different motifs. In comparison with other leading methods, OMiMa can incorporate more than NNSplice's pairwise dependencies; OMiMa avoids model over-fitting better than the Permuted Variable Length Markov Model (PVLMM); and OMiMa requires smaller training samples than the Maximum Entropy Model (MEM). Testing on both simulated and actual data (regulatory cis-elements and splice sites), we found OMiMa's performance superior to the other leading methods in terms of prediction accuracy, required size of training data or computational time. Our OMiMa system, to our knowledge, is the only motif finding tool that incorporates automatic selection of the best model. OMiMa is freely available at [1]. CONCLUSION: Our optimized mixture of Markov models represents an alternative to the existing methods for modeling dependent structures within a biological motif. Our model is conceptually simple and effective, and can improve prediction accuracy and/or computational speed over other leading methods.

    Automatic Annotation of Spatial Expression Patterns via Sparse Bayesian Factor Models

    Advances in reporters for gene expression have made it possible to document and quantify expression patterns in 2D–4D. In contrast to microarrays, which provide data for many genes but averaged and/or at low resolution, images reveal the high spatial dynamics of gene expression. Developing computational methods to compare, annotate, and model gene expression based on images is imperative, considering that available data are rapidly increasing. We have developed a sparse Bayesian factor analysis model in which the observed expression diversity among a large set of high-dimensional images is modeled by a small number of hidden common factors. We apply this approach to embryonic expression patterns from a Drosophila RNA in situ image database, and show that the automatically inferred factors provide for a meaningful decomposition and represent common co-regulation or biological functions. The low-dimensional set of factor mixing weights is further used as features by a classifier to annotate expression patterns with functional categories. On human-curated annotations, our sparse approach reaches similar or better classification of expression patterns at different developmental stages, when compared to other automatic image annotation methods using thousands of hard-to-interpret features. Our study therefore outlines a general framework for large microscopy data sets, in which both the generative model itself, as well as its application for analysis tasks such as automated annotation, can provide insight into biological questions.

    Nanomechanical properties of α-synuclein amyloid fibrils: a comparative study by nanoindentation, harmonic force microscopy, and Peakforce QNM

    We report on the use of three different atomic force spectroscopy modalities to determine the nanomechanical properties of amyloid fibrils of the human α-synuclein protein. α-Synuclein forms fibrillar nanostructures of approximately 10 nm diameter and lengths ranging from 100 nm to several microns, which have been associated with Parkinson's disease. Atomic force microscopy (AFM) has been used to image the morphology of these protein fibrils deposited on a flat surface. For nanomechanical measurements, we used single-point nanoindentation, in which the AFM tip as the indenter is moved vertically to the fibril surface and back while the force is being recorded. We also used two recently developed AFM surface property mapping techniques: Harmonic force microscopy (HarmoniX) and Peakforce QNM. These modalities allow extraction of mechanical parameters of the surface with a lateral resolution and speed comparable to tapping-mode AFM imaging. Based on this phenomenological study, the elastic moduli of the α-synuclein fibrils determined using these three different modalities are within the range 1.3-2.1 GPa. We discuss the relative merits of these three methods for the determination of the elastic properties of protein fibrils, particularly considering the differences and difficulties of each method

    Non-Coding RNA Prediction and Verification in Saccharomyces cerevisiae

    Non-coding RNA (ncRNA) play an important and varied role in cellular function. A significant amount of research has been devoted to computational prediction of these genes from genomic sequence, but the ability to do so has remained elusive due to a lack of apparent genomic features. In this work, thermodynamic stability of ncRNA structural elements, as summarized in a Z-score, is used to predict ncRNA in the yeast Saccharomyces cerevisiae. This analysis was coupled with comparative genomics to search for ncRNA genes on chromosome six of S. cerevisiae and S. bayanus. Sets of positive and negative control genes were evaluated to determine the efficacy of thermodynamic stability for discriminating ncRNA from background sequence. The effect of window sizes and step sizes on the sensitivity of ncRNA identification was also explored. Non-coding RNA gene candidates, common to both S. cerevisiae and S. bayanus, were verified using northern blot analysis, rapid amplification of cDNA ends (RACE), and publicly available cDNA library data. Four ncRNA transcripts are well supported by experimental data (RUF10, RUF11, RUF12, RUF13), while one additional putative ncRNA transcript is well supported but the data are not entirely conclusive. Six candidates appear to be structural elements in 5′ or 3′ untranslated regions of annotated protein-coding genes. This work shows that thermodynamic stability, coupled with comparative genomics, can be used to predict ncRNA with significant structural elements.

    Abnormal Dosage Compensation of Reporter Genes Driven by the Drosophila Glass Multiple Reporter (GMR) Enhancer-Promoter

    In Drosophila melanogaster the male-specific lethal (MSL) complex is required for upregulation of expression of most X-linked genes in males, thereby achieving X chromosome dosage compensation. The MSL complex is highly enriched across most active X-linked genes with a bias towards the 3′ end. Previous studies have shown that gene transcription facilitates MSL complex binding but the type of promoter did not appear to be important. We have made the surprising observation that genes driven by the glass multiple reporter (GMR) enhancer-promoter are not dosage compensated at X-linked sites. The GMR promoter is active in all cells in, and posterior to, the morphogenetic furrow of the developing eye disc. Using phiC31 integrase-mediated targeted integration, we measured expression of lacZ reporter genes driven by either the GMR or armadillo (arm) promoters at each of three X-linked sites. At all sites, the arm-lacZ reporter gene was dosage compensated but GMR-lacZ was not. We have investigated why GMR-driven genes are not dosage compensated. Earlier or constitutive expression of GMR-lacZ did not affect the level of compensation. Neither did proximity to a strong MSL binding site. However, replacement of the hsp70 minimal promoter with a minimal promoter from the X-linked 6-Phosphogluconate dehydrogenase gene did restore partial dosage compensation. Similarly, insertion of binding sites for the GAGA and DREF factors upstream of the GMR promoter led to significantly higher lacZ expression in males than females. GAGA and DREF have been implicated in dosage compensation. We conclude that the gene promoter can affect MSL complex-mediated upregulation and dosage compensation. Further, it appears that the nature of the basal promoter and the presence of binding sites for specific factors influence the ability of a gene promoter to respond to the MSL complex.

    Elevational Patterns of Species Richness, Range and Body Size for Spiny Frogs

    Quantifying spatial patterns of species richness is a core problem in biodiversity theory. Spiny frogs of the subfamily Painae (Anura: Dicroglossidae) are widespread, but endemic to Asia. Using spiny frog distribution and body size data, and a digital elevation model data set, we explored altitudinal patterns of spiny frog richness and quantified the effect of area on the richness pattern over a large altitudinal gradient from 0–5000 m a.s.l. We also tested two hypotheses: (i) Rapoport's altitudinal effect is valid for the Painae, and (ii) Bergmann's clines are present in spiny frogs. The species richness of Painae across four different altitudinal band widths (100 m, 200 m, 300 m and 400 m) showed hump-shaped patterns along the altitudinal gradient in every case. The altitudinal changes in species richness of the Paini and Quasipaini tribes further confirmed this finding, while the peak of Quasipaini species richness occurred at lower elevations than the maximum of Paini. Area did not explain a significant amount of variation in total or Paini species richness, but it did explain variation in Quasipaini. Five distinct groups across the altitudinal gradient were found. Species altitudinal ranges did not expand with an increase in the midpoints of altitudinal ranges. A significant negative correlation between body size and elevation was exhibited. Our findings demonstrate that Rapoport's altitudinal rule is not a compulsory attribute of spiny frogs and also suggest that Bergmann's rule is not generally applicable to amphibians. The study highlights a need to explore the underlying mechanisms of species richness patterns, particularly for amphibians in macroecology.

    Integrating Diverse Datasets Improves Developmental Enhancer Prediction

    Gene-regulatory enhancers have been identified using various approaches, including evolutionary conservation, regulatory protein binding, chromatin modifications, and DNA sequence motifs. To integrate these different approaches, we developed EnhancerFinder, a two-step method for distinguishing developmental enhancers from the genomic background and then predicting their tissue specificity. EnhancerFinder uses a multiple kernel learning approach to integrate DNA sequence motifs, evolutionary patterns, and diverse functional genomics datasets from a variety of cell types. In contrast with prediction approaches that define enhancers based on histone marks or p300 sites from a single cell line, we trained EnhancerFinder on hundreds of experimentally verified human developmental enhancers from the VISTA Enhancer Browser. We comprehensively evaluated EnhancerFinder using cross validation and found that our integrative method improves the identification of enhancers over approaches that consider a single type of data, such as sequence motifs, evolutionary conservation, or the binding of enhancer-associated proteins. We find that VISTA enhancers active in embryonic heart are easier to identify than enhancers active in several other embryonic tissues, likely due to their uniquely high GC content. We applied EnhancerFinder to the entire human genome and predicted 84,301 developmental enhancers and their tissue specificity. These predictions provide specific functional annotations for large amounts of human non-coding DNA, and are significantly enriched near genes with annotated roles in their predicted tissues and lead SNPs from genome-wide association studies. We demonstrate the utility of EnhancerFinder predictions through in vivo validation of novel embryonic gene regulatory enhancers from three developmental transcription factor loci. Our genome-wide developmental enhancer predictions are freely available as a UCSC Genome Browser track, which we hope will enable researchers to further investigate questions in developmental biology. © 2014 Erwin et al.

    Measurement of the cross-section for producing a W boson in association with a single top quark in pp collisions at √s = 13 TeV with ATLAS

    The inclusive cross-section for the associated production of a W boson and top quark is measured using data from proton-proton collisions at √s = 13 TeV. The dataset corresponds to an integrated luminosity of 3.2 fb⁻¹, and was collected in 2015 by the ATLAS detector at the Large Hadron Collider at CERN. Events are selected requiring two opposite sign isolated leptons and at least one jet; they are separated into signal and control regions based on their jet multiplicity and the number of jets that are identified as containing b hadrons. The Wt signal is then separated from the tt̄ background using boosted decision tree discriminants in two regions. The cross-section is extracted by fitting templates to the data distributions, and is measured to be σ_Wt = 94 ± 10 (stat.) +28/−22 (syst.) ± 2 (lumi.) pb. The measured value is in good agreement with the SM prediction of σ_theory = 71.7 ± 1.8 (scale) ± 3.4 (PDF) pb [1].
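    As a rough illustration of what "good agreement" means for the numbers quoted in the abstract above, the sketch below combines the quoted uncertainties in quadrature and expresses the gap between measurement and prediction in units of the combined uncertainty. It assumes the uncertainty sources are independent and approximately Gaussian, which is a simplification and not the procedure used in the ATLAS analysis; the helper name `quadrature` is hypothetical.

```python
import math


def quadrature(*terms):
    """Combine independent uncertainties by adding them in quadrature."""
    return math.sqrt(sum(t * t for t in terms))


# Values in pb, taken from the abstract.
measured = 94.0
meas_stat = 10.0
meas_syst_up, meas_syst_down = 28.0, 22.0
meas_lumi = 2.0

predicted = 71.7
pred_scale, pred_pdf = 1.8, 3.4

# The measurement lies above the prediction, so the downward
# uncertainty of the measurement is the relevant one here.
meas_down = quadrature(meas_stat, meas_syst_down, meas_lumi)
pred_unc = quadrature(pred_scale, pred_pdf)

# Difference in units of the combined uncertainty (assumption:
# Gaussian, uncorrelated errors).
pull = (measured - predicted) / quadrature(meas_down, pred_unc)

print(f"measurement downward uncertainty: {meas_down:.1f} pb")
print(f"prediction uncertainty: {pred_unc:.1f} pb")
print(f"difference in combined standard deviations: {pull:.2f}")
```

    Under these assumptions the measured value sits within about one combined standard deviation of the prediction, which is consistent with the abstract's description.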