551 research outputs found

    Doubly stochastic continuous-time hidden Markov approach for analyzing genome tiling arrays

    Microarrays have been developed that tile the entire nonrepetitive genomes of many different organisms, allowing for the unbiased mapping of active transcription regions or protein binding sites across the entire genome. These tiling array experiments produce massive correlated data sets that have many experimental artifacts, presenting many challenges to researchers that require innovative analysis methods and efficient computational algorithms. This paper presents a doubly stochastic latent variable analysis method for transcript discovery and protein binding region localization using tiling array data. This model is unique in that it considers actual genomic distance between probes. Additionally, the model is designed to be robust to cross-hybridized and nonresponsive probes, which can often lead to false-positive results in microarray experiments. We apply our model to a transcript finding data set to illustrate the consistency of our method. Additionally, we apply our method to a spike-in experiment that can be used as a benchmark data set for researchers interested in developing and comparing future tiling array methods. The results indicate that our method is very powerful, accurate and can be used on a single sample and without control experiments, thus defraying some of the overhead cost of conducting experiments on tiling arrays. Comment: Published in the Annals of Applied Statistics (http://www.imstat.org/aoas/) by the Institute of Mathematical Statistics (http://www.imstat.org) at http://dx.doi.org/10.1214/09-AOAS248.

    Getting Started in Tiling Microarray Analysis

    Genome-wide in silico identification and analysis of cis natural antisense transcripts (cis-NATs) in ten species

    We developed a fast, integrative pipeline to identify cis natural antisense transcripts (cis-NATs) at genome scale. The pipeline mapped mRNAs and ESTs in UniGene to genome sequences in GoldenPath to find overlapping transcripts and combined information from coding sequence, poly(A) signal, poly(A) tail and splicing sites to deduce transcription orientation. We identified cis-NATs in 10 eukaryotic species, including 7830 candidate sense–antisense (SA) genes in 3915 SA pairs in human. The abundance of SA genes is remarkably low in worm and does not seem to be caused by the prevalence of operons. Hundreds of SA pairs are conserved across different species, even maintaining the same overlapping patterns. The convergent SA class is prevalent in fly, worm and sea squirt, but not in human or mouse as reported previously. The percentage of SA genes among imprinted genes in human and mouse is 24–47%, a range between the two previous reports. There is a significant shortage of SA genes on Chromosome X in human and mouse but not in fly or worm, supporting X-inactivation in mammals as a possible cause. SA genes are over-represented in the catalytic activities and basic metabolism functions. All candidate cis-NATs can be downloaded from
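
    The overlap step described in this abstract can be illustrated with a minimal sketch. It is not the authors' pipeline: it assumes each transcript's strand is already known (the paper deduces orientation from coding sequence, poly(A) and splice-site evidence), and the `Transcript` class, `overlap_length`, `find_sa_pairs` and the coordinates below are hypothetical names and values chosen for illustration.

```python
from dataclasses import dataclass
from itertools import combinations


@dataclass(frozen=True)
class Transcript:
    # Hypothetical record; coordinates are 0-based, half-open [start, end).
    name: str
    chrom: str
    start: int
    end: int
    strand: str  # "+" or "-"


def overlap_length(a, b):
    """Number of bases shared by two transcripts (0 if disjoint or on different chromosomes)."""
    if a.chrom != b.chrom:
        return 0
    return max(0, min(a.end, b.end) - max(a.start, b.start))


def find_sa_pairs(transcripts, min_overlap=1):
    """Return (plus_name, minus_name) pairs that lie on opposite strands and overlap."""
    pairs = []
    for a, b in combinations(transcripts, 2):
        if a.strand == b.strand:
            continue  # same orientation, so not a sense-antisense candidate
        if overlap_length(a, b) >= min_overlap:
            plus, minus = (a, b) if a.strand == "+" else (b, a)
            pairs.append((plus.name, minus.name))
    return pairs


# Illustrative data only; these are not transcripts from the study.
example = [
    Transcript("t1", "chr1", 100, 500, "+"),
    Transcript("t2", "chr1", 400, 900, "-"),
    Transcript("t3", "chr1", 1000, 1200, "-"),
    Transcript("t4", "chr2", 100, 500, "-"),
]
print(find_sa_pairs(example))
```

    The all-pairs comparison is quadratic in the number of transcripts; a genome-scale implementation would more plausibly sort by coordinate or use an interval index.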

    Computational Discovery of Gene Regulatory Binding Motifs: A Bayesian Perspective

    The Bayesian approach together with Markov chain Monte Carlo techniques has provided an attractive solution to many important bioinformatics problems such as multiple sequence alignment, microarray analysis and the discovery of gene regulatory binding motifs. The employment of such methods and, more broadly, explicit statistical modeling, has revolutionized the field of computational biology. After reviewing several heuristics-based computational methods, this article presents a systematic account of Bayesian formulations and solutions to the motif discovery problem. Generalizations are made to further enhance the Bayesian approach. Motivated by the need for a speedy algorithm, we also provide a perspective on the problem from the viewpoint of optimizing a scoring function. We observe that scoring functions resulting from proper posterior distributions, or approximations to such distributions, show the best performance and can be used to improve upon existing motif-finding programs. Simulation analyses and a real-data example are used to support our observation.

    CEAS: cis-regulatory element annotation system

    The recent availability of high-density human genome tiling arrays enables biologists to conduct ChIP–chip experiments to locate the in vivo-binding sites of transcription factors in the human genome and explore the regulatory mechanisms. Once genomic regions enriched by transcription factor ChIP–chip are located, genome-scale downstream analyses are crucial but difficult for biologists without strong bioinformatics support. We designed and implemented the first web server to streamline the ChIP–chip downstream analyses. Given genome-scale ChIP regions, the cis-regulatory element annotation system (CEAS) retrieves repeat-masked genomic sequences, calculates GC content, plots evolutionary conservation, maps nearby genes and identifies enriched transcription factor-binding motifs. Biologists can utilize CEAS to retrieve useful information for ChIP–chip validation, assemble important knowledge to include in their publication and generate novel hypotheses (e.g. transcription factor cooperative partner) for further study. CEAS helps the adoption of ChIP–chip in mammalian systems and provides insights towards a more comprehensive understanding of transcriptional regulatory mechanisms. The URL of the server is
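
    One of the quantities listed in this abstract, GC content, is simple enough to sketch. The function below is an illustration rather than CEAS code: the name `gc_content` is hypothetical, and it assumes masked or ambiguous bases (such as `N`) should be left out of the denominator, which the abstract does not specify.

```python
def gc_content(sequence):
    """Fraction of G/C among unambiguous bases (A, C, G, T), case-insensitive.

    Assumption: characters other than A, C, G, T (for example N from hard
    masking) are ignored instead of being counted as non-GC.
    """
    seq = sequence.upper()
    gc = seq.count("G") + seq.count("C")
    at = seq.count("A") + seq.count("T")
    total = gc + at
    if total == 0:
        return 0.0  # no unambiguous bases to measure
    return gc / total


print(gc_content("ATGC"))
print(gc_content("GGNNCC"))
```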

    Dynamic Fano Resonance of Quasienergy Excitons in Superlattices

    The dynamic Fano resonance (DFR) between discrete quasienergy excitons and sidebands of their ionization continua is predicted and investigated in dc- and ac-driven semiconductor superlattices. This DFR, well controlled by the ac field, delocalizes the excitons and opens an intrinsic decay channel in nonlinear four-wave mixing signals. Comment: 4 pages, 4 figures

    A systematic approach identifies FOXA1 as a key factor in the loss of epithelial traits during the epithelial-to-mesenchymal transition in lung cancer

    Background: The epithelial-to-mesenchymal transition is an important mechanism in cancer metastasis. Although transcription factors including SNAIL, SLUG, and TWIST1 regulate the epithelial-to-mesenchymal transition, other unknown transcription factors could also be involved. Identification of the full complement of transcription factors is essential for a more complete understanding of gene regulation in this process. Chromatin immunoprecipitation-sequencing (ChIP-Seq) technologies have been used to detect genome-wide binding of transcription factors; here, we developed a systematic approach to integrate existing ChIP-Seq and transcriptome data. We scanned multiple transcription factors to investigate their functional impact on the epithelial-to-mesenchymal transition in the human A549 lung adenocarcinoma cell line. Results: Among the transcription factors tested, impact scores identified the forkhead box protein A1 (FOXA1) as the most significant transcription factor in the epithelial-to-mesenchymal transition. FOXA1 physically associates with the promoters of its predicted target genes. Several critical epithelial-to-mesenchymal transition effectors involved in cellular adhesion and cellular communication were identified in the regulatory network of FOXA1, including FOXA2, FGA, FGB, FGG, and FGL1. The implication of FOXA1 in the epithelial-to-mesenchymal transition via its regulatory network indicates that FOXA1 may play an important role in the initiation of lung cancer metastasis. Conclusions: We identified FOXA1 as a potentially important transcription factor and negative regulator in the initial stages of lung cancer metastasis. FOXA1 may modulate the epithelial-to-mesenchymal transition via its transcriptional regulatory network. Further, this study demonstrates how ChIP-Seq and expression data could be integrated to delineate the impact of transcription factors on a specific biological process.

    Model-based analysis of two-color arrays (MA2C)

    A normalization method based on probe GC content for two-color tiling arrays and an algorithm for detecting peak regions are presented. They are available in a stand-alone Java program.
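
    The idea of normalizing by probe GC content can be illustrated with a deliberately simplified sketch. This is not the MA2C algorithm, whose details are not given in the abstract: it assumes probes are grouped by their G+C count and that each log-ratio is standardized against the mean and standard deviation of its group. The names `gc_count` and `normalize_by_gc`, and the probe values, are hypothetical.

```python
from collections import defaultdict
from statistics import mean, pstdev


def gc_count(probe_seq):
    """Number of G and C bases in a probe sequence (case-insensitive)."""
    seq = probe_seq.upper()
    return seq.count("G") + seq.count("C")


def normalize_by_gc(probes):
    """Standardize log-ratios within groups of probes sharing the same GC count.

    probes: list of (sequence, log_ratio) tuples.
    Returns the standardized log-ratios in input order. Groups with zero
    spread (including single-probe groups) are mapped to 0.0 by assumption.
    """
    bins = defaultdict(list)
    for seq, log_ratio in probes:
        bins[gc_count(seq)].append(log_ratio)

    # Per-group location and scale (population standard deviation).
    stats = {gc: (mean(vals), pstdev(vals)) for gc, vals in bins.items()}

    normalized = []
    for seq, log_ratio in probes:
        mu, sigma = stats[gc_count(seq)]
        normalized.append((log_ratio - mu) / sigma if sigma > 0 else 0.0)
    return normalized


# Illustrative probes only; real tiling-array probes are much longer.
probes = [("GGCC", 1.0), ("CCGG", 3.0), ("ATAT", 0.5), ("TTAA", 0.5)]
print(normalize_by_gc(probes))
```

    A robust variant might replace the mean and standard deviation with the median and a robust scale estimate so that strongly enriched probes do not dominate their group.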

    MAGeCK enables robust identification of essential genes from genome-scale CRISPR/Cas9 knockout screens

    We propose the Model-based Analysis of Genome-wide CRISPR/Cas9 Knockout (MAGeCK) method for prioritizing single-guide RNAs, genes and pathways in genome-scale CRISPR/Cas9 knockout screens. MAGeCK demonstrates better performance compared with existing methods, identifies both positively and negatively selected genes simultaneously, and reports robust results across different experimental conditions. Using public datasets, MAGeCK identified novel essential genes and pathways, including EGFR in vemurafenib-treated A375 cells harboring a BRAF mutation. MAGeCK also detected cell type-specific essential genes, including BCR and ABL1, in KBM7 cells bearing a BCR-ABL fusion, and IGF1R in HL-60 cells, which depends on the insulin signaling pathway for proliferation. Electronic supplementary material: The online version of this article (doi:10.1186/s13059-014-0554-4) contains supplementary material, which is available to authorized users.