79 research outputs found

    cWINNOWER Algorithm for Finding Fuzzy DNA Motifs

    The cWINNOWER algorithm detects fuzzy motifs in DNA sequences rich in protein-binding signals. A signal is defined as any short nucleotide pattern having up to d mutations differing from a motif of length l. The algorithm finds such motifs if multiple mutated copies of the motif (i.e., the signals) are present in the DNA sequence in sufficient abundance. The cWINNOWER algorithm substantially improves the sensitivity of the winnower method of Pevzner and Sze by imposing a consensus constraint, enabling it to detect much weaker signals. We studied the minimum number of detectable motifs qc as a function of sequence length N for random sequences. We found that qc increases linearly with N for a fast version of the algorithm based on counting three-member sub-cliques. Imposing consensus constraints reduces qc, by a factor of three in this case, which makes the algorithm dramatically more sensitive. Our most sensitive algorithm, which counts four-member sub-cliques, needs a minimum of only 13 signals to detect motifs in a sequence of length N = 12000 for (l,d) = (15,4).
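
    The signal definition above (a window within d mismatches of a motif of length l) can be illustrated with a minimal sketch. This is not the cWINNOWER algorithm, which searches for an unknown motif; here the motif is assumed to be known, and the function names, toy sequence, and toy motif are hypothetical.

```python
from typing import List


def hamming(a: str, b: str) -> int:
    """Number of positions at which two equal-length strings differ."""
    if len(a) != len(b):
        raise ValueError("strings must have equal length")
    return sum(x != y for x, y in zip(a, b))


def find_signals(sequence: str, motif: str, d: int) -> List[int]:
    """Start positions of windows that differ from the motif in at most d places."""
    l = len(motif)
    return [
        i
        for i in range(len(sequence) - l + 1)
        if hamming(sequence[i:i + l], motif) <= d
    ]


# Toy example with (l, d) = (5, 1); sequence and motif are made up.
positions = find_signals("ACGTACCTTACGAA", "ACGTA", 1)
print(positions)  # [0, 9]
```

    The window at position 0 is an exact copy of the motif and the window at position 9 (ACGAA) carries one mutation, so both count as signals under d = 1.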

    BM-BC: A Bayesian Method of Base Calling for Solexa Sequence Data

    Base calling is a critical step in the Solexa next-generation sequencing procedure. It compares the position-specific intensity measurements that reflect the signal strength of four possible bases (A, C, G, T) at each genomic position, and outputs estimates of the true sequences for short reads of DNA or RNA. We present a Bayesian method of base calling, BM-BC, for Solexa-GA sequencing data. The Bayesian method builds on a hierarchical model that accounts for three sources of noise in the data, which are known to affect the accuracy of the base calls: fading, phasing, and cross-talk between channels. We show that the new method improves the precision of base calling compared with currently leading methods. Furthermore, the proposed method provides a probability score that measures the confidence of each base call. This probability score can be used to estimate the false discovery rate of the base calling or to rank the precision of the estimated DNA sequences, which in turn can be useful for downstream analysis such as sequence alignment. Funding: NIH/NCI R01 CA132897, K25 CA123344; FONDECYT 1100010; Institute for Computational Engineering and Sciences (ICES)
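
    One common way to turn per-call probability scores into a false discovery rate estimate is to average the error probabilities (1 - p) over the accepted calls. The sketch below assumes that approach; it is not taken from BM-BC itself, and the function name, threshold, and scores are hypothetical.

```python
from typing import Sequence


def estimated_fdr(probabilities: Sequence[float], threshold: float) -> float:
    """Mean error probability (1 - p) over calls with p >= threshold.

    Returns 0.0 when no call passes the threshold.
    """
    accepted = [p for p in probabilities if p >= threshold]
    if not accepted:
        return 0.0
    return sum(1.0 - p for p in accepted) / len(accepted)


# Made-up confidence scores for four base calls.
scores = [0.99, 0.95, 0.80, 0.60]
fdr = estimated_fdr(scores, threshold=0.9)  # about 0.03
```

    Raising the threshold keeps fewer calls but lowers the estimated rate, which is the trade-off a downstream aligner would tune.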

    fREDUCE: Detection of degenerate regulatory elements using correlation with expression

    Background: The precision of transcriptional regulation is made possible by the specificity of physical interactions between transcription factors and their cognate binding sites on DNA. A major challenge is to decipher transcription factor binding sites from sequence and functional genomic data using computational means. While current methods can detect strong binding sites, they are less sensitive to degenerate motifs. Results: We present fREDUCE, a computational method specialized for the detection of weak or degenerate binding motifs from gene expression or ChIP-chip data. fREDUCE is built upon the widely applied program REDUCE, which elicits motifs by global statistical correlation of motif counts with expression data. fREDUCE introduces several algorithmic refinements that allow efficient exhaustive searches of oligonucleotides with a specified number of degenerate IUPAC symbols. On yeast ChIP-chip benchmarks, fREDUCE correctly identified motifs and their degeneracies with accuracies greater than its predecessor REDUCE as well as other known motif-finding programs. We have also used fREDUCE to make novel motif predictions for transcription factors with poorly characterized binding sites. Conclusion: We demonstrate that fREDUCE is a valuable tool for the prediction of degenerate transcription factor binding sites, especially from array datasets with weak signals that may elude other motif detection methods.
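
    The idea of correlating counts of a degenerate motif with expression can be sketched as follows. This is an illustration only, not the fREDUCE or REDUCE implementation: it scores a single given IUPAC pattern with a plain Pearson correlation, and the sequences, expression values, and function names are hypothetical.

```python
import math
from typing import Sequence

# Standard IUPAC nucleotide codes mapped to the bases they allow.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT", "K": "GT", "M": "AC",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG", "N": "ACGT",
}


def count_matches(sequence: str, pattern: str) -> int:
    """Count windows of the sequence (overlaps allowed) that match the pattern."""
    k = len(pattern)
    return sum(
        all(base in IUPAC[sym] for base, sym in zip(sequence[i:i + k], pattern))
        for i in range(len(sequence) - k + 1)
    )


def pearson(xs: Sequence[float], ys: Sequence[float]) -> float:
    """Pearson correlation coefficient of two equal-length samples."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)


# Made-up promoter sequences and expression values, one per gene.
promoters = ["TTCACGTGAA", "CACGTTCACGTG", "AAAAAAAAAA", "GGCACGTAGG"]
expression = [1.0, 2.1, 0.1, -0.2]

# K is the IUPAC symbol for G or T, so CACGTK covers CACGTG and CACGTT.
counts = [count_matches(seq, "CACGTK") for seq in promoters]
score = pearson(counts, expression)
print(counts)  # [1, 2, 0, 0]
```

    An exhaustive search would repeat this scoring over every candidate pattern with the allowed number of degenerate symbols, which is where the efficiency refinements mentioned in the abstract matter.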

    Charge and spin dynamics of the Hubbard chains

    We calculate the local correlation functions of charge and spin for the one-chain and two-chain Hubbard model using the density matrix renormalization group method and the recursion technique. Keeping only a finite number of states, we get good accuracy for the low-energy excitations. We study the charge and spin gaps, bandwidths and weights of the spectra for various values of the on-site Coulomb interaction U and the electron filling. In the low-energy part, the local correlation functions are different for the charge and spin. The bandwidths are proportional to t for the charge and J for the spin, respectively. Comment: 19 LaTeX pages, 9 figures

    Phase diagram of the two-chain Hubbard model

    We have calculated the charge gap and spin gap for the two-chain Hubbard model as a function of the on-site Coulomb interaction and the interchain hopping amplitude. We used the density matrix renormalization group method and developed a method to calculate separately the gaps numerically for the symmetric and antisymmetric modes with respect to the exchange of the chain indices. We have found very different behaviors for the weak and strong interaction cases. Our calculated phase diagram is compared to the one obtained by Balents and Fisher using the weak coupling renormalization group technique. Comment: 4 pages, 6 figures, to appear in PR

    Nonparametric Bayesian Bi-Clustering for Next Generation Sequencing Count Data

    Histone modifications (HMs) play important roles in transcription through post-translational modifications. Combinations of HMs, known as chromatin signatures, encode specific messages for gene regulation. We therefore expect that inference on possible clustering of HMs and an annotation of genomic locations on the basis of such clustering can contribute new insights about the functions of regulatory elements and their relationships to combinations of HMs. We propose a nonparametric Bayesian local clustering Poisson model (NoB-LCP) to facilitate posterior inference on two-dimensional clustering of HMs and genomic locations. The NoB-LCP clusters HMs into HM sets and lets each HM set define its own clustering of genomic locations. Furthermore, it probabilistically excludes HMs and genomic locations that are irrelevant to clustering. By doing so, the proposed model effectively identifies important sets of HMs and groups regulatory elements with similar functionality based on HM patterns. Funding: NIH R01 CA132897; NCI 5 K25 CA123344; Mathematic

    Identification of SOX9 Interaction Sites in the Genome of Chondrocytes

    Our previous work has provided strong evidence that the transcription factor SOX9 is absolutely required for chondrogenic differentiation and cartilage formation, acting as a "master switch" in this differentiation. Heterozygous mutations in SOX9 cause campomelic dysplasia, a severe skeletal dysmorphology syndrome in humans characterized by a generalized hypoplasia of endochondral bones. To obtain insights into the logic used by SOX9 to control a network of target genes in chondrocytes, we performed a ChIP-on-chip experiment using SOX9 antibodies. The ChIP DNA was hybridized to a microarray, which covered 80 genes, many of which are involved in chondrocyte differentiation. Hybridization peaks were detected in a series of cartilage extracellular matrix (ECM) genes including Col2a1, Col11a2, Aggrecan and Cdrap as well as in genes for specific transcription factors and signaling molecules. Our results also showed SOX9 interaction sites in genes that code for proteins that enhance the transcriptional activity of SOX9. Interestingly, a strong SOX9 signal was also observed in genes such as Col1a1 and Osx, whose expression is strongly downregulated in chondrocytes but is high in osteoblasts. In the Col2a1 gene, in addition to an interaction site on a previously identified enhancer in intron 1, another strong interaction site was seen in intron 6. This site is free of nucleosomes specifically in chondrocytes, suggesting an important role for this site in the regulation of Col2a1 transcription by SOX9. Our results provide a broad understanding of the strategies used by a "master" transcription factor of differentiation in control of the genetic program of chondrocytes.

    Accuracy of RNA-Seq and its Dependence on Sequencing Depth

    The cost of DNA sequencing has undergone a dramatic reduction in the past decade. As a result, sequencing technologies have been increasingly applied to genomic research. RNA-Seq is becoming a common technique for surveying gene expression based on DNA sequencing. As it is not clear how increased sequencing capacity has affected measurement accuracy of mRNA, we sought to investigate that relationship. Result: We empirically evaluate the accuracy of repeated gene expression measurements using RNA-Seq. We identify library preparation steps prior to DNA sequencing as the main source of error in this process. Studying three datasets, we show that the accuracy indeed improves with the sequencing depth. However, the rate of improvement as a function of sequence reads is generally slower than predicted by the binomial distribution. We therefore used the beta-binomial distribution to model the overdispersion. The overdispersion parameters we introduced depend explicitly on the number of reads so that the resulting statistical uncertainty is consistent with the empirical data that measurement accuracy increases with the sequencing depth. The overdispersion parameters were determined by maximizing the likelihood. We showed that our modified beta-binomial model had a lower false discovery rate than the binomial or the pure beta-binomial models. Conclusion: We proposed a novel form of overdispersion guaranteeing that the accuracy improves with sequencing depth. We demonstrated that the new form provides a better fit to the data. Funding: NIH/NCI 5K25CA123344; Keck Center for Quantitative Biomedical Sciences of the Gulf Coast Consortia from the Cancer Prevention and Research Institute of Texas (CPRIT) RP101489; Center for Computational Biology and Bioinformatic
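
    The contrast between the binomial and the pure beta-binomial models can be made concrete with their textbook variance formulas for an estimated proportion X/n. The sketch below uses only those standard formulas; it does not reproduce the read-dependent overdispersion proposed in the paper, and the parameter values and function names are hypothetical.

```python
def binomial_prop_var(p: float, n: int) -> float:
    """Variance of X/n when X ~ Binomial(n, p)."""
    return p * (1.0 - p) / n


def beta_binomial_prop_var(alpha: float, beta: float, n: int) -> float:
    """Variance of X/n when X ~ BetaBinomial(n, alpha, beta)."""
    s = alpha + beta
    p = alpha / s  # mean proportion
    return p * (1.0 - p) / n * (s + n) / (s + 1.0)


# Made-up parameters: mean proportion 0.02, fixed alpha and beta.
alpha, beta = 2.0, 98.0
for n in (10**3, 10**5, 10**7):
    print(n, binomial_prop_var(0.02, n), beta_binomial_prop_var(alpha, beta, n))
```

    With alpha and beta held fixed, the beta-binomial variance levels off near p(1 - p)/(alpha + beta + 1) as n grows, while the binomial variance keeps shrinking as 1/n. This is consistent with the abstract's motivation for letting the overdispersion parameters depend on the number of reads.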