19 research outputs found

    The value of position-specific priors in motif discovery using MEME

    Background: Position-specific priors have been shown to be a flexible and elegant way to extend the power of Gibbs sampler-based motif discovery algorithms. Information of many types (including sequence conservation, nucleosome positioning, and negative examples) can be converted into a prior over the location of motif sites, which then guides the sequence motif discovery algorithm. This approach has been shown to confer many of the benefits of conservation-based and discriminative motif discovery approaches on Gibbs sampler-based motif discovery methods, but has not previously been studied with methods based on expectation maximization (EM). Results: We extend the popular EM-based MEME algorithm to utilize position-specific priors and demonstrate their effectiveness for discovering transcription factor (TF) motifs in yeast and mouse DNA sequences. Utilizing a discriminative, conservation-based prior dramatically improves MEME's ability to discover motifs in 156 yeast TF ChIP-chip datasets, more than doubling the number of datasets where it finds the correct motif. On these datasets, MEME using the prior has a higher success rate than eight other conservation-based motif discovery approaches. We also show that the same type of prior improves the accuracy of motifs discovered by MEME in mouse TF ChIP-seq data, and that the motifs tend to be of slightly higher quality than those found by a Gibbs sampling algorithm using the same prior. Conclusions: We conclude that using position-specific priors can substantially increase the power of EM-based motif discovery algorithms such as MEME.
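To illustrate how a prior over site locations can guide an EM-based motif finder, the sketch below computes the E-step site posteriors for a single sequence under a simplified one-site-per-sequence model. It is not MEME's implementation: the function name `site_posteriors`, the dictionary-based motif representation, and the uniform background are assumptions made for this example. The only point shown is that the per-position prior multiplies the motif-versus-background likelihood ratio before normalization.

```python
import math


def site_posteriors(seq, pwm, background, prior):
    """Posterior probability that the single motif site starts at each position.

    seq        -- DNA string
    pwm        -- list of dicts, one per motif column, mapping base -> probability
    background -- dict mapping base -> background probability
    prior      -- position-specific prior, one weight per valid start position
    (All names are hypothetical; this is an illustrative model, not MEME's code.)
    """
    width = len(pwm)
    n_starts = len(seq) - width + 1
    if len(prior) != n_starts:
        raise ValueError("prior must have one entry per valid start position")

    weights = []
    for start in range(n_starts):
        # Log likelihood ratio of the window under the motif vs. the background.
        log_ratio = 0.0
        for offset in range(width):
            base = seq[start + offset]
            log_ratio += math.log(pwm[offset][base] / background[base])
        # The prior replaces the uniform 1/n_starts weight of plain EM.
        weights.append(prior[start] * math.exp(log_ratio))

    total = sum(weights)
    return [w / total for w in weights]
```

With a uniform prior this reduces to the ordinary E-step; a prior that is zero at a position excludes that position regardless of how well the window matches the motif.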

    Methods for identifying regulatory grammars

    Thesis (S.M.)--Massachusetts Institute of Technology, Dept. of Electrical Engineering and Computer Science, 2013. Cataloged from PDF version of thesis. Includes bibliographical references (p. [37]-40). Recent advancements in sequencing technology have made it possible to study the mechanisms of gene regulation, such as protein-DNA binding, at greater resolution and on a greater scale than was previously possible. We present an expectation-maximization learning algorithm that identifies enriched spatial relationships between motifs in sets of DNA sequences. For example, the method will identify spatially constrained motifs colocated in the same regulatory region. We apply our method to biological sequence data and recover previously known prokaryotic promoter spacing constraints, demonstrating that joint learning of motifs and spacing constraints is superior to other methods for this task. By Tahin Fahmid Syed. S.M.

    AUDIT TIME PRESSURE AND DUE PROFESSIONAL CARE ON AUDIT QUALITY

    In every audit engagement, auditors often struggle to manage an audit schedule that is very short relative to the stages the audit process requires. Very tight time pressure can reduce audit quality, and some public accountants still misjudge accounts in the financial statements. This research takes a grounded-theory approach, which emphasizes the researchers' abstract analysis of a phenomenon, in the hope that the analysis can support a theory that explains the phenomenon specifically. The study addresses audit-quality problems in Indonesia in general and in the city of Bandung in particular. The results provide empirical evidence that the higher the audit time pressure, the lower the quality of the audit results, and conversely that lower audit time pressure improves audit quality. The results also provide empirical evidence that the higher the due professional care, the higher the quality of the audit results.

    An Entropy-Based Position Projection Algorithm for Motif Discovery


    Analysis of BAC end sequences in oak, a keystone forest tree species, providing insight into the composition of its genome

    Background: One of the key goals of oak genomics research is to identify genes of adaptive significance. This information may help to improve the conservation of adaptive genetic variation and the management of forests to increase their health and productivity. Deep-coverage large-insert genomic libraries are a crucial tool for attaining this objective. We report herein the construction of a BAC library for Quercus robur, its characterization and an analysis of BAC end sequences. Results: The EcoRI library consisted of 92,160 clones, 7% of which had no insert. Levels of chloroplast and mitochondrial contamination were below 3% and 1%, respectively. Mean clone insert size was estimated at 135 kb. The library represents 12 haploid genome equivalents, and the likelihood of finding a particular oak sequence of interest is greater than 99%. Genome coverage was confirmed by PCR screening of the library with 60 unique genetic loci sampled from the genetic linkage map. In total, about 20,000 high-quality BAC end sequences (BESs) were generated by sequencing 15,000 clones. Roughly 5.88% of the combined BAC end sequence length corresponded to known retroelements, while ab initio repeat detection methods identified 41 additional repeats. Collectively, characterized and novel repeats account for roughly 8.94% of the genome. Further analysis of the BESs revealed 1,823 putative genes, suggesting at least 29,340 genes in the oak genome. BESs were aligned with the genome sequences of Arabidopsis thaliana, Vitis vinifera and Populus trichocarpa. One putative collinear microsyntenic region encoding an alcohol acyl transferase protein was observed between oak and chromosome 2 of V. vinifera. Conclusions: This BAC library provides a new resource for genomic studies, including SSR marker development, physical mapping, comparative genomics and genome sequencing. BES analysis provided insight into the structure of the oak genome. These sequences will be used in the assembly of a future genome sequence for oak.
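The claim that 12 haploid genome equivalents give a better than 99% chance of finding a given sequence can be checked with the standard library-representation formula, P = 1 - (1 - f)^N, where N is the number of clones and f is the fraction of the genome carried by one clone. The abstract does not say which formula the authors used, so applying it here is an assumption, as is treating the 12x figure as coverage by the insert-bearing clones only. The function name is hypothetical.

```python
def recovery_probability(n_clones: int, coverage: float) -> float:
    """Chance that a given sequence is present in at least one clone.

    coverage is the number of haploid genome equivalents in the library,
    so each clone carries a fraction f = coverage / n_clones of the genome.
    """
    fraction_per_clone = coverage / n_clones
    return 1.0 - (1.0 - fraction_per_clone) ** n_clones


# Figures from the abstract: 92,160 clones, 7% without insert, 12x coverage.
clones_with_insert = round(92_160 * (1 - 0.07))
probability = recovery_probability(clones_with_insert, coverage=12.0)
print(f"P(sequence present) = {probability:.6f}")
```

Under these assumptions the result comfortably exceeds the 99% stated in the abstract. For large N the formula approaches 1 - e^(-coverage), so the probability depends almost entirely on coverage and not on the clone count itself.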

    A Genome-Wide Survey of Switchgrass Genome Structure and Organization

    The perennial grass switchgrass (Panicum virgatum L.) is a promising bioenergy crop and the target of whole genome sequencing. We constructed two bacterial artificial chromosome (BAC) libraries from the AP13 clone of switchgrass to gain insight into the genome structure and organization, initiate functional and comparative genomic studies, and assist with genome assembly. Together representing 16 haploid genome equivalents of switchgrass, each library comprises 101,376 clones with average insert sizes of 144 kb (HindIII-generated) and 110 kb (BstYI-generated). A total of 330,297 high-quality BAC-end sequences (BES) were generated, accounting for 263.2 Mbp (16.4%) of the switchgrass genome. Analysis of the BES identified 279,099 known repetitive elements, >50,000 SSRs, and 2,528 novel repeat elements, named switchgrass repetitive elements (SREs). Comparative mapping of 47 full-length BAC sequences and 330K BES revealed high levels of synteny with the grass genomes of sorghum, rice, maize, and Brachypodium. Our data indicate that the sorghum genome has retained larger microsyntenous regions with switchgrass, in addition to high gene-order conservation with rice. The resources generated in this effort will be useful for a broad range of applications.
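The figures in this abstract can be cross-checked against each other with a few lines of arithmetic. The sketch below derives the genome size implied by "263.2 Mbp (16.4%)" and tests whether the two libraries then add up to roughly 16 haploid genome equivalents. The genome size is inferred here and not stated in the abstract, and the calculation assumes every clone carries an insert of the average size; variable names are illustrative.

```python
# Values quoted in the abstract.
bes_total_mbp = 263.2          # total BAC-end sequence length
bes_genome_fraction = 0.164    # share of the genome that length represents
n_bes = 330_297                # number of BAC-end sequences
clones_per_library = 101_376
average_insert_kb = (144, 110)  # HindIII and BstYI libraries

# Genome size implied by the BES figures (an inference, not a stated value).
implied_genome_mbp = bes_total_mbp / bes_genome_fraction

# Total cloned DNA across both libraries, converted from kb to Mbp.
library_mbp = sum(clones_per_library * kb for kb in average_insert_kb) / 1000

# Coverage in haploid genome equivalents, and mean BES read length in bp.
coverage = library_mbp / implied_genome_mbp
mean_bes_bp = bes_total_mbp * 1e6 / n_bes

print(f"implied genome size: {implied_genome_mbp:.0f} Mbp")
print(f"library coverage:    {coverage:.1f}x")
print(f"mean BES length:     {mean_bes_bp:.0f} bp")
```

The computed coverage lands close to the 16 genome equivalents reported, so the clone counts, insert sizes, and BES totals are mutually consistent under these assumptions.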

    GRISOTTO: A greedy approach to improve combinatorial algorithms for motif discovery with prior knowledge

    Background: Position-specific priors (PSPs) have been used with success to boost EM and Gibbs sampler-based motif discovery algorithms. PSP information has been computed from different sources, including orthologous conservation, DNA duplex stability, and nucleosome positioning. Prior information has not yet been used in the context of combinatorial algorithms. Moreover, priors have been used only independently, and the gain of combining priors from different sources has not yet been studied. Results: We extend RISOTTO, a combinatorial algorithm for motif discovery, by post-processing its output with a greedy procedure that uses prior information. PSPs from different sources are combined into a scoring criterion that guides the greedy search procedure. The resulting method, called GRISOTTO, was evaluated over 156 yeast TF ChIP-chip sequence-sets commonly used to benchmark prior-based motif discovery algorithms. Results show that GRISOTTO is at least as accurate as twelve other state-of-the-art approaches for the same task, even without combining priors. Furthermore, by considering combined priors, GRISOTTO is considerably more accurate than the state-of-the-art approaches for the same task. We also show that PSPs improve GRISOTTO's ability to retrieve motifs from mouse ChIP-seq data, indicating that the proposed algorithm can be applied to data from a different technology and for a higher eukaryote. Conclusions: The conclusions of this work are twofold. First, post-processing the output of combinatorial algorithms by incorporating prior information leads to a very efficient and effective motif discovery method. Second, combining priors from different sources is even more beneficial than considering them separately.

    Endothelial cell-glucocorticoid receptor interactions and regulation of Wnt signaling

    Vascular inflammation is present in many cardiovascular diseases, and exogenous glucocorticoids have traditionally been used as a therapy to suppress inflammation. However, recent data have shown that endogenous glucocorticoids, acting through the endothelial glucocorticoid receptor, act as negative regulators of inflammation. Here, we performed ChIP for the glucocorticoid receptor, followed by next-generation sequencing in mouse endothelial cells to investigate how the endothelial glucocorticoid receptor regulates vascular inflammation. We identified a role of the Wnt signaling pathway in this setting and show that loss of the endothelial glucocorticoid receptor results in upregulation of Wnt signaling both in vitro and in vivo using our validated mouse model. Furthermore, we demonstrate glucocorticoid receptor regulation of a key gene in the Wnt pathway, Frzb, via a glucocorticoid response element gleaned from our genomic data. These results suggest a role for endothelial Wnt signaling modulation in states of vascular inflammation.

    Drug-Target Interaction Networks Prediction Using Short-linear Motifs

    Drug-target interaction (DTI) prediction is a fundamental step in drug discovery and genomic research and contributes to medical treatment. Various computational methods have been developed to find potential DTIs. Machine learning (ML) is currently used to identify new DTIs from existing DTI networks. There are two main ML-based approaches to DTI network prediction: similarity-based methods and feature-based methods. In this thesis, we propose a feature-based approach and, for the first time, use short-linear motifs (SLiMs) as protein descriptors. Chemical substructure fingerprints are used as drug features. Another challenge in this field is the lack of negative data for the training set, because most data available in public databases are interaction samples. Many researchers treat unknown drug-target pairs as non-interacting, which is incorrect and may have serious consequences. To solve this problem, we introduce a strategy that selects reliable negative samples according to the features of the positive data. We use the same benchmark datasets as previous research to allow direct comparison. After trying three classifiers, k nearest neighbours (k-NN), Random Forest (RF) and Support Vector Machine (SVM), we find that the results of k-NN are satisfactory but not as good as those of RF and SVM. Compared with existing approaches that use the same datasets to solve the same problem, our method performs best under most circumstances.