
    Identification of tissue-specific cis-regulatory modules based on interactions between transcription factors

    Abstract
    Background: Evolutionary conservation has been used successfully to help identify cis-acting DNA regions that are important in regulating tissue-specific gene expression. Motivated by increasing evidence that some DNA regulatory regions are not evolutionarily conserved, we have developed an approach for identifying cis-regulatory regions that does not rely on evolutionary sequence conservation.
    Results: The conservation-independent approach is based on an empirical potential energy between interacting transcription factors (TFs). In this analysis, the potential energy is defined as a function of the number of TF interactions in a genomic region and the strength of those interactions. By identifying sets of interacting TFs, the analysis locates regions enriched in the binding sites of these TFs. We applied this approach to 30 human tissues and identified 6232 putative cis-regulatory modules (CRMs) regulating 2130 tissue-specific genes. Interestingly, some genes appear to be regulated by different CRMs in different tissues. Known regulatory regions are highly enriched in our predicted CRMs. In addition, DNase I hypersensitive sites, which tend to be associated with active regulatory regions, significantly overlap with the predicted CRMs but not with more conserved regions. We also find that conserved and non-conserved CRMs regulate distinct groups of genes. Conserved CRMs control more essential genes and genes involved in fundamental cellular activities such as transcription. In contrast, non-conserved CRMs generally regulate more non-essential genes, such as genes related to neural activity.
    Conclusion: These results demonstrate that identifying relevant sets of binding motifs can help map DNA regulatory regions, and suggest that non-conserved CRMs play an important role in gene regulation.
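    The abstract defines the potential energy only as a function of the number and strength of TF interactions in a region, without giving the formula. The sketch below assumes one simple form, purely for illustration: each pair of interacting TFs whose binding sites co-occur in a window lowers the window's energy by that pair's interaction strength, and windows at or below a threshold are reported as candidate CRMs. The function names, window and step sizes, threshold, TF names and strengths are all hypothetical.

```python
from itertools import combinations


def window_energy(tfs_in_window, interaction_strength):
    """Toy 'potential energy' of one genomic window (hypothetical form).

    Every pair of interacting TFs whose binding sites both occur in the
    window lowers the energy by that pair's interaction strength, so more
    interactions and stronger interactions give a lower energy.
    """
    energy = 0.0
    for a, b in combinations(sorted(tfs_in_window), 2):
        # Look the pair up in either orientation; non-interacting pairs contribute 0.
        strength = interaction_strength.get((a, b), interaction_strength.get((b, a), 0.0))
        energy -= strength
    return energy


def scan_for_crms(sites, interaction_strength, window=500, step=250, threshold=-1.0):
    """Slide a fixed-size window along one sequence and report candidate CRMs.

    sites: list of (position, tf_name) binding-site hits (hypothetical input format).
    Returns (start, end, energy) for every window whose energy is <= threshold.
    """
    if not sites:
        return []
    last = max(pos for pos, _ in sites)
    hits = []
    for start in range(0, last + 1, step):
        end = start + window
        present = {tf for pos, tf in sites if start <= pos < end}
        energy = window_energy(present, interaction_strength)
        if energy <= threshold:
            hits.append((start, end, energy))
    return hits


# Illustrative input: three interacting TFs clustered near the start, one isolated TF.
sites = [(100, "TF_A"), (180, "TF_B"), (220, "TF_C"), (2000, "TF_D")]
strengths = {("TF_A", "TF_B"): 0.8, ("TF_C", "TF_B"): 0.9}
print(scan_for_crms(sites, strengths))
```

    With these made-up inputs only the first window, which contains the clustered sites for TF_A, TF_B and TF_C, falls below the threshold. A real analysis would also need a background model to decide which energies are significant.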

    Approximating Human-Like Few-shot Learning with GPT-based Compression

    In this work, we conceptualize the learning process as information compression. We seek to equip generative pre-trained models with human-like learning capabilities that enable data compression during inference. We present a novel approach that uses the Generative Pre-trained Transformer (GPT) to approximate Kolmogorov complexity, with the aim of estimating the optimal Information Distance for few-shot learning. We first propose using GPT as a prior for lossless text compression, achieving a noteworthy compression ratio: an experiment with a LLAMA2-7B backbone achieves a compression ratio of 15.5 on enwik9. We justify the pre-training objective of GPT models by demonstrating its equivalence to the compression length and, consequently, its ability to approximate the information distance between texts. Leveraging the approximated information distance, our method allows GPT models to be applied directly to quantitative text similarity measurement. Experimental results show that, overall, our method outperforms embedding and prompt baselines on challenging NLP tasks, including semantic similarity, zero- and one-shot text classification, and zero-shot text ranking.
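    The link between a language model's pre-training objective and compression length can be shown with a toy model: an ideal arithmetic coder spends about -log2 p bits on each symbol, so the summed negative log-likelihood of a text is its compressed length, and priming the model on a second text gives a conditional length C(x|y). In the sketch below an adaptive, add-one-smoothed byte-frequency model stands in for GPT (an assumption made only to keep the example self-contained), `code_length_bits` and `normalized_information_distance` are hypothetical names, and the distance is the standard normalized information distance max(C(x|y), C(y|x)) / max(C(x), C(y)); the abstract does not state which variant the authors use.

```python
import math
from collections import Counter

ALPHABET_SIZE = 256  # assumption: byte-level symbols


def code_length_bits(text: str, context: str = "") -> float:
    """Ideal code length of `text` in bits under a toy adaptive byte model.

    The model is first primed on `context`; each byte of `text` then costs
    -log2 p(byte | counts so far), the length an arithmetic coder driven by
    this model would approach. A GPT model would supply p from its
    next-token distribution instead (not shown here).
    """
    counts = Counter(context.encode("utf-8"))
    total = sum(counts.values())
    bits = 0.0
    for b in text.encode("utf-8"):
        # Add-one smoothing so unseen bytes still get nonzero probability.
        p = (counts[b] + 1) / (total + ALPHABET_SIZE)
        bits -= math.log2(p)
        counts[b] += 1
        total += 1
    return bits


def normalized_information_distance(x: str, y: str) -> float:
    """Approximate NID = max(C(x|y), C(y|x)) / max(C(x), C(y))."""
    cx, cy = code_length_bits(x), code_length_bits(y)
    cx_given_y = code_length_bits(x, context=y)
    cy_given_x = code_length_bits(y, context=x)
    return max(cx_given_y, cy_given_x) / max(cx, cy)


a = "the cat sat on the mat"
b = "the cat sat on a mat"
c = "zq9#xv!kp 77 jj"
print(normalized_information_distance(a, b), normalized_information_distance(a, c))
```

    Similar texts share statistics, so conditioning on one shortens the code for the other and the distance drops; a stronger predictive model sharpens this effect, which is the intuition behind using GPT as the compressor.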

    Few-Shot Non-Parametric Learning with Deep Latent Variable Model

    Most real-world problems that machine learning algorithms are expected to solve involve 1) an unknown data distribution; 2) little domain-specific knowledge; and 3) datasets with limited annotation. We propose Non-Parametric learning by Compression with Latent Variables (NPC-LV), a learning framework for any dataset with abundant unlabeled data but very few labeled examples. By training only a generative model, in an unsupervised way, the framework uses the data distribution to build a compressor. Using a compressor-based distance metric derived from Kolmogorov complexity, together with a small amount of labeled data, NPC-LV classifies without further training. We show that NPC-LV outperforms supervised methods on all three image classification datasets in the low-data regime and even outperforms semi-supervised learning methods on CIFAR-10. We demonstrate how and when the negative evidence lower bound (nELBO) can be used as an approximate compressed length for classification. By revealing the correlation between compression rate and classification accuracy, we illustrate that under NPC-LV, improvements in generative models can enhance downstream classification accuracy.
    Comment: Accepted to NeurIPS202
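    A compressor-based distance plus a nearest-neighbour vote over a few labeled examples can be sketched as follows. This assumes the normalized compression distance NCD(x, y) = (C(xy) - min(C(x), C(y))) / max(C(x), C(y)), a common Kolmogorov-complexity-derived metric, and uses zlib as a stand-in compressor; NPC-LV instead derives its compressor from a trained latent-variable generative model, so this illustrates only the distance-plus-kNN idea. Function names and data are hypothetical.

```python
import zlib
from collections import Counter


def compressed_len(data: bytes) -> int:
    # Stand-in compressor (assumption): NPC-LV would use a learned model here.
    return len(zlib.compress(data, 9))


def ncd(x: bytes, y: bytes) -> float:
    """Normalized compression distance between two byte strings."""
    cx, cy = compressed_len(x), compressed_len(y)
    cxy = compressed_len(x + y)
    return (cxy - min(cx, cy)) / max(cx, cy)


def knn_classify(query: bytes, labeled: list[tuple[bytes, str]], k: int = 1) -> str:
    """Label `query` by majority vote among its k nearest labeled examples (no training)."""
    nearest = sorted(labeled, key=lambda item: ncd(query, item[0]))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]


labeled = [
    (b"the quick brown fox jumps over the lazy dog. " * 20, "english"),
    (b"0123456789 9876543210 1357924680 " * 20, "digits"),
]
print(knn_classify(b"the lazy dog jumps over the quick brown fox. " * 10, labeled))
print(knn_classify(b"9876543210 0123456789 " * 10, labeled))
```

    The only "training" is whatever went into the compressor; classification itself needs just the labeled examples and pairwise distances, which is why a better generative model (a better compressor) can translate directly into better accuracy.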

    Analysis of regulatory network topology reveals functionally distinct classes of microRNAs

    MicroRNAs (miRNAs) negatively regulate the expression of target genes at the post-transcriptional level. Little is known about the crosstalk between miRNAs and transcription factors (TFs). Here we provide data suggesting that the interaction patterns between TFs and miRNAs can influence the biological functions of miRNAs. In this global survey, we find that the most significantly overrepresented network motif is a regulated feedback loop, in which two TFs regulate each other and one miRNA regulates both of them. Mathematical modeling shows that the miRNA in this motif stabilizes the feedback loop against environmental perturbation, providing one mechanism by which miRNAs may contribute to the robustness of developmental programs. Furthermore, on the basis of a network motif profile analysis, we demonstrate the existence of two classes of miRNAs with distinct network topological properties: the first class is regulated by a large number of TFs, whereas the second is regulated by only a few. The differential expression of the two classes in embryonic developmental stages versus adult tissues suggests that they may have fundamentally different biological functions. Our results demonstrate that TFs and miRNAs interact extensively and that the biological functions of miRNAs may be wired into the regulatory network topology.
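    The regulated feedback loop and the "many TFs versus few TFs" distinction are both simple graph queries. The sketch below assumes the network is given as directed (regulator, target) edges plus a node-type map; the function and node names are hypothetical. It only enumerates motif instances and counts TF regulators per miRNA; judging a motif "overrepresented" would additionally require comparing its count against randomized networks, which is not shown.

```python
from itertools import combinations


def find_regulated_feedback_loops(edges, node_type):
    """Return (tf1, tf2, mirna) triples where tf1 and tf2 regulate each other
    and the miRNA targets both TFs."""
    edge_set = set(edges)
    tfs = sorted(n for n, kind in node_type.items() if kind == "TF")
    mirnas = sorted(n for n, kind in node_type.items() if kind == "miRNA")
    motifs = []
    for tf1, tf2 in combinations(tfs, 2):
        if (tf1, tf2) in edge_set and (tf2, tf1) in edge_set:  # mutual TF regulation
            for mirna in mirnas:
                if (mirna, tf1) in edge_set and (mirna, tf2) in edge_set:
                    motifs.append((tf1, tf2, mirna))
    return motifs


def tf_regulator_counts(edges, node_type):
    """Number of distinct TFs regulating each miRNA (its TF in-degree)."""
    counts = {n: 0 for n, kind in node_type.items() if kind == "miRNA"}
    for src, dst in set(edges):
        if node_type.get(src) == "TF" and node_type.get(dst) == "miRNA":
            counts[dst] += 1
    return counts


node_type = {"TF_A": "TF", "TF_B": "TF", "TF_C": "TF", "miR_1": "miRNA", "miR_2": "miRNA"}
edges = [
    ("TF_A", "TF_B"), ("TF_B", "TF_A"),    # TFs regulate each other
    ("miR_1", "TF_A"), ("miR_1", "TF_B"),  # one miRNA represses both
    ("miR_2", "TF_A"), ("TF_C", "TF_A"),   # edges that do not complete the motif
    ("TF_A", "miR_1"), ("TF_B", "miR_1"), ("TF_C", "miR_2"),
]
print(find_regulated_feedback_loops(edges, node_type))
print(tf_regulator_counts(edges, node_type))
```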