722 research outputs found

    Mapping Transcription Factor Networks and Elucidating Their Biological Determinants

    A central goal in systems biology is to accurately map the transcription factor (TF) network of a cell. Such a network map is a key component for many downstream applications, from developmental biology to transcriptome engineering, and from disease modeling to drug discovery. Building a reliable network map requires a wide range of data sources including TF binding locations and gene expression data after direct TF perturbations. However, we face two roadblocks. First, rich resources are available only for a few well-studied systems and cannot be easily replicated for new organisms or cell types. Second, when TF binding and TF-perturbation response data are available, they rarely converge on a common set of direct and functional targets for a TF. This dissertation explores and validates the best combination of experimental and analytic techniques to map TF networks. First, we introduce an unsupervised inference algorithm that maps TF networks by exploiting only gene expression and genome sequence data. We show that our “data light” method is more accurate at identifying direct targets of TFs than other similar methods. Second, we develop an optimization method to search for a convergent set of target genes that are independently identified by binding locations and perturbation responses of each TF. Combining this method with network inference greatly expanded the high-confidence network maps, especially when applied to datasets obtained by using recently developed experimental methods. Third, we describe a framework for predicting each gene’s responsiveness to a TF perturbation from genomic features. Using this framework, we identified properties of each gene that are independent of the perturbed TF as the major determinants of TF-perturbation responsiveness. This may lead to improvements in network mapping algorithms that exploit TF perturbation responses.
Overall, this dissertation provides a scalable framework for mapping high-quality TF networks for a variety of organisms and cell types.

    Dynamics of transmission in disordered topological insulators

    Here we show in simulations of the Haldane model that pulse propagation in disordered topological insulators is robust throughout the central portion of the band gap where localized modes do not arise. Since transmission is robust in topological insulators, the essential field variable is the phase of the transmitted field, or, equivalently, its spectral derivative, which is the transmission time. Except near resonances with bulk localized modes that couple the upper and lower edges of a topological insulator, the transmission time in a topological insulator is proportional to the density of states and to the energy excited within the sample. The average transmission time is enhanced in disordered topological insulators near the band edge and slightly suppressed in the center of the band gap. The variance of the transmission time at the band edge for a random ensemble with moderate disorder is dominated by fluctuations at resonances with localized states, and initially scales quadratically. When modes are absent, such as in the center of the band gap, the transmission time self-averages and its variance scales linearly. This leads to significant sample-to-sample fluctuations in the transmission time. However, because the transmission time is the sum of contributions from the continuum edge mode, which stretches across the band gap, and far-off-resonance modes near the band edge, there are no sharp features in the spectrum of transmission time in the center of the band gap. As a result, ultrashort, broadband pulses are faithfully transmitted in the center of the band gap of topological insulators with moderate disorder and bent paths. This allows for robust signal propagation in complex topological metawaveguides for applications in high-speed optoelectronics and telecommunications.

    Predicting which genes will respond to transcription factor perturbations

    The ability to predict which genes will respond to the perturbation of a transcription factor serves as a benchmark for our systems-level understanding of transcriptional regulatory networks. In previous work, machine learning models have been trained to predict static gene expression levels in a biological sample by using data from the same or similar samples, including data on their transcription factor binding locations, histone marks, or DNA sequence. We report on a different challenge: training machine learning models to predict which genes will respond to the perturbation of a transcription factor without using any data from the perturbed cells. We find that existing transcription factor location data (ChIP-seq) from human cells have very little detectable utility for predicting which genes will respond to perturbation of a transcription factor. Features of genes, including their preperturbation expression level and expression variation, are very useful for predicting responses to perturbation of any transcription factor. This shows that some genes are poised to respond to transcription factor perturbations and others are resistant, shedding light on why it has been so difficult to predict responses from binding locations. Certain histone marks, including H3K4me1 and H3K4me3, have some predictive power when located downstream of the transcription start site. However, the predictive power of histone marks is much less than that of gene expression level and expression variation. Sequence-based or epigenetic properties of genes strongly influence their tendency to respond to direct transcription factor perturbations, partially explaining the oft-noted difficulty of predicting responsiveness from transcription factor binding location data. These molecular features are largely reflected in and summarized by the gene's expression level and expression variation. Code is available at https://github.com/BrentLab/TFPertRespExplainer.

    Unsupervised Text Style Transfer with Deep Generative Models

    We present a general framework for unsupervised text style transfer with deep generative models. The framework models each sentence-label pair in the non-parallel corpus as partially observed from a complete quadruplet which additionally contains two latent codes representing the content and style, respectively. These codes are learned by exploiting dependencies inside the observed data. A sentence is then transferred by manipulating these codes. Our framework is able to unify previous embedding and prototype methods as two special forms. It also provides a principled perspective to explain previously proposed techniques in the field such as aligned encoder and adversarial training. We further conduct experiments on three benchmarks. Both automatic and human evaluation results show that our methods achieve better or competitive results compared to several strong baselines.

    NetProphet 2.0: Mapping transcription factor networks by exploiting scalable data resources

    MOTIVATION: Cells process information, in part, through transcription factor (TF) networks, which control the rates at which individual genes produce their products. A TF network map is a graph that indicates which TFs bind and directly regulate each gene. Previous work has described network mapping algorithms that rely exclusively on gene expression data and 'integrative' algorithms that exploit a wide range of data sources including chromatin immunoprecipitation sequencing (ChIP-seq) of many TFs, genome-wide chromatin marks, and binding specificities for many TFs determined in vitro. However, such resources are available only for a few major model systems and cannot be easily replicated for new organisms or cell types. RESULTS: We present NetProphet 2.0, a 'data light' algorithm for TF network mapping, and show that it is more accurate at identifying direct targets of TFs than other, similarly data light algorithms. In particular, it improves on the accuracy of NetProphet 1.0, which used only gene expression data, by exploiting three principles. First, combining multiple approaches to network mapping from expression data can improve accuracy relative to the constituent approaches. Second, TFs with similar DNA binding domains bind similar sets of target genes. Third, even a noisy, preliminary network map can be used to infer DNA binding specificities from promoter sequences and these inferred specificities can be used to further improve the accuracy of the network map. AVAILABILITY AND IMPLEMENTATION: Source code and comprehensive documentation are freely available at https://github.com/yiming-kang/NetProphet_2.0. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.

    S3Eval: A Synthetic, Scalable, Systematic Evaluation Suite for Large Language Models

    The rapid development of Large Language Models (LLMs) has led to great strides in model capabilities like reasoning and long-context understanding. However, as LLMs are able to process longer contexts, it becomes more challenging to evaluate whether they have acquired certain capabilities, since the length of text (e.g., 100K tokens) they can process far exceeds what humans can reliably assess in a reasonable duration. In this paper, we propose using complex synthetic tasks as a proxy evaluation method, and present S3Eval, a Synthetic, Scalable, Systematic evaluation suite for LLM evaluation. As a synthetic benchmark, S3Eval enables the creation of any number of evaluation examples that are theoretically invisible to LLMs, mitigating the test set contamination issue. The synthetic nature of S3Eval provides users full control over the dataset, allowing them to systematically probe LLM capabilities by scaling text length and varying task difficulty across diverse scenarios. The strong correlation between S3Eval performance and scores of real-world benchmarks like Big-Bench Hard (BBH) demonstrates the soundness of using S3Eval for evaluation of LLMs. The in-depth analysis also uncovers additional insights, including a performance drop when the answer is sparsely distributed or located in the middle of the context, as well as some counter-intuitive trends in model performance. Comment: Work in progress.

    NMI inhibits cancer stem cell traits by downregulating hTERT in breast cancer.

    N-myc and STAT interactor (NMI) has been shown to bind to different transcription factors to regulate a variety of signaling mechanisms including DNA damage, cell cycle and epithelial-mesenchymal transition. However, the role of NMI in the regulation of cancer stem cells (CSCs) remains poorly understood. In this study, we investigated how NMI regulates CSC traits in breast cancer and uncovered the underlying molecular mechanisms. We found that NMI was expressed at low levels in breast cancer stem cell (BCSC)-enriched populations. Knockdown of NMI promoted CSC traits while its overexpression inhibited CSC traits, including the expression of CSC-related markers, the number of CD44+CD24- cell populations and the ability to form mammospheres. We also found that NMI-mediated regulation of BCSC traits was at least partially realized through the modulation of hTERT signaling. NMI knockdown upregulated hTERT expression while its overexpression downregulated hTERT in breast cancer cells, and the changes in CSC traits and cell invasion ability mediated by NMI were rescued by hTERT. The in vivo study also validated that NMI knockdown promoted breast cancer growth by upregulating hTERT signaling in a mouse model. Moreover, further analyses of the clinical samples demonstrated that NMI expression was negatively correlated with hTERT expression and that low NMI/high hTERT expression was associated with worse clinical TNM stage in breast cancer patients. Furthermore, we demonstrated the interaction of YY1 protein with NMI and its involvement in NMI-mediated transcriptional regulation of hTERT in breast cancer cells. Collectively, our results provide new insights into understanding the regulatory mechanism of CSCs and suggest that the NMI-YY1-hTERT signaling axis may be a potential therapeutic target for breast cancers.