
722 research outputs found

Mapping Transcription Factor Networks and Elucidating Their Biological Determinants

Author: Kang Yiming
Publication venue: Washington University Open Scholarship
Publication date: 15/01/2021

A central goal in systems biology is to accurately map the transcription factor (TF) network of a cell. Such a network map is a key component for many downstream applications, from developmental biology to transcriptome engineering, and from disease modeling to drug discovery. Building a reliable network map requires a wide range of data sources including TF binding locations and gene expression data after direct TF perturbations. However, we are facing two roadblocks. First, rich resources are available only for a few well-studied systems and cannot be easily replicated for new organisms or cell types. Second, when TF binding and TF-perturbation response data are available, they rarely converge on a common set of direct and functional targets for a TF. This dissertation explores and validates the best combination of experimental and analytic techniques to map TF networks. First, we introduce an unsupervised inference algorithm that maps TF networks by exploiting only gene expression and genome sequence data. We show that our “data light” method is more accurate at identifying direct targets of TFs than other similar methods. Second, we develop an optimization method to search for a convergent set of target genes that are independently identified by binding locations and perturbation responses of each TF. Combining this method with network inference greatly expanded the high-confidence network maps, especially when applied on datasets obtained by using recently developed experimental methods. Third, we describe a framework for predicting each gene’s responsiveness to a TF perturbation from genomic features. Using this framework, we identified properties of each gene that are independent of the perturbed TF as the major determinants of TF-perturbation responsiveness. This may lead to improvements in network mapping algorithms that exploit TF perturbation responses.
Overall, this dissertation provides a scalable framework for mapping high-quality TF networks for a variety of organisms and cell types.

Washington University St. Louis: Open Scholarship

Dynamics of transmission in disordered topological insulators

Author: Genack Azriel Z.
Huang Yiming
Kang Yuhao
Publication date: 28/12/2020

Here we show in simulations of the Haldane model that pulse propagation in disordered topological insulators is robust throughout the central portion of the band gap where localized modes do not arise. Since transmission is robust in topological insulators, the essential field variable is the phase of the transmitted field, or, equivalently, its spectral derivative, which is the transmission time. Except near resonances with bulk localized modes that couple the upper and lower edges of a topological insulator, the transmission time in a topological insulator is proportional to the density of states and to the energy excited within the sample. The average transmission time is enhanced in disordered TIs near the band edge and slightly suppressed in the center of the band gap. The variance of the transmission time at the band edge for a random ensemble with moderate disorder is dominated by fluctuations at resonances with localized states, and initially scales quadratically. When modes are absent, such as in the center of the band gap, the transmission time self-averages and its variance scales linearly. This leads to significant sample-to-sample fluctuations in the transmission time. However, because the transmission time is the sum of contributions from the continuum edge mode, which stretches across the band gap, and far-off-resonance modes near the band edge, there are no sharp features in the spectrum of transmission time in the center of the band gap. As a result, ultrashort, broadband pulses are faithfully transmitted in the center of the band gap of topological insulators with moderate disorder and bent paths. This allows for robust signal propagation in complex topological metawaveguides for applications in high-speed optoelectronics and telecommunications.

arXiv.org e-Print Archive

City University of New York

Predicting which genes will respond to transcription factor perturbations

Author: Brent Michael R
Jung Wooseok J
Kang Yiming
Publication venue: Digital Commons@Becker
Publication date: 06/06/2022

The ability to predict which genes will respond to the perturbation of a transcription factor serves as a benchmark for our systems-level understanding of transcriptional regulatory networks. In previous work, machine learning models have been trained to predict static gene expression levels in a biological sample by using data from the same or similar samples, including data on their transcription factor binding locations, histone marks, or DNA sequence. We report on a different challenge: training machine learning models to predict which genes will respond to the perturbation of a transcription factor without using any data from the perturbed cells. We find that existing transcription factor location data (ChIP-seq) from human cells have very little detectable utility for predicting which genes will respond to perturbation of a transcription factor. Features of genes, including their preperturbation expression level and expression variation, are very useful for predicting responses to perturbation of any transcription factor. This shows that some genes are poised to respond to transcription factor perturbations and others are resistant, shedding light on why it has been so difficult to predict responses from binding locations. Certain histone marks, including H3K4me1 and H3K4me3, have some predictive power when located downstream of the transcription start site. However, the predictive power of histone marks is much less than that of gene expression level and expression variation. Sequence-based or epigenetic properties of genes strongly influence their tendency to respond to direct transcription factor perturbations, partially explaining the oft-noted difficulty of predicting responsiveness from transcription factor binding location data. These molecular features are largely reflected in and summarized by the gene's expression level and expression variation. Code is available at https://github.com/BrentLab/TFPertRespExplainer

Digital Commons@Becker

PubMed Central

Noninvasive detection of high-risk adenomas using stool-derived eukaryotic RNA sequences as biomarkers

Author: Barnell Erika K
Chaudhuri Aadel A
Griffith Malachi
Griffith Obi L
Kang Yiming
Wurtzler Elizabeth M
Publication venue: Digital Commons@Becker
Publication date: 01/01/2019

Digital Commons@Becker

Unsupervised Text Style Transfer with Deep Generative Models

Author: Jiang Zhongtao
Ju Yiming
Liu Kang
Zhang Yuanzhe
Publication date: 31/08/2023

We present a general framework for unsupervised text style transfer with deep generative models. The framework models each sentence-label pair in the non-parallel corpus as partially observed from a complete quadruplet which additionally contains two latent codes representing the content and style, respectively. These codes are learned by exploiting dependencies inside the observed data. Then a sentence is transferred by manipulating them. Our framework is able to unify previous embedding and prototype methods as two special forms. It also provides a principled perspective to explain previously proposed techniques in the field such as aligned encoder and adversarial training. We further conduct experiments on three benchmarks. Both automatic and human evaluation results show that our methods achieve better or competitive results compared to several strong baselines.

arXiv.org e-Print Archive

NetProphet 2.0: Mapping transcription factor networks by exploiting scalable data resources

Author: Brent Michael R
Kang Yiming
Liow Hien-Haw
Maier Ezekiel J
Publication venue: Oxford University Press (OUP)
Publication date: 15/01/2018

MOTIVATION: Cells process information, in part, through transcription factor (TF) networks, which control the rates at which individual genes produce their products. A TF network map is a graph that indicates which TFs bind and directly regulate each gene. Previous work has described network mapping algorithms that rely exclusively on gene expression data and 'integrative' algorithms that exploit a wide range of data sources including chromatin immunoprecipitation sequencing (ChIP-seq) of many TFs, genome-wide chromatin marks, and binding specificities for many TFs determined in vitro. However, such resources are available only for a few major model systems and cannot be easily replicated for new organisms or cell types. RESULTS: We present NetProphet 2.0, a 'data light' algorithm for TF network mapping, and show that it is more accurate at identifying direct targets of TFs than other, similarly data light algorithms. In particular, it improves on the accuracy of NetProphet 1.0, which used only gene expression data, by exploiting three principles. First, combining multiple approaches to network mapping from expression data can improve accuracy relative to the constituent approaches. Second, TFs with similar DNA binding domains bind similar sets of target genes. Third, even a noisy, preliminary network map can be used to infer DNA binding specificities from promoter sequences and these inferred specificities can be used to further improve the accuracy of the network map. AVAILABILITY AND IMPLEMENTATION: Source code and comprehensive documentation are freely available at https://github.com/yiming-kang/NetProphet_2.0. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.

Digital Commons@Becker

S3Eval: A Synthetic, Scalable, Systematic Evaluation Suite for Large Language Models

Author: He Shizhu
Huang Yiming
Lei Fangyu
Liu Kang
Liu Qian
Zhao Jun
Publication date: 23/10/2023

The rapid development of Large Language Models (LLMs) has led to great strides in model capabilities like reasoning and long-context understanding. However, as LLMs are able to process longer contexts, it becomes more challenging to evaluate whether they have acquired certain capabilities, since the length of text (e.g., 100K tokens) they can process far exceeds what humans can reliably assess in a reasonable duration. In this paper, we propose using complex synthetic tasks as a proxy evaluation method, and present S3Eval, a Synthetic, Scalable, Systematic evaluation suite for LLM evaluation. As a synthetic benchmark, S3Eval enables the creation of any number of evaluation examples that are theoretically invisible to LLMs, mitigating the test set contamination issue. The synthetic nature of S3Eval provides users full control over the dataset, allowing them to systematically probe LLM capabilities by scaling text length and varying task difficulty across diverse scenarios. The strong correlation between S3Eval performance and scores of real-world benchmarks like Big-Bench Hard (BBH) demonstrates the soundness of using S3Eval for evaluation of LLMs. The in-depth analysis also uncovers additional insights, including performance drop when the answer is sparsely distributed or located in the middle context, as well as some counter-intuitive trends of model performance. Comment: Work in progress.

arXiv.org e-Print Archive

NMI inhibits cancer stem cell traits by downregulating hTERT in breast cancer.

Author: Chen Yiming
Deng Wuguo
Feng Xu
Gao Yue
Guo Wei
Hao Jiaojiao
Huang Wenlin
Kang Lan
Liao Yina
Tang Ranran
Wu Jiali
Xiao Xiangsheng
Xu Xiangdong
Yu Wendan
Zhao Xinrui
Zou Kun
Publication venue: eScholarship, University of California
Publication date: 01/05/2017

N-myc and STAT interactor (NMI) has been proved to bind to different transcription factors to regulate a variety of signaling mechanisms including DNA damage, cell cycle and epithelial-mesenchymal transition. However, the role of NMI in the regulation of cancer stem cells (CSCs) remains poorly understood. In this study, we investigated the regulation of NMI on CSC traits in breast cancer and uncovered the underlying molecular mechanisms. We found that NMI was lowly expressed in breast cancer stem cell (BCSC)-enriched populations. Knockdown of NMI promoted CSC traits while its overexpression inhibited CSC traits, including the expression of CSC-related markers, the number of CD44+CD24- cell populations and the ability of mammosphere formation. We also found that NMI-mediated regulation of BCSC traits was at least partially realized through the modulation of hTERT signaling. NMI knockdown upregulated hTERT expression while its overexpression downregulated hTERT in breast cancer cells, and the changes in CSC traits and cell invasion ability mediated by NMI were rescued by hTERT. The in vivo study also validated that NMI knockdown promoted breast cancer growth by upregulating hTERT signaling in a mouse model. Moreover, further analyses of the clinical samples demonstrated that NMI expression was negatively correlated with hTERT expression and that low NMI/high hTERT expression was associated with worse clinical TNM stage in breast cancer patients. Furthermore, we demonstrated the interaction of YY1 protein with NMI and its involvement in NMI-mediated transcriptional regulation of hTERT in breast cancer cells. Collectively, our results provide new insights into understanding the regulatory mechanism of CSCs and suggest that the NMI-YY1-hTERT signaling axis may be a potential therapeutic target for breast cancers.

Crossref

eScholarship - University of California