
    Development of Computational Techniques for Identification of Regulatory DNA Motif

    Identifying precise transcription factor binding sites (TFBSs), or regulatory DNA motifs, plays a fundamental role in studying transcriptional regulatory mechanisms in cells and in constructing regulatory networks for biological investigation. Chromatin immunoprecipitation followed by sequencing (ChIP-seq) and its variant with lambda exonuclease digestion followed by high-throughput sequencing (ChIP-exo) enable researchers to identify TFBSs on a genome scale, with ChIP-exo offering improved resolution. Several algorithms have been developed for motif identification; they employ widely different methods, often give divergent results, and still have limited prediction accuracy. This thesis focuses on developing improved techniques for regulatory DNA motif identification. We designed an integrated framework, WTSA, that reliably combines experimental signals from ChIP-exo data at base-pair (bp) resolution to predict statistically significant DNA motifs. The algorithm improves prediction accuracy and extends the scope of applicability of existing methods. We applied the framework to the Escherichia coli K-12 genome and evaluated WTSA's prediction performance through comparison with seven existing programs. The evaluation indicated that WTSA provides reliable predictive power for regulatory motifs using ChIP-exo data. An important application of DNA motif identification is the elucidation of transcriptional regulatory mechanisms. The rapid development of single-cell RNA-sequencing (scRNA-seq) technologies provides an unprecedented opportunity to study gene transcriptional regulation at the single-cell level. In scRNA-seq analyses, a critical step is to identify cell-type-specific regulons (CTS-Rs), each of which is a group of genes co-regulated by the same transcriptional regulator in a specific cell type.
We developed a web server, IRIS3 (Integrated Cell-type-specific Regulon Inference Server from Single-cell RNA-Seq), to address this problem by integrating data preprocessing, cell type prediction, gene module identification, and cis-regulatory motif analyses. Compared with other packages, IRIS3 runs more efficiently and predicts more accurate regulons from scRNA-seq data. These CTS-Rs can substantially improve the elucidation of heterogeneous regulatory mechanisms among cell types and allow reliable construction of the global transcriptional regulatory networks encoded in a specific cell type. Also presented in this thesis is DESSO (DEep Sequence and Shape mOtif), which uses deep neural networks and the binomial distribution model to identify DNA motifs. DESSO outperformed existing tools, including DeepBind, on 690 human ENCODE ChIP-sequencing datasets, and it further expands motif identification power by integrating the detection of DNA shape features.
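The abstract does not describe WTSA's scoring internals. As a generic illustration of how a regulatory DNA motif is commonly represented and scanned, the Python sketch below builds a log-odds position weight matrix (PWM) from aligned binding sites and scores every window of a sequence against a uniform background. The toy sites, the pseudocount, and the helper names `build_pwm` and `scan` are hypothetical; this is not WTSA's algorithm.

```python
import math

BASES = "ACGT"


def build_pwm(sites, pseudocount=0.5, background=0.25):
    """Log-odds PWM (in bits) from equal-length, uppercase aligned sites.

    Hypothetical helper: the uniform background and fixed pseudocount are assumptions.
    """
    width = len(sites[0])
    pwm = []
    for i in range(width):
        column = [site[i] for site in sites]
        total = len(column) + 4 * pseudocount
        pwm.append({
            base: math.log2((column.count(base) + pseudocount) / total / background)
            for base in BASES
        })
    return pwm


def scan(sequence, pwm):
    """Score every window of the motif's width; returns (start, log-odds score) pairs."""
    width = len(pwm)
    return [
        (start, sum(pwm[i][sequence[start + i]] for i in range(width)))
        for start in range(len(sequence) - width + 1)
    ]


if __name__ == "__main__":
    sites = ["TGACA", "TGACT", "TGACA", "TTACA"]  # toy aligned binding sites
    pwm = build_pwm(sites)
    hits = scan("GGTGACAGG", pwm)
    best_start, best_score = max(hits, key=lambda hit: hit[1])
    print(best_start, round(best_score, 2))
```

Windows with high log-odds scores are candidate motif instances; real tools add a significance threshold and, in ChIP-exo settings, experimental signal, which this sketch omits.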

    IRIS-EDA: An Integrated RNA-Seq Interpretation System for Gene Expression Data Analysis

    Next-generation sequencing has made substantial amounts of large-scale omics data available, providing unprecedented opportunities to understand complex biological systems. In particular, RNA-sequencing (RNA-Seq) data have proven valuable for inferring how gene regulatory systems respond under various conditions (bulk data) or in various cell types (single-cell data). RNA-Seq generates genome-scale gene expression profiles that can be further analyzed with correlation analysis, co-expression analysis, clustering, and differential gene expression (DGE) analysis, among many others. While these analyses provide invaluable information about gene expression, integrating and interpreting their results can be challenging. Here we present IRIS-EDA, a Shiny web server for expression data analysis. It provides a straightforward, user-friendly platform for performing numerous computational analyses on user-provided RNA-Seq or single-cell RNA-Seq (scRNA-Seq) data. Specifically, three commonly used R packages (edgeR, DESeq2, and limma) are implemented for DGE analysis with seven unique experimental design functionalities, including a user-specified design matrix option. Seven discovery-driven methods and tools (correlation analysis, heatmap, clustering, biclustering, principal component analysis (PCA), multidimensional scaling (MDS), and t-distributed stochastic neighbor embedding (t-SNE)) are provided for gene expression exploration, which is useful for formulating experimental hypotheses and determining key factors for comprehensive DGE analysis. Furthermore, the platform integrates seven visualization tools in a highly interactive manner for improved interpretation of the analyses. Notably, IRIS-EDA is the first tool to provide a framework that expedites submission of data and results to NCBI's Gene Expression Omnibus following the FAIR (Findable, Accessible, Interoperable, and Reusable) Data Principles.
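IRIS-EDA delegates DGE testing to edgeR, DESeq2, and limma in R. As a rough illustration of two quantities such pipelines commonly report, the Python sketch below computes counts-per-million (CPM) normalization and a per-gene log2 fold change between two groups of samples. The helper names and the prior count are assumptions, and the sketch performs no statistical test; it is not what any of those packages does internally.

```python
import math


def cpm(counts):
    """Counts-per-million for one sample: each count divided by library size, times 1e6."""
    library_size = sum(counts)
    return [count / library_size * 1e6 for count in counts]


def log2_fold_change(group_a, group_b, prior=1.0):
    """Per-gene log2 fold change of mean CPM, group B relative to group A.

    group_a, group_b: lists of samples; each sample is a list of raw counts over the same genes.
    prior: small offset added before the log to avoid log2(0); its value is an assumption.
    """
    norm_a = [cpm(sample) for sample in group_a]
    norm_b = [cpm(sample) for sample in group_b]
    changes = []
    for gene in range(len(group_a[0])):
        mean_a = sum(sample[gene] for sample in norm_a) / len(norm_a)
        mean_b = sum(sample[gene] for sample in norm_b) / len(norm_b)
        changes.append(math.log2(mean_b + prior) - math.log2(mean_a + prior))
    return changes


if __name__ == "__main__":
    control = [[500, 500, 0], [480, 520, 0]]  # toy counts: 2 samples x 3 genes
    treated = [[750, 250, 0], [700, 300, 0]]
    print([round(x, 2) for x in log2_fold_change(control, treated)])
```

Normalizing by library size first keeps samples sequenced to different depths comparable; the dedicated packages go further with dispersion modeling and hypothesis testing.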
IRIS-EDA is freely available at http://bmbl.sdstate.edu/IRIS/

    Prediction of regulatory motifs from human Chip-sequencing data using a deep learning framework

    The identification of transcription factor binding sites and cis-regulatory motifs is a frontier at which the rules governing protein-DNA binding are being revealed. Here, we developed a new method, DEep Sequence and Shape mOtif (DESSO), for cis-regulatory motif prediction using deep neural networks and the binomial distribution model. DESSO outperformed existing tools, including DeepBind, in predicting motifs in 690 human ENCODE ChIP-sequencing datasets. Furthermore, the deep-learning framework of DESSO expanded motif discovery beyond the state of the art by enabling the identification of known and new protein-protein-DNA tethering interactions in human transcription factors (TFs). Specifically, 61 putative tethering interactions were identified among the 100 TFs expressed in the K562 cell line. In this work, the power of DESSO was further expanded by integrating the detection of DNA shape features. We found that shape information has strong predictive power for TF-DNA binding and provides new putative shape motif information for human TFs. Thus, DESSO improves the identification and structural analysis of TF binding sites by integrating the complexities of DNA binding into a deep-learning framework.
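The abstract names a binomial distribution model but does not detail how DESSO applies it. One common way such a model enters motif discovery is as a significance test of whether sequences scoring above a motif cutoff are over-represented in ChIP-seq peaks relative to background sequences. The Python sketch below illustrates that idea under this assumption; the function names, the per-sequence score input, and the cutoff search are hypothetical and are not DESSO's implementation.

```python
from math import comb


def binomial_upper_tail(k, n, p):
    """P(X >= k) for X ~ Binomial(n, p), computed by exact summation."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k, n + 1))


def enrichment_pvalue(fg_hits, fg_total, bg_hits, bg_total):
    """One-sided p-value that foreground sequences carry a hit more often than background.

    The background hit rate bg_hits / bg_total serves as the null probability.
    """
    return binomial_upper_tail(fg_hits, fg_total, bg_hits / bg_total)


def best_cutoff(fg_scores, bg_scores, cutoffs):
    """Return (cutoff, p-value) for the cutoff with the most significant enrichment.

    fg_scores / bg_scores: best motif score per foreground / background sequence (hypothetical input).
    Cutoffs with no background hits are skipped because a null rate of 0 is degenerate.
    """
    best = None
    for cutoff in cutoffs:
        fg_hits = sum(score >= cutoff for score in fg_scores)
        bg_hits = sum(score >= cutoff for score in bg_scores)
        if bg_hits == 0:
            continue
        pvalue = enrichment_pvalue(fg_hits, len(fg_scores), bg_hits, len(bg_scores))
        if best is None or pvalue < best[1]:
            best = (cutoff, pvalue)
    return best


if __name__ == "__main__":
    fg = [5, 6, 7, 8, 9]                   # toy foreground (peak) scores
    bg = [1, 2, 3, 4, 5, 6, 1, 2, 3, 4]    # toy background scores
    print(best_cutoff(fg, bg, [3, 5, 7]))
```

Exact summation is adequate for the small counts shown here; for genome-scale counts a library survival function would be more numerically robust.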

    Water Deficit Transcriptomic Responses Differ in the Invasive Tamarix chinensis and T. ramosissima Established in the Southern and Northern United States

    Tamarix spp. (saltcedar) were introduced from Asia to the southern United States as windbreak and ornamental plants and have spread into natural areas. This study determined differential gene expression responses to water deficit (WD) in seedlings of T. chinensis and T. ramosissima from established invasive stands in New Mexico and Montana, respectively. A reference de novo transcriptome was developed using RNA sequences from WD and well-watered samples. Blast2GO analysis of the resulting 271,872 transcripts yielded 89,389 homologs. The reference Tamarix (Tamaricaceae, order Caryophyllales) transcriptome showed homology with 14,247 predicted genes of the Beta vulgaris subsp. vulgaris (Amaranthaceae, order Caryophyllales) genome assembly. T. ramosissima took longer to show water stress symptoms than T. chinensis. There were 2068 and 669 differentially expressed genes (DEGs) in T. chinensis and T. ramosissima, respectively, of which 332 were common to both species. Network analysis showed large biological process networks of similar gene content in each species under water deficit. In T. chinensis, two distinct molecular function gene ontology networks (binding and transcription factor-related) encompassing multiple up-regulated transcription factors (MYB, NAC, and WRKY), and a cellular component network containing many down-regulated photosynthesis-related genes, were identified; by contrast, T. ramosissima showed only one small molecular function network.
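Because 332 genes were differentially expressed in both species, simple inclusion-exclusion on the reported counts gives the species-specific and combined DEG totals. Here D_chi and D_ram are notation introduced for the DEG sets of T. chinensis and T. ramosissima; the values are derived from the published numbers, not additional results:

```latex
\begin{aligned}
|D_{\mathrm{chi}} \setminus D_{\mathrm{ram}}| &= 2068 - 332 = 1736 \\
|D_{\mathrm{ram}} \setminus D_{\mathrm{chi}}| &= 669 - 332 = 337 \\
|D_{\mathrm{chi}} \cup D_{\mathrm{ram}}| &= 2068 + 669 - 332 = 2405
\end{aligned}
```

So roughly half of the T. ramosissima DEGs (332 of 669) are shared with T. chinensis, whereas most T. chinensis DEGs are species-specific.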

    Improved Draft Genome Sequence of Bacillus sp. Strain YF23, Which Has Plant Growth-Promoting Activity

    We report here the improved draft genome sequence of Bacillus sp. strain YF23, a bacterium originally isolated from switchgrass (Panicum virgatum) plants and shown to exhibit plant growth-promoting activity. The genome comprises 5.82 Mbp and contains 5,933 genes, 193 of which are RNA genes, with a GC content of 35.10%.
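GC content is the percentage of G and C among the sequenced bases of an assembly. The short Python sketch below computes it across draft contigs held as plain strings; the function name and the convention of excluding ambiguous bases (N) from the denominator are assumptions, so a value produced this way may differ slightly from one reported by a given annotation pipeline.

```python
def gc_content(contigs):
    """Percent G+C over called bases (A, C, G, T) across all contigs.

    Ambiguous bases such as N are excluded from the denominator (an assumption;
    some tools divide by total length instead).
    """
    gc = 0
    called = 0
    for contig in contigs:
        seq = contig.upper()
        gc_here = seq.count("G") + seq.count("C")
        gc += gc_here
        called += gc_here + seq.count("A") + seq.count("T")
    return 100.0 * gc / called if called else 0.0


if __name__ == "__main__":
    print(round(gc_content(["ATGCGC", "atnnat"]), 2))  # 40.0
```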

    IRIS3: integrated cell-type-specific regulon inference server from single-cell RNA-Seq

    A group of genes controlled as a unit, usually by the same repressor or activator gene, is known as a regulon. The ability to identify active regulons within a specific cell type, i.e., cell-type-specific regulons (CTSRs), provides an extraordinary opportunity to pinpoint crucial regulators and target genes responsible for complex diseases. However, identifying CTSRs from single-cell RNA-Seq (scRNA-Seq) data is computationally challenging. We introduce IRIS3, the first-of-its-kind web server for CTSR inference from human and mouse scRNA-Seq data. IRIS3 is an easy-to-use server with over 20 functionalities supporting comprehensive interpretation and graphical visualization of identified CTSRs. CTSR data can be used to reliably characterize and distinguish the corresponding cell type from others and can be combined with other computational or experimental analyses for biomedical studies. CTSRs can therefore aid in the discovery of major regulatory mechanisms and allow reliable construction of the global transcriptional regulatory networks encoded in a specific cell type. The broader impact of IRIS3 includes, but is not limited to, investigation of complex disease hierarchies and heterogeneity, causal gene regulatory network construction, and drug development.

    Use of scREAD to explore and analyze single-cell and single-nucleus RNA-seq data for Alzheimer’s disease

    Summary: Single-cell RNA-sequencing (scRNA-seq) and single-nucleus RNA-sequencing (snRNA-seq) studies have provided remarkable insights into the molecular pathogenesis of Alzheimer's disease (AD). We recently developed scREAD, a database that provides comprehensive analyses of all existing AD scRNA-seq and snRNA-seq data in the public domain. Here, we report protocols for using the scREAD web interface and for running the backend workflow locally. Our protocols enable custom analyses of AD single-cell and single-nucleus gene expression profiles. For complete details on the use and execution of this protocol, please refer to Jiang et al. (2020).

    The Research Development of Hedonic Price Model-Based Real Estate Appraisal in the Era of Big Data

    In the era of big data, advances in relevant technologies are profoundly affecting the field of real estate appraisal. Many scholars regard the integration of big data technology as an inevitable future trend in the real estate appraisal industry. In this paper, we summarize 124 studies investigating the use of big data technology to optimize real estate appraisal through the hedonic price model (HPM). We also list a variety of big data resources and key methods widely used in the real estate appraisal field. On this basis, we analyze the future development of real estate appraisal. The main findings are as follows. First, the big data resources currently applied to real estate appraisal include more than a dozen big data types from three data sources: the internet, remote sensing, and the Internet of Things (IoT). Web crawler technology was found to be the most important data acquisition method. Second, owing to the characteristics of real estate big data, methods such as data pre-processing, spatial modeling, geographic information system (GIS) spatial analysis, and evolving machine learning methods with higher valuation accuracy have been successfully introduced into the HPM. Finally, although the application of big data has greatly expanded the amount of available data and feature dimensions, it has created a new problem: uneven data quality. Uneven data quality can reduce the accuracy of appraisal results, and, to date, insufficient attention has been paid to this issue. Future research should pay greater attention to integrating multi-source big data and should draw on applications developed in other disciplines. It is also important to combine various methods into a unified evaluation model that exploits their respective strengths and compensates for the mechanistic shortcomings of any single model.
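In its common semi-logarithmic form, a hedonic price model regresses the logarithm of price on property attributes, so that each coefficient reflects the proportional price change per unit of an attribute. The Python sketch below fits that form by ordinary least squares with NumPy on toy, noise-free data. The attribute names, the numbers, and the helper names are invented for illustration; the reviewed studies use many other specifications, including spatial and machine-learning models.

```python
import numpy as np


def fit_hedonic(features, prices):
    """Semi-log hedonic regression: ln(price) = b0 + sum_k b_k * x_k, fitted by OLS.

    features: n x k array-like of property attributes; prices: length-n array-like (> 0).
    Returns the coefficient vector [b0, b1, ..., bk].
    """
    X = np.column_stack([np.ones(len(prices)), np.asarray(features, dtype=float)])
    y = np.log(np.asarray(prices, dtype=float))
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    return coef


def implied_percent_change(coef):
    """Implied % price change per one-unit increase in each attribute: 100 * (exp(b_k) - 1)."""
    return 100.0 * np.expm1(coef[1:])


if __name__ == "__main__":
    # Toy, noise-free data (invented): floor area in m^2 and distance to centre in km.
    area = np.array([50, 80, 120, 150, 200])
    distance = np.array([1, 5, 2, 8, 3])
    prices = np.exp(10 + 0.01 * area - 0.05 * distance)
    coef = fit_hedonic(np.column_stack([area, distance]), prices)
    print(np.round(coef, 4))
    print(np.round(implied_percent_change(coef), 2))
```

Because the toy prices are generated exactly from the model, the fit recovers the generating coefficients; with real, noisy appraisal data the data-quality issues discussed above directly affect these estimates.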

    Research on property and burning behavior of flammable casing for underground coal gasification

    In this work, the comprehensive properties of flammable casing for underground coal gasification are systematically investigated, including the physical, chemical, and mechanical properties of the casing material and the mechanical properties and burning behavior of full-size casing. The casing material consists of a magnesium alloy matrix and rare-earth particles, and its thermal conductivity and thermal expansion are low. High-temperature tensile tests reveal that the material has good high-temperature strength, which declines by 30% at 300 °C. Without additional protection, the material's corrosion rate is relatively high. The full-size casing possesses considerable mechanical strength, good thread properties, and high-temperature collapse resistance. The casing burns safely and stably, and the burning rate of the material can be effectively controlled by water flow. The combustion product is a powder and therefore poses no risk of blocking the gasification channel. In summary, flammable casing is necessary for realizing the underground coal gasification process and plays a significant role in the development and application of underground coal gasification technology.

    3D-printed hierarchical porous and multidimensional conductive network based on conducting polymer/graphene oxide

    Designing ultrathick, hierarchical electrodes is an effective way to address the challenge of achieving both high areal capacity and high power density in lithium-ion batteries (LIBs). Here, a thick electrode with a hierarchical porous structure and a multidimensional conductive network is fabricated by 3D printing, in which both the conducting polymer poly(3,4-ethylenedioxythiophene):polystyrene sulfonate (PEDOT:PSS) and graphene oxide (GO) play dual roles as binders and conductive agents. As a result, the 3D-printed thick electrode (∼900 μm) with a mass loading of ∼47 mg/cm² exhibits good rate capability (122 mA·h/g at 2 C), a high areal capacity of up to 5.8 mA·h/cm², and stable cycling performance, with ∼95% capacity retention after 100 cycles. Moreover, spectral analysis and DFT calculations confirm the formation of a C-O-S bond, which not only hinders the stacking of nanosheets but also enhances the mechanical stability and electronic conductivity of the electrodes. The stable, covalently linked multidimensional conductive network constructed by 3D printing provides a new design strategy for improving the performance of LIBs.
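Areal capacity is the product of specific capacity and mass loading, which gives a quick consistency check on the figures above. Assuming the ~47 mg/cm² loading is the mass on which the 122 mA·h/g specific capacity is normalized (the abstract does not state this explicitly), the 2 C value corresponds to:

```latex
C_{\mathrm{areal}} = C_{\mathrm{specific}} \times m_{\mathrm{loading}}
  \approx 122\ \mathrm{mA\,h\,g^{-1}} \times 0.047\ \mathrm{g\,cm^{-2}}
  \approx 5.7\ \mathrm{mA\,h\,cm^{-2}}
```

This is of the same order as the reported areal capacity of up to 5.8 mA·h/cm².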