14 research outputs found

    HTRgene: a computational method to perform the integrated analysis of multiple heterogeneous time-series data: case analysis of cold and heat stress response signaling genes in Arabidopsis

    Background: Integrated analysis of gene expression data from multiple samples measured under the same stress can detect stress response genes more accurately than analysis of individual samples. However, integrated analysis is challenging because experimental conditions (strength of stress and number of time points) are heterogeneous across samples. Results: HTRgene is a computational method for the integrated analysis of multiple heterogeneous time-series datasets measured under the same stress condition. Its goal is to identify response order preserving DEGs, defined as genes that are not only differentially expressed but whose response order is also preserved across multiple samples. The utility of HTRgene was demonstrated using 28 and 24 time-series gene expression samples measured under cold and heat stress, respectively, in Arabidopsis. HTRgene analysis successfully reproduced known biological mechanisms of cold and heat stress in Arabidopsis. HTRgene also showed higher accuracy in detecting documented stress response genes than existing tools. Conclusions: HTRgene, a method for finding the ordering of gene response times commonly observed among multiple time-series samples, successfully integrated multiple heterogeneous time-series gene expression datasets. It can be applied to many research problems that involve integrating time-series data.
    This work, including publication costs, was supported by the National Research Foundation of Korea (NRF) funded by the Ministry of Science, ICT (No. NRF-2017M3C4A7065887). This work was also supported by the Collaborative Genome Program for Fostering New Post-Genome Industry of the National Research Foundation (NRF) funded by the Ministry of Science and ICT (MSIT) (No. NRF-2014M3C9A3063541), and a grant of the Korea Health Technology R&D Project through the Korea Health Industry Development Institute (KHIDI), funded by the Ministry of Health & Welfare, Republic of Korea (grant number: HI15C3224). This work was supported for W.J. by the Agenda program (No. PJ012465032019), Rural Development of Administration of Republic of Korea.

    StressGenePred: a twin prediction model architecture for classifying the stress types of samples and discovering stress-related genes in arabidopsis

    Background: Recently, a number of studies have investigated how plants respond to stress at the cellular molecular level by measuring gene expression profiles over time. As a result, a set of time-series gene expression data for the stress response is available in databases. With these data, an integrated analysis of multiple stresses is possible, which identifies stress-responsive genes with higher specificity because considering multiple stresses can capture the effect of interference between stresses. Analyzing such data requires a machine learning model. Results: In this study, we developed StressGenePred, a neural network-based machine learning method, to integrate time-series transcriptome data of multiple stress types. StressGenePred is designed to detect single stress-specific biomarker genes by using a simple feature embedding method, a twin neural network model, and Confident Multiple Choice Learning (CMCL) loss. The twin neural network model consists of a biomarker gene discovery model and a stress type prediction model that share the same logical layer to reduce training complexity. The CMCL loss is used to make the twin model select biomarker genes that respond specifically to a single stress. In experiments using Arabidopsis gene expression data for four major environmental stresses (heat, cold, salt, and drought), StressGenePred classified the types of stress more accurately than the limma feature embedding method and the support vector machine and random forest classification methods. In addition, StressGenePred discovered known stress-related genes with higher specificity than the Fisher method. Conclusions: StressGenePred is a machine learning method for identifying stress-related genes and predicting stress types in an integrated analysis of time-series transcriptome data from multiple stresses. This method can be applied to other phenotype-gene association studies.
    This work and publication costs were supported by the National Research Foundation of Korea (NRF) funded by the Ministry of Science, ICT (No. NRF-2017M3C4A7065887), and the Collaborative Genome Program for Fostering New Post-Genome Industry of the National Research Foundation (NRF) funded by the Ministry of Science and ICT (MSIT) (No. NRF-2014M3C9A3063541). This work was supported for W.J. by the Agenda program (No. PJ014307), Rural Development of Administration of Republic of Korea.

    Impact of mutations in DNA methylation modification genes on genome-wide methylation landscapes and downstream gene activations in pan-cancer

    In cancer, mutations of DNA methylation modification genes play crucial roles in genome-wide epigenetic modifications, which lead to the activation or suppression of important genes, including tumor suppressor genes. Mutations in epigenetic modifiers could affect enzyme activity, resulting in differences in genome-wide methylation profiles and in the activation of downstream genes. Therefore, we investigated the effect of mutations in DNA methylation modification genes such as DNMT1, DNMT3A, MBD1, MBD4, TET1, TET2 and TET3 through a pan-cancer analysis. First, we investigated the effect of mutations in DNA methylation modification genes on genome-wide methylation profiles. We collected 3,644 samples that have both mRNA and methylation data from 12 major cancer types in The Cancer Genome Atlas (TCGA). The samples were divided into two groups according to the mutational signature. Differentially methylated regions (DMRs) that overlapped with promoter regions were selected using minfi, and differentially expressed genes (DEGs) were identified using EBSeq. By integrating the DMR and DEG results, we constructed comprehensive DNA methylome profiles on a pan-cancer scale. Second, we investigated the effect of DNA methylation in promoter regions on downstream genes by comparing the two groups of samples in 11 cancer types. To investigate the effects of promoter methylation on downstream gene activation, we performed clustering analysis of DEGs. Among the DEGs, we selected highly correlated gene sets that had differentially methylated promoter regions using graph-based sub-network clustering methods. We chose a cluster of up-regulated DEGs with hypomethylated promoters in acute myeloid leukemia (LAML) and a cluster of down-regulated DEGs with hypermethylated promoters in colon adenocarcinoma (COAD). To rule out gene regulation by transcription factors (TFs), DEGs whose promoters were bound by differentially expressed TFs were excluded from the gene set affected by DNA methylation modifiers. Consequently, we identified 54 up-regulated DEGs with hypomethylated promoter DMRs in LAML and 45 down-regulated DEGs with hypermethylated promoter DMRs in COAD. Our study of DNA methylation modification genes in mutated vs. non-mutated groups could provide useful insight into the epigenetic regulation of DEGs in cancer.
    This research is supported by the National Research Foundation of Korea (NRF) funded by the Ministry of Science, ICT (No. NRF-2017M3C4A7065887), the Collaborative Genome Program for Fostering New Post-Genome Industry of the National Research Foundation (NRF) funded by the Ministry of Science and ICT (MSIT) (No. NRF-2014M3C9A3063541), and a grant of the Korea Health Technology R&D Project through the Korea Health Industry Development Institute (KHIDI) funded by the Ministry of Health & Welfare, Republic of Korea (grant number: HI15C3224). The funding bodies provided financial support but had no other role in the design of the study, data collection, analysis, and interpretation of data, decision to publish, or preparation of the manuscript.

    Ensemble Learning to Identify Depression Indicators for Korean Farmers

    Understanding the factors contributing to depression in farmers is crucial for ensuring their well-being and productivity. To address this issue, our study examines depression factors among farmers using tree-based machine learning (ML) algorithms, focusing on the Category Boosting (CatB) algorithm. Applying the Patient Health Questionnaire-9 (PHQ-9) criteria, 2,446 of 14,810 respondents were classified as having depression, including mild symptoms. In the classification, CatB achieved 79.7% accuracy and an 81.4% F1 score, outperforming the other tree-based ensemble models (Random Forest - RF, Extra Trees - ET, and XGBoost - XGB). RF showed the highest sensitivity (90.0%) and an F1 score of 81.3%, just behind CatB. Feature importances in the RF and ET models were assessed predominantly using Gini impurity. Through the analysis of feature importances, ‘Health’, ‘Sleep time’, ‘Busyness’, ‘Income’, and ‘Frequency of wearing protective gear’ were identified as significant features. These results highlight the significance of developing treatment strategies for high-risk individuals in the agricultural sector. Empowering healthcare providers by giving them access to this tool can lead to more effective interventions, potentially reducing the burden of depression and enhancing farmers’ productivity.

    IDEA: Integrating Divisive and Ensemble-Agglomerate hierarchical clustering framework for arbitrary shape data

    © 2021 IEEE.
    Hierarchical clustering, a traditional clustering method, has been getting attention again. Among several reasons, credit goes to a 2016 paper by Dasgupta that proposed a cost function for quantitatively evaluating hierarchical clustering trees. An important question is how to combine this recent advance with existing successful clustering methods. In this paper, we propose a hierarchical clustering method that minimizes the cost function of the clustering tree by incorporating existing clustering techniques. First, we developed an ensemble tree-search method that finds an integrated tree with reduced cost by integrating multiple existing hierarchical clustering methods. Second, to operate on large and arbitrary shape data, we designed an efficient hierarchical clustering framework, called integrating divisive and ensemble-agglomerate (IDEA), by combining it with advanced clustering techniques such as nearest neighbor graph construction, divisive-agglomerate hybridization, and dynamic cut tree. The IDEA clustering method showed better performance in minimizing Dasgupta's cost and improving accuracy (adjusted Rand index) than existing cost-minimization-based and density-based hierarchical clustering methods in experiments using arbitrary shape datasets and complex biology-domain datasets.
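The cost function the IDEA abstract refers to can be illustrated with a small sketch. Dasgupta's cost of a tree is the sum, over all pairs of points, of their similarity multiplied by the number of leaves under their lowest common ancestor, so similar points that are merged late are penalised. The code below is a naive illustration of that definition only, not the IDEA method itself; the nested-tuple tree encoding, the function names, and the toy similarity values are all assumptions made for this example.

```python
from itertools import combinations


def leaves(tree):
    """Return the leaf labels under a nested-tuple tree (a non-tuple is a leaf)."""
    if not isinstance(tree, tuple):
        return [tree]
    out = []
    for child in tree:
        out.extend(leaves(child))
    return out


def dasgupta_cost(tree, weight):
    """Sum over leaf pairs of weight(i, j) * (number of leaves under lca(i, j))."""
    if not isinstance(tree, tuple):
        return 0.0
    child_leaves = [leaves(c) for c in tree]
    n_here = sum(len(cl) for cl in child_leaves)
    cost = 0.0
    # Pairs separated at this node have this node as their lowest common ancestor.
    for a, b in combinations(child_leaves, 2):
        for i in a:
            for j in b:
                cost += weight(i, j) * n_here
    # Pairs that stay together in one child are charged deeper in the tree.
    return cost + sum(dasgupta_cost(c, weight) for c in tree)


# Toy similarities (hypothetical): a-b and c-d are similar, all other pairs are not.
W = {frozenset("ab"): 1.0, frozenset("cd"): 1.0}


def weight(i, j):
    return W.get(frozenset((i, j)), 0.1)


good = dasgupta_cost((("a", "b"), ("c", "d")), weight)  # similar points merged early
bad = dasgupta_cost((("a", "c"), ("b", "d")), weight)   # similar points split at the root
print(good, bad)
```

In this toy case the tree that groups the similar pairs first receives the lower cost, which is the behaviour a cost-minimising tree search would exploit.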

    Clustering and evolutionary analysis of small RNAs identify regulatory siRNA clusters induced under drought stress in rice

    This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made.
    Motivation: Drought tolerance is an important trait related to growth and yield in crops. Until now, drought-related research has focused on coding genes. However, non-coding RNAs also respond significantly to environmental stimuli such as drought stress. Unfortunately, characterizing the role of siRNAs under drought stress is difficult, since a large number of heterogeneous siRNA species are expressed under drought stress and non-coding RNAs have very weak evolutionary conservation. Thus, characterizing the role of siRNAs requires a well-designed biological and bioinformatics strategy. In this paper, to characterize the function of siRNAs, we developed and used a bioinformatics pipeline that includes a genomic-location-based clustering technique and an evolutionary conservation tool. Results: By comparing the wild type Nipponbare and two drought-resistant rice varieties, we found that 21 nt and 24 nt siRNAs are significantly expressed in the three rice plants but at different time points under a short-term (0, 1, and 6 hrs) drought treatment. siRNAs were up-regulated in the wild type at an early stage, while the up-regulation was delayed in the two drought-tolerant plants. Genes targeted by up-regulated siRNAs were related to oxidation reduction and proteolysis, which are well known to be associated with water deficit phenotypes. More interestingly, we found that siRNAs were located in intronic regions as clusters and were highly conserved evolutionarily among monocot grass plants. In summary, we show that siRNAs are important respondents to drought stress and regulate genes related to drought tolerance in water deficit conditions.

    Disanthus cercidifolius Maxim.

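    Japanese name in the original record: Marubanoki (マルバノキ), Benimansaku (ベニマンサク); Family: Hamamelidaceae (マンサク科); Collection locality: Toho University, 2-2-1 Miyama, Funabashi, Chiba Prefecture (Shimōsa, Toho University); Collection date: 1970/11/18; Collector: Haginiwa Joju (萩庭丈壽); Catalog number: JH008116; National Museum of Nature and Science catalog number: TNS-VS-95811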

    Transcriptional Network Analysis Reveals Drought Resistance Mechanisms of AP2/ERF Transgenic Rice

    This study was designed to investigate at the molecular level how a transgenic version of rice “Nipponbare” obtained a drought-resistant phenotype. Using multi-omics sequencing data, we compared wild-type rice (WT) and a transgenic version (erf71) that had obtained a drought-resistant phenotype by overexpressing OsERF71, a member of the AP2/ERF transcription factor (TF) family. A comprehensive bioinformatics analysis pipeline, including TF networks and a cascade tree, was developed for the analysis of multi-omics data. The results of the analysis showed that the presence of OsERF71 at the source of the network controlled global gene expression levels in a specific manner to make erf71 survive longer than WT. Our analysis of the time-series transcriptome data suggests that erf71 diverted more energy to survival-critical mechanisms related to translation, oxidative response, and DNA replication, while further suppressing energy-consuming mechanisms, such as photosynthesis. To support this hypothesis further, we measured the net photosynthesis level under physiological conditions, which confirmed the further suppression of photosynthesis in erf71. In summary, our work presents a comprehensive snapshot of transcriptional modification in transgenic rice and shows how this induced the plants to acquire a drought-resistant phenotype