18 research outputs found

    Statistical approaches of gene set analysis with quantitative trait loci for high-throughput genomic studies.

    Recently, gene set analysis has become the first choice for gaining insights into the underlying complex biology of diseases through high-throughput genomic studies, such as microarrays, bulk RNA-sequencing, and single-cell RNA-sequencing. It also reduces the complexity of statistical analysis and enhances the explanatory power of the obtained results. However, the statistical structure and steps common to these approaches have not yet been comprehensively discussed, which limits their utility. Hence, a comprehensive overview of the available gene set analysis approaches used for different high-throughput genomic studies is provided. The analysis of gene sets is usually carried out based on gene ontology terms or known biological pathways, which may not establish any formal relation between genotype and trait-specific phenotype. Further, in plant biology and breeding, gene set analysis with trait-specific Quantitative Trait Loci data is considered to be a great source for biological knowledge discovery. Therefore, innovative statistical approaches are developed for analyzing and interpreting gene expression data from microarray and RNA-sequencing studies in the context of gene sets with trait-specific Quantitative Trait Loci. The utility of the developed approaches is studied on multiple real gene expression datasets obtained from various microarray and RNA-sequencing studies. The selection of gene sets through differential expression analysis is the primary step of gene set analysis, and can be achieved using gene selection methods. The existing methods for such analysis in high-throughput studies, such as microarray and RNA-sequencing studies, suffer from serious limitations. For instance, in microarrays, most of the available methods are based on either relevancy or redundancy measures.
Through these methods, the ranking of genes is done on single Microarray expression data, which leads to the selection of spuriously associated, and redundant gene sets. Therefore, newer, and innovative differential expression analytical methods have been developed for Microarrays, and single-cell RNA-sequencing studies for identification of gene sets to successfully carry out the gene set and other downstream analyses. Furthermore, several methods specifically designed for single-cell data have been developed in the literature for the differential expression analysis. To provide guidance on choosing an appropriate tool or developing a new one, it is necessary to review the performance of the existing methods. Hence, a comprehensive overview, classification, and comparative study of the available single-cell methods is hereby undertaken to study their unique features, underlying statistical models and their shortcomings on real applications. Moreover, to address one of the shortcomings (i.e., higher dropout events due to lower cell capture rates), an improved statistical method for downstream analysis of single-cell data has been developed. From the users’ point of view, the different developed statistical methods are implemented in various software tools and made publicly available. These methods and tools will help the experimental biologists and genome researchers to analyze their experimental data more objectively and efficiently. Moreover, the limitations and shortcomings of the available methods are reported in this study, and these need to be addressed by statisticians and biologists collectively to develop efficient approaches. These new approaches will be able to analyze high-throughput genomic data more efficiently to better understand the biological systems and increase the specificity, sensitivity, utility, and relevance of high-throughput genomic studies

    DISSECTION OF STRESS RESPONSE NETWORKS REGULATING MULTIPLE STRESSES IN RICE

    Important food crops like rice are constantly exposed to various stresses that can have devastating effect on their survival and productivity. Being sessile, these highly evolved organisms have developed elaborate molecular machineries to sense a mixture of stress signals and elicit a precise response to minimize the damage. However, recent discoveries revealed that the interplay of these stress regulatory and signaling molecules is highly complex and remains largely unknown. In this work, we conducted large scale analysis of differential gene expression using advanced computational methods to dissect regulation of stress response which is at the heart of all molecular changes leading to the observed phenotypic susceptibility. One of the most important stress conditions in terms of loss of productivity is drought. We performed genomic and proteomic analysis of epigenetic and miRNA mechanisms in regulation of drought responsive genes in rice and found subsets of genes with striking properties. Overexpressed genesets included higher number of epigenetic marks, miRNA targets and transcription factors which regulate drought tolerance. On the other hand, underexpressed genesets were poor in above features but were rich in number of metabolic genes with multiple co-expression partners contributing majorly towards drought resistance. Identification and characterization of the patterns exhibited by differentially expressed genes hold key to uncover the synergistic and antagonistic components of the cross talk between stress response mechanisms. We performed meta-analysis on drought and bacterial stresses in rice and Arabidopsis, and identified hundreds of shared genes. We found high level of conservation of gene expression between these stresses. Weighted co-expression network analysis detected two tight clusters of genes made up of master transcription factors and signaling genes showing strikingly opposite expression status. 
To comprehensively identify the shared stress responsive genes between multiple abiotic and biotic stresses in rice, we performed meta-analyses of microarray studies from seven different abiotic and six biotic stresses separately and found more than thirteen hundred shared stress responsive genes. Various machine learning techniques utilizing these genes classified the stresses into two major classes, namely abiotic and biotic stresses, and into multiple classes of individual stresses with high accuracy, and identified the top genes showing distinct patterns of expression. Functional enrichment and co-expression network analysis revealed the different roles of plant hormones and transcription factors in conserved and non-conserved genesets in regulation of stress response

    A new computational framework for the classification and function prediction of long non-coding RNAs

    Long non-coding RNAs (lncRNAs) are known to play a significant role in several biological processes. These RNAs possess sequence length greater than 200 base pairs (bp), and so are often misclassified as protein-coding genes. Most Coding Potential Computation (CPC) tools fail to accurately identify, classify and predict the biological functions of lncRNAs in plant genomes, because previous research has been limited to mammalian genomes. In this thesis, various sequence and codon-bias features for identification of lncRNA sequences have been investigated and extracted to develop a new CPC framework. For identification of essential features, the framework implements regularisation-based selection. A novel classification algorithm is implemented, which removes the dependency on experimental datasets and provides a coordinate-based solution for sub-classification of lncRNAs. For imputing the lncRNA functions, lncRNA-protein interactions were first determined through co-expression of genes and then re-analysed by a sequence similarity-based approach for identification of novel interactions and prediction of lncRNA functions in the genome. The framework integrates a D3-based application for visualisation of lncRNA sequences and their associated functions in the genome. Standard evaluation metrics such as accuracy, sensitivity, and specificity have been used for benchmarking the performance of the framework against leading CPC tools. Case study analyses were conducted with plant RNA-seq datasets for evaluating the effectiveness of the framework using a cross-validation approach. The tests show the framework can provide significant improvements on existing CPC models for plant genomes: 20-40% greater accuracy. Function prediction analysis demonstrates that results are consistent with published experimental findings

    Root microbiota functions in mitigating abiotic and biotic stresses in Arabidopsis

    In nature, plants face both biotic and abiotic stresses while at the same time engaging in complex interactions with a vast diversity of commensal microorganisms comprising bacteria, fungi, and oomycetes. This so-called plant microbiota is thought to promote resistance to pathogens and tolerance to specific environmental constraints, likely driving local adaptation in natural plant populations. Reductionist approaches with synthetic microbial communities assembled from microbial culture collections and gnotobiotic plant systems now allow detailed dissection of microbiota-plant-stress interactions under strictly controlled laboratory conditions. Mechanistic understanding of how the root microbiota promotes mineral nutrition and pathogen protection in plants is now emerging. However, whether belowground response to microbial root commensals and aboveground response to abiotic stresses are connected remains largely unexplored. By reconstituting a synthetic, multi-kingdom root microbiota with different microbial input ratios in two gnotobiotic systems (the calcined-clay system and the FlowPot system) (Chapter I), I first showed that distinct input ratios of bacteria, fungi, and oomycetes converge into a similar output community composition, with stable effects on Arabidopsis growth. By testing different abiotic and biotic stresses in three gnotobiotic plant systems (the FlowPot system, the calcined-clay system, and the white sand system) (Chapter I), I provided evidence that salt, drought, and shade stresses negatively affected plant growth across all three systems, whereas nutritional stress affected plant performance in a system-dependent manner. Moreover, I demonstrated that a synthetic multi-kingdom root microbiota rescued Arabidopsis growth under salt, drought and light limitation stresses in the FlowPot system and the white sand system (Chapter I).
Given the importance of light for plant growth, in chapter II, I further dissected the extent to which response to the synthetic root microbiota and light are interconnected. By manipulating light conditions (low photosynthetically active radiation, LP; end of day far red-light treatment, EODFR) in the FlowPot system, I demonstrated that microbial root commensals confer Arabidopsis tolerance to light limitation stresses and that reciprocally, modification in aboveground light condition shifts the composition of root microbial communities. Notably, this shift in the structure of root bacterial community significantly explains the microbiota-induced growth rescue under LP. Arabidopsis transcriptome analysis revealed that immune responses in root and systemic defense responses in shoot were induced in the presence of the root microbiota under normal light conditions. These host responses were largely shut down under light limiting conditions and were correlated with increased susceptibility to unrelated leaf pathogens, implying that root microbiota-induced systemic defense responses were modulated by light. Through an extensive Arabidopsis mutant screen, I demonstrated that root microbiota-mediated plant survival under LP depends on jasmonic acid biosynthesis and signaling, cryptochromes and brassinosteroids. Furthermore, I present genetic evidence that orchestration of this light-dependent growth-defense trade-off requires the transcriptional regulator MYC2. The data suggest that plants can take advantage of root commensals to activate either growth or defense depending on aboveground light conditions

    Systems Analytics and Integration of Big Omics Data

    A “genotype” is essentially an organism's full hereditary information which is obtained from its parents. A “phenotype” is an organism's actual observed physical and behavioral properties. These may include traits such as morphology, size, height, eye color, metabolism, etc. One of the pressing challenges in computational and systems biology is genotype-to-phenotype prediction. This is challenging given the amount of data generated by modern Omics technologies. This “Big Data” is so large and complex that traditional data processing applications are not up to the task. Challenges arise in collection, analysis, mining, sharing, transfer, visualization, archiving, and integration of these data. In this Special Issue, there is a focus on the systems-level analysis of Omics data, recent developments in gene ontology annotation, and advances in biological pathways and network biology. The integration of Omics data with clinical and biomedical data using machine learning is explored. This Special Issue covers new methodologies in the context of gene–environment interactions, tissue-specific gene expression, and how external factors or host genetics impact the microbiome

    Dynamical Modeling Techniques for Biological Time Series Data

    The present thesis is organized around two main topics that share a common aim: modeling the dynamical properties of complex biological systems from large-scale time-series data. On the one hand, this thesis analyzes the inverse problem of reconstructing Gene Regulatory Networks (GRNs) from gene expression data. This first topic seeks to reverse-engineer the transcriptional regulatory mechanisms involved in a few biological systems of interest, which is vital to understanding the specificities of their different responses. In the light of recent mathematical developments, a novel, flexible and interpretable modeling strategy is proposed to reconstruct the dynamical dependencies between genes from short time-series data. In addition, experimental trade-offs and optimal modeling strategies are investigated for given data availability. A consistent literature on these topics was previously lacking. The proposed methodology is applied to the study of circadian rhythms, which consist of complex GRNs driving most daily biological activity across many species. On the other hand, this manuscript covers the characterization of dynamically differentiable brain states in zebrafish in the context of epilepsy and epileptogenesis. Zebrafish larvae represent a valuable animal model for the study of epilepsy due to both their genetic and dynamical resemblance to humans. The fundamental premise of this research is the early appearance of subtle functional changes preceding the clinical symptoms of seizures. More generally, this idea, based on bifurcation theory, can be described as a progressive loss of resilience of the brain and, ultimately, its transition from a healthy state to another state characterizing the disease. First, the morphological signatures of seizures generated by distinct pathological mechanisms are investigated. For this purpose, a range of mathematical biomarkers that characterize relevant dynamical aspects of the neurophysiological signals are considered.
Such mathematical markers are later used to address the subtle manifestations of early epileptogenic activity. Finally, the feasibility of a probabilistic prediction model that indicates the susceptibility of seizure emergence over time is investigated. The existence of alternative stable system states and their sudden and dramatic changes have notably been observed in a wide range of complex systems such as in ecosystems, climate or financial markets

    The senescence associated gene HvS40 of barley

    In this study, the senescence-associated gene HvS40 of barley was characterized as a dual-coding gene, making it the first dual-coding gene described in plants. The alternative reading frame S40+1, which extends beyond the canonical reading frame S40+3 in the 5' direction, can also be found in other monocotyledonous species but not in dicotyledonous plants such as Arabidopsis. The S40 protein of barley, encoded by the canonical reading frame S40+3, belongs to the plant-specific DUF584 protein family, which is present in monocotyledons as well as in dicotyledons. A phylogenetic tree generated from the protein sequences of the monocotyledonous DUF584 proteins shows that they can be divided into seven groups. Within this tree, the DUF584 proteins that carry an alternative reading frame in their coding sequence form a separate phylogenetic subgroup. Subcellular localization experiments with PEND and GFP fusion proteins in transiently transformed onion epidermis cells showed that Alt-HvS40:GFP and Alt-TdS40:GFP, the Alt-S40 proteins of barley and wheat, co-localize with the PEND protein of Arabidopsis in plastids and the nucleus of the same cell, whereas the S40 proteins of barley and wheat localize exclusively in the nucleus. It was further demonstrated that transport of Alt-HvS40:GFP into the plastids depends on the developmental state of the cell: in barley protoplasts isolated from young, non-senescent leaves, Alt-HvS40:GFP was detected in the cytoplasm and the nucleus, whereas in protoplasts derived from senescent tissue it was localized in the plastids and the nucleus of the same cell. To gain more insight into the function of the S40 proteins as potential senescence regulators, transgenic barley lines with altered S40 transcript levels were characterized. These analyses indicate that both proteins have a regulatory function during senescence: the HvS40 protein acts as a negative regulator of senescence, whereas the Alt-S40 protein appears to promote the senescence process. It is likely that both proteins function as DNA-binding proteins

    Pertanika Journal of Science & Technology


    Multi-empirical investigations on the population genetic structure, ecological niche, and regeneration of Ivesia webberi with conservation implications

    Ecosystems often contain a few cosmopolitan species and a large number of rare species. Despite their relatively low abundance and biomass, rare species support the multifunctionality and resilience of ecosystems. Therefore, empirical studies on rare and range-restricted species can increase our understanding of eco-evolutionary underpinnings of species and ecosystem persistence, and generate sufficient knowledge to design effective conservation programs. These research studies can also benefit conservation programs for rare and range-restricted species, which are often prioritized. This research focuses on Ivesia webberi, a federally threatened perennial forb, and the vegetative communities that harbor the species. Specifically, empirical studies investigated the following: (1) species-environment relationship of I. webberi using iterative and multi-year ecological niche modeling with complementary model-guided sampling, to describe and predict suitable habitats; (2) the relationship between soil seed bank and aboveground vegetation in plant communities where I. webberi is found, to understand the regeneration niche of I. webberi and assess ecological resilience of the vegetative communities; (3) genetic diversity, structure, and functional connectivity among I. webberi populations in order to characterize genetic resources and therefore evolutionary potential; (4) the relationships between genome size variation and bioclimatic variables within I. webberi and among Ivesia taxa; and (5) seed viability of I. webberi, including spatiotemporal variability and storage behavior. The 5-year iterative niche modeling study resulted in the discovery of seven novel populations, an expansion of the known species distribution range, and identification of important environmental drivers of the ecological niche of I. webberi. Native species richness was higher in aboveground vegetation in the sampled sites where I.
webberi occurs, while the soil seed bank is dominated by invasive annual grasses. This resulted in low floristic similarity between the aboveground vegetation and the soil seed bank, and highlights the importance of seeding with native plants and control of invasive plant species to maintain the ecological legacies of these sites in the Great Basin Desert. Genetic diversity is relatively low across I. webberi populations and exhibits significant spatial genetic structure; functional connectivity was influenced by synergistic effects of geographic distance and landscape features. However, I. webberi exhibits significant temporal, not geographical, variation in seed viability, and seed viability potentially declines with storage time, suggesting recalcitrant behavior. Seed viability can be reliably estimated and monitored using non-destructive x-ray imagery and multispectral imaging techniques. An 8-fold variation in genome size among 31 Ivesia taxa was observed, ranging from 0.73 pg/2C in I. baileyi var. beneolens to 5.91 pg/2C in I. lycopodioides ssp. megalopetala. This genome size variation correlated significantly with actual evapotranspiration and seed size. Inference from genome size suggests that all sampled Ivesia are diploid with 28 chromosomes. Similar significant correlations between intraspecific genome size variation in I. webberi and evapotranspiration and seed size were observed; genome size was larger in I. webberi populations closer to the species’ range center and smaller towards the margin. Relatively small genome sizes and their correlations with functional traits and energy availability indicate that genome size has adaptive significance for these desert-adapted species. Overall, the findings of these studies have advanced scientific knowledge on the eco-evolutionary processes in a range-restricted plant species in the Great Basin Desert, and provide useful information to design effective conservation programs