    Efficient gene set analysis of high-throughput data: From omics to pathway architecture of health and disease

    Background: A wide range of diseases, as well as normal variation in the physiology and development of different species, is caused by alterations in gene regulation. The study of gene expression is therefore crucial for understanding both normal physiology and disease mechanisms. High-throughput measurement technologies allow tens of thousands of genes to be profiled simultaneously. However, the large volume of data generated poses methodological challenges for inferring the biological consequences of gene expression changes. Traditional gene-wise analysis of high-dimensional data is overwhelming, prone to noise and unintuitive. Analysing sets of genes (gene set analysis, GSA) addresses this problem by boosting statistical power and biological interpretability. Despite more than a decade of research on GSA, existing methods still have serious limitations. Aims of the study: The objectives of this study were (1) to develop an efficient p-value estimation method for GSA; (2) to develop an advanced permutation method for GSA of multi-group gene expression data with few replicates; and (3) to apply the developed methods to identify novel smoking-induced epigenetic signatures at the biological pathway level. Materials and methods: The first study assessed four statistical null models for the distribution of gene set scores calculated with the Gene Set Z-score (GSZ) function from permuted gene expression data. A new GSA method, modified GSZ (mGSZ), was developed based on GSZ and the optimal distribution model. mGSZ was evaluated by comparing its results with those of seven other popular GSA methods on four publicly available gene expression datasets. The second study evaluated six permutation schemes for GSA of multi-group (more than two groups) datasets, based on the identification of reference gene sets generated with a novel data-splitting approach. A new GSA method, mGSZm, a modification of mGSZ, was developed by implementing the best permutation method for the analysis of multi-group data with fewer than six replicates per group. mGSZm was evaluated by contrasting its performance with seven other state-of-the-art GSA methods suitable for multi-group data, using three publicly available multi-group datasets. The third study applied mGSZ to GSA of genome-wide DNA methylation data from the Cardiovascular Risk in Young Finns Study (YFS) cohort, with gene sets downloaded from the Molecular Signatures Database (MSigDB). Methylation was measured in whole-blood samples from a subset of 192 individuals in the 2011 follow-up study using Illumina Infinium HumanMethylation450 BeadChips. Results: Overall, efficient and robust GSA methods were developed (studies I and II) and applied (study III). In study I, the results demonstrated a clear advantage of asymptotic p-value estimation over empirical methods. mGSZ, a GSA method based on asymptotic p-values, requires fewer permutations, which speeds up the analysis. mGSZ outperformed state-of-the-art methods in three different evaluations with three different datasets. In study II, results from a novel evaluation approach with two different datasets suggested that the proposed advanced permutation method outperformed the naive permutation method in GSA of multi-group data with fewer than six replicates. Evaluation of mGSZm, a GSA method equipped with the advanced permutation method and asymptotic...
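Study I's finding that asymptotic p-values need fewer permutations than empirical ones can be illustrated with a small sketch. With B permuted scores, an empirical p-value can never fall below 1/(B + 1), whereas a parametric null distribution fitted to the same scores yields tail probabilities beyond that floor. The normal distribution used here is only a stand-in: the abstract does not say which of the four null models mGSZ adopts, and the function names are hypothetical, not part of the mGSZ package.

```python
from statistics import NormalDist, mean, stdev
from typing import Sequence


def empirical_p(observed: float, null_scores: Sequence[float]) -> float:
    """Upper-tail empirical p-value with the usual +1 correction.

    Its smallest possible value is 1 / (B + 1) for B permuted scores.
    """
    exceed = sum(score >= observed for score in null_scores)
    return (exceed + 1) / (len(null_scores) + 1)


def asymptotic_p(observed: float, null_scores: Sequence[float]) -> float:
    """Upper-tail p-value from a normal distribution fitted to the permuted scores.

    Assumption: a normal null model; mGSZ may use a different distribution.
    """
    fitted = NormalDist(mu=mean(null_scores), sigma=stdev(null_scores))
    return 1.0 - fitted.cdf(observed)


if __name__ == "__main__":
    # Hypothetical permuted gene set scores (B = 100) and an extreme observed score.
    null = [i / 10 - 5 for i in range(100)]
    observed = 20.0
    print(empirical_p(observed, null))   # 1/101, the floor for B = 100
    print(asymptotic_p(observed, null))  # far below 1/101
```

Reaching a comparably small p-value empirically would require many more permutations, which is the speed-up the abstract attributes to asymptotic estimation.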

    Post-harvest losses in mandarin orange: A case study of Dhankuta District, Nepal.

    Worldwide, postharvest losses of fruit and vegetables are as high as 30 to 40%, and even higher in developing countries such as Nepal. A systematic survey was conducted to assess the extent of postharvest losses of oranges at the field, transport, storage and market levels from October to January, 2011. Survey data were collected through oral questionnaires, personal interviews, group discussions and informal observation in the field and at Krishi Bazar, Dharan. Orange production in Dhankuta this year was 40 to 50% lower than in the previous year, consistent with an observed alternate-bearing pattern. Consequently, the price doubled this year. The postharvest loss from harvesting to distribution was 46%. The maximum losses during harvesting, transportation, grading, packaging and marketing were 7, 25, 3, 1 and 5%, respectively. Storage losses were 5% over 2 to 4 days at Krishi Bazar, compared with 40.1% over 21 days under experimental room conditions. The losses under experimental conditions comprised 15.02% evaporation loss, 14.34% pathological loss and 10.74% other losses. The most commonly observed disease was fungal infection. Reducing postharvest losses is very important for ensuring that sufficient food, in both quantity and quality, is available to every inhabitant of our planet. Postharvest horticulturists need to coordinate their efforts with those of production horticulturists, agricultural marketing economists, engineers, food technologists and others involved in various aspects of the production and marketing system.
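The reported totals can be cross-checked with simple arithmetic. The sketch below assumes, as the figures suggest, that the stage-wise maximum losses plus the 5% market storage loss are all percentages of the initial harvest and therefore add up to the 46% total; that additive reading is an assumption, not a statement from the survey. For contrast, it also computes the smaller total obtained if each percentage instead applied to the fruit remaining after the previous stage. The dictionary and function names are illustrative.

```python
from typing import Dict

# Maximum stage-wise losses (%) reported in the survey.
STAGE_LOSSES: Dict[str, float] = {
    "harvesting": 7,
    "transportation": 25,
    "grading": 3,
    "packaging": 1,
    "marketing": 5,
    "market storage (2-4 days)": 5,
}

# Components (%) of the 21-day experimental storage loss.
EXPERIMENTAL_COMPONENTS: Dict[str, float] = {
    "evaporation": 15.02,
    "pathological": 14.34,
    "other": 10.74,
}


def additive_total(losses: Dict[str, float]) -> float:
    """Total loss if every percentage refers to the initial harvest (assumption)."""
    return sum(losses.values())


def compounded_total(losses: Dict[str, float]) -> float:
    """Total loss if each percentage applies to the fruit left after earlier stages."""
    remaining = 1.0
    for pct in losses.values():
        remaining *= 1 - pct / 100
    return (1 - remaining) * 100


if __name__ == "__main__":
    print(f"additive: {additive_total(STAGE_LOSSES):.1f}%")  # 46.0%
    print(f"compounded: {compounded_total(STAGE_LOSSES):.1f}%")
    print(f"experimental: {additive_total(EXPERIMENTAL_COMPONENTS):.2f}%")  # 40.10%
```

The three experimental components add up exactly to the reported 40.1%, and the stage-wise figures reproduce 46% only under the additive reading.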

    Pesticide applications in Agriculture and its Environmental and Human Health Impacts

    The use of chemicals, which has significantly increased productivity, is now very common in modern agriculture. The concentration of pesticides in food and in our environment has increased, with associated negative effects on human health and the environment, and excessive pesticide use has generated growing concern about these effects. Pesticides can seriously harm soil, water, land and other vegetation. Pesticide applications directly kill insects, pests, weeds and pathogens, but they can also be indirectly harmful or toxic to other organisms, including birds, beneficial insects and other non-target plants and animals. Insecticides are generally the most toxic class of pesticides; however, herbicides can also pose risks to non-target organisms. A further concern is that most pesticides and chemicals are non-biodegradable; through bioaccumulation they can enter the food chain and ultimately harm human and animal health and the wider environment and ecosystem.

    Variable impacts on Environment during Construction and Operation of Dam Projects

    Dams play a significant role in utilizing water resources and have a large impact on river ecosystems. In addition to their benefits, such as regulating stream flow and thereby preventing floods, supplying domestic and irrigation water from storage, and generating energy, they have many positive and negative effects on the environment. The acute and chronic effects of dam construction are varied and can be categorized by area, by the services the dam provides to the community and its social impacts, and by its beneficial and detrimental impacts on nearby communities and the aquatic environment. The consequences of any dam project may need to be managed through a rigorous, multifaceted approach covering climatic, hydraulic, biological, social, intellectual and archaeological aspects, among others. Dams bring many benefits and directly affect our social and environmental life, but water resource engineering and sustainable development must also address the negative effects of these developmental activities and of both major and minor dam construction projects. Throughout history, dams have been used successfully to collect, store and manage the water needed to sustain civilization, and their benefits to communities range from modest to manifold. While dams provide significant benefits to civilization, their impacts on their surroundings include resettlement and relocation, socioeconomic effects, environmental concerns, sedimentation and safety issues. Beyond their important social and ecological benefits, it is essential to mitigate the negative effects of dams on the environment in the interest of sustainable development.

    Robust multi-group gene set analysis with few replicates

    Background: Competitive gene set analysis is a standard exploratory tool for gene expression data. Permutation-based competitive gene set analysis methods are preferable to parametric ones because the latter make strong statistical assumptions that are not always met. For permutation-based methods, we permute samples rather than genes, because doing so preserves the inter-gene correlation structure. Unfortunately, until now, sample permutation-based methods have required a minimum of six replicates per sample group. Results: We propose a new permutation-based competitive gene set analysis method for multi-group gene expression data with as few as three replicates per group. The method is based on an advanced sample permutation technique that utilizes all groups within a data set for pairwise comparisons. We present a comprehensive evaluation of different permutation techniques using multiple data sets and contrast the performance of our method, mGSZm, with other state-of-the-art methods. We show that mGSZm is robust and that, despite using fewer than six replicates, we are able to consistently identify a high proportion of the top-ranked gene sets from the analysis of a substantially larger data set. Further, we highlight other methods whose performance is highly variable and appears to depend on the data set being analyzed. Conclusions: Our results demonstrate that robust gene set analysis of multi-group gene expression data is feasible with as few as three replicates. In doing so, we have extended the applicability of such approaches to resource-constrained experiments where additional data generation is prohibitively difficult or expensive. An R package implementing the proposed method and supplementary materials are available from the website http://ekhidna.biocenter.helsinki.fi/downloads/pashupati/mGSZm.html.
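The contrast between permuting samples and permuting genes can be made concrete with a minimal two-group sketch. Each permutation reassigns whole sample columns to groups, so every sample keeps its complete gene profile and the inter-gene correlation structure is untouched. The Welch-style gene score and the z-like competitive set score below are simplified stand-ins, not the GSZ/mGSZ scoring functions, and mGSZm's advanced scheme (pairwise comparisons using all groups) is not reproduced; all names are hypothetical. The sketch also shows one way to see why few replicates are a problem: a 3-versus-3 design has only C(6, 3) = 20 possible label splits, and because each split has a mirror image with the same absolute score, the smallest attainable two-sided p-value is 2/20 = 0.1 (compared with 2/924 for 6-versus-6).

```python
import itertools
import math
import statistics
from typing import Dict, List, Sequence, Tuple

Expression = Dict[str, List[float]]  # gene -> expression value per sample (column)


def gene_scores(expr: Expression, group_a: Sequence[int], group_b: Sequence[int]) -> Dict[str, float]:
    """Welch-style t statistic for each gene (group A minus group B)."""
    scores = {}
    for gene, values in expr.items():
        a = [values[i] for i in group_a]
        b = [values[i] for i in group_b]
        se = math.sqrt(statistics.variance(a) / len(a) + statistics.variance(b) / len(b))
        scores[gene] = (statistics.mean(a) - statistics.mean(b)) / (se + 1e-8)  # 1e-8 guards zero variance
    return scores


def competitive_set_score(scores: Dict[str, float], members: Sequence[str]) -> float:
    """Z-like score comparing set members with all genes (a simplified competitive statistic)."""
    in_set = [scores[g] for g in members if g in scores]
    if not in_set:
        raise ValueError("gene set has no genes in the data")
    all_scores = list(scores.values())
    spread = statistics.pstdev(all_scores) / math.sqrt(len(in_set))
    return (statistics.mean(in_set) - statistics.mean(all_scores)) / (spread + 1e-8)


def sample_permutation_test(expr: Expression, n_a: int, members: Sequence[str]) -> Tuple[float, List[float], float]:
    """Permute sample labels (columns), never genes, and return (observed, null, p).

    Samples 0..n_a-1 form group A and the rest form group B. Every split into
    groups of the original sizes is enumerated, including the observed one.
    """
    n = len(next(iter(expr.values())))
    samples = range(n)
    observed = competitive_set_score(gene_scores(expr, range(n_a), range(n_a, n)), members)
    null = []
    for perm_a in itertools.combinations(samples, n_a):
        perm_b = [i for i in samples if i not in perm_a]
        null.append(competitive_set_score(gene_scores(expr, perm_a, perm_b), members))
    extreme = sum(abs(s) >= abs(observed) for s in null)  # two-sided comparison
    return observed, null, extreme / len(null)


if __name__ == "__main__":
    # Hypothetical toy data: 4 genes, 3 + 3 samples.
    expr = {
        "g1": [5, 6, 7, 1, 2, 3],
        "g2": [6, 7, 8, 2, 1, 3],
        "g3": [1, 2, 1, 2, 1, 3],
        "g4": [3, 4, 2, 3, 2, 4],
    }
    obs, null, p = sample_permutation_test(expr, 3, ["g1", "g2"])
    print(len(null), p)  # 20 splits; p cannot be smaller than 2/20 here
```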

    Epigenome-450K-wide methylation signatures of active cigarette smoking: The Young Finns Study

    Smoking, a major risk factor for morbidity, affects numerous regulatory systems of the human body, including DNA methylation. Most previous studies of genome-wide methylation data are based on conventional association analysis and early threshold-based gene set analysis, which lacks the sensitivity to reveal all the relevant effects of smoking. The aim of the present study was to investigate the impact of active smoking on DNA methylation at three biological levels: 5'-C-phosphate-G-3' (CpG) sites, genes and functionally related genes (gene sets). Gene set analysis was performed with mGSZ, a modern threshold-free method previously developed by us that utilizes all the genes in the experiment and their differential methylation scores. Applying such a method to DNA methylation data is novel. Epigenome-wide methylation levels were profiled in whole blood from Young Finns Study (YFS) participants in the 2011 follow-up using Illumina Infinium HumanMethylation450 BeadChips. We identified three novel smoking-related CpG sites and replicated 57 previously identified ones. We found that smoking is associated with hypomethylation in shores (genomic regions 0-2 kilobases from a CpG island). We identified smoking-related methylation changes in 13 gene sets with false discovery rate (FDR)...
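The shore definition above (0 to 2 kilobases from a CpG island) can be written as a small annotation rule. This sketch assumes a single chromosome with island intervals given as inclusive base-pair coordinates, and it adds the "shelf" (2 to 4 kb) and "open sea" categories used in common Illumina 450K annotation conventions; the function names and coordinates are hypothetical.

```python
import math
from typing import List, Tuple

Interval = Tuple[int, int]  # inclusive (start, end) of a CpG island, in base pairs


def distance_to_nearest_island(position: int, islands: List[Interval]) -> float:
    """Base-pair distance from a CpG site to the nearest island (0 if inside one)."""
    best = math.inf
    for start, end in islands:
        if start <= position <= end:
            return 0
        gap = start - position if position < start else position - end
        best = min(best, gap)
    return best


def island_context(position: int, islands: List[Interval]) -> str:
    """Classify a CpG site as island, shore (<= 2 kb), shelf (<= 4 kb) or open sea."""
    d = distance_to_nearest_island(position, islands)
    if d == 0:
        return "island"
    if d <= 2_000:
        return "shore"
    if d <= 4_000:
        return "shelf"
    return "open sea"


if __name__ == "__main__":
    islands = [(10_000, 11_000)]  # hypothetical island coordinates
    for pos in (10_500, 9_000, 12_500, 7_000, 20_000):
        print(pos, island_context(pos, islands))
```

CpG sites classified as "shore" this way are the ones in which the study reports smoking-associated hypomethylation.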

    Multi-Omics Integration in a Twin Cohort and Predictive Modeling of Blood Pressure Values

    Abnormal blood pressure is strongly associated with the risk of high-prevalence diseases, making the study of blood pressure a major public health challenge. Although biological mechanisms underlying hypertension have been discovered at the single-omic level, multi-omics integrative analyses of continuous variation in blood pressure values remain limited. We used a multi-omics regression-based method, sparse multi-block partial least squares, for integrative, explanatory and predictive purposes in the study of systolic and diastolic blood pressure values. Datasets were obtained from the Finnish Twin Cohort for up to 444 twins. Blocks of omics data (transcriptomic, methylation and metabolomic), as well as polygenic risk scores and clinical data, were integrated into the model and supported by cross-validation. The predictive contribution of each omics block to blood pressure prediction was investigated using external participants from the Young Finns Study. In addition to revealing interesting inter-omics associations, we found that each omics block improved the predictions of blood pressure values to a different extent once the multi-omics data were integrated. The modeling revealed a plurality of clinical, transcriptomic and metabolomic factors that are consistent with the literature and play a leading role in explaining unit variations in blood pressure. These findings demonstrate (1) the robustness of our integrative method in harnessing results obtained by single-omics discriminant analyses, and (2) the added predictive and exploratory value of a multi-omics approach in studies of complex phenotypes such as blood pressure.
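To make the "sparse" and "multi-block" parts of the method more concrete, the sketch below computes first-component weights for each omics block against a single blood pressure outcome. Each block's covariance with the centered outcome is soft-thresholded (an L1 penalty that sets weakly associated variables exactly to zero) and normalized, giving one latent component per block. This is a simplified one-component illustration under those assumptions: deflation, further components, block design weights and tuning are omitted, and it is neither the authors' pipeline nor the implementation of any particular package. The block names, toy data and penalty `lam` are hypothetical.

```python
import math
from typing import Dict, List

Matrix = List[List[float]]  # rows = individuals, columns = variables in one omics block


def center_columns(x: Matrix) -> Matrix:
    """Subtract each column's mean so covariances can be taken as dot products."""
    n = len(x)
    means = [sum(row[j] for row in x) / n for j in range(len(x[0]))]
    return [[value - means[j] for j, value in enumerate(row)] for row in x]


def soft_threshold(values: List[float], lam: float) -> List[float]:
    """L1 (lasso-type) shrinkage: entries with |value| <= lam become exactly zero."""
    return [math.copysign(max(abs(v) - lam, 0.0), v) for v in values]


def sparse_block_weights(blocks: Dict[str, Matrix], y: List[float], lam: float) -> Dict[str, List[float]]:
    """First-component sparse weights for each block against one outcome y."""
    n = len(y)
    y_mean = sum(y) / n
    y_c = [v - y_mean for v in y]
    weights = {}
    for name, x in blocks.items():
        x_c = center_columns(x)
        # Unscaled covariance (X_c^T y_c) between every variable and the outcome.
        cov = [sum(x_c[i][j] * y_c[i] for i in range(n)) for j in range(len(x_c[0]))]
        w = soft_threshold(cov, lam)
        norm = math.sqrt(sum(v * v for v in w))
        weights[name] = [v / norm for v in w] if norm > 0 else w
    return weights


def block_components(blocks: Dict[str, Matrix], weights: Dict[str, List[float]]) -> Dict[str, List[float]]:
    """Per-block latent component t_b = X_c w_b (one value per individual)."""
    return {
        name: [sum(v * w for v, w in zip(row, weights[name])) for row in center_columns(x)]
        for name, x in blocks.items()
    }


if __name__ == "__main__":
    # Hypothetical toy data: 5 individuals, two blocks of two variables each.
    bp = [1.0, 2.0, 3.0, 4.0, 5.0]
    blocks = {
        "transcriptome": [[1, 1], [2, -1], [3, 1], [4, -1], [5, 1]],
        "metabolome": [[5, 2], [4, 2], [3, 2], [2, 2], [1, 2]],
    }
    w = sparse_block_weights(blocks, bp, lam=1.0)
    print(w)  # the second variable of each block is shrunk to zero
    print(block_components(blocks, w))
```

Larger values of `lam` select fewer variables per block; in practice such penalties would be chosen by cross-validation.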

    Uncovering the complex genetics of human personality: response from authors on the PGMRA Model

    Following publication of our two articles [1, 2], a critique of the methodology of Phenotype-Genotype Many-to-Many Relations Analysis (PGMRA) [1, 3, 4] questioned the validity of our results from the perspective of polygenic risk scores (PRS) [5]. We appreciate the importance of these questions, and here provide a concise discussion of the assumptions and mathematical constraints of both approaches. We thank this commentator and others who have discussed our articles with us for their thoughtful questions and critique.

    Uncovering the complex genetic architecture of human plasma lipidome using machine learning methods

    The genetic architecture of the plasma lipidome provides insights into the regulation of lipid metabolism and related diseases. We applied an unsupervised machine learning method, PGMRA, to discover phenotype-genotype many-to-many relations between genotype and the plasma lipidome (phenotype) in order to identify the genetic architecture of the plasma lipidome profiled in 1,426 Finnish individuals aged 30-45 years. PGMRA involves biclustering genotype and lipidome data independently, followed by their inter-domain integration based on hypergeometric tests of the number of shared individuals. Pathway enrichment analysis was performed on the SNP sets to identify their associated biological processes. We identified 93 statistically significant (hypergeometric p-value < 0.01) lipidome-genotype relations. Genotype biclusters in these 93 relations contained 5977 SNPs across 3164 genes. Twenty-nine of the 93 relations contained genotype biclusters with more than 50% unique SNPs and participants, thus representing the most distinct subgroups. We identified 30 significantly enriched biological processes among the SNPs involved in 21 of these 29 most distinct genotype-lipidome subgroups, through which the identified genetic variants can influence and regulate plasma lipid-related metabolism and profiles. This study identified 29 distinct genotype-lipidome subgroups in the studied Finnish population that may have distinct disease trajectories and could therefore be useful in precision medicine research.
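The inter-domain integration step asks whether a genotype bicluster and a lipidome bicluster share more individuals than expected by chance. A minimal sketch of the corresponding upper-tail hypergeometric p-value, using only the standard library, is shown below. It assumes that both biclusters are drawn from the same N individuals (1,426 in this study) and that the test is one-sided for enrichment of shared individuals; the function name and the bicluster sizes in the usage example are hypothetical.

```python
from math import comb


def overlap_p_value(n_total: int, n_geno: int, n_lipid: int, n_shared: int) -> float:
    """P(X >= n_shared) for X ~ Hypergeometric(n_total individuals, n_geno in the
    genotype bicluster, n_lipid drawn into the lipidome bicluster).

    This is the chance that two biclusters of these sizes share at least
    n_shared individuals if membership were unrelated.
    """
    upper = min(n_geno, n_lipid)
    tail = sum(
        comb(n_geno, k) * comb(n_total - n_geno, n_lipid - k)
        for k in range(n_shared, upper + 1)
    )
    # Exact integer arithmetic, then one correctly rounded division.
    return tail / comb(n_total, n_lipid)


if __name__ == "__main__":
    # Hypothetical bicluster sizes among the 1,426 individuals studied.
    p = overlap_p_value(n_total=1426, n_geno=300, n_lipid=250, n_shared=80)
    print(p, p < 0.01)  # 0.01 is the significance threshold used in the abstract
```

Python's arbitrary-precision integers keep the binomial coefficients exact, so the tail sum does not lose precision even for cohort-sized inputs.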