    Phenotypic and genotypic data integration and exploration through a web-service architecture

    Background: Linking genotypic and phenotypic information is one of the greatest challenges of current genetics research. Defining an information technology infrastructure to support such studies is a paradigmatic goal of biomedical informatics, particularly for studies of complex traits, which require multifaceted phenotype definitions and the integration of genotypic information to study the most prevalent diseases. This paper describes the use of information technology methods and tools to develop a system for the management, inspection and integration of phenotypic and genotypic data.
    Results: We present the design and architecture of the Phenotype Miner, a software system able to flexibly manage phenotypic information, and its extended functionalities for retrieving genotype information from external repositories and relating it to phenotypic data. For this purpose we developed a module that allows customized data upload by the user and a SOAP-based communication layer that retrieves data from existing biomedical knowledge management tools. We also demonstrate the system's functionality with an example application in which we analyze two related genomic datasets.
    Conclusion: We show how a comprehensive, integrated and automated workbench for genotype and phenotype integration can facilitate and improve the hypothesis generation process underlying modern genetic studies.
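    The abstract does not describe the SOAP layer itself. As a rough illustration of what such a layer exchanges, this minimal sketch only builds a SOAP 1.1 request envelope and reads values back out of a response, using the standard library's xml.etree.ElementTree. The service namespace, the getGenotypes operation, the geneSymbol and snpId element names and both helper functions are hypothetical placeholders, not the Phenotype Miner's actual interface, and the HTTP transport is omitted.

```python
import xml.etree.ElementTree as ET

# Standard SOAP 1.1 envelope namespace.
SOAP_ENV = "http://schemas.xmlsoap.org/soap/envelope/"
# Hypothetical service namespace; the paper does not specify one.
SERVICE_NS = "urn:example:genotype-service"

ET.register_namespace("soap", SOAP_ENV)
ET.register_namespace("svc", SERVICE_NS)


def build_soap_request(operation: str, params: dict) -> bytes:
    """Wrap one (hypothetical) operation call and its parameters in a SOAP 1.1 envelope."""
    envelope = ET.Element(f"{{{SOAP_ENV}}}Envelope")
    body = ET.SubElement(envelope, f"{{{SOAP_ENV}}}Body")
    call = ET.SubElement(body, f"{{{SERVICE_NS}}}{operation}")
    for name, value in params.items():
        ET.SubElement(call, f"{{{SERVICE_NS}}}{name}").text = str(value)
    return ET.tostring(envelope, encoding="utf-8", xml_declaration=True)


def parse_soap_response(payload: bytes, field: str) -> list:
    """Collect the text of every <field> element found inside the SOAP Body."""
    root = ET.fromstring(payload)
    body = root.find(f"{{{SOAP_ENV}}}Body")
    if body is None:
        raise ValueError("payload is not a SOAP envelope")
    return [el.text or "" for el in body.iter(f"{{{SERVICE_NS}}}{field}")]


# Usage: these bytes would be POSTed to the repository endpoint (transport not shown).
request = build_soap_request("getGenotypes", {"geneSymbol": "ACE"})
```

Keeping envelope construction separate from transport is one simple way to let the same layer talk to several repositories; a real client would also need to handle SOAP faults.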

    Diatoms synthesize sterols by inclusion of animal and fungal genes in the plant pathway

    Diatoms are ubiquitous microalgae that have developed remarkable metabolic plasticity and gene diversification. Here we report the first elucidation of the complete biosynthesis of sterols in this lineage. The study was carried out on the bloom-forming species Skeletonema marinoi and Cyclotella cryptica, which synthesize an ensemble of sterols with chemotypes of animals (cholesterol and desmosterol), plants (dihydrobrassicasterol and 24-methylene cholesterol), algae (fucosterol) and marine invertebrates (clionasterol). In both species, sterols derive from mevalonate through cyclization of squalene to cycloartenol by cycloartenol synthase. The pathway involves synthesis of cholesterol by enzymes of the phytosterol route in plants, as recently reported in Solanaceae. Major divergences stem from the reduction of the Δ24(28) and Δ24(25) double bonds, which in diatoms apparently depends on sterol reductases of fungi, algae and animals. Phylogenetic comparison revealed a good level of similarity between the sterol biosynthetic genes of S. marinoi and C. cryptica and those in the genomes of the other diatoms sequenced so far.

    An automated reasoning framework for translational research

    In this paper we propose a novel approach to the design and implementation of knowledge-based decision support systems for translational research, specifically tailored to the analysis and interpretation of data from high-throughput experiments. Our approach is based on a general epistemological model of the scientific discovery process that provides a well-founded framework for integrating experimental data with preexisting knowledge and with automated inference tools. To demonstrate the usefulness and power of the proposed framework, we present its application to Genome-Wide Association Studies and use it to reproduce a portion of the initial analysis performed on the well-known WTCCC dataset. Finally, we describe a computational system we are developing to assist translational research. The system, based on the proposed model, will be able to automatically plan and perform knowledge discovery steps, keep track of the inferences performed, and explain the results obtained.

    Phenotype forecasting with SNPs data through gene-based Bayesian networks

    Background: Bayesian networks are powerful instruments for learning genetic models from association study data. They can capture the correlation between genetic markers and phenotypic traits and, at the same time, the relationships between the markers themselves. However, learning Bayesian networks is often non-trivial because the number of variables in the model is high relative to the number of instances in the dataset. It is therefore useful to adopt an abstraction of the variable space that reduces its dimensionality without losing information. In this paper we present a new strategy to achieve this goal by mapping the SNPs related to the same gene to one meta-variable. To assign states to the meta-variables, we employ an approach based on classification trees.
    Results: We applied our approach to data from a genome-wide scan of 288 individuals affected by arterial hypertension and 271 nonagenarians with no history of hypertension. After pre-processing, we focused on a subset of 24 SNPs. We compared the performance of the proposed approach with that of a Bayesian network learned with SNPs as variables and of a network learned with haplotypes as meta-variables. The results were obtained by running a hold-out experiment five times. The mean accuracy of the new method was 64.28%, compared with 58.99% for the SNP network and 54.57% for the haplotype network.
    Conclusion: The new approach is able to derive a gene-based predictive model from SNP data. Such a model is more parsimonious than one based on single SNPs, while preserving the ability to highlight predictive SNP configurations. The prediction performance of this approach was consistently superior to that of the SNP-based and haplotype-based networks in all the test sets of the evaluation procedure. The method can therefore be considered an alternative way to analyze data from association studies.
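    The abstract states only that classification trees assign states to each gene meta-variable. A minimal sketch of one way this could work, under our own assumptions: genotypes are coded 0/1/2, the tree uses binary tests of the form "SNP genotype == value" scored by Gini impurity, max_depth and min_size are arbitrary limits, and every leaf becomes one state of the gene's meta-variable. The functions fit_gene_tree, gene_state, best_split and gini are hypothetical names, not the authors' code.

```python
import itertools
from collections import Counter


def gini(labels):
    """Gini impurity of a list of class labels (0.0 for an empty or pure list)."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())


def best_split(rows, labels, min_size):
    """Return the (snp_index, genotype) test 'row[snp] == genotype' that most
    reduces weighted Gini impurity, or None if no admissible split helps."""
    if not rows:
        return None
    best, best_score = None, gini(labels)
    for snp in range(len(rows[0])):
        for value in sorted({r[snp] for r in rows}):
            left = [y for r, y in zip(rows, labels) if r[snp] == value]
            right = [y for r, y in zip(rows, labels) if r[snp] != value]
            if len(left) < min_size or len(right) < min_size:
                continue
            score = (len(left) * gini(left) + len(right) * gini(right)) / len(rows)
            if score < best_score:
                best, best_score = (snp, value), score
    return best


def fit_gene_tree(rows, labels, max_depth=2, min_size=5):
    """Grow a small tree on one gene's SNPs (one tuple per individual);
    each leaf is numbered and later used as a meta-variable state."""
    leaf_ids = itertools.count()

    def grow(rows, labels, depth):
        split = best_split(rows, labels, min_size) if depth < max_depth else None
        if split is None:
            return ("leaf", next(leaf_ids))
        snp, value = split
        match = [r[snp] == value for r in rows]
        left = grow([r for r, m in zip(rows, match) if m],
                    [y for y, m in zip(labels, match) if m], depth + 1)
        right = grow([r for r, m in zip(rows, match) if not m],
                     [y for y, m in zip(labels, match) if not m], depth + 1)
        return ("split", snp, value, left, right)

    return grow(rows, labels, 0)


def gene_state(tree, row):
    """Map one individual's SNP genotypes for the gene to a meta-variable state."""
    while tree[0] == "split":
        _, snp, value, left, right = tree
        tree = left if row[snp] == value else right
    return tree[1]
```

Each individual's state index would then replace that gene's SNP columns as a single discrete node when the Bayesian network is learned, which is where the reduction in dimensionality comes from.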

    A multifactorial ‘Consensus Signature’ by in silico analysis to predict response to neoadjuvant anthracycline-based chemotherapy in triple-negative breast cancer

    BACKGROUND: Owing to the complex processes required for anthracycline-induced cytotoxicity, a prospectively defined multifactorial Consensus Signature (ConSig) might improve prediction of anthracycline response in triple-negative breast cancer (TNBC) patients, whose only standard systemic treatment option is chemotherapy. AIMS: We aimed to construct and evaluate a multifactorial signature, comprising measures of each function required for anthracycline sensitivity in TNBC. METHODS: ConSigs were constructed based on five steps required for anthracycline function: drug penetration, nuclear topoisomerase IIα (topoIIα) protein location, increased topoIIα messenger RNA (mRNA) expression, apoptosis induction, and immune activation, measured respectively by the HIF1α or SHARP1 signature, LAPTM4B mRNA, topoIIα mRNA, the Minimal Gene signature or YWHAZ mRNA, and the STAT1 signature. TNBC patients treated with neoadjuvant anthracycline-based chemotherapy without taxane were identified from publicly available gene expression data derived with Affymetrix HG-U133 arrays (training set). In silico analyses of correlation between gene expression data and pathological complete response (pCR) were performed using receiver-operating characteristic curves. To determine anthracycline specificity, ConSigs were assessed in patients treated with anthracycline plus taxane. Specificity, sensitivity, positive and negative predictive value, and odds ratio (OR) were calculated for ConSigs. Analyses were repeated in two validation gene expression data sets derived using different microarray platforms. RESULTS: In the training set, 29 of 147 patients had pCR after anthracycline-based chemotherapy. Various combinations of components were evaluated, with the most powerful anthracycline response predictors being ConSig1 (STAT1 + topoIIα mRNA + LAPTM4B) and ConSig2 (STAT1 + topoIIα mRNA + HIF1α). ConSig1 demonstrated a high negative predictive value (85%) and a high OR for no pCR (3.18), and outperformed ConSig2 in the validation sets for anthracycline specificity. CONCLUSIONS: With further validation, ConSig1 may help refine the selection of TNBC patients for anthracycline chemotherapy.
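    The metrics named in METHODS follow the usual 2×2 definitions. The sketch below assumes that "positive" means the signature predicts pCR; that orientation, the TwoByTwo class, the classifier_metrics function and the counts in the example are illustrative assumptions, not data or code from the study. With this layout, the odds ratio for no pCR (signature-negative versus signature-positive patients) reduces to the cross-product ratio tp·tn / (fp·fn).

```python
from dataclasses import dataclass


@dataclass
class TwoByTwo:
    """Counts of signature call versus observed response (assumed layout)."""
    tp: int  # signature-positive, pCR
    fp: int  # signature-positive, no pCR
    fn: int  # signature-negative, pCR
    tn: int  # signature-negative, no pCR


def classifier_metrics(t: TwoByTwo) -> dict:
    """Standard 2x2 metrics; raises ZeroDivisionError if a margin is empty."""
    return {
        "sensitivity": t.tp / (t.tp + t.fn),
        "specificity": t.tn / (t.tn + t.fp),
        "ppv": t.tp / (t.tp + t.fp),
        "npv": t.tn / (t.tn + t.fn),
        # Odds of no pCR among signature-negative patients divided by the odds
        # of no pCR among signature-positive ones: (tn/fn) / (fp/tp).
        "odds_ratio": (t.tp * t.tn) / (t.fp * t.fn),
    }


# Invented counts for illustration only; not the study's data.
m = classifier_metrics(TwoByTwo(tp=10, fp=20, fn=5, tn=65))
```

Because the cross-product ratio is symmetric, the same number is obtained whether the odds ratio is framed around pCR or around no pCR, as long as the comparison groups are swapped consistently.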

    Genetic association studies for gene expressions: permutation-based mutual information in a comparison with standard ANOVA and as a novel approach for feature selection

    Mutual information (MI) is a robust nonparametric statistical approach for identifying associations between genotypes and gene expression levels. Using the data of Problem 1 provided for the Genetic Analysis Workshop 15, we first compared a quantitative MI (Tsalenko et al. 2006 J Bioinform Comput Biol 4:259–4) with the standard analysis of variance (ANOVA) and the nonparametric Kruskal-Wallis (KW) test. We then proposed a novel feature selection approach using MI in a classification scenario to address the small-n, large-p problem, and compared it with a feature selection that relies on an asymptotic χ² distribution. In both applications, we used a permutation-based approach to evaluate the significance of MI. Substantial discrepancies in significance were observed between MI, ANOVA, and KW, which can be explained by different empirical distributions of the data. In contrast to ANOVA and KW, MI detects shifts in location when the data are non-normally distributed, skewed, or contaminated with outliers. ANOVA, but not MI, is often significant when one low-frequency genotype has a markedly different average gene expression level from the other two genotypes; because MI depends on genotype frequencies, it cannot detect such differences. In the classification scenario, we show that our novel approach to feature selection identifies a smaller list of markers with higher accuracy than the standard method. In conclusion, permutation-based MI approaches provide reliable and flexible statistical frameworks that seem well suited to data that are non-normal, skewed, or otherwise peculiarly distributed. They merit further methodological investigation.
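    The permutation idea can be sketched in a few lines of Python. This sketch deliberately swaps in a simpler estimator than the quantitative MI of Tsalenko et al.: expression values are discretized into k equal-frequency bins and a plug-in MI (in nats) is computed against the genotype codes. Significance is estimated by shuffling the binned expression values, with an add-one correction so the p-value is never zero. The function names and the defaults for k, n_perm and seed are assumptions for illustration.

```python
import math
import random
from collections import Counter


def mutual_information(x, y):
    """Plug-in mutual information (nats) between two equal-length discrete sequences."""
    n = len(x)
    joint, px, py = Counter(zip(x, y)), Counter(x), Counter(y)
    return sum((c / n) * math.log(c * n / (px[a] * py[b]))
               for (a, b), c in joint.items())


def quantile_bins(values, k):
    """Rank-based discretization of continuous values into k equal-frequency bins."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    bins = [0] * len(values)
    for rank, i in enumerate(order):
        bins[i] = rank * k // len(values)
    return bins


def permutation_mi_test(genotypes, expression, k=3, n_perm=1000, seed=0):
    """Observed MI between genotype and binned expression, plus a permutation p-value."""
    rng = random.Random(seed)
    binned = quantile_bins(expression, k)
    observed = mutual_information(genotypes, binned)
    shuffled = list(binned)
    exceed = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)  # breaks any genotype-expression link
        if mutual_information(genotypes, shuffled) >= observed:
            exceed += 1
    # Add-one correction keeps the estimate away from an impossible p = 0.
    return observed, (exceed + 1) / (n_perm + 1)
```

Because the null distribution is built by permuting the data themselves, the test makes no normality assumption, which is the property the abstract relies on for skewed or outlier-contaminated expression data.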

    Innovation in grapevine water status monitoring and drought adaptation: leaf angle and temperature regulation

    Increases in the frequency, duration, and intensity of droughts and heatwaves, and the related water and heat stress on crops, are among the principal effects of climate change. This paper reports: (i) the effect of calcite particle film (CaPF) as a mitigation strategy against heat stress under well-watered (WW) or drought-stress (D) conditions; and (ii) the response of leaf angle to changes in stomatal conductance induced by drought stress in the Aleatico grapevine cultivar. Results showed that, under WW conditions, CaPF reduced leaf temperature and increased gas exchange, but under very severe water stress the CaPF treatment was ineffective. Leaf angle ranged from 70° (WW vines) to 100° (drought-stressed vines) and showed a good fit (R² = 0.81) with stomatal conductance within the range of 0.25–0.05 mol m⁻² s⁻¹, indicating that it might be a reliable proxy of vine water status.
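    The abstract reports R² = 0.81 without stating the form of the fit. Assuming an ordinary least-squares straight line of leaf angle on stomatal conductance (an assumption, not stated in the source), R² can be computed as 1 − SS_res / SS_tot, as in this sketch; the function name linear_fit_r2 and the readings in the example are invented for illustration.

```python
def linear_fit_r2(x, y):
    """Ordinary least-squares line y = a + b*x; returns (a, b, R^2)."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two equal-length sequences with at least 2 points")
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    b = sxy / sxx              # slope
    a = my - b * mx            # intercept
    ss_res = sum((yi - (a + b * xi)) ** 2 for xi, yi in zip(x, y))
    ss_tot = sum((yi - my) ** 2 for yi in y)
    return a, b, 1.0 - ss_res / ss_tot


# Invented readings (conductance in mol m-2 s-1, leaf angle in degrees) to show the call.
gs = [0.05, 0.10, 0.15, 0.20, 0.25]
angle = [98.0, 91.0, 86.0, 79.0, 72.0]
intercept, slope, r2 = linear_fit_r2(gs, angle)
```

For a single-predictor straight-line fit, this R² equals the squared Pearson correlation between the two variables.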