80 research outputs found

    An investigation of causes of false positive single nucleotide polymorphisms using simulated reads from a small eukaryote genome

    Background: Single Nucleotide Polymorphisms (SNPs) are widely used molecular markers, and their use has increased massively since the inception of Next Generation Sequencing (NGS) technologies, which allow detection of large numbers of SNPs at low cost. However, both NGS data and their analysis are error-prone, which can lead to the generation of false positive (FP) SNPs. We explored the relationship between FP SNPs and seven factors involved in mapping-based variant calling: quality of the reference sequence, read length, choice of mapper and variant caller, mapping stringency, and filtering of SNPs by read mapping quality and read depth. This resulted in 576 possible factor level combinations. We used error- and variant-free simulated reads to ensure that every SNP found was indeed a false positive. Results: The variation in the number of FP SNPs generated ranged from 0 to 36,621 for the 120 million base pairs (Mbp) genome. All of the experimental factors tested had statistically significant effects on the number of FP SNPs generated and there was a considerable amount of interaction between the different factors. Using a fragmented reference sequence led to a dramatic increase in the number of FP SNPs generated, as did relaxed read mapping and a lack of SNP filtering. The choice of reference assembler, mapper and variant caller also significantly affected the outcome. The effect of read length was more complex and suggests a possible interaction between mapping specificity and the potential for contributing more false positives as read length increases. Conclusions: The choice of tools and parameters involved in variant calling can have a dramatic effect on the number of FP SNPs produced, with particularly poor combinations of software and/or parameter settings yielding tens of thousands in this experiment. Between-factor interactions make simple recommendations difficult for a SNP discovery pipeline but the quality of the reference sequence is clearly of paramount importance. Our findings are also a stark reminder that it can be unwise to use the relaxed mismatch settings provided as defaults by some read mappers when reads are being mapped to a relatively unfinished reference sequence from e.g. a non-model organism in its early stages of genomic exploration.
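To make the two filtering factors named in this abstract concrete (read mapping quality and read depth), here is a minimal sketch of such a filter. The `SnpCall` record, the `filter_snps` function, the example calls and both threshold values are hypothetical illustrations; the abstract does not state which thresholds or tools the authors used.

```python
from dataclasses import dataclass


@dataclass
class SnpCall:
    """One candidate SNP (hypothetical record, not a real VCF parser)."""
    position: int           # 1-based coordinate on the reference
    mapping_quality: float  # mean mapping quality of the supporting reads
    depth: int              # number of reads covering the site


def filter_snps(calls, min_mapq=30.0, min_depth=10):
    """Keep only calls that pass both thresholds (values are illustrative)."""
    return [
        c for c in calls
        if c.mapping_quality >= min_mapq and c.depth >= min_depth
    ]


# Made-up example calls: one passes, one fails on quality, one on depth.
calls = [
    SnpCall(position=1050, mapping_quality=60.0, depth=25),
    SnpCall(position=2210, mapping_quality=12.0, depth=40),
    SnpCall(position=3875, mapping_quality=55.0, depth=3),
]
kept = filter_snps(calls)
print([c.position for c in kept])
```

Only the first call survives here, because the second fails the mapping-quality threshold and the third fails the depth threshold.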

    Critical mutation rate has an exponential dependence on population size for eukaryotic-length genomes with crossover

    The critical mutation rate (CMR) determines the shift between survival-of-the-fittest and survival of individuals with greater mutational robustness (“flattest”). We identify an inverse relationship between CMR and sequence length in an in silico system with a two-peak fitness landscape; CMR decreases to no more than five orders of magnitude above estimates of eukaryotic per base mutation rate. We confirm the CMR reduces exponentially at low population sizes, irrespective of peak radius and distance, and increases with the number of genetic crossovers. We also identify an inverse relationship between CMR and the number of genes, confirming that, for a similar number of genes to that for the plant Arabidopsis thaliana (25,000), the CMR is close to its known wild-type mutation rate; mutation rates for additional organisms were also found to be within one order of magnitude of the CMR. This is the first time such a simulation model has been assigned input and produced output within range for a given biological organism. The decrease in CMR with population size previously observed is maintained; the model has the potential to inform understanding of populations undergoing bottlenecks or stress, and conservation strategy for populations near extinction.

    Assessing the functional coherence of modules found in multiple-evidence networks from Arabidopsis

    Background: Combining multiple evidence-types from different information sources has the potential to reveal new relationships in biological systems. The integrated information can be represented as a relationship network, and clustering the network can suggest possible functional modules. The value of such modules for gaining insight into the underlying biological processes depends on their functional coherence. The challenges that we wish to address are to define and quantify the functional coherence of modules in relationship networks, so that they can be used to infer function of as yet unannotated proteins, to discover previously unknown roles of proteins in diseases as well as for better understanding of the regulation and interrelationship between different elements of complex biological systems. Results: We have defined the functional coherence of modules with respect to the Gene Ontology (GO) by considering two complementary aspects: (i) the fragmentation of the GO functional categories into the different modules and (ii) the most representative functions of the modules. We have proposed a set of metrics to evaluate these two aspects and demonstrated their utility in Arabidopsis thaliana. We selected 2355 proteins for which experimentally established protein-protein interaction (PPI) data were available. From these we have constructed five relationship networks, four based on single types of data: PPI, co-expression, co-occurrence of protein names in scientific literature abstracts and sequence similarity and a fifth one combining these four evidence types. The ability of these networks to suggest biologically meaningful grouping of proteins was explored by applying Markov clustering and then by measuring the functional coherence of the clusters. Conclusions: Relationship networks integrating multiple evidence-types are biologically informative and allow more proteins to be assigned to a putative functional module. Using additional evidence types concentrates the functional annotations in a smaller number of modules without unduly compromising their consistency. These results indicate that integration of more data sources improves the ability to uncover functional association between proteins, both by allowing more proteins to be linked and producing a network where modular structure more closely reflects the hierarchy in the gene ontology.
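The abstract above says the networks were grouped by Markov clustering but gives no implementation details. As a rough sketch of the general technique (alternating expansion and inflation of a column-stochastic matrix), the pure-Python version below may help. The function names, the inflation value of 2.0, the iteration count and the toy network are all assumptions made for illustration, not the authors' settings.

```python
def normalize_columns(m):
    """Scale every column so that it sums to 1 (column-stochastic matrix)."""
    n = len(m)
    sums = [sum(m[i][j] for i in range(n)) for j in range(n)]
    return [[m[i][j] / sums[j] if sums[j] else 0.0 for j in range(n)]
            for i in range(n)]


def expand(m):
    """Expansion step: square the matrix (two-step random-walk flow)."""
    n = len(m)
    return [[sum(m[i][k] * m[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


def inflate(m, power):
    """Inflation step: element-wise power, then re-normalize the columns."""
    return normalize_columns([[v ** power for v in row] for row in m])


def markov_cluster(adjacency, inflation=2.0, iterations=50, eps=1e-6):
    """Cluster an undirected graph given as a 0/1 adjacency matrix."""
    n = len(adjacency)
    # Self-loops are added first, a common convention for this algorithm.
    m = [[float(adjacency[i][j]) + (1.0 if i == j else 0.0)
          for j in range(n)] for i in range(n)]
    m = normalize_columns(m)
    for _ in range(iterations):
        m = inflate(expand(m), inflation)
    # Each row that still carries flow defines a cluster: its non-zero columns.
    clusters = {frozenset(j for j, v in enumerate(row) if v > eps) for row in m}
    return sorted(sorted(c) for c in clusters if c)


# Toy network: two disconnected triangles (nodes 0-2 and nodes 3-5).
toy = [
    [0, 1, 1, 0, 0, 0],
    [1, 0, 1, 0, 0, 0],
    [1, 1, 0, 0, 0, 0],
    [0, 0, 0, 0, 1, 1],
    [0, 0, 0, 1, 0, 1],
    [0, 0, 0, 1, 1, 0],
]
print(markov_cluster(toy))
```

On this toy input the two triangles come out as separate clusters, since no flow can pass between disconnected components. Higher inflation values generally give finer-grained clusters.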

    HAD hydrolase function unveiled by substrate screening: enzymatic characterization of Arabidopsis thaliana subclass I phosphosugar phosphatise AtSgpp

    This work presents the isolation and the biochemical characterization of the Arabidopsis thaliana gene AtSgpp. This gene shows homology with the Arabidopsis low molecular weight phosphatases AtGpp1 and AtGpp2 and the yeast counterparts GPP1 and GPP2, which have a high specificity for dl-glycerol-3-phosphate. In addition, it exhibits homology with DOG1 and DOG2, which dephosphorylate 2-deoxy-d-glucose-6-phosphate. Using a comparative genomic approach, we identified the AtSgpp gene as a conceptually translated haloacid dehalogenase-like hydrolase (HAD) protein. AtSgpp (locus tag At2g38740) encodes a protein with a predicted Mw of 26.7 kDa and a pI of 4.6. Its sequence motifs and expected structure revealed that AtSgpp belongs to the HAD hydrolase subfamily I, with the C1-type cap domain. In the presence of Mg2+ ions, the enzyme has phosphatase activity over a wide range of phosphosugar substrates (pH optima at 7.0 and Km in the range of 3.6-7.7 mM). AtSgpp promiscuity is preferentially detectable on d-ribose-5-phosphate, 2-deoxy-d-ribose-5-phosphate, 2-deoxy-d-glucose-6-phosphate, d-mannose-6-phosphate, d-fructose-1-phosphate, d-glucose-6-phosphate, dl-glycerol-3-phosphate, and d-fructose-6-phosphate as substrates. AtSgpp is ubiquitously expressed throughout development in most plant organs, mainly in sepal and guard cell. Interestingly, expression is affected by abiotic and biotic stresses, being greatest under Pi starvation and cyclopentenone oxylipin induction. Based on both lax substrate specificity and gene expression, the physiological function of AtSgpp in housekeeping detoxification, modulation of sugar-phosphate balance and Pi homeostasis is provisionally assigned.
    We acknowledge Professors Montserrat Pages (CSIC Barcelona, Spain), Thomas Kupke (University of Heidelberg, Germany) and Manuel Hernandez (University Polytechnic of Valencia, Spain) for their warm support. We also thank Dr. Florence Vignols and Yves Meyer (University of Perpignan, France) for advice and for providing plasmid pSBETa; Ramon Nogales-Rangel and Alexis Gonzalez-Policarpo for help with computer software; and Eugenio Grau-Ferrando for kind advice and help with sequencing. This work was funded by the 10 month research contract MEC-FEDER to J.A.C.-M., the 10 month research contract JAE-DOC to I.M.-S. and the research project BIO2006-10138 from the MEC-FEDER of Spain to F.A.C.-M. In memoriam of Dr. Mari Cruz Cutanda-Perez.
    Caparrós Martín, JA.; Mccarthy Suarez, I.; Culiañez Macia, FA. (2013). HAD hydrolase function unveiled by substrate screening: enzymatic characterization of Arabidopsis thaliana subclass I phosphosugar phosphatise AtSgpp. Planta. 237(4):943-954. https://doi.org/10.1007/s00425-012-1809-5
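The AtSgpp abstract in this entry reports Km values of 3.6-7.7 mM but does not state a kinetic model. Assuming standard Michaelis-Menten kinetics (an assumption, not something the abstract confirms), the sketch below shows how such a Km range changes the reaction rate at a given substrate concentration. The function name, the arbitrary `vmax` of 1.0 and the 5 mM substrate level are illustrative choices only.

```python
def michaelis_menten_rate(substrate_mM, vmax, km_mM):
    """Rate v = Vmax * [S] / (Km + [S]) under Michaelis-Menten kinetics."""
    if substrate_mM < 0 or km_mM <= 0:
        raise ValueError("substrate must be >= 0 and Km must be > 0")
    return vmax * substrate_mM / (km_mM + substrate_mM)


# Compare the two ends of the reported Km range at an arbitrary 5 mM substrate.
for km in (3.6, 7.7):
    v = michaelis_menten_rate(5.0, vmax=1.0, km_mM=km)
    print(f"Km = {km} mM -> fraction of Vmax at 5 mM substrate: {v:.2f}")
```

By definition the rate is half of Vmax when the substrate concentration equals Km, so a substrate with the lower Km reaches half-maximal activity at a lower concentration.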

    Genetic dissection of fruit quality traits in the octoploid cultivated strawberry highlights the role of homoeo-QTL in their control

    Fruit quality traits are major breeding targets in the Rosaceae. Several of the major Rosaceae species are current or ancient polyploids. To dissect the inheritance of fruit quality traits in polyploid fleshy fruit species, we used a cultivated strawberry segregating population comprising 213 full-sibling F1 progeny from a cross between the variety ‘Capitola’ and the genotype ‘CF1116’. We previously developed the most comprehensive strawberry linkage map, which displays seven homoeology groups (HG), each including four homoeologous linkage groups (Genetics 179:2045–2060, 2008). The map was used to identify quantitative trait loci (QTL) for 19 fruit traits related to fruit development, texture, colour, anthocyanin, sugar and organic acid contents. Analyses were carried out over two or three successive years on field-grown plants. QTL were detected for all the analysed traits. Because strawberry is an octoploid species, QTL controlling a given trait and located at orthologous positions on different homoeologous linkage groups within one HG are considered as homoeo-QTL. We found that, for various traits, about one-fourth of QTL were putative homoeo-QTL and were localised on two linkage groups. Several homoeo-QTL could be detected in the same year, suggesting that several copies of the gene underlying the QTL are functional. The detection of some other homoeo-QTL was year-dependent. Therefore, changes in allelic expression could take place in response to environmental changes. We believe that, in strawberry as in other polyploid fruit species, the mechanisms unravelled in the present study may play a crucial role in the variations of fruit quality.

    From Mendel’s discovery on pea to today’s plant genetics and breeding

    In 2015, we celebrated the 150th anniversary of the presentation of the seminal work of Gregor Johann Mendel. While Darwin’s theory of evolution was based on differential survival and differential reproductive success, Mendel’s theory of heredity relies on equality and stability throughout all stages of the life cycle. Darwin’s concepts were continuous variation and “soft” heredity; Mendel espoused discontinuous variation and “hard” heredity. Thus, the combination of Mendelian genetics with Darwin’s theory of natural selection was the process that resulted in the modern synthesis of evolutionary biology. Although biology, genetics, and genomics have been revolutionized in recent years, modern genetics will forever rely on simple principles founded on pea breeding using seven single gene characters. Purposeful use of mutants to study gene function is one of the essential tools of modern genetics. Today, over 100 plant species genomes have been sequenced. Mapping populations, and their use in segregation of molecular markers and marker–trait association to map and isolate genes, were developed on the basis of Mendel's work. Genome-wide or genomic selection is a recent approach for the development of improved breeding lines. The analysis of complex traits has been enhanced by high-throughput phenotyping and developments in statistical and modeling methods for the analysis of phenotypic data. Introgression of novel alleles from landraces and wild relatives widens genetic diversity and improves traits; transgenic methodologies allow for the introduction of novel genes from diverse sources, and gene editing approaches offer possibilities to manipulate genes in a precise manner.