21 research outputs found
Evolution of selenophosphate synthetases: emergence and relocation of function through independent duplications and recurrent subfunctionalization
Selenoproteins are proteins that incorporate selenocysteine (Sec), a nonstandard amino acid encoded by UGA, normally a stop codon. Sec synthesis requires the enzyme selenophosphate synthetase (SPS or SelD), conserved in all prokaryotic and eukaryotic genomes encoding selenoproteins. Here, we study the evolutionary history of SPS genes, providing a map of selenoprotein function spanning the whole tree of life. SPS is itself a selenoprotein in many species, although functionally equivalent homologs that replace the Sec site with cysteine (Cys) are common. Many metazoans, however, possess SPS genes with substitutions other than Sec or Cys (collectively referred to as SPS1). Using complementation assays in fly mutants, we show that these genes share a common function, which appears to be distinct from the synthesis of selenophosphate carried out by the Sec- and Cys-SPS genes (termed SPS2), and unrelated to Sec synthesis. We show here that SPS1 genes originated through a number of independent gene duplications from an ancestral metazoan selenoprotein SPS2 gene that most likely already carried the SPS1 function. Thus, in SPS genes, parallel duplications and subsequent convergent subfunctionalization have resulted in the segregation to different loci of functions initially carried by a single gene. This evolutionary history constitutes a remarkable example of the emergence and evolution of gene function, which we have been able to trace thanks to the singular features of SPS genes, wherein the amino acid at a single site unequivocally determines protein function and is intertwined with the evolutionary fate of the entire selenoproteome.
Detection of early seeding of Richter transformation in chronic lymphocytic leukemia
Richter transformation (RT) is a paradigmatic evolution of chronic lymphocytic leukemia (CLL) into a very aggressive large B cell lymphoma conferring a dismal prognosis. The mechanisms driving RT remain largely unknown. We characterized the whole genome, epigenome and transcriptome, combined with single-cell DNA/RNA-sequencing analyses and functional experiments, of 19 cases of CLL developing RT. Studying 54 longitudinal samples covering up to 19 years of disease course, we uncovered minute subclones carrying genomic, immunogenetic and transcriptomic features of RT cells already at CLL diagnosis, which were dormant for up to 19 years before transformation. We also identified new driver alterations, discovered a new mutational signature (SBS-RT), recognized an oxidative phosphorylation (OXPHOS)-high, B cell receptor (BCR)-low signaling transcriptional axis in RT and showed that OXPHOS inhibition reduces the proliferation of RT cells. These findings demonstrate the early seeding of subclones driving advanced stages of cancer evolution and uncover potential therapeutic targets for RT.
Analysis of multiple protein sequence alignments and phylogenetic trees in the context of phylogenomics studies
Phylogenomics is a biological discipline that can be understood as the intersection
of the fields of genomics and evolution. Its main focuses are the
analysis of genomes through an evolutionary lens and the understanding of
how different organisms relate to each other. Moreover, phylogenomics allows
accurate functional annotation of newly sequenced genomes.
This discipline has grown in response to the deluge of data coming from different
genome projects. To achieve its objectives, phylogenomics depends heavily
on the accuracy of the methods used to generate phylogenetic
trees. Phylogenetic trees are the basic tool of this field and serve to
represent how sequences or species relate to each other through common
ancestry. During my thesis, I have centered my efforts on improving an automated
pipeline to generate accurate phylogenetic trees and on their subsequent
publication through a public database. Among the efforts to improve the
pipeline, I have especially focused on the problem of multiple sequence alignment
post-processing, which has been shown to be central to the reliability
of subsequent analyses. Subsequently, I have applied this pipeline, and a
battery of other phylogenomics tools, to the study of the phylogenetic position
of Microsporidia, a group of fast-evolving intracellular parasites. Due
to their special genomic features, the evolution of Microsporidia constitutes one of
the classical examples of challenging problems for phylogenomics. Finally,
I have also used the pipeline as part of a newly designed method for selecting
robust combinations of phylogenetic gene markers. I have used this
method to select optimal gene sets to assess the phylogenetic relationships
within fungi and cyanobacteria, showing that the potential of these
genes as phylogenetic markers goes well beyond the species used for their
selection.
Phylogenomics is a biological discipline that can be understood as the
intersection between the fields of genomics and evolution. Its area of
study is the evolutionary analysis of genomes and how different species
relate to one another. In addition, phylogenomics aims to functionally
annotate newly sequenced genomes with high accuracy. Indeed, this discipline
has grown rapidly in recent years in response to the flood of data coming
from different genome projects. To achieve its objectives, phylogenomics
depends largely on the different methods used to generate phylogenetic trees.
Phylogenetic trees are the basic tools of phylogenomics and serve to
represent how sequences and species are related by descent. During my thesis,
I have focused my efforts on improving an automated pipeline (a set of
programs executed in a controlled manner) that generates phylogenetic trees
with high accuracy, and on how to offer these data to the scientific community
through a database. Among the efforts made to improve the pipeline, I have
focused especially on the post-processing of multiple sequence alignments
prior to any analysis, since the quality of the alignment determines that of
subsequent studies. In a more biological context, I have used this pipeline
together with other phylogenomic tools to study the phylogenetic position of
Microsporidia. Given their special genomic features, the evolution of
Microsporidia constitutes one of the classic, hard-to-solve problems in
phylogenomics. Finally, I have also used the pipeline as part of a new method
for selecting optimal combinations of genes with potential as phylogenetic
markers. Indeed, I have used this method to identify sets of phylogenetic
markers that allow the evolutionary relationships in Cyanobacteria and Fungi
to be reconstructed with a high degree of accuracy. The most interesting
aspect of this method is that it evaluates the reliability of the markers in
species not used for their selection.
A phylogenomics approach for selecting robust sets of phylogenetic markers
Includes supplementary materials for the online appendix. Reconstructing the evolutionary relationships of species is a major goal in biology. Despite the increasing number of completely sequenced genomes, a large number of phylogenetic projects rely on targeted sequencing and analysis of a relatively small sample of marker genes. The selection of these phylogenetic markers should ideally be based on accurate predictions of their combined, rather than individual, potential to accurately resolve the phylogeny of interest. Here we present and validate a new phylogenomics strategy to efficiently select a minimal set of stable markers able to reconstruct the underlying species phylogeny. In contrast to previous approaches, our methodology does not only rely on the ability of individual genes to reconstruct a known phylogeny, but it also explores the combined power of sets of concatenated genes to accurately infer phylogenetic relationships of species not previously analyzed. We applied our approach to two broad sets of cyanobacterial and ascomycetous fungal species, and provide two minimal sets of six and four genes, respectively, necessary to fully resolve the target phylogenies. This approach paves the way for the informed selection of phylogenetic markers in the effort of reconstructing the tree of life. Spanish ministry of science and innovation [BIO2012-37161 towards T.G. group research (in part)] and Qatar National Research Fund [NPRP 5-298-3-086]. Funding for open access charge: Core funding from the group of the corresponding author.
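As a rough illustration of scoring markers by their combined rather than individual power, the following toy sketch searches for the smallest gene combination that resolves a target phylogeny. It is not the authors' method. It assumes, purely for illustration, that each gene is summarized by the set of reference-tree splits it recovers and that a concatenation recovers the union of those sets, which real data need not satisfy. The function name, gene names and split labels are all hypothetical.

```python
from itertools import combinations


def smallest_resolving_set(recovered, target):
    """Return the smallest tuple of gene names whose pooled splits cover `target`.

    recovered: dict mapping a gene name to the set of reference splits it recovers.
    target:    set of splits that define the fully resolved reference tree.
    Toy assumption: a concatenation recovers the union of its genes' splits.
    """
    genes = sorted(recovered)
    # Try combinations of increasing size so the first hit is a minimal set.
    for size in range(1, len(genes) + 1):
        for combo in combinations(genes, size):
            covered = set().union(*(recovered[g] for g in combo))
            if target <= covered:
                return combo
    return None  # no combination resolves every target split


# Hypothetical input: four candidate markers and three splits to resolve.
recovered = {
    "geneA": {"s1", "s2"},
    "geneB": {"s2", "s3"},
    "geneC": {"s3"},
    "geneD": {"s1"},
}
print(smallest_resolving_set(recovered, {"s1", "s2", "s3"}))
```

The exhaustive search is exponential in the number of candidate genes, so it only suits small candidate lists; in a real analysis the scoring step would come from trees inferred on concatenated alignments rather than from a set union.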
LimTox: A web tool for applied text mining of adverse event and toxicity associations of compounds, drugs and genes
Considerable effort has been devoted to systematically retrieving information on genes and proteins as well as the relationships between them. Despite the importance of chemical compounds and drugs as central bio-entities in pharmacological and biological research, only a limited number of freely available chemical text-mining/search engine technologies are currently accessible. Here we present LimTox (Literature Mining for Toxicology), a web-based online biomedical search tool with a special focus on adverse hepatobiliary reactions. It integrates a range of text mining, named entity recognition and information extraction components. LimTox relies on machine-learning, rule-based, pattern-based and term lookup strategies. This system processes scientific abstracts, a set of full text articles and medical agency assessment reports. Although the main focus of LimTox is on adverse liver events, it also enables basic searches for other organ level toxicity associations (nephrotoxicity, cardiotoxicity, thyrotoxicity and phospholipidosis). This tool supports specialized search queries for chemical compounds/drugs, genes (with additional emphasis on key enzymes in drug metabolism, namely P450 cytochromes, CYPs) and biochemical liver markers. The LimTox website is free and open to all users and there is no login requirement. LimTox can be accessed at: http://limtox.bioinfo.cnio.es. eTOX project [IMI-115002]; European Commission H2020 project OpenMinted [654021]; Plan de Impulso de las Tecnologías del Lenguaje de la Agenda Digital (PITL) of the Secretary of State of Telecommunications of the Spanish Ministry of Energy, Tourism and the Digital Agenda; ISCIII and ERDF [PT13/0001/00]. Funding for open access charge: European Commission H2020 project OpenMinted [654021]
PhylomeDB v4: zooming into the plurality of evolutionary histories of a genome
Phylogenetic trees representing the evolutionary relationships of homologous genes are the entry point for many evolutionary analyses. For instance, the use of a phylogenetic tree can aid in the inference of orthology and paralogy relationships, and in the detection of relevant evolutionary events such as gene family expansions and contractions, horizontal gene transfer, recombination or incomplete lineage sorting. Similarly, given the plurality of evolutionary histories among genes encoded in a given genome, there is a need for the combined analysis of genome-wide collections of phylogenetic trees (phylomes). Here, we introduce a new release of PhylomeDB (http://phylomedb.org), a public repository of phylomes. Currently, PhylomeDB hosts 120 public phylomes, comprising >1.5 million maximum likelihood trees and multiple sequence alignments. In the current release, phylogenetic trees are annotated with taxonomic, protein-domain arrangement, functional and evolutionary information. PhylomeDB is also a major source for phylogeny-based predictions of orthology and paralogy, covering >10 million proteins across 1059 sequenced species. Here we describe newly implemented PhylomeDB features, and discuss a benchmark of the orthology predictions provided by the database, the impact of proteome updates and the use of the phylome approach in the analysis of newly sequenced genomes and transcriptomes. Spanish ministry of Economy and Competitiveness [BIO2012-37161]; a Grant from the Qatar National Research Fund [NPRP 5-298-3-086]; the European Research Council under the European Union’s Seventh Framework Programme [FP/2007-2013/ERC and ERC-2012-StG-310325]; Juan de La Cierva postdoctoral program (to J.H.C.) and La Caixa-CRG International Fellowship Program (to L.P.P.). Funding for open access charge: Internal budget from the CR
Standardized benchmarking in the quest for orthologs
Achieving high accuracy in orthology inference is essential for many comparative, evolutionary and functional genomic analyses, yet the true evolutionary history of genes is generally unknown and orthologs are used for very different applications across phyla, requiring different precision-recall trade-offs. As a result, it is difficult to assess the performance of orthology inference methods. Here, we present a community effort to establish standards and an automated web-based service to facilitate orthology benchmarking. Using this service, we characterize 15 well-established inference methods and resources on a battery of 20 different benchmarks. Standardized benchmarking provides a way for users to identify the most effective methods for the problem at hand, sets a minimum requirement for new tools and resources, and guides the development of more accurate orthology inference methods. This work was supported by Swiss National Science Foundation grant PP00P3_150654 (to C.D.), UK Biotechnology and Biological Sciences Research Council grant BB/L018241/1 (to C.D.), Spanish Ministry of Economy and Competitiveness grant BIO2012-37161 (to T.G.), Qatar National Research Fund NPRP 5-298-3-086 (to T.G.), European Research Council grant ERC-2012-StG-310325 (to T.G.), National Institutes of Health (NIH) grant R24 OD011883 (to S.E.L.), U41 HG002273 (to S.E.L. and P.D.T.), U41 HG007822 (to M.J.M. and I.X.), Swiss State Secretariat for Education, Research and Innovation (SERI) funding (to I.X. and C.D.), US National Science Foundation EAGER Award #1355632 (to K.S.) and ANR project BIP-BIP ANR-10-BINF-03-02 (to O.L.). Furthermore, A.S.d.S., J.H.-C., M.J.M., M.M. and P.B. acknowledge support from the European Molecular Biology Laboratory, M.M. acknowledges support from the Wellcome Trust (WT095908), S.E.L. acknowledges support from Lawrence Berkeley National Laboratory core funds (Office of Basic Energy Sciences and US Department of Energy Contract No. DE-AC02-05CH11231), L.J.J. acknowledges support from the Novo Nordisk Foundation (Grant No. NNF14CC0001) and L.P.P. acknowledges support from the La Caixa–CRG International Fellowship Program
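The precision-recall trade-off mentioned in the abstract above can be made concrete with a minimal sketch. It assumes a trusted reference set of ortholog pairs is available, which the abstract notes is generally not the case, and it does not reproduce any benchmark of the described service. The function name and gene identifiers are hypothetical.

```python
def precision_recall(predicted, reference):
    """Compare predicted ortholog pairs against a reference set of pairs.

    Pairs are treated as unordered, so ("a1", "b1") equals ("b1", "a1").
    Returns (precision, recall); an empty set yields 0.0 for its measure.
    """
    pred = {frozenset(pair) for pair in predicted}
    ref = {frozenset(pair) for pair in reference}
    true_positives = len(pred & ref)
    precision = true_positives / len(pred) if pred else 0.0
    recall = true_positives / len(ref) if ref else 0.0
    return precision, recall


# Hypothetical gene identifiers from two species, "a" and "b".
predicted = [("a1", "b1"), ("a2", "b2"), ("a3", "b4")]
reference = [("b1", "a1"), ("a2", "b2"), ("a3", "b3"), ("a4", "b4")]
precision, recall = precision_recall(predicted, reference)
print(f"precision={precision:.2f} recall={recall:.2f}")
```

A method tuned to report only high-confidence pairs would tend to raise precision at the cost of recall, which is why different applications may prefer different methods.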
Transcriptomic analysis of a psammophyte food crop, sand rice (Agriophyllum squarrosum) and identification of candidate genes essential for sand dune adaptation
Background. Sand rice (Agriophyllum squarrosum) is an annual desert plant adapted to mobile sand dunes in arid and semi-arid regions of Central Asia. Sand rice seeds have excellent nutritional value and have historically been consumed by local populations in the desert regions of northwest China. Sand rice is a potential food crop resilient to ongoing climate change; however, partly due to the scarcity of genetic information, this species has undergone little agronomic modification through classical breeding in recent years. Results. We performed deep transcriptomic sequencing of sand rice, which uncovered 67,741 unigenes. Phylogenetic analysis based on 221 single-copy genes showed a close relationship between sand rice and the recently domesticated crop sugar beet. Transcriptomic comparisons also showed a high level of global sequence conservation between these two species. Conservation of sand rice and sugar beet orthologs assigned to the "response to salt stress" gene ontology term suggests that sand rice is also a potential salt-tolerant plant. Furthermore, sand rice is far more tolerant to high temperature. A set of genes likely relevant for resistance to heat stress was functionally annotated according to expression levels, sequence annotation, and comparisons with corresponding transcriptome profiling results in Arabidopsis. Conclusions. The present work provides abundant genomic information for functional dissection of the important traits in sand rice. Future screening of the genetic variation among different ecotypes and construction of a draft genome sequence will further facilitate agronomic trait improvement and the final domestication of sand rice. TG group research is funded in part by a grant from the Spanish ministry of Economy and Competitiveness (BIO2012-37161), a grant from the Qatar National Research Fund (NPRP 5-298-3-086), and a grant from the European Research Council under the European Union's Seventh Framework Programme (FP/2007-2013)/ERC (Grant Agreement n. ERC-2012-StG-310325
The first myriapod genome sequence reveals conservative arthropod gene content and genome organisation in the centipede Strigamia maritima
Myriapods (e.g., centipedes and millipedes) display a simple homonomous body plan relative to other arthropods. All members of the class are terrestrial, but they attained terrestriality independently of insects. Myriapoda is the only arthropod class not represented by a sequenced genome. We present an analysis of the genome of the centipede Strigamia maritima. It retains a compact genome that has undergone less gene loss and shuffling than previously sequenced arthropods, and many orthologues of genes conserved from the bilaterian ancestor that have been lost in insects. Our analysis locates many genes in conserved macro-synteny contexts, and many small-scale examples of gene clustering. We describe several examples where S. maritima shows different solutions from insects to similar problems. The insect olfactory receptor gene family is absent from S. maritima, and olfaction in air is likely effected by expansion of other receptor gene families. For some genes S. maritima has evolved paralogues to generate coding sequence diversity, where insects use alternate splicing. This is most striking for the Dscam gene, which in Drosophila generates more than 100,000 alternate splice forms, but in S. maritima is encoded by over 100 paralogues. We see an intriguing linkage between the absence of any known photosensory proteins in a blind organism and the additional absence of canonical circadian clock genes. The phylogenetic position of myriapods allows us to identify where in arthropod phylogeny several particular molecular mechanisms and traits emerged. For example, we conclude that juvenile hormone signalling evolved with the emergence of the exoskeleton in the arthropods and that RR-1 containing cuticle proteins evolved in the lineage leading to Mandibulata. We also identify when various gene expansions and losses occurred. The genome of S. maritima offers us a unique glimpse into the ancestral arthropod genome, while also displaying many adaptations to its specific life history. This work was supported by the following grants: NHGRI U54 HG003273 to R.A.G.; EU Marie Curie ITN#215781 “Evonet” to M.A.; a Wellcome Trust Value in People (VIP) award to C.B., a Wellcome Trust graduate studentship WT089615MA to J.E.G., and a Wellcome Trust Investigator Award (098410/Z/12/Z) to C.R.A.; “Marine Rhythms of Life” of the University of Vienna, an FWF (http://www.fwf.ac.at/) START award (#AY0041321) and HFSP (http://www.hfsp.org/) research grant (#RGY0082/2010) to K.T-R; MFPL Vienna International PostDoctoral Program for Molecular Life Sciences (funded by Austrian Ministry of Science and Research and City of Vienna, Cultural Department - Science and Research) to T.K.; Direct Grant (4053034) of the Chinese University of Hong Kong to J.H. L.H.; NHGRI HG004164 to G.M.; Danish Research Agency (FNU), Carlsberg Foundation, and Lundbeck Foundation to C.J.P.G.; U.S. National Institutes of Health R01AI55624 to J.H.W.; Royal Society University Research fellowship to F.M.J.; P.D.E. was supported by the BBSRC via the Babraham Institute. The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscri