IsoPlotter(+): A Tool for Studying the Compositional Architecture of Genomes.

Authors: Elhaik E., Graur D.
Publication venue: 'Hindawi Limited'
Publication date: 01/01/2013

Eukaryotic genomes, particularly animal genomes, have a complex, nonuniform, and nonrandom internal compositional organization. The compositional organization of animal genomes can be described as a mosaic of discrete genomic regions, called "compositional domains," each with a distinct GC content that significantly differs from those of its upstream and downstream neighboring domains. A typical animal genome consists of a mixture of compositionally homogeneous and nonhomogeneous domains of varying lengths and nucleotide compositions that are interspersed with one another. We have devised IsoPlotter, an unbiased segmentation algorithm for inferring the compositional organization of genomes. IsoPlotter has become an indispensable tool for describing genomic composition and has been used in the analysis of more than a dozen genomes. Applications include describing new genomes, correlating domain composition with gene composition and density, studying the evolution of genomes, testing phylogenomic hypotheses, and detecting regions of potential interbreeding between humans and extinct hominines. To extend the use of IsoPlotter, we designed a completely automated pipeline, called IsoPlotter(+), to carry out all segmentation analyses, including graphical display, and built a repository for compositional domain maps of all fully sequenced vertebrate and invertebrate genomes. The IsoPlotter(+) pipeline and repository offer a comprehensive solution to the study of genome compositional architecture. Here, we demonstrate IsoPlotter(+) by applying it to human and insect genomes. The computational tools and data repository are available online.
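To make the idea of partitioning a sequence into compositional domains concrete, here is a minimal Python sketch of generic recursive entropic (Jensen-Shannon) binary segmentation by GC content. It is an illustration under stated assumptions, not IsoPlotter: the fixed stopping `threshold` and the minimum domain length `min_len` are hypothetical parameters, and IsoPlotter's own segmentation and stopping rules are not reproduced here.

```python
import math


def binary_entropy(p):
    """Shannon entropy (bits) of a two-letter alphabet: GC vs. AT."""
    if p <= 0.0 or p >= 1.0:
        return 0.0
    return -(p * math.log2(p) + (1.0 - p) * math.log2(1.0 - p))


def best_split(gc_prefix, start, end, min_len):
    """Return (position, divergence) of the split of [start, end) that maximizes
    the Jensen-Shannon divergence between the GC contents of the two sides."""
    n = end - start
    total_gc = gc_prefix[end] - gc_prefix[start]
    h_whole = binary_entropy(total_gc / n)
    best_pos, best_div = None, 0.0
    for pos in range(start + min_len, end - min_len + 1):
        n_left, n_right = pos - start, end - pos
        gc_left = gc_prefix[pos] - gc_prefix[start]
        gc_right = total_gc - gc_left
        div = (h_whole
               - (n_left / n) * binary_entropy(gc_left / n_left)
               - (n_right / n) * binary_entropy(gc_right / n_right))
        if div > best_div:
            best_pos, best_div = pos, div
    return best_pos, best_div


def segment_by_gc(seq, threshold=0.05, min_len=50):
    """Recursively split seq while the best split's divergence exceeds the
    (hypothetical, fixed) threshold. Returns sorted (start, end, gc_fraction)
    tuples with end exclusive."""
    seq = seq.upper()
    if not seq:
        return []
    # gc_prefix[i] = number of G/C bases in seq[:i]; every other symbol
    # (including N) is counted on the AT side in this simplification.
    gc_prefix = [0]
    for base in seq:
        gc_prefix.append(gc_prefix[-1] + (1 if base in "GC" else 0))
    domains, pending = [], [(0, len(seq))]
    while pending:
        start, end = pending.pop()
        pos, div = None, 0.0
        if end - start >= 2 * min_len:
            pos, div = best_split(gc_prefix, start, end, min_len)
        if pos is not None and div > threshold:
            pending.extend([(start, pos), (pos, end)])
        else:
            gc = (gc_prefix[end] - gc_prefix[start]) / (end - start)
            domains.append((start, end, gc))
    return sorted(domains)
```

For example, `segment_by_gc("A" * 200 + "G" * 200)` yields one AT-only and one GC-only domain. A fixed threshold is the simplest stopping rule; it is chosen here only for readability.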

A Comparative Study and a Phylogenetic Exploration of the Compositional Architectures of Mammalian Nuclear Genomes

Authors: Elhaik E., Graur D.
Publication venue: 'Public Library of Science (PLoS)'
Publication date: 01/11/2014

For the past four decades, the compositional organization of the mammalian genome has posed a formidable challenge to molecular evolutionists attempting to explain it from an evolutionary perspective. Unfortunately, most of the explanations adhered to the “isochore theory,” which has long been rebutted. Recently, an alternative compositional domain model was proposed, depicting the human and cow genomes as composed mostly of short compositionally homogeneous and nonhomogeneous domains and a few long ones. We test the validity of this model through a rigorous sequence-based analysis of eleven completely sequenced mammalian and avian genomes. Seven attributes of compositional domains are used in the analyses: (1) the number of compositional domains, (2) compositional domain-length distribution, (3) density of compositional domains, (4) genome coverage by the different domain types, (5) degree of fit to a power-law distribution, (6) compositional domain GC content, and (7) the joint distribution of GC content and length of the different domain types. We discuss the evolution of these attributes in light of two competing phylogenetic hypotheses that differ from each other in the validity of clade Euarchontoglires. If valid, the murid genome compositional organization would be a derived state and exhibit a high similarity to that of other mammals. If invalid, the murid genome compositional organization would be closer to an ancestral state. We demonstrate that the compositional organization of the murid genome differs from those of primates and laurasiatherians, a phenomenon previously termed the “murid shift,” and in many ways resembles that of the opossum. We find no support for the “isochore theory.” Instead, our findings depict the mammalian genome as a tapestry of mostly short homogeneous and nonhomogeneous domains and a few long ones, thus providing strong evidence in favor of the compositional domain model, and appear to invalidate clade Euarchontoglires.
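Attribute (5), the degree of fit of domain lengths to a power-law distribution, can be quantified in several ways, and the abstract does not say which one the authors used. A minimal Python sketch of one standard approach, assuming a continuous power law above a user-chosen cutoff `x_min`: estimate the exponent by maximum likelihood and measure the fit with a Kolmogorov-Smirnov distance. The function names are illustrative.

```python
import math


def powerlaw_alpha_mle(lengths, x_min):
    """Continuous maximum-likelihood estimate of the power-law exponent for the
    tail x >= x_min: alpha = 1 + n / sum(ln(x / x_min)). Returns (alpha, n_tail)."""
    tail = [x for x in lengths if x >= x_min]
    log_sum = sum(math.log(x / x_min) for x in tail)
    if len(tail) < 2 or log_sum == 0.0:
        raise ValueError("need at least two tail values, not all equal to x_min")
    return 1.0 + len(tail) / log_sum, len(tail)


def ks_distance(lengths, x_min, alpha):
    """Largest gap between the empirical tail CDF and the fitted power-law CDF
    P(X <= x) = 1 - (x / x_min) ** (1 - alpha)."""
    tail = sorted(x for x in lengths if x >= x_min)
    n = len(tail)
    worst = 0.0
    for i, x in enumerate(tail):
        model = 1.0 - (x / x_min) ** (1.0 - alpha)
        # Compare against the empirical CDF just before and just after x.
        worst = max(worst, abs(i / n - model), abs((i + 1) / n - model))
    return worst
```

A smaller KS distance indicates a closer fit. In practice `x_min` is often chosen as the candidate value that minimizes this distance.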

'Genome order index' should not be used for defining compositional constraints in nucleotide sequences - a case study of the Z-curve

Authors: Elhaik E., Graur D., Josic K.
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 01/01/2010

Background: The Z-curve is a three-dimensional representation of DNA sequences proposed over a decade ago and has been extensively applied to sequence segmentation, horizontal gene transfer detection, and sequence analysis. Based on the Z-curve, a “genome order index” was proposed, defined as $S = a^2 + c^2 + t^2 + g^2$, where a, c, t, and g are the nucleotide frequencies of A, C, T, and G, respectively. This index was found to be smaller than 1/3 for almost all tested genomes, which was taken as support for the existence of a constraint on genome composition. A geometric explanation for this constraint has been suggested: each genome was represented by a point P whose distances from the four faces of a regular tetrahedron were given by the frequencies a, c, t, and g, and the proponents of the index claimed that an inscribed sphere of radius $r = 1/\sqrt{3}$ contains almost all points corresponding to various genomes, implying that $S < r^2$. The distribution of the points P obtained by S was studied using the Z-curve.

Results: In this work, we studied the basic properties of the Z-curve using the “genome order index” as a case study. We show that (1) the calculation of the radius of the inscribed sphere of a regular tetrahedron is incorrect, (2) the S index is narrowly distributed, (3) based on the second parity rule, the S index can be derived directly from the Shannon entropy and is, therefore, redundant, and (4) the Z-curve suffers from over-dimensionality, and the dimension that stands for GC content alone suffices to represent any given genome.

Conclusion: The “genome order index” S does not represent a constraint on nucleotide composition. Moreover, S can be easily computed from the Gini-Simpson index, can be derived directly from entropy, and is therefore redundant. Overall, the Z-curve and S are over-complicated substitutes for GC content and the Shannon H index, respectively.

Reviewers: This article was reviewed by Claus Wilke, Joel Bader, Marek Kimmel and Uladzislau Hryshkevich (nominated by Itai Yanai).
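The arithmetic behind results (1) and (3) can be checked numerically. A minimal Python sketch, assuming the four frequencies sum to 1 and, for the parity part, that a = t and c = g hold exactly. It relies on two standard geometric facts: the distances from any interior point of a regular tetrahedron to its four faces sum to its height, so a + c + t + g = 1 fixes the height at 1; and the inradius of a regular tetrahedron is one quarter of its height. Under parity, both S (and hence the Gini-Simpson index 1 - S) and the Shannon entropy reduce to functions of GC content alone, and S < 1/3 holds for any GC content between roughly 21% and 79%. The function names are illustrative.

```python
import math


def order_index(a, c, t, g):
    """Genome order index S = a^2 + c^2 + t^2 + g^2 (frequencies summing to 1)."""
    return a * a + c * c + t * t + g * g


def shannon_entropy_bits(freqs):
    return -sum(f * math.log2(f) for f in freqs if f > 0)


def binary_entropy(p):
    return shannon_entropy_bits((p, 1.0 - p))


def parity_frequencies(gc):
    """(a, c, t, g) under the second parity rule, a = t and c = g, for GC content gc."""
    at_half, gc_half = (1.0 - gc) / 2.0, gc / 2.0
    return at_half, gc_half, at_half, gc_half


def order_index_from_gc(gc):
    # Under parity: S = 2((1 - gc)/2)^2 + 2(gc/2)^2 = 1/2 - gc(1 - gc)
    return 0.5 - gc * (1.0 - gc)


def tetrahedron_inradius(height):
    # Regular tetrahedron: edge = height * sqrt(3/2); inradius = edge / (2 sqrt 6) = height / 4
    edge = height * math.sqrt(1.5)
    return edge / (2.0 * math.sqrt(6.0))


for gc in (0.3, 0.4, 0.5, 0.6):
    freqs = parity_frequencies(gc)
    s = order_index(*freqs)
    # Under parity the entropy equals 1 + binary_entropy(gc) bits.
    print(f"GC={gc:.1f}  S={s:.4f}  Gini-Simpson={1 - s:.4f}  "
          f"H={shannon_entropy_bits(freqs):.4f} bits  1+Hb={1 + binary_entropy(gc):.4f}")

# S < 1/3 exactly when gc(1 - gc) > 1/6, i.e. for GC strictly inside this interval:
low = (3.0 - math.sqrt(3.0)) / 6.0
print(f"S < 1/3 for {low:.4f} < GC < {1.0 - low:.4f}")
print("inradius of a unit-height regular tetrahedron:", tetrahedron_inradius(1.0))
```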

Adversarial childhood events are associated with Sudden Infant Death Syndrome (SIDS): an ecological study

Author: Elhaik E.
Publication venue: 'Journal of Clinical and Translational Research'
Publication date: 10/01/2019

Sudden Infant Death Syndrome (SIDS) is the most common cause of postneonatal infant death. The allostatic load hypothesis posits that SIDS is the result of perinatal cumulative painful, stressful, or traumatic exposures that tax neonatal regulatory systems. To test it, we explored the relationships between SIDS and two common stressors, male neonatal circumcision (MNC) and prematurity, using longitudinal data from 15 countries and over 40 US states during the years 1999-2016. We used linear regression analyses and likelihood ratio tests to calculate the association between SIDS and the stressors. SIDS prevalence was significantly and positively correlated with MNC and prematurity rates. MNC explained 14.2% of the variability of SIDS's male bias in the US, reminiscent of the Jewish myth of Lilith, the killer of infant males. Combined, the stressors increased the likelihood of SIDS. Ecological analyses are useful for generating hypotheses but cannot provide strong evidence of causality. Biological plausibility is provided by a growing body of experimental and clinical evidence linking adverse preterm and early-life events with SIDS. Together with historical evidence, our findings emphasize the necessity of cohort studies that consider these environmental stressors with the aim of improving the identification of at-risk infants and reducing infant mortality.
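The abstract pairs linear regression with likelihood ratio tests. A minimal Python sketch of that combination for a single predictor, assuming Gaussian errors: fit y = b0 + b1·x by least squares, report R² (the share of variability explained, the kind of quantity behind the 14.2% figure), and compare against the intercept-only model with LR = n·ln(RSS0/RSS1), referred to a chi-square distribution with one degree of freedom. The study's actual model specification is not given in the abstract, and the numbers below are made up.

```python
import math


def ols_fit(x, y):
    """Least-squares intercept and slope for y = b0 + b1 * x."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


def lr_test(x, y):
    """Compare the intercept-only model with y = b0 + b1 * x under Gaussian errors.
    Returns (slope, r_squared, lr_statistic, p_value)."""
    n = len(y)
    intercept, slope = ols_fit(x, y)
    mean_y = sum(y) / n
    rss_null = sum((yi - mean_y) ** 2 for yi in y)
    rss_full = sum((yi - (intercept + slope * xi)) ** 2 for xi, yi in zip(x, y))
    if rss_full == 0.0:
        raise ValueError("perfect fit: likelihood ratio is unbounded")
    r_squared = 1.0 - rss_full / rss_null
    lr = n * math.log(rss_null / rss_full)
    # Upper tail of a chi-square with 1 df: P(X > lr) = erfc(sqrt(lr / 2))
    p_value = math.erfc(math.sqrt(max(lr, 0.0) / 2.0))
    return slope, r_squared, lr, p_value


# Made-up illustration: a stressor rate (x) against an outcome rate (y) in five regions.
stressor = [1.0, 2.0, 3.0, 4.0, 5.0]
outcome = [2.1, 3.9, 6.2, 7.8, 10.1]
slope, r2, lr, p = lr_test(stressor, outcome)
```

Testing a second stressor on top of the first would need a multiple regression; the same LR statistic then compares the nested models.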

Gene expression and nucleotide composition are associated with genic methylation level in Oryza sativa

Authors: Elhaik E., Pellegrini M., Tatarinova T. V.
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 01/01/2014

Background: The methylation of cytosines at CpG dinucleotides, which plays an important role in gene expression regulation, is one of the most studied epigenetic modifications. Thus far, DNA methylation has been detected mostly by experimental methods, which are not only prone to bench effects and artifacts but are also time-consuming and expensive and cannot be easily scaled up to many samples. It is therefore useful to develop computational prediction methods for DNA methylation. Our previous studies highlighted the existence of correlations between the GC content of the third codon position (GC3), methylation, and gene expression. We thus designed a model to predict methylation in Oryza sativa based on genomic sequence features and gene expression data.

Results: We first derive equations to describe the relationship between gene methylation levels, GC3, expression, length, and other gene compositional features. We next assess gene compositional features involving sixmers and their association with methylation levels and other gene-level properties. By applying our sixmer-based approach to rice gene expression data, we show that it can accurately predict methylation (Pearson’s correlation coefficient r = 0.79) for the majority (79%) of the genes. MATLAB code with our model is included.

Conclusions: Gene expression variation can be used as a predictor of gene methylation levels.
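Two of the sequence features named above, GC3 and sixmer (6-mer) counts, are simple to compute. A minimal Python sketch, assuming an in-frame coding sequence given as a string; the Pearson correlation helper shows how predicted and measured methylation levels could be compared. This covers feature extraction only and is not the authors' MATLAB model.

```python
import math
from collections import Counter


def gc3(cds):
    """Fraction of third codon positions that are G or C (trailing partial codon ignored)."""
    cds = cds.upper()
    thirds = [cds[i + 2] for i in range(0, len(cds) - len(cds) % 3, 3)]
    if not thirds:
        raise ValueError("sequence shorter than one codon")
    return sum(base in "GC" for base in thirds) / len(thirds)


def sixmer_counts(seq):
    """Counts of all overlapping 6-mers in seq."""
    seq = seq.upper()
    return Counter(seq[i:i + 6] for i in range(len(seq) - 5))


def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)
```

For example, the codons of `"ATGGCCTAA"` end in G, C, and A, so its GC3 is 2/3.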

The Origins of Ashkenaz, Ashkenazic Jews, and Yiddish

Authors: Das R., Elhaik E., Pirooznia M., Wexler P.
Publication venue: 'Frontiers Media SA'
Publication date: 01/01/2017

Recently, the geographical origins of Ashkenazic Jews (AJs) and their native language Yiddish were investigated by applying the Geographic Population Structure (GPS) to a cohort of exclusively Yiddish-speaking and multilingual AJs. GPS localized most AJs along major ancient trade routes in northeastern Turkey adjacent to primeval villages with names that resemble the word "Ashkenaz." These findings were compatible with the hypothesis of an Irano-Turko-Slavic origin for AJs and a Slavic origin for Yiddish and at odds with the Rhineland hypothesis advocating a Levantine origin for AJs and German origins for Yiddish. We discuss how these findings advance three ongoing debates concerning (1) the historical meaning of the term "Ashkenaz;" (2) the genetic structure of AJs and their geographical origins as inferred from multiple studies employing both modern and ancient DNA and original ancient DNA analyses; and (3) the development of Yiddish. We provide additional validation to the non-Levantine origin of AJs using ancient DNA from the Near East and the Levant. Due to the rising popularity of geo-localization tools to address questions of origin, we briefly discuss the advantages and limitations of popular tools with focus on the GPS approach. Our results reinforce the non-Levantine origins of AJs.

Responding to an enquiry concerning the geographic population structure (GPS) approach and the origin of Ashkenazic Jews - a reply to Flegontov et al

Authors: Das R., Elhaik E., Pirooznia M., Wexler P.

Recently, we investigated the geographical origins of Ashkenazic Jews (AJs) and their native language Yiddish by applying a biogeographical tool, the Geographic Population Structure (GPS), to a cohort of 367 exclusively Yiddish-speaking and multilingual AJs genotyped on the Genochip microarray. GPS localized most AJs along major ancient trade routes in northeastern Turkey adjacent to primeval villages with names that may be derived from the word "Ashkenaz." These findings were compatible with the hypothesis of an Irano-Turko-Slavic origin for AJs and a Slavic origin for Yiddish and at odds with the Rhineland hypothesis advocating a German origin of both. Our approach has been recently adopted by Flegontov et al. (2016a) to trace the origin of the Siberian Ket people and their language. Recently, Flegontov et al. (2016b) have raised several questions concerning the accuracy of the Genochip microarray and GPS, specifically in relation to AJs and Yiddish. Although many of these issues have been addressed in our previous papers, we take this opportunity to clarify the principles of the GPS approach, review the recent biogeographical and ancient DNA findings regarding AJs, and comment on the origin of Yiddish.

Communicating the promise, risks, and ethics of large-scale, open space microbiome and metagenome research

Authors: Bibby K., Elhaik E., Mason C.E., Shamarina D., Stoyantcheva I.
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 01/10/2017

The public commonly associates microorganisms with pathogens. This suspicion of microorganisms is understandable, as historically microorganisms have killed more humans than any other agent while remaining largely unknown until the late seventeenth century with the works of van Leeuwenhoek and Kircher. Despite our improved understanding of microorganisms, the general public is apt to think of diseases rather than of the majority of harmless or beneficial species that inhabit our bodies and the built and natural environment. As long as microbiome research was confined to labs, the public's exposure to microbiology was limited. The recent launch of global microbiome surveys, such as the Earth Microbiome Project and MetaSUB (Metagenomics and Metadesign of Subways and Urban Biomes) project, has raised ethical, financial, feasibility, and sustainability concerns as to the public's level of understanding and potential reaction to the findings. Handled improperly, these surveys risk negative implications for ongoing and future investigations; handled correctly, they can facilitate a new vision of "smart cities." To facilitate improved future research, we describe here the major concerns that our discussions with ethics committees, community leaders, and government officials have raised, and we expound on how to address them. We further discuss ethical considerations of microbiome surveys and provide practical recommendations for public engagement.

Pair Matcher (PaM): fast model-based optimisation of treatment/case-control matches

Authors: Elhaik E., Ryan D.
Publication venue: 'Oxford University Press (OUP)'
Publication date: 16/11/2018

Motivation: In clinical trials, individuals are matched using demographic criteria, paired, and then randomly assigned to treatment and control groups to determine a drug’s efficacy. A chief cause of the irreproducibility of results from pilot to Phase III trials is population stratification bias caused by the uneven distribution of ancestries in the treatment and control groups.

Results: Pair Matcher (PaM) addresses stratification bias by optimising pairing assignments a priori and/or a posteriori to the trial using both genetic and demographic criteria. Using simulated and real datasets, we show that PaM identifies ideal and near-ideal pairs that are more genetically homogeneous than those identified by competing methods, including the commonly used principal component analysis (PCA). Homogenising the treatment (or case) and control groups can be expected to improve the accuracy and reproducibility of the trial or genetic study. PaM’s ancestral inferences also allow the characterization of responders and the development of a precision medicine approach to treatment.
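The underlying idea of pairing cases with genetically similar controls can be illustrated with a deliberately simple stand-in: greedy nearest-neighbour matching by Euclidean distance between ancestry-proportion vectors. This is an assumption-laden sketch, not PaM's model-based optimisation. It ignores the demographic criteria PaM also uses, and the sample IDs and proportions are hypothetical.

```python
import math


def euclidean(u, v):
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def greedy_match(cases, controls, max_distance=math.inf):
    """Greedily pair each case with its closest still-unused control.
    cases/controls map hypothetical sample IDs to ancestry-proportion vectors.
    Returns (case_id, control_id, distance) triples; not guaranteed globally optimal."""
    candidates = sorted(
        (euclidean(case_vec, ctrl_vec), case_id, ctrl_id)
        for case_id, case_vec in cases.items()
        for ctrl_id, ctrl_vec in controls.items()
    )
    used_cases, used_controls, pairs = set(), set(), []
    for dist, case_id, ctrl_id in candidates:
        if dist > max_distance:
            break  # candidates are sorted, so every remaining pair is farther
        if case_id in used_cases or ctrl_id in used_controls:
            continue
        pairs.append((case_id, ctrl_id, dist))
        used_cases.add(case_id)
        used_controls.add(ctrl_id)
    return pairs
```

Greedy matching is easy to follow but can miss the pairing with the smallest total distance; an optimal assignment method (for example, the Hungarian algorithm) would be the usual next step.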