Lightweight Lempel-Ziv Parsing
We introduce a new approach to LZ77 factorization that uses O(n/d) words of
working space and O(dn) time for any d >= 1 (for polylogarithmic alphabet
sizes). We also describe carefully engineered implementations of alternative
approaches to lightweight LZ77 factorization. Extensive experiments show that
the new algorithm is superior in most cases, particularly at the lowest memory
levels and for highly repetitive data. As a part of the algorithm, we describe
new methods for computing matching statistics which may be of independent
interest. Comment: 12 pages
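For readers unfamiliar with the term, the following is a minimal sketch of what an LZ77 factorization produces. It is a naive quadratic-time illustration, not the O(dn)-time, O(n/d)-space algorithm the abstract describes. The function names `lz77_factorize` and `lz77_decode` and the phrase encoding (a literal character, or a `(position, length)` pair pointing at an earlier occurrence) are assumptions made for this example.

```python
def lz77_factorize(text):
    """Greedy LZ77 factorization by brute force (illustrative only).

    Each phrase is either a single literal character (first occurrence)
    or a (position, length) pair referring to the longest earlier match.
    Matches may overlap the phrase being built (self-referential copies).
    """
    factors = []
    n = len(text)
    i = 0
    while i < n:
        best_len, best_pos = 0, -1
        # Try every earlier starting position j < i as a match source.
        for j in range(i):
            k = 0
            while i + k < n and text[j + k] == text[i + k]:
                k += 1
            if k > best_len:
                best_len, best_pos = k, j
        if best_len == 0:
            factors.append(text[i])          # new character: literal phrase
            i += 1
        else:
            factors.append((best_pos, best_len))
            i += best_len
    return factors


def lz77_decode(factors):
    """Rebuild the text from the phrases produced by lz77_factorize."""
    out = []
    for f in factors:
        if isinstance(f, str):
            out.append(f)
        else:
            pos, length = f
            # Copy one character at a time so overlapping copies work.
            for k in range(length):
                out.append(out[pos + k])
    return "".join(out)


phrases = lz77_factorize("abababab")
print(phrases)
print(lz77_decode(phrases))
```

Highly repetitive input such as `"abababab"` collapses into very few phrases, which is why repetitive data is a natural benchmark for factorization algorithms.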
Mitochondrial echoes of first settlement and genetic continuity in El Salvador
Background: From Paleo-Indian times to recent historical episodes, the Mesoamerican isthmus played an important role in the distribution and patterns of variability all around the double American continent. However, the amount of genetic information currently available on Central American continental populations is very scarce. In order to shed light on the role of Mesoamerica in the peopling of the New World, the present study focuses on the analysis of the mtDNA variation in a population sample from El Salvador.
Methodology/Principal Findings: We have carried out DNA sequencing of the entire control region of the mitochondrial DNA (mtDNA) genome in 90 individuals from El Salvador. We have also compiled more than 3,985 control region profiles from the public domain and the literature in order to carry out inter-population comparisons. The results reveal a predominant Native American component in this region: by far, the most prevalent mtDNA haplogroup in this country (at ~90%) is A2, in contrast with other North, Meso- and South American populations. Haplogroup A2 shows a star-like phylogeny and is very diverse with a substantial proportion of mtDNAs (45%; sequence range 16090–16365) still unobserved in other American populations. Two different Bayesian approaches used to estimate admixture proportions in El Salvador show that the majority of the mtDNAs observed come from North America. A preliminary founder analysis indicates that the settlement of El Salvador occurred about 13,400±5,200 Y.B.P. The founder age of A2 in El Salvador is close to the overall age of A2 in America, which suggests that the colonization of this region occurred within a few thousand years of the initial expansion into the Americas.
Conclusions/Significance: As a whole, the results are compatible with the hypothesis that today's A2 variability in El Salvador represents to a large extent the indigenous component of the region. Concordant with this hypothesis is also the observation of a very limited contribution from European and African women (~5%). This implies that the Atlantic slave trade had a very small demographic impact in El Salvador in contrast to its transformation of the gene pool in neighbouring populations from the Caribbean facade.
Archaeological Support for the Three-Stage Expansion of Modern Humans across Northeastern Eurasia and into the Americas
Background
Understanding the dynamics of the human range expansion across northeastern Eurasia during the late Pleistocene is central to establishing empirical temporal constraints on the colonization of the Americas [1]. Opinions vary widely on how and when the Americas were colonized, with advocates supporting either a pre-[2] or post-[1], [3], [4], [5], [6] last glacial maximum (LGM) colonization, via either a land bridge across Beringia [3], [4], [5], a sea-faring Pacific Rim coastal route [1], [3], a trans-Arctic route [4], or a trans-Atlantic oceanic route [5]. Here we analyze a large sample of radiocarbon dates from the northeast Eurasian Upper Paleolithic to identify the origin of this expansion and to estimate the velocity of the colonization wave as it moved across northern Eurasia and into the Americas.
Methodology/Principal Findings
We use diffusion models [6], [7] to quantify these dynamics. Our results show the expansion originated in the Altai region of southern Siberia ~46k BP, and from there expanded across northern Eurasia at an average velocity of 0.16 km per year. However, the movement of the colonizing wave was not continuous but underwent three distinct phases: 1) an initial expansion from 47-32k calBP; 2) a hiatus from ~32-16k calBP; and 3) a second expansion after the LGM ~16k calBP. These results provide archaeological support for the recently proposed three-stage model of the colonization of the Americas [8], [9]. Our results falsify the hypothesis of a pre-LGM terrestrial colonization of the Americas, and we discuss the importance of these empirical results in the light of alternative models.
Conclusions/Significance
Our results demonstrate that the radiocarbon record of Upper Paleolithic northeastern Eurasia supports a post-LGM terrestrial colonization of the Americas, falsifying the proposed pre-LGM terrestrial colonization of the Americas. We show that this expansion was not a simple process, but proceeded in three phases, consistent with genetic data, largely in response to the variable climatic conditions of late Pleistocene northeast Eurasia. Further, the constraints imposed by the spatiotemporal gradient in the empirical radiocarbon record across this entire region suggest that North America cannot have been colonized much before the existing Clovis radiocarbon record suggests.
Association of Mitochondrial DNA Variations with Lung Cancer Risk in a Han Chinese Population from Southwestern China
Mitochondrial DNA (mtDNA) is particularly susceptible to oxidative damage and mutation due to the high rate of reactive oxygen species (ROS) production and limited DNA-repair capacity in mitochondria. Previous studies have shown that increased mtDNA copy number, a compensatory response to damage that is associated with cigarette smoking, is associated with lung cancer risk among heavy smokers. Given that the common and “non-pathological” mtDNA variations determine differences in oxidative phosphorylation performance and ROS production, an important determinant of lung cancer risk, we hypothesize that the mtDNA variations may play roles in lung cancer risk. To test this hypothesis, we conducted a case-control study to compare the frequencies of mtDNA haplogroups and an 822 bp mtDNA deletion between 422 lung cancer patients and 504 controls. Multivariate logistic regression analysis revealed that haplogroups D and F were related to individual lung cancer resistance (OR = 0.465, 95%CI = 0.329–0.656, p<0.001; and OR = 0.622, 95%CI = 0.425–0.909, p = 0.014, respectively), while haplogroups G and M7 might be risk factors for lung cancer (OR = 3.924, 95%CI = 1.757–6.689, p<0.001; and OR = 2.037, 95%CI = 1.253–3.312, p = 0.004, respectively). Additionally, multivariate logistic regression analysis revealed that cigarette smoking was a risk factor for the 822 bp mtDNA deletion. Furthermore, the increased frequencies of the mtDNA deletion in male cigarette-smoking subjects of combined cases and controls with haplogroup D indicated that haplogroup D might be susceptible to DNA damage from external ROS caused by heavy cigarette smoking.
Origin and Post-Glacial Dispersal of Mitochondrial DNA Haplogroups C and D in Northern Asia
More than half of the northern Asian pool of human mitochondrial DNA (mtDNA) is fragmented into a number of subclades of haplogroups C and D, two of the most frequent haplogroups throughout northern, eastern, and central Asia and America. While there has been considerable recent progress in studying mitochondrial variation in eastern Asia and America at the complete genome resolution, little comparable data is available for regions such as southern Siberia – the area where most of the northern Asian haplogroups, including C and D, likely diversified. This gap in our knowledge is a serious barrier to progress in understanding the demographic pre-history of northern Eurasia in general. Here we describe the phylogeography of haplogroups C and D in the populations of northern and eastern Asia. We have analyzed 770 samples from haplogroups C and D (174 and 596, respectively) at high resolution, including 182 novel complete mtDNA sequences representing haplogroups C and D (83 and 99, respectively). The present-day variation of haplogroups C and D suggests that these mtDNA clades expanded before the Last Glacial Maximum (LGM), with their oldest lineages being present in eastern Asia. Unlike in eastern Asia, most of the northern Asian variants of haplogroups C and D began the expansion after the LGM, thus pointing to post-glacial re-colonization of northern Asia. Our results show that both haplogroups were involved in migrations from eastern Asia and southern Siberia to eastern and northeastern Europe, likely during the middle Holocene.
Beringian Standstill and Spread of Native American Founders
Native Americans derive from a small number of Asian founders who likely arrived in the Americas via Beringia. However, additional details about the initial colonization of the Americas remain unclear. To investigate the pioneering phase in the Americas we analyzed a total of 623 complete mtDNAs from the Americas and Asia, including 20 new complete mtDNAs from the Americas and seven from Asia. This sequence data was used to direct high-resolution genotyping from 20 American and 26 Asian populations. Here we describe more genetic diversity within the founder population than was previously reported. The newly resolved phylogenetic structure suggests that ancestors of Native Americans paused when they reached Beringia, during which time New World founder lineages differentiated from their Asian sister-clades. This pause in movement was followed by a swift migration southward that distributed the founder types all the way to South America. The data also suggest more recent bi-directional gene flow between Siberia and the North American Arctic.
Mitochondrial DNA diversity in indigenous populations of the southern extent of Siberia, and the origin of Native American haplogroups
In search of the ancestors of Native American mitochondrial DNA (mtDNA) haplogroups, we analyzed the mtDNA of 531 individuals from nine indigenous populations in Siberia. All mtDNAs were subjected to high-resolution RFLP analysis and sequencing of the control-region hypervariable segment I (HVS-I), and were surveyed for additional polymorphic markers in the coding region. Furthermore, the mtDNAs selected according to haplogroup/subhaplogroup status were completely sequenced. Phylogenetic analyses of the resulting data, combined with those from previously published Siberian arctic and sub-arctic populations, revealed that remnants of the ancient Siberian gene pool are still evident in Siberian populations, suggesting that the founding haplotypes of the Native American A-D branches originated in different parts of Siberia. Thus, lineage A complete sequences revealed in the Mansi of the Lower Ob and the Ket of the Lower Yenisei belong to A1, suggesting that A1 mtDNAs occasionally found in the remnants of hunting-gathering populations of northwestern and northern Siberia belonged to a common gene pool of the Siberian progenitors of Paleoindians. Moreover, lineage B1, which is the most closely related to the American B2, occurred in the Tubalar and Tuvan inhabiting the territory between the upper reaches of the Ob River in the west and the Upper Yenisei region in the east. Finally, the sequence variants of haplogroups C and D, which are most similar to Native American C1 and D1, were detected in the Ulchi of the Lower Amur. Overall, our data suggest that the immediate ancestors of the Siberian/Beringian migrants who gave rise to ancient (pre-Clovis) Paleoindians have a common origin with aboriginal people of the area now designated the Altai-Sayan Upland, as well as the Lower Amur/Sea of Okhotsk region.
Faster Lightweight Lempel-Ziv Parsing
We present an algorithm that computes the Lempel-Ziv decomposition in O(n(log σ + log log n)) time and n log σ + εn bits of space, where ε is a constant rational parameter, n is the length of the input string, and σ is the alphabet size. The n log σ bits in the space bound are for the input string itself, which is treated as read-only. © Springer-Verlag Berlin Heidelberg 2015