84 research outputs found

    Burrows–Wheeler compression: Principles and reflections

    After a general description of the Burrows–Wheeler transform and a brief survey of recent work on processing its output, the paper examines the coding of the zero-runs from the MTF recoding stage, an aspect with little prior treatment. It is concluded that the original scheme proposed by Wheeler is extremely efficient and unlikely to be much improved. The paper then proposes some new interpretations and uses of the Burrows–Wheeler transform, with new insights and approaches to lossless compression, perhaps including techniques from error correction.

    Burrows‐Wheeler post‐transformation with effective clustering and interpolative coding

    Lossless compression methods based on the Burrows-Wheeler transform (BWT) are regarded as an excellent compromise between speed and compression efficiency: they provide compression rates close to those of the PPM algorithms, with the speed of dictionary-based methods. Instead of the laborious statistics-gathering process used in PPM, the BWT reversibly sorts the input symbols, using as the sort key as many following characters as necessary to make the sort unique. Characters occurring in similar contexts are sorted close together, resulting in a clustered symbol sequence. Run-length encoding and Move-to-Front (MTF) recoding, combined with a statistical Huffman or arithmetic coder, are then typically used to exploit the clustering. A drawback of MTF recoding is that knowledge of the character that produced the MTF number is lost. In this paper, we present a new, competitive Burrows-Wheeler post-transform stage that takes advantage of interpolative coding, a fast binary encoding method for integer sequences that can exploit clusters without requiring explicit statistics. We introduce a fast and simple way to retain knowledge of the run characters during MTF recoding and use this to improve the clustering of MTF numbers and run-lengths by applying reversible, stable sorting with the run characters as sort keys. This achieves a significant improvement in the compression rate, as shown by experiments on common text corpora.
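    As a rough illustration of the two classic stages the abstract refers to, the sketch below implements a naive BWT (sorting all rotations) followed by MTF recoding. It is not the method of the paper: the function names, the `"\0"` sentinel, and the quadratic-time rotation sort are assumptions made for brevity, and the interpolative-coding and run-character sorting stages are not shown.

```python
def bwt(text, sentinel="\0"):
    """Naive BWT: sort all rotations of text+sentinel, return the last column.

    Assumes `sentinel` does not occur in `text` and sorts before every
    other symbol. Real implementations use suffix sorting instead.
    """
    s = text + sentinel
    rotations = sorted(s[i:] + s[:i] for i in range(len(s)))
    return "".join(rot[-1] for rot in rotations)


def inverse_bwt(last, sentinel="\0"):
    """Invert the naive BWT by repeatedly prepending and re-sorting."""
    table = [""] * len(last)
    for _ in range(len(last)):
        table = sorted(last[i] + table[i] for i in range(len(last)))
    # The original string is the rotation that ends with the sentinel.
    row = next(r for r in table if r.endswith(sentinel))
    return row[:-1]


def mtf_encode(seq, alphabet):
    """Move-to-Front: emit each symbol's current list position, then move it to the front."""
    table = list(alphabet)
    out = []
    for ch in seq:
        idx = table.index(ch)
        out.append(idx)
        table.insert(0, table.pop(idx))
    return out


def mtf_decode(codes, alphabet):
    """Inverse of mtf_encode, given the same initial alphabet order."""
    table = list(alphabet)
    out = []
    for idx in codes:
        ch = table.pop(idx)
        out.append(ch)
        table.insert(0, ch)
    return "".join(out)


if __name__ == "__main__":
    transformed = bwt("banana")
    alphabet = sorted(set(transformed))
    codes = mtf_encode(transformed, alphabet)
    print(repr(transformed), codes)
    assert inverse_bwt(mtf_decode(codes, alphabet)) == "banana"
```

    Note that the MTF output is a list of bare positions: a code of 0 says only "same symbol as before", which is the loss of character identity that the abstract describes as a drawback.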

    Pre-andean tectonic events from albian to eocene in the middle Magdalena valley and situation of the western flank of the proto-eastern cordillera (Colombia)

    For years, research in Colombia has centered mainly on the most recent uplift of the Eastern Cordillera and on the evaluation of the Tertiary units in many basins in order to understand the recent deformation. Recently, owing to the oil industry's need to evaluate new targets, new information has been acquired on Cretaceous units from the Eastern Cordillera and the Middle Magdalena Valley. The aim of this work is to analyse the data that give evidence of deformation in the Middle Magdalena Valley basin during the Cretaceous. The Cretaceous units are exposed in the Eastern Cordillera but not in the Middle Magdalena Valley, where only a few outcrops are known. However, a large amount of seismic information is available for this area, along with some wells that have drilled the Cretaceous units. Using large seismic reflection sections and well information, I identified important events, unconformities, and faults that were inverted in different stages from the Lower Cretaceous to the Eocene, prior to the Andean Orogeny. To this end, I show some key seismic sections interpreted in the Middle Magdalena Valley (MMV). This interpretation is validated with sections based on surface field information and well information, and the structural styles are discussed.
My results support a new interpretation of the Cretaceous sequence. It is more detailed in the MMV, owing to the seismic coverage and well information, and less well constrained in the Western Eastern Cordillera (WEC), where it relies on surface information collected during this work and on preexisting information from the literature. The interpretation shows that there were multiple phases of deformation, including different stages of uplift or relative sea-level change during the Cretaceous, which is important for understanding the processes that affected the Cretaceous units and suggests that tectonics, sea-level changes, or their interaction could have been more complex during this time. Based on the construction and analysis of transgressive-regressive curves and Fischer relative accommodation space plots, I interpreted systems tracts and sea-level variations and recognized major bounding surfaces (unconformities or their correlative conformities) at the Barremian-Aptian and during the Late Albian-Early Cenomanian and Early Campanian. Additionally, based on the interpreted tracts and bounding surfaces, I built regional paleo-facies maps from the Berriasian to the Coniacian-Santonian, showing several periods of regression and transgression until the early Campanian. From the seismic interpretation made in this work, using seismic stratigraphy and Wheeler diagrams, five sequences (S) and five unconformities (SU) were identified as sequence boundaries, recognized at the Jurassic-Cretaceous, Late Barremian-Early Aptian (~125 Ma), Albian-Cenomanian (~100 Ma), Santonian-Campanian (~80 Ma), and Paleocene-Eocene boundaries. These unconformities or their correlative conformities (SU) have a regional extension. The thermochronological information collected, prepared, analyzed, interpreted and modeled in this work helped me to recognize two heating events and two cooling events for samples deposited before 85 Ma.
The heating events occurred from the sedimentation of the units until ~85-80 Ma and from ~70 Ma to ~10 Ma, and the cooling events occurred from ~80 Ma to ~70 Ma and from ~10 Ma to ~2 Ma. Comparing the paleo-geothermal gradient with the present-day thermal gradient makes it possible to identify the presence of unconformities. The integration of the structural reconstructions made in this work through the Cretaceous, considering the sequences (S), the unconformities (SU), and the transgressive-regressive sequences (paleo-facies maps), shows the relation between deformation, deposition and erosion, and when each of these events happened during the Cretaceous in these basins (MMV and WEC). My tectonic and geodynamic reconstruction makes it possible to conclude that, from the Jurassic to the Paleocene, repeated tectonic extension and compression events produced by a cyclic subduction regime explain the relative sea-level fluctuations and the deposition, exhumation and erosion phases observed in the MMV and WEC during the Cretaceous. Finally, the presence of accreted blocks (e.g. Quebradagrande) is undeniable. However, the deformation of the upper plate in my model does not depend on these accreted terranes, but rather on the subduction regime, for example changes in the plate subduction angle (steep or flat subduction), the convergence rate, the polarity of subduction, the age of the slab, etc.

    Genomic analyses of hair from Ludwig van Beethoven

    Ludwig van Beethoven (1770–1827) remains among the most influential and popular classical music composers. Health problems significantly impacted his career as a composer and pianist, including progressive hearing loss, recurring gastrointestinal complaints, and liver disease. In 1802, Beethoven requested that following his death, his disease be described and made public. Medical biographers have since proposed numerous hypotheses, including many substantially heritable conditions. Here we attempt a genomic analysis of Beethoven in order to elucidate potential underlying genetic and infectious causes of his illnesses. We incorporated improvements in ancient DNA methods into existing protocols for ancient hair samples, enabling the sequencing of high-coverage genomes from small quantities of historical hair. We analyzed eight independently sourced locks of hair attributed to Beethoven, five of which originated from a single European male. We deemed these matching samples to be almost certainly authentic and sequenced Beethoven's genome to 24-fold genomic coverage. Although we could not identify a genetic explanation for Beethoven's hearing disorder or gastrointestinal problems, we found that Beethoven had a genetic predisposition for liver disease. Metagenomic analyses revealed furthermore that Beethoven had a hepatitis B infection during at least the months prior to his death. Together with the genetic predisposition and his broadly accepted alcohol consumption, these present plausible explanations for Beethoven's severe liver disease, which culminated in his death. Unexpectedly, an analysis of Y chromosomes sequenced from five living members of the Van Beethoven patrilineage revealed the occurrence of an extra-pair paternity event in Ludwig van Beethoven's patrilineal ancestry.

    Localization of deformation in anisotropic rocks: implications for geodynamic interpretations

    Localization of deformation occurs in the Earth's crust as a consequence of applied stress and is a widespread phenomenon that can be found in crustal rocks. Such localization of deformation is mostly seen in the form of shear zones. Small shear zones, referred to as shear bands or S-C structures, are often used as kinematic indicators. However, the evolution and kinematic continuity of such structures are not well defined, which makes the interpretation of regional geodynamic evolution problematic. Two possible cases were distinguished and described in this thesis: a) kinematically discontinuous S-C structures formed during two deformation events and b) kinematically continuous S-C structures formed during a single deformation event. Kinematically unrelated S-C structures were studied in the westernmost part of the Tauern Window in the Eastern Alps and in the Gemer-Vepor Contact Zone in the Central West Carpathians, where previous geodynamic interpretations might have misinterpreted localization structures. Kinematically continuous shear bands were studied in the South Armorican Shear Zone, where the S-C fabrics were originally defined and described (Berthé et al., 1979). Two fabrics that crosscut each other at small angles, forming S-C geometries, were documented during field work and studied from macroscale down to microscale or...
    Institute of Petrology and Structural Geology, Faculty of Science

    A seventeenth-century Mycobacterium tuberculosis genome supports a Neolithic emergence of the Mycobacterium tuberculosis complex

    BACKGROUND: Although tuberculosis accounts for the highest mortality from a bacterial infection on a global scale, questions persist regarding its origin. One hypothesis based on modern Mycobacterium tuberculosis complex (MTBC) genomes suggests their most recent common ancestor followed human migrations out of Africa approximately 70,000 years before present. However, studies using ancient genomes as calibration points have yielded much younger dates of less than 6000 years. Here, we aim to address this discrepancy through the analysis of the highest-coverage and highest-quality ancient MTBC genome available to date, reconstructed from a calcified lung nodule of Bishop Peder Winstrup of Lund (b. 1605-d. 1679). RESULTS: A metagenomic approach for taxonomic classification of whole DNA content permitted the identification of abundant DNA belonging to the human host and the MTBC, with few non-TB bacterial taxa comprising the background. Genomic enrichment enabled the reconstruction of a 141-fold coverage M. tuberculosis genome. In utilizing this high-quality, high-coverage seventeenth-century genome as a calibration point for dating the MTBC, we employed multiple Bayesian tree models, including birth-death models, which allowed us to model pathogen population dynamics and data sampling strategies more realistically than those based on the coalescent. CONCLUSIONS: The results of our metagenomic analysis demonstrate the unique preservation environment calcified nodules provide for DNA. Importantly, we estimate a most recent common ancestor date for the MTBC of between 2190 and 4501 before present and for Lineage 4 of between 929 and 2084 before present using multiple models, confirming a Neolithic emergence for the MTBC