230,147 research outputs found
The C-value enigma and timing of the Cambrian explosion
The Cambrian explosion is a grand challenge to science today and involves
multidisciplinary study. This event is generally believed to be the result of
genetic innovations, environmental factors and ecological interactions, even
though there is much disagreement about the nature and timing of metazoan
origins. The crux of the matter is that a complete roadmap of evolution, needed
to discern the transition in biological complexity and to evaluate the critical
role of the Cambrian explosion in the overall evolutionary context, is missing.
Here we calculate the time of the Cambrian explosion with an innovative and
accurate "C-value clock"; our result (560 million years ago) agrees well with
the fossil record. We clarify that genome evolution was the intrinsic cause of
the Cambrian explosion. We have found a general formula for evaluating the
genome size of different species, with which major questions of the C-value
enigma can be solved and genome size evolution can be illustrated. The
Cambrian explosion is essentially a major transition of biological complexity,
which corresponds to a turning point in genome size evolution. The observed
maximum prokaryotic complexity is just a relic of the Cambrian explosion and
is governed by the maximum information storage capability in the observed
universe. Our results open new prospects for studying metazoan origins and
molecular evolution.
Comment: 46 pages, 10 figures
Genome maps across 26 human populations reveal population-specific patterns of structural variation.
Large structural variants (SVs) in the human genome are difficult to detect and study by conventional sequencing technologies. With long-range genome analysis platforms, such as optical mapping, one can identify large SVs (>2 kb) across the genome in one experiment. Analyzing optical genome maps of 154 individuals from the 26 populations sequenced in the 1000 Genomes Project, we find that phylogenetic population patterns of large SVs are similar to those of single nucleotide variations in 86% of the human genome, while ~2% of the genome has high structural complexity. We are able to characterize SVs in many intractable regions of the genome, including segmental duplications and subtelomeric, pericentromeric, and acrocentric areas. In addition, we discover ~60 Mb of non-redundant genome content missing in the reference genome sequence assembly. Our results highlight the need for a comprehensive set of alternate haplotypes from different populations to represent SV patterns in the genome
The Rabl configuration limits topological entanglement of chromosomes in budding yeast.
The three-dimensional organization of genomes remains mostly unknown due to their high degree of condensation. Biophysical studies predict that condensation promotes the topological entanglement of chromatin fibers and the inhibition of function. How organisms balance between functionally active genomes and a high degree of condensation remains to be determined. Here we hypothesize that the Rabl configuration, characterized by the attachment of centromeres and telomeres to the nuclear envelope, helps to reduce the topological entanglement of chromosomes. To test this hypothesis we developed a novel method to quantify chromosome entanglement complexity in 3D reconstructions obtained from Chromosome Conformation Capture (CCC) data. Applying this method to published data of the yeast genome, we show that computational models implementing the attachment of telomeres or centromeres alone are not sufficient to obtain the reduced entanglement complexity observed in 3D reconstructions. It is only when both the centromeres and telomeres are attached to the nuclear envelope (i.e. the Rabl configuration) that the complexity of entanglement of the genome is comparable to that of the 3D reconstructions. We therefore suggest that the Rabl configuration is an essential player in the simplification of the entanglement of chromatin fibers.
Architecture of viral genome-delivery molecular machines.
From the abyss of the ocean to the human gut, bacterial viruses (or bacteriophages) have colonized all ecosystems of the planet earth and evolved in sync with their bacterial hosts. Over 95% of bacteriophages have a tail that varies greatly in length and complexity. The tail complex interrupts the icosahedral capsid symmetry and provides both an entry for viral genome-packaging during replication and an exit for genome-ejection during infection. Here, we review recent progress in deciphering the structure, assembly and conformational dynamics of viral genome-delivery tail machines. We focus on the bacteriophages P22 and T7, two well-studied members of the Podoviridae family that use short, non-contractile tails to infect Gram-negative bacteria. The structure of specialized tail fibers and their putative role in host anchoring, cell-surface penetration and genome-ejection is discussed
Accessing complexity from genome information
This paper studies the information content of the chromosomes of 24 species. In a first
phase, a scheme inspired by the state-space representation of dynamical systems is developed.
For each chromosome, the state-space dynamical evolution is projected onto a two-dimensional
chart. The plots are then analyzed and characterized in terms of their fractal dimension.
This information is integrated into two measures of a species' complexity, addressing
its average and its variability. The results are in close accordance with phylogenetics and
point to quantitative aspects of the species' genomic complexity.
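The abstract above does not say how the fractal dimension of each two-dimensional chart is estimated. As a minimal sketch, assuming a plain box-counting estimator (one common choice, not necessarily the authors' method), the idea can be illustrated on synthetic point sets. The function names `box_count` and `box_counting_dimension` are hypothetical, and the test data are made up for the sanity check only.

```python
import math
import random


def box_count(points, eps):
    """Count grid boxes of side eps that contain at least one point."""
    return len({(math.floor(x / eps), math.floor(y / eps)) for x, y in points})


def box_counting_dimension(points, sizes):
    """Least-squares slope of log N(eps) against log(1/eps)."""
    xs = [math.log(1.0 / eps) for eps in sizes]
    ys = [math.log(box_count(points, eps)) for eps in sizes]
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    num = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    den = sum((x - mean_x) ** 2 for x in xs)
    return num / den


# Synthetic sanity checks (assumed data, not genome charts):
# a diagonal line should give a dimension near 1, a filled square near 2.
rng = random.Random(0)
line = [(t / 10000, t / 10000) for t in range(10000)]
square = [(rng.random(), rng.random()) for _ in range(20000)]
sizes = [1 / 4, 1 / 8, 1 / 16, 1 / 32]

line_dim = box_counting_dimension(line, sizes)
square_dim = box_counting_dimension(square, sizes)
print(round(line_dim, 2), round(square_dim, 2))
```

For a real chart, the box sizes would need to span the range over which the point set actually scales; the four sizes used here are an arbitrary choice for the illustration.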
Towards Better Understanding of Artifacts in Variant Calling from High-Coverage Samples
Motivation: Whole-genome high-coverage sequencing has been widely used for
personal and cancer genomics as well as in various research areas. However, in
the absence of an unbiased whole-genome truth set, the global error rate of
variant calls and the leading causal artifacts remain unclear, despite
the great efforts made in evaluating variant calling methods.
Results: We made ten SNP and INDEL call sets with two read mappers and five
variant callers, both on a haploid human genome and a diploid genome at a
similar coverage. By investigating false heterozygous calls in the haploid
genome, we identified the erroneous realignment in low-complexity regions and
the incomplete reference genome with respect to the sample as the two major
sources of errors, which call for continued improvements in these two areas.
We estimated that the error rate of raw genotype calls is as high as 1 in
10-15kb, but the error rate of post-filtered calls is reduced to 1 in 100-200kb
without significant compromise on the sensitivity.
Availability: BWA-MEM alignment: http://bit.ly/1g8XqRt; Scripts:
https://github.com/lh3/varcmp; Additional data:
https://figshare.com/articles/Towards_better_understanding_of_artifacts_in_variating_calling_from_high_coverage_samples/981073
Comment: Published version
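To put the error rates quoted in the abstract above into absolute terms, the following sketch converts "1 error in N bp" into an expected number of erroneous calls per genome. The genome length of 3 Gb is a round-figure assumption that is not stated in the abstract, and `expected_errors` is a hypothetical helper name.

```python
GENOME_BP = 3_000_000_000  # assumption: roughly 3 Gb, not given in the abstract


def expected_errors(genome_bp, bp_per_error):
    """Expected number of erroneous calls at a rate of 1 error per bp_per_error."""
    return genome_bp / bp_per_error


# Raw genotype calls: 1 error in 10-15 kb
raw_min = expected_errors(GENOME_BP, 15_000)
raw_max = expected_errors(GENOME_BP, 10_000)

# Post-filtered calls: 1 error in 100-200 kb
filtered_min = expected_errors(GENOME_BP, 200_000)
filtered_max = expected_errors(GENOME_BP, 100_000)

print(f"raw calls:      {raw_min:,.0f} to {raw_max:,.0f} errors")
print(f"filtered calls: {filtered_min:,.0f} to {filtered_max:,.0f} errors")
```

Under that assumption, filtering takes the expected count from a few hundred thousand errors down to a few tens of thousands per genome.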
The ever-evolving concept of the gene: The use of RNA/Protein experimental techniques to understand genome functions
The completion of the human genome sequence, together with advances in sequencing technologies, has shifted the paradigm of the genome as composed of discrete and heritable coding entities, and has shown the abundance of functional noncoding DNA. This part of the genome, previously dismissed as "junk" DNA, increases proportionally with organismal complexity and contributes to gene regulation beyond the boundaries of known protein-coding genes. Different classes of functionally relevant nonprotein-coding RNAs are transcribed from noncoding DNA sequences. Among them are the long noncoding RNAs (lncRNAs), which are thought to participate in the basal regulation of protein-coding genes at both transcriptional and post-transcriptional levels. Although knowledge of this field is still limited, the ability of lncRNAs to localize in different cellular compartments, to fold into specific secondary structures and to interact with different molecules (RNA or proteins) endows them with multiple regulatory mechanisms. It is becoming evident that lncRNAs may play a crucial role in most biological processes such as the control of development, differentiation and cell growth. This review places the evolution of the concept of the gene in its historical context, from Darwin's hypothetical mechanism of heredity to the post-genomic era. We discuss how the original idea of protein-coding genes as unique determinants of phenotypic traits has been reconsidered in light of the existence of noncoding RNAs. We summarize the technological developments that have been made in the genome-wide identification and study of lncRNAs and emphasize the methodologies that have aided our understanding of the complexity of lncRNA-protein interactions in recent years.