The birth of a human-specific neural gene by incomplete duplication and gene fusion
Background: Gene innovation by duplication is a fundamental evolutionary process but is difficult to study in humans due to the large size, high sequence identity, and mosaic nature of segmental duplication blocks. The human-specific gene hydrocephalus-inducing 2, HYDIN2, was generated by a 364 kbp duplication of 79 internal exons of the large ciliary gene HYDIN from chromosome 16q22.2 to chromosome 1q21.1. Because the HYDIN2 locus lacks the ancestral promoter and seven terminal exons of the progenitor gene, we sought to characterize transcription at this locus by coupling reverse transcription polymerase chain reaction and long-read sequencing. Results: 5' RACE indicates a transcription start site for HYDIN2 outside of the duplication, and we observe fusion transcripts spanning both the 5' and 3' breakpoints. We observe extensive splicing diversity leading to the formation of altered open reading frames (ORFs) that appear to be under relaxed selection. We show that HYDIN2 adopted a new promoter that drives an altered pattern of expression, with highest levels in neural tissues. We estimate that the HYDIN duplication occurred ~3.2 million years ago and find that it is nearly fixed (99.9%) for diploid copy number in contemporary humans. Examination of 73 chromosome 1q21 rearrangement patients reveals that HYDIN2 is deleted or duplicated in most cases. Conclusions: Together, these data support a model of rapid gene innovation by fusion of incomplete segmental duplications, altered tissue expression, and potential subfunctionalization or neofunctionalization of HYDIN2 early in the evolution of the Homo lineage.
DNA word analysis based on the distribution of the distances between symmetric words
We address the problem of discovering pairs of symmetric genomic words (i.e., words and the corresponding reversed complements) occurring at distances that are overrepresented. For this purpose, we developed new procedures to identify symmetric word pairs with uncommon empirical distance distribution and with clusters of overrepresented short distances. We speculate that patterns of overrepresentation of short distances between symmetric word pairs may allow the occurrence of non-standard DNA conformations, such as hairpin/cruciform structures. We focused on the human genome, and analysed both the complete genome and a version with known repetitive sequences masked out. We report several well-defined features in the distributions of distances, which can be classified into three different profiles, showing enrichment in distinct distance ranges. We analysed in greater detail certain pairs of symmetric words of length seven, found by our procedure, characterised by the surprising fact that they occur at single distances more frequently than expected
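
To make the notion of a distance distribution between symmetric words concrete, the following is a minimal Python sketch, not the authors' procedure. It assumes that the distance is measured from each occurrence of a word to the nearest following occurrence of its reversed complement; the abstract does not state the exact definition, so this is an illustrative choice. The function names (reverse_complement, find_positions, symmetric_distance_counts) are hypothetical, and no test for overrepresentation is included.

```python
from bisect import bisect_right
from collections import Counter

# Translation table for the four DNA bases (assumes an uppercase ACGT alphabet).
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(word):
    """Return the reversed complement of a DNA word, e.g. ACG -> CGT."""
    return word.translate(_COMPLEMENT)[::-1]


def find_positions(sequence, word):
    """Return the start positions of all (possibly overlapping) occurrences."""
    positions = []
    start = sequence.find(word)
    while start != -1:
        positions.append(start)
        start = sequence.find(word, start + 1)
    return positions


def symmetric_distance_counts(sequence, word):
    """Count distances from each occurrence of `word` to the next occurrence
    of its reversed complement (assumed definition, see the lead-in)."""
    partner_positions = find_positions(sequence, reverse_complement(word))
    counts = Counter()
    for p in find_positions(sequence, word):
        # Index of the first partner occurrence starting strictly after p.
        i = bisect_right(partner_positions, p)
        if i < len(partner_positions):
            counts[partner_positions[i] - p] += 1
    return counts


if __name__ == "__main__":
    toy_sequence = "ACGAACGTAAACGT"  # toy data, not genomic sequence
    print(symmetric_distance_counts(toy_sequence, "ACG"))
```

On a real genome, the resulting counts would be compared with the counts expected under some background model to flag overrepresented distances; the choice of that model is left open here.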