
    Exploring the Genetic Basis of Variation in Gene Predictions with a Synthetic Association Study

    Identifying DNA polymorphisms that affect molecular processes like transcription, splicing, or translation typically requires genotyping and experimentally characterizing tissue from large numbers of individuals, which remains expensive and time-consuming. Here we introduce an alternative strategy: a “synthetic association study” in which we computationally predict molecular phenotypes on artificial genomes containing randomly sampled combinations of polymorphic alleles, and perform a classical association study to identify genotypes underlying variation in these computationally predicted annotations. We applied this method to characterize the effects on gene structure of 32,792 single-nucleotide polymorphisms between two strains of the antibiotic-producing fungus Penicillium chrysogenum. Although these SNPs represent only 0.1 percent of the nucleotides in the genome, they collectively altered 1.8 percent of predicted gene models between these strains. To determine which SNPs or combinations of SNPs were responsible for this variation, we predicted protein-coding genes in 500 intermediate genomes, each identical except for randomly chosen alleles at each SNP position. Of 30,468 gene models in the genome, 557 varied across these 500 genomes. 226 of these polymorphic gene models (40%) were perfectly correlated with individual SNPs, all of which were within or immediately proximal to the affected gene. The genetic architectures of the other 321 were more complex, with several examples of SNP epistasis that would have been difficult to predict a priori. We expect that many of the SNPs that affect computational gene structure reflect a biologically unrealistic sensitivity of the gene prediction algorithm to sequence changes, and we propose that genome annotation algorithms could be improved by minimizing their sensitivity to natural polymorphisms. However, many of the SNPs we identified are likely to affect transcript structure in vivo, and the synthetic association study approach can be easily generalized to any computed genome annotation to uncover relationships between genotype and important molecular phenotypes.
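
The synthetic association idea described in this abstract can be illustrated with a minimal sketch. It is a toy under stated assumptions, not the authors' pipeline: `toy_predictor` is a hypothetical stand-in for a real gene prediction program, alleles are coded as 0/1 for the two strains, and the number of SNPs and the dependency of each gene model on particular SNPs are invented for illustration. Only the overall shape (random allele combinations, computed phenotypes, a search for perfectly correlated SNPs) follows the text.

```python
import random


def make_synthetic_genomes(n_snps, n_genomes, seed=0):
    """Build genomes as lists of 0/1 alleles, one randomly chosen allele per SNP."""
    rng = random.Random(seed)
    return [[rng.randint(0, 1) for _ in range(n_snps)] for _ in range(n_genomes)]


def toy_predictor(genome):
    """Hypothetical stand-in for a gene predictor: one binary call per gene model."""
    return [
        genome[2],              # gene 0 depends on SNP 2 alone
        genome[0] & genome[4],  # gene 1 changes only when SNPs 0 and 4 co-occur (epistasis)
        0,                      # gene 2 never varies
    ]


def perfectly_correlated_snps(genomes, phenotypes):
    """For each gene model that varies, list SNPs whose allele perfectly predicts its call."""
    n_snps = len(genomes[0])
    n_genes = len(phenotypes[0])
    hits = {}
    for g in range(n_genes):
        calls = [p[g] for p in phenotypes]
        if len(set(calls)) < 2:
            continue  # invariant gene model: nothing to associate
        matches = []
        for s in range(n_snps):
            alleles = [genome[s] for genome in genomes]
            same = all(a == c for a, c in zip(alleles, calls))
            opposite = all(a != c for a, c in zip(alleles, calls))
            if same or opposite:
                matches.append(s)
        hits[g] = matches
    return hits


# 500 intermediate genomes, as in the abstract; 6 SNPs is an arbitrary toy size.
genomes = make_synthetic_genomes(n_snps=6, n_genomes=500)
phenotypes = [toy_predictor(genome) for genome in genomes]
result = perfectly_correlated_snps(genomes, phenotypes)
print(result)
```

In this toy, gene 0 is traced to a single SNP, while gene 1 varies without being perfectly correlated with any one SNP. That second case corresponds to the more complex architectures the abstract mentions and would need a search over SNP combinations.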

    Genome-Wide Analysis of the World's Sheep Breeds Reveals High Levels of Historic Mixture and Strong Recent Selection

    Genomic structure in a global collection of domesticated sheep reveals a history of artificial selection for horn loss and traits relating to pigmentation, reproduction, and body size.

    Efficient Sparse Coding in Early Sensory Processing: Lessons from Signal Recovery

    Sensory representations are not only sparse, but often overcomplete: coding units significantly outnumber the input units. For models of neural coding, this overcompleteness poses a computational challenge both for shaping the signal processing channels and for using the large, sparse representations efficiently. We argue that higher-level overcompleteness becomes computationally tractable by imposing sparsity on synaptic activity, and we also show that such structural sparsity can be facilitated by a statistics-based decomposition of the stimuli into typical and atypical parts prior to sparse coding. Typical parts represent large-scale correlations, and thus can be significantly compressed. Atypical parts, on the other hand, represent local features and are the subjects of actual sparse coding. When applied to natural images, our decomposition-based sparse coding model can efficiently form overcomplete codes, and both center-surround and oriented filters are obtained, similar to those observed in the retina and the primary visual cortex, respectively. We therefore hypothesize that the proposed computational architecture can be seen as a coherent functional model of the first stages of sensory coding in early vision.
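
The split into typical and atypical parts can be pictured with a deliberately simplified one-dimensional sketch. Everything specific here is an assumption made for illustration and is not the model from the paper: a moving average stands in for the statistics-based "typical" component, the residual plays the role of the "atypical" part, and soft thresholding serves as the sparse coder (it is the exact solution of an L1-penalized least-squares problem only when the dictionary is the identity). The function names, window width, and threshold are hypothetical.

```python
def moving_average(signal, width):
    """Smooth the signal; used here as a crude proxy for the large-scale 'typical' part."""
    half = width // 2
    smoothed = []
    for i in range(len(signal)):
        window = signal[max(0, i - half): i + half + 1]
        smoothed.append(sum(window) / len(window))
    return smoothed


def soft_threshold(x, lam):
    """Shrink x toward zero by lam; values within [-lam, lam] become exactly zero."""
    if x > lam:
        return x - lam
    if x < -lam:
        return x + lam
    return 0.0


def decompose_and_code(signal, width=5, lam=1.0):
    """Split a signal into typical and atypical parts, then sparsely code the atypical part."""
    typical = moving_average(signal, width)
    atypical = [s - t for s, t in zip(signal, typical)]
    code = [soft_threshold(a, lam) for a in atypical]
    return typical, atypical, code


# A flat signal with one local feature at index 4.
signal = [1.0] * 10
signal[4] = 6.0
typical, atypical, code = decompose_and_code(signal)
active = [i for i, c in enumerate(code) if c != 0.0]
print(active, code[4])
```

The smooth background ends up in `typical`, and only the local feature survives in the thresholded code, which is the qualitative behavior the abstract attributes to the decomposition.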

    Prdm9, a Major Determinant of Meiotic Recombination Hotspots, Is Not Functional in Dogs and Their Wild Relatives, Wolves and Coyotes

    Meiotic recombination is a fundamental process needed for the correct segregation of chromosomes during meiosis in sexually reproducing organisms. In humans, 80% of crossovers are estimated to occur at specific areas of the genome called recombination hotspots. Recently, a protein called PRDM9 was identified as a major player in determining the location of genome-wide meiotic recombination hotspots in humans and mice. The origin of this protein seems to be ancient in evolutionary time, as reflected by its fairly conserved structure in lineages that diverged over 700 million years ago. Despite its important role, there are many animal groups in which Prdm9 is absent (e.g. birds, reptiles, amphibians, diptera), and it has been suggested to have disruptive mutations and thus to be a pseudogene in dogs. Because of the dog's history through domestication and artificial selection, we wanted to confirm the presence of a disrupted Prdm9 gene in dogs and determine whether this was exclusive to this species or whether it also occurred in its wild ancestor, the wolf, and in a close relative, the coyote. We sequenced the region in the dog genome that aligned to the last exon of the human Prdm9, containing the entire zinc finger domain, in 4 dogs, 17 wolves and 2 coyotes. Our results show that the three canid species possess mutations that likely make this gene nonfunctional. Because these mutations are shared across the three species, they must have appeared prior to the split of the wolf and the coyote, millions of years ago, and are not related to domestication. In addition, our results suggest that in these three canid species either recombination does not occur at hotspots or hotspot location is controlled through a mechanism yet to be determined.