
    In silico karyotyping of chromosomally polymorphic malaria mosquitoes in the Anopheles gambiae complex

    Chromosomal inversion polymorphisms play an important role in adaptation to environmental heterogeneities. For mosquito species in the Anopheles gambiae complex that are significant vectors of human malaria, paracentric inversion polymorphisms are abundant and are associated with ecologically and epidemiologically important phenotypes. Improved understanding of these traits relies on determining mosquito karyotype, which currently depends upon laborious cytogenetic methods whose application is limited by the requirement both for specialized expertise and for properly preserved adult females at specific gonotrophic stages. To overcome this limitation, we developed sets of tag single nucleotide polymorphisms (SNPs) inside inversions whose biallelic genotype is strongly correlated with inversion genotype. We leveraged 1,347 fully sequenced An. gambiae and Anopheles coluzzii genomes in the Ag1000G database of natural variation. Beginning with principal components analysis (PCA) of population samples, applied to windows of the genome containing individual chromosomal rearrangements, we classified samples into three inversion genotypes, distinguishing homozygous inverted and homozygous uninverted groups by inclusion of the small subset of specimens in Ag1000G that are associated with cytogenetic metadata. We then assessed the correlation between candidate tag SNP genotypes and PCA-based inversion genotypes in our training sets, selecting those candidates with >80% agreement. Our initial tests, both in held-back validation samples from Ag1000G and in data independent of Ag1000G, suggest that when used for in silico inversion genotyping of sequenced mosquitoes, these tags perform better than traditional cytogenetics, even for specimens where only a small subset of the tag SNPs can be successfully ascertained.
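The tag SNP selection step described in this abstract (keep candidates whose genotype agrees with the PCA-based inversion genotype in more than 80% of samples) can be sketched roughly as follows. This is a minimal illustration, not the authors' pipeline: the 0/1/2 genotype coding, the handling of missing calls as `None`, the allowance for flipped allele polarity, and the names `concordance` and `select_tag_snps` are all assumptions made for the example.

```python
from typing import Dict, List, Optional, Sequence


def concordance(snp: Sequence[Optional[int]], inversion: Sequence[int]) -> float:
    """Fraction of called samples whose SNP genotype matches the inversion genotype.

    Genotypes are assumed to be coded 0/1/2 (copies of the alternate allele, or
    copies of the inverted arrangement). Missing SNP calls are None and skipped.
    Because the alternate allele may tag either arrangement, the flipped coding
    (2 - genotype) is also tried and the better agreement is returned.
    """
    pairs = [(s, k) for s, k in zip(snp, inversion) if s is not None]
    if not pairs:
        return 0.0
    direct = sum(s == k for s, k in pairs)
    flipped = sum((2 - s) == k for s, k in pairs)
    return max(direct, flipped) / len(pairs)


def select_tag_snps(
    candidates: Dict[str, Sequence[Optional[int]]],
    inversion: Sequence[int],
    threshold: float = 0.8,  # the >80% agreement cutoff quoted in the abstract
) -> List[str]:
    """Return the names of candidate SNPs whose concordance exceeds the threshold."""
    return [
        name
        for name, genotypes in candidates.items()
        if concordance(genotypes, inversion) > threshold
    ]


# Hypothetical toy data: 10 samples with PCA-based inversion genotypes.
inversion_calls = [0, 0, 1, 1, 2, 2, 0, 1, 2, 0]
candidate_snps = {
    "snp_a": [0, 0, 1, 1, 2, 2, 0, 1, 2, 0],     # perfect agreement
    "snp_b": [2, 2, 1, 1, 0, 0, 2, 1, 0, 2],     # perfect agreement, flipped polarity
    "snp_c": [0, 1, 1, 0, 2, 1, 0, 1, 0, 0],     # weak agreement
    "snp_d": [0, None, 1, 1, 2, 2, 0, 1, 2, 1],  # one missing call, one mismatch
}
selected = select_tag_snps(candidate_snps, inversion_calls)
```

In this toy data set `snp_c` falls below the cutoff while the other three candidates pass; real genotype data would come from called variants rather than hand-written lists.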

    Human pol II promoter prediction: time series descriptors and machine learning

    Although several in silico promoter prediction methods have been developed to date, they are still limited in predictive performance. The limitations are due to the challenge of selecting appropriate features of promoters that distinguish them from non-promoters and to the generalization or predictive ability of the machine-learning algorithms. In this paper we attempt to define a novel approach by using unique descriptors and machine-learning methods for the recognition of eukaryotic polymerase II promoters. In this study, non-linear time series descriptors along with non-linear machine-learning algorithms, such as support vector machine (SVM), are used to discriminate between promoter and non-promoter regions. The basic idea here is to use descriptors that do not depend on the primary DNA sequence and provide a clear distinction between promoter and non-promoter regions. The classification model, built on a set of 1000 promoter and 1500 non-promoter sequences, showed a 10-fold cross-validation accuracy of 87%, and an independent test set had an accuracy >85% in both promoter and non-promoter identification. This approach correctly identified all 20 experimentally verified promoters of human chromosome 22. The high sensitivity and selectivity indicate that n-mer frequencies along with non-linear time series descriptors, such as Lyapunov component stability and Tsallis entropy, and supervised machine-learning methods, such as SVMs, can be useful in the identification of pol II promoters.
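Two of the descriptors named in this abstract, n-mer frequencies and Tsallis entropy, can be illustrated with a short sketch. The abstract does not say how the authors computed them, so what follows uses the standard definition S_q = (1 - sum_i p_i^q) / (q - 1) applied to overlapping n-mer frequencies; the choice q = 2, the overlapping window, and the function names are assumptions for illustration only.

```python
import math
from collections import Counter
from typing import Dict


def kmer_frequencies(sequence: str, k: int) -> Dict[str, float]:
    """Relative frequencies of overlapping k-mers (n-mers) in a DNA sequence."""
    seq = sequence.upper()
    total = len(seq) - k + 1
    if k < 1 or total < 1:
        raise ValueError("sequence must contain at least one k-mer")
    counts = Counter(seq[i:i + k] for i in range(total))
    return {kmer: n / total for kmer, n in counts.items()}


def tsallis_entropy(frequencies: Dict[str, float], q: float = 2.0) -> float:
    """Tsallis entropy S_q = (1 - sum(p_i ** q)) / (q - 1).

    In the limit q -> 1 this reduces to the Shannon entropy in nats,
    which is returned directly when q == 1.
    """
    probs = [p for p in frequencies.values() if p > 0]
    if q == 1:
        return -sum(p * math.log(p) for p in probs)
    return (1.0 - sum(p ** q for p in probs)) / (q - 1.0)


# Hypothetical sequence fragment, used only to show the call pattern.
fragment = "TATAAAAGGCGCGCTATAAAAG"
dimer_freqs = kmer_frequencies(fragment, 2)
descriptor = tsallis_entropy(dimer_freqs, q=2.0)
```

A value like `descriptor` would be one entry in a feature vector passed to a classifier such as an SVM; the classifier itself is outside the scope of this sketch.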

    Mechanics and dynamics of X-chromosome pairing at X inactivation

    At the onset of X-chromosome inactivation, the vital process whereby female mammalian cells equalize X products with respect to males, the X chromosomes are colocalized along their Xic (X-inactivation center) regions. The mechanism inducing recognition and pairing of the X chromosomes remains elusive, however. Starting from recent discoveries on the molecular factors and on the DNA sequences (the so-called "pairing sites") involved, we dissect the mechanical basis of Xic colocalization by using a statistical physics model. We show that soluble DNA-specific binding molecules, such as those experimentally identified, can indeed be sufficient to induce the spontaneous colocalization of the homologous chromosomes, but only when their concentration, or chemical affinity, rises above a threshold value as a consequence of a thermodynamic phase transition. We derive the likelihood of pairing and its probability distribution. Chromosome dynamics has two stages: an initial independent Brownian diffusion followed, after a characteristic time scale, by recognition and pairing. Finally, we investigate the effects of DNA deletions/insertions in the region of pairing sites and compare model predictions to available experimental data.
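The first of the two dynamical stages mentioned in this abstract, independent Brownian diffusion until the two loci meet, can be illustrated with a toy simulation. This is not the authors' statistical physics model: it omits the binding molecules and the phase transition entirely, and the 1D lattice, box size, start positions, capture distance, and function name are all assumptions chosen to keep the example small.

```python
import random
from typing import Optional


def first_encounter_time(
    box: int = 20,
    start_a: int = 2,
    start_b: int = 17,
    capture: int = 1,
    max_steps: int = 100_000,
    rng: Optional[random.Random] = None,
) -> Optional[int]:
    """Steps until two independent 1D lattice walkers come within `capture` sites.

    Each walker moves +1 or -1 per step with equal probability, and positions
    are clamped to the interval [0, box]. Returns 0 if the walkers start within
    the capture distance, or None if they never meet within max_steps.
    """
    rng = rng or random.Random()
    a, b = start_a, start_b
    if abs(a - b) <= capture:
        return 0
    for step in range(1, max_steps + 1):
        a = min(max(a + rng.choice((-1, 1)), 0), box)
        b = min(max(b + rng.choice((-1, 1)), 0), box)
        if abs(a - b) <= capture:
            return step
    return None


# Estimate a characteristic encounter time by averaging over repeated runs.
rng = random.Random(12345)  # fixed seed so the run is reproducible
times = [first_encounter_time(rng=rng) for _ in range(200)]
met = [t for t in times if t is not None]
mean_time = sum(met) / len(met) if met else float("nan")
```

Averaging first-encounter times in this way gives one crude notion of a characteristic time scale for the diffusive stage; the stable pairing that follows in the abstract's second stage would need an explicit binding term, which this sketch leaves out.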