196 research outputs found

    PeanutMap: an online genome database for comparative molecular maps of peanut

    BACKGROUND: Molecular maps have been developed for many species, and are of particular importance for varietal development and comparative genomics. However, despite the existence of multiple sets of linkage maps, databases of these data are lacking for many species, including peanut. DESCRIPTION: PeanutMap provides a web-based interface for viewing specific linkage groups of a map set. PeanutMap can display and compare multiple maps of a set based upon marker or trait correspondences, which is particularly important as cultivated peanut is a disomic tetraploid. The database can also compare linkage groups among multiple map sets, allowing identification of corresponding linkage groups from results of different research projects. Data from the two published peanut genome map sets, and also from three map sets of phenotypic traits, are present in the database. Data from PeanutMap have been incorporated into the Legume Information System website to allow peanut map data to be used for cross-species comparisons. CONCLUSION: The utility of the database is expected to increase as several SSR-based maps are being developed currently, and expanded efforts for comparative mapping of legumes are underway. Optimal use of these data will benefit from the development of tools to facilitate comparative analysis

    Formation of regulatory modules by local sequence duplication

    Turnover of regulatory sequence and function is an important part of molecular evolution. But what are the modes of sequence evolution leading to rapid formation and loss of regulatory sites? Here, we show that a large fraction of neighboring transcription factor binding sites in the fly genome have formed from a common sequence origin by local duplications. This mode of evolution is found to produce regulatory information: duplications can seed new sites in the neighborhood of existing sites. Duplicate seeds evolve subsequently by point mutations, often towards binding a different factor than their ancestral neighbor sites. These results are based on a statistical analysis of 346 cis-regulatory modules in the Drosophila melanogaster genome, and a comparison set of intergenic regulatory sequence in Saccharomyces cerevisiae. In fly regulatory modules, pairs of binding sites show significantly enhanced sequence similarity up to distances of about 50 bp. We analyze these data in terms of an evolutionary model with two distinct modes of site formation: (i) evolution from independent sequence origin and (ii) divergent evolution following duplication of a common ancestor sequence. Our results suggest that pervasive formation of binding sites by local sequence duplications distinguishes the complex regulatory architecture of higher eukaryotes from the simpler architecture of unicellular organisms

    Gut evacuation rate and grazing impact of the krill Thysanoessa raschii and T. inermis

    Gut evacuation rates and ingestion rates were measured for the krill Thysanoessa raschii and T. inermis in Godthåbsfjord, SW Greenland. Combined with biomass of the krill community, the grazing potential on phytoplankton along the fjord was estimated. Gut evacuation rates were 3.9 and 2.3 h⁻¹ for T. raschii and T. inermis, respectively. Ingestion rates were 12.2 ± 7.5 µg C mg C⁻¹ day⁻¹ (n = 4) for T. inermis and 4.9 ± 3.2 µg C mg C⁻¹ day⁻¹ (n = 4) for T. raschii, corresponding to daily rations of 1.2 and 0.5 % body carbon day⁻¹. Clearance experiments conducted in parallel to the gut evacuation experiment gave similar results for ingestion rates and daily rations. Krill biomass was highest in the central part of the fjord’s length, with T. raschii dominating. Community grazing rates from krill and copepods were comparable; however, their combined impact was low, estimated as <1 % of phytoplankton standing stock being removed per day during this late spring study

    Analysis of Microsatellite Variation in Drosophila melanogaster with Population-Scale Genome Sequencing

    Genome sequencing technologies promise to revolutionize our understanding of genetics, evolution, and disease by making it feasible to survey a broad spectrum of sequence variation on a population scale. However, this potential can only be realized to the extent that methods for extracting and interpreting distinct forms of variation can be established. The error profiles and read length limitations of early versions of next-generation sequencing technologies rendered them ineffective for some sequence variant types, particularly microsatellites and other tandem repeats, and fostered the general misconception that such variants are inherently inaccessible to these platforms. At the same time, tandem repeats have emerged as important sources of functional variation. Tandem repeats are often located in and around genes, and frequent mutations in their lengths exert quantitative effects on gene function and phenotype, rapidly degrading linkage disequilibrium between markers and traits. Sensitive identification of these variants in large-scale next-gen sequencing efforts will enable more comprehensive association studies capable of revealing previously invisible associations. We present a population-scale analysis of microsatellite repeats using whole-genome data from 158 inbred isolates from the Drosophila Genetics Reference Panel, a collection of over 200 extensively phenotypically characterized isolates from a single natural population, to uncover processes underlying repeat mutation and to enable associations with behavioral, morphological, and life-history traits. Analysis of repeat variation from next-generation sequence data will also enhance studies of genome stability and neurodegenerative diseases

    Transcriptomic Analysis of Toxoplasma Development Reveals Many Novel Functions and Structures Specific to Sporozoites and Oocysts

    Sexual reproduction of Toxoplasma gondii occurs exclusively within enterocytes of the definitive felid host. The resulting immature oocysts are excreted into the environment during defecation, where in the days following, they undergo a complex developmental process. Within each oocyst, this culminates in the generation of two sporocysts, each containing 4 sporozoites. A single felid host is capable of shedding millions of oocysts, which can survive for years in the environment, are resistant to most methods of microbial inactivation during water-treatment and are capable of producing infection in warm-blooded hosts at doses as low as 1–10 ingested oocysts. Despite its extremely interesting developmental biology and crucial role in initiating an infection, almost nothing is known about the oocyst stage beyond morphological descriptions. Here, we present a complete transcriptomic analysis of the oocyst from beginning to end of its development. In addition, and to identify genes whose expression is unique to this developmental form, we compared the transcriptomes of developing oocysts with those of in vitro-derived tachyzoites and in vivo-derived bradyzoites. Our results reveal many genes whose expression is specifically up- or down-regulated in different developmental stages, including many genes that are likely critical to oocyst development, wall formation, resistance to environmental destruction and sporozoite infectivity. Of special note is the up-regulation of genes that appear “off” in tachyzoites and bradyzoites but that encode homologues of proteins known to serve key functions in those asexual stages, including a novel pairing of sporozoite-specific paralogues of AMA1 and RON2, two proteins that have recently been shown to form a crucial bridge during tachyzoite invasion of host cells. This work provides the first in-depth insight into the development and functioning of one of the most important but least studied stages in the Toxoplasma life cycle

    Fast MCMC sampling for Hidden Markov Models to determine copy number variations

    BACKGROUND: Hidden Markov Models (HMM) are often used for analyzing Comparative Genomic Hybridization (CGH) data to identify chromosomal aberrations or copy number variations by segmenting observation sequences. For efficiency reasons the parameters of an HMM are often estimated with maximum likelihood and a segmentation is obtained with the Viterbi algorithm. This introduces considerable uncertainty in the segmentation, which can be avoided with Bayesian approaches integrating out parameters using Markov Chain Monte Carlo (MCMC) sampling. While the advantages of Bayesian approaches have been clearly demonstrated, the likelihood-based approaches are still preferred in practice for their lower running times; datasets coming from high-density arrays and next generation sequencing amplify these problems. RESULTS: We propose an approximate sampling technique, inspired by compression of discrete sequences in HMM computations and by kd-trees to leverage spatial relations between data points in typical data sets, to speed up the MCMC sampling. CONCLUSIONS: We test our approximate sampling method on simulated and biological ArrayCGH datasets and high-density SNP arrays, and demonstrate speed-ups of 10 to 60 and of 90, respectively, while achieving competitive results with the state-of-the-art Bayesian approaches. AVAILABILITY: An implementation of our method will be made available as part of the open source GHMM library from http://ghmm.org

    Chromatin States Accurately Classify Cell Differentiation Stages

    Gene expression is controlled by the concerted interactions between transcription factors and chromatin regulators. While recent studies have identified global chromatin state changes across cell-types, it remains unclear to what extent these changes are co-regulated during cell-differentiation. Here we present a comprehensive computational analysis by assembling a large dataset containing genome-wide occupancy information of 5 histone modifications in 27 human cell lines (including 24 normal and 3 cancer cell lines) obtained from the public domain, followed by independent analysis at three different representations. We classified the differentiation stage of a cell-type based on its genome-wide pattern of chromatin states, and found that our method was able to identify normal cell lines with nearly 100% accuracy. We then applied our model to classify the cancer cell lines and found that each can be unequivocally classified as differentiated cells. The differences can be in part explained by the differential activities of three regulatory modules associated with embryonic stem cells. We also found that the “hotspot” genes, whose chromatin states change dynamically in accordance with the differentiation stage, are not randomly distributed across the genome but tend to be embedded in multi-gene chromatin domains, and that specialized gene clusters tend to be embedded in stably occupied domains

    Combining Structure and Sequence Information Allows Automated Prediction of Substrate Specificities within Enzyme Families

    An important aspect of the functional annotation of enzymes is not only the type of reaction catalysed by an enzyme, but also the substrate specificity, which can vary widely within the same family. In many cases, prediction of family membership and even substrate specificity is possible from enzyme sequence alone, using a nearest neighbour classification rule. However, the combination of structural information and sequence information can improve the interpretability and accuracy of predictive models. The method presented here, Active Site Classification (ASC), automatically extracts the residues lining the active site from one representative three-dimensional structure and the corresponding residues from sequences of other members of the family. From a set of representatives with known substrate specificity, a Support Vector Machine (SVM) can then learn a model of substrate specificity. Applied to a sequence of unknown specificity, the SVM can then predict the most likely substrate. The models can also be analysed to reveal the underlying structural reasons determining substrate specificities and thus yield valuable insights into mechanisms of enzyme specificity. We illustrate the high prediction accuracy achieved on two benchmark data sets and the structural insights gained from ASC by a detailed analysis of the family of decarboxylating dehydrogenases. The ASC web service is available at http://asc.informatik.uni-tuebingen.de/

    An Active Site Aromatic Triad in Escherichia coli DNA Pol IV Coordinates Cell Survival and Mutagenesis in Different DNA Damaging Agents

    DinB (DNA Pol IV) is a translesion (TLS) DNA polymerase, which inserts a nucleotide opposite an otherwise replication-stalling N2-dG lesion in vitro, and confers resistance to nitrofurazone (NFZ), a compound that forms these lesions in vivo. DinB is also known to be part of the cellular response to alkylation DNA damage. Yet it is not known if DinB active site residues, in addition to amino acids involved in DNA synthesis, are critical in alkylation lesion bypass. It is also unclear which active site amino acids, if any, might modulate DinB's bypass fidelity of distinct lesions. Here we report that along with the classical catalytic residues, an active site “aromatic triad”, namely residues F12, F13, and Y79, is critical for cell survival in the presence of the alkylating agent methyl methanesulfonate (MMS). Strains expressing dinB alleles with single point mutations in the aromatic triad survive poorly in MMS. Remarkably, these strains show fewer MMS- than NFZ-induced mutants, suggesting that the aromatic triad, in addition to its role in TLS, modulates DinB's accuracy in bypassing distinct lesions. The high bypass fidelity of prevalent alkylation lesions is evident even when the DinB active site performs error-prone NFZ-induced lesion bypass. The analyses carried out with the active site aromatic triad suggest that the DinB active site residues are poised to proficiently bypass distinctive DNA lesions, yet they are also malleable so that the accuracy of the bypass is lesion-dependent

    Empirical Distributions of F-ST from Large-Scale Human Polymorphism Data

    Studies of the apportionment of human genetic variation have long established that most human variation is within population groups and that the additional variation between population groups is small but greatest when comparing different continental populations. These studies often used Wright’s FST that apportions the standardized variance in allele frequencies within and between population groups. Because local adaptations increase population differentiation, high-FST may be found at closely linked loci under selection and used to identify genes undergoing directional or heterotic selection. We re-examined these processes using HapMap data. We analyzed 3 million SNPs on 602 samples from eight worldwide populations and a consensus subset of 1 million SNPs found in all populations. We identified four major features of the data: First, a hierarchical FST analysis showed that only a small fraction (12%) of the total genetic variation is distributed between continental populations and an even smaller fraction (1%) is found between intra-continental populations. Second, the global FST distribution closely follows an exponential distribution. Third, although the overall FST distribution is similarly shaped (inverse J), FST distributions vary markedly by allele frequency when divided into non-overlapping groups by allele frequency range. Because the mean allele frequency is a crude indicator of allele age, these distributions mark the time-dependent change in genetic differentiation. Finally, the change in mean-FST of these groups is linear in allele frequency. These results suggest that investigating the extremes of the FST distribution for each allele frequency group is more efficient for detecting selection. Consequently, we demonstrate that such extreme SNPs are more clustered along the chromosomes than expected from linkage disequilibrium for each allele frequency group. These genomic regions are therefore likely candidates for natural selection