76 research outputs found

    AutoSNPdb: an annotated single nucleotide polymorphism database for crop plants

    Get PDF
    Single nucleotide polymorphisms (SNPs) may be considered the ultimate genetic marker as they represent the finest resolution of a DNA sequence (a single nucleotide), are generally abundant in populations and have a low mutation rate. Analysis of assembled EST sequence data provides a cost-effective means to identify large numbers of SNPs associated with functional genes. We have developed an integrated SNP discovery pipeline, which identifies SNPs from assembled EST sequences. The results are maintained in a custom relational database along with EST source and annotation information. The current database hosts data for the important crops rice, barley and Brassica. Users may rapidly identify polymorphic sequences of interest through BLAST sequence comparison, keyword searches of annotations derived from UniRef90 and GenBank comparisons, GO annotations or in genes corresponding to syntenic regions of reference genomes. In addition, SNPs between specific varieties may be identified for targeted mapping and association studies. SNPs are viewed using a user-friendly graphical interface. The database is freely accessible at http://autosnpdb.qfab.org.au/

    Recovering complete and draft population genomes from metagenome datasets

    Get PDF
    Assembly of metagenomic sequence data into microbial genomes is of fundamental value to improving our understanding of microbial ecology and metabolism by elucidating the functional potential of hard-to-culture microorganisms. Here, we provide a synthesis of available methods to bin metagenomic contigs into species-level groups and highlight how genetic diversity, sequencing depth, and coverage influence binning success. Despite the computational cost on application to deeply sequenced complex metagenomes (e.g., soil), covarying patterns of contig coverage across multiple datasets significantly improves the binning process. We also discuss and compare current genome validation methods and reveal how these methods tackle the problem of chimeric genome bins i.e., sequences from multiple species. Finally, we explore how population genome assembly can be used to uncover biogeographic trends and to characterize the effect of in situ functional constraints on the genome-wide evolution

    J Acquir Immune Defic Syndr

    Get PDF
    BackgroundCervical cancer is a major public health problem in resource-limited settings, particularly among HIV-infected women. Given the challenges of cytology-based approaches, the efficiency of new screening programs need to be assessed.SettingCommunity and hospital-based clinics in Gaborone, Botswana.ObjectiveTo determine the feasibility, and efficiency of the \u201cSee and Treat\u201d approach using Visual Inspection Acetic Acid (VIA) and Enhanced Digital Imaging (EDI) for cervical cancer prevention in HIV-infected women.MethodsA two-tier community-based cervical cancer prevention program was implemented. HIV-infected women were screened by nurses at the community using the VIA/EDI approach. Low-grade lesions were treated with cryotherapy on the same visit.ResultsFrom March 2009 through January 2011, 2,175 patients were screened for cervical cancer at our community-based clinic. 253 (11.6%) were found to have low-grade lesions and received same-day cryotherapy. 1,347 (61.9%) women were considered to have a normal examination and 575 (27.3%) were referred for further evaluation and treatment. Of the 1,347 women initially considered to have normal exams, 267 (19.8%) were recalled based on weekly quality control assessments. 210 (78.6%) of the 267 recalled women and 499 (86.8%) of the 575 referred women were seen at the referral clinic. Of these 709 women, 506 (71.4%) required additional treatment. Overall, 264 CIN stage 2 or 3 were identified and treated, and six micro-invasive cancers identified were referred for further management.ConclusionsOur \u201cSee and Treat\u201d cervical cancer prevention program using the VIA/EDI approach is a feasible, high-output and high-efficiency program, worthy of considering as an additional cervical cancer screening method in Botswana, especially for women with limited access to the current cytology-based screening services.20122014-01-08T00:00:00ZP30 AI045008/AI/NIAID NIH HHS/United StatesU2G PS001949/PS/NCHHSTP CDC HHS/United States1U2GPS001949/PHS HHS/United StatesIP30 AI 45008/AI/NIAID NIH HHS/United States22134146PMC388408

    ICoVeR - an interactive visualization tool for verification and refinement of metagenomic bins.

    Get PDF
    BACKGROUND: Recent advances in high-throughput sequencing allow for much deeper exploitation of natural and engineered microbial communities, and to unravel so-called "microbial dark matter" (microbes that until now have evaded cultivation). Metagenomic analyses result in a large number of genomic fragments (contigs) that need to be grouped (binned) in order to reconstruct draft microbial genomes. While several contig binning algorithms have been developed in the past 2 years, they often lack consensus. Furthermore, these software tools typically lack a provision for the visualization of data and bin characteristics. RESULTS: We present ICoVeR, the Interactive Contig-bin Verification and Refinement tool, which allows the visualization of genome bins. More specifically, ICoVeR allows curation of bin assignments based on multiple binning algorithms. Its visualization window is composed of two connected and interactive main views, including a parallel coordinates view and a dimensionality reduction plot. To demonstrate ICoVeR's utility, we used it to refine disparate genome bins automatically generated using MetaBAT, CONCOCT and MyCC for an anaerobic digestion metagenomic (AD microbiome) dataset. Out of 31 refined genome bins, 23 were characterized with higher completeness and lower contamination in comparison to their respective, automatically generated, genome bins. Additionally, to benchmark ICoVeR against a previously validated dataset, we used Sharon's dataset representing an infant gut metagenome. CONCLUSIONS: ICoVeR is an open source software package that allows curation of disparate genome bins generated with automatic binning algorithms. It is freely available under the GPLv3 license at https://git.list.lu/eScience/ICoVeR . The data management and analytical functions of ICoVeR are implemented in R, therefore the software can be easily installed on any system for which R is available. Installation and usage guide together with the example files ready to be visualized are also provided via the project wiki. ICoVeR running instance preloaded with AD microbiome and Sharon's datasets can be accessed via the website

    Multiple Data Analyses and Statistical Approaches for Analyzing Data from Metagenomic Studies and Clinical Trials

    Get PDF
    Metagenomics, also known as environmental genomics, is the study of the genomic content of a sample of organisms (microbes) obtained from a common habitat. Metagenomics and other “omics” disciplines have captured the attention of researchers for several decades. The effect of microbes in our body is a relevant concern for health studies. There are plenty of studies using metagenomics which examine microorganisms that inhabit niches in the human body, sometimes causing disease, and are often correlated with multiple treatment conditions. No matter from which environment it comes, the analyses are often aimed at determining either the presence or absence of specific species of interest in a given metagenome or comparing the biological diversity and the functional activity of a wider range of microorganisms within their communities. The importance increases for comparison within different environments such as multiple patients with different conditions, multiple drugs, and multiple time points of same treatment or same patient. Thus, no matter how many hypotheses we have, we need a good understanding of genomics, bioinformatics, and statistics to work together to analyze and interpret these datasets in a meaningful way. This chapter provides an overview of different data analyses and statistical approaches (with example scenarios) to analyze metagenomics samples from different medical projects or clinical trials

    A Single Molecule Scaffold for the Maize Genome

    Get PDF
    About 85% of the maize genome consists of highly repetitive sequences that are interspersed by low-copy, gene-coding sequences. The maize community has dealt with this genomic complexity by the construction of an integrated genetic and physical map (iMap), but this resource alone was not sufficient for ensuring the quality of the current sequence build. For this purpose, we constructed a genome-wide, high-resolution optical map of the maize inbred line B73 genome containing >91,000 restriction sites (averaging 1 site/∼23 kb) accrued from mapping genomic DNA molecules. Our optical map comprises 66 contigs, averaging 31.88 Mb in size and spanning 91.5% (2,103.93 Mb/∼2,300 Mb) of the maize genome. A new algorithm was created that considered both optical map and unfinished BAC sequence data for placing 60/66 (2,032.42 Mb) optical map contigs onto the maize iMap. The alignment of optical maps against numerous data sources yielded comprehensive results that proved revealing and productive. For example, gaps were uncovered and characterized within the iMap, the FPC (fingerprinted contigs) map, and the chromosome-wide pseudomolecules. Such alignments also suggested amended placements of FPC contigs on the maize genetic map and proactively guided the assembly of chromosome-wide pseudomolecules, especially within complex genomic regions. Lastly, we think that the full integration of B73 optical maps with the maize iMap would greatly facilitate maize sequence finishing efforts that would make it a valuable reference for comparative studies among cereals, or other maize inbred lines and cultivars

    Genomics-assisted breeding in four major pulse crops of developing countries: present status and prospects

    Get PDF
    The global population is continuously increasing and is expected to reach nine billion by 2050. This huge population pressure will lead to severe shortage of food, natural resources and arable land. Such an alarming situation is most likely to arise in developing countries due to increase in the proportion of people suffering from protein and micronutrient malnutrition. Pulses being a primary and affordable source of proteins and minerals play a key role in alleviating the protein calorie malnutrition, micronutrient deficiencies and other undernourishment-related issues. Additionally, pulses are a vital source of livelihood generation for millions of resource-poor farmers practising agriculture in the semi-arid and sub-tropical regions. Limited success achieved through conventional breeding so far in most of the pulse crops will not be enough to feed the ever increasing population. In this context, genomics-assisted breeding (GAB) holds promise in enhancing the genetic gains. Though pulses have long been considered as orphan crops, recent advances in the area of pulse genomics are noteworthy, e.g. discovery of genome-wide genetic markers, high-throughput genotyping and sequencing platforms, high-density genetic linkage/QTL maps and, more importantly, the availability of whole-genome sequence. With genome sequence in hand, there is a great scope to apply genome-wide methods for trait mapping using association studies and to choose desirable genotypes via genomic selection. It is anticipated that GAB will speed up the progress of genetic improvement of pulses, leading to the rapid development of cultivars with higher yield, enhanced stress tolerance and wider adaptability

    Population Genomics of Parallel Adaptation in Threespine Stickleback using Sequenced RAD Tags

    Get PDF
    Next-generation sequencing technology provides novel opportunities for gathering genome-scale sequence data in natural populations, laying the empirical foundation for the evolving field of population genomics. Here we conducted a genome scan of nucleotide diversity and differentiation in natural populations of threespine stickleback (Gasterosteus aculeatus). We used Illumina-sequenced RAD tags to identify and type over 45,000 single nucleotide polymorphisms (SNPs) in each of 100 individuals from two oceanic and three freshwater populations. Overall estimates of genetic diversity and differentiation among populations confirm the biogeographic hypothesis that large panmictic oceanic populations have repeatedly given rise to phenotypically divergent freshwater populations. Genomic regions exhibiting signatures of both balancing and divergent selection were remarkably consistent across multiple, independently derived populations, indicating that replicate parallel phenotypic evolution in stickleback may be occurring through extensive, parallel genetic evolution at a genome-wide scale. Some of these genomic regions co-localize with previously identified QTL for stickleback phenotypic variation identified using laboratory mapping crosses. In addition, we have identified several novel regions showing parallel differentiation across independent populations. Annotation of these regions revealed numerous genes that are candidates for stickleback phenotypic evolution and will form the basis of future genetic analyses in this and other organisms. This study represents the first high-density SNP–based genome scan of genetic diversity and differentiation for populations of threespine stickleback in the wild. These data illustrate the complementary nature of laboratory crosses and population genomic scans by confirming the adaptive significance of previously identified genomic regions, elucidating the particular evolutionary and demographic history of such regions in natural populations, and identifying new genomic regions and candidate genes of evolutionary significance

    Novel SNP Discovery in African Buffalo, Syncerus caffer, using high-throughput Sequencing

    Get PDF
    The African buffalo, Syncerus caffer, is one of the most abundant and ecologically important species of megafauna in the savannah ecosystem. It is an important prey species, as well as a host for a vast array of nematodes, pathogens and infectious diseases, such as bovine tuberculosis and corridor disease. Large-scale SNP discovery in this species would greatly facilitate further research into the area of host genetics and disease susceptibility, as well as provide a wealth of sequence information for other conservation and genomics studies. We sequenced pools of Cape buffalo DNA from a total of 9 animals, on an ABI SOLiD4 sequencer. The resulting short reads were mapped to the UMD3.1 Bos taurus genome assembly using both BWA and Bowtie software packages. A mean depth of 2.76 coverage over the mapped regions was obtained. Btau4 gene annotation was added to all SNPs identified within gene regions. Bowtie and BWA identified a maximum of 2,222,665 and 276,847 SNPs within the buffalo respectively, depending on analysis method. A panel of 173 SNPs was validated by fluorescent genotyping in 87 individuals. 27 SNPs failed to amplify, and of the remaining 146 SNPs, 43–54 % of the Bowtie SNPs and 57–58 % of the BWA SNPs were confirmed as polymorphic. dN/dS ratios found no evidence of positive selection, and although there were genes that appeared to be under negative selection, these were more likely to be slowl
    corecore