The Sequence Alignment/Map format and SAMtools
Summary: The Sequence Alignment/Map (SAM) format is a generic alignment format for storing read alignments against reference sequences, supporting short and long reads (up to 128 Mbp) produced by different sequencing platforms. It is flexible in style, compact in size, efficient in random access and is the format in which alignments from the 1000 Genomes Project are released. SAMtools implements various utilities for post-processing alignments in the SAM format, such as an indexer, a variant caller and an alignment viewer, and thus provides universal tools for processing read alignments.
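As a rough illustration of what a SAM alignment record looks like, the sketch below splits one tab-delimited line into the eleven mandatory fields defined by the SAM specification. The names `SamRecord`, `parse_sam_line` and `is_unmapped` are hypothetical helpers written for this example and are not part of SAMtools; the example line is illustrative only.

```python
from typing import NamedTuple, Tuple


class SamRecord(NamedTuple):
    """The eleven mandatory SAM fields plus any optional tags (hypothetical helper type)."""
    qname: str   # query (read) name
    flag: int    # bitwise flag
    rname: str   # reference sequence name
    pos: int     # 1-based leftmost mapping position
    mapq: int    # mapping quality
    cigar: str   # CIGAR string
    rnext: str   # reference name of the mate / next read
    pnext: int   # position of the mate / next read
    tlen: int    # observed template length
    seq: str     # read sequence
    qual: str    # base qualities ("*" if not stored)
    tags: Tuple[str, ...]  # optional TAG:TYPE:VALUE fields, left unparsed


def parse_sam_line(line: str) -> SamRecord:
    """Parse one alignment line (not a header line starting with '@')."""
    fields = line.rstrip("\n").split("\t")
    if len(fields) < 11:
        raise ValueError("a SAM alignment line needs at least 11 fields")
    return SamRecord(
        fields[0], int(fields[1]), fields[2], int(fields[3]), int(fields[4]),
        fields[5], fields[6], int(fields[7]), int(fields[8]),
        fields[9], fields[10], tuple(fields[11:]),
    )


def is_unmapped(rec: SamRecord) -> bool:
    """Flag bit 0x4 marks a read as unmapped."""
    return bool(rec.flag & 0x4)


# Illustrative record, not taken from any real dataset.
example = "r001\t99\tref\t7\t30\t8M2I4M1D3M\t=\t37\t39\tTTAGATAAAGGATACTG\t*\tNM:i:1"
record = parse_sam_line(example)
```

Real pipelines would normally rely on SAMtools or an established library rather than hand-written parsing; the sketch only shows the layout of a record.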
The variant call format and VCFtools
Summary: The variant call format (VCF) is a generic format for storing DNA polymorphism data such as SNPs, insertions, deletions and structural variants, together with rich annotations. VCF is usually stored in a compressed manner and can be indexed for fast data retrieval of variants from a range of positions on the reference genome. The format was developed for the 1000 Genomes Project, and has also been adopted by other projects such as UK10K, dbSNP and the NHLBI Exome Project. VCFtools is a software suite that implements various utilities for processing VCF files, including validation, merging and comparing, and also provides a general Perl API.
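To make the structure of a VCF data line concrete, here is a minimal sketch that reads the eight fixed columns (CHROM, POS, ID, REF, ALT, QUAL, FILTER, INFO) and turns the INFO column into a dictionary. `parse_vcf_line` is a hypothetical helper for this example, not a VCFtools function, and the sample line is illustrative.

```python
def parse_vcf_line(line):
    """Parse the eight fixed columns of one VCF data line (hypothetical helper).

    Header lines (starting with '#') and per-sample genotype columns
    are outside the scope of this sketch.
    """
    cols = line.rstrip("\n").split("\t")
    if len(cols) < 8:
        raise ValueError("a VCF data line needs at least 8 columns")
    chrom, pos, var_id, ref, alt, qual, filt, info = cols[:8]

    info_dict = {}
    if info != ".":
        for item in info.split(";"):
            key, sep, value = item.partition("=")
            # Keys without '=' are boolean flags.
            info_dict[key] = value if sep else True

    return {
        "chrom": chrom,
        "pos": int(pos),                              # 1-based position
        "id": None if var_id == "." else var_id,
        "ref": ref,
        "alt": [] if alt == "." else alt.split(","),  # possibly several alleles
        "qual": None if qual == "." else float(qual),
        "filter": filt,
        "info": info_dict,
    }


# Illustrative record, not taken from any real call set.
example = "20\t14370\trs6054257\tG\tA\t29\tPASS\tNS=3;DP=14;AF=0.5;DB;H2"
variant = parse_vcf_line(example)
```

INFO values are kept as strings here; a fuller parser would use the types declared in the file's header lines.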
A standard variation file format for human genome sequences
Here we describe the Genome Variation Format (GVF) and the 10Gen dataset. GVF, an extension of Generic Feature Format version 3 (GFF3), is a simple tab-delimited format for DNA variant files, which uses Sequence Ontology to describe genome variation data. The 10Gen dataset, ten human genomes in GVF format, is freely available for community analysis from the Sequence Ontology website and from an Amazon elastic block storage (EBS) snapshot for use in Amazon's EC2 cloud computing environment.
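Because GVF extends GFF3, a GVF feature line can be read as nine tab-delimited columns whose last column holds `key=value` attributes separated by semicolons. The sketch below assumes that layout; `parse_gvf_line` is a hypothetical helper and the example line (including its attribute names and values) is made up for illustration.

```python
def parse_gvf_line(line):
    """Parse one GFF3-style feature line as used by GVF (hypothetical helper)."""
    cols = line.rstrip("\n").split("\t")
    if len(cols) != 9:
        raise ValueError("a GFF3/GVF feature line has exactly 9 columns")
    seqid, source, ftype, start, end, score, strand, phase, attrs = cols

    attributes = {}
    for pair in attrs.split(";"):
        if not pair:
            continue  # tolerate a trailing semicolon
        key, _, value = pair.partition("=")
        # GFF3 allows several comma-separated values per attribute.
        attributes[key] = value.split(",")

    return {
        "seqid": seqid,
        "source": source,
        "type": ftype,        # a Sequence Ontology term such as SNV
        "start": int(start),  # 1-based, inclusive
        "end": int(end),
        "score": score,
        "strand": strand,
        "phase": phase,
        "attributes": attributes,
    }


# Made-up example line for illustration only.
example = "chr16\texample_caller\tSNV\t49291141\t49291141\t.\t+\t.\tID=ID_1;Variant_seq=A,G;Reference_seq=G"
feature = parse_gvf_line(example)
```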
Extending reference assembly models
The human genome reference assembly is crucial for aligning and analyzing sequence data, and for genome annotation, among other roles. However, the models and analysis assumptions that underlie the current assembly need revising to fully represent human sequence diversity. Improved analysis tools and updated data reporting formats are also required.
Potential health impacts of heavy metals on HIV-infected population in USA.
Noninfectious comorbidities such as cardiovascular diseases have become increasingly prevalent and occur earlier in life in persons with HIV infection. Despite the emerging body of literature linking environmental exposures to chronic disease outcomes in the general population, the impacts of environmental exposures have received little attention in the HIV-infected population. The aim of this study is to investigate whether individuals living with HIV in the United States have higher heavy metal levels than non-HIV-infected individuals. We used the National Health and Nutrition Examination Survey (NHANES) 2003-2010 to compare exposures to heavy metals, including cadmium, lead, and total mercury, in HIV-infected and non-HIV-infected subjects. In this cross-sectional study, we found that HIV-infected individuals had higher concentrations of all heavy metals than the non-HIV-infected group. In a multivariate linear regression model, HIV status was significantly associated with increased blood cadmium (p=0.03) after adjusting for age, sex, race, education, poverty income ratio, and smoking. However, HIV status was not statistically associated with lead or mercury levels after adjusting for the same covariates. Our findings suggest that HIV-infected patients might be significantly more exposed to cadmium than non-HIV-infected individuals, which could contribute to the higher prevalence of chronic diseases among HIV-infected subjects. Further research is warranted to identify sources of exposure and to understand more about specific health outcomes.
Whole-genome resequencing of two elite sires for the detection of haplotypes under selection in dairy cattle
Using a combination of whole-genome resequencing and high-density genotyping arrays, genome-wide haplotypes were reconstructed for two of the most important bulls in the history of the dairy cattle industry, Pawnee Farm Arlinda Chief (“Chief”) and his son Walkway Chief Mark (“Mark”), each accounting for ∼7% of all current genomes. We aligned 20.5 Gbp (∼7.3× coverage) and 37.9 Gbp (∼13.5× coverage) of the Chief and Mark genomic sequences, respectively. More than 1.3 million high-quality SNPs were detected in Chief and Mark sequences. The genome-wide haplotypes inherited by Mark from Chief were reconstructed using ∼1 million informative SNPs. Comparison of a set of 15,826 SNPs that overlapped between the sequence-based and BovineSNP50 SNP sets showed the accuracy of the sequence-based haplotype reconstruction to be as high as 97%. By using the BovineSNP50 genotypes, the frequencies of Chief alleles on his two haplotypes were then determined in 1,149 of his descendants, and the distribution was compared with the frequencies that would be expected assuming no selection. We identified 49 chromosomal segments in which Chief alleles showed strong evidence of selection. Candidate polymorphisms for traits that have been under selection in the dairy cattle population were then identified by referencing Chief’s DNA sequence within these selected chromosome blocks. Eleven candidate genes were identified with functions related to milk-production, fertility, and disease-resistance traits. These data demonstrate that haplotype reconstruction of an ancestral proband by whole-genome resequencing in combination with high-density SNP genotyping of descendants can be used for rapid, genome-wide identification of the ancestor’s alleles that have been subjected to artificial selection.
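The coverage figures quoted above can be cross-checked with simple arithmetic, assuming the usual definition of coverage as aligned bases divided by reference genome size. The abstract does not state the genome size, so the sketch below only derives the size implied by each pair of numbers; `implied_genome_size_gbp` is a hypothetical helper.

```python
def implied_genome_size_gbp(aligned_gbp, coverage):
    """Genome size implied by coverage = aligned bases / genome size."""
    if coverage <= 0:
        raise ValueError("coverage must be positive")
    return aligned_gbp / coverage


# Figures quoted in the abstract (both are approximate).
chief_size = implied_genome_size_gbp(20.5, 7.3)
mark_size = implied_genome_size_gbp(37.9, 13.5)

print(f"Chief implies ~{chief_size:.2f} Gbp, Mark implies ~{mark_size:.2f} Gbp")
```

Both pairs of numbers imply roughly the same reference size, which suggests the two coverage values were computed against the same assembly.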
Identification and characterisation of novel SNP markers in Atlantic cod: Evidence for directional selection
Background: The Atlantic cod (Gadus morhua) is a groundfish of great economic value in fisheries and an emerging species in aquaculture. Genetic markers are needed to identify wild stocks in order to ensure sustainable management, and for marker-assisted selection and pedigree determination in aquaculture. Here, we report on the development and evaluation of a large number of Single Nucleotide Polymorphism (SNP) markers from the alignment of Expressed Sequence Tag (EST) sequences in Atlantic cod. We also present basic population parameters of the SNPs in samples of North-East Arctic cod and Norwegian coastal cod obtained from three different localities, and test for SNPs that may have been targeted by natural selection. Results: A total of 17,056 EST sequences were used to find 724 putative SNPs, from which 318 segregating SNPs were isolated. The SNPs were tested on Atlantic cod from four different sites, comprising both North-East Arctic cod (NEAC) and Norwegian coastal cod (NCC). The average heterozygosity of the SNPs was 0.25 and the average minor allele frequency was 0.18. F_ST values were highly variable, with the majority of SNPs displaying very little differentiation while others had F_ST values as high as 0.83. The F_ST values of 29 SNPs were found to be larger than expected under a strictly neutral model, suggesting that these loci are, or have been, influenced by natural selection. For the majority of these outlier SNPs, allele frequencies in a northern sample of NCC were intermediate between allele frequencies in a southern sample of NCC and a sample of NEAC, indicating a cline in allele frequencies similar to that found at the Pantophysin I locus. Conclusion: The SNP markers presented here are powerful tools for future genetics work related to management and aquaculture. In particular, some SNPs exhibiting high levels of population divergence have potential to significantly enhance studies on the population structure of Atlantic cod.
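For readers unfamiliar with the F_ST statistic mentioned above, the sketch below computes a simple Wright-style F_ST for one biallelic SNP from the allele frequencies in several populations, using F_ST = (H_T - H_S) / H_T with expected heterozygosity 2p(1 - p). This is a textbook estimator with equal population weights, chosen for illustration; it is not necessarily the estimator used in the study, and the function names and frequencies are hypothetical.

```python
def expected_heterozygosity(p):
    """Expected heterozygosity 2p(1 - p) for a biallelic locus."""
    return 2.0 * p * (1.0 - p)


def wright_fst(allele_freqs):
    """Simple F_ST = (H_T - H_S) / H_T for one biallelic SNP.

    allele_freqs: frequency of the same allele in each population.
    Populations are weighted equally (an assumption of this sketch).
    """
    if not allele_freqs:
        raise ValueError("need at least one population")
    n = len(allele_freqs)
    # H_S: mean within-population expected heterozygosity.
    h_s = sum(expected_heterozygosity(p) for p in allele_freqs) / n
    # H_T: expected heterozygosity of the pooled population.
    p_bar = sum(allele_freqs) / n
    h_t = expected_heterozygosity(p_bar)
    if h_t == 0.0:
        return 0.0  # locus is fixed everywhere, so there is no differentiation
    return (h_t - h_s) / h_t


# Hypothetical allele frequencies, not data from the study.
similar = wright_fst([0.50, 0.50])    # identical populations
divergent = wright_fst([0.90, 0.10])  # strongly differentiated populations
```

With these hypothetical inputs, identical frequencies give an F_ST of 0, while frequencies of 0.9 and 0.1 give about 0.64, the kind of high value that an outlier test would flag.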