
The Sequence Alignment/Map format and SAMtools

Author: A. Wysoker
B. Handsaker
G. Abecasis
G. Marth
H. Li
J. Ruan
Langmead
Mardis
N. Homer
R. Durbin
T. Fennell
Publication venue: Oxford University Press
Publication date: 30/01/2013

Summary: The Sequence Alignment/Map (SAM) format is a generic alignment format for storing read alignments against reference sequences, supporting short and long reads (up to 128 Mbp) produced by different sequencing platforms. It is flexible in style, compact in size, efficient in random access and is the format in which alignments from the 1000 Genomes Project are released. SAMtools implements various utilities for post-processing alignments in the SAM format, such as indexing, a variant caller and an alignment viewer, and thus provides universal tools for processing read alignments.
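As an illustration of the format described above, the following is a minimal Python sketch (not part of SAMtools) that parses one SAM alignment line into the 11 mandatory tab-separated fields defined by the SAM specification, plus optional TAG:TYPE:VALUE fields, and derives the alignment's end coordinate on the reference from its CIGAR string. The names `SamRecord` and `parse_sam_line` are hypothetical, and only integer-typed (`i`) tags are converted; other tag values stay as strings.

```python
import re
from dataclasses import dataclass, field

# The 11 mandatory tab-separated columns of a SAM alignment line.
SAM_COLUMNS = ("qname", "flag", "rname", "pos", "mapq", "cigar",
               "rnext", "pnext", "tlen", "seq", "qual")

# CIGAR operations that consume reference bases.
REF_CONSUMING = set("MDN=X")


@dataclass
class SamRecord:
    qname: str
    flag: int
    rname: str
    pos: int        # 1-based leftmost mapping position (0 if unmapped)
    mapq: int
    cigar: str
    rnext: str
    pnext: int
    tlen: int
    seq: str
    qual: str
    tags: dict = field(default_factory=dict)

    def is_unmapped(self) -> bool:
        # Bit 0x4 of FLAG marks the segment as unmapped.
        return bool(self.flag & 0x4)

    def ref_end(self) -> int:
        """1-based inclusive end on the reference, summed from CIGAR ops."""
        ops = re.findall(r"(\d+)([MIDNSHP=X])", self.cigar)
        span = sum(int(n) for n, op in ops if op in REF_CONSUMING)
        return self.pos + span - 1


def parse_sam_line(line: str) -> SamRecord:
    cols = line.rstrip("\n").split("\t")
    if len(cols) < 11:
        raise ValueError("SAM alignment lines need at least 11 columns")
    values = dict(zip(SAM_COLUMNS, cols[:11]))
    for key in ("flag", "pos", "mapq", "pnext", "tlen"):
        values[key] = int(values[key])
    # Optional fields have the form TAG:TYPE:VALUE (e.g. NM:i:3).
    tags = {}
    for opt in cols[11:]:
        tag, typ, val = opt.split(":", 2)
        tags[tag] = int(val) if typ == "i" else val
    return SamRecord(**values, tags=tags)
```

For a read at position 7 with CIGAR `8M2I4M1D3M`, only the M and D operations (8 + 4 + 1 + 3 = 16 bases) advance along the reference, so `ref_end()` returns 22; the 2 inserted bases do not count.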

CiteSeerX

Crossref

Harvard University - DASH

PubMed Central

The variant call format and VCFtools

Author: A. Auton
C. A. Albers
Durbin
E. Banks
G. Abecasis
G. Lunter
G. McVean
G. T. Marth
M. A. DePristo
P. Danecek
R. Durbin
R. E. Handsaker
S. T. Sherry
Publication venue: Oxford University Press
Publication date: 01/01/2011

Summary: The variant call format (VCF) is a generic format for storing DNA polymorphism data such as SNPs, insertions, deletions and structural variants, together with rich annotations. VCF is usually stored in compressed form and can be indexed for fast retrieval of variants from a range of positions on the reference genome. The format was developed for the 1000 Genomes Project, and has also been adopted by other projects such as UK10K, dbSNP and the NHLBI Exome Project. VCFtools is a software suite that implements various utilities for processing VCF files, including validation, merging and comparing, and it also provides a general Perl API.
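To make the layout of a VCF record concrete, here is a minimal Python sketch, assuming an uncompressed data line with the header lines already skipped. It splits the eight fixed columns (CHROM, POS, ID, REF, ALT, QUAL, FILTER, INFO) and the optional FORMAT and per-sample genotype columns. It does not use VCFtools or its Perl API; `parse_info` and `parse_vcf_line` are hypothetical names, samples are returned positionally because real sample names come from the `#CHROM` header line, and INFO values are left as strings.

```python
def parse_info(info: str) -> dict:
    """Split a VCF INFO column into a dict; bare keys (Flag fields) map to True."""
    if info == ".":
        return {}
    out = {}
    for item in info.split(";"):
        key, sep, value = item.partition("=")
        out[key] = value if sep else True
    return out


def parse_vcf_line(line: str) -> dict:
    """Parse one tab-separated VCF data line (not a '#' header line)."""
    cols = line.rstrip("\n").split("\t")
    if len(cols) < 8:
        raise ValueError("VCF data lines need at least 8 columns")
    chrom, pos, vid, ref, alt, qual, flt, info = cols[:8]
    record = {
        "chrom": chrom,
        "pos": int(pos),  # 1-based position of the first REF base
        "id": None if vid == "." else vid,
        "ref": ref,
        "alt": [] if alt == "." else alt.split(","),
        "qual": None if qual == "." else float(qual),
        "filter": flt,
        "info": parse_info(info),
        "samples": [],
    }
    # Optional genotype data: a FORMAT column, then one colon-separated
    # column per sample whose values line up with the FORMAT keys.
    if len(cols) > 9:
        keys = cols[8].split(":")
        record["samples"] = [dict(zip(keys, s.split(":"))) for s in cols[9:]]
    return record
```

For a line such as `20 14370 rs6054257 G A 29 PASS NS=3;DP=14;AF=0.5;DB;H2 GT:GQ:DP:HQ 0|0:48:1:51,51 1|0:48:8:51,51 1/1:43:5:.,.` (tab-separated), the second sample's `GT` is `"1|0"` and the bare `DB` key becomes a `True` flag.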

Oxford University Research Archive

A standard variation file format for human genome sequences

Author: Batchelor Colin
Cunningham Fiona
Eilbeck Karen
Flicek Paul
Marth Gabor T
Moore Barry
Reese Martin G
Salas Fidel
Stein Lincoln
Yandell Mark
Publication venue: BioMed Central
Publication date: 01/01/2010

Here we describe the Genome Variation Format (GVF) and the 10Gen dataset. GVF, an extension of Generic Feature Format version 3 (GFF3), is a simple tab-delimited format for DNA variant files, which uses Sequence Ontology to describe genome variation data. The 10Gen dataset, ten human genomes in GVF format, is freely available for community analysis from the Sequence Ontology website and from an Amazon Elastic Block Store (EBS) snapshot for use in Amazon's EC2 cloud computing environment.
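Because GVF extends GFF3, each variant line keeps the nine tab-delimited GFF3 columns, with a Sequence Ontology term in the type column and variant details in the attributes column. The sketch below reads such a line under GFF3 conventions (semicolon-separated key=value attributes, comma-separated multiple values, URL-escaped text). It is a minimal illustration, not a validator; `parse_gvf_line` and the example line used with it are hypothetical and not taken from the 10Gen dataset.

```python
from urllib.parse import unquote

# The nine tab-delimited columns GVF inherits from GFF3.
GFF3_COLUMNS = ("seqid", "source", "type", "start", "end",
                "score", "strand", "phase", "attributes")


def parse_attributes(text: str) -> dict:
    """key=value pairs separated by ';'; multiple values separated by ','."""
    if text in ("", "."):
        return {}
    attrs = {}
    for pair in text.strip(";").split(";"):
        key, _, value = pair.partition("=")
        attrs[key] = [unquote(v) for v in value.split(",")]
    return attrs


def parse_gvf_line(line: str):
    """Return a dict for a feature line, or None for '#' comment/pragma lines."""
    if line.startswith("#") or not line.strip():
        return None
    cols = line.rstrip("\n").split("\t")
    if len(cols) != 9:
        raise ValueError("GFF3/GVF feature lines have exactly 9 columns")
    rec = dict(zip(GFF3_COLUMNS, cols))
    rec["start"], rec["end"] = int(rec["start"]), int(rec["end"])  # 1-based, inclusive
    rec["attributes"] = parse_attributes(rec["attributes"])
    return rec
```

A hypothetical line `chr1 example SNV 49514 49514 . + . ID=1;Variant_seq=A,G;Reference_seq=G` (tab-separated) yields type `"SNV"` and `Variant_seq` as the list `["A", "G"]`.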

Crossref

Springer - Publisher Connector

PubMed Central

Extending reference assembly models

Author: Aken B.
Chin C. S.
Church D. M.
Durbin R.
Flicek P.
Herrero J.
Hoffman M. M.
Kitts P. A.
Marth G. T.
Mendoza M. L. Z.
Quinlan A. R.
Schatz M. C.
Schneider V. A.
Steinberg K. M.
Publication venue: Springer Science and Business Media LLC
Publication date: 01/01/2015

The human genome reference assembly is crucial for aligning and analyzing sequence data, and for genome annotation, among other roles. However, the models and analysis assumptions that underlie the current assembly need revising to fully represent human sequence diversity. Improved analysis tools and updated data reporting formats are also required.

Crossref

Cold Spring Harbor Laboratory Institutional Repository

University of Birmingham Research Portal

UCL Discovery

Copenhagen University Research Information System

Digital Commons@Becker

PubMed Central

Potential health impacts of heavy metals on HIV-infected population in USA.

Author: A American Lung
A Tabib
A Vallet-Pichard
Amy B. Dailey
AN Phillips
B Messner
B Vigneshkumar
C Smith
CD Poirier
D Ibrahim
DS Kim
E Losina
E Marth
E Vartiainen
EM Alissa
Evelyn O. Talbott
F Camus
G Barbaro
G Brito
GA Kyerematen
Greg Kearney
GS Young
H Garg
HI Afridi
Hui Hu
IM Ozturk
J Sorvari
Jaymie Meliker
JE Sackoff
JM Miro
KA Perkins
KM Harrison
KR Mahaffey
L Barregard
L Calderon-Garciduenas
L Järup
M Jones
M Pölkki
M Tellez-Plaza
MA Fox
MB Rabinowitz
MG Lee
MJ Jarvis
MS Lee
NV Stepanova
O Laurent
P Brown
P Piot
PW Hunt
RA Chaisemartin
RJ Shaw
Robert L. Cook
RS Mijal
RT Gandhi
SG Deeks
SS Moon
SY Rhee
T van Ooik
TB Griffin
USC Bureau
VP Chashchin
Xiaohui Xu
Publication venue: Public Library of Science (PLoS)
Publication date: 01/01/2013

Noninfectious comorbidities such as cardiovascular diseases have become increasingly prevalent, and occur earlier in life, in persons with HIV infection. Despite the emerging body of literature linking environmental exposures to chronic disease outcomes in the general population, the impacts of environmental exposures have received little attention in the HIV-infected population. The aim of this study is to investigate whether individuals living with HIV have elevated levels of heavy metals compared with non-HIV-infected individuals in the United States. We used the National Health and Nutrition Examination Survey (NHANES) 2003-2010 to compare exposures to heavy metals, including cadmium, lead and total mercury, in HIV-infected and non-HIV-infected subjects. In this cross-sectional study, we found that HIV-infected individuals had higher concentrations of all heavy metals than the non-HIV-infected group. In a multivariate linear regression model, HIV status was significantly associated with increased blood cadmium (p=0.03) after adjusting for age, sex, race, education, poverty income ratio and smoking. However, HIV status was not significantly associated with lead or mercury levels after adjusting for the same covariates. Our findings suggest that HIV-infected patients might be significantly more exposed to cadmium than non-HIV-infected individuals, which could contribute to a higher prevalence of chronic diseases among HIV-infected subjects. Further research is warranted to identify sources of exposure and to better understand specific health outcomes.

Public Library of Science (PLOS)

Crossref

Directory of Open Access Journals

PubMed Central

D-Scholarship@Pitt

The University of North Carolina at Greensboro

Gettysburg College

ScholarShip

Whole-genome resequencing of two elite sires for the detection of haplotypes under selection in dairy cattle

Author: A. G. Hernandez
Altshuler
B. J. Hayes
C. L. Wright
D. M. Larkin
Frazer
Goddard
Green
H. A. Lewin
H. D. Daetwyler
I. M. Macleod
J. E. McCague
J. Thimmapuram
Kong
L. A. Hetrick
L. Boucek
Lasky-Su
Leitner
Levy
Liu
M. Cohen-Zinder
M. E. Goddard
M. R. Band
Margulies
Marth
Matukumalli
Montesano
Norman
Oltenacu
Qanbari
Quinlan
Rincon
Ron
S. L. Bachman
T. T. Harkins
T. V. Akraiko
The Bovine Genome Sequencing and Analysis Consortium
The Bovine HapMap Consortium
Vergara
Vierhout
Publication venue: Proceedings of the National Academy of Sciences
Publication date: 23/04/2012

Using a combination of whole-genome resequencing and high-density genotyping arrays, genome-wide haplotypes were reconstructed for two of the most important bulls in the history of the dairy cattle industry, Pawnee Farm Arlinda Chief (“Chief”) and his son Walkway Chief Mark (“Mark”), each accounting for ∼7% of all current genomes. We aligned 20.5 Gbp (∼7.3× coverage) and 37.9 Gbp (∼13.5× coverage) of the Chief and Mark genomic sequences, respectively. More than 1.3 million high-quality SNPs were detected in Chief and Mark sequences. The genome-wide haplotypes inherited by Mark from Chief were reconstructed using ∼1 million informative SNPs. Comparison of a set of 15,826 SNPs that overlapped in the sequence-based and BovineSNP50 SNPs showed the accuracy of the sequence-based haplotype reconstruction to be as high as 97%. By using the BovineSNP50 genotypes, the frequencies of Chief alleles on his two haplotypes then were determined in 1,149 of his descendants, and the distribution was compared with the frequencies that would be expected assuming no selection. We identified 49 chromosomal segments in which Chief alleles showed strong evidence of selection. Candidate polymorphisms for traits that have been under selection in the dairy cattle population then were identified by referencing Chief’s DNA sequence within these selected chromosome blocks. Eleven candidate genes were identified with functions related to milk-production, fertility, and disease-resistance traits. These data demonstrate that haplotype reconstruction of an ancestral proband by whole-genome resequencing in combination with high-density SNP genotyping of descendants can be used for rapid, genome-wide identification of the ancestor’s alleles that have been subjected to artificial selection.
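The two coverage figures are consistent with mean depth defined as aligned bases divided by genome length. Solving for the genome length implied by the numbers quoted above (an inference from those numbers, not a value stated in the abstract) gives about 2.81 Gbp for both animals, where L is the number of aligned bases, G the genome length and C the mean coverage:

```latex
C = \frac{L}{G}, \qquad
G \approx \frac{20.5\ \text{Gbp}}{7.3} \approx 2.81\ \text{Gbp}, \qquad
C_{\text{Mark}} \approx \frac{37.9\ \text{Gbp}}{2.81\ \text{Gbp}} \approx 13.5
```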

Crossref

Aberystwyth Research Portal

PubMed Central

University of Queensland eSpace

Identification and characterisation of novel SNP markers in Atlantic cod: Evidence for directional selection

Author: B Ewing
B Hayes
Ben Hayes
D Gordon
D Møller
E Árnason
EE Nielsen
Frank Nilsen
G Dahle
GH Pogson
H Kim
ICES
J Mork
J Stenvik
JI Westgaard
K Jørstad
K Sick
K Tang
KB Jakobsdóttir
Kjersti T Fjalestad
KM Miller
KT Fjalestad
L Excoffier
M Cargill
M Delghandi
M Kurlansky
MA Beaumont
MA Lee
Madjid Delghandi
MS Wesmajervi
Paul R Berg
S Fevolden
Sigbjørn Lien
Svein-Erik Fevolden
GT Marth
TH Sarvas
TH Skarstein
Thomas Moen
V Christensen
Publication venue: BioMed Central
Publication date: 01/01/2008

Background: The Atlantic cod (Gadus morhua) is a groundfish of great economic value in fisheries and an emerging species in aquaculture. Genetic markers are needed to identify wild stocks in order to ensure sustainable management, and for marker-assisted selection and pedigree determination in aquaculture. Here, we report on the development and evaluation of a large number of Single Nucleotide Polymorphism (SNP) markers from the alignment of Expressed Sequence Tag (EST) sequences in Atlantic cod. We also present basic population parameters of the SNPs in samples of North-East Arctic cod and Norwegian coastal cod obtained from three different localities, and test for SNPs that may have been targeted by natural selection.

Results: A total of 17,056 EST sequences were used to find 724 putative SNPs, from which 318 segregating SNPs were isolated. The SNPs were tested on Atlantic cod from four different sites, comprising both North-East Arctic cod (NEAC) and Norwegian coastal cod (NCC). The average heterozygosity of the SNPs was 0.25 and the average minor allele frequency was 0.18. FST values were highly variable, with the majority of SNPs displaying very little differentiation while others had FST values as high as 0.83. The FST values of 29 SNPs were found to be larger than expected under a strictly neutral model, suggesting that these loci are, or have been, influenced by natural selection. For the majority of these outlier SNPs, allele frequencies in a northern sample of NCC were intermediate between allele frequencies in a southern sample of NCC and a sample of NEAC, indicating a cline in allele frequencies similar to that found at the Pantophysin I locus.

Conclusion: The SNP markers presented here are powerful tools for future genetics work related to management and aquaculture. In particular, some SNPs exhibiting high levels of population divergence have potential to significantly enhance studies on the population structure of Atlantic cod.
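The summary statistics quoted above (heterozygosity, minor allele frequency, FST) can be computed from biallelic genotype counts. The Python sketch below is illustrative only: it assumes Hardy-Weinberg expected heterozygosity, 2p(1 - p), and Wright's simple FST = (HT - HS) / HT over equally weighted samples. The study's own estimators are not stated here and may differ, and its outlier test against a neutral model is not reproduced. All function names are hypothetical.

```python
def allele_freq(n_aa: int, n_ab: int, n_bb: int) -> float:
    """Frequency p of allele A from biallelic genotype counts (AA, AB, BB)."""
    n = n_aa + n_ab + n_bb
    return (2 * n_aa + n_ab) / (2 * n)


def minor_allele_freq(p: float) -> float:
    """Frequency of the rarer of the two alleles."""
    return min(p, 1 - p)


def expected_het(p: float) -> float:
    """Hardy-Weinberg expected heterozygosity at a biallelic locus."""
    return 2 * p * (1 - p)


def fst(subpop_freqs: list) -> float:
    """Wright's FST = (HT - HS) / HT, weighting each sample equally.

    HS is the mean within-sample expected heterozygosity; HT is the
    expected heterozygosity at the mean allele frequency.
    """
    h_s = sum(expected_het(p) for p in subpop_freqs) / len(subpop_freqs)
    p_bar = sum(subpop_freqs) / len(subpop_freqs)
    h_t = expected_het(p_bar)
    return 0.0 if h_t == 0 else (h_t - h_s) / h_t
```

For instance, two samples with allele frequencies 0.2 and 0.8 give HS = 0.32 and HT = 0.5, so FST = (0.5 - 0.32) / 0.5 = 0.36; identical frequencies give 0, and fixed opposite alleles give 1.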

University of Bergen

Crossref

Springer - Publisher Connector

Directory of Open Access Journals

PubMed Central

Munin - Open Research Archive

NORA - Norwegian Open Research Archives

University of Queensland eSpace