Search CORE

16 research outputs found

FLYSNPdb: a high-density SNP database of Drosophila melanogaster

Author: Adams
Altschul
Berger
Celniker
Chen
Crosby
Doris Chen
Drysdale
Ewing
Gordon
Hoskins
Jürg Berger
Lunter
Marth
Martin
Michaela Fellner
Nairz
Olson
Rice
Roberts
Rorth
Rozen
Takashi Suzuki
Teeter
Xu
Publication venue: Oxford University Press

FLYSNPdb provides high-resolution single nucleotide polymorphism (SNP) data for Drosophila melanogaster. The database currently contains 27,367 polymorphisms, including >3,700 indels (insertions/deletions), covering all major chromosomes. These SNPs are clustered into 2,238 markers, which are evenly distributed with an average density of one marker every 50.3 kb or 6.6 genes. SNPs were identified automatically, filtered for high quality and partly curated manually. The database provides detailed information on the SNP data, including molecular and cytological locations (genome Releases 3–5), alleles of up to five commonly used laboratory stocks, flanking sequences, SNP marker amplification primers, quality scores and genotyping assays. Data specific to a certain region, particular stocks or a certain genome assembly version are easily retrievable through a publicly accessible web interface (http://flysnp.imp.ac.at/flysnpdb.php).

Crossref

PubMed Central
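The FLYSNPdb abstract describes retrieving markers for a region and an average spacing of one marker every 50.3 kb. The Python sketch below illustrates both ideas on a locally held marker table; the `Marker` record, its fields, and the toy coordinates are hypothetical and do not reflect the actual FLYSNPdb schema or web interface.

```python
from bisect import bisect_left, bisect_right
from dataclasses import dataclass


@dataclass(frozen=True)
class Marker:
    # Hypothetical record; the real FLYSNPdb fields may differ.
    name: str
    chrom: str  # chromosome arm, e.g. "2L"
    pos: int    # position in bp on that arm (assumed coordinate system)


def markers_in_region(markers: list[Marker], chrom: str, start: int, end: int) -> list[Marker]:
    """Markers on `chrom` with start <= pos <= end.

    Sorts on every call for simplicity; a real index would sort once and
    reuse the position list for repeated binary-search queries.
    """
    on_chrom = sorted((m for m in markers if m.chrom == chrom), key=lambda m: m.pos)
    positions = [m.pos for m in on_chrom]
    lo = bisect_left(positions, start)   # first marker at or after start
    hi = bisect_right(positions, end)    # one past the last marker at or before end
    return on_chrom[lo:hi]


def mean_spacing(markers: list[Marker], chrom: str) -> float:
    """Average distance in bp between consecutive markers on one arm."""
    pos = sorted(m.pos for m in markers if m.chrom == chrom)
    if len(pos) < 2:
        raise ValueError("need at least two markers on the arm")
    return (pos[-1] - pos[0]) / (len(pos) - 1)


if __name__ == "__main__":
    toy = [Marker("m1", "2L", 10_000), Marker("m2", "2L", 60_000),
           Marker("m3", "2L", 110_000), Marker("m4", "3R", 5_000)]
    print([m.name for m in markers_in_region(toy, "2L", 50_000, 120_000)])  # ['m2', 'm3']
    print(mean_spacing(toy, "2L"))  # 50000.0
```

The mean spacing is simply the span between the outermost markers divided by the number of gaps, which is what an "average density of one marker every N kb" expresses.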

Relationship between amino acid composition and gene expression in the mouse genome

Background: Codon bias refers to differences in the frequencies of synonymous codons among genes. In many organisms, natural selection is considered a cause of codon bias, because codon usage in highly expressed genes is biased toward optimal codons. Methods have previously been developed to predict gene expression levels from nucleotide sequences, based on the observation that synonymous codon usage shows an overall bias toward a few codons called major codons. However, the relationship between codon bias and gene expression level proposed by the translation-selection model is less evident in mammals. Findings: We investigated the correlations between the expression levels of 1,182 mouse genes and amino acid composition, as well as between gene expression and codon preference. We found a weak but significant correlation between gene expression levels and amino acid composition in mouse. In total, amino acid composition explains less than 10% of the variation in expression levels. The effect of codon preference on gene expression was weaker than that of amino acid composition, because no significant correlations were observed with respect to codon preference. Conclusion: These results suggest that it is difficult to predict expression levels from amino acid composition or from codon bias in mouse.

Crossref

Springer - Publisher Connector

Directory of Open Access Journals

PubMed Central
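The mouse study correlates expression levels with amino acid composition. As a minimal sketch, assuming toy sequences and expression values (both hypothetical), the code below computes a composition vector per protein and a Pearson correlation between one amino acid's frequency and expression. For a single predictor, r² is the share of expression variance a linear fit explains, a one-variable analogue of the "less than 10%" figure. This is not the authors' analysis pipeline.

```python
from collections import Counter
from math import sqrt

# The 20 standard amino acids (one-letter codes).
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def composition(protein: str) -> dict[str, float]:
    """Fraction of each standard amino acid in a protein (other symbols ignored)."""
    counts = Counter(aa for aa in protein.upper() if aa in AMINO_ACIDS)
    total = sum(counts.values())
    if total == 0:
        raise ValueError("no standard amino acids in sequence")
    return {aa: counts[aa] / total for aa in AMINO_ACIDS}


def pearson_r(xs: list[float], ys: list[float]) -> float:
    """Pearson correlation coefficient between two equal-length samples."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)


if __name__ == "__main__":
    # Hypothetical toy data: protein sequences and their expression values.
    proteins = ["MKKLLA", "MAGGAL", "MKAAAL", "MLLLGG"]
    expression = [5.0, 2.0, 4.0, 1.0]
    lys = [composition(p)["K"] for p in proteins]
    r = pearson_r(lys, expression)
    # r^2: share of expression variance explained by a linear fit on lysine alone.
    print(f"r = {r:.3f}, r^2 = {r * r:.3f}")
```

With all 20 frequencies as predictors one would fit a multiple regression instead; the single-variable version keeps the idea visible.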

Investigating selection on viruses: a statistical alignment approach

Author: de Groot Saskia
Hein Jotun
Lunter Gerton
Mailund Thomas
Publication venue: BioMed Central
Publication date: 01/01/2008

Background: Two problems complicate the study of selection in viral genomes. First, the presence of genes in overlapping reading frames implies that selection in one reading frame can bias our estimates of neutral mutation rates in another reading frame. Second, the high mutation rates we are likely to encounter complicate the inference of a reliable alignment of genomes. To address these issues, we develop a model that explicitly models selection in overlapping reading frames. We then integrate this model into a statistical alignment framework, enabling us to estimate selection while explicitly dealing with the uncertainty of individual alignments. We show that in this way we obtain unbiased selection parameters for different genomic regions of interest, and can improve accuracy compared with using a fixed alignment. Results: We run a series of simulation studies to gauge how well we estimate selection, especially in comparison with the use of a fixed alignment. We show that the standard practice of using a ClustalW alignment can lead to considerable biases and that estimation accuracy increases substantially when explicitly integrating over the uncertainty in inferred alignments. We even compare favourably, for general evolutionary distances, with an alignment produced by GenAl. We subsequently run our method on HIV2 and Hepatitis B sequences. Conclusion: We propose that marginalizing over all alignments, as opposed to using a fixed one, should be considered in any parametric inference from divergent sequence data for which the alignments are not known with certainty. Moreover, we find in HIV2 that double-coding regions appear to be under less stringent selection than single-coding ones. Additionally, there appears to be evidence for differential selection, where one overlapping reading frame is under positive and the other under negative selection.

Springer - Publisher Connector

Directory of Open Access Journals

PubMed Central

Oxford University Research Archive
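The first problem named in this abstract can be made concrete by listing which codon position each nucleotide occupies in two overlapping same-strand reading frames. The sketch below uses toy lengths chosen for illustration. It shows that when frame B is shifted by one nucleotide, every third codon position in frame A (where many substitutions are synonymous) is a second codon position in frame B, so it is not free of selection there. This illustrates the bias the authors describe; it is not their model.

```python
from collections import Counter


def codon_positions(length: int, offset: int) -> list:
    """Codon position (1, 2 or 3) of each nucleotide in a same-strand reading
    frame that starts at `offset`; None for nucleotides before the frame."""
    return [((i - offset) % 3) + 1 if i >= offset else None for i in range(length)]


def overlap_pairs(length: int, offset_a: int, offset_b: int) -> Counter:
    """Count (codon position in frame A, codon position in frame B) over
    nucleotides covered by both frames."""
    a = codon_positions(length, offset_a)
    b = codon_positions(length, offset_b)
    return Counter((x, y) for x, y in zip(a, b) if x is not None and y is not None)


if __name__ == "__main__":
    # Frame B is shifted by +1 relative to frame A.
    pairs = overlap_pairs(10, 0, 1)
    print(sorted(pairs.items()))  # [((1, 3), 3), ((2, 1), 3), ((3, 2), 3)]
    # No nucleotide is a third codon position in both frames.
    print(pairs[(3, 3)])  # 0
```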

Progressive Mauve: Multiple alignment of genomes with gene flux and rearrangement

Author: Darling Aaron E.
Mau Bob
Perna Nicole T.
Publication date: 01/01/2009

Multiple genome alignment remains a challenging problem. Effects of recombination, including rearrangement, segmental duplication, gain, and loss, can create a mosaic pattern of homology even among closely related organisms. We describe a method to align two or more genomes that have undergone large-scale recombination, particularly genomes that have undergone substantial amounts of gene gain and loss (gene flux). The method uses a novel alignment objective score, referred to as a sum-of-pairs breakpoint score. We also apply a probabilistic alignment filtering method to remove erroneous alignments of unrelated sequences, which are commonly observed in other genome alignment methods. We describe new metrics for quantifying genome alignment accuracy, which measure the quality of rearrangement breakpoint predictions and indel predictions. The progressive genome alignment algorithm demonstrates markedly improved accuracy over previous approaches in situations where genomes have undergone realistic amounts of genome rearrangement, gene gain, loss, and duplication. We apply the progressive genome alignment algorithm to a set of 23 completely sequenced genomes from the genera Escherichia, Shigella, and Salmonella. The 23 enterobacteria have an estimated 2.46 Mbp of genomic content conserved among all taxa and a total unique content of 15.2 Mbp. We document substantial population-level variability among these organisms driven by homologous recombination, gene gain, and gene loss. Free, open-source software implementing the described genome alignment approach is available from http://gel.ahabs.wisc.edu/mauve. Comment: Revision dated June 19, 200

arXiv.org e-Print Archive

CiteSeerX
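Progressive Mauve scores alignments with a sum-of-pairs breakpoint score. The exact objective is not given in this abstract, so the sketch below only illustrates the two ingredients the name suggests. It counts breakpoints between two genomes written as signed orders of shared blocks (in the spirit of Mauve's locally collinear blocks), then sums that count over every pair of genomes. The block orders are hypothetical, and any weighting of breakpoints against alignment scores in the published method is not reproduced here.

```python
from itertools import combinations


def adjacencies(order: list[int]) -> set[tuple[int, int]]:
    """Signed adjacencies (x, y) between consecutive blocks of a linear genome."""
    return set(zip(order, order[1:]))


def breakpoints(a: list[int], b: list[int]) -> int:
    """Adjacencies of genome b absent from genome a in either reading direction;
    the adjacency (x, y) read from the other strand is (-y, -x)."""
    adj_a = adjacencies(a)
    return sum(1 for x, y in zip(b, b[1:])
               if (x, y) not in adj_a and (-y, -x) not in adj_a)


def sum_of_pairs_breakpoints(genomes: list[list[int]]) -> int:
    """Breakpoint count summed over every pair of genomes."""
    return sum(breakpoints(a, b) for a, b in combinations(genomes, 2))


if __name__ == "__main__":
    g1 = [1, 2, 3, 4]
    g2 = [1, -3, -2, 4]  # blocks 2..3 inverted relative to g1
    g3 = [1, 2, 3, 4]
    print(breakpoints(g1, g2))                     # 2
    print(sum_of_pairs_breakpoints([g1, g2, g3]))  # 4
```

A single inversion creates two breakpoints, one at each end of the inverted segment, which is why `g2` contributes 2 to each pair it is part of.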

Sequence context affects the rate of short insertions and deletions in flies and primates

Author: Siggia Eric D
Tanay Amos
Publication venue: BioMed Central
Publication date: 01/01/2008

Analysis of a large collection of short insertions and deletions in primates and flies shows that the rate of insertions or deletions of specific lengths can vary by more than 100-fold, depending on the surrounding sequence.

Crossref

Springer - Publisher Connector

PubMed Central
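Comparing indel rates across sequence contexts requires normalizing event counts by how often each context occurs. The sketch below uses one context variable, homopolymer run length, chosen purely for illustration; the abstract does not say which context features the authors used, and the sequence and indel positions are hypothetical.

```python
from collections import Counter


def homopolymer_run(seq: str, i: int) -> int:
    """Length of the run of identical bases that contains position i."""
    base = seq[i]
    left = i
    while left > 0 and seq[left - 1] == base:
        left -= 1
    right = i
    while right + 1 < len(seq) and seq[right + 1] == base:
        right += 1
    return right - left + 1


def rate_by_context(seq: str, event_positions: list[int]) -> dict[int, float]:
    """Indel events per site, stratified by homopolymer run length at the site.

    Dividing by the number of sites in each class (the opportunities)
    turns raw counts into comparable per-site rates.
    """
    sites = Counter(homopolymer_run(seq, i) for i in range(len(seq)))
    events = Counter(homopolymer_run(seq, i) for i in event_positions)
    return {k: events[k] / sites[k] for k in sorted(sites)}


if __name__ == "__main__":
    # Hypothetical sequence and indel positions.
    print(rate_by_context("ACGTAAAAAG", [1, 5, 6]))  # {1: 0.2, 5: 0.4}
```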

Parameters for accurate genome alignment

Author: A Morgulis
A Schwartz
A Stark
B Paten
CH Yuh
CN Dewey
D Gusfield
D Karolchik
D States
DA Pollard
E Kim
EH Margulies
F Chiaromonte
G Benson
G Lunter
I Holmes
J Ruan
J Wang
JC Wootton
JE Janecka
JO Kriegs
JT Reese
KD Pruitt
KM Wong
LA Newberg
LE Carvalho
M Brudno
M Hamada
Martin C Frith
MC Frith
Michiaki Hamada
MS Waterman
Paul Horton
PP Gardner
R Durbin
RC Friedman
RK Bradley
S Karlin
S Kumar
S Miyazawa
S Schwartz
S Sheetlin
SF Altschul
TJ Treangen
W Huang
WJ Kent
YK Yu
Z Zhang
Publication venue: BioMed Central
Publication date: 01/01/2010

Background: Genome sequence alignments form the basis of much research. Genome alignment depends on various mundane but critical choices, such as how to mask repeats and which score parameters to use. Surprisingly, there has been no large-scale assessment of these choices using real genomic data. Moreover, rigorous procedures to control the rate of spurious alignment have not been employed. Results: We have assessed 495 combinations of score parameters for alignment of animal, plant, and fungal genomes. As our gold standard of accuracy, we used genome alignments implied by multiple alignments of proteins and of structural RNAs. We found the HOXD scoring schemes underlying alignments in the UCSC genome database to be far from optimal, and suggest better parameters. Higher values of the X-drop parameter are not always better. E-values accurately indicate the rate of spurious alignment, but only if tandem repeats are masked in a non-standard way. Finally, we show that γ-centroid (probabilistic) alignment can find highly reliable subsets of aligned bases. Conclusions: These results enable more accurate genome alignment, with reliability measures for local alignments and for individual aligned bases. This study was made possible by our new software, LAST, which can align vertebrate genomes in a few hours (http://last.cbrc.jp/).

CiteSeerX

Crossref

Springer - Publisher Connector

Directory of Open Access Journals

PubMed Central
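The X-drop parameter mentioned in the LAST abstract controls when alignment extension gives up. The sketch below is a simplified, ungapped version of X-drop extension with a hypothetical +1/−1 match/mismatch score; LAST's own gapped extension and score matrices are not reproduced. It shows how X decides whether an extension crosses a low-scoring patch.

```python
from typing import Callable


def unit_score(x: str, y: str) -> int:
    """Toy scoring: +1 for a match, -1 for a mismatch (hypothetical values)."""
    return 1 if x == y else -1


def xdrop_extend_right(a: str, b: str, i: int, j: int,
                       score: Callable[[str, str], int], x_drop: int) -> tuple[int, int]:
    """Ungapped extension rightwards from a[i] / b[j].

    Stops once the running score falls more than `x_drop` below the best
    score seen so far. Returns (best score, extension length at that best).
    """
    best = best_len = running = k = 0
    while i + k < len(a) and j + k < len(b):
        running += score(a[i + k], b[j + k])
        k += 1
        if running > best:
            best, best_len = running, k
        elif best - running > x_drop:
            break
    return best, best_len


if __name__ == "__main__":
    a = "AAAACCAAAAAA"
    b = "AAAAGGAAAAAA"
    print(xdrop_extend_right(a, b, 0, 0, unit_score, x_drop=1))  # (4, 4)
    print(xdrop_extend_right(a, b, 0, 0, unit_score, x_drop=2))  # (8, 12)
```

With X = 1 the extension stops inside the two mismatches; with X = 2 it crosses them and reaches a higher score. A larger X can equally carry an extension into unrelated sequence, which is one plausible reason why higher X-drop values are not always better.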

Novel variation and de novo mutation rates in population-wide de novo assembled Danish trios

Author: Belling Kirstine C.
Besenbacher Søren
Bolund Lars
Bork-Jensen Jette
Brunak Søren
Børglum Anders D.
Cheng Xiaofang
Chmura Piotr Jaroslaw
Dam-Als Thomas
Demontis Ditte
Dworzynski Piotr
Eiberg Hans
Flores Santa Cruz David
Friborg Rune M.
Gonzalez-Izarzugaza Jose Maria
Grove Jakob
Gupta Ramneek
Hansen Torben
Huang Shujia
Jun Wang
Kristiansen Karsten
Krogh Anders
Lescai Francesco
Li Ning
Li Shengting
Liu Hao
Liu Siyang
Lund Ole
Mailund Thomas
Pedersen Christian N. S.
Pedersen Oluf
Rao Junhua
Rapacki Kristoffer
Rasmussen Simon
Rubio García Arcadio
Rydza Emil Karol
Schierup Mikkel H
Sun Jihua
Sørensen John Damm
Sørensen Thorkild I. A.
Villesen Palle
Wang Ou
Westergaard David
Xu Ruiqi
Xu Xun
Yadav Rachita
Ye Weijian
Publication venue: Springer Science and Business Media LLC
Publication date: 01/01/2015

Building a population-specific catalogue of single nucleotide variants (SNVs), indels and structural variants (SVs) with frequencies, termed a national pan-genome, is critical for further advancing clinical and public health genetics in large cohorts. Here we report a Danish pan-genome obtained from sequencing 10 trios to high depth (50×). We report 536k novel SNVs and 283k novel short indels from mapping approaches and develop a population-wide de novo assembly approach to identify 132k novel indels larger than 10 nucleotides with low false discovery rates. We identify a higher proportion of indels and SVs than previous efforts, showing the merits of high-coverage and de novo assembly approaches. In addition, we use trio information to identify de novo mutations and use a probabilistic method to provide direct estimates of 1.27 × 10⁻⁸ and 1.5 × 10⁻⁹ per nucleotide per generation for SNVs and indels, respectively.

Crossref

Copenhagen University Research Information System

PubMed Central

Online Research Database In Technology
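The Danish trio abstract reports per-nucleotide, per-generation rates from a probabilistic method. For orientation only, the naive point estimate divides the number of de novo mutations by the number of transmitted nucleotides at risk. The sketch below does that, with optional scalar corrections for sensitivity and false discovery rate; the counts and callable lengths are hypothetical, and this is not the authors' probabilistic method.

```python
def dnm_rate(dnm_counts: list[int], callable_haploid_bp: list[float],
             sensitivity: float = 1.0, false_discovery_rate: float = 0.0) -> float:
    """Naive de novo mutation rate per nucleotide per generation from trios.

    Each child inherits two haploid genomes, so a child contributes
    2 * (callable haploid length) nucleotides at risk of mutation.
    Sensitivity and false discovery rate are applied as simple scalars.
    """
    if len(dnm_counts) != len(callable_haploid_bp):
        raise ValueError("one callable length per trio is required")
    true_dnms = sum(dnm_counts) * (1.0 - false_discovery_rate)
    at_risk = 2.0 * sum(callable_haploid_bp) * sensitivity
    return true_dnms / at_risk


if __name__ == "__main__":
    # Hypothetical counts for two trios, each with 2.5 Gbp callable per haploid genome.
    print(dnm_rate([60, 70], [2.5e9, 2.5e9]))
```

The factor of 2 is the key step: a mutation can arise on either the paternally or the maternally transmitted copy, so the denominator counts both.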

Global assessment of genomic variation in cattle by genome resequencing and high-throughput genotyping

Author: A Ritz
AG Clark
AJ Iafrate
AR Quinlan
AV Zimin
AW Pang
Bo Thomsen
Bujie Zhan
C Alkan
C Spillane
C Xie
CA Albers
CA Heid
CG Elsik
Christian Bendixen
D Pushkarev
DA Wheeler
DF Conrad
DG Lemay
DJ de Koning
DM Larkin
DR Bentley
E Seroussi
EM Ibeagha-Awemu
ER Mardis
F Zhang
Frank Panitz
G Dennis Jr
G Lunter
GE Liu
GM Church
GP Consortium
GP Harhay
GT McVean
H Li
H Park
HB Fraser
J Eid
J Fadista
J Sebat
J Wang
Jakob Hedegaard
JC Dohm
JI Kim
João Fadista
JR Lupski
JS Bae
JW Drake
K Chen
K Wang
K Wong
K Ye
KJ McKernan
KU Mir
LA Hindorff
LK Matukumalli
LW Hillier
M Kirin
M Perez-Enciso
MA Taub
ME Goddard
ML Metzker
MW Nachman
O Harismendy
P Medvedev
P Stankiewicz
P Tong
PC Ng
R Kawahara-Miki
R Nielsen
R Redon
RA Cartwright
RA Gibbs
RE Mills
RL Tellam
S Levy
S Yoon
SC Schuster
SH Eck
SM Ahn
T Meuwissen
TH Meuwissen
V Ramensky
V Whan
V Yuzbasiyan-Gurkan
Y Erlich
Y Hou
Y Li
YS Ju
ZL Hu
Publication venue: BioMed Central
Publication date: 01/01/2011

Background: Integration of genomic variation with phenotypic information is an effective approach for uncovering genotype-phenotype associations. This requires accurate identification of the different types of variation in individual genomes. Results: We report the integration of the whole-genome sequence of a single Holstein Friesian bull with data from single nucleotide polymorphism (SNP) and comparative genomic hybridization (CGH) array technologies to determine a comprehensive spectrum of genomic variation. The performance of resequencing SNP detection was assessed by combining SNPs that were identified to be either in identity by descent (IBD) or in copy number variation (CNV) with results from SNP array genotyping. Coding insertions and deletions (indels) were found to be enriched for sizes in multiples of 3 and were located near the N- and C-termini of proteins. For larger indels, split-read and read-pair approaches proved complementary in finding different signatures. CNVs were identified on the basis of the depth of sequenced reads, and by using SNP and CGH arrays. Conclusions: Our results provide high-resolution mapping of diverse classes of genomic variation in an individual bovine genome and demonstrate that structural variation surpasses sequence variation as the main component of genomic variability. Better accuracy of SNP detection was achieved with little loss of sensitivity when algorithms that implemented mapping quality were used. IBD regions were found to be instrumental for calculating resequencing SNP accuracy, while SNP detection within CNVs tended to be less reliable. CNV discovery was affected dramatically by platform resolution and coverage biases. The combined data for this study showed that, at a moderate level of sequencing coverage, an ensemble of platforms and tools can be applied together to maximize accurate detection of sequence and structural variants.

Crossref

Springer - Publisher Connector

Directory of Open Access Journals

PubMed Central
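Enrichment of coding indels for sizes in multiples of 3 is commonly interpreted as selection against frameshifts. A minimal sketch of how such enrichment could be checked: count frame-preserving lengths and compare them with a baseline proportion using an exact one-sided binomial tail. The indel lengths and the baseline `p0` are hypothetical; the abstract does not state which test, if any, the authors used, and a baseline taken from non-coding indels is more defensible than a flat 1/3.

```python
from math import comb


def frame_preserving_fraction(lengths: list[int]) -> float:
    """Fraction of indels whose length is a multiple of 3."""
    return sum(1 for n in lengths if n % 3 == 0) / len(lengths)


def binom_sf(k: int, n: int, p: float) -> float:
    """P(X >= k) for X ~ Binomial(n, p), summed exactly."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k, n + 1))


def enrichment_p_value(lengths: list[int], p0: float) -> float:
    """One-sided binomial p-value for an excess of multiple-of-3 lengths,
    given a baseline probability p0 (e.g. the fraction seen in non-coding indels)."""
    k = sum(1 for n in lengths if n % 3 == 0)
    return binom_sf(k, len(lengths), p0)


if __name__ == "__main__":
    coding = [3, 6, 1, 9, 3, 2, 12, 3, 6, 4]  # hypothetical coding indel lengths
    print(frame_preserving_fraction(coding))    # 0.7
    print(enrichment_p_value(coding, p0=1 / 3))
```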