1,284 research outputs found

Copy number variants and selective sweeps in natural populations of the house mouse (Mus musculus domesticus)

Author: Axelsson
Bryk
Carter
Cooper
Cucchi
Cutler
Didion
Dieringer
Egan
Ellegren
Faircloth
Fare
Feuk
Gentleman
Gonzalez
Graubert
Hastings
Henrichsen
Ihle
Karolchik
Lee
Perry
Pozhitkov
Redon
Rozen
Schlötterer
Schlötterer
Staubach
Stranger
Teschke
Wang
Wang
Williams
Wineinger
Yalcin
Yang
Publication venue: 'Frontiers Media SA'
Publication date: 01/01/2014

Copy-number variants (CNVs) may play an important role in early adaptations, potentially facilitating rapid divergence of populations. We describe an approach to study this question by investigating CNVs present in natural populations of mice in the early stages of divergence and their involvement in selective sweeps. We have analyzed individuals from two recently diverged natural populations of the house mouse (Mus musculus domesticus) from Germany and France using custom, high-density comparative genome hybridization (CGH) arrays that covered almost 164 Mb and 2444 genes. We had previously identified 1861 of those genes as differentially expressed between these populations, while the expression of the remaining genes was invariant. In total, we identified 1868 CNVs across all 10 samples, 200 bp to 600 kb in size and affecting 424 genic regions. Roughly two thirds of all CNVs found were deletions. We found no enrichment of CNVs among the differentially expressed genes between the populations compared to the invariant ones, nor any meaningful correlation between CNVs and gene expression changes. Among the CNV genes, we found cellular component gene ontology categories of the synapse overrepresented relative to all 2444 genes tested. To investigate the potential adaptive significance of the CNV regions, we selected six that showed large differences in frequency of CNVs between the two populations and analyzed variation in at least two microsatellites surrounding the loci in a sample of 46 unrelated animals from the same populations collected in field trappings. We identified two loci with large differences in microsatellite heterozygosity (Sfi1 and Glo1/Dnahc8 regions) and one locus with low variation across the populations (Cmah), suggesting that these genomic regions might have recently undergone selective sweeps. Interestingly, the Glo1 CNV has previously been implicated in anxiety-like behavior in mice, suggesting a differential evolution of a behavioral trai
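
The comparison of microsatellite heterozygosity between populations described in this abstract can be illustrated with a small sketch. It assumes Nei's unbiased gene diversity as the estimator, which the abstract does not specify; the function name and the allele sizes below are hypothetical and are not data from the study.

```python
from collections import Counter


def expected_heterozygosity(alleles, unbiased=True):
    """Expected heterozygosity (gene diversity) at one locus.

    alleles: list of allele sizes, one entry per sampled chromosome.
    Assumption: H = 1 - sum(p_i^2), optionally scaled by n / (n - 1).
    """
    n = len(alleles)
    if n < 2:
        raise ValueError("need at least two allele copies")
    freqs = [count / n for count in Counter(alleles).values()]
    h = 1.0 - sum(p * p for p in freqs)
    return h * n / (n - 1) if unbiased else h


# Hypothetical microsatellite allele sizes (bp) at one locus in two populations.
pop_a = [120, 120, 120, 120, 120, 120, 120, 122]  # nearly fixed for one allele
pop_b = [116, 118, 120, 122, 124, 126, 118, 120]  # many alleles segregating

h_a = expected_heterozygosity(pop_a)
h_b = expected_heterozygosity(pop_b)
print(f"H(pop_a) = {h_a:.3f}, H(pop_b) = {h_b:.3f}")
```

A locus where one population shows much lower heterozygosity than the other, as `pop_a` does here, is the kind of pattern that may hint at a recent selective sweep, although demographic effects can produce similar signals.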

Crossref

Directory of Open Access Journals

Frontiers - Publisher Connector

PubMed Central

MPG.PuRe

University of Huddersfield Repository

Spectral classification of short numerical exon and intron sequences

Author: Benjamin YM Kwan
D Blackenberg
D Karolchik
Hon Keung Kwan
J Goecks
Jennifer YY Kwan
JYY Kwan
Publication venue: BioMed Central
Publication date: 01/01/2011

This research presents three new numerical representations for classifying short exon and intron sequences using discrete Fourier transform period-3 value. Based on the human genome, results indicate that the Complex Twin-Pair representation is attractive compared with other numerical representations and the approach has potential applications in genome annotation and read mapping
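
To make the idea of a discrete Fourier transform period-3 value concrete, here is a minimal sketch. It assumes the classic binary indicator representation (one 0/1 sequence per base), not the three new representations or the Complex Twin-Pair representation from the paper, whose definitions are not given in this abstract. The function name `period3_power` is hypothetical.

```python
import cmath


def period3_power(seq):
    """Spectral power at frequency N/3, summed over the four bases.

    Assumption: each base is mapped to a binary indicator sequence,
    and the DFT coefficient at k = N/3 is evaluated directly.
    """
    seq = seq.upper()
    n = len(seq) - len(seq) % 3  # trim so that N is a multiple of 3
    if n == 0:
        raise ValueError("sequence must contain at least 3 bases")
    k = n // 3
    total = 0.0
    for base in "ACGT":
        coeff = sum(
            cmath.exp(-2j * cmath.pi * k * i / n)
            for i in range(n)
            if seq[i] == base
        )
        total += abs(coeff) ** 2
    return total


# A perfectly codon-periodic toy sequence versus a homopolymer.
print(period3_power("ATG" * 10))
print(period3_power("A" * 30))
```

Coding regions tend to show a stronger period-3 component than introns, so a threshold on a value like this is one simple way a classifier could be built; the paper's actual decision rule may differ.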

Crossref

Springer - Publisher Connector

PubMed Central

The UCSC Proteome Browser

Author: Diekhans Mark
Haussler David
Hsu Fan
Karolchik Donna
Kent W. James
Kuhn Robert M.
Pringle Tom H.
Publication venue: Oxford University Press
Publication date: 17/12/2004

The University of California Santa Cruz (UCSC) Proteome Browser provides a wealth of protein information presented in graphical images and with links to other protein-related Internet sites. The Proteome Browser is tightly integrated with the UCSC Genome Browser. For the first time, Genome Browser users have both the genome and proteome worlds at their fingertips simultaneously. The Proteome Browser displays tracks of protein and genomic sequences, exon structure, polarity, hydrophobicity, locations of cysteine and glycosylation potential, Superfamily domains and amino acids that deviate from normal abundance. Histograms show genome-wide distribution of protein properties, including isoelectric point, molecular weight, number of exons, InterPro domains and cysteine locations, together with specific property values of the selected protein. The Proteome Browser also provides links to gene annotations in the Genome Browser, the Known Genes details page and the Gene Sorter; domain information from Superfamily, InterPro and Pfam; three-dimensional structures at the Protein Data Bank and ModBase; and pathway data at KEGG, BioCarta/CGAP and BioCyc. As of August 2004, the Proteome Browser is available for human, mouse and rat proteomes. The browser may be accessed from any Known Genes details page of the Genome Browser at http://genome.ucsc.edu. A user's guide is also available on this website

CiteSeerX

Crossref

PubMed Central

AuberGene - a sensitive genome alignment tool

Author: Arslan
Heger
J. Heringa
Karolchik
Kellis
Miller
Morgenstern
Murphy
Notredame
Park
R. Szklarczyk
Thomas
Vingron
Waterston
Ye
Zhang
Publication date: 01/01/2006

Motivation: The accumulation of genome sequences will only accelerate in the coming years. We aim to use this abundance of data to improve the quality of genomic alignments and devise a method which is capable of detecting regions evolving under weak or no evolutionary constraints. Results: We describe a genome alignment program AuberGene, which explores the idea of transitivity of local alignments. Assessment of the program was done based on a 2 Mbp genomic region containing the CFTR gene of 13 species. In this region, we can identify 53% of human sequence sharing common ancestry with mouse, as compared with 44% found using the usual pairwise alignment. Between human and tetraodon 93 orthologous exons are found, as compared with 77 detected by the pairwise human-tetraodon comparison. AuberGene allows the user to (1) identify distant, previously undetected, conserved orthologous regions such as ORFs or regulatory regions; (2) identify neutrally evolving regions in related species which are often overlooked by other alignment programs; (3) recognize false orthologous genomic regions. The increased sensitivity of the method is not obtained at the cost of reduced specificity. Our results suggest that, over the CFTR region, human shares 10% more sequence with mouse than previously thought (∼50%, instead of 40% found with the pairwise alignment). © 2006 Oxford University Press
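
The notion of transitivity of alignments can be sketched in a few lines: if a position in species A aligns to species B, and that position in B aligns to species C, an A-to-C pair can be inferred even when the direct A-to-C comparison misses it. The sketch below is only an illustration of that idea under simplifying assumptions (alignments stored as position-to-position dictionaries, no scoring or consistency checks); it is not AuberGene's algorithm, and all names and coordinates are hypothetical.

```python
def compose_alignments(a_to_b, b_to_c):
    """Infer A->C aligned position pairs by going through species B."""
    return {a: b_to_c[b] for a, b in a_to_b.items() if b in b_to_c}


def extend_pairwise(direct_a_to_c, a_to_b, b_to_c):
    """Merge direct A->C pairs with pairs inferred via B.

    Assumption: direct pairwise hits take precedence over inferred ones.
    """
    merged = compose_alignments(a_to_b, b_to_c)
    merged.update(direct_a_to_c)
    return merged


# Hypothetical coordinates: the direct human-mouse alignment finds one position,
# while the route through a third species recovers two more.
human_to_dog = {10: 110, 11: 111, 12: 112}
dog_to_mouse = {111: 211, 112: 212, 113: 213}
human_to_mouse_direct = {10: 210}

print(extend_pairwise(human_to_mouse_direct, human_to_dog, dog_to_mouse))
```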

Crossref

VU Research Portal

TranspoGene and microTranspoGene: transposed elements influence on the transcriptome of seven vertebrates and invertebrates

Author: Asaf Levy
Biemont
Borchert
Callinan
Clark
Consortium
Dagan
Deininger
Deininger
Gasteiger
Giardine
Gil Ast
Griffiths-Jones
Han
Hedges
Houwing
Johnson
Jordan
Jurka
Karolchik
Kent
Kim
Kim
Kuhn
Lander
Lev-Maor
Lippman
Lorenc
Makalowski
Martignetti
McKusick
Morgan
Noa Sela
Pasyukova
Piriyapongsa
Pruitt
Sayah
Sela
Smalheiser
Smalheiser
Sorek
Sorek
Thornburg
Waterston
Publication venue: 'Oxford University Press (OUP)'
Publication date: 21/11/2008

Transposed elements (TEs) are mobile genetic sequences. During the evolution of eukaryotes TEs were inserted into active protein-coding genes, affecting gene structure, expression and splicing patterns, and protein sequences. Genomic insertions of TEs also led to creation and expression of new functional non-coding RNAs such as microRNAs. We have constructed the TranspoGene database, which covers TEs located inside protein-coding genes of seven species: human, mouse, chicken, zebrafish, fruit fly, nematode and sea squirt. TEs were classified according to location within the gene: proximal promoter TEs, exonized TEs (insertion within an intron that led to exon creation), exonic TEs (insertion into an existing exon) or intronic TEs. TranspoGene contains information regarding specific type and family of the TEs, genomic and mRNA location, sequence, supporting transcript accession and alignment to the TE consensus sequence. The database also contains host gene specific data: gene name, genomic location, Swiss-Prot and RefSeq accessions, diseases associated with the gene and splicing pattern. In addition, we created microTranspoGene: a database of human, mouse, zebrafish and nematode TE-derived microRNAs. The TranspoGene and microTranspoGene databases can be used by researchers interested in the effect of TE insertion on the eukaryotic transcriptome

arXiv.org e-Print Archive

Crossref

PubMed Central

Integrating diverse databases into an unified analysis framework: a Galaxy approach

Author: A. Nekrutenko
Blankenberg
Bock
D. Blankenberg
G. Von Kuster
Giardine
Hawkins
J. Taylor
Karolchik
Lyne
N. Coraor
Publication venue: Oxford University Press

Recent technological advances have led to the ability to generate large amounts of data for model and non-model organisms. Whereas, in the past, there have been a relatively small number of central repositories that serve genomic data, an increasing number of distinct specialized data repositories and resources have been established. Here, we describe a generic approach that provides for the integration of a diverse spectrum of data resources into a unified analysis framework, Galaxy (http://usegalaxy.org). This approach allows the simplified coupling of external data resources with the data analysis tools available to Galaxy users, while leveraging the native data mining facilities of the external data resources

Crossref

PubMed Central

MicroRNA enrichment among short ‘ultraconserved’ sequences in insects

Author: Ambros
Ambros
Ambros
Bartel
Berezikov
Boffelli
Bray
Brown
Brudno
Brudno
Cullen
Drysdale
Elnitski
Frazer
Grad
Griffiths-Jones
Hatfield
Havlak
Hillier
J. Miller
Karolchik
Karolchik
Kolbe
Lai
Lee
Lewis
Lu
Mattick
Mattick
Mattick
Miller
Ning
P. Havlak
Pasquinelli
Pasquinelli
Peng
Reinhart
Sandelin
Stajich
Stone
T. Tran
Thomas
Voss
Weber
Zdobnov
Publication venue: Oxford University Press
Publication date: 01/01/2006

MicroRNAs are short (∼22 nt) regulatory RNA molecules that play key roles in metazoan development and have been implicated in human disease. First discovered in Caenorhabditis elegans, over 2500 microRNAs have been isolated in metazoans and plants; it has been estimated that there may be more than a thousand microRNA genes in the human genome alone. Motivated by the experimental observation of strong conservation of the microRNA let-7 among nearly all metazoans, we developed a novel methodology to characterize the class of such strongly conserved sequences: we identified a non-redundant set of all sequences 20 to 29 bases in length that are shared among three insects: fly, bee and mosquito. Among the few hundred sequences greater than 20 bases in length are close to 40% of the 78 confirmed fly microRNAs, along with other non-coding RNAs and coding sequence
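
Finding sequences shared exactly among several genomes can be sketched as a set intersection over fixed-length substrings (k-mers). This is a simplified illustration, not the paper's methodology: it assumes a single fixed length `k` rather than the 20 to 29 base range, ignores reverse complements and redundancy filtering, and uses hypothetical toy sequences in place of the fly, bee and mosquito genomes.

```python
def kmers(seq, k):
    """Return the set of all length-k substrings of seq."""
    seq = seq.upper()
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def shared_kmers(genomes, k):
    """Return the k-mers present in every sequence in genomes."""
    if not genomes:
        return set()
    return set.intersection(*(kmers(g, k) for g in genomes))


# Toy stand-ins for three genomes; k is kept small so the example is readable.
fly = "ACGTACGGT"
bee = "TTACGGTA"
mosquito = "GGACGGTC"

print(sorted(shared_kmers([fly, bee, mosquito], k=4)))
```

For real genomes the per-genome k-mer sets would be far too large to hold naively in memory, so an indexed or streaming approach would be needed; this sketch only conveys the logic.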

CiteSeerX

Crossref

PubMed Central

HaploReg: a resource for exploring chromatin states, conservation, and regulatory motif alterations within sets of genetically linked variants

Author: Adzhubei
Andersen
Berger
Chen
Davydov
Durbin
Ernst
Han
Karolchik
L. D. Ward
Lander
M. Kellis
Matys
McCarthy
Ng
Nicolae
Pohlmann
Sherry
Touzet
Yue
Publication venue: 'Oxford University Press (OUP)'
Publication date: 01/10/2011

The resolution of genome-wide association studies (GWAS) is limited by the linkage disequilibrium (LD) structure of the population being studied. Selecting the most likely causal variants within an LD block is relatively straightforward within coding sequence, but is more difficult when all variants are intergenic. Predicting functional non-coding sequence has been recently facilitated by the availability of conservation and epigenomic information. We present HaploReg, a tool for exploring annotations of the non-coding genome among the results of published GWAS or novel sets of variants. Using LD information from the 1000 Genomes Project, linked SNPs and small indels can be visualized along with their predicted chromatin state in nine cell types, conservation across mammals and their effect on regulatory motifs. Sets of SNPs, such as those resulting from GWAS, are analyzed for an enrichment of cell type-specific enhancers. HaploReg will be useful to researchers developing mechanistic hypotheses of the impact of non-coding variants on clinical phenotypes and normal variation. The HaploReg database is available at http://compbio.mit.edu/HaploReg. National Institutes of Health (U.S.) (R01-HG004037); National Institutes of Health (U.S.) (RC1-HG005334); National Science Foundation (U.S.) (HG005334

DSpace@MIT

Crossref

The UCSC Genome Browser database: update 2010

Author: A. Pohl
A. S. Hinrichs
A. S. Zweig
Austin
B. Giardine
B. J. Raney
B. Rhead
Berman
Blanchette
D. Haussler
D. Karolchik
F. Hsu
Feuk
G. P. Barber
H. Clawson
Hsu
Iafrate
J. Hillman-Jackson
Jain
K. E. Smith
K. Learned
K. R. Rosenbloom
Kaiser
Karolchik
Karolchik
Kent
L. R. Meyer
M. Diekhans
M. Pheasant
Nord
P. A. Fujita
Pettersen
R. A. Harte
R. M. Kuhn
Sherry
T. R. Dreszer
The ENCODE Project Consortium
The MGC Project Team
W. J. Kent
Publication venue: Oxford University Press
Publication date: 01/01/2010

The University of California, Santa Cruz (UCSC) Genome Browser website (http://genome.ucsc.edu/) provides a large database of publicly available sequence and annotation data along with an integrated tool set for examining and comparing the genomes of organisms, aligning sequence to genomes, and displaying and sharing users’ own annotation data. As of September 2009, genomic sequence and a basic set of annotation ‘tracks’ are provided for 47 organisms, including 14 mammals, 10 non-mammal vertebrates, 3 invertebrate deuterostomes, 13 insects, 6 worms and a yeast. New data highlights this year include an updated human genome browser, a 44-species multiple sequence alignment track, improved variation and phenotype tracks and 16 new genome-wide ENCODE tracks. New features include drag-and-zoom navigation, a Wiki track for user-added annotations, new custom track formats for large datasets (bigBed and bigWig), a new multiple alignment output tool, links to variation and protein structure tools, in silico PCR utility enhancements, and improved track configuration tools

CiteSeerX

Crossref

PubMed Central

University of Queensland eSpace

BigWig and BigBed: enabling browsing of large distributed datasets

Author: A. S. Hinrichs
A. S. Zweig
Alekseyenko
D. Karolchik
G. Barber
Guttman
Kent
Kent
Li
Rhead
W. J. Kent
Publication venue: Oxford University Press

Summary: BigWig and BigBed files are compressed binary indexed files containing data at several resolutions that allow the high-performance display of next-generation sequencing experiment results in the UCSC Genome Browser. The visualization is implemented using a multi-layered software approach that takes advantage of specific capabilities of web-based protocols and Linux and UNIX operating systems files, R trees and various indexing and compression tricks. As a result, only the data needed to support the current browser view is transmitted rather than the entire file, enabling fast remote access to large distributed data sets
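
The idea of storing data at several resolutions and returning only what the current view needs can be sketched as follows. This is an in-memory illustration under stated assumptions, not the bigWig or bigBed file format: the zoom factors, the use of bin means as summaries, and all function names are hypothetical.

```python
def build_zoom_levels(values, factors=(1, 4, 16)):
    """Precompute bin means at each zoom factor (bases per bin)."""
    levels = {}
    for f in factors:
        levels[f] = [
            sum(values[i:i + f]) / len(values[i:i + f])
            for i in range(0, len(values), f)
        ]
    return levels


def pick_level(levels, start, end, pixels):
    """Choose the coarsest level whose bins are no wider than one pixel."""
    bases_per_pixel = max(1, (end - start) // pixels)
    usable = [f for f in levels if f <= bases_per_pixel]
    return max(usable) if usable else min(levels)


def fetch_view(levels, start, end, pixels):
    """Return (zoom factor, bins) covering [start, end) and nothing else."""
    f = pick_level(levels, start, end, pixels)
    first_bin = start // f
    last_bin = -(-end // f)  # ceiling division
    return f, levels[f][first_bin:last_bin]


# Hypothetical per-base signal over a 64-base region.
signal = list(range(64))
levels = build_zoom_levels(signal)

print(fetch_view(levels, 0, 64, pixels=4))   # wide view: coarse summary bins
print(fetch_view(levels, 8, 16, pixels=8))   # narrow view: per-base values
```

In the wide view only four summary values are returned instead of 64 raw values, which mirrors the point in the abstract that only the data needed for the current browser view is transmitted.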

Crossref

PubMed Central