Search CORE

139 research outputs found

Primique: automatic design of specific PCR primers for each sequence in a family

Author: J Fredslund
Jakob Fredslund
JJ SantaLucia
Mette Lange
N le Novère
N Sugimoto
PMK Gordon
R Kalendar
S Rozen
SF Altschul
SJ Emrich
Publication venue: BioMed Central
Publication date: 01/10/2007

Background: In many contexts, researchers need specific primers for all sequences in a family such that each primer set amplifies only its target sequence and none of the others, e.g. to detect which transcription factor, out of a family of very similar proteins, is present in a sample, or to design diagnostic assays for the identification of pathogen strains. Results: This paper presents primique, a new graphical, user-friendly, fast, web-based tool which solves this problem: it designs specific primers for each sequence in an uploaded set. Further, a secondary set of sequences not to be amplified by any primer pair may be uploaded. Primers with high sequence similarity to non-target sequences are selected against. Lastly, the suggested primers may be checked against the National Center for Biotechnology Information databases for possible mis-priming. Conclusion: Results are presented in interactive tables, and various primer properties are listed and displayed graphically. Alignments of any close matches can be displayed. Given 30 sequences, the running time of primique is about 20 seconds. primique can be reached via this web address: http://cgi-www.daimi.au.dk/cgi-chili/primique/front.py
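The specificity criterion described above (keep primers that match their own target but differ from every other family member and from any uploaded non-target sequence) can be illustrated with a small sketch. This is not primique's algorithm: it only screens single candidate primers by Hamming distance on one strand, and it ignores melting temperature, GC content, primer pairing, 3'-end weighting and the NCBI mis-priming check. The function names and the default `k` and `min_diff` thresholds are hypothetical.

```python
from typing import Dict, List, Sequence


def hamming(a: str, b: str) -> int:
    """Count mismatching positions between two equal-length strings."""
    return sum(x != y for x, y in zip(a, b))


def min_mismatches(primer: str, seq: str) -> int:
    """Smallest Hamming distance between the primer and any window of seq.

    Only the given strand is scanned; a real tool would also check the
    reverse complement.
    """
    k = len(primer)
    if len(seq) < k:
        return k  # too short to contain a full-length binding site
    return min(hamming(primer, seq[i:i + k]) for i in range(len(seq) - k + 1))


def specific_candidates(
    family: Dict[str, str],
    k: int = 20,
    min_diff: int = 3,
    non_targets: Sequence[str] = (),
) -> Dict[str, List[str]]:
    """For each target, return its k-mers that stay at least min_diff
    mismatches away from every other family member and every non-target."""
    result: Dict[str, List[str]] = {}
    for name, target in family.items():
        others = [s for other, s in family.items() if other != name]
        others.extend(non_targets)
        kmers = sorted({target[i:i + k] for i in range(len(target) - k + 1)})
        result[name] = [
            p for p in kmers
            if all(min_mismatches(p, o) >= min_diff for o in others)
        ]
    return result
```

For example, `specific_candidates({"a": "ACGTACGTAAAA", "b": "ACGTACGTCCCC"}, k=4, min_diff=2)` keeps only the k-mers from each sequence's divergent tail, since the shared `ACGTACGT` prefix cannot distinguish the two.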

Crossref

Springer - Publisher Connector

Directory of Open Access Journals

PubMed Central

Linked read technology for assembling large complex and polyploid genomes

Author: A Akintayo
A Balu
A Salman-Minkov
Alina Ott
B Nystedt
C Del Fabbro
C Feuillet
C Liu
C Rao
Chao Liu
Cheng-Ting Yeh
Clifton L. Dalgard
CS Chin
DM Altshuler
DR Bentley
E Lieberman-Aiden
E Lyons
GXY Zheng
H Li
H Tang
HB Tang
Heng-Cheng Hu
HV Hunt
James C. Schnable
JL Bennetzen
JR MacDonald
JS Seo
L Coombe
Linjiang Wu
LJ Briggs
M Freeling
M Kubesova
MA Hamoud
ME Rasekh
MW Crepeau
MW Libbrecht
N Rodic
N Spies
NI Weisenfeld
P SanMiguel
Patrick S. Schnable
PS Schnable
RK Saxena
RS Baucom
RS Li
S Goodwin
S Renny-Byfield
S Sarkar
SJ Emrich
SM Utturkar
Soumik Sarkar
TJ Treangen
Y Fu
Y Mostovoy
YN Jiao
Publication venue: DigitalCommons@University of Nebraska - Lincoln
Publication date: 01/01/2018

Background: Short read DNA sequencing technologies have revolutionized genome assembly by providing high-accuracy, high-throughput data at low cost. But it remains challenging to assemble short read data, particularly for large, complex and polyploid genomes. The linked read strategy has the potential to enhance the value of short reads for genome assembly because all reads originating from a single long molecule of DNA share a common barcode. However, most studies to date that have employed linked reads have focused on human haplotype phasing and genome assembly. Results: Here we describe a de novo maize B73 genome assembly generated via linked read technology which contains ~172,000 scaffolds with an N50 of 89 kb that cover 50% of the genome. Based on comparisons to the B73 reference genome, 91% of linked read contigs are accurately assembled. Because it was possible to identify errors with > 76% accuracy using machine learning, it may be possible to identify and potentially correct systematic errors. Complex polyploids represent one of the last grand challenges in genome assembly. Linked read technology was able to successfully resolve the two subgenomes of the recent allopolyploid proso millet (Panicum miliaceum). Our assembly covers ~83% of the 1 Gb genome and consists of 30,819 scaffolds with an N50 of 912 kb. Conclusions: Our analysis provides a framework for future de novo genome assemblies using linked reads, and we suggest computational strategies that, if implemented, have the potential to further improve linked read assemblies, particularly for repetitive genomes.
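Two ideas from this abstract lend themselves to a short sketch: reads that share a barcode are grouped as coming from the same long DNA molecule, and assembly contiguity is summarized by N50. The sketch assumes reads arrive as hypothetical (barcode, sequence) pairs; real linked-read pipelines store barcodes in tool-specific formats and do much more (barcode error correction, splitting a barcode into several molecules). The N50 here is computed against the total assembled length; the variant computed against an estimated genome size is usually called NG50.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple


def group_by_barcode(reads: Iterable[Tuple[str, str]]) -> Dict[str, List[str]]:
    """Bucket (barcode, sequence) pairs so reads from one long molecule stay together."""
    groups: Dict[str, List[str]] = defaultdict(list)
    for barcode, seq in reads:
        groups[barcode].append(seq)
    return dict(groups)


def n50(lengths: Iterable[int]) -> int:
    """Largest length L such that contigs/scaffolds of length >= L
    account for at least half of the total assembled length."""
    ordered = sorted(lengths, reverse=True)
    if not ordered:
        raise ValueError("N50 is undefined for an empty assembly")
    total = sum(ordered)
    running = 0
    for length in ordered:
        running += length
        if 2 * running >= total:
            return length
    return ordered[-1]  # fallback; not reached when all lengths are non-negative
```

For scaffold lengths `[2, 3, 4, 5, 6]` (total 20), the two longest scaffolds already sum to 11, so `n50` returns 5.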

Crossref

DigitalCommons@University of Nebraska

Directory of Open Access Journals

FigShare

A Novel Method of Characterizing Genetic Sequences: Genome Space with Biological Distance and Applications

Author: A Palmenberg
C Kingsford
C Scholtissek
C Yu
Chenglong Yu
F Liu
F Murtagh
H Musto
H Nakashima
K Amano
K Carr
K Katoh
L Liu
L Wang
M Kullberg
MA Larkin
Mo Deng
NA Chuzhanova
PC FitzGerald
Qian Liang
RB Belshe
RC Edgar
RJ Garten
Rong L. He
S Karlin
S Kumar
S Yau
SJ Emrich
SM Waterman
SS-T Yau
Stephen S.-T. Yau
Sudhindra Gadagkar
SZ Raina
T Abe
T Ito
T Kamimura
V Shinde
W Ma
WM Brown
Z Kou
Publication venue: Public Library of Science
Publication date: 01/01/2011

Most existing methods for phylogenetic analysis involve developing an evolutionary model and then using some type of computational algorithm to perform multiple sequence alignment. There are two problems with this approach: (1) different evolutionary models can lead to different results, and (2) the computation time required for multiple alignments makes it impossible to analyse the phylogeny of a whole genome. This motivates us to create a new approach to characterize genetic sequences. To each DNA sequence, we associate a natural vector based on the distributions of nucleotides. This produces a one-to-one correspondence between the DNA sequence and its natural vector. We define the distance between two DNA sequences to be the distance between their associated natural vectors. This creates a genome space with a biological distance, which makes global comparison of genomes with the same topology possible. We use our proposed method to analyze the genomes of the new influenza A (H1N1) virus, human rhinoviruses (HRV) and mammalian mitochondrial genomes. The results show that a triple-reassortant swine virus circulating in North America and the Eurasian swine virus belong to the lineage of the influenza A (H1N1) virus. For the HRV and mammalian mitochondrial genomes, the results coincide with biologists' analyses. Our approach provides a powerful new tool for analyzing and annotating genomes and their phylogenetic relationships. Whole or partial genomes can be handled more easily and more quickly than with multiple alignment methods. Once a genome space has been constructed, it can be stored in a database. There is no need to reconstruct the genome space for subsequent applications, whereas in multiple alignment methods, realignment is needed to add new sequences. Furthermore, one can make a global comparison of all genomes simultaneously, which no other existing method can achieve.
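A minimal sketch of the natural vector idea, as we understand it: for each nucleotide k the vector records its count n_k, its mean position μ_k, and a normalized second central moment D_2^k = Σ_i (t_i - μ_k)² / (n_k · n), where t_i are the 1-based positions of k and n is the sequence length; the distance between two sequences is the Euclidean distance between their vectors. The paper's own definition (including any higher moments and its exact normalization) should be checked before relying on this; the function names are hypothetical.

```python
import math
from typing import List

BASES = "ACGT"


def natural_vector(seq: str) -> List[float]:
    """12-dimensional natural vector: counts, mean positions and normalized
    second central moments of A, C, G and T (in that order).

    Characters other than A/C/G/T are skipped when collecting positions but
    still count toward the sequence length n.
    """
    seq = seq.upper()
    n = len(seq)
    counts: List[float] = []
    means: List[float] = []
    moments: List[float] = []
    for base in BASES:
        # 1-based positions at which this nucleotide occurs
        positions = [i + 1 for i, c in enumerate(seq) if c == base]
        n_k = len(positions)
        if n_k == 0:
            counts.append(0.0)
            means.append(0.0)
            moments.append(0.0)
            continue
        mu_k = sum(positions) / n_k
        d2_k = sum((p - mu_k) ** 2 for p in positions) / (n_k * n)
        counts.append(float(n_k))
        means.append(mu_k)
        moments.append(d2_k)
    return counts + means + moments


def natural_distance(a: str, b: str) -> float:
    """Euclidean distance between the natural vectors of two sequences."""
    return math.dist(natural_vector(a), natural_vector(b))
```

Because every vector has the same fixed length regardless of sequence length, vectors can be computed once and stored, which is the property the abstract contrasts with realigning whenever a sequence is added.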

CiteSeerX

Public Library of Science (PLOS)

Crossref

Directory of Open Access Journals

PubMed Central

University of Illinois at Chicago: UIC INDIGO (INtellectual property in DIGital form available online in an Open environment)

De Novo Transcriptome of Safflower and the Identification of Putative Genes for Oleosin and the Biosynthesis of Flavonoids

Author: A Conesa
A Maroufi
A Monnier
AJ Simkin
Anna Tramontano
C Wu
CJ Rudolph
DJ Lacey
E Cazzato
G Hrazdina
GJ van Eldik
H Wenping
Haiyan Li
J Jarosova
J Wu
J Ye
JC Vega-Arreguin
Jing Yang
Jinyu Wu
JS Roh
K Giannoulia
KJ Chung
Lili Guan
Na Yao
Nan Wang
NT Thao
Q Qiu
Q Tang
R Li
RM Siloto
S Akada
S Vandana
SJ Emrich
SY Ye
V Hemleben
V Katavic
W Pitakdantham
Xiaokun Li
Xiuming Liu
XZ Li
Y Mu
Y Zhang
Yanfang Wang
YJ Li
Yuanyuan Dong
Publication venue: Public Library of Science
Publication date: 21/02/2012

Safflower (Carthamus tinctorius L.) is one of the most extensively used oil crops in the world. However, little is known about how its compounds are synthesized at the genetic level. In this study, Solexa-based deep sequencing of seed, leaf and petal of safflower produced a de novo transcriptome consisting of 153,769 unigenes. We annotated 82,916 of the unigenes and assigned functional terms and specific pathways to a subset of them. Metabolic pathway analysis revealed that 23 unigenes were predicted to be responsible for the biosynthesis of flavonoids and 8 were characterized as seed-specific oleosins. In addition, a large number of differentially expressed unigenes, for example, those annotated as participating in anthocyanin and chalcone synthesis, were predicted to be involved in flavonoid biosynthesis pathways. In conclusion, the de novo transcriptome investigation of the unique transcripts provided candidate gene resources for studying oleosin-coding genes and for investigating genes related to flavonoid biosynthesis and metabolism in safflower.

Public Library of Science (PLOS)

Crossref

Directory of Open Access Journals

PubMed Central

PDXScholar (Portland State University)

FigShare

Sequencing, de novo annotation and analysis of the first Anguilla anguilla transcriptome: EeelBase opens new perspectives for the study of the critically endangered European eel

Author: Alessandro Coppe
APM Weber
B Chevreux
B Knights
CM Hale
E Kristiansson
E Meyer
ED Novaes
F Alagna
F Cheung
FD Guerrero
G Van den Thillart
GA Calin
Gregory E Maes
ICES
J Lu
J Yasuda
JC Vera
JM Pujolar
Jose Martin Pujolar
KR Elmer
L He
Lorenzo Zane
Louis Bernatchez
M Kanehisa
M Margulies
M Salem
Michael M Hansen
MJ Moore
MN Bainbridge
O Morozova
P Munk
P Nogueira
PA Zhulidov
Peter F Larsen
S Götz
S Renault
S Wang
SJ Emrich
SM Huse
Stefania Bortoluzzi
T Miyahara
T Wicker
TL Parchman
TT Torres
W Dekker
ZL Hu
Publication venue: BioMed Central
Publication date: 01/01/2010

Background: Once highly abundant, the European eel (Anguilla anguilla L.; Anguillidae; Teleostei) is considered to be critically endangered and on the verge of extinction, as the stock has declined by 90-99% since the 1980s. Yet, the species is poorly characterized at the molecular level, with little sequence information available in public databases.

Results: The first European eel transcriptome was obtained by 454 FLX Titanium sequencing of a normalized cDNA library, produced from a pool of 18 glass eels (juveniles) from the French Atlantic coast and two sites on the Mediterranean coast. Over 310,000 reads were assembled into a total of 19,631 transcribed contigs, with an average length of 531 nucleotides. Overall, 36% of the contigs were annotated to known protein/nucleotide sequences, and 35 putative miRNAs were identified.

Conclusions: This study represents the first transcriptome analysis for a critically endangered species. EeelBase, a dedicated database of annotated transcriptome sequences of the European eel, is freely available at http://compgen.bio.unipd.it/eeelbase. Considering the multiple factors potentially involved in the decline of the European eel, including anthropogenic factors such as pollution and human-introduced diseases, our results will provide a rich source of data to discover and identify new genes, characterize gene expression, and identify genetic markers scattered across the genome for use in various applications.

ResearchOnline@JCU

Crossref

Springer - Publisher Connector

Directory of Open Access Journals

ResearchOnline at James Cook University

PubMed Central

Archivio istituzionale della ricerca - Università di Padova

Open Marine Archive

Evolution of Disease Response Genes in Loblolly Pine: Insights from Candidate Genes

Author: AG Clark
AJ Eckert
AM Morse
B Ewing
BC Verrelli
CD Bustamante
Charles H. Langley
D Gordon
D Tian
DA Moeller
David B. Neale
DJ Kliebenstein
DS Gernandt
EA Stahl
Elhan S. Ersoz
F Tajima
G McVean
GA Watterson
GP Gill
GR Brown
H Myburg
HH Flor
IW Wilson
J de Meaux
J Wakeley
JH McDonald
JM Smith
JM Warren
L Van Valen
LE Rose
M Rossi
Mark H. Wright
P Tiffin
RC Schmidtling
RR Hudson
Santiago C. González-Martínez
SC González-Martínez
Simon Joly
SJ Emrich
T Pyhäjärvi
Y Kim
YX Fu
Publication venue: Public Library of Science
Publication date: 01/01/2010

BACKGROUND: Host-pathogen interactions that may lead to a competitive co-evolution of virulence and resistance mechanisms present an attractive system to study molecular evolution because strong, recent (or even current) selective pressure is expected at many genomic loci. However, it is unclear whether these selective forces would act to preserve existing diversity, promote novel diversity, or reduce linked neutral diversity during rapid fixation of advantageous alleles. In plants, the lack of adaptive immunity places a larger burden on genetic diversity to ensure survival of plant populations. This burden is even greater if the generation time of the plant is much longer than the generation time of the pathogen. METHODOLOGY/PRINCIPAL FINDINGS: Here, we present nucleotide polymorphism and substitution data for 41 candidate genes from the long-lived forest tree loblolly pine, selected primarily for their prospective influences on host-pathogen interactions. This dataset is analyzed together with 15 drought-tolerance and 13 wood-quality genes from previous studies. A wide range of neutrality tests were performed and tested against expectations from realistic demographic models. CONCLUSIONS/SIGNIFICANCE: Collectively, our analyses found that axr (auxin response factor), caf1 (chromatin assembly factor) and gatabp1 (gata binding protein 1) candidate genes carry patterns consistent with directional selection and erd3 (early response to drought 3) displays patterns suggestive of a selective sweep, both of which are consistent with the arms-race model of disease response evolution. Furthermore, we have identified patterns consistent with diversifying selection at erf1-like (ethylene responsive factor 1), ccoaoemt (caffeoyl-CoA-O-methyltransferase), cyp450-like (cytochrome p450-like) and pr4.3 (pathogen response 4.3), expected under the trench-warfare evolution model.
Finally, a drought-tolerance candidate related to the plant cell wall, lp5, displayed patterns consistent with balancing selection. In conclusion, both arms-race and trench-warfare models seem compatible with patterns of polymorphism found in different disease-response candidate genes, indicating a mixed strategy of disease tolerance evolution for loblolly pine, a major tree crop in the southeastern United States.
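The abstract does not name the neutrality tests used. Tajima's D is one widely used example and is shown here only as an illustration: it compares the mean number of pairwise differences π with Watterson's estimate S/a_1 and scales the difference by its expected standard deviation under the standard neutral model (not the realistic demographic models the study tested against). Negative values indicate an excess of rare variants, consistent with, for example, a recent selective sweep or population expansion; positive values indicate an excess of intermediate-frequency variants, as expected under balancing selection. The function name and input format (an aligned list of equal-length strings) are assumptions.

```python
import math
from itertools import combinations
from typing import Sequence


def tajimas_d(alignment: Sequence[str]) -> float:
    """Tajima's D for an alignment of n >= 4 equal-length sequences."""
    n = len(alignment)
    if n < 4:
        # for n = 3 the numerator's variance is zero, so D is undefined
        raise ValueError("Tajima's D needs at least 4 sequences")
    width = len(alignment[0])
    if any(len(s) != width for s in alignment):
        raise ValueError("all sequences must have the same aligned length")

    # S: number of segregating (polymorphic) columns
    s = sum(1 for column in zip(*alignment) if len(set(column)) > 1)
    if s == 0:
        raise ValueError("no segregating sites")

    # pi: mean number of differences over all pairs of sequences
    pairs = list(combinations(alignment, 2))
    pi = sum(sum(x != y for x, y in zip(a, b)) for a, b in pairs) / len(pairs)

    # Tajima (1989) constants
    a1 = sum(1 / i for i in range(1, n))
    a2 = sum(1 / i ** 2 for i in range(1, n))
    b1 = (n + 1) / (3 * (n - 1))
    b2 = 2 * (n * n + n + 3) / (9 * n * (n - 1))
    c1 = b1 - 1 / a1
    c2 = b2 - (n + 2) / (a1 * n) + a2 / a1 ** 2
    e1 = c1 / a1
    e2 = c2 / (a1 ** 2 + a2)

    return (pi - s / a1) / math.sqrt(e1 * s + e2 * s * (s - 1))
```

In an alignment where every variant is a singleton, such as `["AAAA", "CAAA", "ACAA", "AACA"]`, π (1.5) falls below S/a_1 (about 1.64), so D is negative; two diverged groups such as `["AAAA", "AAAA", "CCCC", "CCCC"]` give a positive D.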

Public Library of Science (PLOS)

Crossref

Directory of Open Access Journals

High-throughput gene and SNP discovery in Eucalyptus grandis, an uncharacterized genome

Author: AP Rooney
AP Weber
C Roth
CD Bustamante
Dario Grattapaglia
DB Neale
DE Stage
Derek R Drost
DL Hartl
Evandro Novaes
F Cheung
FAO
GA Tuskan
GA Watterson
Georgios J Pappas
GR Brown
J Bergelson
JM Cork
JM Eirin-Lopez
K Ohtsu
KB McIntosh
KV Krutovsky
M Barrier
M Heuertz
M Kirst
M Lynch
M Margulies
M Meyer
M Nei
Matias Kirst
MJ Moore
MW Jones-Rhoades
P Parameswaran
PK Ingvarsson
R Fluhr
RM Clark
Ronald R Sederoff
S Chang
SC Gonzalez-Martinez
SJ Emrich
SN Santos
The Arabidopsis Genome Initiative
WB Barbazuk
William G Farmerie
XF Ma
Y Matsuo
Publication venue: BioMed Central
Publication date: 01/01/2008

Background: Benefits from high-throughput sequencing using 454 pyrosequencing technology may be most apparent for species with high societal or economic value but few genomic resources. Rapid means of gene sequence and SNP discovery using this novel sequencing technology provide a set of baseline tools for genome-level research. However, it is unclear how effectively the sequencing of large numbers of short reads will support contig assembly and sequence annotation for species with essentially no prior gene sequence information. Results: With the purpose of generating the first broad survey of gene sequences in Eucalyptus grandis, the most widely planted hardwood tree species, we used 454 technology to sequence and assemble 148 Mbp of expressed sequences (ESTs). EST sequences were generated from a normalized cDNA pool composed of multiple tissues and genotypes, promoting discovery of homologues to almost half of Arabidopsis genes and a comprehensive survey of allelic variation in the transcriptome. By aligning the sequencing reads from multiple genotypes, we detected 23,742 SNPs, 83% of which were validated in a sample. Genome-wide nucleotide diversity was estimated for 2,392 contigs using a modified theta (θ) parameter, adapted for measuring genetic diversity from polymorphisms detected by randomly sequencing a multi-genotype cDNA pool. Diversity estimates at non-synonymous sites were on average 4x smaller than at synonymous sites, suggesting purifying selection. The ratio of non-synonymous to synonymous substitutions (Ka/Ks) among 2,001 contigs averaged 0.30 and was skewed to the right, further supporting that most genes are under purifying selection. Comparison of these estimates among contigs identified major functional classes of genes under purifying and diversifying selection, in agreement with previous research.
Conclusion: In providing an abundance of foundational transcript sequences where limited prior genomic information existed, this work created part of the foundation for the annotation of the E. grandis genome that is being sequenced by the US Department of Energy. In addition, we demonstrated that SNPs sampled at large scale with 454 pyrosequencing can be used to detect evolutionary signatures among genes, providing one of the first genome-wide assessments of nucleotide diversity and Ka/Ks for a non-model plant species.
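The diversity statistic in this abstract is a modified form of Watterson's θ adapted to a pooled multi-genotype cDNA library; that modification is not reproduced here. As a point of reference, the sketch below computes the classic per-site Watterson estimator θ_W = S / (a_n · L) from an alignment of n sequences of length L, where S is the number of segregating sites and a_n = Σ_{i=1}^{n-1} 1/i. The function names are hypothetical.

```python
from typing import Sequence


def harmonic_a(n: int) -> float:
    """Watterson's a_n = sum_{i=1}^{n-1} 1/i for a sample of n sequences."""
    return sum(1.0 / i for i in range(1, n))


def segregating_sites(alignment: Sequence[str]) -> int:
    """Count alignment columns that carry more than one nucleotide."""
    width = len(alignment[0])
    if any(len(s) != width for s in alignment):
        raise ValueError("all sequences must have the same aligned length")
    return sum(1 for column in zip(*alignment) if len(set(column)) > 1)


def watterson_theta_per_site(alignment: Sequence[str]) -> float:
    """Classic Watterson estimator S / (a_n * L), per aligned site."""
    n = len(alignment)
    if n < 2:
        raise ValueError("need at least two sequences")
    return segregating_sites(alignment) / (harmonic_a(n) * len(alignment[0]))
```

A Ka/Ks ratio is then simply Ka divided by Ks; values below 1, like the 0.30 average reported above, are the usual signature of purifying selection.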

Repository Open Access to Scientific Information from Embrapa

Crossref

LAReferencia - Red Federada de Repositorios Institucionales de Publicaciones Científicas Latinoamericanas

Springer - Publisher Connector

Directory of Open Access Journals

PubMed Central

De-Novo Transcriptome Sequencing of a Normalized cDNA Pool from Influenza Infected Ferrets

Author: A Coppe
A McBrayer
Alexis McBrayer
AS Lipatov
BA Fraser
C Sun
Carl E. Bruder
CE Bruder
Colleen B. Jonsson
D Kaplan
DA Wheeler
E Meyer
G Dennis Jr
H Choe
H Vogel
HL Yen
JA Maher
JC Vera
Jeremy V. Camp
JI Hoffman
JR Monaghan
M Iorizzo
M Margulies
M Torabinejad
Michael C. W. Chan
NM Larin
NR Polato
PD Reuman
Peter Liljeström
PK Wall
RE Green
S Kirkeby
S Maere
SJ Emrich
T Rowe
Thomas L. Svensson
TS Schwartz
V Borrell
W Huang da
Y Huang
Y Matsuoka
Y Zheng
YK Chu
Publication venue: Public Library of Science
Publication date: 01/01/2012

The ferret is commonly used as a model for studies of infectious diseases. The genomic sequence of this animal model is not yet characterized, and only a limited number of fully annotated cDNAs are currently available in GenBank. The majority of genes involved in the innate or adaptive immune response are still lacking, restricting molecular genetic analysis of host response in the ferret model. To enable de novo identification of transcriptionally active ferret genes in response to infection, we performed de novo transcriptome sequencing of animals infected with H1N1 A/California/07/2009. We also included splenocytes induced with bacterial lipopolysaccharide to allow for identification of transcripts specifically induced by Gram-negative bacteria. We pooled and normalized the cDNA library in order to limit the risk of sequencing only highly expressed genes. While normalization of the cDNA library removes the possibility of assessing expression changes between individual animals, it has been shown to increase identification of low-abundance transcripts. In this study, we identified more than 19,000 partial ferret transcripts, including more than 1,000 gene orthologs known to be involved in the innate and the adaptive immune response.

CiteSeerX

Public Library of Science (PLOS)

Crossref

Directory of Open Access Journals

PubMed Central

FigShare

Improvement in the Reproducibility and Accuracy of DNA Microarray Quantification by Optimizing Hybridization Conditions

Author: A Naderi
AK Jarvinen
AT Rogojina
Carrie L Moland
Cathy D Melvin
CL Yauk
D Xu
DC Sgroi
DL Wheeler
DT Chen
E Marshall
EK Nordberg
F Diehl
F Li
H Lyng
HB Nielsen
J Meinkoth
James C Fuscoe
JC Fuscoe
JE Larkin
JM Lage
JM Rouillard
Karol L Thompson
KL Thompson
L Shi
Leming Shi
M Schena
MK McQuain
N Mah
N Tolstrup
P Scott Pine
PK Tan
PO Brown
R Kothapalli
RA Irizarry
S Draghici
S Huang
SJ Emrich
T Bammler
Tao Han
W Tong
William S Branham
X Wang
YA Chen
Publication venue: BioMed Central
Publication date: 01/01/2006

BACKGROUND: DNA microarrays, which have been increasingly used to monitor mRNA transcripts at a global level, can provide detailed insight into cellular processes involved in response to drugs and toxins. This is leading to new understandings of signaling networks that operate in the cell, and of the molecular basis of diseases. Custom printed oligonucleotide arrays have proven to be an effective way to facilitate the applications of DNA microarray technology. A successful microarray experiment, however, involves many steps: well-designed oligonucleotide probes, printing, RNA extraction and labeling, hybridization, and imaging. Optimization is essential to generate reliable microarray data. RESULTS: Hybridization and washing steps are crucial for a successful microarray experiment. When the hybridization and washing conditions recommended by an oligonucleotide provider were followed, the expression ratios were compressed more than expected, and data analysis revealed a high degree of non-specific binding. A series of experiments was conducted using rat mixed tissue RNA reference material (MTRRM) and other RNA samples to optimize the hybridization and washing conditions. The optimized hybridization and washing conditions greatly reduced the non-specific binding and improved the accuracy of spot intensity measurements. CONCLUSION: The optimized hybridization and washing conditions greatly improved the reproducibility and accuracy of expression ratios. These experiments also suggested the importance of probe designs using better bioinformatics approaches and the need for common reference RNA samples for platform performance evaluation in order to fulfill the potential of DNA microarray technology.

Crossref

Springer - Publisher Connector

Directory of Open Access Journals

PubMed Central

Standardized metadata for human pathogen/vector genomic sequences

Author: Barrett T
Birren B
Brinkac L
Bruno VM
Caler E
Chapman S
Collins FH
Cuomo CA
Di Francesco V
Dugan VG
Durkin S
Emrich SJ
Eppinger M
Feldgarden M
Fraser C
Fricke WF
Giovanni M
Giraldo-Calderón GI
Harb OS
Henn MR
Hine E
Hotopp JD
Karsch-Mizrachi I
Kissinger JC
Lee EM
Mathur P
Mongodin EF
Murphy CI
Myers G
Neafsey DE
Nelson KE
Newman RM
Nierman WC
Pickett BE
Puzak J
Rasko D
Roos DS
Sadzewicz L
Scheuermann RH
Schriml LM
Silva JC
Singh I
Sobral B
Squires RB
Stevens RL
Stockwell TB
Stoeckert CJ
Sullivan DE
Tallon L
Tettelin H
Ward DV
Wentworth D
White O
Will R
Wortman J
Yao A
Zhang Y
Zheng J
Publication venue: Public Library of Science (PLoS)
Publication date: 17/06/2014

High throughput sequencing has accelerated the determination of genome sequences for thousands of human infectious disease pathogens and dozens of their vectors. The scale and scope of these data are enabling genotype-phenotype association studies to identify genetic determinants of pathogen virulence and drug/insecticide resistance, and phylogenetic studies to track the origin and spread of disease outbreaks. To maximize the utility of genomic sequences for these purposes, it is essential that metadata about the pathogen/vector isolate characteristics be collected and made available in organized, clear, and consistent formats. Here we report the development of the GSCID/BRC Project and Sample Application Standard, developed by representatives of the Genome Sequencing Centers for Infectious Diseases (GSCIDs), the Bioinformatics Resource Centers (BRCs) for Infectious Diseases, and the U.S. National Institute of Allergy and Infectious Diseases (NIAID), part of the National Institutes of Health (NIH), informed by interactions with numerous collaborating scientists. It includes mapping to terms from other data standards initiatives, including the Genomic Standards Consortium's minimal information (MIxS) and NCBI's BioSample/BioProjects checklists and the Ontology for Biomedical Investigations (OBI). The standard includes data fields about characteristics of the organism or environmental source of the specimen, spatial-temporal information about the specimen isolation event, phenotypic characteristics of the pathogen/vector isolated, and project leadership and support. By modeling metadata fields into an ontology-based semantic framework and reusing existing ontologies and minimum information checklists, the application standard can be extended to support additional project-specific data fields and integrated with other data represented with comparable standards. 
The use of this metadata standard by all ongoing and future GSCID sequencing projects will provide a consistent representation of these data in the BRC resources and other repositories that leverage these data, allowing investigators to identify relevant genomic sequences and perform comparative genomics analyses that are both statistically meaningful and biologically relevant.
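The standard groups its fields into four categories: the organism or environmental source of the specimen, spatial-temporal information about the isolation event, phenotypic characteristics of the isolate, and project leadership and support. The record below is a hypothetical illustration of that grouping only; the field names are invented rather than taken from the GSCID/BRC standard, and the ontology mappings (MIxS, BioSample/BioProject, OBI) are not modeled.

```python
from dataclasses import asdict, dataclass, field
from datetime import date
from typing import List, Optional


@dataclass
class SampleMetadata:
    """Hypothetical isolate record loosely following the four field groups."""

    # Organism or environmental source of the specimen
    organism: str
    host: Optional[str] = None
    # Spatial-temporal information about the isolation event
    collection_date: Optional[date] = None
    geographic_location: Optional[str] = None
    # Phenotypic characteristics of the isolated pathogen/vector
    phenotypes: List[str] = field(default_factory=list)
    # Project leadership and support
    principal_investigator: Optional[str] = None
    funding_source: Optional[str] = None

    def missing_fields(self) -> List[str]:
        """Names of fields left empty, e.g. for a completeness check before submission."""
        return [k for k, v in asdict(self).items() if v in (None, [], "")]
```

Keeping the groups as typed fields makes a completeness check like `missing_fields()` trivial; a real implementation would also validate values against the controlled vocabularies the standard maps to.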

OPUS - University of Technology Sydney