Search CORE

446 research outputs found

CLU: A new algorithm for EST clustering

Author: A Kalyanaraman
Andrey Ptitsyn
AR Williamson
GG Lennon
J Burke
J Quackenbush
K Malde
M Cariaso
MS Boguski
MS Boguski
RT Miller
T Kapros
VB Streletc
VB Strelets
Winston Hide
Publication venue: BioMed Central
Publication date: 01/01/2005

BACKGROUND: The continuous flow of EST data remains one of the richest sources for discoveries in modern biology. The first step in EST data mining is usually associated with EST clustering, the process of grouping of original fragments according to their annotation, similarity to known genomic DNA or each other. Clustered EST data, accumulated in databases such as UniGene, STACK and TIGR Gene Indices have proven to be crucial in research areas from gene discovery to regulation of gene expression. RESULTS: We have developed a new nucleotide sequence matching algorithm and its implementation for clustering EST sequences. The program is based on the original CLU match detection algorithm, which has improved performance over the widely used d2_cluster. The CLU algorithm automatically ignores low-complexity regions like poly-tracts and short tandem repeats. CONCLUSION: CLU represents a new generation of EST clustering algorithm with improved performance over current approaches. An early implementation can be applied in small and medium-size projects. The CLU program is available on an open source basis free of charge. It can be downloaded fro

Crossref

Springer - Publisher Connector

PubMed Central

White Rose Research Online

SILAC-based proteomic quantification of chemoattractant-induced cytoskeleton dynamics on a second to minute timescale

Author: A Bagorda
A Kortholt
A Kortholt
A Para
A Shevchenko
AJ Saldanha
AT Sasaki
C Orelio
CL Chen
CL Manahan
DM Veltman
DM Veltman
DM Veltman
EC Rericha
EL de Hostos
F Friedberg
F Vazquez
G Vlahou
H Cai
H Kae
HR Bourne
I Marin
J Condeelis
J Cox
J Faix
J Riedl
J Schindelin
J Yan
JF Cote
JF Cote
JW Han
JY Kim
KF Swaney
L Bosgraaf
L Bosgraaf
L Bosgraaf
L Chen
M Affolter
M Brenner
M de la Roche
M Patel
MJ de Hoon
MK Vartiainen
MR Lee
MS Boguski
N Ibarra
N Meller
OD Weiner
R Insall
RH Insall
RJ Eddy
S Hanna
S Levi
SE Ong
SJ Allen
SL Blagg
TJ Jeon
WN van Egmond
X Xu
Y Yang
YC Wu
Publication venue: Springer Science and Business Media LLC
Publication date: 26/02/2014

Cytoskeletal dynamics during cell behaviours ranging from endocytosis and exocytosis to cell division and movement is controlled by a complex network of signalling pathways, the full details of which are as yet unresolved. Here we show that SILAC-based proteomic methods can be used to characterize the rapid chemoattractant-induced dynamic changes in the actin–myosin cytoskeleton and regulatory elements on a proteome-wide scale with a second to minute timescale resolution. This approach provides novel insights into the ensemble kinetics of key cytoskeletal constituents and the association of known and newly identified binding proteins. We validate the proteomic data by detailed microscopy-based analysis of in vivo translocation dynamics for key signalling factors. This rapid large-scale proteomic approach may be applied to other situations where highly dynamic changes in complex cellular compartments are expected to play a key role.

Crossref

PubMed Central

University of Dundee Online Publications

annot8r: GO, EC and KEGG annotation of EST datasets

Author: A Bairoch
A Conesa
A Papanicolaou
DM Martin
E Camon
EM Zdobnov
J Bai
J Parkinson
J Parkinson
JD Wasmuth
JE Stajich
LB Koski
M Ashburner
M Kanehisa
Mark L Blaxter
MS Boguski
Ralf Schmid
SF Altschul
SR Stürzenbaum
The UniProt Consortium
Publication venue: BioMed Central
Publication date: 01/01/2008

BACKGROUND: The expressed sequence tag (EST) methodology is an attractive option for the generation of sequence data for species for which no completely sequenced genome is available. The annotation and comparative analysis of such datasets pose a formidable challenge for research groups that do not have the bioinformatics infrastructure of major genome sequencing centres. Therefore, there is a need for user-friendly tools to facilitate the annotation of non-model species EST datasets with well-defined ontologies that enable meaningful cross-species comparisons. To address this, we have developed annot8r, a platform for the rapid annotation of EST datasets with GO-terms, EC-numbers and KEGG-pathways. RESULTS: annot8r automatically downloads all files relevant for the annotation process and generates a reference database that stores UniProt entries, their associated Gene Ontology (GO), Enzyme Commission (EC) and Kyoto Encyclopaedia of Genes and Genomes (KEGG) annotation and additional relevant data. For each of GO, EC and KEGG, annot8r extracts a specific sequence subset from the UniProt dataset based on the information stored in the reference database. These three subsets are then formatted for BLAST searches. The user provides the protein or nucleotide sequences to be annotated and annot8r runs BLAST searches against these three subsets. The BLAST results are parsed and the corresponding annotations retrieved from the reference database. The annotations are saved both as flat files and in a relational PostgreSQL results database to facilitate more advanced searches within the results. annot8r is integrated with the PartiGene suite of EST analysis tools. CONCLUSION: annot8r is a tool that assigns GO, EC and KEGG annotations for data sets resulting from EST sequencing projects both rapidly and efficiently. The benefits of an underlying relational database, flexibility and the ease of use of the program make it ideally suited for non-model species EST-sequencing projects.

Crossref

Springer - Publisher Connector

Directory of Open Access Journals

PubMed Central

Edinburgh Research Explorer

Leicester Research Archive

Evolutionary History of the HAP2/GCS1 Gene and Sexual Reproduction in Metazoans

Author: AY Signorovitch
B Schierwater
C Notredame
Catherine E. Dana
CD Goodman
D Bridge
D Bridge
F Borges
G Hemmrich
H Bode
Jason E. Stajich
K von Besser
KG Grell
M Hirai
M Srivastava
MA Miller
MS Boguski
N King
NJ Besansky
Robert E. Steele
T Mori
Y Liu
Publication venue: Public Library of Science
Publication date: 01/01/2009

The HAP2/GCS1 gene first appeared in the common ancestor of plants, animals, and protists, and is required in the male gamete for fusion to the female gamete in the unicellular organisms Chlamydomonas and Plasmodium. We have identified a HAP2/GCS1 gene in the genome sequence of the sponge Amphimedon queenslandica. This finding provides a continuous evolutionary history of HAP2/GCS1 from unicellular organisms into the metazoan lineage. Divergent versions of the HAP2/GCS1 gene are also present in the genomes of some but not all arthropods. By examining the expression of the HAP2/GCS1 gene in the cnidarian Hydra, we have found the first evidence supporting the hypothesis that HAP2/GCS1 was used for male gamete fusion in the ancestor of extant metazoans and that it retains that function in modern cnidarians

CiteSeerX

Public Library of Science (PLOS)

Crossref

Directory of Open Access Journals

PubMed Central

eScholarship - University of California

ASPicDB: a database of annotated transcript and protein variants generated by alternative splicing

Author: A. M. D'Erchia
A. Valletti
A. Zauli
Barash
Berman
Boguski
Bonizzoni
Bonizzoni
Boutet
Castrignano
Dutta
E. Picardi
F. Licciulli
F. Mignone
F. Zambelli
Fariselli
Faustino
G. Pavesi
G. Pesole
I. Rossi
Kodzius
M. D'Antonio
M. Finelli
M. Mangiulli
Martelli
Matlin
P. Bonizzoni
P. D'Onorio De Meo
P. Fariselli
P. L. Martelli
Pan
Pettigrew
Pierleoni
Pierleoni
R. Casadio
R. Rizzi
Riva
Srebrow
Stamm
T. Castrignano
Venables
Wang
Wang
Wang
Publication venue: Oxford University Press
Publication date: 01/01/2011

Alternative splicing is emerging as a major mechanism for the expansion of the transcriptome and proteome diversity, particularly in human and other vertebrates. However, the proportion of alternative transcripts and proteins actually endowed with functional activity is currently highly debated. We present here a new release of ASPicDB which now provides a unique annotation resource of human protein variants generated by alternative splicing. A total of 256 939 protein variants from 17 191 multi-exon genes have been extensively annotated through state of the art machine learning tools providing information of the protein type (globular and transmembrane), localization, presence of PFAM domains, signal peptides, GPI-anchor propeptides, transmembrane and coiled-coil segments. Furthermore, full-length variants can be now specifically selected based on the annotation of CAGE-tags and polyA signal and/or polyA sites, marking transcription initiation and termination sites, respectively. The retrieval can be carried out at gene, transcript, exon, protein or splice site level allowing the selection of data sets fulfilling one or more features settled by the user. The retrieval interface also enables the selection of protein variants showing specific differences in the annotated features. ASPicDB is available at http://www.caspur.it/ASPicDB/

Crossref

AIR Universita degli studi di Milano

PubMed Central

Archivio istituzionale della ricerca - Università di Bari

Archivio istituzionale della ricerca - Alma Mater Studiorum Università di Bologna

Archivio Istituzionale della Ricerca- Università del Piemonte Orientale

Archivio istituzionale della ricerca - Università di Padova

Institutional Research Information System University of Turin

Composition-based statistics and translated nucleotide searches: Improving the TBLASTN module of BLAST

Author: AA Schäffer
AL Delcher
Alejandro A Schäffer
B Brejová
B Hao
BG Barrell
DJ States
E Birney
E Birney
E Boy-Marcotte
E Boy-Marcotte
E Halperin
E Michael Gertz
EM Gertz
F Damak
F Zinoni
G Macino
H Peltola
IG Young
J Hein
J Hein
JC Wootton
L Knecht
M Gribskov
MS Boguski
MS Boguski
MS Gelfand
O Gotoh
P Steneberg
P Steneberg
R Durbin
Richa Agarwala
S Henikoff
S Kurtz
SA Chervitz
SC Low
SF Altschul
SF Altschul
SF Altschul
SF Altschul
Stephen F Altschul
TF Smith
W Gish
WJ Kent
WR Pearson
WR Pearson
WR Pearson
X Guan
X Huang
Yi-Kuo Yu
YK Yu
YK Yu
Z Zhang
Z Zhang
Publication venue: BioMed Central
Publication date: 01/01/2006

BACKGROUND: TBLASTN is a mode of operation for BLAST that aligns protein sequences to a nucleotide database translated in all six frames. We present the first description of the modern implementation of TBLASTN, focusing on new techniques that were used to implement composition-based statistics for translated nucleotide searches. Composition-based statistics use the composition of the sequences being aligned to generate more accurate E-values, which allows for a more accurate distinction between true and false matches. Until recently, composition-based statistics were available only for protein-protein searches. They are now available as a command line option for recent versions of TBLASTN and as an option for TBLASTN on the NCBI BLAST web server. RESULTS: We evaluate the statistical and retrieval accuracy of the E-values reported by a baseline version of TBLASTN and by two variants that use different types of composition-based statistics. To test the statistical accuracy of TBLASTN, we ran 1000 searches using scrambled proteins from the mouse genome and a database of human chromosomes. To test retrieval accuracy, we modernize and adapt to translated searches a test set previously used to evaluate the retrieval accuracy of protein-protein searches. We show that composition-based statistics greatly improve the statistical accuracy of TBLASTN, at a small cost to the retrieval accuracy. CONCLUSION: TBLASTN is widely used, as it is common to wish to compare proteins to chromosomes or to libraries of mRNAs. Composition-based statistics improve the statistical accuracy, and therefore the reliability, of TBLASTN results. The algorithms used by TBLASTN are not widely known, and some of the most important are reported here. The data used to test TBLASTN are available for download and may be useful in other studies of translated search algorithms
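The six-frame translation mentioned in the TBLASTN abstract above can be illustrated with a short Python sketch. It only shows how a nucleotide sequence yields six candidate protein sequences (three reading frames on each strand) using the standard genetic code. It is not the BLAST implementation, it does not cover composition-based statistics, and the function names are hypothetical, chosen for illustration.

```python
from itertools import product

# Standard genetic code, codons enumerated in TCAG order ("*" marks a stop).
BASES = "TCAG"
AMINO = ("FFLLSSSSYY**CC*W"
         "LLLLPPPPHHQQRRRR"
         "IIIMTTTTNNKKSSRR"
         "VVVVAAAADDEEGGGG")
CODON_TABLE = {"".join(c): a for c, a in zip(product(BASES, repeat=3), AMINO)}
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of an upper-case DNA string."""
    return seq.translate(COMPLEMENT)[::-1]


def translate(seq):
    """Translate complete codons only; unknown codons become 'X'."""
    return "".join(CODON_TABLE.get(seq[i:i + 3], "X")
                   for i in range(0, len(seq) - 2, 3))


def six_frame_translations(seq):
    """Map frame labels (+1..+3, -1..-3) to translated sequences."""
    seq = seq.upper()
    frames = {}
    for strand, s in (("+", seq), ("-", reverse_complement(seq))):
        for offset in range(3):
            frames[f"{strand}{offset + 1}"] = translate(s[offset:])
    return frames


frames = six_frame_translations("ATGGCC")
```

A translated search then compares the protein query against each of these six translations for every database sequence; the sketch stops before any alignment or scoring.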

Crossref

Springer - Publisher Connector

Directory of Open Access Journals

PubMed Central

The Origins, Evolution, and Functional Potential of Alternative Splicing in Vertebrates

Author: A. Frankish
A. Reymond
Altschul
Amit
Boguski
C. Howald
Cazalla
Cheah
Chen
Graveley
Hansen
J. Fernandez-Banet
J. Harrow
J. M. Mudge
Jekosch
Jurka
Kan
Kim
Kim
Koren
Langmead
Lareau
Lu
McGuire
Mendell
Modrek
Mortazavi
Nilsen
Ohoka
Pan
Pickrell
R. Guigo
Schwartz
Sela
Sela
Simpson
Slater
Sorek
Sorek
Sorek
Sprague
Stolc
Sureau
T. Alioto
T. Derrien
T. Hubbard
Wang
Wang
Publication venue: Oxford University Press
Publication date: 01/01/2011

Alternative splicing (AS) has the potential to greatly expand the functional repertoire of mammalian transcriptomes. However, few variant transcripts have been characterized functionally, making it difficult to assess the contribution of AS to the generation of phenotypic complexity and to study the evolution of splicing patterns. We have compared the AS of 309 protein-coding genes in the human ENCODE pilot regions against their mouse orthologs in unprecedented detail, utilizing traditional transcriptomic and RNAseq data. The conservation status of every transcript has been investigated, and each functionally categorized as coding (separated into coding sequence [CDS] or nonsense-mediated decay [NMD] linked) or noncoding. In total, 36.7% of human and 19.3% of mouse coding transcripts are species specific, and we observe a 3.6 times excess of human NMD transcripts compared with mouse; in contrast to previous studies, the majority of species-specific AS is unlinked to transposable elements. We observe one conserved CDS variant and one conserved NMD variant per 2.3 and 11.4 genes, respectively. Subsequently, we identify and characterize equivalent AS patterns for 22.9% of these CDS or NMD-linked events in nonmammalian vertebrate genomes, and our data indicate that functional NMD-linked AS is more widespread and ancient than previously thought. Furthermore, although we observe an association between conserved AS and elevated sequence conservation, as previously reported, we emphasize that 30% of conserved AS exons display sequence conservation below the average score for constitutive exons. In conclusion, we demonstrate the value of detailed comparative annotation in generating a comprehensive set of AS transcripts, increasing our understanding of AS evolution in vertebrates. Our data support a model whereby the acquisition of functional AS has occurred throughout vertebrate evolution and is considered alongside amino acid change as a key mechanism in gene evolution.

CiteSeerX

Crossref

Serveur académique lausannois

PubMed Central

UPF Digital Repository

King's Research Portal

EasyCluster: a fast and efficient gene-oriented clustering tool for large-scale transcriptome data

Author: A Christoffels
A Kalyanaraman
A Masoudi-Nejad
B Lee
C Wei
D Karolchik
E Eyras
E Kim
ER Mardis
Ernesto Picardi
Flavio Mignone
G Pertea
G Pesole
GD Schuler
Graziano Pesole
J Burke
J Forment
J Harrow
J Kleffe
J Kleffe
J Parkinson
JP Wang
L Florea
M Arumugam
M de la Bastide
M Stanke
MB Gerstein
MS Boguski
R Apweiler
RT Miller
S Djebali
S Hazelhurst
SF Altschul
SH Nagaraj
SH Nagaraj
T Castrignano
TD Wu
WJ Kent
X Huang
Y Lee
Publication venue: BioMed Central
Publication date: 01/01/2009

BACKGROUND: ESTs and full-length cDNAs represent an invaluable source of evidence for inferring reliable gene structures and discovering potential alternative splicing events. In newly sequenced genomes, these tasks may not be practicable owing to the lack of appropriate training sets. However, when expression data are available, they can be used to build EST clusters related to specific genomic transcribed loci. Common strategies recently employed to this end are based on sequence similarity between transcripts and can lead, in specific conditions, to inconsistent and erroneous clustering. In order to improve the cluster building and facilitate all downstream annotation analyses, we developed a simple genome-based methodology to generate gene-oriented clusters of ESTs when a genomic sequence and a pool of related expressed sequences are provided. Our procedure has been implemented in the software EasyCluster and takes into account the spliced nature of ESTs after an ad hoc genomic mapping. METHODS: EasyCluster uses the well-known GMAP program in order to perform a very quick EST-to-genome mapping in addition to the detection of reliable splice sites. Given a genomic sequence and a pool of ESTs/FL-cDNAs, EasyCluster starts building genomic and EST local databases and runs GMAP. Subsequently, it parses results creating an initial collection of pseudo-clusters by grouping ESTs according to the overlap of their genomic coordinates on the same strand. In the final step, EasyCluster refines the clustering by again running GMAP on each pseudo-cluster and groups together ESTs sharing at least one splice site. RESULTS: The higher accuracy of EasyCluster with respect to other clustering tools has been verified by means of a manually curated benchmark of human EST clusters. Additional datasets including the Unigene cluster Hs.122986 and ESTs related to the human HOXA gene family have also been used to demonstrate the better clustering capability of EasyCluster over current genome-based web service tools such as ASmodeler and BIPASS. EasyCluster has also been used to provide a first compilation of gene-oriented clusters in the Ricinus communis oilseed plant for which no Unigene clusters are yet available, as well as an evaluation of the alternative splicing in this plant species.
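The two grouping steps described in the EasyCluster abstract above (pseudo-clusters from overlapping genomic coordinates on the same strand, then refinement by shared splice sites) can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the EasyCluster code: the input tuples, the inclusive-overlap rule, the treatment of ESTs without splice sites as singletons, and both function names are hypothetical, and the GMAP mapping step is assumed to have already produced the coordinates and splice-site positions.

```python
from collections import defaultdict


def pseudo_clusters(alignments):
    """Group ESTs whose genomic intervals overlap on the same strand.

    alignments: iterable of (est_id, strand, start, end) tuples
    (hypothetical format; inclusive coordinates assumed).
    """
    by_strand = defaultdict(list)
    for est_id, strand, start, end in alignments:
        by_strand[strand].append((start, end, est_id))

    clusters = []
    for strand, items in by_strand.items():
        items.sort()  # sweep left to right along the genome
        current, current_end = [], None
        for start, end, est_id in items:
            if current and start <= current_end:
                current.append(est_id)
                current_end = max(current_end, end)
            else:
                if current:
                    clusters.append((strand, current))
                current, current_end = [est_id], end
        if current:
            clusters.append((strand, current))
    return clusters


def refine_by_splice_sites(cluster, splice_sites):
    """Split a pseudo-cluster into groups of ESTs linked by shared splice sites.

    splice_sites: dict mapping est_id to a set of splice-site positions.
    ESTs with no splice sites remain singletons in this sketch.
    """
    parent = {est: est for est in cluster}

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    first_seen = {}
    for est in cluster:
        for site in splice_sites.get(est, ()):
            if site in first_seen:
                parent[find(est)] = find(first_seen[site])
            else:
                first_seen[site] = est

    groups = defaultdict(list)
    for est in cluster:
        groups[find(est)].append(est)
    return sorted(groups.values())


alignments = [("a", "+", 100, 500), ("b", "+", 400, 900),
              ("c", "+", 2000, 2500), ("d", "-", 450, 800)]
initial = pseudo_clusters(alignments)
refined = refine_by_splice_sites(
    ["a", "b", "e"], {"a": {200, 300}, "b": {300, 700}, "e": {950}})
```

Sorting by start coordinate lets the first step run as a single sweep per strand, and the union-find structure in the second step joins ESTs transitively, so two ESTs that share no splice site directly still end up together if a third EST links them.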

Crossref

Springer - Publisher Connector

Directory of Open Access Journals

PubMed Central

Archivio istituzionale della ricerca - Università di Bari

Archivio Istituzionale della Ricerca- Università del Piemonte Orientale

ConservedPrimers 2.0: A high-throughput pipeline for comparative genome referenced intron-flanking PCR primer design and its application in wheat SNP discovery

Author: A Fedorov
AH Paterson
CR Primmer
ER Sears
EV Koonin
FM You
Frank M You
G Haberer
Gerard R Lazo
H Wei
HK Choi
J Dvorak
J Dvorak
J Fredslund
Jan Dvorak
JC Nicod
JF Wendel
JP Vogel
KH Buetow
LA Lyons
LL Qi
M Bekaert
M Hassen
MS Boguski
N Aitken
N Huo
N Rostoks
Naxin Huo
NK Blake
Olin D Anderson
PD Keightley
S Bensch
S Rozen
SF Altschul
SR Palumbi
SW Roy
X Guo
Yong Q Gu
ZL Hu
Publication venue: BioMed Central
Publication date: 01/01/2009

BACKGROUND: In some genomic applications it is necessary to design large numbers of PCR primers in exons flanking one or several introns on the basis of orthologous gene sequences in related species. The primer pairs designed by this target gene approach are called "intron-flanking primers" or, because they are located in exonic sequences which are usually conserved between related species, "conserved primers". They are useful for large-scale single nucleotide polymorphism (SNP) discovery and marker development, especially in species, such as wheat, for which a large number of ESTs are available but for which genome sequences and intron/exon boundaries are not available. To date, no suitable high-throughput tool is available for this purpose. RESULTS: We have developed the ConservedPrimers 2.0 pipeline for designing intron-flanking primers for large-scale SNP discovery and marker development, and demonstrated its utility in wheat. This tool uses non-redundant wheat EST sequences, such as wheat contigs and singleton ESTs, and related genomic sequences, such as those of rice, as inputs. It aligns the ESTs to the genomic sequences to identify unique colinear exon blocks and predicts intron lengths. Intron-flanking primers are then designed based on the intron/exon information using the Primer3 core program or BatchPrimer3. Finally, a tab-delimited file containing intron-flanking primer pair sequences and their primer properties is generated for primer ordering and their PCR applications. Using this tool, 1,922 bin-mapped wheat ESTs (31.8% of the 6,045 in total) were found to have unique colinear exon blocks suitable for primer design and 1,821 primer pairs were designed from these single- or low-copy genes for PCR amplification and SNP discovery. With these primers and subsequently designed genome-specific primers, a total of 1,527 loci were found to contain one or more genome-specific SNPs. CONCLUSION: The ConservedPrimers 2.0 pipeline for designing intron-flanking primers was developed and its utility demonstrated. The tool can be used for SNP discovery, genetic variation assays and marker development for any target genome that has abundant ESTs and a related reference genome that has been fully sequenced. The ConservedPrimers 2.0 pipeline has been implemented as a command-line tool as well as a web application. Both versions are freely available at http://wheat.pw.usda.gov/demos/ConservedPrimers/.

Crossref

Springer - Publisher Connector

Directory of Open Access Journals

PubMed Central

CTdatabase: a knowledge-base of high-throughput and curated data on cancer-testis antigens

Author: A. A. Camargo
A. A. Jungbluth
A. Bairoch
A. J. G. Simpson
A. R. deOliveira
A. S. Mundstein
A. T. R. Vasconcelos
Altschul
Boguski
Brenner
E. Kiesler
Kumar
L. G. Almeida
L. J. Old
M. C. C. Silva
N. J. Sakabe
O. L. Caballero
Old
R. Chua
S. Gnjatic
S. Gurung
S. L. White
T. Cohen
Thompson
van der Bruggen
Velculescu
Y.-T. Chen
Publication venue: Oxford University Press
Publication date: 01/01/2009

The potency of the immune response has still to be harnessed effectively to combat human cancers. However, the discovery of T-cell targets in melanomas and other tumors has raised the possibility that cancer vaccines can be used to induce a therapeutically effective immune response against cancer. The targets, cancer-testis (CT) antigens, are immunogenic proteins preferentially expressed in normal gametogenic tissues and different histological types of tumors. Therapeutic cancer vaccines directed against CT antigens are currently in late-stage clinical trials testing whether they can delay or prevent recurrence of lung cancer and melanoma following surgical removal of primary tumors. CT antigens constitute a large, but ill-defined, family of proteins that exhibit a remarkably restricted expression. Currently, there is a considerable amount of information about these proteins, but the data are scattered through the literature and in several bioinformatic databases. The database presented here, CTdatabase (http://www.cta.lncc.br), unifies this knowledge to facilitate both the mining of the existing deluge of data, and the identification of proteins alleged to be CT antigens, but that do not have their characteristic restricted expression pattern. CTdatabase is more than a repository of CT antigen data, since all the available information was carefully curated and annotated with most data being specifically processed for CT antigens and stored locally. Starting from a compilation of known CT antigens, CTdatabase provides basic information including gene names and aliases, RefSeq accession numbers, genomic location, known splicing variants, gene duplications and additional family members. Gene expression at the mRNA level in normal and tumor tissues has been collated from publicly available data obtained by several different technologies. Manually curated data related to mRNA and protein expression, and antigen-specific immune responses in cancer patients are also available, together with links to PubMed for relevant CT antigen articles.

CiteSeerX

Crossref

PubMed Central

Archive ouverte UNIGE