CLU: A new algorithm for EST clustering

Author: A Kalyanaraman
Andrey Ptitsyn
AR Williamson
GG Lennon
J Burke
J Quackenbush
K Malde
M Cariaso
MS Boguski
RT Miller
T Kapros
VB Streletc
VB Strelets
Winston Hide
Publication venue: BioMed Central
Publication date: 01/01/2005

BACKGROUND: The continuous flow of EST data remains one of the richest sources for discoveries in modern biology. The first step in EST data mining is usually associated with EST clustering, the process of grouping of original fragments according to their annotation, similarity to known genomic DNA or each other. Clustered EST data, accumulated in databases such as UniGene, STACK and TIGR Gene Indices have proven to be crucial in research areas from gene discovery to regulation of gene expression. RESULTS: We have developed a new nucleotide sequence matching algorithm and its implementation for clustering EST sequences. The program is based on the original CLU match detection algorithm, which has improved performance over the widely used d2_cluster. The CLU algorithm automatically ignores low-complexity regions like poly-tracts and short tandem repeats. CONCLUSION: CLU represents a new generation of EST clustering algorithm with improved performance over current approaches. An early implementation can be applied in small and medium-size projects. The CLU program is available on an open source basis free of charge. It can be downloaded fro

Crossref

Springer - Publisher Connector

PubMed Central

White Rose Research Online

phorest: a web-based tool for comparative analyses of expressed sequence tag data

Author: Boguski MS
Burks C
Huang X
Johansson T
Karp PD
Picoult-Newberg L
Tunlid A
Publication venue: Wiley
Publication date: 01/01/2004

Comparative analysis of expressed sequence tags is becoming an important tool in molecular ecology for comparing gene expression in organisms grown in certain environments. Additionally, expressed sequence tag database information can be used for the construction of DNA microarrays and for the detection of single nucleotide polymorphisms. For such applications, we present PHOREST, a web-based tool for managing, analysing and comparing various collections of expressed sequence tags. It is written in PHP (PHP: Hypertext Preprocessor) and runs on UNIX, Microsoft Windows and Macintosh (Mac OS X) platforms

Crossref

Lund University Publications

annot8r: GO, EC and KEGG annotation of EST datasets

Author: A Bairoch
A Conesa
A Papanicolaou
DM Martin
E Camon
EM Zdobnov
J Bai
J Parkinson
JD Wasmuth
JE Stajich
LB Koski
M Ashburner
M Kanehisa
Mark L Blaxter
MS Boguski
Ralf Schmid
SF Altschul
SR Stürzenbaum
The UniProt Consortium
Publication venue: BioMed Central
Publication date: 01/01/2008

Background: The expressed sequence tag (EST) methodology is an attractive option for the generation of sequence data for species for which no completely sequenced genome is available. The annotation and comparative analysis of such datasets pose a formidable challenge for research groups that do not have the bioinformatics infrastructure of major genome sequencing centres. Therefore, there is a need for user-friendly tools to facilitate the annotation of non-model species EST datasets with well-defined ontologies that enable meaningful cross-species comparisons. To address this, we have developed annot8r, a platform for the rapid annotation of EST datasets with GO-terms, EC-numbers and KEGG-pathways. Results: annot8r automatically downloads all files relevant for the annotation process and generates a reference database that stores UniProt entries, their associated Gene Ontology (GO), Enzyme Commission (EC) and Kyoto Encyclopaedia of Genes and Genomes (KEGG) annotation and additional relevant data. For each of GO, EC and KEGG, annot8r extracts a specific sequence subset from the UniProt dataset based on the information stored in the reference database. These three subsets are then formatted for BLAST searches. The user provides the protein or nucleotide sequences to be annotated and annot8r runs BLAST searches against these three subsets. The BLAST results are parsed and the corresponding annotations retrieved from the reference database. The annotations are saved both as flat files and in a relational PostgreSQL results database to facilitate more advanced searches within the results. annot8r is integrated with the PartiGene suite of EST analysis tools. Conclusion: annot8r is a tool that assigns GO, EC and KEGG annotations for data sets resulting from EST sequencing projects both rapidly and efficiently.
The benefits of an underlying relational database, flexibility and the ease of use of the program make it ideally suited for non-model species EST-sequencing projects.

Crossref

Springer - Publisher Connector

Directory of Open Access Journals

PubMed Central

Edinburgh Research Explorer

Leicester Research Archive

Composition-based statistics and translated nucleotide searches: Improving the TBLASTN module of BLAST

Author: AA Schäffer
AL Delcher
Alejandro A Schäffer
B Brejová
B Hao
BG Barrell
DJ States
E Birney
E Boy-Marcotte
E Halperin
E Michael Gertz
EM Gertz
F Damak
F Zinoni
G Macino
H Peltola
IG Young
J Hein
JC Wootton
L Knecht
M Gribskov
MS Boguski
MS Gelfand
O Gotoh
P Steneberg
R Durbin
Richa Agarwala
S Henikoff
S Kurtz
SA Chervitz
SC Low
SF Altschul
Stephen F Altschul
TF Smith
W Gish
WJ Kent
WR Pearson
X Guan
X Huang
Yi-Kuo Yu
YK Yu
Z Zhang
Publication venue: BioMed Central
Publication date: 01/01/2006

BACKGROUND: TBLASTN is a mode of operation for BLAST that aligns protein sequences to a nucleotide database translated in all six frames. We present the first description of the modern implementation of TBLASTN, focusing on new techniques that were used to implement composition-based statistics for translated nucleotide searches. Composition-based statistics use the composition of the sequences being aligned to generate more accurate E-values, which allows for a more accurate distinction between true and false matches. Until recently, composition-based statistics were available only for protein-protein searches. They are now available as a command line option for recent versions of TBLASTN and as an option for TBLASTN on the NCBI BLAST web server. RESULTS: We evaluate the statistical and retrieval accuracy of the E-values reported by a baseline version of TBLASTN and by two variants that use different types of composition-based statistics. To test the statistical accuracy of TBLASTN, we ran 1000 searches using scrambled proteins from the mouse genome and a database of human chromosomes. To test retrieval accuracy, we modernize and adapt to translated searches a test set previously used to evaluate the retrieval accuracy of protein-protein searches. We show that composition-based statistics greatly improve the statistical accuracy of TBLASTN, at a small cost to the retrieval accuracy. CONCLUSION: TBLASTN is widely used, as it is common to wish to compare proteins to chromosomes or to libraries of mRNAs. Composition-based statistics improve the statistical accuracy, and therefore the reliability, of TBLASTN results. The algorithms used by TBLASTN are not widely known, and some of the most important are reported here. The data used to test TBLASTN are available for download and may be useful in other studies of translated search algorithms
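
The abstract above describes a nucleotide database "translated in all six frames". As a rough illustration of that one step only, the following is a minimal Python sketch, assuming the standard genetic code and plain A/C/G/T input. It is not TBLASTN's implementation, and the function names (`six_frame_translations`, `translate`, `reverse_complement`) are hypothetical names chosen for this example.

```python
from itertools import product

# Standard genetic code, codons enumerated in T, C, A, G order (assumption:
# the standard table; real searches may use other genetic codes).
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): a for c, a in zip(product(BASES, repeat=3), AMINO)}
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(dna):
    """Return the reverse complement of an upper-case DNA string."""
    return dna.translate(COMPLEMENT)[::-1]


def translate(dna):
    """Translate complete codons only; unknown codons (e.g. with N) become X."""
    return "".join(
        CODON_TABLE.get(dna[i:i + 3], "X") for i in range(0, len(dna) - 2, 3)
    )


def six_frame_translations(dna):
    """Map frame labels (+1, +2, +3, -1, -2, -3) to protein strings."""
    dna = dna.upper()
    frames = {}
    for strand, seq in (("+", dna), ("-", reverse_complement(dna))):
        for offset in range(3):
            frames[f"{strand}{offset + 1}"] = translate(seq[offset:])
    return frames


if __name__ == "__main__":
    for label, protein in six_frame_translations("ATGGCCTAA").items():
        print(label, protein)
```

A protein query would then be aligned against each of the six translated strings. The composition-based statistics discussed in the abstract concern how alignment scores are converted to E-values and are not modelled here.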

Crossref

Springer - Publisher Connector

Directory of Open Access Journals

PubMed Central

Analysis of multiplex gene expression maps obtained by voxelation

Author: AK Jain
B Albert
D Lin
D Liu
Desmond J Smith
G Kaiser
Hongbo Xie
JA Hartigan
JB MacQueen
Li An
Mark H Chin
MB Eisen
MH Chin
MS Boguski
PO Brown
RJ Lipshutz
RP Singh
Vasileios Megalooikonomou
VE Velculescu
VM Brown
Zoran Obradovic
Publication venue: BioMed Central
Publication date: 01/04/2009

Background: Gene expression signatures in the mammalian brain hold the key to understanding neural development and neurological disease. Researchers have previously used voxelation in combination with microarrays for acquisition of genome-wide atlases of expression patterns in the mouse brain. On the other hand, some work has been performed on studying gene functions, without taking into account the location information of a gene's expression in a mouse brain. In this paper, we present an approach for identifying the relation between gene expression maps obtained by voxelation and gene functions. Results: To analyze the dataset, we chose typical genes as queries and aimed at discovering similar gene groups. Gene similarity was determined by using the wavelet features extracted from the left and right hemispheres averaged gene expression maps, and by the Euclidean distance between each pair of feature vectors. We also performed a multiple clustering approach on the gene expression maps, combined with hierarchical clustering. Among each group of similar genes and clusters, the gene function similarity was measured by calculating the average gene function distances in the gene ontology structure. By applying our methodology to find similar genes to certain target genes we were able to improve our understanding of gene expression patterns and gene functions. By applying the clustering analysis method, we obtained significant clusters, which have both very similar gene expression maps and very similar gene functions respectively to their corresponding gene ontologies. The cellular component ontology resulted in prominent clusters expressed in cortex and corpus callosum. The molecular function ontology gave prominent clusters in cortex, corpus callosum and hypothalamus. The biological process ontology resulted in clusters in cortex, hypothalamus and choroid plexus.
Clusters from all three ontologies combined were most prominently expressed in cortex and corpus callosum. Conclusion: The experimental results confirm the hypothesis that genes with similar gene expression maps might have similar gene functions. The voxelation data takes into account the location information of gene expression level in mouse brain, which is novel in related research. The proposed approach can potentially be used to predict gene functions and provide helpful suggestions to biologists
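
The abstract above says gene similarity was determined by the Euclidean distance between pairs of feature vectors, with chosen genes used as queries. The following is a minimal Python sketch of that query step only, assuming feature vectors have already been computed. The wavelet feature extraction and the clustering are not shown, and the function name `rank_similar_genes` and the gene labels are hypothetical.

```python
import math


def rank_similar_genes(query, features):
    """Rank all other genes by Euclidean distance to the query gene.

    `features` maps a gene name to its feature vector (all vectors must
    have the same length). Returns (gene, distance) pairs, nearest first.
    """
    query_vector = features[query]
    distances = [
        (gene, math.dist(query_vector, vector))
        for gene, vector in features.items()
        if gene != query
    ]
    return sorted(distances, key=lambda pair: pair[1])


if __name__ == "__main__":
    # Toy two-dimensional vectors; real feature vectors would be much longer.
    toy_features = {
        "geneA": [0.0, 0.0],
        "geneB": [3.0, 4.0],
        "geneC": [1.0, 0.0],
    }
    for gene, distance in rank_similar_genes("geneA", toy_features):
        print(gene, distance)
```

`math.dist` requires Python 3.8 or later. With many genes, a vectorised distance matrix would likely be preferable to this pairwise loop.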

Crossref

Springer - Publisher Connector

PubMed Central

eScholarship - University of California

Evolutionary History of the HAP2/GCS1 Gene and Sexual Reproduction in Metazoans

Author: AY Signorovitch
B Schierwater
C Notredame
Catherine E. Dana
CD Goodman
D Bridge
F Borges
G Hemmrich
H Bode
Jason E. Stajich
K von Besser
KG Grell
M Hirai
M Srivastava
MA Miller
MS Boguski
N King
NJ Besansky
Robert E. Steele
T Mori
Y Liu
Publication venue: Public Library of Science
Publication date: 01/01/2009

The HAP2/GCS1 gene first appeared in the common ancestor of plants, animals, and protists, and is required in the male gamete for fusion to the female gamete in the unicellular organisms Chlamydomonas and Plasmodium. We have identified a HAP2/GCS1 gene in the genome sequence of the sponge Amphimedon queenslandica. This finding provides a continuous evolutionary history of HAP2/GCS1 from unicellular organisms into the metazoan lineage. Divergent versions of the HAP2/GCS1 gene are also present in the genomes of some but not all arthropods. By examining the expression of the HAP2/GCS1 gene in the cnidarian Hydra, we have found the first evidence supporting the hypothesis that HAP2/GCS1 was used for male gamete fusion in the ancestor of extant metazoans and that it retains that function in modern cnidarians

CiteSeerX

Public Library of Science (PLOS)

Crossref

Directory of Open Access Journals

PubMed Central

eScholarship - University of California

SILAC-based proteomic quantification of chemoattractant-induced cytoskeleton dynamics on a second to minute timescale

Author: A Bagorda
A Kortholt
A Para
A Shevchenko
AJ Saldanha
AT Sasaki
C Orelio
CL Chen
CL Manahan
DM Veltman
EC Rericha
EL de Hostos
F Friedberg
F Vazquez
G Vlahou
H Cai
H Kae
HR Bourne
I Marin
J Condeelis
J Cox
J Faix
J Riedl
J Schindelin
J Yan
JF Cote
JW Han
JY Kim
KF Swaney
L Bosgraaf
L Chen
M Affolter
M Brenner
M de la Roche
M Patel
MJ de Hoon
MK Vartiainen
MR Lee
MS Boguski
N Ibarra
N Meller
OD Weiner
R Insall
RH Insall
RJ Eddy
S Hanna
S Levi
SE Ong
SJ Allen
SL Blagg
TJ Jeon
WN van Egmond
X Xu
Y Yang
YC Wu
Publication venue: Springer Science and Business Media LLC
Publication date: 26/02/2014

Cytoskeletal dynamics during cell behaviours ranging from endocytosis and exocytosis to cell division and movement is controlled by a complex network of signalling pathways, the full details of which are as yet unresolved. Here we show that SILAC-based proteomic methods can be used to characterize the rapid chemoattractant-induced dynamic changes in the actin–myosin cytoskeleton and regulatory elements on a proteome-wide scale with a second to minute timescale resolution. This approach provides novel insights in the ensemble kinetics of key cytoskeletal constituents and association of known and novel identified binding proteins. We validate the proteomic data by detailed microscopy-based analysis of in vivo translocation dynamics for key signalling factors. This rapid large-scale proteomic approach may be applied to other situations where highly dynamic changes in complex cellular compartments are expected to play a key role

Crossref

PubMed Central

University of Dundee Online Publications

The vertebrate phylotypic stage and an early bilaterian-related stage in mouse embryogenesis defined by genomic information

Author: A Pires-daSilva
AL Hughes
Atsuko Sehara-Fujisawa
B Hall
BBR Hogan
BK Hall
D Duboule
E Hazkani-Covo
F Seidel
JM Slack
K Sander
KE von Baer
L Wolpert
MK Richardson
MM Alba
MS Boguski
Naoki Irie
OR Bininda-Emonds
PH O'Farrell
R Rugh
RA Raff
RP Elinson
T Miyata
U Joan
WW Ballard
Publication venue: BioMed Central
Publication date: 01/01/2007

BACKGROUND: Embryos of taxonomically different vertebrates are thought to pass through a stage in which they resemble one another morphologically. This "vertebrate phylotypic stage" may represent the basic vertebrate body plan that was established in the common ancestor of vertebrates. However, much controversy remains about when the phylotypic stage appears, and whether it even exists. To overcome the limitations of studies based on morphological comparison, we explored a comprehensive quantitative method for defining the constrained stage using expressed sequence tag (EST) data, gene ontologies (GO), and available genomes of various animals. If strong developmental constraints occur during the phylotypic stage of vertebrate embryos, then genes conserved among vertebrates would be highly expressed at this stage. RESULTS: We established a novel method for evaluating the ancestral nature of mouse embryonic stages that does not depend on comparative morphology. The numerical "ancestor index" revealed that the mouse indeed has a highly conserved embryonic period at embryonic day 8.0–8.5, the time of appearance of the pharyngeal arch and somites. During this period, the mouse prominently expresses GO-determined developmental genes shared among vertebrates. Similar analyses revealed the existence of a bilaterian-related period, during which GO-determined developmental genes shared among bilaterians are markedly expressed at the cleavage-to-gastrulation period. The genes associated with the phylotypic stage identified by our method are essential in embryogenesis. CONCLUSION: Our results demonstrate that the mid-embryonic stage of the mouse is indeed highly constrained, supporting the existence of the phylotypic stage. Furthermore, this candidate stage is preceded by a putative bilaterian ancestor-related period. These results not only support the developmental hourglass model, but also highlight the hierarchical aspect of embryogenesis proposed by von Baer. 
Identification of conserved stages and tissues by this method in various animals would be a powerful tool to examine the phylotypic stage hypothesis, and to understand which kinds of developmental events and gene sets are evolutionarily constrained and how they limit the possible variations of animal basic body plans

Crossref

Springer - Publisher Connector

Directory of Open Access Journals

PubMed Central

BCR and its mutants, the reciprocal t(9;22)-associated ABL/BCR fusion proteins, differentially regulate the cytoskeleton and cell motility

Author: AS Alberts
Elena Puccetti
F Grignani
G Bug
G Radziwill
G Scita
GM Mahon
H Kantarjian
H Pfeifer
IP Whitehead
JD van Buul
JS Tokarski
JV Melo
L Van Aelst
M Deininger
Martin Ruthardt
MS Boguski
N Gokbuget
R Mohle
RD Unwin
S Faderl
S Salesse
Saskia Güller
SD Pelletier
T Harnois
TH Chuang
Tim Beissert
W Lu
Xiaomin Zheng
Y Maru
Publication venue: BioMed Central
Publication date: 01/01/2006

BACKGROUND: The reciprocal (9;22) translocation fuses the bcr (breakpoint cluster region) gene on chromosome 22 to the abl (Abelson-leukemia-virus) gene on chromosome 9. Depending on the breakpoint on chromosome 22 (the Philadelphia chromosome, Ph+) the derivative 9+ encodes either the p40(ABL/BCR) fusion transcript, detectable in about 65% of patients suffering from chronic myeloid leukemia, or the p96(ABL/BCR) fusion transcript, detectable in 100% of Ph+ acute lymphatic leukemia patients. The ABL/BCRs are N-terminally truncated BCR mutants. The fact that BCR contains Rho-GEF and Rac-GAP functions strongly suggests an important role in cytoskeleton modeling by regulating the activity of Rho-like GTPases, such as Rho, Rac and cdc42. We, therefore, compared the function of the ABL/BCR proteins with that of wild-type BCR. METHODS: We investigated the effects of BCR and ABL/BCRs (i) on the activation status of Rho, Rac and cdc42 in GTPase-activation assays; (ii) on the actin cytoskeleton by direct immunofluorescence; and (iii) on cell motility by studying migration into a three-dimensional stroma spheroid model, adhesion on an endothelial cell layer under shear stress in a flow chamber model, and chemotaxis and endothelial transmigration in a transwell model with an SDF-1α gradient. RESULTS: Here we show that both ABL/BCRs lost fundamental functional features of BCR regarding the regulation of small Rho-like GTPases with negative consequences on cell motility, in particular on the capacity to adhere to endothelial cells. CONCLUSION: Our data presented here describe for the first time an analysis of the biological function of the reciprocal t(9;22) ABL/BCR fusion proteins in comparison to their physiological counterpart BCR

Crossref

Online Research @ Cardiff

Springer - Publisher Connector

Directory of Open Access Journals

PubMed Central

Hochschulschriftenserver - Universität Frankfurt am Main

Systematic identification of abundant A-to-I editing sites in the human transcriptome

Author: AG Polson
Avi Shoshan
B Hoopengardner
BL Bass
Dan Sztybel
DP Morse
Eli Eisenberg
Erez Y Levanon
FM Ausubel
Gideon Rechavi
I Gurevich
JB Patterson
JH Yang
KA Lehmann
LA Tonkin
LD Hillier
M Higuchi
M Lei
Martina Hallegger
Michael F Jantsch
MJ Palladino
Moshe Olshansky
MS Boguski
MS Paul
PH Seeburg
Q Wang
R Brusa
R Jiang
R Kikuno
R Sorek
Rodrigo Yelin
Ronen Shemesh
S Maas
Sarah R Pollock
SE Antonarakis
Sergey Nemzer
SK Wong
U Kim
Y Kawahara
Zipora Y Fligelman
Publication venue: Springer Science and Business Media LLC
Publication date: 01/01/2004

RNA editing by members of the double-stranded RNA-specific ADAR family leads to site-specific conversion of adenosine to inosine (A-to-I) in precursor messenger RNAs. Editing by ADARs is believed to occur in all metazoa, and is essential for mammalian development. Currently, only a limited number of human ADAR substrates are known, while indirect evidence suggests a substantial fraction of all pre-mRNAs being affected. Here we describe a computational search for ADAR editing sites in the human transcriptome, using millions of available expressed sequences. 12,723 A-to-I editing sites were mapped in 1,637 different genes, with an estimated accuracy of 95%, raising the number of known editing sites by two orders of magnitude. We experimentally validated our method by verifying the occurrence of editing in 26 novel substrates. A-to-I editing in humans primarily occurs in non-coding regions of the RNA, typically in Alu repeats. Analysis of the large set of editing sites indicates the role of editing in controlling dsRNA stability. Comment: Pre-print version. See http://dx.doi.org/10.1038/nbt996 for a reprin

arXiv.org e-Print Archive

Crossref

Oxford University Research Archive