58 research outputs found

A unifying framework for seed sensitivity and its application to subset seeds

Author: A. Finkelstein
A.V. Aho
B. Brejova
B. Ma
D. Brown
G. Kucherov
I.H. Yang
J. Buhler
J. Xu
J.D. Ullman
K. Choi
K.P. Choi
S. Altschul
S. Burkhardt
W. Chen
W.J. Kent
Publication venue: World Scientific Pub Co Pte Lt
Publication date: 01/01/2004

We propose a general approach to computing seed sensitivity that can be applied to different definitions of seeds. It treats separately three components of the seed sensitivity problem (a set of target alignments, an associated probability distribution, and a seed model), which are specified by distinct finite automata. The approach is then applied to a new concept of subset seeds, for which we propose an efficient automaton construction. Experimental results confirm that sensitive subset seeds can be efficiently designed using our approach and can then be used in similarity search, producing better results than ordinary spaced seeds.

arXiv.org e-Print Archive

CiteSeerX

HAL - Lille 3

Crossref

INRIA a CCSD electronic archive server

PubMed Central

HAL Descartes

Hal-Diderot

Decoding HMMs using the k best paths: algorithms and applications

Author: A Bairoch
A Krogh
B Brejova
D Eppstein
D Golod
Daniel G Brown
Daniil Golod
G Tusnady
L Kall
L Rabiner
M Rapp
P Fariselli
R Durbin
R Sramek
Publication venue: BioMed Central

Crossref

PubMed Central

SEARCHPATTOOL: a new method for mining the most specific frequent patterns for binding sites with application to prokaryotic DNA sequences

Author: A Brazma
A Califano
B Brejova
DR Cavener
E Eskin
Fathi Elloumi
FP Roth
G Pavesi
G Thijs
GZ Hertz
H Salgado
I Jonassen
I Rigoutsos
J Van Helden
M Burset
M Tompa
Martha Nason
PA Pevzner
R Agrawal
S Sinha
TL Bailey
Y Makita
Publication venue: BioMed Central
Publication date: 01/01/2007

Background: Computational methods to predict transcription factor binding sites (TFBS) based on exhaustive algorithms are guaranteed to find the best patterns but are often limited to short ones or impose some constraints on the pattern type. Many patterns for binding sites in prokaryotic species are not well characterized but are known to be large, between 16 and 30 base pairs (bp), and to contain at least 2 conserved bases. The length of promoters in prokaryotic species (about 400 bp) and our interest in studying a small set of genes that could be a cluster of co-regulated genes from microarray experiments led to the development of a new exhaustive algorithm targeting these large patterns. Results: We present Searchpattool, a new method to search for and select the most specific (conservative) frequent patterns. This method does not impose restrictions on the density or the structure of the pattern. The best patterns (motifs) are selected using several statistics, including a new application of a z-score based on the number of matching sequences. We compared Searchpattool against other well-known algorithms on a Bacillus subtilis group of 14 input sequences and found that in our experiments Searchpattool always performed the best based on performance scores. Conclusion: Searchpattool is a new method for pattern discovery relative to transcription factor binding sites for species or genes with short promoters. It outputs the most specific significant patterns and helps the biologist to choose the best candidates.

Crossref

Springer - Publisher Connector

Directory of Open Access Journals

PubMed Central

Assessment and improvement of the Plasmodium yoelii yoelii genome annotation through comparative analysis

Author: Alice S. Tarun
Ashley Vaughan
Bendtsen
Blair
Brejova
Carlton
Eng
Gardner
Gowthaman Ramasamy
Hall
Kaiser
Kariu
Keller
Kooij
Krogh
Kumar
Li
Ling Li
Lu
Luke
Malcolm J. Gardner
Mulder
Pertea
Rice
Roy
Sacci
Salzberg
Slater
Snow
Stefan H.I. Kappe
Sum-Ying Chiu
Tarun
Tatusov
Wang
Xinxia Peng
Publication venue: Oxford University Press

Motivation: The sequencing of the genome of Plasmodium yoelii, a model rodent malaria parasite, has greatly facilitated research for the development of new drug and vaccine candidates against malaria. Unfortunately, only preliminary gene models were annotated on the partially sequenced genome, mostly by in silico gene prediction, and there has been no major improvement of the annotation since 2002.

Crossref

PubMed Central

Finding genes in Schistosoma japonicum: annotating novel genomes with help of extrinsic evidence

Author: Allen
Altschul
Brejova
Brown
Broňa Brejová
Cantarel
Chiu
Cole
Daniel G. Brown
Edgar
Florea
Guigo
Guindon
Guoping Zhao
Kent
Korf
Lomsadze
Margulies
Ming Li
Ng
Ohler
Parra
Price
Shengyue Wang
Sonnenburg
Stanke
Tatusov
Tomáš Vinař
World Health Organization Expert Committee
Yan Zhou
Yangyi Chen
Publication venue: Oxford University Press

We have developed a novel method for estimating the parameters of hidden Markov models for gene finding in newly sequenced species. Our approach does not rely on curated training data sets, but instead uses extrinsic evidence (including paired-end ditags that have not been used in gene finding previously) and iterative training. This new method is particularly suitable for annotation of species with a large evolutionary distance to the closest annotated species. We have used our approach to produce an initial annotation of more than 16 000 genes in the newly sequenced Schistosoma japonicum draft genome. We established the high quality of our predictions by comparison to full-length cDNAs (withdrawn from the extrinsic evidence) and to CEGMA core genes. We also evaluated the effectiveness of the new training procedure on the Caenorhabditis elegans genome. ExonHunter and the newest parametric files for the S. japonicum genome are available for download at www.bioinformatics.uwaterloo.ca/downloads/exonhunte

Crossref

PubMed Central

Hit integration for identifying optimal spaced seeds

Author: B Brejova
B Ma
DYF Mak
FP Preparata
G Kucherov
H Anton
I Herms
IH Yang
J Buhler
J Stoer
J Xu
J Yang
KP Choi
L Ilie
L Zhou
M Farach-Colton
M Li
Seong-Bae Park
SF Altschul
TF Smith
U Keich
WJ Kent
Won-Hyoung Chung
WR Pearson
X Gao
Y Sun
Publication venue: BioMed Central
Publication date: 01/01/2010

Crossref

Springer - Publisher Connector

PubMed Central

The common marmoset genome provides insight into primate biology and evolution

We report the whole-genome sequence of the common marmoset (Callithrix jacchus). The 2.26-Gb genome of a female marmoset was assembled using Sanger read data (6×) and a whole-genome shotgun strategy. A first analysis has permitted comparison with the genomes of apes and Old World monkeys and the identification of specific features that might contribute to the unique biology of this diminutive primate, including genetic changes that may influence body size, frequent twinning and chimerism. We observed positive selection in growth hormone/insulin-like growth factor genes (growth pathways), respiratory complex I genes (metabolic pathways), and genes encoding immunobiological factors and proteases (reproductive and immunity pathways). In addition, both protein-coding and microRNA genes related to reproduction exhibited evidence of rapid sequence evolution. This genome sequence for a New World monkey enables increased power for comparative analyses among available primate genomes and facilitates biomedical research application. © 2014 Nature America, Inc.

LSU Scholarly Repository (Louisiana State Univ.)

A draft genome sequence of the elusive giant squid, Architeuthis dux

Author: Albertin C. B.
Alexander G. C.
Antunes A.
Baril T.
Barrio-Hernandez I.
Blagoev B.
Brejova B.
Campos A.
Castro L. F. C.
Chu C.
Couto A.
Da Fonseca R. R.
Fedrigo O.
Frazao B.
Gardner P.
Gilbert M. T. P.
Hayward A.
Hoving H. -J.
Jarvis E.
Li Q.
Ma B.
Machado A. M.
Musacchia F.
Nielsen R.
Osorio H.
Patricio M.
Penaloza F.
Petersen B.
Pisani D.
Rahman M. Z.
Rasmussen S.
Ribeiro A. M.
Rocha S.
Sanges R.
Sicheritz-Ponten T.
Silva F.
Simakov O.
Strugnell J. M.
Tafur-Jimenez R.
Vinar T.
Vinther J.
Winkelmann I.
Wu Y.
Zhang G.
Publication venue: Oxford University Press (OUP)
Publication date: 21/11/2019

Background: The giant squid (Architeuthis dux; Steenstrup, 1857) is an enigmatic giant mollusc with a circumglobal distribution in the deep ocean, except in the high Arctic and Antarctic waters. The elusiveness of the species makes it difficult to study. Thus, having a genome assembled for this deep-sea-dwelling species will allow several pending evolutionary questions to be unlocked. Findings: We present a draft genome assembly that includes 200 Gb of Illumina reads, 4 Gb of Moleculo synthetic long reads, and 108 Gb of Chicago libraries, with a final size matching the estimated genome size of 2.7 Gb, and a scaffold N50 of 4.8 Mb. We also present an alternative assembly including 27 Gb raw reads generated using the Pacific Biosciences platform. In addition, we sequenced the proteome of the same individual and RNA from 3 different tissue types from 3 other species of squid (Onychoteuthis banksii, Dosidicus gigas, and Sthenoteuthis oualaniensis) to assist genome annotation. We annotated 33,406 protein-coding genes supported by evidence, and the genome completeness estimated by BUSCO reached 92%. Repetitive regions cover 49.17% of the genome. Conclusions: This annotated draft genome of A. dux provides a critical resource to investigate the unique traits of this species, including its gigantism and key adaptations to deep-sea environments.

OceanRep

ResearchOnline@JCU

Investigo

Woods Hole Open Access Server

ResearchOnline at James Cook University

Copenhagen University Research Information System

eScholarship - University of California

Sissa Digital Library

Open Research Exeter

Repositório Aberto da Universidade do Porto

NORA - Norwegian Open Research Archives

Explore Bristol Research

Improving model construction of profile HMMs for remote homology detection through structural alignment

Author: A Andreeva
A Bateman
A Krogh
AC Camproux
Alberto MR Dávila
B Brejova
B Knudsen
B Qian
C Bystroff
C Do
C Notredame
D Feng
D Haft
F Altschul
F Goyon
Gerson Zaverucha
H Mamitsuka
I Letunic
J Espadaler
J Gough
J Park
J Shi
J Söding
J Thompson
JD Thompson
JR Beck
Juliana S Bernardes
K Bae
K Karplus
K Katoh
K Lin
K Mizuguchi
K Sjolander
L Holm
L Rabiner
M Gribskov
M Helen
M Madera
M Mendel
M Wistrand
O Sullivan
P Bourne
P Nuin
R Edgar
R Hughey
R Karchin
S Altschul
S Eddy
S Jones
T Attwood
T Mitchell
V Alexandrov
Vítor S Costa
W Majoros
W Taylor
WR Pearson
Y Hou
Publication venue: BioMed Central
Publication date: 01/01/2007

Background: Remote homology detection is a challenging problem in bioinformatics. Arguably, profile Hidden Markov Models (pHMMs) are one of the most successful approaches in addressing this important problem. pHMM packages present a relatively small computational cost, and perform particularly well at recognizing remote homologies. This raises the question of whether structural alignments could impact the performance of pHMMs trained from proteins in the Twilight Zone, as structural alignments are often more accurate than sequence alignments at identifying motifs and functional residues. Next, we assess the impact of using structural alignments in pHMM performance. Results: We used the SCOP database to perform our experiments. Structural alignments were obtained using the 3DCOFFEE and MAMMOTH-mult tools; sequence alignments were obtained using CLUSTALW, TCOFFEE, MAFFT and PROBCONS. We performed leave-one-family-out cross-validation over super-families. Performance was evaluated through ROC curves and a paired two-tailed t-test. Conclusion: We observed that pHMMs derived from structural alignments performed significantly better than pHMMs derived from sequence alignments in low-identity regions, mainly below 20%. We believe this is because structural alignment tools are better at focusing on the important patterns that are more often conserved through evolution, resulting in higher quality pHMMs. On the other hand, sensitivity of these tools is still quite low for these low-identity regions. Our results suggest a number of possible directions for improvements in this area.

Crossref

Springer - Publisher Connector

Directory of Open Access Journals

PubMed Central

The completion of the Mammalian Gene Collection (MGC)

Author: Astashyn A.
Baertsch R.
Bhat N.
Blakesley R. W.
Bonner T. I.
Bouffard G. G.
Brejova B.
Brent M.
Brown G.
Brownstein M.
Buetow K. H.
Chuah E.
Collins F. S.
Comstock C. L.
Deng A.
Deng M.
Derge J. G.
Dickson M. C.
Diekhans M.
Farrell C.
Feingold E. A.
Garcia A. M.
Gerhard D. S.
Ghamsari L.
Gibbs R. A.
Good P. J.
Green E. D.
Grimwood J.
Gruber C. E.
Gunaratne P. H.
Hart J.
Harte R.
Haussler D.
Hirst M.
Hudson J.
Jacob H.
Jang W.
Kent J.
Kloske D.
Landrum M.
Langton L.
Lazar J.
Lebeau A.
Lewis J.
Lin C.
Ma K.
Maglott D.
Mah D.
Maidak B. L.
Mandich A.
Marsh A.
McPherson J.
Mello E.
Misquitta L.
Moksa M.
Moore T.
Mullikin J.
Muratet M.
Murphy M.
Murphy T.
Murray R. R.
Muzny D.
Myers R. M.
Pang J.
Pardes E.
Pennacchio C.
Phan L.
Pruitt K. D.
Rajput B.
Rasooly R.
Riddick L.
Robinson C.
Rodriguez A. C.
Salehi-Ashtiani K.
Schaefer C. F.
Schmutz J.
Schreiber K.
Sethupathy P.
Shapiro N.
Shenmen C. M.
Shoaf D.
Sieja S.
Siepel A.
Simmons B.
Smith M. R.
Stevens M.
Taylor G.
Temple G.
Tse K.
van Baren M. J.
Wagner L.
Ward M.
Webb D.
Weber J.
Wei C.
Wu J.
Wu W.
Yankie L.
Young A. C.
Zeng T.
Publication venue: Cold Spring Harbor Laboratory
Publication date: 01/12/2009

Since its start, the Mammalian Gene Collection (MGC) has sought to provide at least one full-protein-coding sequence cDNA clone for every human and mouse gene with a RefSeq transcript, and at least 6200 rat genes. The MGC cloning effort initially relied on random expressed sequence tag screening of cDNA libraries. Here, we summarize our recent progress using directed RT-PCR cloning and DNA synthesis. The MGC now contains clones with the entire protein-coding sequence for 92% of human and 89% of mouse genes with curated RefSeq (NM-accession) transcripts, and for 97% of human and 96% of mouse genes with curated RefSeq transcripts that have one or more PubMed publications, in addition to clones for more than 6300 rat genes. These high-quality MGC clones and their sequences are accessible without restriction to researchers worldwide.

Cold Spring Harbor Laboratory Institutional Repository