12 research outputs found

MSDmotif: exploring protein sites and motifs

Author: A Golovin
A Prilc
A Prlic
Adel Golovin
AG Murzin
AJ Shepherd
AV Efimov
BL Sibanda
C Bystroff
CA Orengo
CG Hunter
CH Wu
CT Porter
D Schomburg
DCP Kuhn
DI Stuart
DJ Craik
EJ Milner-White
ELL Sonnhammer
H Boutselakis
H Kaur
H Kawasaki
HM Berman
ID Kuntz
J Lee
JD Watson
JYL Questel
KB Li
Kim Henrick
M Clamp
MJ Hartshorn
MR Nelson
N Hulo
ND Rawlings
RD Dowell
RD Finn
S Hayward
S Zhirong
SF Altschul
T Hubbard
TJ Oldfield
TL Bailey
WJ Duddy
WR Pearson
Publication venue: BioMed Central
Publication date: 01/01/2008

Abstract. Background: Protein structures have conserved features, called motifs, which have a significant influence on protein function. These motifs can be found in sequence as well as in 3D space. Understanding these fragments is essential for 3D structure prediction, modelling and drug design. The Protein Data Bank (PDB) is the source of this information; however, present search tools have limited 3D options for integrating protein sequence with its 3D structure. Results: We describe here a web application, together with its underlying database, for querying the PDB for ligands, binding sites, and small 3D structural and sequence motifs. Novel algorithms are incorporated for searching chemical fragments, 3D motifs, ϕ/ψ sequences, super-secondary structure motifs and small 3D structural motif associations. The interface provides functionality for visualization, search criteria creation, and sequence and 3D multiple alignment options. MSDmotif is an integrated system in which a results page is also a search form. A set of motif statistics is available for analysis. This set includes molecule and motif binding statistics, distribution of motif sequences, occurrence of an amino acid within a motif, correlation of amino-acid side-chain charges within a motif, and Ramachandran plots for each residue. The binding statistics are presented in association with properties that include a ligand fragment library. Access is also provided through the Distributed Annotation System (DAS) protocol. An additional entry point facilitates XML requests with XML responses. Conclusion: MSDmotif is unique in combining chemical, sequence and 3D data in a single search engine with a range of search and visualisation options. It provides multiple views of data found in the PDB archive for exploring protein structures.

Crossref

Springer - Publisher Connector

Directory of Open Access Journals

PubMed Central

DoBo: Protein domain boundary prediction by integrating evolutionary signals and machine learning

Author: Cheng Jianlin
Deng Xin
Eickholt Jesse
Publication venue: BioMed Central
Publication date: 01/01/2011

Abstract. Background: Accurate identification of protein domain boundaries is useful for protein structure determination and prediction. However, predicting protein domain boundaries from a sequence is still very challenging and largely unsolved. Results: We developed a new method to integrate the classification power of machine learning with evolutionary signals embedded in protein families in order to improve protein domain boundary prediction. The method first extracts putative domain boundary signals from a multiple sequence alignment between a query sequence and its homologs. The putative sites are then classified and scored by support vector machines in conjunction with input features such as sequence profiles, secondary structures, solvent accessibilities around the sites and their positions. The method was evaluated on a domain benchmark by 10-fold cross-validation, and 60% of true domain boundaries can be recalled at a precision of 60%. The trade-off between precision and recall can be adjusted according to specific needs by using different decision thresholds on the domain boundary scores assigned by the support vector machines. Conclusions: The good prediction accuracy and the flexibility of selecting domain boundary sites at different precision and recall values make our method a useful tool for protein structure determination and modelling. The method is available at http://sysbio.rnet.missouri.edu/dobo/.
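
The threshold trade-off mentioned in the abstract can be illustrated with a minimal sketch. This is not the DoBo implementation: the function name `precision_recall`, the scores and the labels below are hypothetical, chosen only to show how raising or lowering a decision threshold on per-site scores shifts precision and recall.

```python
def precision_recall(scores, labels, threshold):
    """Return (precision, recall) when sites scoring >= threshold are called boundaries.

    scores: hypothetical classifier scores, one per candidate site.
    labels: True where the site is a real domain boundary.
    """
    predicted = [s >= threshold for s in scores]
    tp = sum(1 for p, l in zip(predicted, labels) if p and l)
    fp = sum(1 for p, l in zip(predicted, labels) if p and not l)
    fn = sum(1 for p, l in zip(predicted, labels) if not p and l)
    # Convention (an assumption of this sketch): precision is 1.0 when nothing is predicted.
    precision = tp / (tp + fp) if (tp + fp) else 1.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return precision, recall


# Made-up example data, not taken from the paper.
scores = [0.9, 0.8, 0.7, 0.4, 0.3, 0.1]
labels = [True, True, False, True, False, False]

strict = precision_recall(scores, labels, threshold=0.75)
lenient = precision_recall(scores, labels, threshold=0.35)
```

With these made-up values, the stricter threshold yields fewer but more reliable calls, while the lenient one recovers every true boundary at the cost of a false positive.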

Crossref

Springer - Publisher Connector

Directory of Open Access Journals

PubMed Central

A Re-Annotation of the Saccharomyces Cerevisiae Genome

Author: A Ivens
Altschul
B. Barrell
Bairoch
Bateman
Berbee
Birney
Blandin
DeRisi
Dujon
Gaillardin
Goffeau
Hieter
K. M. Rutherford
Lowe
M-A Rajandream
Mackiewicz
Malpertuy
Mewes
Oliver
Pearson
Rutherford
Sharp
Sonnhammer
Stoesser
V. Wood
Xiang
Zhang
Publication venue: Hindawi Publishing Corporation
Publication date: 01/01/2001

Discrepancies in gene and orphan number indicated by previous analyses suggest that S. cerevisiae would benefit from a consistent re-annotation. In this analysis three new genes are identified and 46 alterations to gene coordinates are described. 370 ORFs are defined as totally spurious ORFs which should be disregarded. At least a further 193 genes could be described as very hypothetical, based on a number of criteria. It was found that disparate genes with sequence overlaps over ten amino acids (especially at the N-terminus) are rare in both S. cerevisiae and Sz. pombe. A new S. cerevisiae gene number estimate with an upper limit of 5804 is proposed, but after the removal of very hypothetical genes and pseudogenes this is reduced to 5570. Although this is likely to be closer to the true upper limit, it is still predicted to be an overestimate of gene number. A complete list of revised gene coordinates is available from the Sanger Centre (S. cerevisiae reannotation: ftp://ftp/pub/yeast/SCreannotation)

Crossref

Directory of Open Access Journals

PubMed Central

The linear chromosome of the plant-pathogenic mycoplasma 'Candidatus Phytoplasma mali'

Author: Dandekar Thomas
Heitmann Katja
Kube Michael
Kuhl Heiner
Migdoll Alexander M
Reinhardt Richard
Schneider Bernd
Seemüller Erich
Publication venue: BioMed Central
Publication date: 01/01/2008

BACKGROUND: Phytoplasmas are insect-transmitted, uncultivable bacterial plant pathogens that cause diseases in hundreds of economically important plants. They represent a monophyletic group within the class Mollicutes (trivial name mycoplasmas) and are characterized by a small genome with a low GC content, and the lack of a firm cell wall. All mycoplasmas, including strains of 'Candidatus (Ca.) Phytoplasma asteris' and 'Ca. P. australiense', examined so far have circular chromosomes, as is the case for almost all walled bacteria. RESULTS: Our work has shown that 'Ca. Phytoplasma mali', the causative agent of apple proliferation disease, has a linear chromosome. Linear chromosomes were also identified in the closely related provisional species 'Ca. P. pyri' and 'Ca. P. prunorum'. The chromosome of 'Ca. P. mali' strain AT is 601,943 bp in size and has a GC content of 21.4%. The chromosome is further characterized by large terminal inverted repeats and covalently closed hairpin ends. Analysis of the protein-coding genes revealed that glycolysis, the major energy-yielding pathway presumed for 'Ca. P. asteris', is incomplete in 'Ca. P. mali'. Due to the apparent lack of other metabolic pathways present in mycoplasmas, it is proposed that maltose and malate are utilized as carbon and energy sources. However, complete ATP-yielding pathways were not identified. 'Ca. P. mali' also differs from 'Ca. P. asteris' by a smaller genome, a lower GC content, a lower number of paralogous genes, fewer insertions of potential mobile DNA elements, and a strongly reduced number of ABC transporters for amino acids. In contrast, 'Ca. P. mali' has a more extensive set of genes for homologous recombination, excision repair and SOS response than 'Ca. P. asteris'. CONCLUSION: The small linear chromosome with large terminal inverted repeats and covalently closed hairpin ends, the extremely low GC content and the limited metabolic capabilities reflect unique features of 'Ca. P. mali', not only within phytoplasmas, but among all mycoplasmas. It is expected that the genome information obtained here will contribute to a better understanding of the reduced metabolism of phytoplasmas, their fastidious nutrition requirements that prevented axenic cultivation, and the mechanisms involved in pathogenicity.

Springer - Publisher Connector

PubMed Central

MPG.PuRe

Comparative analysis of pseudogenes across three phyla

Author: Balasubramanian S
Clark W
Diekhans M
Frankish A
Gerstein MB
Harrow J
Harte R
Hubbard T
Leng J
Pei B
Rozowsky J
Rutenberg-Schoenberg M
Sisu C
Wang D
Zhang Y
Publication venue: 'Proceedings of the National Academy of Sciences'
Publication date: 25/08/2014

Pseudogenes are degraded fossil copies of genes. Here, we report a comparison of pseudogenes spanning three phyla, leveraging the completed annotations of the human, worm, and fly genomes, which we make available as an online resource. We find that pseudogenes are lineage specific, much more so than protein-coding genes, reflecting the different remodeling processes marking each organism’s genome evolution. The majority of human pseudogenes are processed, resulting from a retrotranspositional burst at the dawn of the primate lineage. This burst can be seen in the largely uniform distribution of pseudogenes across the genome, their preservation in areas with low recombination rates, and their preponderance in highly expressed gene families. In contrast, worm and fly pseudogenes tell a story of numerous duplication events. In worm, these duplications have been preserved through selective sweeps, so we see a large number of pseudogenes associated with highly duplicated families such as chemoreceptors. However, in fly, the large effective population size and high deletion rate resulted in a depletion of the pseudogene complement. Despite large variations between these species, we also find notable similarities. Overall, we identify a broad spectrum of biochemical activity for pseudogenes, with the majority in each organism exhibiting varying degrees of partial activity. In particular, we identify a consistent amount of transcription (∼15%) across all species, suggesting a uniform degradation process. Also, we see a uniform decay of pseudogene promoter activity relative to their coding counterparts and identify a number of pseudogenes with conserved upstream sequences and activity, hinting at potential regulatory roles.

Crossref

PubMed Central

King's Research Portal

Brunel University Research Archive

"Smith-Waterman" paralelo en arquitectura de many-core para búsquedas en bases de datos de secuencias

Author: Lago Cabrera Juan
Publication venue: Universidad Internacional de Andalucía
Publication date: 01/01/2011

87 p. Master's thesis supervised by Sergio Gálvez Rojas, with co-supervisors Oswaldo Trelles Salazar and Gabriel Dorado Pérez. In this work, an algorithm named MC64-S3W (MultiCore 64 – Sequence Search Smith-Waterman) was developed to perform the local alignment of a query sequence against a database of large nucleic acid sequences (between 80 and 260 kilobases) on a many-core hardware architecture. The ability to perform large alignments (obtaining the optimal local alignment) on a many-core architecture is therefore one of the distinguishing features of this work. The work justifies the time saved by parallelizing several simultaneous alignments and presents a comparative study against other widely cited parallel implementations, such as the CUDASW++ algorithm. A comparison with BLAST is also included. The work concludes with a review of the state of the art in the comparison of nucleic acid and peptide sequences, aimed at obtaining the degree of similarity between them, both from an algorithmic point of view and from the point of view of biological studies that reference alignments of large sequences.
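
For readers unfamiliar with the underlying technique, the following is a minimal serial sketch of textbook Smith-Waterman local alignment scoring. It is not the MC64-S3W algorithm from the thesis and involves no parallelism; the function name `smith_waterman_score`, the linear gap penalty and the scoring values (match 2, mismatch -1, gap -2) are assumptions chosen for illustration.

```python
def smith_waterman_score(a, b, match=2, mismatch=-1, gap=-2):
    """Return the optimal local alignment score of sequences a and b.

    Uses a linear gap penalty and keeps only two rows of the dynamic
    programming matrix, so memory grows with len(b) rather than len(a) * len(b).
    """
    prev = [0] * (len(b) + 1)
    best = 0
    for i in range(1, len(a) + 1):
        curr = [0] * (len(b) + 1)
        for j in range(1, len(b) + 1):
            # Extend the alignment diagonally with a match or a mismatch.
            diag = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            # A cell never drops below 0: that is what makes the alignment local.
            curr[j] = max(0, diag, prev[j] + gap, curr[j - 1] + gap)
            best = max(best, curr[j])
        prev = curr
    return best


identical = smith_waterman_score("ACGT", "ACGT")
embedded = smith_waterman_score("TTACGTTT", "ACG")
unrelated = smith_waterman_score("AAAA", "TTTT")
```

Because each query-versus-database-sequence comparison is independent of the others, many such alignments can in principle run at the same time, which is the kind of parallelism the abstract refers to.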

Repositorio de la UNIA

Computational analysis of the Caenorhabditis elegans genome sequence.

Author: Jones Steven John Mathias
Publication date: 20/12/1999

The genomic sequencing of the model genetic organism, the nematode Caenorhabditis elegans, is now essentially complete, representing the first genome sequence to be derived for a multicellular organism. This thesis describes the strategies and software tools that have been utilized in the analysis of the genomic sequence. Preliminary analysis of genomic organisation is also presented. C. elegans chromosomes do not store genetic information in a uniform manner. Gene density varies between different chromosomal regions and between chromosomes. The highly recombinagenic autosomal arms possess more repetitive elements and generally have a lower gene density than the recombinationally suppressed central regions, although the gene density within autosomal arms is higher than had been previously expected. A positive correlation is observed between the number of genetically defined loci from a chromosomal region and the expression rate of a region as estimated by the abundance of Expressed Sequence Tags (ESTs). A similar positive correlation is observed with the proportion of genes possessing similarity to non-nematode proteins. Chromosomal regions with a high density of gene clusters have fewer genetically derived loci, demonstrating that redundancy reduces the genetic accessibility of a region to classical genetic approaches. Introns are larger on the autosomal arms than in the central clusters. Exon length shows no correlation with chromosomal position but increases with expression rate. Stop codon preference is also influenced by expression rate. Clusters of similar genes are also found on the C. elegans chromosomes, although their distribution is not random. The majority of gene clusters have been determined to lie on chromosome V and the left arm of II. The orientation of the genes within gene clusters suggests that inversion events are common and provide a selective advantage. Alternative splicing has also been studied and the results suggest that many alternative transcripts can be attributed to errors in splice acceptor processing.

Open Research Online (The Open University)

Workflow-based systematic design of high throughput genome annotation

Author: Wu Xikun
Publication venue: Computing, Imperial College London
Publication date: 01/10/2009

The genus Eimeria belongs to the phylum Apicomplexa, which includes many obligate intracellular protozoan parasites of man and livestock. E. tenella is one of seven species that infect the domestic chicken and cause the intestinal disease coccidiosis, which is economically important for the poultry industry. E. tenella is highly pathogenic and is often used as a model species for studies of Eimeria biology. In this PhD thesis, a comprehensive annotation system named "WAGA" (Workflow-based Automatically Genome Annotation) was built and applied to the E. tenella genome. InforSense KDE and its BioSense plug-in (products of the InforSense Company) were the core software used to build the workflows. Workflows were made by integrating individual bioinformatics tools into a single platform. Each workflow was designed to provide a standalone service for a particular task. Three major workflows were developed based on the genomic resources currently available for E. tenella: EST-based gene construction, HMM-based gene prediction and protein-based annotation. Finally, a combining workflow was built to sit above the individual ones and generate a set of automatic annotations using all of the available information. The overall system and its three major components were deployed as web servers that are fully tuneable and reusable for end users. WAGA does not require users to have programming skills or knowledge of the underlying algorithms or mechanisms of its low-level components. E. tenella was the target genome here and all the results obtained were displayed by GBrowse. A sample of the results was selected for experimental validation. For evaluation purposes, WAGA was also applied to another Apicomplexa parasite, Plasmodium falciparum, the causative agent of human malaria, which has been extensively annotated. The results obtained were compared with gene predictions of PHAT, a gene finder designed for and used in the P. falciparum genome project.

Spiral - Imperial College Digital Repository

The regulatory roles of PyrR and Crc in pyrimidine metabolism in Pseudomonas aeruginosa

Author: Patel Monal V.
Publication venue: 'University of North Texas Libraries'
Publication date: 01/08/2001

The regulatory gene for pyrimidine biosynthesis has been identified and designated pyrR. The pyrR gene product was purified to homogeneity and found to have a monomeric molecular mass of 19 kDa. The pyrR gene is located directly upstream of the pyrBC' genes in the pyrRBC' operon. Insertional mutagenesis of pyrR led to a 50-70% decrease in the expression of pyrBC', pyrD, pyrE and pyrF, while pyrC was unchanged. This suggests that PyrR is a positive activator. The upstream regions of the pyrD, pyrE and pyrF genes contain a common conserved 9 bp sequence to which the purified PyrR protein is proposed to bind. This consensus sequence is absent in pyrC but is present, as an imperfect inverted repeat separated by 11 bp, within the promoter region of pyrR. Gel retardation assays using upstream DNA fragments showed that PyrR binds to the DNA of pyrD, pyrE and pyrF as well as pyrR. This suggests that expression of pyrR is autoregulated; moreover, a stable stem-loop structure was determined in the pyrR promoter region such that the SD sequence and the translation start codon for pyrR are sequestered. β-galactosidase activity from transcriptional pyrR::lacZ fusion assays showed a two-fold increase when expressed in a pyrR- strain compared to the isogenic pyrR+ strain. Thus, pyrR is negatively regulated while the other pyr genes (except pyrC) are positively activated by PyrR. That no regulation was seen for pyrC is in keeping with the recent discovery of a second functional pyrC that is not regulated in P. aeruginosa. Gel filtration chromatography shows that the PyrR protein exists in a dynamic equilibrium, and it is proposed that PyrR functions as a monomer in activating pyrD, pyrE and pyrF and as a dimeric repressor for pyrR by binding to the inverted repeat. A related study discovered that the catabolite repression control (Crc) protein was indirectly involved in pyr gene regulation, and was shown to negatively regulate expression of PyrR at the posttranscriptional level.

UNT Digital Library

Internal Stipe Necrosis of Agaricus bisporus - Etiology and Molecular Genetic Studies

Author: Inglis Peter Ward
Publication date: 14/12/1996

The button mushroom, Agaricus bisporus, is the most popular mushroom in cultivation worldwide, and is the most valuable protected crop in the UK, with an estimated wholesale value exceeding £250 million. In 1991 a new disease emerged in mushroom crops in the UK, called Internal Stipe Necrosis (ISN). Crop losses due to this disease may reach 10%, since affected mushrooms must be downgraded or discarded. Symptoms take the form of a variable browning reaction in the central region of the mushroom stipe, which may also demonstrate varying degrees of internal collapse. During an exhaustive study of ISN over the past 3 years, it was found that an unusual enteric bacterium was consistently associated with the disease, along with diverse members of the Pseudomonas fluorescens complex, which probably represent secondary colonisers. Several strains of the enteric bacterium reproduced ISN symptoms in trials in which mushrooms were injected with bacteria and in trials where bacteria were sprayed onto otherwise normal mushroom beds. Isolates collected from deliberate infection experiments were shown to be identical to the applied strains by the use of restriction fragment length polymorphism (RFLP) studies, using a cloned 16S rRNA gene isolated from a representative strain of the enteric bacteria. These bacteria therefore appear to satisfy Koch's Postulates as the causative agent of ISN. Conventional biochemical profiles identified the ISN causative agent as Ewingella americana, an unusual species previously unknown in mushrooms or their growing environment. This identification was confirmed by genomic DNA hybridisation using a range of reference strains taxonomically related to and including E. americana. Evidence presented suggests that E. americana produces a single endo-acting chitinase. The significance of this enzyme in ISN pathogenesis is discussed. This 33 kDa enzyme has been purified by hydrophobic interaction chromatography and the encoding gene cloned and expressed in E. coli. Sequence analysis of this gene (designated chiA) revealed an open reading frame of 921 bp, with a deduced peptide size corresponding closely to the size of the purified enzyme. The deduced amino acid sequence was most similar to the chitinase II of Aeromonas sp. No. 10S-24 and, to a lesser extent, the chitinase of Saccharopolyspora erythraeus. Alignment with other chitinases, however, revealed very low homology with the exception of two conserved motifs in the catalytic domain of these enzymes. The E. americana sequence also lacks the chitin binding and Type III fibronectin homology units common to many bacterial chitinases. Deletion of a conserved motif, which has previously been implicated as forming the active site of chitinases, produced a product retaining significant chitinolytic activity. Such evidence may lead to a reappraisal of the significance of this motif in catalysis.

Nottingham eTheses