AGMIAL: implementing an annotation strategy for prokaryote genomes as a distributed system
We have implemented a genome annotation system for prokaryotes called AGMIAL. Our approach embodies a number of key principles. First, expert manual annotators are seen as a critical component of the overall system; user interfaces were cyclically refined to satisfy their needs. Second, the overall process should be orchestrated in terms of a global annotation strategy; this facilitates coordination between a team of annotators and automatic data analysis. Third, the annotation strategy should allow progressive and incremental annotation from a time when only a few draft contigs are available, to when a final finished assembly is produced. The overall architecture employed is modular and extensible, being based on the W3 standard Web services framework. Specialized modules interact with two independent core modules that are used to annotate, respectively, genomic and protein sequences. AGMIAL is currently being used by several INRA laboratories to analyze genomes of bacteria relevant to the food-processing industry, and is distributed under an open source license.
MoKCa database - mutations of kinases in cancer
Members of the protein kinase family are amongst the most commonly mutated genes in human cancer, and both mutated and activated protein kinases have proved to be tractable targets for the development of new anticancer therapies. The MoKCa database (Mutations of Kinases in Cancer, http://strubiol.icr.ac.uk/extra/mokca) has been developed to structurally and functionally annotate, and where possible predict, the phenotypic consequences of mutations in protein kinases implicated in cancer. Somatic mutation data from tumours and tumour cell lines have been mapped onto the crystal structures of the affected protein domains. Positions of the mutated amino acids are highlighted on a sequence-based domain pictogram, as well as a 3D image of the protein structure, and in a molecular graphics package, integrated for interactive viewing. The data associated with each mutation is presented in the Web interface, along with expert annotation of the detailed molecular functional implications of the mutation. Proteins are linked to functional annotation resources and are annotated with structural and functional features such as domains and phosphorylation sites. MoKCa aims to provide assessments available from multiple sources and algorithms for each potential cancer-associated mutation, and present these together in a consistent and coherent fashion to facilitate authoritative annotation by cancer biologists and structural biologists, directly involved in the generation and analysis of new mutational data.
Domain discovery method for topological profile searches in protein structures
We describe a method for automated domain discovery for topological profile searches in protein
structures. The method is used in the TOPStructure system for fast prediction of CATH classification
for protein structures (given as PDB files). It is important for profile searches in multi-domain
proteins, for which the profile method by itself tends to perform poorly. We also present an
$O(C(n)k + nk^2)$ time algorithm for this problem, compared to the $O(C(n)k + (nk)^2)$ time used by
a trivial algorithm (where n is the length of the structure, k is the number of profiles and C(n) is the
time needed to check for the presence of a given motif in a structure of length n). This method has
been developed and is currently used for TOPS representations of protein structures and prediction
of CATH classification, but may be applied to other graph-based representations of protein or RNA
structures and/or other prediction problems. A protein structure prediction system incorporating
the domain discovery method is available at http://bioinf.mii.lu.lv/tops/
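As a rough reading of the two bounds above: both share the motif-checking term C(n)k, so the saving lies entirely in the second term, which shrinks by a factor of n. The values n = 200 and k = 1000 below are illustrative assumptions, not figures from the paper.

\[
\frac{(nk)^2}{nk^2} = n, \qquad n = 200,\ k = 1000:\quad nk^2 = 2 \times 10^{8}, \quad (nk)^2 = 4 \times 10^{10}.
\]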
Recent improvements to the PROSITE database
The PROSITE database consists of a large collection of biologically meaningful signatures that are described as patterns or profiles. Each signature is linked to documentation that provides useful biological information on the protein family, domain or functional site identified by the signature. The PROSITE web page has been redesigned and several tools have been implemented to help the user discover new conserved regions in their own proteins and to visualize domain arrangements. We also introduced the facility to search PDB with a PROSITE entry or a user's pattern and visualize matched positions on 3D structures. The latest version of PROSITE (release 18.17 of November 30, 2003) contains 1676 entries. The database is accessible at http://www.expasy.org/prosit
A Molecular Biology Database Digest
Computational Biology or Bioinformatics has been defined as the application of mathematical
and Computer Science methods to solving problems in Molecular Biology that require large scale
data, computation, and analysis [18]. As expected, Molecular Biology databases play an essential
role in Computational Biology research and development. This paper introduces current
Molecular Biology databases, stressing data modeling, data acquisition, data retrieval, and the
integration of Molecular Biology data from different sources. This paper is primarily intended
for an audience of computer scientists with a limited background in Biology.
Pattern matching and pattern discovery algorithms for protein topologies
We describe algorithms for pattern matching and pattern
learning in TOPS diagrams (formal descriptions of protein topologies).
These problems can be reduced to checking for subgraph isomorphism
and finding maximal common subgraphs in a restricted class of ordered
graphs. We have developed a subgraph isomorphism algorithm for
ordered graphs, which performs well on the given set of data. The
maximal common subgraph problem then is solved by repeated
subgraph extension and checking for isomorphisms. Despite its
apparent inefficiency, this approach gives an algorithm with time
complexity proportional to the number of graphs in the input set and is
still practical on the given set of data. As a result we obtain fast
methods which can be used for building a database of protein
topological motifs, and for the comparison of a given protein of known
secondary structure against a motif database.
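To make the idea of subgraph isomorphism on ordered graphs concrete, here is a minimal backtracking sketch in Python. It is not the authors' algorithm: it assumes a simplified encoding in which vertices are listed in sequence order with one label each (for example "E" for a strand and "H" for a helix) and edges are labelled triples (i, j, label) with i < j. The function name and the labels are hypothetical. A match is an order-preserving assignment of pattern vertices to target positions that keeps vertex labels and maps every pattern edge onto a target edge with the same label.

```python
from typing import Dict, List, Optional, Tuple

Edge = Tuple[int, int, str]  # (i, j, edge label) with i < j


def find_ordered_embedding(
    p_labels: List[str],
    p_edges: List[Edge],
    t_labels: List[str],
    t_edges: List[Edge],
) -> Optional[List[int]]:
    """Return increasing target positions, one per pattern vertex, or None."""
    t_edge_set = set(t_edges)

    # Group pattern edges by their later endpoint, so each edge is checked
    # as soon as both of its endpoints have been placed.
    edges_by_end: Dict[int, List[Tuple[int, str]]] = {}
    for i, j, label in p_edges:
        edges_by_end.setdefault(j, []).append((i, label))

    def extend(mapping: List[int]) -> Optional[List[int]]:
        v = len(mapping)  # index of the next pattern vertex to place
        if v == len(p_labels):
            return mapping
        start = mapping[-1] + 1 if mapping else 0
        # Leave enough target positions for the remaining pattern vertices.
        last = len(t_labels) - (len(p_labels) - v)
        for pos in range(start, last + 1):
            if t_labels[pos] != p_labels[v]:
                continue
            if all(
                (mapping[i], pos, label) in t_edge_set
                for i, label in edges_by_end.get(v, [])
            ):
                result = extend(mapping + [pos])
                if result is not None:
                    return result
        return None

    return extend([])


# Hypothetical target: five elements in sequence order with three labelled edges.
target_labels = ["E", "H", "E", "E", "H"]
target_edges = [(0, 2, "A"), (2, 3, "A"), (0, 3, "P")]

# Pattern: two strands joined by a "P" edge.
match = find_ordered_embedding(["E", "E"], [(0, 1, "P")], target_labels, target_edges)
print(match)  # [0, 3]
```

Because the mapping must preserve vertex order, each pattern vertex only searches positions to the right of the previous one, which is what makes the ordered case far more tractable than general subgraph isomorphism.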
Molecular mechanisms of the non-coenzyme action of thiamin in brain. Biochemical, structural and pathway analysis
Thiamin (vitamin B1) is a pharmacological agent boosting central metabolism through the action of the coenzyme thiamin diphosphate (ThDP). However, positive effects, including improved cognition,
of high thiamin doses in neurodegeneration may be observed without increased ThDP or ThDP-dependent enzymes in brain. Here, we determine protein partners and metabolic pathways where
thiamin acts beyond its coenzyme role. Malate dehydrogenase, glutamate dehydrogenase and pyridoxal kinase were identified as abundant proteins binding to thiamin- or thiazolium-modified
sorbents. Kinetic studies, supported by structural analysis, revealed allosteric regulation of these proteins by thiamin and/or its derivatives. Thiamin triphosphate and adenylated thiamin triphosphate
activate glutamate dehydrogenase. Thiamin and ThDP regulate malate dehydrogenase isoforms and pyridoxal kinase. Thiamin regulation of enzymes related to malate-aspartate shuttle may impact
on malate/citrate exchange, responsible for exporting acetyl residues from mitochondria. Indeed, bioinformatic analyses found an association between thiamin- and thiazolium-binding proteins
and the term acetylation. Our interdisciplinary study shows that thiamin is not only a coenzyme for acetyl-CoA production, but also an allosteric regulator of acetyl-CoA metabolism including
regulatory acetylation of proteins and acetylcholine biosynthesis. Moreover, thiamin action in neurodegeneration may also involve neurodegeneration-related 14-3-3, DJ-1 and β-amyloid precursor
proteins identified among the thiamin- and/or thiazolium-binding proteins.
The PROSITE database, its status in 1999
The PROSITE database (http://www.expasy.ch/sprot/prosite.html) consists of biologically significant patterns and profiles formulated in such a way that with appropriate computational tools it can help to determine to which known family of proteins (if any) a new sequence belongs, or which known domain(s) it contains.
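As an illustration of how a PROSITE pattern can be applied to a new sequence, the following Python sketch translates the common elements of the pattern syntax into a regular expression. It is a simplified converter, not the official PROSITE tooling: the function name is hypothetical, the test sequence is made up, and less common constructs (such as an anchor placed inside square brackets) are not handled. The example uses the N-glycosylation site pattern N-{P}-[ST]-{P}.

```python
import re


def prosite_to_regex(pattern: str) -> str:
    """Translate a PROSITE pattern string into a Python regular expression."""
    parts = []
    for element in pattern.rstrip(".").split("-"):
        prefix = suffix = ""
        if element.startswith("<"):   # anchored at the N-terminus
            prefix, element = "^", element[1:]
        if element.endswith(">"):     # anchored at the C-terminus
            suffix, element = "$", element[:-1]

        # Separate the residue specification from an optional (n) or (n,m) repeat.
        m = re.fullmatch(r"(.+?)(?:\((\d+)(?:,(\d+))?\))?", element)
        core, low, high = m.group(1), m.group(2), m.group(3)

        if core == "x":                                   # any residue
            core_re = "."
        elif core.startswith("{") and core.endswith("}"):  # excluded residues
            core_re = "[^" + core[1:-1] + "]"
        else:                                             # single residue or [ABC]
            core_re = core

        repeat = ""
        if low is not None:
            repeat = "{" + low + ("," + high if high is not None else "") + "}"
        parts.append(prefix + core_re + repeat + suffix)
    return "".join(parts)


regex = prosite_to_regex("N-{P}-[ST]-{P}.")
print(regex)  # N[^P][ST][^P]

# Scan a made-up sequence for the site.
hit = re.search(regex, "AANGSAA")
print(hit.start() if hit else None)  # 2
```

Profiles, the other signature type in PROSITE, are position-specific scoring matrices and cannot be expressed as regular expressions; this sketch covers patterns only.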