Systematic Characterisation of Cellular Localisation and Expression Profiles of Proteins Containing MHC Ligands
Presentation of peptides on Major Histocompatibility Complex (MHC) molecules is the cornerstone of immune system activation, and increased knowledge of the characteristics of MHC ligands and their source proteins is highly desirable. In the present large-scale study, we used a large data set of proteins containing experimentally identified MHC class I or II ligands and examined the proteins according to their expression profiles at the mRNA level and their Gene Ontology (GO) classification within the cellular component ontology. Proteins encoded by highly abundant mRNA were found to be much more likely to be the source of MHC ligands. Among the 2.5% most abundant mRNAs, as many as 41% of the encoded proteins contained MHC class I ligands. For proteins containing MHC class II ligands, the corresponding percentage was 11%. Furthermore, we found that most proteins containing MHC class I ligands were localised to the intracellular parts of the cell, including the cytoplasm and nucleus. MHC class II ligand donors were, on the other hand, mostly membrane proteins. The results contribute to the ongoing debate concerning the nature of MHC ligand-containing proteins and can be used to extend the existing methods for MHC ligand prediction by including the source protein's localisation and expression profile. Improving the current methods is important in the growing quest for epitopes that can be used for vaccine or diagnostic purposes, especially when it comes to large DNA viruses and cancer.
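The enrichment figure quoted above (the share of the most abundant mRNAs whose proteins contain a ligand) can be illustrated with a minimal sketch. The function name, the input format (a dict of abundances and a set of ligand-containing genes) and the toy data are all assumptions made for illustration; they are not the data set or the code used in the study.

```python
def ligand_fraction_in_top(expression, has_ligand, top_fraction=0.025):
    """Fraction of the most abundant mRNAs whose proteins contain a ligand.

    expression: dict mapping gene -> mRNA abundance (hypothetical format)
    has_ligand: set of genes whose proteins contain a known MHC ligand
    top_fraction: share of genes, ranked by abundance, to inspect
    """
    ranked = sorted(expression, key=expression.get, reverse=True)
    n_top = max(1, round(len(ranked) * top_fraction))
    top = ranked[:n_top]
    return sum(gene in has_ligand for gene in top) / n_top


# Toy data (made up): 200 genes, gene200 being the most abundant.
expression = {f"gene{i}": float(i) for i in range(1, 201)}
has_ligand = {"gene200", "gene199", "gene50"}

# Top 2.5% of 200 genes is 5 genes; 2 of them are ligand sources.
frac = ligand_fraction_in_top(expression, has_ligand)
print(frac)
```

Running the same function separately on class I and class II ligand sets would give the two percentages that the abstract compares.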
Comparative Methods for Gene Structure Prediction in Homologous Sequences
The increasing number of sequenced genomes motivates the use of evolutionary patterns to detect genes. We present a series of comparative methods for gene finding in homologous prokaryotic or eukaryotic sequences. Based on a model of legal genes and a similarity measure between genes, we find the pair of legal genes of maximum similarity. We develop methods based on gene models and alignment-based similarity measures of increasing complexity, which take into account many details of real gene structures, e.g. the similarity of the proteins encoded by the exons. When using a similarity measure based on an existing alignment, the methods run in linear time. When integrating the alignment and prediction processes, which allows for more fine-grained similarity measures, the methods run in quadratic time. We evaluate the methods in a series of experiments on synthetic and real sequence data, which show that all methods are competitive but that taking the similarity of the encoded proteins into account substantially boosts performance.
Predicting Secondary Structures, Contact Numbers, and Residue-wise Contact Orders of Native Protein Structure from Amino Acid Sequence by Critical Random Networks
Prediction of one-dimensional protein structures such as secondary structures
and contact numbers is useful for the three-dimensional structure prediction
and important for the understanding of sequence-structure relationship. Here we
present a new machine-learning method, critical random networks (CRNs), for
predicting one-dimensional structures, and apply it, with position-specific
scoring matrices, to the prediction of secondary structures (SS), contact
numbers (CN), and residue-wise contact orders (RWCO). The present method
achieves, on average, an accuracy of 77.8% for SS and correlation coefficients
of 0.726 and 0.601 for CN and RWCO, respectively. The accuracy of the SS
prediction is comparable to other state-of-the-art methods, and that of the CN
prediction is a significant improvement over previous methods. We give a
detailed formulation of the critical random networks-based prediction scheme, and
examine the context-dependence of prediction accuracies. In order to study the
nonlinear and multi-body effects, we compare the CRNs-based method with a
purely linear method based on position-specific scoring matrices. Although not
superior to the CRNs-based method, the surprisingly good accuracy achieved by
the linear method highlights the difficulty in extracting structural features
of higher order from amino acid sequence beyond that provided by the
position-specific scoring matrices.
Comment: 20 pages, 1 figure, 5 tables; minor revision; accepted for
publication in BIOPHYSIC
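The two kinds of scores reported in this abstract, a per-residue accuracy for SS and a correlation coefficient for CN and RWCO, can be sketched as follows. This is only an illustration of the metrics, not of the CRN method itself; the three-letter H/E/C state alphabet, the function names and the toy inputs are assumptions not taken from the abstract.

```python
import math


def per_residue_accuracy(predicted, observed):
    """Share of residues whose predicted state matches the observed state."""
    if len(predicted) != len(observed):
        raise ValueError("sequences must have equal length")
    matches = sum(p == o for p, o in zip(predicted, observed))
    return matches / len(observed)


def pearson(x, y):
    """Pearson correlation coefficient between two equal-length value lists."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)


# Toy example (made up): 7 of 8 secondary-structure states agree.
acc = per_residue_accuracy("HHHEECCC", "HHHEECCH")

# Toy example (made up): predicted values proportional to observed ones.
r = pearson([1.0, 2.0, 3.0, 4.0], [2.0, 4.0, 6.0, 8.0])
print(acc, r)
```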
Organizing research data
Research relies on ever larger amounts of data from experiments, automated production equipment, questionnaires, time series such as weather records, and so on. A major task in science is to combine, process and analyse such data to obtain evidence of patterns and correlations.
Cyclebase.org: version 2.0, an updated comprehensive, multi-species repository of cell cycle experiments and derived analysis results
Cell division involves a complex series of events orchestrated by thousands of molecules. To study this process, researchers have employed mRNA expression profiling of synchronously growing cell cultures progressing through the cell cycle. These experiments, which have been carried out in several organisms, are not easy to access, combine and evaluate. Complicating factors include variation in interdivision time between experiments and differences in relative duration of each cell-cycle phase across organisms. To address these problems, we created Cyclebase, an online resource of cell-cycle-related experiments. This database provides an easy-to-use web interface that facilitates visualization and download of genome-wide cell-cycle data and analysis results. Data from different experiments are normalized to a common timescale and are complemented with key cell-cycle information and derived analysis results. In Cyclebase version 2.0, we have updated the entire database to reflect changes to genome annotations and included information on cyclin-dependent kinase (CDK) substrates, predicted degradation signals and loss-of-function phenotypes from genome-wide screens. The web interface has been improved and provides a single, gene-centric graph summarizing the available cell-cycle experiments. Finally, key information and links to orthologous and paralogous genes are now included to further facilitate comparison of cell-cycle regulation across species. Cyclebase version 2.0 is available at http://www.cyclebase.org
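To make the idea of a common timescale concrete, the sketch below rescales sampling times from experiments with different interdivision times to a percentage of the cell cycle. The abstract does not describe how Cyclebase performs its normalization, so the linear rescaling, the function name and the `offset` parameter are assumptions used purely for illustration.

```python
def to_cycle_percent(time_points, interdivision_time, offset=0.0):
    """Map sampling times to a percentage (0-100) of one cell cycle.

    time_points: sampling times, in the same unit as interdivision_time
    interdivision_time: duration of one full cell cycle in this experiment
    offset: hypothetical time at which the cycle is taken to start
    """
    if interdivision_time <= 0:
        raise ValueError("interdivision_time must be positive")
    return [
        ((t - offset) % interdivision_time) / interdivision_time * 100.0
        for t in time_points
    ]


# Two made-up experiments with different interdivision times (in minutes).
exp_a = to_cycle_percent([0, 30, 60, 90, 120], interdivision_time=120)
exp_b = to_cycle_percent([0, 15, 30, 45], interdivision_time=60)
print(exp_a)
print(exp_b)
```

After rescaling, samples taken at 30 minutes in the first experiment and at 15 minutes in the second land on the same point of the shared axis. A linear rescale of this kind does not account for differences in the relative duration of each phase across organisms, which the abstract names as a separate complication.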
Sequence-based feature prediction and annotation of proteins
The combination of prediction tools in complex workflows and pipelines facilitates prediction of protein features from sequence
Using Electronic Patient Records to Discover Disease Correlations and Stratify Patient Cohorts
Electronic patient records remain a rather unexplored, but potentially rich, data source for discovering correlations between diseases. We describe a general approach for gathering phenotypic descriptions of patients from medical records in a systematic and non-cohort-dependent manner. By extracting phenotype information from the free text in such records, we demonstrate that we can extend the information contained in the structured record data and use it to produce fine-grained patient stratification and disease co-occurrence statistics. The approach uses a dictionary based on the International Classification of Disease ontology and is therefore in principle language independent. As a use case, we show how records from a Danish psychiatric hospital lead to the identification of disease correlations, which can subsequently be mapped to systems biology frameworks.
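A dictionary-based extraction step followed by co-occurrence counting could look roughly like the sketch below. The three-entry term dictionary, the substring matching, the function names and the patient notes are hypothetical and chosen for illustration; the abstract does not specify the matching rules used in the study, and a real pipeline would at least need to handle negation and term variants.

```python
from collections import Counter
from itertools import combinations

# Toy dictionary mapping free-text terms to ICD-10 codes (tiny excerpt).
TERM_TO_CODE = {
    "schizophrenia": "F20",
    "type 2 diabetes": "E11",
    "depressive episode": "F32",
}


def extract_codes(text, term_to_code=TERM_TO_CODE):
    """Return the set of codes whose dictionary term occurs in the text."""
    lowered = text.lower()
    return {code for term, code in term_to_code.items() if term in lowered}


def cooccurrence(patient_notes):
    """Count, per pair of codes, the patients in whom both codes were found."""
    counts = Counter()
    for notes in patient_notes.values():
        codes = set()
        for note in notes:
            codes |= extract_codes(note)
        for pair in combinations(sorted(codes), 2):
            counts[pair] += 1
    return counts


# Made-up notes for three hypothetical patients.
notes = {
    "patient_1": ["Known schizophrenia.", "Treated for type 2 diabetes."],
    "patient_2": ["Schizophrenia; also type 2 diabetes."],
    "patient_3": ["Depressive episode last year."],
}
pairs = cooccurrence(notes)
print(pairs)
```

Because matching goes through a term-to-code dictionary, swapping in a dictionary of terms in another language leaves the rest of the sketch unchanged, which reflects the language-independence argument made in the abstract.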