42 research outputs found
Application of regulatory sequence analysis and metabolic network analysis to the interpretation of gene expression data
We present two complementary approaches for the interpretation of clusters of
co-regulated genes, such as those obtained from DNA chips and related methods.
Starting from a cluster of genes with similar expression profiles, two basic
questions can be asked:
1. Which mechanism is responsible for the coordinated transcriptional response
of the genes? This question is approached by extracting motifs that are shared
between the upstream sequences of these genes. The motifs extracted are putative
cis-acting regulatory elements.
2. What is the physiological meaning, for the cell, of expressing these genes
together? One way to answer the question is to search for potential metabolic
pathways that could be catalyzed by the products of the genes. This can be
done by selecting the genes from the cluster that code for enzymes, and trying
to assemble the catalyzed reactions to form metabolic pathways.
We present tools to answer these two questions, and we illustrate their use with
selected examples in the yeast Saccharomyces cerevisiae. The tools are available
on the web (http://ucmb.ulb.ac.be/bioinformatics/rsa-tools/;
http://www.ebi.ac.uk/research/pfbp/; http://www.soi.city.ac.uk/~msch/)
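The first question in the abstract above (finding motifs shared between upstream sequences) can be illustrated with a minimal sketch. It assumes a plain word-counting approach: for each k-mer, count how many sequences contain it and keep those present in a chosen fraction of the cluster. The function name `shared_kmers`, the toy sequences and the parameters `k` and `min_fraction` are hypothetical and are not taken from the tools listed above. The sketch also omits any assessment of statistical significance against a background model.

```python
from collections import Counter

def shared_kmers(upstream_seqs, k=6, min_fraction=1.0):
    """Return the k-mers present in at least min_fraction of the sequences."""
    presence = Counter()
    for seq in upstream_seqs:
        seq = seq.upper()
        # Use a set so that each k-mer is counted once per sequence.
        kmers = {seq[i:i + k] for i in range(len(seq) - k + 1)}
        presence.update(kmers)
    threshold = min_fraction * len(upstream_seqs)
    return sorted(kmer for kmer, n in presence.items() if n >= threshold)

# Toy upstream sequences, made up for illustration; each contains CACGTG.
toy = ["TTACACGTGAA", "GGCACGTGTTC", "ACACGTGCCAT"]
shared = shared_kmers(toy, k=6)
print(shared)
```

Lowering `min_fraction` relaxes the requirement that a word occur in every sequence of the cluster, which is usually necessary with noisy expression clusters.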
Semi-supervised prediction of protein interaction sentences exploiting semantically encoded metrics
Protein-protein interaction (PPI) identification is an integral component of many biomedical research and database curation tools. Automation of this task through classification is one of the key goals of text mining (TM). However, labelled PPI corpora required to train classifiers are generally small. In order to overcome this sparsity in the training data, we propose a novel method of integrating corpora that do not contain relevance judgements. Our approach uses a semantic language model to gather word similarity from a large unlabelled corpus. This additional information is integrated into the sentence classification process using kernel transformations and has a re-weighting effect on the training features that leads to an 8% improvement in F-score over the baseline results. Furthermore, we discover that some words which are generally considered indicative of interactions are actually neutralised by this process
Infinite-Order Percolation and Giant Fluctuations in a Protein Interaction Network
We investigate a model protein interaction network whose links represent
interactions between individual proteins. This network evolves by the
functional duplication of proteins, supplemented by random link addition to
account for mutations. When link addition is dominant, an infinite-order
percolation transition arises as a function of the addition rate. In the
opposite limit of high duplication rate, the network exhibits giant structural
fluctuations in different realizations. For biologically-relevant growth rates,
the node degree distribution has an algebraic tail with a peculiar rate
dependence for the associated exponent.
Comment: 4 pages, 2 figures, 2 column revtex format, to be submitted to PRL 1;
reference added and minor rewording of the first paragraph; Title change and
major reorganization (but no result changes) in response to referee comments;
to be published in PR
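The growth process described in the abstract above (duplication of proteins plus random link addition) can be explored with a small simulation. This is only a sketch under stated assumptions, since the abstract does not give the exact rules: each new node copies the links of a randomly chosen existing node, dropping each copied link with probability `delta`, and additionally links to every existing node with probability `beta / n`. The names `grow_network`, `delta` and `beta` are hypothetical.

```python
import random

def grow_network(n_nodes, delta=0.5, beta=0.2, seed=0):
    """Grow a graph by node duplication plus random link addition.

    Assumed rules (not specified in the abstract): each new node copies the
    links of a random existing node, dropping each with probability delta,
    and also links to every existing node with probability beta / n.
    """
    rng = random.Random(seed)
    neighbors = {0: set()}
    for new in range(1, n_nodes):
        n = len(neighbors)
        target = rng.randrange(n)
        # Duplication step: inherit each link of the target with prob. 1 - delta.
        links = {v for v in neighbors[target] if rng.random() > delta}
        # Random link addition: on average beta new links per added node.
        links |= {v for v in range(n) if rng.random() < beta / n}
        neighbors[new] = links
        for v in links:
            neighbors[v].add(new)
    return neighbors

net = grow_network(500)
degrees = sorted((len(nbrs) for nbrs in net.values()), reverse=True)
print(degrees[:10])
```

Running the function with several seeds and comparing the degree lists gives a rough feel for how much the structure varies between realizations.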
Classification of protein interaction sentences via gaussian processes
The increase in the availability of protein interaction studies in textual format coupled with the demand for easier access to the key results has led to a need for text mining solutions. In the text processing pipeline, classification is a key step for extraction of small sections of relevant text. Consequently, for the task of locating protein-protein interaction sentences, we examine the use of a classifier which has rarely been applied to text, the Gaussian processes (GPs). GPs are a non-parametric probabilistic analogue to the more popular support vector machines (SVMs). We find that GPs outperform the SVM and naïve Bayes classifiers on binary sentence data, whilst showing equivalent performance on abstract and multiclass sentence corpora. In addition, the lack of the margin parameter, which requires costly tuning, along with the principled multiclass extensions enabled by the probabilistic framework make GPs an appealing alternative worthy of further adoption
Mitochondrial and chloroplast localization of FtsH-like proteins in sugarcane based on their phylogenetic profile
Ranking for Medical Annotation: Investigating Performance, Local Search and Homonymy Recognition
A role for central spindle proteins in cilia structure and function
Cytokinesis and ciliogenesis are fundamental cellular processes that require strict coordination of microtubule organization and directed membrane trafficking. These processes have been intensely studied, but there has been little indication that regulatory machinery might be extensively shared between them. Here, we show that several central spindle/midbody proteins (PRC1, MKLP-1, INCENP, centriolin) also localize in specific patterns at the basal body complex in vertebrate ciliated epithelial cells. Moreover, bioinformatic comparisons of midbody and cilia proteomes reveal a highly significant degree of overlap. Finally, we used temperature-sensitive alleles of PRC1/spd-1 and MKLP-1/zen-4 in C. elegans to assess ciliary functions while bypassing these proteins' early role in cell division. These mutants displayed defects in both cilia function and cilia morphology. Together, these data suggest the conserved reuse of a surprisingly large number of proteins in the cytokinetic apparatus and in cilia
Finding all common intervals of k permutations
Let $\Pi = (\pi_1, \ldots, \pi_k)$ be a family of $k$ permutations of $N = \{1, 2, \ldots, n\}$. A $k$-tuple of intervals of these permutations consisting of the same set of elements is called a common interval
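The definition above can be made concrete with a brute-force sketch: enumerate every interval of the first permutation and keep it if its elements also occupy consecutive positions in all the other permutations. This is a naive cubic-time check written for illustration, not the algorithm of the paper. The function name `common_intervals`, the example permutations and the choice to skip single-element intervals are assumptions.

```python
def common_intervals(perms):
    """Brute-force list of element sets that are contiguous in every permutation."""
    first, others = perms[0], perms[1:]
    # Position of each element in each of the remaining permutations.
    positions = [{x: i for i, x in enumerate(p)} for p in others]
    n = len(first)
    found = []
    for i in range(n):
        for j in range(i + 1, n):  # intervals with at least two elements
            elems = first[i:j + 1]
            # The elements are contiguous in a permutation exactly when the
            # spread of their positions equals the interval length minus one.
            if all(max(pos[x] for x in elems) - min(pos[x] for x in elems) == j - i
                   for pos in positions):
                found.append(set(elems))
    return found

perms = [[1, 2, 3, 4, 5], [3, 1, 2, 5, 4]]
result = common_intervals(perms)
print(result)
```

For these two permutations the sets {1, 2}, {1, 2, 3}, {4, 5} and the full set {1, 2, 3, 4, 5} are contiguous in both, so they are the common intervals reported.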
Gene fusion in Helicobacter pylori: Making the ends meet
doi:10.1007/s10482-005-9021-2. Antonie van Leeuwenhoek, International Journal of General and Molecular Microbiology 89(1): 169-18