98 research outputs found
Examining the Classification Accuracy of TSVMs with Feature Selection in Comparison with the GLAD Algorithm
Gene expression data sets are used to classify and predict patient diagnostic categories. Labelled gene expression examples are extremely difficult and expensive to obtain. Moreover, conventional supervised approaches such as Support Vector Machine (SVM) algorithms cannot function properly when labelled data (training examples) are insufficient. Therefore, in this paper, we suggest Transductive Support Vector Machines (TSVMs) as semi-supervised learning algorithms that learn from both labelled and unlabelled samples to classify microarray data. To prune superfluous genes and samples, we used a feature selection method called Recursive Feature Elimination (RFE), which is intended to improve classification output and avoid the local optimization problem. We examined the classification accuracy of the TSVM-RFE algorithm in comparison with the Genetic Learning Across Datasets (GLAD) algorithm, as both are semi-supervised learning methods. Comparing these methods, we found that TSVM-RFE surpassed both an SVM using RFE and GLAD.
Unbiased Functional Proteomics Strategy for Protein Kinase Inhibitor Validation and Identification of bona fide Protein Kinase Substrates: Application to Identification of EEF1D as a Substrate for CK2
Prediction of Candidate Primary Immunodeficiency Disease Genes Using a Support Vector Machine Learning Approach
Screening and early identification of primary immunodeficiency disease (PID) genes is a major challenge for physicians. Many resources have catalogued molecular alterations in known PID genes along with their associated clinical and immunological phenotypes. However, these resources do not assist in identifying candidate PID genes. We have recently developed a platform designated Resource of Asian PDIs, which hosts information pertaining to molecular alterations, protein–protein interaction networks, mouse studies and microarray gene expression profiling of all known PID genes. Using this resource as a discovery tool, we describe the development of an algorithm for prediction of candidate PID genes. Using a support vector machine learning approach, we have predicted 1442 candidate PID genes using 69 binary features of 148 known PID genes and 3162 non-PID genes as a training data set. The power of this approach is illustrated by the fact that six of the predicted genes have recently been experimentally confirmed to be PID genes. The remaining genes in this predicted data set represent attractive candidates for testing in patients where the etiology cannot be ascribed to any of the known PID genes.
Statistical learning of peptide retention behavior in chromatographic separations: a new kernel-based approach for computational proteomics
BACKGROUND: High-throughput peptide and protein identification technologies have benefited tremendously from strategies based on tandem mass spectrometry (MS/MS) in combination with database searching algorithms. A major problem with existing methods lies in the significant number of false positive and false negative annotations. So far, standard algorithms for protein identification do not use the information gained from the separation processes usually involved in peptide analysis, such as retention time information, which is readily available from chromatographic separation of the sample. Identification can thus be improved by comparing measured retention times to predicted retention times. Current prediction models are derived from a set of measured test analytes but they usually require large amounts of training data. RESULTS: We introduce a new kernel function which can be applied in combination with support vector machines to a wide range of computational proteomics problems. We show the performance of this new approach by applying it to the prediction of peptide adsorption/elution behavior in strong anion-exchange solid-phase extraction (SAX-SPE) and ion-pair reversed-phase high-performance liquid chromatography (IP-RP-HPLC). Furthermore, the predicted retention times are used to improve spectrum identifications by a p-value-based filtering approach. The approach was tested on a number of different datasets and shows excellent performance while requiring only very small training sets (about 40 peptides instead of thousands). Using the retention time predictor in our retention time filter improves the fraction of correctly identified peptide mass spectra significantly. CONCLUSION: The proposed kernel function is well-suited for the prediction of chromatographic separation in computational proteomics and requires only a limited amount of training data. The performance of this new method is demonstrated by applying it to peptide retention time prediction in IP-RP-HPLC and prediction of peptide sample fractionation in SAX-SPE. Finally, we incorporate the predicted chromatographic behavior in a p-value-based filter to improve peptide identifications based on liquid chromatography-tandem mass spectrometry.
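The idea of filtering identifications by comparing measured and predicted retention times can be sketched as follows. This is a minimal illustration and not the authors' implementation: it assumes, purely for the example, that prediction errors are normally distributed with a known standard deviation `sigma`. The function names, the dictionary keys, the threshold `alpha`, and the peptide labels are all hypothetical.

```python
import math


def rt_p_value(measured, predicted, sigma):
    """Two-sided p-value for a retention time deviation.

    Assumes (hypothetically) that prediction errors follow a normal
    distribution with mean 0 and standard deviation sigma.
    """
    z = abs(measured - predicted) / sigma
    return math.erfc(z / math.sqrt(2.0))


def filter_identifications(idents, sigma, alpha=0.05):
    """Keep identifications whose measured retention time is plausible
    given the predicted one, i.e. whose p-value is at least alpha."""
    return [
        ident
        for ident in idents
        if rt_p_value(ident["measured_rt"], ident["predicted_rt"], sigma) >= alpha
    ]


# Placeholder identifications (not real data): one close to its predicted
# retention time, one far away from it.
idents = [
    {"peptide": "PEPTIDEA", "measured_rt": 20.5, "predicted_rt": 20.0},
    {"peptide": "PEPTIDEB", "measured_rt": 35.0, "predicted_rt": 20.0},
]
kept = filter_identifications(idents, sigma=2.0)
print([ident["peptide"] for ident in kept])
```

In this toy input, the second identification deviates from its predicted retention time by 7.5 standard deviations and is dropped, while the first is kept.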
Cultura de Inovação: Conceitos e Modelos Teóricos
This study portrays the state of the art in scientific literature on the culture of innovation, with the objective of characterizing its meaning and especially describing different theoretical models that seek to understand how it occurs in an organizational environment. To enrich the analysis, research results show the relationship between organizational culture and innovation. The literature review was carried out in 2011 using the following databases: Coordination for the Improvement of Higher Education Personnel (CAPES), Proquest and Directory of Open Access Journals (DOAJ). The keywords used were the expression "culture of innovation" and the joint terms "culture" and "innovation"; only full articles were included in the research. Culture of innovation articles that were cited in the papers identified in the literature search were also considered. The analysis consisted of 40 articles, based on the predefined criteria, and showed that this is a topic of interest for researchers in different world regions. It is a complex theme determined by factors with a systemic character. There is a predominance of quantitative research and strong evidence of a relationship between organizational culture and innovation, which requires further research to test the theoretical models proposed by these different authors.
Support Vector Machines and Kernels for Computational Biology
ISSN: 1553-734X; ISSN: 1553-735
PathFinder: mining signal transduction pathway segments from protein-protein interaction networks
BACKGROUND: A signal transduction pathway is the chain of processes by which a cell converts an extracellular signal into a response. In most unicellular organisms, the number of signal transduction pathways influences the number of ways the cell can react and respond to the environment. Discovering signal transduction pathways is an arduous problem, even with the use of systematic genomic, proteomic and metabolomic technologies. These techniques produce an enormous amount of data, and interpreting and processing this data becomes a challenging computational problem. RESULTS: In this study we present a new framework for identifying signaling pathways in protein-protein interaction networks. Our goal is to find biologically significant pathway segments in a given interaction network. Currently, protein-protein interaction data contains an excessive amount of noise, e.g., false positive and false negative interactions. First, we eliminate false positives in the protein-protein interaction network by integrating the network with microarray expression profiles, protein subcellular localization and sequence information. In addition, protein families are used to repair false negative interactions. Then the characteristics of known signal transduction pathways and their functional annotations are extracted in the form of association rules. CONCLUSION: Given a pair of starting and ending proteins, our methodology returns candidate pathway segments between these two proteins with possible missing links (recovered false negatives). In our study, S. cerevisiae (yeast) data is used to demonstrate the effectiveness of our method.
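The final step described above, returning candidate segments between a starting and an ending protein, can be illustrated with a simple path enumeration. This is only a sketch under stated assumptions, not the PathFinder method itself: it omits the noise filtering, the repair of false negatives, and the association rules, and simply lists all simple paths up to a length bound. The function name `candidate_segments`, the parameter `max_len`, and the single-letter protein names are hypothetical.

```python
def candidate_segments(network, start, end, max_len=4):
    """Enumerate simple paths from start to end with at most max_len edges.

    network maps each protein to the set of its interaction partners.
    """
    paths = []

    def extend(path):
        node = path[-1]
        if node == end:
            paths.append(list(path))
            return
        if len(path) - 1 >= max_len:
            return  # length bound reached without arriving at end
        for partner in sorted(network.get(node, ())):
            if partner not in path:  # keep paths simple (no repeated proteins)
                path.append(partner)
                extend(path)
                path.pop()

    extend([start])
    return paths


# Toy undirected interaction network with placeholder protein names.
edges = [("A", "B"), ("B", "C"), ("A", "D"), ("D", "C"), ("C", "E")]
network = {}
for u, v in edges:
    network.setdefault(u, set()).add(v)
    network.setdefault(v, set()).add(u)

segments = candidate_segments(network, "A", "E")
print(segments)
```

A real pipeline would presumably score or filter these candidates, for example against known pathway characteristics, before reporting them.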
Detection and characterization of 3D-signature phosphorylation site motifs and their contribution towards improved phosphorylation site prediction in proteins
BACKGROUND: Phosphorylation of proteins plays a crucial role in the regulation and activation of metabolic and signaling pathways and constitutes an important target for pharmaceutical intervention. Central to the phosphorylation process is the recognition of specific target sites by protein kinases followed by the covalent attachment of phosphate groups to the amino acids serine, threonine, or tyrosine. The experimental identification as well as computational prediction of phosphorylation sites (P-sites) has proved to be a challenging problem. Computational methods have focused primarily on extracting predictive features from the local, one-dimensional sequence information surrounding phosphorylation sites. RESULTS: We characterized the spatial context of phosphorylation sites and assessed its usability for improved phosphorylation site predictions. We identified 750 non-redundant, experimentally verified sites with three-dimensional (3D) structural information available in the protein data bank (PDB) and grouped them according to their respective kinase family. We studied the spatial distribution of amino acids around phosphoserines, phosphothreonines, and phosphotyrosines to extract signature 3D-profiles. Characteristic spatial distributions of amino acid residue types around phosphorylation sites were indeed discernible, especially when kinase-family-specific target sites were analyzed. To test the added value of using spatial information for the computational prediction of phosphorylation sites, Support Vector Machines were applied using both sequence and structural information. When compared to sequence-only based prediction methods, a small but consistent performance improvement was obtained when the prediction was informed by 3D-context information. CONCLUSION: While local one-dimensional amino acid sequence information was observed to harbor most of the discriminatory power, spatial context information was identified as relevant for the recognition of kinases and their cognate target sites and can be used for an improved prediction of phosphorylation sites. A web-based service (Phos3D) implementing the developed structure-based P-site prediction method has been made available at http://phos3d.mpimp-golm.mpg.de.
Robust computational reconstitution – a new method for the comparative analysis of gene expression in tissues and isolated cell fractions
BACKGROUND: Biological tissues consist of various cell types that differentially contribute to physiological and pathophysiological processes. Determining and analyzing cell type-specific gene expression under diverse conditions is therefore a central aim of biomedical research. The present study compares gene expression profiles in whole tissues and isolated cell fractions purified from these tissues in patients with rheumatoid arthritis and osteoarthritis. RESULTS: The expression profiles of the whole tissues were compared to computationally reconstituted expression profiles that combine the expression profiles of the isolated cell fractions (macrophages, fibroblasts, and non-adherent cells) according to their relative mRNA proportions in the tissue. The mRNA proportions were determined by trimmed robust regression using only the most robustly-expressed genes (1/3 to 1/2 of all measured genes), i.e. those showing the most similar expression in tissue and isolated cell fractions. The relative mRNA proportions were determined using several different chip evaluation methods, among which the MAS 5.0 signal algorithm appeared to be most robust. The computed mRNA proportions agreed well with the cell proportions determined by immunohistochemistry except for a small number of outliers. Genes that were either regulated (i.e. differentially-expressed in tissue and isolated cell fractions) or robustly-expressed in all patients were identified using different test statistics. CONCLUSION: Robust Computational Reconstitution uses an intermediate number of robustly-expressed genes to estimate the relative mRNA proportions. This avoids both the exclusive dependence on the robust expression of individual, highly cell type-specific marker genes and the bias towards an equal distribution upon inclusion of all genes for computation.
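The reconstitution step, combining the profiles of isolated cell fractions according to their relative mRNA proportions, can be sketched as a weighted sum. This is a minimal illustration with several assumptions: the proportions are taken as given here rather than estimated by trimmed robust regression as in the study, and the function name `reconstitute`, the fraction keys, and all numbers are hypothetical placeholders rather than measured data.

```python
def reconstitute(fraction_profiles, proportions):
    """Combine per-fraction expression profiles into one tissue-level
    profile as a weighted sum, with relative mRNA proportions as weights.

    Proportions are normalized so that the weights sum to 1.
    """
    total = sum(proportions.values())
    n_genes = len(next(iter(fraction_profiles.values())))
    profile = [0.0] * n_genes
    for name, values in fraction_profiles.items():
        weight = proportions[name] / total
        for gene_index, value in enumerate(values):
            profile[gene_index] += weight * value
    return profile


# Placeholder expression values for three genes in each cell fraction.
fractions = {
    "macrophages": [100.0, 10.0, 50.0],
    "fibroblasts": [20.0, 200.0, 50.0],
    "non_adherent": [10.0, 10.0, 50.0],
}
# Assumed relative mRNA proportions (given, not estimated, in this sketch).
proportions = {"macrophages": 0.5, "fibroblasts": 0.25, "non_adherent": 0.25}

reconstituted = reconstitute(fractions, proportions)
print(reconstituted)
```

The reconstituted profile could then be compared gene by gene with a measured whole-tissue profile; genes that deviate strongly would be candidates for the "regulated" category described above.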